Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
13
nova_data/__init__.py
Normal file
@@ -0,0 +1,13 @@
"""
NOVA Data - Legal dataset acquisition and processing
"""

from .pipeline import DataPipeline
from .legal_sources import LegalDatasetRegistry
from .preprocessing import TextPreprocessor

__all__ = [
    'DataPipeline',
    'LegalDatasetRegistry',
    'TextPreprocessor',
]