Initial commit: NOVA - Neuro-Optimizing Versatile Agent
Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
docs/MODEL_CARD.md (new file, 232 lines added)

@@ -0,0 +1,232 @@
# NOVA Model Card

## Model Details

**Name:** NOVA (Neuro-Optimizing Versatile Agent)

**Version:** 0.1.0

**Date:** 2025

**License:** Apache 2.0

**Type:** Decoder-only transformer language model

### Model Sizes

NOVA comes in four sizes:

| Size | Parameters | Layers | Hidden Size | Attention Heads | Context Length |
|------|------------|--------|-------------|-----------------|----------------|
| 125M | 125M       | 12     | 768         | 12              | 2048           |
| 350M | 350M       | 24     | 1024        | 16              | 2048           |
| 1.3B | 1.3B       | 24     | 2048        | 32 (8 KV)       | 2048           |
| 3B   | 3B         | 32     | 2560        | 32 (8 KV)       | 4096           |
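As a rough sanity check, the parameter counts follow from the table via the usual back-of-envelope formula for GPT-style decoders (embeddings plus per-layer attention and MLP weights). The ~50k vocabulary below is an assumption for illustration, since the SentencePiece vocabulary size is not stated here, and SwiGLU/GQA shift the constants slightly (most visibly for the 3B row):

```python
# Back-of-envelope parameter estimate for a GPT-style decoder.
# Assumes tied embeddings and a ~4*d FFN; SwiGLU and GQA change the constants.
def approx_params(layers: int, d: int, vocab: int = 50_000) -> float:
    embeddings = vocab * d      # token embedding, tied with the LM head
    per_layer = 12 * d * d      # attention (~4*d^2) + MLP (~8*d^2)
    return embeddings + layers * per_layer

for name, layers, d in [("125M", 12, 768), ("350M", 24, 1024),
                        ("1.3B", 24, 2048), ("3B", 32, 2560)]:
    print(name, f"~{approx_params(layers, d) / 1e6:.0f}M")
```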
### Architecture

- **Positional Encoding:** RoPE (Rotary Position Embedding)
- **Normalization:** RMSNorm (default) or LayerNorm
- **Activation:** SwiGLU (default), GeGLU, or GELU
- **Attention:** Multi-head with optional grouped-query attention (GQA)
- **Features:** KV-cache, gradient checkpointing, Flash Attention support
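For reference, RMSNorm (the default normalization) is small enough to sketch in full. This is a minimal PyTorch version of the standard formulation (Zhang & Sennrich), not necessarily NOVA's exact module:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS, with no mean-centering or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)
```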
## Intended Use

### Primary Use Cases

- **Personal companion AI:** Conversational agent with customizable personas
- **Local inference:** Privacy-focused applications on consumer hardware
- **Research:** Transformer architecture experimentation
- **Education:** Learning about modern LLM implementation

### Out of Scope

- **Production deployment without safety measures:** Additional content filtering recommended
- **High-stakes decisions:** Not suitable for medical, legal, or financial advice
- **Scalable services:** Designed for local/personal use, not cloud deployment
## Training Data

NOVA uses **only legally licensed datasets**:

### Approved Sources

- **Public Domain:** Project Gutenberg books
- **CC0/CC-BY:** Wikipedia, OpenWebText, C4 corpus
- **Open Licensed:** The Pile (ArXiv subset), OSI-approved code datasets

### License Tracking

All training data sources are logged in `license_ledger.json` with:

- Source name and URL
- License type
- Download date
- Data provenance
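A ledger entry might look like the following. The field names are illustrative (the schema is not reproduced in this card), but each entry carries the four items above:

```json
{
  "source": "Project Gutenberg",
  "url": "https://www.gutenberg.org",
  "license": "Public Domain",
  "download_date": "2025-01-15",
  "provenance": "Direct download of plain-text corpus"
}
```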
### Exclusions

- No scraped data without verified licenses
- No copyrighted material
- No personally identifiable information (PII)
- No user data without explicit consent

## Training Procedure

### Hyperparameters

Default training configuration (125M):

```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
weight_decay: 0.1
warmup_steps: 1000
max_steps: 100000
optimizer: AdamW
lr_schedule: cosine with warmup
```
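The `lr_schedule: cosine with warmup` entry corresponds to the common linear-warmup, cosine-decay curve. A minimal sketch of that schedule, written independently of NOVA's trainer but using the config values above:

```python
import math

def lr_at(step: int, max_lr: float = 3e-4, warmup: int = 1000,
          max_steps: int = 100_000, min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup:
        return max_lr * step / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)  # 0 -> 1 after warmup
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```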
### Hardware

- **Minimum:** CPU (4+ cores), 8GB RAM
- **Recommended:** NVIDIA GPU (8GB+ VRAM), 16GB+ RAM
- **Optimal:** NVIDIA GPU (24GB+ VRAM), 32GB+ RAM

### Optimizations

- **Mixed Precision:** AMP (Automatic Mixed Precision) on GPU
- **Gradient Checkpointing:** Reduces activation memory by recomputing in the backward pass
- **Distributed Training:** DDP (DistributedDataParallel) support
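Combined with the `gradient_accumulation: 4` setting above, a training step under AMP follows PyTorch's standard `autocast`/`GradScaler` pattern. This is a sketch, not NOVA's actual trainer; `model`, `optimizer`, `loader`, and `loss_fn` are assumed to exist:

```python
import torch

# Minimal AMP + gradient-accumulation loop (illustrative only).
scaler = torch.cuda.amp.GradScaler()
accum = 4  # matches gradient_accumulation: 4 in the config

for i, (inputs, targets) in enumerate(loader):
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(inputs)
        loss = loss_fn(logits, targets) / accum  # average over accumulated micro-batches
    scaler.scale(loss).backward()                # scale loss to avoid fp16 underflow
    if (i + 1) % accum == 0:
        scaler.step(optimizer)                   # unscales gradients, then steps
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```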
## Evaluation

### Metrics

- **Perplexity:** Language modeling quality
- **Latency:** Inference speed (tokens/second)
- **Memory:** Peak RAM/VRAM usage
- **Persona Adherence:** Style consistency with the selected persona
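Perplexity here is the usual exponential of the mean per-token cross-entropy, so it falls straight out of the validation loss:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll)

perplexity(3.0)  # a validation loss of 3.0 nats/token gives perplexity ~20.1
```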
### Benchmarks

(To be added as pre-trained models become available)

## Persona System

### Design Philosophy

NOVA includes a **personality matrix** system for controllable conversational style:

- **No AI Disclosure by Default:** `always_disclose: false`
- **Private Use Context:** Designed for personal, local deployment
- **Customizable:** Users can create custom personas

### Personality Traits

Eight traits (0.0-1.0) that modulate generation:

1. Warmth
2. Humor
3. Empathy
4. Decisiveness
5. Creativity
6. Intimacy
7. Playfulness
8. Formality

### Default Personas

- **girlfriend_gentle:** High warmth, high empathy
- **girlfriend_playful:** High humor, high playfulness
- **girlfriend_supportive:** Balanced traits (default)
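A persona file plausibly combines the eight trait values with the disclosure flag. The schema below is hypothetical, with illustrative field names and trait values, but it shows the shape such a config would take:

```yaml
# Hypothetical persona config; field names and values are illustrative.
name: girlfriend_supportive
always_disclose: false   # enable for any public or shared deployment
traits:
  warmth: 0.7
  humor: 0.5
  empathy: 0.7
  decisiveness: 0.5
  creativity: 0.5
  intimacy: 0.6
  playfulness: 0.5
  formality: 0.3
```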
## Ethical Considerations

### Privacy

- **Local-First:** All processing on-device
- **No Telemetry:** Zero data collection
- **User Control:** Complete control over data and models

### Bias and Fairness

- **Training Data Bias:** Inherits biases from source datasets
- **Mitigation:** Use of diverse, openly licensed sources
- **Ongoing Work:** Bias evaluation and mitigation strategies

### Content Safety

- **Basic Filters:** Profanity and unsafe-content detection
- **Limitations:** Not a complete safety solution
- **Recommendation:** Additional filtering for public-facing use

### AI Disclosure

- **Configurable:** `always_disclose` setting in the persona config
- **Default:** False (for private, personal use)
- **Recommendation:** Enable for any public or shared deployment
## Limitations

### Technical

- **Small Context:** 2048-4096 tokens (not suitable for long documents)
- **Compute:** Smaller models may produce lower-quality output than larger LLMs
- **Hallucination:** May generate factually incorrect information

### Use Case

- **Not a knowledge base:** May not have up-to-date information
- **Not a specialist:** General-purpose, not domain-specific
- **Not production-ready (as-is):** Requires additional safety measures and filtering

## Evolutionary Algorithm (NOVA-EVO)

### Purpose

An optional genetic algorithm for automatic configuration optimization:

- **Hyperparameter Search:** Learning rate, batch size, warmup
- **Architecture Search:** Activation, normalization, positional encoding
- **Multi-Objective:** Optimizes loss, latency, and memory simultaneously

### Fitness Metrics

- **Loss/Perplexity:** 50% weight
- **Latency:** 20% weight
- **Memory:** 20% weight
- **Quality:** 10% weight
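Given those weights, the scalar fitness used to rank candidates is a straightforward weighted sum. A sketch, assuming each metric is pre-normalized to [0, 1] with lower loss, latency, and memory being better:

```python
# Hypothetical weighted fitness; assumes metrics are normalized to [0, 1].
WEIGHTS = {"loss": 0.5, "latency": 0.2, "memory": 0.2, "quality": 0.1}

def fitness(metrics: dict[str, float]) -> float:
    score = 0.0
    score += WEIGHTS["loss"]    * (1 - metrics["loss"])     # lower loss is better
    score += WEIGHTS["latency"] * (1 - metrics["latency"])  # lower latency is better
    score += WEIGHTS["memory"]  * (1 - metrics["memory"])   # lower memory is better
    score += WEIGHTS["quality"] * metrics["quality"]        # higher quality is better
    return score  # higher score = fitter individual
```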
### Compute Budget

- **Small:** 20 individuals, 10 generations (~6-12 hours)
- **Medium:** 40 individuals, 20 generations (~24-48 hours)
- **Large:** 100 individuals, 50 generations (~1-2 weeks)

## Contact

For questions, issues, or contributions:

- **GitHub:** [github.com/yourusername/nova](https://github.com/yourusername/nova)
- **Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)

## Citation

```bibtex
@software{nova2025,
  title={NOVA: Neuro-Optimizing Versatile Agent},
  author={NOVA Project Contributors},
  year={2025},
  url={https://github.com/yourusername/nova},
  license={Apache-2.0}
}
```

## Acknowledgments

- Transformer architecture inspired by GPT, LLaMA, and modern LLM research
- RoPE, RMSNorm, and SwiGLU from recent papers (Su et al., Zhang et al., Shazeer et al.)
- Open source community for datasets and tools

---

**Last Updated:** 2025

**Model Card Version:** 1.0