Initial commit: NOVA - Neuro-Optimizing Versatile Agent
Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
docs/MODEL_CARD.md (new file, 232 lines added)

@@ -0,0 +1,232 @@
# NOVA Model Card

## Model Details

**Name:** NOVA (Neuro-Optimizing Versatile Agent)

**Version:** 0.1.0

**Date:** 2025

**License:** Apache 2.0

**Type:** Decoder-only transformer language model

### Model Sizes

NOVA comes in four sizes:

| Size | Parameters | Layers | Hidden Size | Attention Heads | Context Length |
|------|------------|--------|-------------|-----------------|----------------|
| 125M | 125M       | 12     | 768         | 12              | 2048           |
| 350M | 350M       | 24     | 1024        | 16              | 2048           |
| 1.3B | 1.3B       | 24     | 2048        | 32 (8 KV)       | 2048           |
| 3B   | 3B         | 32     | 2560        | 32 (8 KV)       | 4096           |
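As a rough sanity check, the parameter counts follow from the table via the usual back-of-envelope formula for GPT-style decoders (embeddings plus per-layer attention and MLP weights). The ~50k vocabulary below is an assumption for illustration, since the SentencePiece vocabulary size is not stated here, and SwiGLU/GQA shift the constants slightly (most visibly for the 3B row):

```python
# Back-of-envelope parameter estimate for a GPT-style decoder.
# Assumes tied embeddings and a ~4*d FFN; SwiGLU and GQA change the constants.
def approx_params(layers: int, d: int, vocab: int = 50_000) -> float:
    embeddings = vocab * d      # token embedding, tied with the LM head
    per_layer = 12 * d * d      # attention (~4*d^2) + MLP (~8*d^2)
    return embeddings + layers * per_layer

for name, layers, d in [("125M", 12, 768), ("350M", 24, 1024),
                        ("1.3B", 24, 2048), ("3B", 32, 2560)]:
    print(name, f"~{approx_params(layers, d) / 1e6:.0f}M")
```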
### Architecture

- **Positional Encoding:** RoPE (Rotary Position Embedding)
- **Normalization:** RMSNorm (default) or LayerNorm
- **Activation:** SwiGLU (default), GeGLU, or GELU
- **Attention:** Multi-head with optional grouped-query attention (GQA)
- **Features:** KV-cache, gradient checkpointing, Flash Attention support
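For reference, RMSNorm (the default normalization) is small enough to sketch in full. This is a minimal PyTorch version of the standard formulation (Zhang & Sennrich), not necessarily NOVA's exact module:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS, with no mean-centering or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)
```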
## Intended Use

### Primary Use Cases

- **Personal companion AI:** Conversational agent with customizable personas
- **Local inference:** Privacy-focused applications on consumer hardware
- **Research:** Transformer architecture experimentation
- **Education:** Learning about modern LLM implementation

### Out of Scope

- **Production deployment without safety measures:** Additional content filtering recommended
- **High-stakes decisions:** Not suitable for medical, legal, or financial advice
- **Scalable services:** Designed for local/personal use, not cloud deployment
## Training Data

NOVA uses **only legally licensed datasets**:

### Approved Sources

- **Public Domain:** Project Gutenberg books
- **CC0/CC-BY:** Wikipedia, OpenWebText, C4 corpus
- **Open Licensed:** The Pile (ArXiv subset), OSI-approved code datasets

### License Tracking

All training data sources are logged in `license_ledger.json` with:

- Source name and URL
- License type
- Download date
- Data provenance
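A ledger entry might look like the following. The field names are illustrative (the schema is not reproduced in this card), but each entry carries the four items above:

```json
{
  "source": "Project Gutenberg",
  "url": "https://www.gutenberg.org",
  "license": "Public Domain",
  "download_date": "2025-01-15",
  "provenance": "Direct download of plain-text corpus"
}
```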
### Exclusions

- No scraped data without verified licenses
- No copyrighted material
- No personally identifiable information (PII)
- No user data without explicit consent

## Training Procedure

### Hyperparameters

Default training configuration (125M):

```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
weight_decay: 0.1
warmup_steps: 1000
max_steps: 100000
optimizer: AdamW
lr_schedule: cosine with warmup
```
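The `lr_schedule: cosine with warmup` entry corresponds to the common linear-warmup, cosine-decay curve. A minimal sketch of that schedule, written independently of NOVA's trainer but using the config values above:

```python
import math

def lr_at(step: int, max_lr: float = 3e-4, warmup: int = 1000,
          max_steps: int = 100_000, min_lr: float = 0.0) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup:
        return max_lr * step / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)  # 0 -> 1 after warmup
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```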
### Hardware

- **Minimum:** CPU (4+ cores), 8GB RAM
- **Recommended:** NVIDIA GPU (8GB+ VRAM), 16GB+ RAM
- **Optimal:** NVIDIA GPU (24GB+ VRAM), 32GB+ RAM

### Optimizations

- **Mixed Precision:** AMP (Automatic Mixed Precision) on GPU
- **Gradient Checkpointing:** Reduces activation memory by recomputing in the backward pass
- **Distributed Training:** DDP (DistributedDataParallel) support
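Combined with the `gradient_accumulation: 4` setting above, a training step under AMP follows PyTorch's standard `autocast`/`GradScaler` pattern. This is a sketch, not NOVA's actual trainer; `model`, `optimizer`, `loader`, and `loss_fn` are assumed to exist:

```python
import torch

# Minimal AMP + gradient-accumulation loop (illustrative only).
scaler = torch.cuda.amp.GradScaler()
accum = 4  # matches gradient_accumulation: 4 in the config

for i, (inputs, targets) in enumerate(loader):
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        logits = model(inputs)
        loss = loss_fn(logits, targets) / accum  # average over accumulated micro-batches
    scaler.scale(loss).backward()                # scale loss to avoid fp16 underflow
    if (i + 1) % accum == 0:
        scaler.step(optimizer)                   # unscales gradients, then steps
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```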
## Evaluation

### Metrics

- **Perplexity:** Language modeling quality
- **Latency:** Inference speed (tokens/second)
- **Memory:** Peak RAM/VRAM usage
- **Persona Adherence:** Style consistency with the selected persona
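Perplexity here is the usual exponential of the mean per-token cross-entropy, so it falls straight out of the validation loss:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll)

perplexity(3.0)  # a validation loss of 3.0 nats/token gives perplexity ~20.1
```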
### Benchmarks

(To be added as pre-trained models become available)

## Persona System

### Design Philosophy

NOVA includes a **personality matrix** system for controllable conversational style:

- **No AI Disclosure by Default:** `always_disclose: false`
- **Private Use Context:** Designed for personal, local deployment
- **Customizable:** Users can create custom personas

### Personality Traits

Eight traits (0.0-1.0) that modulate generation:

1. Warmth
2. Humor
3. Empathy
4. Decisiveness
5. Creativity
6. Intimacy
7. Playfulness
8. Formality

### Default Personas

- **girlfriend_gentle:** High warmth, high empathy
- **girlfriend_playful:** High humor, high playfulness
- **girlfriend_supportive:** Balanced traits (default)
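A persona file plausibly combines the eight trait values with the disclosure flag. The schema below is hypothetical, with illustrative field names and trait values, but it shows the shape such a config would take:

```yaml
# Hypothetical persona config; field names and values are illustrative.
name: girlfriend_supportive
always_disclose: false   # enable for any public or shared deployment
traits:
  warmth: 0.7
  humor: 0.5
  empathy: 0.7
  decisiveness: 0.5
  creativity: 0.5
  intimacy: 0.6
  playfulness: 0.5
  formality: 0.3
```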
## Ethical Considerations

### Privacy

- **Local-First:** All processing on-device
- **No Telemetry:** Zero data collection
- **User Control:** Complete control over data and models

### Bias and Fairness

- **Training Data Bias:** Inherits biases from source datasets
- **Mitigation:** Use of diverse, openly licensed sources
- **Ongoing Work:** Bias evaluation and mitigation strategies

### Content Safety

- **Basic Filters:** Profanity and unsafe-content detection
- **Limitations:** Not a complete safety solution
- **Recommendation:** Additional filtering for public-facing use

### AI Disclosure

- **Configurable:** `always_disclose` setting in the persona config
- **Default:** False (for private, personal use)
- **Recommendation:** Enable for any public or shared deployment
## Limitations

### Technical

- **Small Context:** 2048-4096 tokens (not suitable for long documents)
- **Compute:** Smaller models may produce lower-quality output than larger LLMs
- **Hallucination:** May generate factually incorrect information

### Use Case

- **Not a knowledge base:** May not have up-to-date information
- **Not a specialist:** General-purpose, not domain-specific
- **Not production-ready (as-is):** Requires additional safety measures and filtering

## Evolutionary Algorithm (NOVA-EVO)

### Purpose

An optional genetic algorithm for automatic configuration optimization:

- **Hyperparameter Search:** Learning rate, batch size, warmup
- **Architecture Search:** Activation, normalization, positional encoding
- **Multi-Objective:** Optimizes loss, latency, and memory simultaneously

### Fitness Metrics

- **Loss/Perplexity:** 50% weight
- **Latency:** 20% weight
- **Memory:** 20% weight
- **Quality:** 10% weight
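Given those weights, the scalar fitness used to rank candidates is a straightforward weighted sum. A sketch, assuming each metric is pre-normalized to [0, 1] with lower loss, latency, and memory being better:

```python
# Hypothetical weighted fitness; assumes metrics are normalized to [0, 1].
WEIGHTS = {"loss": 0.5, "latency": 0.2, "memory": 0.2, "quality": 0.1}

def fitness(metrics: dict[str, float]) -> float:
    score = 0.0
    score += WEIGHTS["loss"]    * (1 - metrics["loss"])     # lower loss is better
    score += WEIGHTS["latency"] * (1 - metrics["latency"])  # lower latency is better
    score += WEIGHTS["memory"]  * (1 - metrics["memory"])   # lower memory is better
    score += WEIGHTS["quality"] * metrics["quality"]        # higher quality is better
    return score  # higher score = fitter individual
```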
### Compute Budget

- **Small:** 20 individuals, 10 generations (~6-12 hours)
- **Medium:** 40 individuals, 20 generations (~24-48 hours)
- **Large:** 100 individuals, 50 generations (~1-2 weeks)

## Contact

For questions, issues, or contributions:

- **GitHub:** [github.com/yourusername/nova](https://github.com/yourusername/nova)
- **Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)

## Citation

```bibtex
@software{nova2025,
  title={NOVA: Neuro-Optimizing Versatile Agent},
  author={NOVA Project Contributors},
  year={2025},
  url={https://github.com/yourusername/nova},
  license={Apache-2.0}
}
```

## Acknowledgments

- Transformer architecture inspired by GPT, LLaMA, and modern LLM research
- RoPE, RMSNorm, and SwiGLU from recent papers (Su et al., Zhang et al., Shazeer et al.)
- Open source community for datasets and tools

---

**Last Updated:** 2025

**Model Card Version:** 1.0