# NOVA Model Card

## Model Details

**Name:** NOVA (Neuro-Optimizing Versatile Agent)

**Version:** 0.1.0

**Date:** 2025

**License:** Apache 2.0

**Type:** Decoder-only transformer language model

### Model Sizes

NOVA comes in four sizes:

| Size | Parameters | Layers | Hidden Size | Attention Heads | Context Length |
|------|------------|--------|-------------|-----------------|----------------|
| 125M | 125M       | 12     | 768         | 12              | 2048           |
| 350M | 350M       | 24     | 1024        | 16              | 2048           |
| 1.3B | 1.3B       | 24     | 2048        | 32 (8 KV)       | 2048           |
| 3B   | 3B         | 32     | 2560        | 32 (8 KV)       | 4096           |

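As a rough cross-check on the table, the parameter counts can be approximated from layer count and hidden size alone. A minimal sketch, assuming tied embeddings, a ~32k vocabulary, and a SwiGLU FFN with inner dimension of roughly (8/3)·hidden size; none of these details are stated in this card, and the GQA savings for the 1.3B/3B sizes are ignored:

```python
def approx_params(n_layers: int, d_model: int, vocab_size: int = 32_000) -> int:
    """Back-of-the-envelope transformer parameter count.

    Assumes tied input/output embeddings, biasless layers, and a SwiGLU FFN
    with inner dim ~ (8/3)*d_model. Vocab size is a guess, not from the card.
    """
    embed = vocab_size * d_model            # token embeddings (tied)
    attn = 4 * d_model * d_model            # Q, K, V, O projections per layer
    ffn = 3 * d_model * (8 * d_model // 3)  # SwiGLU: gate, up, and down matrices
    return embed + n_layers * (attn + ffn)

for name, (layers, d) in {"125M": (12, 768), "350M": (24, 1024),
                          "1.3B": (24, 2048), "3B": (32, 2560)}.items():
    print(f"{name}: ~{approx_params(layers, d) / 1e6:.0f}M")
```

Under these assumptions the estimates land within roughly 15% of the nominal sizes, which is about as close as a back-of-the-envelope count gets.
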
### Architecture

- **Positional Encoding:** RoPE (Rotary Position Embedding)
- **Normalization:** RMSNorm (default, sketched below) or LayerNorm
- **Activation:** SwiGLU (default), GeGLU, or GELU
- **Attention:** Multi-head with optional grouped-query attention (GQA)
- **Features:** KV-cache, gradient checkpointing, Flash Attention support

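RMSNorm, the default normalization above, is compact enough to sketch in full. This is a generic PyTorch rendering of the published formulation (Zhang & Sennrich, 2019), not NOVA's actual module:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the activations instead
    of subtracting the mean and dividing by the std as LayerNorm does."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```

Dropping the mean subtraction saves a pass over the activations, which is why RMSNorm is the common default in recent decoder-only models.
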
## Intended Use

### Primary Use Cases

- **Personal companion AI:** Conversational agent with customizable personas
- **Local inference:** Privacy-focused applications on consumer hardware
- **Research:** Transformer architecture experimentation
- **Education:** Learning about modern LLM implementation

### Out of Scope

- **Production deployment without safety measures:** Additional content filtering recommended
- **High-stakes decisions:** Not suitable for medical, legal, or financial advice
- **Scalable services:** Designed for local/personal use, not cloud deployment

## Training Data

NOVA uses **only legally licensed datasets**:

### Approved Sources

- **Public Domain:** Project Gutenberg books
- **CC0/CC-BY:** Wikipedia, OpenWebText, C4 corpus
- **Open Licensed:** The Pile (ArXiv), OSI-approved code datasets

### License Tracking

All training data sources are logged in `license_ledger.json` with:

- Source name and URL
- License type
- Download date
- Data provenance

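The exact schema of `license_ledger.json` is not reproduced here, but an entry covering the four fields above might look like the following; all field names and values are illustrative, not taken from NOVA's source:

```python
import json

entry = {
    "source": "Project Gutenberg",                  # source name
    "url": "https://www.gutenberg.org",             # source URL
    "license": "Public Domain",                     # license type
    "download_date": "2025-01-15",                  # download date (example)
    "provenance": "bulk download, official mirror"  # how the data was obtained
}
print(json.dumps(entry, indent=2))
```
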
### Exclusions

- No scraped data without verified licenses
- No copyrighted material
- No personally identifiable information (PII)
- No user data without explicit consent

## Training Procedure

### Hyperparameters

Default training configuration (125M):

```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
weight_decay: 0.1
warmup_steps: 1000
max_steps: 100000
optimizer: AdamW
lr_schedule: cosine with warmup
```

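The `cosine with warmup` schedule is standard; a self-contained sketch of how it is typically wired up in PyTorch, using the values from the config above (NOVA's own trainer may differ):

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

def cosine_with_warmup(optimizer, warmup_steps: int, max_steps: int) -> LambdaLR:
    """Linear warmup to the base LR, then cosine decay toward zero at max_steps."""
    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)

model = torch.nn.Linear(768, 768)  # stand-in for the 125M model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = cosine_with_warmup(optimizer, warmup_steps=1000, max_steps=100_000)
```
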
### Hardware

- **Minimum:** CPU (4+ cores), 8GB RAM
- **Recommended:** NVIDIA GPU (8GB+ VRAM), 16GB+ RAM
- **Optimal:** NVIDIA GPU (24GB+ VRAM), 32GB+ RAM

### Optimizations

- **Mixed Precision:** AMP (Automatic Mixed Precision) on GPU
- **Gradient Checkpointing:** Reduces memory usage
- **Distributed Training:** DDP (DistributedDataParallel) support

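A sketch of how AMP and the gradient accumulation of 4 from the config typically combine in a training step. The model, data, and loss are dummy stand-ins; NOVA's trainer is not reproduced here:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(768, 768).to(device)               # stand-in for NOVA
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
accum = 4                                            # gradient_accumulation: 4

batches = [(torch.randn(8, 768), torch.randn(8, 768)) for _ in range(8)]

for step, (x, y) in enumerate(batches):
    x, y = x.to(device), y.to(device)
    # Forward pass runs in reduced precision where it is numerically safe
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), y) / accum  # scaled for accumulation
    scaler.scale(loss).backward()          # loss scaling avoids fp16 underflow
    if (step + 1) % accum == 0:
        scaler.step(optimizer)             # unscales grads, skips step on inf/nan
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```
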
## Evaluation

### Metrics

- **Perplexity:** Language modeling quality
- **Latency:** Inference speed (tokens/second)
- **Memory:** Peak RAM/VRAM usage
- **Persona Adherence:** Style consistency with selected persona

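Perplexity here follows the usual definition: the exponential of the mean per-token cross-entropy loss. As a worked example:

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(mean_nll)

print(perplexity(3.0))  # a mean loss of 3.0 nats/token ~ perplexity 20.1
```
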
### Benchmarks

(To be added as pre-trained models become available)

## Persona System

### Design Philosophy

NOVA includes a **personality matrix** system for controllable conversational style:

- **No AI Disclosure by Default:** `always_disclose: false`
- **Private Use Context:** Designed for personal, local deployment
- **Customizable:** Users can create custom personas

### Personality Traits

Eight traits (each 0.0-1.0) modulate generation; a sketch of one possible representation follows the list.

1. Warmth
2. Humor
3. Empathy
4. Decisiveness
5. Creativity
6. Intimacy
7. Playfulness
8. Formality

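A minimal sketch of what a personality matrix could look like in code. The field names mirror the list above, but the class and its defaults are illustrative; NOVA's actual config format is not reproduced here:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    """Eight traits in [0.0, 1.0] that modulate generation style."""
    warmth: float = 0.5
    humor: float = 0.5
    empathy: float = 0.5
    decisiveness: float = 0.5
    creativity: float = 0.5
    intimacy: float = 0.5
    playfulness: float = 0.5
    formality: float = 0.5
    always_disclose: bool = False  # AI disclosure is off by default

# Illustrative preset: a gentle persona biased toward warmth and empathy
girlfriend_gentle = Persona(warmth=0.9, empathy=0.9)
```

Under this framing, the shipped personas below are just preset trait vectors.
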
### Default Personas

- **girlfriend_gentle:** High warmth, high empathy
- **girlfriend_playful:** High humor, high playfulness
- **girlfriend_supportive:** Balanced traits (default)

## Ethical Considerations

### Privacy

- **Local-First:** All processing on-device
- **No Telemetry:** Zero data collection
- **User Control:** Complete control over data and models

### Bias and Fairness

- **Training Data Bias:** Inherits biases from source datasets
- **Mitigation:** Use diverse, openly licensed sources
- **Ongoing Work:** Bias evaluation and mitigation strategies

### Content Safety

- **Basic Filters:** Profanity and unsafe content detection
- **Limitations:** Not a complete safety solution
- **Recommendation:** Additional filtering for public-facing use

### AI Disclosure

- **Configurable:** `always_disclose` setting in persona config
- **Default:** False (for private, personal use)
- **Recommendation:** Enable for any public or shared deployment

## Limitations

### Technical

- **Small Context:** 2048-4096 tokens (not suitable for long documents)
- **Compute:** Smaller models may have lower quality than larger LLMs
- **Hallucination:** May generate factually incorrect information

### Use Case

- **Not a knowledge base:** May not have up-to-date information
- **Not a specialist:** General-purpose, not domain-specific
- **Not production-ready (as-is):** Requires additional safety/filtering

## Evolutionary Algorithm (NOVA-EVO)

### Purpose

NOVA-EVO is an optional genetic algorithm for automatic configuration optimization:

- **Hyperparameter Search:** Learning rate, batch size, warmup
- **Architecture Search:** Activation, normalization, positional encoding
- **Multi-Objective:** Optimizes loss, latency, and memory simultaneously

### Fitness Metrics

- **Loss/Perplexity:** 50% weight
- **Latency:** 20% weight
- **Memory:** 20% weight
- **Quality:** 10% weight

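These weights combine into a single scalar fitness per candidate. A minimal sketch; the sign convention and the assumption that inputs are pre-normalized to [0, 1] are assumptions, since the card only gives the weights:

```python
def fitness(loss: float, latency: float, memory: float, quality: float) -> float:
    """Weighted multi-objective score for a NOVA-EVO candidate (higher is better).

    Inputs are assumed normalized to [0, 1]. Loss, latency, and memory are
    costs, so they enter negatively; quality is a benefit.
    """
    return -0.5 * loss - 0.2 * latency - 0.2 * memory + 0.1 * quality
```
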
### Compute Budget

- **Small:** 20 individuals, 10 generations (~6-12 hours)
- **Medium:** 40 individuals, 20 generations (~24-48 hours)
- **Large:** 100 individuals, 50 generations (~1-2 weeks)

## Contact

For questions, issues, or contributions:

- **GitHub:** [github.com/yourusername/nova](https://github.com/yourusername/nova)
- **Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)

## Citation

```bibtex
@software{nova2025,
  title={NOVA: Neuro-Optimizing Versatile Agent},
  author={NOVA Project Contributors},
  year={2025},
  url={https://github.com/yourusername/nova},
  license={Apache-2.0}
}
```

## Acknowledgments

- Transformer architecture inspired by GPT, LLaMA, and modern LLM research
- RoPE, RMSNorm, and SwiGLU from recent papers (Su et al., Zhang et al., Shazeer et al.)
- Open source community for datasets and tools

---

**Last Updated:** 2025

**Model Card Version:** 1.0