Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00
commit a7f091aa45
50 changed files with 6437 additions and 0 deletions

scripts/quickstart.sh Normal file

@@ -0,0 +1,87 @@
#!/bin/bash
# NOVA Quickstart Script
# Sets up NOVA for first-time use
set -e
echo "======================================"
echo "NOVA Quickstart"
echo "======================================"
echo ""
# Check Python version (ask the interpreter itself; grep -P is GNU-only)
echo "Checking Python version..."
python_version=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
if ! python -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'; then
    echo "❌ Python 3.10+ required. Found: $python_version"
    exit 1
fi
echo "✓ Python $python_version"
echo ""
# Create virtual environment
if [ ! -d "venv" ]; then
    echo "Creating virtual environment..."
    python -m venv venv
    echo "✓ Virtual environment created"
else
    echo "✓ Virtual environment exists"
fi
echo ""
# Activate virtual environment
echo "Activating virtual environment..."
if [[ "$OSTYPE" == "msys" || "$OSTYPE" == "win32" ]]; then
    source venv/Scripts/activate
else
    source venv/bin/activate
fi
echo "✓ Virtual environment activated"
echo ""
# Install dependencies
echo "Installing dependencies..."
python -m pip install --upgrade pip > /dev/null
pip install -r requirements.txt
echo "✓ Dependencies installed"
echo ""
# Install NOVA in development mode
echo "Installing NOVA..."
pip install -e .
echo "✓ NOVA installed"
echo ""
# Initialize project
echo "Initializing NOVA project..."
python scripts/cli.py init
echo ""
echo "======================================"
echo "✓ NOVA Setup Complete!"
echo "======================================"
echo ""
echo "Next steps:"
echo ""
echo "1. Train tokenizer:"
echo " python scripts/cli.py tokenizer train --input data/toy_dataset/toy.txt"
echo ""
echo "2. (Optional) Download legal datasets:"
echo " python scripts/cli.py data build --source wikipedia-en"
echo ""
echo "3. Train model:"
echo " python scripts/cli.py train --size 125m"
echo ""
echo "4. Chat:"
echo " python scripts/cli.py chat cli"
echo ""
echo "For more info: cat README.md"
echo ""
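
The interpreter lookup, version check, and activate-path selection in the script above could be factored into small helpers. The following is a minimal sketch, not part of this commit: the function names `first_available`, `version_ge`, and `venv_activate_path` are hypothetical, and the version comparison assumes purely numeric dotted fields (for example `3.10.6`).

```shell
# Hypothetical helpers; none of these names appear in quickstart.sh.

# Print the first command from the argument list that exists on PATH.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" > /dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Succeed if dotted version $1 is greater than or equal to $2.
# Compares up to three numeric fields left to right; missing fields count as 0,
# so "3.10" is treated as "3.10.0" and "3.9" sorts below "3.10".
version_ge() {
    have="$1."
    want="$2."
    field=0
    while [ "$field" -lt 3 ]; do
        h=${have%%.*}
        w=${want%%.*}
        have=${have#*.}
        want=${want#*.}
        if [ "${h:-0}" -gt "${w:-0}" ]; then return 0; fi
        if [ "${h:-0}" -lt "${w:-0}" ]; then return 1; fi
        field=$((field + 1))
    done
    return 0
}

# Print the activate script path for a venv directory ($1) and an OSTYPE value ($2).
# Mirrors the msys/win32 branch used in quickstart.sh.
venv_activate_path() {
    case "$2" in
        msys|win32) echo "$1/Scripts/activate" ;;
        *)          echo "$1/bin/activate" ;;
    esac
}

# Example wiring (commented out so the helpers can be sourced without side effects):
#   python_cmd=$(first_available python3 python) || { echo "Python not found"; exit 1; }
#   found=$("$python_cmd" -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
#   version_ge "$found" "3.10" || { echo "Python 3.10+ required. Found: $found"; exit 1; }
#   . "$(venv_activate_path venv "$OSTYPE")"
```

Comparing fields numerically avoids the pitfall where a plain string comparison ranks `3.9` above `3.10`, and taking candidate command names as arguments lets the same lookup cover systems that ship only `python3`.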