Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00
commit a7f091aa45
50 changed files with 6437 additions and 0 deletions
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,105 @@
+name: NOVA CI
+
+on:
+  push:
+    branches: [ main, dev ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest, windows-latest]
+        python-version: ['3.10', '3.11']
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -r requirements.txt
+        pip install pytest pytest-cov ruff black mypy
+
+    - name: Lint with ruff
+      run: |
+        ruff check nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/ nova_data/
+
+    - name: Format check with black
+      run: |
+        black --check nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/ nova_data/
+
+    - name: Type check with mypy
+      continue-on-error: true
+      run: mypy nova_core/ --ignore-missing-imports
+
+    - name: Test with pytest
+      run: |
+        pytest tests/ -v --cov=nova_core --cov=nova_tokenizer --cov=nova_train
+
+    - name: Upload coverage
+      uses: codecov/codecov-action@v3
+      if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10'
+
+  smoke-test:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python 3.10
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.10'
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -r requirements.txt
+
+    - name: Initialize NOVA
+      run: |
+        python scripts/cli.py init
+
+    - name: Train tokenizer (smoke test)
+      run: |
+        python scripts/cli.py tokenizer train \
+          --input data/toy_dataset/toy.txt \
+          --output test_tokenizer \
+          --vocab-size 1000
+
+    - name: Test tokenizer
+      run: |
+        python -c "from nova_tokenizer import NovaTokenizer; t = NovaTokenizer('test_tokenizer.model'); print('Vocab size:', len(t)); print('Encoded:', t.encode('Hello world'))"
+
+    - name: Data pipeline smoke test
+      run: |
+        python -c "from nova_data import DataPipeline; p = DataPipeline(); p.verify_licenses()"
+
+  build-check:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python 3.10
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.10'
+
+    - name: Build package
+      run: |
+        python -m pip install --upgrade pip build
+        python -m build
+
+    - name: Check package
+      run: |
+        python -m pip install dist/*.whl
+        python -c "import nova_core; import nova_tokenizer; import nova_train"
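The `test` job's `strategy.matrix` expands into the cross-product of `os` and `python-version`. That gives four jobs, and the codecov step's `if:` condition selects exactly one of them.

The Python sketch below models only that expansion. `MATRIX`, `expand`, and `uploads_coverage` are hypothetical helpers for illustration. They are not part of NOVA or of GitHub Actions' tooling. The sketch also ignores matrix `include`/`exclude` keys, which this workflow does not use, and makes no claim about the order in which GitHub schedules jobs.

```python
from itertools import product

# Values copied from the `test` job's strategy.matrix in ci.yml.
# Versions stay strings, just as the workflow quotes them.
MATRIX = {
    "os": ["ubuntu-latest", "windows-latest"],
    "python-version": ["3.10", "3.11"],
}


def expand(matrix):
    """Return one dict per job: the Cartesian product of all matrix axes.

    Simplified model: ignores `include`/`exclude`, which ci.yml does not use.
    """
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def uploads_coverage(job):
    # Mirrors: if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10'
    return job["os"] == "ubuntu-latest" and job["python-version"] == "3.10"


jobs = expand(MATRIX)
for job in jobs:
    print(job["os"], job["python-version"], "uploads coverage" if uploads_coverage(job) else "")
```

The quotes around `'3.10'` in the workflow matter. Unquoted, YAML parses `3.10` as the float `3.1`, and `setup-python` would then be asked for Python 3.1 instead of 3.10.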