Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
2025-10-12 20:56:37 -04:00
commit a7f091aa45
50 changed files with 6437 additions and 0 deletions

105
.github/workflows/ci.yml vendored Normal file
View File

@@ -0,0 +1,105 @@
name: NOVA CI
on:
push:
branches: [ main, dev ]
pull_request:
branches: [ main ]
jobs:
test:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest]
python-version: ['3.10', '3.11']
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest pytest-cov ruff black mypy
- name: Lint with ruff
run: |
ruff check nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/ nova_data/
- name: Format check with black
run: |
black --check nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/ nova_data/
- name: Type check with mypy
run: |
mypy nova_core/ --ignore-missing-imports || true
- name: Test with pytest
run: |
pytest tests/ -v --cov=nova_core --cov=nova_tokenizer --cov=nova_train
- name: Upload coverage
uses: codecov/codecov-action@v3
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10'
smoke-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Initialize NOVA
run: |
python scripts/cli.py init
- name: Train tokenizer (smoke test)
run: |
python scripts/cli.py tokenizer train \
--input data/toy_dataset/toy.txt \
--output test_tokenizer \
--vocab-size 1000
- name: Test tokenizer
run: |
python -c "from nova_tokenizer import NovaTokenizer; t = NovaTokenizer('test_tokenizer.model'); print('Vocab size:', len(t)); print('Encoded:', t.encode('Hello world'))"
- name: Data pipeline smoke test
run: |
python -c "from nova_data import DataPipeline; p = DataPipeline(); p.verify_licenses()"
build-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Build package
run: |
python -m pip install --upgrade pip build
python -m build
- name: Check package
run: |
python -m pip install dist/*.whl
python -c "import nova_core; import nova_tokenizer; import nova_train"