Dani a7f091aa45 Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>

2025-10-12 20:56:37 -04:00

Privacy and Local Use

NOVA Privacy Statement

NOVA is designed as a local-first, privacy-focused language model. This document explains how NOVA handles your data.

Core Principles

1. Local-First

Everything runs on your device.

Model inference happens locally
Training data stays on your machine
No cloud dependencies
No internet required (except for optional dataset and pre-trained weight downloads)

2. Zero Telemetry

NOVA collects zero data.

No usage tracking
No error reporting
No analytics
No phone-home functionality

3. Complete User Control

You own everything.

Your conversations
Your trained models
Your custom personas
Your data

Data Storage

Where Your Data Lives

C:\Development\Nova\
├── memory.db                    # Your conversation history (SQLite)
├── checkpoints/                 # Your trained models
├── data/                        # Your training data
└── configs/persona/             # Your custom personas

All on your device. Never uploaded.

Conversation Memory

Location: memory.db (SQLite database)
Contents: Your chat history
Encryption: Not encrypted by default (stored as a plain SQLite file on your local disk)
Deletion: Delete memory.db file to erase all history
Recommendation: Encrypt your drive if sharing the device
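
Because memory.db is an ordinary, unencrypted SQLite file, you can look inside it with Python's built-in sqlite3 module before deciding whether to keep, encrypt, or delete it. The sketch below only lists tables and row counts and assumes nothing about NOVA's actual schema; the function name summarize_sqlite is illustrative and not part of NOVA.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file."""
    path = Path(db_path).resolve()
    if not path.exists():
        raise FileNotFoundError(db_path)
    # Open read-only so inspecting the file can never modify the history.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Names come from sqlite_master itself, so quoting them is sufficient here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()
```

Calling summarize_sqlite("memory.db") from the NOVA directory would show how much history has accumulated without opening the conversations themselves.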

Model Checkpoints

Location: checkpoints/ directory
Contents: Model weights and training state
Sharing: Safe to share only if trained on public data (weights can memorize personal data you train on)

Network Activity

When NOVA Uses the Internet

NOVA only uses the internet for:

Dataset Downloads: Downloading legal training datasets (opt-in)
Optional: Downloading pre-trained weights (if available)

When NOVA Does NOT Use Internet

Chat inference: 100% offline
Model training: 100% offline
Persona customization: 100% offline
Evolution (NOVA-EVO): 100% offline

Firewall Safety

NOVA is safe to run behind a firewall with no internet access (after initial setup).
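
If you would rather verify the offline claim than take it on trust, one option is to make socket creation fail inside a Python process and then run the code path you care about. The following is a minimal sketch of that idea, not a NOVA feature; block_network and NetworkBlockedError are hypothetical names. It only catches sockets opened through Python's socket module in the current process, so an OS-level firewall rule remains the stronger check.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a socket inside block_network()."""


@contextmanager
def block_network():
    """Temporarily make every new socket creation in this process fail."""
    original_socket = socket.socket

    def guarded_socket(*args, **kwargs):
        raise NetworkBlockedError("network access attempted while blocked")

    socket.socket = guarded_socket
    try:
        yield
    finally:
        # Always restore the real socket class, even if the body raised.
        socket.socket = original_socket
```

Wrapping a chat or training call in `with block_network():` would surface any unexpected connection attempt as an exception instead of letting it pass silently.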

AI Disclosure Setting

`always_disclose` Flag

NOVA personas have an always_disclose setting:

always_disclose: false  # Default

What this means:

false (default): NOVA does NOT disclose being AI
- Designed for private, personal use
- Appropriate for local companion scenarios
true: NOVA includes AI disclosure text
- Recommended for shared or public use
- Adds transparency about AI nature

When to Enable Disclosure

✅ Enable always_disclose: true if:

Sharing NOVA with others
Deploying publicly (e.g., website, app)
Any scenario where users might not know it's AI

❌ Keep always_disclose: false if:

Personal, private use on your own device
You're fully aware it's a language model
Testing/development

Default: false (assumes personal use)
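
To make the flag's effect concrete, here is a small sketch of how a setting like always_disclose could gate disclosure text on each reply. This is an assumption for illustration only: PersonaSettings, apply_disclosure, and the wording of DISCLOSURE_TEXT are hypothetical, and this document does not specify how NOVA applies the flag internally.

```python
from dataclasses import dataclass

# Hypothetical wording; the text NOVA actually emits is not specified here.
DISCLOSURE_TEXT = "Note: you are chatting with an AI language model."


@dataclass
class PersonaSettings:
    """Illustrative stand-in for the fields read from a persona YAML file."""

    name: str
    always_disclose: bool = False  # mirrors the documented default


def apply_disclosure(reply: str, persona: PersonaSettings) -> str:
    """Prepend the disclosure text when the persona requires it."""
    if persona.always_disclose:
        return f"{DISCLOSURE_TEXT}\n\n{reply}"
    return reply
```

With the default settings the reply passes through unchanged; with always_disclose set to true every reply carries the notice, which is the behavior you want for shared or public deployments.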

Persona System Privacy

Personality Matrix

The personality matrix (warmth, humor, empathy, etc.) is:

Stored: In persona YAML files
Processed: Locally during generation
Shared: Never (unless you share the files)

Custom Personas

Your custom persona configurations:

Location: configs/persona/ directory
Format: YAML (human-readable text)
Privacy: Stored locally, never transmitted

Training Data Privacy

Legal Data Only

NOVA enforces legal-only datasets:

Public domain sources
Openly licensed datasets (CC0, CC-BY, MIT, Apache)
License tracking in license_ledger.json

No private data scraping.
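
A license ledger is only useful if something checks it. The sketch below shows one way such a check might look, under loud assumptions: the ledger layout (a top-level "datasets" list whose entries have "name" and "license" keys) is hypothetical, as is the exact set of SPDX-style identifiers, since this document does not describe the format of license_ledger.json.

```python
import json

# Assumed SPDX-style identifiers for the license families named above.
ALLOWED_LICENSES = frozenset({"CC0-1.0", "CC-BY-4.0", "MIT", "Apache-2.0"})


def find_disallowed(
    ledger_text: str, allowed: frozenset[str] = ALLOWED_LICENSES
) -> list[str]:
    """Return names of ledger entries whose license is missing or not allowed."""
    ledger = json.loads(ledger_text)
    rejected = []
    for entry in ledger.get("datasets", []):  # hypothetical top-level key
        if entry.get("license") not in allowed:
            rejected.append(entry.get("name", "<unnamed>"))
    return rejected
```

An exact-match allowlist is deliberately strict: a prefix match on "CC-BY" would also accept non-commercial variants such as CC-BY-NC, which carry extra restrictions.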

Your Own Data

If you train NOVA on your own data:

Stays local: Never leaves your device
Your responsibility: Ensure you have rights to use it
Recommendation: Don't train on sensitive/private data you don't want in the model

Security Considerations

Running NOVA Safely

✅ Do:

Run on a trusted device
Keep your OS and Python dependencies updated
Use filesystem encryption if device is shared
Review code before running (it's open source!)

⚠️ Don't:

Expose the REST API to the internet without authentication
Train on sensitive data you can't afford to leak
Share memory.db if it contains private conversations

REST API Security

If using the REST API (nova chat serve):

Default: Binds to 0.0.0.0:8000 (all interfaces)
Recommendation: Use --host 127.0.0.1 for local-only
Authentication: Not included (add if exposing externally)
HTTPS: Not included (add if exposing externally)

For personal use: Keep localhost-only. For shared use: Add authentication, HTTPS, rate limiting.
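
The difference between 0.0.0.0 and 127.0.0.1 is easy to get wrong, so a launcher script can check the bind address before starting the server. Below is a minimal sketch using Python's standard ipaddress module; is_local_only is a hypothetical helper, not part of the NOVA CLI.

```python
import ipaddress


def is_local_only(host: str) -> bool:
    """Return True if binding to `host` keeps the server on this machine only."""
    if host == "localhost":
        # Conventionally resolves to a loopback address.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames could resolve anywhere, so treat them as exposed.
        return False
```

Here is_local_only("127.0.0.1") and is_local_only("::1") are true, while is_local_only("0.0.0.0") is false, which matches the recommendation above to pass --host 127.0.0.1 for personal use.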

Data Deletion

Clear All Conversations

# Delete conversation database (shell)
rm memory.db

# Or programmatically (Python)
from nova_chat import ConversationMemory
memory = ConversationMemory()
memory.clear_all()

Remove Models

# Delete checkpoints
rm -rf checkpoints/

Complete Reset

# Remove all data
rm -rf data/ checkpoints/ memory.db
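
The rm commands above assume a Unix-style shell. On Windows, or from a script, the same reset can be done with Python's standard library. This is a sketch under the directory layout shown in "Where Your Data Lives"; reset_nova_data is a hypothetical helper, not a NOVA command.

```python
import shutil
from pathlib import Path


def reset_nova_data(root: str) -> list[str]:
    """Delete data/, checkpoints/ and memory.db under `root`; return what was removed."""
    removed = []
    base = Path(root)
    for name in ("data", "checkpoints"):
        target = base / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    db_file = base / "memory.db"
    if db_file.is_file():
        db_file.unlink()
        removed.append("memory.db")
    return removed
```

Like the shell version, this leaves configs/persona/ in place, so custom personas survive a reset. Deletion is permanent, so double-check the root path before running it.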

Third-Party Dependencies

NOVA uses standard open-source libraries:

PyTorch: ML framework
SentencePiece: Tokenization
FastAPI/Uvicorn: REST API (optional)
SQLite: Conversation storage

All are open source and widely used.

Dependency Privacy

PyTorch: No telemetry (when installed normally)
SentencePiece: No telemetry
FastAPI: No telemetry
SQLite: Local database, no telemetry

Comparison to Cloud LLMs

| Feature | NOVA | Cloud LLMs |
|---|---|---|
| Data Location | Your device | Company servers |
| Privacy | Complete | Varies by provider |
| Telemetry | None | Usually tracked |
| Internet Required | No (after setup) | Yes |
| Cost | One-time (hardware) | Per-token/monthly |
| Customization | Full control | Limited |
| Data Retention | Your choice | Company policy |

Transparency

Open Source

NOVA is fully open source under Apache 2.0:

Source code: Fully auditable
No hidden functionality: What you see is what you get
Community review: Anyone can inspect for privacy issues

No Hidden Behavior

NOVA does not:

Phone home
Send analytics
Track usage
Report errors to external services
Auto-update without your action

Recommendations

For Maximum Privacy

Offline Mode: Disable network after downloading dependencies
Encrypt Storage: Use full-disk encryption (BitLocker, FileVault, LUKS)
Regular Cleanup: Clear memory.db periodically if desired
Review Code: Inspect the source before running

For Shared Devices

Enable Disclosure: Set always_disclose: true
Separate Accounts: Use OS user accounts to isolate data
Clear Conversations: Delete history after sessions

For Development

Test Data Only: Don't use real sensitive data for testing
Version Control: Add memory.db and checkpoints/ to .gitignore
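
Accidentally committing memory.db is the most likely way private conversations leave your machine during development. As a guard, a setup script could make sure the relevant patterns are in .gitignore. The sketch below is one possible approach; ensure_ignored is a hypothetical helper, and the pattern list comes from the recommendation above.

```python
from pathlib import Path

PRIVATE_PATHS = ("memory.db", "checkpoints/")


def ensure_ignored(repo_root: str, patterns=PRIVATE_PATHS) -> list[str]:
    """Append any missing patterns to .gitignore and return the ones added."""
    gitignore = Path(repo_root) / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    present = {line.strip() for line in existing.splitlines()}
    missing = [p for p in patterns if p not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        with gitignore.open("a", encoding="utf-8") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

The function is idempotent: a second call finds both patterns present and writes nothing. It does not untrack files that were already committed; those still need git rm --cached.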

Contact for Privacy Concerns

If you find privacy issues:

GitHub Issues: github.com/yourusername/nova/issues
Security: Tag issues with the security label

Summary

NOVA is designed for local, private use.

✅ No data collection
✅ No telemetry
✅ No cloud dependencies
✅ Complete user control
✅ Open source and auditable

Your data stays on your device.

Last Updated: 2025
Document Version: 1.0
