Privacy and Local Use
NOVA Privacy Statement
NOVA is designed as a local-first, privacy-focused language model. This document explains how NOVA handles your data.
Core Principles
1. Local-First
Everything runs on your device.
- Model inference happens locally
- Training data stays on your machine
- No cloud dependencies
- No internet required (except for optional dataset and pre-trained weight downloads)
2. Zero Telemetry
NOVA collects zero data.
- No usage tracking
- No error reporting
- No analytics
- No phone-home functionality
3. Complete User Control
You own everything.
- Your conversations
- Your trained models
- Your custom personas
- Your data
Data Storage
Where Your Data Lives
C:\Development\Nova\
├── memory.db # Your conversation history (SQLite)
├── checkpoints/ # Your trained models
├── data/ # Your training data
└── configs/persona/ # Your custom personas
All on your device. Never uploaded.
Conversation Memory
- Location: memory.db (SQLite database)
- Contents: Your chat history
- Encryption: Not encrypted by default (it's local)
- Deletion: Delete the memory.db file to erase all history
- Recommendation: Encrypt your drive if sharing the device
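To see what the conversation database actually holds, it can be opened with Python's built-in sqlite3 module. The sketch below is a minimal example that only assumes memory.db is a regular SQLite file; the helper name summarize_sqlite is hypothetical, and no table names are assumed because NOVA's schema is not described here.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    path = Path(db_path)
    if not path.exists():
        return {}
    # Open read-only so that inspecting the history never modifies it.
    uri = path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    print(summarize_sqlite("memory.db"))
```

Running it from the NOVA directory prints the tables it finds and how many rows each contains, which is a quick way to confirm what would be erased by deleting the file.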
Model Checkpoints
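- Location: checkpoints/ directory
- Contents: Model weights and training state
- Sharing: Safe to share when trained only on public datasets (no conversation history is stored); weights trained on your own data can retain parts of it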
Network Activity
When NOVA Uses the Internet
NOVA only uses the internet for:
- Dataset Downloads: Downloading legal training datasets (opt-in)
- Optional: Downloading pre-trained weights (if available)
When NOVA Does NOT Use Internet
- Chat inference: 100% offline
- Model training: 100% offline
- Persona customization: 100% offline
- Evolution (NOVA-EVO): 100% offline
Firewall Safety
NOVA is safe to run behind a firewall with no internet access (after initial setup).
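One way to check the offline claims yourself is to block outbound connections inside the Python process and then run the code you care about. The sketch below is not part of NOVA; no_network and NetworkBlockedError are hypothetical names. It only intercepts connections made through Python's socket.socket.connect, so an OS-level firewall rule remains the stronger test.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a connection while blocked."""


@contextmanager
def no_network():
    """Make every socket.connect() call fail inside the with-block."""
    original_connect = socket.socket.connect

    def blocked_connect(self, address):
        raise NetworkBlockedError(f"blocked connection attempt to {address!r}")

    socket.socket.connect = blocked_connect
    try:
        yield
    finally:
        # Always restore normal behaviour, even if the body raised.
        socket.socket.connect = original_connect


if __name__ == "__main__":
    with no_network():
        # Call chat inference or training code here; any attempt to
        # connect out raises NetworkBlockedError instead of succeeding.
        pass
```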
AI Disclosure Setting
always_disclose Flag
NOVA personas have an always_disclose setting:
always_disclose: false # Default
What this means:
- false (default): NOVA does NOT disclose being AI
  - Designed for private, personal use
  - Appropriate for local companion scenarios
- true: NOVA includes AI disclosure text
  - Recommended for shared or public use
  - Adds transparency about AI nature
When to Enable Disclosure
✅ Enable always_disclose: true if:
- Sharing NOVA with others
- Deploying publicly (e.g., website, app)
- Any scenario where users might not know it's AI
❌ Keep always_disclose: false if:
- Personal, private use on your own device
- You're fully aware it's a language model
- Testing/development
Default: False (personal use assumption)
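As a rough illustration of how a flag like this could gate disclosure text, the sketch below appends a notice to a reply when always_disclose is true. This is an assumption about the mechanism, not NOVA's actual implementation: the function name, the persona dictionary shape, and the wording of DISCLOSURE_TEXT are all hypothetical.

```python
# Hypothetical wording; the text NOVA really uses is not documented here.
DISCLOSURE_TEXT = "Note: I am an AI language model."


def apply_disclosure(reply: str, persona: dict) -> str:
    """Append the disclosure notice when the persona asks for it."""
    # A missing key is treated as the documented default (false).
    if persona.get("always_disclose", False):
        return f"{reply}\n\n{DISCLOSURE_TEXT}"
    return reply


if __name__ == "__main__":
    print(apply_disclosure("Good morning!", {"always_disclose": True}))
    print(apply_disclosure("Good morning!", {"always_disclose": False}))
```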
Persona System Privacy
Personality Matrix
The personality matrix (warmth, humor, empathy, etc.) is:
- Stored: In persona YAML files
- Processed: Locally during generation
- Shared: Never (unless you share the files)
Custom Personas
Your custom persona configurations:
- Location: configs/persona/ directory
- Format: YAML (human-readable text)
- Privacy: Stored locally, never transmitted
Training Data Privacy
Legal Data Only
NOVA enforces legal-only datasets:
- Public domain sources
- Openly licensed datasets (CC0, CC-BY, MIT, Apache)
- License tracking in license_ledger.json
No private data scraping.
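A license ledger makes this kind of rule easy to audit. The sketch below flags entries whose license is not on an allowlist. The format of license_ledger.json is not described in this document, so the schema used here (a JSON list of objects with name and license fields) and the license identifiers in ALLOWED_LICENSES are assumptions for illustration only.

```python
import json

# Assumed identifiers for the license families listed above.
ALLOWED_LICENSES = {"public-domain", "cc0", "cc-by", "mit", "apache-2.0"}


def find_disallowed(ledger_text: str) -> list[str]:
    """Return names of ledger entries whose license is not allowlisted."""
    # Assumed schema: [{"name": "...", "license": "..."}, ...]
    entries = json.loads(ledger_text)
    return [
        entry.get("name", "<unnamed>")
        for entry in entries
        if str(entry.get("license", "")).lower() not in ALLOWED_LICENSES
    ]


if __name__ == "__main__":
    sample = json.dumps(
        [
            {"name": "example-public-corpus", "license": "CC0"},
            {"name": "example-scraped-forum", "license": "unknown"},
        ]
    )
    print(find_disallowed(sample))
```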
Your Own Data
If you train NOVA on your own data:
- Stays local: Never leaves your device
- Your responsibility: Ensure you have rights to use it
- Recommendation: Don't train on sensitive/private data you don't want in the model
Security Considerations
Running NOVA Safely
✅ Do:
- Run on a trusted device
- Keep your OS and Python dependencies updated
- Use filesystem encryption if device is shared
- Review code before running (it's open source!)
⚠️ Don't:
- Expose the REST API to the internet without authentication
- Train on sensitive data you can't afford to leak
- Share memory.db if it contains private conversations
REST API Security
If using the REST API (nova chat serve):
- Default: Binds to 0.0.0.0:8000 (all interfaces, so other machines on your network can reach it)
- Recommendation: Use --host 127.0.0.1 for local-only
- Authentication: Not included (add if exposing externally)
- HTTPS: Not included (add if exposing externally)
For personal use: Keep localhost-only.
For shared use: Add authentication, HTTPS, rate limiting.
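The difference between a loopback address and 0.0.0.0 can be checked in code before a server starts. The sketch below uses only the standard library and is not part of NOVA; is_local_only and check_bind are hypothetical helpers, and the api_token parameter stands in for whatever authentication you add yourself.

```python
import ipaddress


def is_local_only(host: str) -> bool:
    """True if the host is reachable only from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat as not provably local.
        return False


def check_bind(host: str, api_token=None) -> None:
    """Refuse a non-loopback bind unless some authentication is configured."""
    if not is_local_only(host) and not api_token:
        raise ValueError(
            f"refusing to bind to {host!r} without authentication; "
            "use 127.0.0.1 or configure a token"
        )


if __name__ == "__main__":
    check_bind("127.0.0.1")  # passes silently
    try:
        check_bind("0.0.0.0")
    except ValueError as exc:
        print(exc)
```

A guard like this fails closed: an address that cannot be shown to be loopback is treated as exposed.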
Data Deletion
Clear All Conversations
# Delete conversation database
rm memory.db
# Or programmatically, from Python
from nova_chat import ConversationMemory
memory = ConversationMemory()
memory.clear_all()
Remove Models
# Delete checkpoints
rm -rf checkpoints/
Complete Reset
# Remove all data
rm -rf data/ checkpoints/ memory.db
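The rm commands above assume a POSIX shell, while the example install path is a Windows one. A portable alternative is a few lines of Python. This is a minimal sketch, assuming the default file and directory names shown in this document; reset_nova_data is a hypothetical helper, not a NOVA command.

```python
import shutil
from pathlib import Path


def reset_nova_data(root, targets=("data", "checkpoints", "memory.db")):
    """Delete the listed files and directories under root.

    Returns the names that were actually removed.
    """
    root = Path(root)
    removed = []
    for name in targets:
        path = root / name
        if path.is_dir():
            shutil.rmtree(path)  # directory and everything inside it
            removed.append(name)
        elif path.exists():
            path.unlink()  # single file such as memory.db
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Irreversible: run this only from the NOVA directory you intend to reset.
    print(reset_nova_data("."))
```

Passing targets=("memory.db",) clears only the conversation history and leaves models and training data in place.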
Third-Party Dependencies
NOVA uses standard open-source libraries:
- PyTorch: ML framework
- SentencePiece: Tokenization
- FastAPI/Uvicorn: REST API (optional)
- SQLite: Conversation storage
All are open source and widely used.
Dependency Privacy
- PyTorch: No telemetry (when installed normally)
- SentencePiece: No telemetry
- FastAPI: No telemetry
- SQLite: Local database, no telemetry
Comparison to Cloud LLMs
Feature | NOVA | Cloud LLMs |
---|---|---|
Data Location | Your device | Company servers |
Privacy | Complete | Varies by provider |
Telemetry | None | Usually tracked |
Internet Required | No (after setup) | Yes |
Cost | One-time (hardware) | Per-token/monthly |
Customization | Full control | Limited |
Data Retention | Your choice | Company policy |
Transparency
Open Source
NOVA is fully open source under Apache 2.0:
- Source code: Fully auditable
- No hidden functionality: What you see is what you get
- Community review: Anyone can inspect for privacy issues
No Hidden Behavior
NOVA does not:
- Phone home
- Send analytics
- Track usage
- Report errors to external services
- Auto-update without your action
Recommendations
For Maximum Privacy
- Offline Mode: Disable network after downloading dependencies
- Encrypt Storage: Use full-disk encryption (BitLocker, FileVault, LUKS)
- Regular Cleanup: Clear memory.db periodically if desired
- Review Code: Inspect the source before running
For Shared Devices
- Enable Disclosure: Set always_disclose: true
- Separate Accounts: Use OS user accounts to isolate data
- Clear Conversations: Delete history after sessions
For Development
- Test Data Only: Don't use real sensitive data for testing
- Version Control: Add memory.db and checkpoints/ to .gitignore
Contact for Privacy Concerns
If you find privacy issues:
- GitHub Issues: github.com/yourusername/nova/issues
- Security: Tag issues with the security label
Summary
NOVA is designed for local, private use.
✅ No data collection
✅ No telemetry
✅ No cloud dependencies
✅ Complete user control
✅ Open source and auditable
Your data stays on your device.
Last Updated: 2025
Document Version: 1.0