NOVA/docs/PRIVACY_LOCAL.md
Dani a7f091aa45 Initial commit: NOVA - Neuro-Optimizing Versatile Agent
Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00


Privacy and Local Use

NOVA Privacy Statement

NOVA is designed as a local-first, privacy-focused language model. This document explains how NOVA handles your data.


Core Principles

1. Local-First

Everything runs on your device.

  • Model inference happens locally
  • Training data stays on your machine
  • No cloud dependencies
  • No internet required after setup (except for opt-in dataset or pre-trained weight downloads)

2. Zero Telemetry

NOVA collects zero data.

  • No usage tracking
  • No error reporting
  • No analytics
  • No phone-home functionality

3. Complete User Control

You own everything.

  • Your conversations
  • Your trained models
  • Your custom personas
  • Your data

Data Storage

Where Your Data Lives

C:\Development\Nova\
├── memory.db                    # Your conversation history (SQLite)
├── checkpoints/                 # Your trained models
├── data/                        # Your training data
└── configs/persona/             # Your custom personas

All on your device. Never uploaded.

Conversation Memory

  • Location: memory.db (SQLite database)
  • Contents: Your chat history
  • Encryption: Not encrypted by default; anyone with access to the file can read it
  • Deletion: Delete memory.db file to erase all history
  • Recommendation: Encrypt your drive if sharing the device
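To see what memory.db actually holds before deciding whether to keep, encrypt, or delete it, you can inspect it with Python's built-in sqlite3 module. The sketch below is a minimal example and assumes nothing about NOVA's schema: it lists whatever tables exist and counts their rows. The function name summarize_sqlite is illustrative and not part of NOVA.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Open read-only so that inspection can never modify the history.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = {}
        for name in tables:
            # Quote the identifier defensively before interpolating it.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__" and Path("memory.db").exists():
    print(summarize_sqlite("memory.db"))
```

Run it from the directory that contains memory.db. Because the connection is read-only, it is safe to use on a live history file.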

Model Checkpoints

  • Location: checkpoints/ directory
  • Contents: Model weights and training state
  • Sharing: Safe to share only if the training data contained no personal data (model weights can memorize training text)

Network Activity

When NOVA Uses the Internet

NOVA only uses the internet for:

  1. Dataset Downloads: Downloading legal training datasets (opt-in)
  2. Pre-trained Weights: Downloading pre-trained weights (optional, if available)

When NOVA Does NOT Use Internet

  • Chat inference: 100% offline
  • Model training: 100% offline
  • Persona customization: 100% offline
  • Evolution (NOVA-EVO): 100% offline

Firewall Safety

NOVA is safe to run behind a firewall with no internet access (after initial setup).
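One way to check the offline claims yourself is to make socket creation fail inside the Python process and then run the code path you care about. The sketch below uses only the standard library; no_network and NetworkBlockedError are illustrative names, not NOVA APIs. It only catches sockets opened through Python's socket module, so treat it as a quick sanity check rather than proof; an OS-level firewall rule is the stronger test.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a socket inside no_network()."""


@contextmanager
def no_network():
    """Temporarily make every new Python socket creation fail."""
    original_socket = socket.socket

    def _blocked(*args, **kwargs):
        raise NetworkBlockedError("network access attempted while blocked")

    # Replace the socket constructor for the duration of the block.
    socket.socket = _blocked
    try:
        yield
    finally:
        # Always restore the real constructor, even if the body raised.
        socket.socket = original_socket
```

Wrap a chat or training call in `with no_network():`. If anything tries to open a connection, it raises NetworkBlockedError instead of silently going online.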


AI Disclosure Setting

always_disclose Flag

NOVA personas have an always_disclose setting:

always_disclose: false  # Default

What this means:

  • false (default): NOVA does NOT disclose being AI

    • Designed for private, personal use
    • Appropriate for local companion scenarios
  • true: NOVA includes AI disclosure text

    • Recommended for shared or public use
    • Adds transparency about AI nature

When to Enable Disclosure

Enable always_disclose: true if:

  • Sharing NOVA with others
  • Deploying publicly (e.g., website, app)
  • Any scenario where users might not know it's AI

Keep always_disclose: false if:

  • Personal, private use on your own device
  • You're fully aware it's a language model
  • Testing/development

Default: false (assumes personal use)
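As a rough illustration of how a flag like this could gate the disclosure text, the sketch below appends a notice to each reply when always_disclose is true. Only the always_disclose name comes from this document. The PersonaConfig class, the disclosure_text field and its wording, and finalize_reply are assumptions made for the example, and NOVA's actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class PersonaConfig:
    # Hypothetical fields; only `always_disclose` is named in this document.
    name: str
    always_disclose: bool = False
    disclosure_text: str = "Note: I am an AI language model."


def finalize_reply(reply: str, persona: PersonaConfig) -> str:
    """Append the disclosure text when the persona requires it."""
    if persona.always_disclose and persona.disclosure_text not in reply:
        return f"{reply}\n\n{persona.disclosure_text}"
    return reply
```

With the default configuration the reply passes through unchanged. Setting always_disclose=True adds the notice exactly once per reply.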


Persona System Privacy

Personality Matrix

The personality matrix (warmth, humor, empathy, etc.) is:

  • Stored: In persona YAML files
  • Processed: Locally during generation
  • Shared: Never (unless you share the files)

Custom Personas

Your custom persona configurations:

  • Location: configs/persona/ directory
  • Format: YAML (human-readable text)
  • Privacy: Stored locally, never transmitted

Training Data Privacy

NOVA enforces legal-only datasets:

  • Public domain sources
  • Openly licensed datasets (CC0, CC-BY, MIT, Apache)
  • License tracking in license_ledger.json

No private data scraping.
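A license ledger makes this kind of check easy to automate. The sketch below assumes a hypothetical ledger layout (a top-level "sources" list whose entries have "name" and "license" keys) and a hypothetical allow-list of license identifiers. The real structure of license_ledger.json is not described here, so adapt both to what the file actually contains.

```python
import json

# Hypothetical allow-list; adjust it to the identifiers your ledger uses.
ALLOWED_LICENSES = {"public-domain", "cc0-1.0", "cc-by-4.0", "mit", "apache-2.0"}


def find_disallowed_sources(ledger_json: str) -> list[str]:
    """Return the names of ledger entries whose license is not allow-listed."""
    ledger = json.loads(ledger_json)
    return [
        entry["name"]
        for entry in ledger.get("sources", [])
        # Exact match after normalization, so variants such as
        # "CC-BY-NC-4.0" are not accepted by accident.
        if entry.get("license", "").strip().lower() not in ALLOWED_LICENSES
    ]
```

An empty result means every listed source carries an allow-listed license. Entries with a missing license field are reported as disallowed.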

Your Own Data

If you train NOVA on your own data:

  • Stays local: Never leaves your device
  • Your responsibility: Ensure you have rights to use it
  • Recommendation: Don't train on sensitive/private data you don't want in the model

Security Considerations

Running NOVA Safely

Do:

  • Run on a trusted device
  • Keep your OS and Python dependencies updated
  • Use filesystem encryption if device is shared
  • Review code before running (it's open source!)

⚠️ Don't:

  • Expose the REST API to the internet without authentication
  • Train on sensitive data you can't afford to leak
  • Share memory.db if it contains private conversations

REST API Security

If using the REST API (nova chat serve):

  • Default: Binds to 0.0.0.0:8000 (all interfaces), so other machines on your network can reach it
  • Recommendation: Use --host 127.0.0.1 to accept local connections only
  • Authentication: Not included (add if exposing externally)
  • HTTPS: Not included (add if exposing externally)

For personal use: Keep localhost-only. For shared use: Add authentication, HTTPS, rate limiting.
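If you wrap the server in your own launcher script, a small guard can refuse a non-loopback bind address unless you opt in explicitly. The sketch below uses the standard-library ipaddress module. is_loopback_host and check_bind_host are illustrative helpers, not part of NOVA or FastAPI.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` refers only to the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames are treated as non-local to stay on the safe side.
        return False


def check_bind_host(host: str, allow_external: bool = False) -> str:
    """Return `host`, or raise if it would expose the API beyond this machine."""
    if not allow_external and not is_loopback_host(host):
        raise ValueError(
            f"refusing to bind to {host!r}; pass allow_external=True "
            "only after adding authentication and HTTPS"
        )
    return host
```

Here 127.0.0.1, ::1, and localhost pass the check, while 0.0.0.0 is rejected unless allow_external is set.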


Data Deletion

Clear All Conversations

# Delete conversation database
rm memory.db

# Or programmatically (Python)
from nova_chat import ConversationMemory
memory = ConversationMemory()
memory.clear_all()

Remove Models

# Delete checkpoints
rm -rf checkpoints/

Complete Reset

# Remove all data (custom personas in configs/persona/ are kept)
rm -rf data/ checkpoints/ memory.db
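The rm commands above assume a Unix-like shell. On Windows, or from a script, the same reset can be sketched with the standard library. reset_nova_data is an illustrative helper, and its default target names are taken from the storage layout described in this document.

```python
import shutil
from pathlib import Path


def reset_nova_data(
    root: str, targets=("data", "checkpoints", "memory.db")
) -> list[str]:
    """Delete the listed files and directories under `root`.

    Returns the names that were actually removed.
    """
    removed = []
    base = Path(root)
    for name in targets:
        path = base / name
        if path.is_symlink() or path.is_file():
            path.unlink()  # remove the file (or the link, not its target)
        elif path.is_dir():
            shutil.rmtree(path)  # remove the directory and its contents
        else:
            continue  # nothing to delete
        removed.append(name)
    return removed
```

Calling reset_nova_data(".") from the project root mirrors the shell command and leaves configs/persona/ untouched.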

Third-Party Dependencies

NOVA uses standard open-source libraries:

  • PyTorch: ML framework
  • SentencePiece: Tokenization
  • FastAPI/Uvicorn: REST API (optional)
  • SQLite: Conversation storage

All are open source and widely used.

Dependency Privacy

  • PyTorch: No telemetry (when installed normally)
  • SentencePiece: No telemetry
  • FastAPI: No telemetry
  • SQLite: Local database, no telemetry

Comparison to Cloud LLMs

| Feature           | NOVA                | Cloud LLMs         |
|-------------------|---------------------|--------------------|
| Data Location     | Your device         | Company servers    |
| Privacy           | Complete            | Varies by provider |
| Telemetry         | None                | Usually tracked    |
| Internet Required | No (after setup)    | Yes                |
| Cost              | One-time (hardware) | Per-token/monthly  |
| Customization     | Full control        | Limited            |
| Data Retention    | Your choice         | Company policy     |

Transparency

Open Source

NOVA is fully open source under Apache 2.0:

  • Source code: Fully auditable
  • No hidden functionality: What you see is what you get
  • Community review: Anyone can inspect for privacy issues

No Hidden Behavior

NOVA does not:

  • Phone home
  • Send analytics
  • Track usage
  • Report errors to external services
  • Auto-update without your action

Recommendations

For Maximum Privacy

  1. Offline Mode: Disable network after downloading dependencies
  2. Encrypt Storage: Use full-disk encryption (BitLocker, FileVault, LUKS)
  3. Regular Cleanup: Clear memory.db periodically if desired
  4. Review Code: Inspect the source before running

For Shared Devices

  1. Enable Disclosure: Set always_disclose: true
  2. Separate Accounts: Use OS user accounts to isolate data
  3. Clear Conversations: Delete history after sessions

For Development

  1. Test Data Only: Don't use real sensitive data for testing
  2. Version Control: Add memory.db and checkpoints/ to .gitignore
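The version-control recommendation can be automated with a few lines of Python. The sketch below appends any missing entries to a .gitignore file and leaves existing lines alone. ensure_gitignored is an illustrative helper and not part of NOVA's CLI.

```python
from pathlib import Path


def ensure_gitignored(
    gitignore_path: str, entries=("memory.db", "checkpoints/")
) -> list[str]:
    """Append any of `entries` missing from the .gitignore file.

    Returns the entries that were added.
    """
    path = Path(gitignore_path)
    existing = (
        path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    )
    present = {line.strip() for line in existing}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        # Rewrite the file with the original lines followed by the new ones.
        path.write_text("\n".join(existing + missing) + "\n", encoding="utf-8")
    return missing
```

Running it twice is harmless: the second call finds both entries present and changes nothing.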

Contact for Privacy Concerns

If you find privacy issues:


Summary

NOVA is designed for local, private use.

  • No data collection
  • No telemetry
  • No cloud dependencies
  • Complete user control
  • Open source and auditable

Your data stays on your device.


Last Updated: 2025
Document Version: 1.0