NOVA/docs/PRIVACY_LOCAL.md
Dani a7f091aa45 Initial commit: NOVA - Neuro-Optimizing Versatile Agent
Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00


# Privacy and Local Use
## NOVA Privacy Statement
NOVA is designed as a **local-first, privacy-focused** language model. This document explains how NOVA handles your data.
---
## Core Principles
### 1. Local-First
**Everything runs on your device.**
- Model inference happens locally
- Training data stays on your machine
- No cloud dependencies
- No internet required after setup (except for optional dataset and pre-trained weight downloads)
### 2. Zero Telemetry
**NOVA collects zero data.**
- No usage tracking
- No error reporting
- No analytics
- No phone-home functionality
### 3. Complete User Control
**You own everything.**
- Your conversations
- Your trained models
- Your custom personas
- Your data
---
## Data Storage
### Where Your Data Lives
```
C:\Development\Nova\
├── memory.db # Your conversation history (SQLite)
├── checkpoints/ # Your trained models
├── data/ # Your training data
└── configs/persona/ # Your custom personas
```
**All on your device. Never uploaded.**
### Conversation Memory
- **Location:** `memory.db` (SQLite database)
- **Contents:** Your chat history
- **Encryption:** Not encrypted by default; the file is a plain SQLite database, so anyone with access to your user account or disk can read it
- **Deletion:** Delete the `memory.db` file to erase all history
- **Recommendation:** Encrypt your drive if sharing the device
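
To see what the conversation database actually holds, you can open it with Python's built-in `sqlite3` module. The sketch below lists every table and its row count without assuming anything about NOVA's schema. The `memory.db` path comes from the layout above; `summarize_db` is a hypothetical helper name, not part of NOVA.

```python
import sqlite3
from pathlib import Path


def summarize_db(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    if not Path(db_path).exists():
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = {}
        for name in tables:
            # Table names come from the file itself, so quote them as identifiers.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `summarize_db("memory.db")` from the project directory returns a dictionary you can print or inspect before deciding whether to keep or delete the file.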
### Model Checkpoints
- **Location:** `checkpoints/` directory
- **Contents:** Model weights and training state
- **Sharing:** Safe to share when trained only on public datasets; a checkpoint trained on your own data may have memorized parts of it
---
## Network Activity
### When NOVA Uses the Internet
NOVA **only** uses the internet for:
1. **Dataset downloads:** Fetching legal training datasets (opt-in)
2. **Pre-trained weights:** Downloading weights, if available (optional)
### When NOVA Does NOT Use Internet
- **Chat inference:** 100% offline
- **Model training:** 100% offline
- **Persona customization:** 100% offline
- **Evolution (NOVA-EVO):** 100% offline
### Firewall Safety
NOVA is safe to run behind a firewall with no internet access (after initial setup).
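
One way to check the offline claims yourself is to make outbound connections fail loudly while a chat or training step runs. The following is a minimal sketch, not part of NOVA: `block_network` and `NetworkBlockedError` are hypothetical names, and the guard only catches connections opened through Python's `socket.connect` in the same process. An OS-level firewall rule remains the stronger check.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a connection while the guard is active."""


@contextmanager
def block_network():
    """Temporarily make every socket.connect() call in this process fail."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        raise NetworkBlockedError(f"blocked connection attempt to {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the real method, even if the guarded code raised.
        socket.socket.connect = original_connect
```

Wrapping a call in `with block_network():` turns any unexpected connection attempt into an exception with the target address in the message. Subprocesses and `connect_ex` calls are not covered by this sketch.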
---
## AI Disclosure Setting
### `always_disclose` Flag
NOVA personas have an `always_disclose` setting:
```yaml
always_disclose: false # Default
```
**What this means:**
- `false` (default): NOVA does NOT disclose being AI
  - Designed for **private, personal use**
  - Appropriate for local companion scenarios
- `true`: NOVA includes AI disclosure text
  - Recommended for **shared or public use**
  - Adds transparency about AI nature
### When to Enable Disclosure
**Enable `always_disclose: true` if:**
- Sharing NOVA with others
- Deploying publicly (e.g., website, app)
- Any scenario where users might not know it's AI
**Keep `always_disclose: false` if:**
- Personal, private use on your own device
- You're fully aware it's a language model
- Testing/development
**Default:** `false` (assumes personal use)
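
Before sharing a NOVA setup, it can help to confirm which personas still have disclosure turned off. The sketch below scans persona files for the flag using a simple line-based match rather than a full YAML parser. It assumes `always_disclose` is a top-level key, as in the snippet above, and that persona files use the `.yaml` extension. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Matches a top-level `always_disclose: <value>` line, ignoring trailing comments.
_DISCLOSE_RE = re.compile(r"^always_disclose:\s*([^\s#]+)", re.MULTILINE)


def disclosure_enabled(persona_text: str) -> bool:
    """Return True only if the persona text sets always_disclose to true."""
    match = _DISCLOSE_RE.search(persona_text)
    if match is None:
        return False  # the documented default is false
    return match.group(1).lower() == "true"


def personas_without_disclosure(persona_dir: str) -> list[str]:
    """List persona files in persona_dir that do not enable disclosure."""
    return sorted(
        path.name
        for path in Path(persona_dir).glob("*.yaml")
        if not disclosure_enabled(path.read_text(encoding="utf-8"))
    )
```

Running `personas_without_disclosure("configs/persona")` gives the files to review before a shared or public deployment.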
---
## Persona System Privacy
### Personality Matrix
The personality matrix (warmth, humor, empathy, etc.) is:
- **Stored:** In persona YAML files
- **Processed:** Locally during generation
- **Shared:** Never (unless you share the files)
### Custom Personas
Your custom persona configurations:
- **Location:** `configs/persona/` directory
- **Format:** YAML (human-readable text)
- **Privacy:** Stored locally, never transmitted
---
## Training Data Privacy
### Legal Data Only
NOVA enforces **legal-only datasets**:
- Public domain sources
- Openly licensed datasets (CC0, CC-BY, MIT, Apache)
- License tracking in `license_ledger.json`
**No private data scraping.**
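
If you want to audit the ledger yourself, a short script can flag entries whose license falls outside the list above. This document does not describe the format of `license_ledger.json`, so the schema in this sketch is an assumption made purely for illustration, as are the function name and the normalized license strings. Adjust all three to match the real file.

```python
import json

# Licenses named in this document, lowercased; extend to match your own policy.
ALLOWED_LICENSES = {"public-domain", "cc0", "cc-by", "mit", "apache"}


def unapproved_sources(ledger_json: str) -> list[str]:
    """Return names of ledger entries whose license is not in ALLOWED_LICENSES.

    ASSUMED schema (not NOVA's documented format):
    {"sources": [{"name": "...", "license": "..."}, ...]}
    """
    ledger = json.loads(ledger_json)
    return [
        entry.get("name", "<unnamed>")
        for entry in ledger.get("sources", [])
        if str(entry.get("license", "")).strip().lower() not in ALLOWED_LICENSES
    ]
```

The comparison is an exact match on the lowercased string, so versioned identifiers such as `Apache-2.0` would need to be added to the allowed set or normalized first.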
### Your Own Data
If you train NOVA on your own data:
- **Stays local:** Never leaves your device
- **Your responsibility:** Ensure you have rights to use it
- **Recommendation:** Don't train on sensitive/private data you don't want in the model
---
## Security Considerations
### Running NOVA Safely
**Do:**
- Run on a trusted device
- Keep your OS and Python dependencies updated
- Use filesystem encryption if device is shared
- Review code before running (it's open source!)
⚠️ **Don't:**
- Expose the REST API to the internet without authentication
- Train on sensitive data you can't afford to leak
- Share `memory.db` if it contains private conversations
### REST API Security
If using the REST API (`nova chat serve`):
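- **Default:** Binds to `0.0.0.0:8000` (all network interfaces), so other machines on your network can reach the API unless a firewall blocks them
- **Recommendation:** Pass `--host 127.0.0.1` to accept connections from your own machine only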
- **Authentication:** Not included (add if exposing externally)
- **HTTPS:** Not included (add if exposing externally)
**For personal use:** Keep localhost-only.
**For shared use:** Add authentication, HTTPS, rate limiting.
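
Because the default bind address is not local-only, a launcher script may want to check the host value before starting the server. The sketch below uses the standard `ipaddress` module to decide whether a bind address is a loopback address. `is_local_only` is a hypothetical helper, and treating the name `localhost` as loopback is an assumption that holds on typical systems.

```python
import ipaddress


def is_local_only(host: str) -> bool:
    """Return True if binding to `host` keeps the server off the network."""
    if host.lower() == "localhost":
        return True  # assumed to resolve to a loopback address
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); not provably local.
        return False
```

Here `is_local_only("127.0.0.1")` and `is_local_only("::1")` are true, while `is_local_only("0.0.0.0")` is false, which is the case where authentication and HTTPS should be in place first.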
---
## Data Deletion
### Clear All Conversations
```bash
# Delete the conversation database
rm memory.db

# Or clear it programmatically by running Python from the shell
python - <<'EOF'
from nova_chat import ConversationMemory

memory = ConversationMemory()
memory.clear_all()
EOF
```
### Remove Models
```bash
# Delete checkpoints
rm -rf checkpoints/
```
### Complete Reset
```bash
# Remove all data
rm -rf data/ checkpoints/ memory.db
```
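
The `rm` commands above assume a Unix-style shell, while the example layout earlier uses a Windows path. As a cross-platform alternative, here is a minimal sketch using only the standard library. `reset_nova_data` is a hypothetical helper, not a NOVA command, and it defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def reset_nova_data(root: str, dry_run: bool = True) -> list[Path]:
    """Delete data/, checkpoints/ and memory.db under root.

    Returns the paths that exist and were (or would be) removed.
    """
    targets = [Path(root) / name for name in ("data", "checkpoints", "memory.db")]
    existing = [path for path in targets if path.exists()]
    if not dry_run:
        for path in existing:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return existing
```

Call it once with the default `dry_run=True` to review the list, then again with `dry_run=False` to delete. Like the shell command, it leaves `configs/persona/` untouched.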
---
## Third-Party Dependencies
NOVA uses standard open-source libraries:
- **PyTorch:** ML framework
- **SentencePiece:** Tokenization
- **FastAPI/Uvicorn:** REST API (optional)
- **SQLite:** Conversation storage
**All are open source and widely used.**
### Dependency Privacy
- PyTorch: No telemetry (when installed normally)
- SentencePiece: No telemetry
- FastAPI: No telemetry
- SQLite: Local database, no telemetry
---
## Comparison to Cloud LLMs
| Feature | NOVA | Cloud LLMs |
|---------|------|------------|
| **Data Location** | Your device | Company servers |
| **Privacy** | Complete | Varies by provider |
| **Telemetry** | None | Usually tracked |
| **Internet Required** | No (after setup) | Yes |
| **Cost** | One-time (hardware) | Per-token/monthly |
| **Customization** | Full control | Limited |
| **Data Retention** | Your choice | Company policy |
---
## Transparency
### Open Source
NOVA is **fully open source** under Apache 2.0:
- **Source code:** Fully auditable
- **No hidden functionality:** What you see is what you get
- **Community review:** Anyone can inspect for privacy issues
### No Hidden Behavior
NOVA does **not**:
- Phone home
- Send analytics
- Track usage
- Report errors to external services
- Auto-update without your action
---
## Recommendations
### For Maximum Privacy
1. **Offline Mode:** Disable network after downloading dependencies
2. **Encrypt Storage:** Use full-disk encryption (BitLocker, FileVault, LUKS)
3. **Regular Cleanup:** Clear `memory.db` periodically if desired
4. **Review Code:** Inspect the source before running
### For Shared Devices
1. **Enable Disclosure:** Set `always_disclose: true`
2. **Separate Accounts:** Use OS user accounts to isolate data
3. **Clear Conversations:** Delete history after sessions
### For Development
1. **Test Data Only:** Don't use real sensitive data for testing
2. **Version Control:** Add `memory.db` and `checkpoints/` to `.gitignore`
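
The `.gitignore` step can be automated so that private files are excluded before the first commit. The sketch below appends any missing entries to a `.gitignore` file. `ensure_ignored` is a hypothetical helper, and it only recognizes exact line matches, so a broader pattern such as `*.db` already in the file would not be detected.

```python
from pathlib import Path


def ensure_ignored(
    gitignore_path: str,
    entries: tuple[str, ...] = ("memory.db", "checkpoints/"),
) -> list[str]:
    """Append any missing entries to a .gitignore file; return the ones added."""
    path = Path(gitignore_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Running `ensure_ignored(".gitignore")` twice is safe: the second call finds both entries present and adds nothing.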
---
## Contact for Privacy Concerns
If you find privacy issues:
- **GitHub Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)
- **Security:** Tag issues with `security` label
---
## Summary
**NOVA is designed for local, private use.**
✅ No data collection
✅ No telemetry
✅ No cloud dependencies
✅ Complete user control
✅ Open source and auditable
**Your data stays on your device.**
---
**Last Updated:** 2025
**Document Version:** 1.0