# Privacy and Local Use

## NOVA Privacy Statement

NOVA is designed as a **local-first, privacy-focused** language model. This document explains how NOVA handles your data.

---

## Core Principles

### 1. Local-First

**Everything runs on your device.**

- Model inference happens locally
- Training data stays on your machine
- No cloud dependencies
- No internet required (except for optional dataset and weight downloads)

### 2. Zero Telemetry

**NOVA collects zero data.**

- No usage tracking
- No error reporting
- No analytics
- No phone-home functionality

### 3. Complete User Control

**You own everything.**

- Your conversations
- Your trained models
- Your custom personas
- Your data

---

## Data Storage

### Where Your Data Lives

```
C:\Development\Nova\
├── memory.db            # Your conversation history (SQLite)
├── checkpoints/         # Your trained models
├── data/                # Your training data
└── configs/persona/     # Your custom personas
```

**All on your device. Never uploaded.**

### Conversation Memory

- **Location:** `memory.db` (SQLite database)
- **Contents:** Your chat history
- **Encryption:** Not encrypted by default; the file is readable by anyone with access to your disk
- **Deletion:** Delete the `memory.db` file to erase all history
- **Recommendation:** Encrypt your drive if you share the device

### Model Checkpoints

- **Location:** `checkpoints/` directory
- **Contents:** Model weights and training state
- **Sharing:** Safe to share when the model was trained only on public datasets. A model trained on your own data may reproduce parts of it, so treat those checkpoints as private.

---

## Network Activity

### When NOVA Uses the Internet

NOVA **only** uses the internet for:

1. **Dataset downloads:** Downloading legal training datasets (opt-in)
2. **Pre-trained weights:** Downloading pre-trained weights (optional, if available)

### When NOVA Does NOT Use the Internet

- **Chat inference:** 100% offline
- **Model training:** 100% offline
- **Persona customization:** 100% offline
- **Evolution (NOVA-EVO):** 100% offline

### Firewall Safety

After initial setup, NOVA is safe to run behind a firewall with no internet access.
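If you want to check the offline claims above for yourself, one option is to block socket creation inside the Python process before calling into NOVA. The following is a minimal sketch using only the standard library; `block_network`, `NetworkBlockedError`, and `offline_work` are hypothetical names introduced for illustration and are not part of NOVA. It only catches sockets opened through Python's `socket` module in the same process, so an OS-level firewall rule remains the stronger check.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a socket while the guard is active."""


@contextmanager
def block_network():
    """Temporarily replace socket.socket so any new socket creation fails."""
    original_socket = socket.socket

    def guarded_socket(*args, **kwargs):
        raise NetworkBlockedError("network access attempted while blocked")

    socket.socket = guarded_socket
    try:
        yield
    finally:
        # Always restore the real socket class, even if the body raised.
        socket.socket = original_socket


def offline_work():
    # Placeholder (assumption): swap in the NOVA call you want to test,
    # for example a single chat inference.
    return sum(range(10))


if __name__ == "__main__":
    with block_network():
        # If the wrapped code opens a socket, NetworkBlockedError is raised here.
        print(offline_work())
```

If the wrapped call finishes without raising `NetworkBlockedError`, it did not open a socket through Python during that run.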
---

## AI Disclosure Setting

### `always_disclose` Flag

NOVA personas have an `always_disclose` setting:

```yaml
always_disclose: false  # Default
```

**What this means:**

- `false` (default): NOVA does NOT disclose being AI
  - Designed for **private, personal use**
  - Appropriate for local companion scenarios
- `true`: NOVA includes AI disclosure text
  - Recommended for **shared or public use**
  - Adds transparency about AI nature

### When to Enable Disclosure

✅ **Enable `always_disclose: true` if:**

- Sharing NOVA with others
- Deploying publicly (e.g., website, app)
- Any scenario where users might not know it's AI

❌ **Keep `always_disclose: false` if:**

- Personal, private use on your own device
- You're fully aware it's a language model
- Testing/development

**Default:** `false` (personal use assumption)

---

## Persona System Privacy

### Personality Matrix

The personality matrix (warmth, humor, empathy, etc.) is:

- **Stored:** In persona YAML files
- **Processed:** Locally during generation
- **Shared:** Never (unless you share the files)

### Custom Personas

Your custom persona configurations:

- **Location:** `configs/persona/` directory
- **Format:** YAML (human-readable text)
- **Privacy:** Stored locally, never transmitted

---

## Training Data Privacy

### Legal Data Only

NOVA enforces **legal-only datasets**:

- Public domain sources
- Openly licensed datasets (CC0, CC-BY, MIT, Apache)
- License tracking in `license_ledger.json`

**No private data scraping.**

### Your Own Data

If you train NOVA on your own data:

- **Stays local:** Never leaves your device
- **Your responsibility:** Ensure you have rights to use it
- **Recommendation:** Don't train on sensitive/private data you don't want in the model

---

## Security Considerations

### Running NOVA Safely

✅ **Do:**

- Run on a trusted device
- Keep your OS and Python dependencies updated
- Use filesystem encryption if device is shared
- Review code before running (it's open source!)
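On a shared device, filesystem encryption can be complemented by restricting file permissions so that other OS accounts cannot read `memory.db`. The sketch below is an illustration under stated assumptions, not something NOVA ships: `restrict_to_owner` and `is_private` are hypothetical helpers, the `memory.db` path is taken from this document, and the permission bits are POSIX-specific (on Windows, `os.chmod` only toggles the read-only flag, so file ACLs would be needed instead).

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path: Path) -> int:
    """Set owner-only read/write permissions (0o600) and return the new mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(path.stat().st_mode)


def is_private(path: Path) -> bool:
    """Return True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    db_path = Path("memory.db")  # adjust to where your NOVA install keeps it
    if db_path.exists():
        restrict_to_owner(db_path)
        print(f"{db_path} private: {is_private(db_path)}")
    else:
        print(f"{db_path} not found; nothing to do")
```

This does not replace encryption: anyone with administrator rights or physical access to an unencrypted disk can still read the file.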
⚠️ **Don't:**

- Expose the REST API to the internet without authentication
- Train on sensitive data you can't afford to leak
- Share `memory.db` if it contains private conversations

### REST API Security

If using the REST API (`nova chat serve`):

- **Default:** Binds to `0.0.0.0:8000` (all interfaces), which makes the API reachable from other devices on your network
- **Recommendation:** Use `--host 127.0.0.1` for local-only access
- **Authentication:** Not included (add if exposing externally)
- **HTTPS:** Not included (add if exposing externally)

**For personal use:** Keep localhost-only.

**For shared use:** Add authentication, HTTPS, rate limiting.

---

## Data Deletion

### Clear All Conversations

```bash
# Delete the conversation database
rm memory.db

# Or clear it programmatically from Python
python -c "from nova_chat import ConversationMemory; ConversationMemory().clear_all()"
```

### Remove Models

```bash
# Delete checkpoints
rm -rf checkpoints/
```

### Complete Reset

```bash
# Remove all data
rm -rf data/ checkpoints/ memory.db
```

---

## Third-Party Dependencies

NOVA uses standard open-source libraries:

- **PyTorch:** ML framework
- **SentencePiece:** Tokenization
- **FastAPI/Uvicorn:** REST API (optional)
- **SQLite:** Conversation storage

**All are open source and widely audited.**

### Dependency Privacy

- PyTorch: No telemetry (when installed normally)
- SentencePiece: No telemetry
- FastAPI: No telemetry
- SQLite: Local database, no telemetry

---

## Comparison to Cloud LLMs

| Feature | NOVA | Cloud LLMs |
|---------|------|------------|
| **Data Location** | Your device | Company servers |
| **Privacy** | Complete | Varies by provider |
| **Telemetry** | None | Usually collected |
| **Internet Required** | No (after setup) | Yes |
| **Cost** | One-time (hardware) | Per-token/monthly |
| **Customization** | Full control | Limited |
| **Data Retention** | Your choice | Company policy |

---

## Transparency

### Open Source

NOVA is **fully open source** under Apache 2.0:

- **Source code:** Fully auditable
- **No hidden functionality:** What you see is what you get
- **Community review:** Anyone can inspect for privacy issues

### No Hidden Behavior

NOVA does **not**:

- Phone home
- Send analytics
- Track usage
- Report errors to external services
- Auto-update without your action

---

## Recommendations

### For Maximum Privacy

1. **Offline Mode:** Disable network after downloading dependencies
2. **Encrypt Storage:** Use full-disk encryption (BitLocker, FileVault, LUKS)
3. **Regular Cleanup:** Clear `memory.db` periodically if desired
4. **Review Code:** Inspect the source before running

### For Shared Devices

1. **Enable Disclosure:** Set `always_disclose: true`
2. **Separate Accounts:** Use OS user accounts to isolate data
3. **Clear Conversations:** Delete history after sessions

### For Development

1. **Test Data Only:** Don't use real sensitive data for testing
2. **Version Control:** Add `memory.db` and `checkpoints/` to `.gitignore`

---

## Contact for Privacy Concerns

If you find a privacy issue:

- **GitHub Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)
- **Security:** Tag issues with the `security` label

---

## Summary

**NOVA is designed for local, private use.**

- ✅ No data collection
- ✅ No telemetry
- ✅ No cloud dependencies
- ✅ Complete user control
- ✅ Open source and auditable

**Your data stays on your device.**

---

**Last Updated:** 2025

**Document Version:** 1.0