Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00
commit a7f091aa45
50 changed files with 6437 additions and 0 deletions

227
docs/CONTRIBUTING.md Normal file

@@ -0,0 +1,227 @@
# Contributing to NOVA
Thank you for your interest in contributing to NOVA! This document provides guidelines for contributing.
---
## How to Contribute
### Reporting Issues
**Bug Reports:**
1. Check existing issues first
2. Use the bug report template
3. Include:
   - Python version
   - OS and hardware
   - Steps to reproduce
   - Expected vs actual behavior
   - Error messages/logs
**Feature Requests:**
1. Check if already proposed
2. Explain the use case
3. Describe the desired behavior
### Code Contributions
**Setup Development Environment:**
```bash
# Fork and clone
git clone https://github.com/yourusername/nova.git
cd nova
# Create venv
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dev dependencies
pip install -r requirements.txt
pip install -e ".[dev]"  # quoted so shells such as zsh do not expand the brackets
```
**Before Submitting:**
1. **Run Tests:**
```bash
pytest tests/ -v
```
2. **Lint Code:**
```bash
ruff check .
black --check .
```
3. **Format Code:**
```bash
black nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/
```
4. **Type Check (optional but recommended):**
```bash
mypy nova_core/ --ignore-missing-imports
```
### Pull Request Process
1. **Branch Naming:**
   - `feature/description` for new features
   - `fix/description` for bug fixes
   - `docs/description` for documentation
2. **Commit Messages:**
   - Clear, descriptive messages
   - Reference issues: `Fix #123: Description`
3. **PR Description:**
   - What changed
   - Why the change
   - Testing performed
   - Screenshots (if UI changes)
4. **Review Process:**
   - CI must pass
   - At least one approval required
   - Address review feedback
---
## Development Guidelines
### Code Style
**Python:**
- Follow PEP 8
- Use Black formatter (line length 100)
- Type hints encouraged
- Docstrings for public APIs
**Example:**
```python
def example_function(param: str, optional: int = 0) -> bool:
    """
    Brief description.

    Args:
        param: Description
        optional: Description (default: 0)

    Returns:
        Description
    """
    return True
```
### Testing
**Write Tests For:**
- New features
- Bug fixes
- Public APIs
**Test Locations:**
- `tests/test_core.py` - Core transformer
- `tests/test_tokenizer.py` - Tokenizer
- `tests/test_persona.py` - Persona system
- `tests/test_<module>.py` - Other modules
**Run Tests:**
```bash
# All tests
pytest
# Specific file
pytest tests/test_core.py
# With coverage
pytest --cov=nova_core
```
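As a minimal sketch of what a test in `tests/test_<module>.py` might look like: the function `rms_norm` below is a hypothetical pure-Python stand-in written for this example, not NOVA's actual implementation, and the test names are illustrative. In a real test you would import the function under test from the relevant NOVA package instead of defining it inline.

```python
import math


def rms_norm(values: list[float], eps: float = 1e-6) -> list[float]:
    """Scale values so their root mean square is close to 1 (stand-in, not NOVA code)."""
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale for v in values]


def test_rms_norm_output_has_unit_rms() -> None:
    # After normalization the root mean square should be approximately 1.
    out = rms_norm([1.0, -2.0, 3.0, -4.0])
    rms = math.sqrt(sum(v * v for v in out) / len(out))
    assert math.isclose(rms, 1.0, rel_tol=1e-4)


def test_rms_norm_preserves_sign_and_length() -> None:
    # Scaling by a positive factor must not change signs or drop elements.
    values = [0.5, -0.25, 2.0]
    out = rms_norm(values)
    assert len(out) == len(values)
    assert all((a > 0) == (b > 0) for a, b in zip(values, out))
```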
### Documentation
**Update Docs For:**
- API changes
- New features
- Configuration options
**Documentation Files:**
- `README.md` - Main documentation
- `docs/MODEL_CARD.md` - Model information
- `docs/PRIVACY_LOCAL.md` - Privacy details
- `docs/DATA_LICENSES.md` - Data licensing
---
## Contribution Areas
### High Priority
- **Pre-trained Models:** Training and releasing checkpoints
- **Export Tools:** GGUF converter, quantization improvements
- **Evaluation Suite:** Comprehensive benchmarks
- **Dataset Downloaders:** Legal dataset acquisition scripts
### Medium Priority
- **LoRA Support:** Fine-tuning with adapters
- **Multi-language:** Support for non-English
- **Performance:** Optimization improvements
- **Tests:** Increase coverage
### Documentation
- **Tutorials:** Step-by-step guides
- **Examples:** Real-world use cases
- **API Docs:** Complete API documentation
- **Architecture:** Deep-dive technical docs
---
## License
By contributing, you agree that your contributions will be licensed under Apache License 2.0.
---
## Code of Conduct
### Our Pledge
- Be respectful and inclusive
- Welcome newcomers
- Focus on constructive feedback
- Assume good intentions
### Unacceptable Behavior
- Harassment or discrimination
- Trolling or insulting comments
- Publishing others' private information
- Other unprofessional conduct
### Enforcement
Violations can be reported to project maintainers. All complaints will be reviewed and investigated.
---
## Questions?
- **Discussions:** GitHub Discussions
- **Issues:** GitHub Issues
- **General:** Open an issue with the "question" label
---
## Recognition
Contributors will be:
- Listed in CONTRIBUTORS.md
- Mentioned in release notes
- Credited for significant features
---
Thank you for contributing to NOVA! 🌟

315
docs/DATA_LICENSES.md Normal file

@@ -0,0 +1,315 @@
# Data Licenses and Attribution
NOVA is committed to using **only legally licensed datasets** for training. This document tracks all approved data sources and their licenses.
---
## License Philosophy
### What We Use
**Public Domain:** No restrictions
**CC0:** Public domain dedication
**CC-BY:** Attribution required
**MIT/Apache/BSD:** Permissive open source
### What We DON'T Use
**All Rights Reserved:** Copyrighted without permission
**CC-BY-NC:** Non-commercial restrictions
**CC-BY-ND:** No derivatives restrictions
**Unknown/Unlicensed:** No verified license
**Scraped Web Data:** Without license verification
---
## Approved Dataset Sources
### 1. Wikipedia (English)
**License:** CC-BY-SA 3.0
**URL:** https://dumps.wikimedia.org/
**Size:** ~20 GB (compressed)
**Language:** English
**Description:** English Wikipedia articles
**Attribution:**
> Wikipedia contributors. English Wikipedia. Wikimedia Foundation. Licensed under CC-BY-SA 3.0.
**Usage:** Text data for general knowledge
---
### 2. Project Gutenberg
**License:** Public Domain
**URL:** https://www.gutenberg.org/
**Size:** ~15 GB
**Language:** Primarily English
**Description:** Public domain books (published before 1930 in the US, as of 2025)
**Attribution:**
> Project Gutenberg. Public domain literary works.
**Usage:** Literary text, historical documents
---
### 3. OpenWebText
**License:** CC0 1.0 (Public Domain Dedication)
**URL:** https://huggingface.co/datasets/Skylion007/openwebtext
**Size:** ~38 GB
**Language:** English
**Description:** Open reproduction of OpenAI's WebText corpus (web pages linked from Reddit)
**Attribution:**
> OpenWebText dataset by Aaron Gokaslan and Vanya Cohen. CC0 1.0 Universal.
**Usage:** Web-scraped text (Reddit-filtered)
---
### 4. C4 (Colossal Clean Crawled Corpus)
**License:** ODC-BY (Open Data Commons Attribution)
**URL:** https://huggingface.co/datasets/allenai/c4
**Size:** ~300 GB (en subset)
**Language:** English
**Description:** Cleaned Common Crawl data
**Attribution:**
> C4 dataset from Google's T5 paper. ODC-BY license.
**Usage:** Large-scale web text
---
### 5. The Pile - ArXiv Subset
**License:** Various (mostly permissive for ArXiv subset)
**URL:** https://pile.eleuther.ai/
**Size:** ~60 GB (ArXiv subset)
**Language:** English
**Description:** ArXiv papers (scientific articles)
**Attribution:**
> The Pile by EleutherAI. ArXiv papers subset.
**Usage:** Scientific and technical text
**Note:** Only use subsets with verified permissive licenses
---
## License Tracking System
### Ledger File
All downloaded datasets are tracked in:
```
data/processed/license_ledger.json
```
**Format:**
```json
{
  "sources": [
    {
      "name": "wikipedia-en",
      "license": "cc-by-sa-3.0",
      "url": "https://dumps.wikimedia.org/enwiki/",
      "download_date": "2025-01-15",
      "size_gb": 20.5,
      "attribution": "Wikipedia contributors..."
    }
  ]
}
```
### Verification
Before training, verify licenses:
```bash
python -m nova_data.pipeline verify_licenses
```
This checks that all data sources have approved licenses.
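
The check described above could look roughly like the following. This is an illustrative sketch, not the code behind `nova_data.pipeline`: the `APPROVED_LICENSES` set, every license identifier other than `cc-by-sa-3.0`, and the function name `find_unapproved_sources` are assumptions. Only the ledger layout comes from the format shown earlier. For a real run you would load `data/processed/license_ledger.json` with `json.load` instead of the inline string.

```python
import json

# Assumed identifiers; only "cc-by-sa-3.0" appears in the ledger example.
APPROVED_LICENSES = {
    "public-domain", "cc0-1.0", "cc-by-4.0", "cc-by-sa-3.0",
    "odc-by", "mit", "apache-2.0", "bsd-3-clause",
}


def find_unapproved_sources(ledger: dict) -> list[str]:
    """Return names of ledger sources whose license is missing or not approved."""
    problems = []
    for source in ledger.get("sources", []):
        license_id = str(source.get("license", "")).lower()
        if license_id not in APPROVED_LICENSES:
            problems.append(source.get("name", "<unnamed>"))
    return problems


# Inline ledger for demonstration; "mystery-scrape" is a made-up entry.
ledger = json.loads("""
{
  "sources": [
    {"name": "wikipedia-en", "license": "cc-by-sa-3.0"},
    {"name": "mystery-scrape", "license": "unknown"}
  ]
}
""")
print(find_unapproved_sources(ledger))  # ['mystery-scrape']
```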
---
## Attribution Requirements
### CC-BY Datasets
**Required:**
- Attribute the original creator
- Include license name
- Link to license
- Indicate if changes were made
**Our Attribution:**
All NOVA models trained on CC-BY data include:
> This model was trained on data including:
> - Wikipedia (CC-BY-SA 3.0)
> - [Other CC-BY sources]
>
> Full attributions in DATA_LICENSES.md
### Public Domain
**Required:** None (but we attribute anyway for transparency)
---
## Custom Datasets
### User-Provided Data
If training NOVA on your own data:
**Your Responsibility:**
- Ensure you have rights to use the data
- Verify any license requirements
- Add custom sources to ledger
**Example:**
```yaml
# configs/data/custom.yaml
sources:
  - name: my-custom-dataset
    license: mit  # or your license
    path: /path/to/data
    description: My custom training data
```
---
## Commercial Use Considerations
### NOVA Code
**License:** Apache 2.0
**Commercial Use:** ✅ Allowed
### Training Data
Commercial use of the training data depends on the dataset:
| Dataset | Commercial Use |
|---------|---------------|
| Wikipedia | ✅ Allowed (with attribution) |
| Project Gutenberg | ✅ Allowed (public domain) |
| OpenWebText | ✅ Allowed (CC0) |
| C4 | ✅ Allowed (ODC-BY, with attribution) |
| The Pile (ArXiv) | ⚠️ Verify per-subset |
**Recommendation:** Review each dataset's license for commercial projects.
---
## Excluded Sources
### Why We Don't Use Certain Data
**Common Crawl (raw):**
- Contains copyrighted material
- License status unclear for many pages
- We use filtered versions (C4) instead
**Social Media (Twitter, etc.):**
- Terms of Service restrictions
- Privacy concerns
- Unclear licensing
**Books3/LibGen:**
- Contains copyrighted books
- Legal issues
- Not permissively licensed
**YouTube Subtitles:**
- Copyright unclear
- TOS restrictions
---
## Compliance Checklist
Before training NOVA:
- [ ] All data sources listed in `license_ledger.json`
- [ ] Each source has verified license
- [ ] Licenses are permissive (CC-BY, MIT, Apache, public domain, etc.)
- [ ] Attribution prepared for CC-BY sources
- [ ] No excluded sources used
---
## Future Datasets
### Planned Additions
We're evaluating these sources:
- **BookCorpus:** Open domain books (pending license review)
- **Stack Exchange:** CC-BY-SA (with attribution)
- **OpenSubtitles:** Public domain/permissive subset
- **Code datasets:** GitHub permissive licenses (MIT, Apache, BSD)
**Criteria:**
- Clear, permissive license
- High quality
- Legally distributable
---
## Dataset Removal Requests
If you believe we've incorrectly listed a dataset:
1. Open an issue: [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)
2. Include:
   - Dataset name
   - License concern
   - Supporting documentation
3. We'll review and respond within 7 days
---
## Legal Disclaimer
**This project aims for legal compliance, but:**
- We're not lawyers
- License interpretation may vary by jurisdiction
- Users are responsible for their own compliance
- Consult legal counsel for commercial use
**NOVA project provides this information for transparency, but makes no warranties about legal compliance.**
---
## References
### License Texts
- **CC-BY 4.0:** https://creativecommons.org/licenses/by/4.0/
- **CC0 1.0:** https://creativecommons.org/publicdomain/zero/1.0/
- **Apache 2.0:** https://www.apache.org/licenses/LICENSE-2.0
- **MIT:** https://opensource.org/licenses/MIT
- **ODC-BY:** https://opendatacommons.org/licenses/by/
### Resources
- Creative Commons: https://creativecommons.org/
- Open Data Commons: https://opendatacommons.org/
- OSI Licenses: https://opensource.org/licenses
---
**Last Updated:** 2025
**Document Version:** 1.0
**Review Frequency:** Quarterly

232
docs/MODEL_CARD.md Normal file

@@ -0,0 +1,232 @@
# NOVA Model Card
## Model Details
**Name:** NOVA (Neuro-Optimizing Versatile Agent)
**Version:** 0.1.0
**Date:** 2025
**License:** Apache 2.0
**Type:** Decoder-only transformer language model
### Model Sizes
NOVA comes in four sizes:
| Size | Parameters | Layers | Hidden Size | Attention Heads | Context Length |
|------|-----------|--------|-------------|-----------------|----------------|
| 125M | 125M | 12 | 768 | 12 | 2048 |
| 350M | 350M | 24 | 1024 | 16 | 2048 |
| 1.3B | 1.3B | 24 | 2048 | 32 (8 KV) | 2048 |
| 3B | 3B | 32 | 2560 | 32 (8 KV) | 4096 |
### Architecture
- **Positional Encoding:** RoPE (Rotary Position Embedding)
- **Normalization:** RMSNorm (default) or LayerNorm
- **Activation:** SwiGLU (default), GeGLU, or GELU
- **Attention:** Multi-head with optional grouped-query attention (GQA)
- **Features:** KV-cache, gradient checkpointing, Flash Attention support
## Intended Use
### Primary Use Cases
- **Personal companion AI:** Conversational agent with customizable personas
- **Local inference:** Privacy-focused applications on consumer hardware
- **Research:** Transformer architecture experimentation
- **Education:** Learning about modern LLM implementation
### Out of Scope
- **Production deployment without safety measures:** Additional content filtering recommended
- **High-stakes decisions:** Not suitable for medical, legal, or financial advice
- **Scalable services:** Designed for local/personal use, not cloud deployment
## Training Data
NOVA uses **only legally licensed datasets**:
### Approved Sources
- **Public Domain:** Project Gutenberg books
- **CC0 / CC-BY-SA / ODC-BY:** OpenWebText (CC0), Wikipedia (CC-BY-SA 3.0), C4 corpus (ODC-BY)
- **Open Licensed:** The Pile (ArXiv), OSI-approved code datasets
### License Tracking
All training data sources are logged in `license_ledger.json` with:
- Source name and URL
- License type
- Download date
- Data provenance
### Exclusions
- No scraped data without verified licenses
- No copyrighted material used without a verified permissive license
- No personally identifiable information (PII)
- No user data without explicit consent
## Training Procedure
### Hyperparameters
Default training configuration (125M):
```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
weight_decay: 0.1
warmup_steps: 1000
max_steps: 100000
optimizer: AdamW
lr_schedule: cosine with warmup
```
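To make the schedule concrete: assuming `gradient_accumulation` counts micro-batches per optimizer step, `batch_size: 8` with `gradient_accumulation: 4` gives 8 × 4 = 32 sequences per optimizer step on a single device. The sketch below computes a cosine-with-warmup learning rate from the values above. It assumes linear warmup from zero and decay to a floor of zero, neither of which is stated in this card, and the function name `lr_at` is illustrative.

```python
import math

LEARNING_RATE = 3e-4
WARMUP_STEPS = 1000
MAX_STEPS = 100_000
MIN_LR = 0.0  # assumed floor; not specified in the model card


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (LEARNING_RATE - MIN_LR) * cosine


effective_batch = 8 * 4  # batch_size * gradient_accumulation
print("sequences per optimizer step:", effective_batch)
for step in (0, 500, 1000, 50_500, 100_000):
    print(step, f"{lr_at(step):.2e}")
```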
### Hardware
- **Minimum:** CPU (4+ cores), 8GB RAM
- **Recommended:** NVIDIA GPU (8GB+ VRAM), 16GB+ RAM
- **Optimal:** NVIDIA GPU (24GB+ VRAM), 32GB+ RAM
### Optimizations
- **Mixed Precision:** AMP (Automatic Mixed Precision) on GPU
- **Gradient Checkpointing:** Reduces memory usage
- **Distributed Training:** DDP (DistributedDataParallel) support
## Evaluation
### Metrics
- **Perplexity:** Language modeling quality
- **Latency:** Inference speed (tokens/second)
- **Memory:** Peak RAM/VRAM usage
- **Persona Adherence:** Style consistency with selected persona
### Benchmarks
(To be added as pre-trained models become available)
## Persona System
### Design Philosophy
NOVA includes a **personality matrix** system for controllable conversational style:
- **No AI Disclosure by Default:** `always_disclose: false`
- **Private Use Context:** Designed for personal, local deployment
- **Customizable:** Users can create custom personas
### Personality Traits
Eight traits (0.0-1.0) that modulate generation:
1. Warmth
2. Humor
3. Empathy
4. Decisiveness
5. Creativity
6. Intimacy
7. Playfulness
8. Formality
### Default Personas
- **girlfriend_gentle:** High warmth, high empathy
- **girlfriend_playful:** High humor, high playfulness
- **girlfriend_supportive:** Balanced traits (default)
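
A persona with these traits could be represented as follows. This is a hypothetical sketch: the class name `PersonalityMatrix`, the 0.5 field defaults, and the example trait values are assumptions chosen for illustration. Only the eight trait names, the 0.0-1.0 range, and the `always_disclose` flag with its `false` default come from this card.

```python
from dataclasses import dataclass

TRAITS = (
    "warmth", "humor", "empathy", "decisiveness",
    "creativity", "intimacy", "playfulness", "formality",
)


@dataclass(frozen=True)
class PersonalityMatrix:
    """Eight trait values in [0.0, 1.0] plus the disclosure flag (illustrative)."""
    warmth: float = 0.5
    humor: float = 0.5
    empathy: float = 0.5
    decisiveness: float = 0.5
    creativity: float = 0.5
    intimacy: float = 0.5
    playfulness: float = 0.5
    formality: float = 0.5
    always_disclose: bool = False  # documented default; enable for shared use

    def __post_init__(self) -> None:
        # Reject any trait outside the documented 0.0-1.0 range.
        for name in TRAITS:
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0.0, 1.0], got {value}")


# Example values are made up; real personas live in the persona YAML files.
example = PersonalityMatrix(warmth=0.9, empathy=0.9, always_disclose=True)
print(example)
```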
## Ethical Considerations
### Privacy
- **Local-First:** All processing on-device
- **No Telemetry:** Zero data collection
- **User Control:** Complete control over data and models
### Bias and Fairness
- **Training Data Bias:** Inherits biases from source datasets
- **Mitigation:** Use diverse, openly licensed sources
- **Ongoing Work:** Bias evaluation and mitigation strategies
### Content Safety
- **Basic Filters:** Profanity and unsafe content detection
- **Limitations:** Not a complete safety solution
- **Recommendation:** Additional filtering for public-facing use
### AI Disclosure
- **Configurable:** `always_disclose` setting in persona config
- **Default:** False (for private, personal use)
- **Recommendation:** Enable for any public or shared deployment
## Limitations
### Technical
- **Small Context:** 2048-4096 tokens (not suitable for long documents)
- **Model Size:** Smaller models may produce lower-quality output than larger LLMs
- **Hallucination:** May generate factually incorrect information
### Use Case
- **Not a knowledge base:** May not have up-to-date information
- **Not a specialist:** General-purpose, not domain-specific
- **Not production-ready (as-is):** Requires additional safety/filtering
## Evolutionary Algorithm (NOVA-EVO)
### Purpose
Optional genetic algorithm for automatic configuration optimization:
- **Hyperparameter Search:** Learning rate, batch size, warmup
- **Architecture Search:** Activation, normalization, positional encoding
- **Multi-Objective:** Optimizes loss, latency, memory simultaneously
### Fitness Metrics
- **Loss/Perplexity:** (50% weight)
- **Latency:** (20% weight)
- **Memory:** (20% weight)
- **Quality:** (10% weight)
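
The weighting above can be read as a weighted sum. The sketch below is one possible interpretation, not NOVA-EVO's actual code: it assumes each metric has already been normalized to a score in [0, 1] where higher is better (for example, lower loss maps to a higher score), and the names `FITNESS_WEIGHTS` and `fitness` are illustrative.

```python
# Weights taken from the list above; they sum to 1.0.
FITNESS_WEIGHTS = {"loss": 0.5, "latency": 0.2, "memory": 0.2, "quality": 0.1}


def fitness(scores: dict[str, float]) -> float:
    """Weighted sum of normalized scores (higher is better for every metric)."""
    missing = FITNESS_WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores: {sorted(missing)}")
    return sum(FITNESS_WEIGHTS[name] * scores[name] for name in FITNESS_WEIGHTS)


# Made-up normalized scores for a single candidate configuration.
candidate = {"loss": 0.8, "latency": 0.5, "memory": 0.5, "quality": 1.0}
print(round(fitness(candidate), 3))  # 0.7
```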
### Compute Budget
- **Small:** 20 individuals, 10 generations (~6-12 hours)
- **Medium:** 40 individuals, 20 generations (~24-48 hours)
- **Large:** 100 individuals, 50 generations (~1-2 weeks)
## Contact
For questions, issues, or contributions:
- **GitHub:** [github.com/yourusername/nova](https://github.com/yourusername/nova)
- **Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)
## Citation
```bibtex
@software{nova2025,
  title={NOVA: Neuro-Optimizing Versatile Agent},
  author={NOVA Project Contributors},
  year={2025},
  url={https://github.com/yourusername/nova},
  license={Apache-2.0}
}
```
## Acknowledgments
- Transformer architecture inspired by GPT, LLaMA, and modern LLM research
- RoPE, RMSNorm, and SwiGLU from recent papers (Su et al.; Zhang and Sennrich; Shazeer)
- Open source community for datasets and tools
---
**Last Updated:** 2025
**Model Card Version:** 1.0

330
docs/PRIVACY_LOCAL.md Normal file

@@ -0,0 +1,330 @@
# Privacy and Local Use
## NOVA Privacy Statement
NOVA is designed as a **local-first, privacy-focused** language model. This document explains how NOVA handles your data.
---
## Core Principles
### 1. Local-First
**Everything runs on your device.**
- Model inference happens locally
- Training data stays on your machine
- No cloud dependencies
- No internet required (except for dataset downloads)
### 2. Zero Telemetry
**NOVA collects zero data.**
- No usage tracking
- No error reporting
- No analytics
- No phone-home functionality
### 3. Complete User Control
**You own everything.**
- Your conversations
- Your trained models
- Your custom personas
- Your data
---
## Data Storage
### Where Your Data Lives
```
C:\Development\Nova\
├── memory.db # Your conversation history (SQLite)
├── checkpoints/ # Your trained models
├── data/ # Your training data
└── configs/persona/ # Your custom personas
```
**All on your device. Never uploaded.**
### Conversation Memory
- **Location:** `memory.db` (SQLite database)
- **Contents:** Your chat history
- **Encryption:** Not encrypted by default (it's local)
- **Deletion:** Delete `memory.db` file to erase all history
- **Recommendation:** Encrypt your drive if sharing the device
### Model Checkpoints
- **Location:** `checkpoints/` directory
- **Contents:** Model weights and training state
- **Sharing:** Safe to share (contains no personal data)
---
## Network Activity
### When NOVA Uses the Internet
NOVA **only** uses the internet for:
1. **Dataset Downloads:** Downloading legal training datasets (opt-in)
2. **Optional:** Downloading pre-trained weights (if available)
### When NOVA Does NOT Use Internet
- **Chat inference:** 100% offline
- **Model training:** 100% offline
- **Persona customization:** 100% offline
- **Evolution (NOVA-EVO):** 100% offline
### Firewall Safety
NOVA is safe to run behind a firewall with no internet access (after initial setup).
---
## AI Disclosure Setting
### `always_disclose` Flag
NOVA personas have an `always_disclose` setting:
```yaml
always_disclose: false # Default
```
**What this means:**
- `false` (default): NOVA does NOT disclose being AI
  - Designed for **private, personal use**
  - Appropriate for local companion scenarios
- `true`: NOVA includes AI disclosure text
  - Recommended for **shared or public use**
  - Adds transparency about AI nature
### When to Enable Disclosure
**Enable `always_disclose: true` if:**
- Sharing NOVA with others
- Deploying publicly (e.g., website, app)
- Any scenario where users might not know it's AI
**Keep `always_disclose: false` if:**
- Personal, private use on your own device
- You're fully aware it's a language model
- Testing/development
**Default:** False (personal use assumption)
---
## Persona System Privacy
### Personality Matrix
The personality matrix (warmth, humor, empathy, etc.) is:
- **Stored:** In persona YAML files
- **Processed:** Locally during generation
- **Shared:** Never (unless you share the files)
### Custom Personas
Your custom persona configurations:
- **Location:** `configs/persona/` directory
- **Format:** YAML (human-readable text)
- **Privacy:** Stored locally, never transmitted
---
## Training Data Privacy
### Legal Data Only
NOVA enforces **legal-only datasets**:
- Public domain sources
- Openly licensed datasets (CC0, CC-BY, MIT, Apache)
- License tracking in `license_ledger.json`
**No private data scraping.**
### Your Own Data
If you train NOVA on your own data:
- **Stays local:** Never leaves your device
- **Your responsibility:** Ensure you have rights to use it
- **Recommendation:** Don't train on sensitive/private data you don't want in the model
---
## Security Considerations
### Running NOVA Safely
**Do:**
- Run on a trusted device
- Keep your OS and Python dependencies updated
- Use filesystem encryption if device is shared
- Review code before running (it's open source!)
⚠️ **Don't:**
- Expose the REST API to the internet without authentication
- Train on sensitive data you can't afford to leak
- Share `memory.db` if it contains private conversations
### REST API Security
If using the REST API (`nova chat serve`):
- **Default:** Binds to `0.0.0.0:8000` (all interfaces)
- **Recommendation:** Use `--host 127.0.0.1` for local-only
- **Authentication:** Not included (add if exposing externally)
- **HTTPS:** Not included (add if exposing externally)
**For personal use:** Keep localhost-only.
**For shared use:** Add authentication, HTTPS, rate limiting.
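
Because the documented default bind address is `0.0.0.0`, a launcher script may want to warn before exposing the API. The helper below is a standard-library sketch and not part of NOVA: the function name `is_local_only` is hypothetical, and treating the literal hostname `localhost` as local is an assumption. A wrapper could call it with the value passed to `--host` and refuse to start, or print a warning, when it returns `False`.

```python
import ipaddress


def is_local_only(host: str) -> bool:
    """Return True if binding to `host` keeps the server reachable only from this machine."""
    if host == "localhost":  # assumed to resolve to a loopback address
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname); treat it as potentially exposed.
        return False


for host in ("127.0.0.1", "0.0.0.0", "::1"):
    print(host, is_local_only(host))
```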
---
## Data Deletion
### Clear All Conversations
```bash
# Delete the conversation database
rm memory.db
# Or clear it programmatically from Python
python -c "from nova_chat import ConversationMemory; ConversationMemory().clear_all()"
```
### Remove Models
```bash
# Delete checkpoints
rm -rf checkpoints/
```
### Complete Reset
```bash
# Remove all data
rm -rf data/ checkpoints/ memory.db
```
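The `rm` commands above assume a Unix-like shell. Since the layout shown earlier uses a Windows path, a cross-platform alternative using only the Python standard library might look like this. The function name `reset_nova_data` is illustrative and not part of NOVA; the paths are the ones listed in this document, resolved relative to a project root you pass in. Calling `reset_nova_data(Path("."), include_training_data=True)` from the project root would correspond to the complete reset.

```python
import shutil
from pathlib import Path


def reset_nova_data(root: Path, include_training_data: bool = False) -> list[Path]:
    """Delete memory.db and checkpoints/ (optionally data/) under root.

    Returns the paths that were actually removed; missing paths are skipped.
    """
    targets = [root / "memory.db", root / "checkpoints"]
    if include_training_data:
        targets.append(root / "data")

    removed = []
    for target in targets:
        if target.is_dir():
            shutil.rmtree(target)  # remove a directory tree
            removed.append(target)
        elif target.exists():
            target.unlink()  # remove a single file
            removed.append(target)
    return removed
```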
---
## Third-Party Dependencies
NOVA uses standard open-source libraries:
- **PyTorch:** ML framework
- **SentencePiece:** Tokenization
- **FastAPI/Uvicorn:** REST API (optional)
- **SQLite:** Conversation storage
**All are open source and widely audited.**
### Dependency Privacy
- PyTorch: No telemetry (when installed normally)
- SentencePiece: No telemetry
- FastAPI: No telemetry
- SQLite: Local database, no telemetry
---
## Comparison to Cloud LLMs
| Feature | NOVA | Cloud LLMs |
|---------|------|------------|
| **Data Location** | Your device | Company servers |
| **Privacy** | Complete | Varies by provider |
| **Telemetry** | None | Usually tracked |
| **Internet Required** | No (after setup) | Yes |
| **Cost** | One-time (hardware) | Per-token/monthly |
| **Customization** | Full control | Limited |
| **Data Retention** | Your choice | Company policy |
---
## Transparency
### Open Source
NOVA is **fully open source** under Apache 2.0:
- **Source code:** Fully auditable
- **No hidden functionality:** What you see is what you get
- **Community review:** Anyone can inspect for privacy issues
### No Hidden Behavior
NOVA does **not**:
- Phone home
- Send analytics
- Track usage
- Report errors to external services
- Auto-update without your action
---
## Recommendations
### For Maximum Privacy
1. **Offline Mode:** Disable network after downloading dependencies
2. **Encrypt Storage:** Use full-disk encryption (BitLocker, FileVault, LUKS)
3. **Regular Cleanup:** Clear `memory.db` periodically if desired
4. **Review Code:** Inspect the source before running
### For Shared Devices
1. **Enable Disclosure:** Set `always_disclose: true`
2. **Separate Accounts:** Use OS user accounts to isolate data
3. **Clear Conversations:** Delete history after sessions
### For Development
1. **Test Data Only:** Don't use real sensitive data for testing
2. **Version Control:** Add `memory.db` and `checkpoints/` to `.gitignore`
---
## Contact for Privacy Concerns
If you find privacy issues:
- **GitHub Issues:** [github.com/yourusername/nova/issues](https://github.com/yourusername/nova/issues)
- **Security:** Tag issues with `security` label
---
## Summary
**NOVA is designed for local, private use.**
✅ No data collection
✅ No telemetry
✅ No cloud dependencies
✅ Complete user control
✅ Open source and auditable
**Your data stays on your device.**
---
**Last Updated:** 2025
**Document Version:** 1.0