Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00
commit a7f091aa45
50 changed files with 6437 additions and 0 deletions
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -0,0 +1,227 @@
+# Contributing to NOVA
+
+Thank you for your interest in contributing to NOVA! This guide covers reporting issues, submitting code, and the project's development standards.
+
+---
+
+## How to Contribute
+
+### Reporting Issues
+
+**Bug Reports:**
+1. Check existing issues first
+2. Use the bug report template
+3. Include:
+   - Python version
+   - OS and hardware
+   - Steps to reproduce
+   - Expected vs actual behavior
+   - Error messages/logs
+
+**Feature Requests:**
+1. Check if already proposed
+2. Explain the use case
+3. Describe the desired behavior
+
+### Code Contributions
+
+**Setup Development Environment:**
+
+```bash
+# Fork and clone
+git clone https://github.com/yourusername/nova.git
+cd nova
+
+# Create venv
+python -m venv venv
+source venv/bin/activate  # Windows: venv\Scripts\activate
+
+# Install dev dependencies
+pip install -r requirements.txt
+pip install -e ".[dev]"  # quoted so shells like zsh do not glob-expand [dev]
+```
+
+**Before Submitting:**
+
+1. **Run Tests:**
+   ```bash
+   pytest tests/ -v
+   ```
+
+2. **Lint Code:**
+   ```bash
+   ruff check .
+   black --check .
+   ```
+
+3. **Format Code:**
+   ```bash
+   black nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/
+   ```
+
+4. **Type Check (optional but recommended):**
+   ```bash
+   mypy nova_core/ --ignore-missing-imports
+   ```
+
+### Pull Request Process
+
+1. **Branch Naming:**
+   - `feature/description` for new features
+   - `fix/description` for bug fixes
+   - `docs/description` for documentation
+
+2. **Commit Messages:**
+   - Clear, descriptive messages
+   - Reference issues: `Fix #123: Description`
+
+3. **PR Description:**
+   - What changed
+   - Why it changed
+   - Testing performed
+   - Screenshots (if UI changes)
+
+4. **Review Process:**
+   - CI must pass
+   - At least one approval required
+   - Address review feedback
+
+---
+
+## Development Guidelines
+
+### Code Style
+
+**Python:**
+- Follow PEP 8
+- Use Black formatter (line length 100)
+- Type hints encouraged
+- Docstrings for public APIs
+
+**Example:**
+```python
+def example_function(param: str, optional: int = 0) -> bool:
+    """
+    Brief description.
+
+    Args:
+        param: Description
+        optional: Description (default: 0)
+
+    Returns:
+        Description
+    """
+    return True
+```
+
+### Testing
+
+**Write Tests For:**
+- New features
+- Bug fixes
+- Public APIs
+
+**Test Locations:**
+- `tests/test_core.py` - Core transformer
+- `tests/test_tokenizer.py` - Tokenizer
+- `tests/test_persona.py` - Persona system
+- `tests/test_<module>.py` - Other modules
+
+**Run Tests:**
+```bash
+# All tests
+pytest
+
+# Specific file
+pytest tests/test_core.py
+
+# With coverage (requires the pytest-cov plugin)
+pytest --cov=nova_core
+```
+
+### Documentation
+
+**Update Docs For:**
+- API changes
+- New features
+- Configuration options
+
+**Documentation Files:**
+- `README.md` - Main documentation
+- `docs/MODEL_CARD.md` - Model information
+- `docs/PRIVACY_LOCAL.md` - Privacy details
+- `docs/DATA_LICENSES.md` - Data licensing
+
+---
+
+## Contribution Areas
+
+### High Priority
+
+- **Pre-trained Models:** Training and releasing checkpoints
+- **Export Tools:** GGUF converter, quantization improvements
+- **Evaluation Suite:** Comprehensive benchmarks
+- **Dataset Downloaders:** Legal dataset acquisition scripts
+
+### Medium Priority
+
+- **LoRA Support:** Fine-tuning with adapters
+- **Multi-language:** Support for non-English
+- **Performance:** Optimization improvements
+- **Tests:** Increase coverage
+
+### Documentation
+
+- **Tutorials:** Step-by-step guides
+- **Examples:** Real-world use cases
+- **API Docs:** Complete API documentation
+- **Architecture:** Deep-dive technical docs
+
+---
+
+## License
+
+By contributing, you agree that your contributions will be licensed under the Apache License 2.0.
+
+---
+
+## Code of Conduct
+
+### Our Pledge
+
+- Be respectful and inclusive
+- Welcome newcomers
+- Focus on constructive feedback
+- Assume good intentions
+
+### Unacceptable Behavior
+
+- Harassment or discrimination
+- Trolling or insulting comments
+- Publishing others' private information
+- Other unprofessional conduct
+
+### Enforcement
+
+Report violations to the project maintainers. All complaints will be reviewed and investigated.
+
+---
+
+## Questions?
+
+- **Discussions:** GitHub Discussions
+- **Issues:** GitHub Issues
+- **General:** Open an issue with the "question" label
+
+---
+
+## Recognition
+
+Contributors will be:
+- Listed in CONTRIBUTORS.md
+- Mentioned in release notes
+- Credited for significant features
+
+---
+
+Thank you for contributing to NOVA! 🌟
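
The Testing section of the CONTRIBUTING.md above asks for tests covering new features, bug fixes, and public APIs, but does not show what one might look like. The following is a minimal sketch, not code from the NOVA repository: `truncate_history` is a hypothetical helper defined inline purely for illustration, and the test names are assumptions. It uses plain `assert` statements in `test_*` functions, which is the shape pytest collects by default.

```python
"""Sketch of tests in the style the contributing guide asks for.

`truncate_history` is a hypothetical helper defined here for illustration;
it is not part of NOVA's codebase.
"""


def truncate_history(messages: list[str], max_turns: int) -> list[str]:
    """Return the most recent `max_turns` messages (hypothetical helper)."""
    if max_turns < 0:
        raise ValueError("max_turns must be non-negative")
    # messages[-0:] would return the whole list, so handle 0 explicitly.
    return messages[-max_turns:] if max_turns else []


def test_keeps_most_recent_messages() -> None:
    # Feature test: the newest messages are retained, in original order.
    assert truncate_history(["a", "b", "c"], max_turns=2) == ["b", "c"]


def test_zero_turns_returns_empty_list() -> None:
    # Regression-style test pinning down the max_turns == 0 edge case.
    assert truncate_history(["a", "b"], max_turns=0) == []


def test_negative_turns_is_rejected() -> None:
    # Public-API test: invalid input raises instead of failing silently.
    try:
        truncate_history(["a"], max_turns=-1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative max_turns")
```

Saved under a name that follows the guide's `tests/test_<module>.py` convention (for example a hypothetical `tests/test_history.py`), a file like this would be picked up by the `pytest` commands listed in the guide. In a real contribution the helper would be imported from the package under test rather than defined in the test file.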