Initial commit: NOVA - Neuro-Optimizing Versatile Agent

Complete transformer LLM built from scratch with:

Core Features:
- Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache)
- SentencePiece tokenizer (BPE/Unigram)
- Training pipeline (AMP, gradient checkpointing, DDP)
- Persona system with personality matrix (NO AI disclosure by default)
- Genetic evolution (NOVA-EVO) for hyperparameter optimization
- Legal-only data pipeline with license tracking
- Chat interface (CLI + REST API)
- Conversation memory (SQLite)

Model Sizes:
- 125M, 350M, 1.3B, 3B parameters
- Local-first, runs on CPU or GPU
- Python 3.10.6+, PyTorch 2.0+

Personas:
- girlfriend_gentle (high warmth, high empathy)
- girlfriend_playful (high humor, high playfulness)
- girlfriend_supportive (balanced, default)

Documentation:
- Complete README with quickstart
- Model card with ethical considerations
- Privacy documentation (local-first, zero telemetry)
- Data licenses and attribution
- Contributing guide

Infrastructure:
- GitHub Actions CI/CD
- Comprehensive test suite
- Quickstart script
- CLI tool

License: Apache 2.0

🤖 Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-12 20:56:37 -04:00
commit a7f091aa45
50 changed files with 6437 additions and 0 deletions

docs/CONTRIBUTING.md

@@ -0,0 +1,227 @@
# Contributing to NOVA
Thank you for your interest in contributing to NOVA! This document provides guidelines for contributing.
---
## How to Contribute
### Reporting Issues
**Bug Reports:**
1. Check existing issues first
2. Use the bug report template
3. Include:
   - Python version
   - OS and hardware
   - Steps to reproduce
   - Expected vs actual behavior
   - Error messages/logs
**Feature Requests:**
1. Check if already proposed
2. Explain the use case
3. Describe the desired behavior
### Code Contributions
**Setup Development Environment:**
```bash
# Fork and clone
git clone https://github.com/yourusername/nova.git
cd nova
# Create venv
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dev dependencies
pip install -r requirements.txt
pip install -e ".[dev]"  # quoted so shells such as zsh do not expand the brackets
```
**Before Submitting:**
1. **Run Tests:**
```bash
pytest tests/ -v
```
2. **Lint Code:**
```bash
ruff check .
black --check .
```
3. **Format Code:**
```bash
black nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/
```
4. **Type Check (optional but recommended):**
```bash
mypy nova_core/ --ignore-missing-imports
```
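The four checks above can be chained into a single local pre-submit step. The following is a minimal sketch, not part of NOVA: the `CHECKS` table, the `run_checks` helper, and its `runner` parameter are hypothetical names, and it assumes `pytest`, `ruff`, `black`, and `mypy` are installed in the active environment. The commands themselves are copied from the steps above, with the type check treated as non-blocking because it is marked optional.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# (name, command, required) - commands are taken from the steps above.
# mypy is listed as optional there, so a mypy failure is reported but not blocking.
CHECKS: List[Tuple[str, List[str], bool]] = [
    ("tests", ["pytest", "tests/", "-v"], True),
    ("ruff", ["ruff", "check", "."], True),
    ("black", ["black", "--check", "."], True),
    ("mypy", ["mypy", "nova_core/", "--ignore-missing-imports"], False),
]


def _default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (127 if the tool is missing)."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        return 127


def run_checks(runner: Callable[[Sequence[str]], int] = _default_runner) -> List[str]:
    """Run every check in order and return the names of required checks that failed."""
    failed = []
    for name, cmd, required in CHECKS:
        code = runner(cmd)
        status = "ok" if code == 0 else "FAILED"
        print(f"[{status}] {name}: {' '.join(cmd)}")
        if code != 0 and required:
            failed.append(name)
    return failed
```

Calling `run_checks()` from a small script and exiting non-zero when the returned list is non-empty gives roughly the same signal CI would. The `runner` argument exists only so the helper can be exercised without invoking the real tools.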
### Pull Request Process
1. **Branch Naming:**
   - `feature/description` for new features
   - `fix/description` for bug fixes
   - `docs/description` for documentation
2. **Commit Messages:**
   - Clear, descriptive messages
   - Reference issues: `Fix #123: Description`
3. **PR Description:**
   - What changed
   - Why the change
   - Testing performed
   - Screenshots (if UI changes)
4. **Review Process:**
   - CI must pass
   - At least one approval required
   - Address review feedback
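
The branch and commit conventions above are simple enough to check mechanically before opening a PR. The sketch below is illustrative only and is not part of NOVA: `is_valid_branch` and `referenced_issue` are hypothetical helpers, and the set of characters allowed in the description part of a branch name is an assumption, since the guide does not specify one.

```python
import re
from typing import Optional

# Prefixes come from the branch naming rules above; the allowed characters
# after the slash are an assumption made for this sketch.
BRANCH_RE = re.compile(r"^(feature|fix|docs)/[A-Za-z0-9][A-Za-z0-9._-]*$")

# Matches a subject line that references an issue, e.g. "Fix #123: Description".
ISSUE_REF_RE = re.compile(r"^\w+ #(\d+): \S")


def is_valid_branch(name: str) -> bool:
    """Return True if the branch name uses one of the documented prefixes."""
    return BRANCH_RE.match(name) is not None


def referenced_issue(subject: str) -> Optional[int]:
    """Return the issue number referenced by a commit subject, or None."""
    match = ISSUE_REF_RE.match(subject)
    return int(match.group(1)) if match else None
```

For example, `is_valid_branch("feature/lora-support")` is true while `is_valid_branch("hotfix/foo")` is not, and `referenced_issue("Fix #123: Description")` yields `123`.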
---
## Development Guidelines
### Code Style
**Python:**
- Follow PEP 8
- Use Black formatter (line length 100)
- Type hints encouraged
- Docstrings for public APIs
**Example:**
```python
def example_function(param: str, optional: int = 0) -> bool:
    """
    Brief description.

    Args:
        param: Description
        optional: Description (default: 0)

    Returns:
        Description
    """
    return True
```
### Testing
**Write Tests For:**
- New features
- Bug fixes
- Public APIs
**Test Locations:**
- `tests/test_core.py` - Core transformer
- `tests/test_tokenizer.py` - Tokenizer
- `tests/test_persona.py` - Persona system
- `tests/test_<module>.py` - Other modules
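
As a sketch of what a test in one of these files might look like, the example below exercises the `example_function` placeholder from the Code Style section rather than any real NOVA API; the file name `tests/test_example.py` is hypothetical. Functions named `test_*` containing plain `assert` statements are all pytest needs in order to collect and run them.

```python
# tests/test_example.py (hypothetical file name, for illustration only)


def example_function(param: str, optional: int = 0) -> bool:
    """Stand-in for the code under test; a real test would import it instead."""
    return True


def test_returns_true_with_default():
    # Covers the call path that relies on the default value of `optional`.
    assert example_function("value") is True


def test_accepts_optional_argument():
    # Covers the call path where `optional` is passed explicitly.
    assert example_function("value", optional=3) is True
```

Keeping one behavior per test function makes a failure point directly at the case that broke.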
**Run Tests:**
```bash
# All tests
pytest
# Specific file
pytest tests/test_core.py
# With coverage
pytest --cov=nova_core
```
### Documentation
**Update Docs For:**
- API changes
- New features
- Configuration options
**Documentation Files:**
- `README.md` - Main documentation
- `docs/MODEL_CARD.md` - Model information
- `docs/PRIVACY_LOCAL.md` - Privacy details
- `docs/DATA_LICENSES.md` - Data licensing
---
## Contribution Areas
### High Priority
- **Pre-trained Models:** Training and releasing checkpoints
- **Export Tools:** GGUF converter, quantization improvements
- **Evaluation Suite:** Comprehensive benchmarks
- **Dataset Downloaders:** Legal dataset acquisition scripts
### Medium Priority
- **LoRA Support:** Fine-tuning with adapters
- **Multi-language:** Support for non-English
- **Performance:** Optimization improvements
- **Tests:** Increase coverage
### Documentation
- **Tutorials:** Step-by-step guides
- **Examples:** Real-world use cases
- **API Docs:** Complete API documentation
- **Architecture:** Deep-dive technical docs
---
## License
By contributing, you agree that your contributions will be licensed under the Apache License 2.0.
---
## Code of Conduct
### Our Pledge
- Be respectful and inclusive
- Welcome newcomers
- Focus on constructive feedback
- Assume good intentions
### Unacceptable Behavior
- Harassment or discrimination
- Trolling or insulting comments
- Publishing others' private information
- Other unprofessional conduct
### Enforcement
Violations can be reported to project maintainers. All complaints will be reviewed and investigated.
---
## Questions?
- **Discussions:** GitHub Discussions
- **Issues:** GitHub Issues
- **General:** Open an issue with the "question" label
---
## Recognition
Contributors will be:
- Listed in CONTRIBUTORS.md
- Mentioned in release notes
- Credited for significant features
---
Thank you for contributing to NOVA! 🌟