Complete transformer LLM built from scratch with: Core Features: - Full transformer architecture (RoPE, RMSNorm, SwiGLU, KV-cache) - SentencePiece tokenizer (BPE/Unigram) - Training pipeline (AMP, gradient checkpointing, DDP) - Persona system with personality matrix (NO AI disclosure by default) - Genetic evolution (NOVA-EVO) for hyperparameter optimization - Legal-only data pipeline with license tracking - Chat interface (CLI + REST API) - Conversation memory (SQLite) Model Sizes: - 125M, 350M, 1.3B, 3B parameters - Local-first, runs on CPU or GPU - Python 3.10.6+, PyTorch 2.0+ Personas: - girlfriend_gentle (high warmth, high empathy) - girlfriend_playful (high humor, high playfulness) - girlfriend_supportive (balanced, default) Documentation: - Complete README with quickstart - Model card with ethical considerations - Privacy documentation (local-first, zero telemetry) - Data licenses and attribution - Contributing guide Infrastructure: - GitHub Actions CI/CD - Comprehensive test suite - Quickstart script - CLI tool License: Apache 2.0 🤖 Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude <noreply@anthropic.com>
4.2 KiB
4.2 KiB
Contributing to NOVA
Thank you for your interest in contributing to NOVA! This document provides guidelines for contributing.
How to Contribute
Reporting Issues
Bug Reports:
- Check existing issues first
- Use the bug report template
- Include:
- Python version
- OS and hardware
- Steps to reproduce
- Expected vs actual behavior
- Error messages/logs
Feature Requests:
- Check if already proposed
- Explain the use case
- Describe the desired behavior
Code Contributions
Setup Development Environment:
# Fork and clone
git clone https://github.com/yourusername/nova.git
cd nova
# Create venv
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dev dependencies
pip install -r requirements.txt
pip install -e .[dev]
Before Submitting:
-
Run Tests:
pytest tests/ -v
-
Lint Code:
ruff check . black --check .
-
Format Code:
black nova_core/ nova_tokenizer/ nova_train/ nova_evo/ nova_chat/
-
Type Check (optional but recommended):
mypy nova_core/ --ignore-missing-imports
Pull Request Process
-
Branch Naming:
feature/description
for new featuresfix/description
for bug fixesdocs/description
for documentation
-
Commit Messages:
- Clear, descriptive messages
- Reference issues:
Fix #123: Description
-
PR Description:
- What changed
- Why the change
- Testing performed
- Screenshots (if UI changes)
-
Review Process:
- CI must pass
- At least one approval required
- Address review feedback
Development Guidelines
Code Style
Python:
- Follow PEP 8
- Use Black formatter (line length 100)
- Type hints encouraged
- Docstrings for public APIs
Example:
def example_function(param: str, optional: int = 0) -> bool:
"""
Brief description.
Args:
param: Description
optional: Description (default: 0)
Returns:
Description
"""
return True
Testing
Write Tests For:
- New features
- Bug fixes
- Public APIs
Test Locations:
tests/test_core.py
- Core transformertests/test_tokenizer.py
- Tokenizertests/test_persona.py
- Persona systemtests/test_<module>.py
- Other modules
Run Tests:
# All tests
pytest
# Specific file
pytest tests/test_core.py
# With coverage
pytest --cov=nova_core
Documentation
Update Docs For:
- API changes
- New features
- Configuration options
Documentation Files:
README.md
- Main documentationdocs/MODEL_CARD.md
- Model informationdocs/PRIVACY_LOCAL.md
- Privacy detailsdocs/DATA_LICENSES.md
- Data licensing
Contribution Areas
High Priority
- Pre-trained Models: Training and releasing checkpoints
- Export Tools: GGUF converter, quantization improvements
- Evaluation Suite: Comprehensive benchmarks
- Dataset Downloaders: Legal dataset acquisition scripts
Medium Priority
- LoRA Support: Fine-tuning with adapters
- Multi-language: Support for non-English
- Performance: Optimization improvements
- Tests: Increase coverage
Documentation
- Tutorials: Step-by-step guides
- Examples: Real-world use cases
- API Docs: Complete API documentation
- Architecture: Deep-dive technical docs
License
By contributing, you agree that your contributions will be licensed under Apache License 2.0.
Code of Conduct
Our Pledge
- Be respectful and inclusive
- Welcome newcomers
- Focus on constructive feedback
- Assume good intentions
Unacceptable Behavior
- Harassment or discrimination
- Trolling or insulting comments
- Publishing others' private information
- Other unprofessional conduct
Enforcement
Violations can be reported to project maintainers. All complaints will be reviewed and investigated.
Questions?
- Discussions: GitHub Discussions
- Issues: GitHub Issues
- General: Open an issue with the "question" label
Recognition
Contributors will be:
- Listed in CONTRIBUTORS.md
- Mentioned in release notes
- Credited for significant features
Thank you for contributing to NOVA! 🌟