Creating the project.

2025-09-23 11:54:02 -04:00
commit 272172e87c
3 changed files with 367 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,65 @@
# ARIA — Zero-to-Tiny LLM (Python)
**ARIA** is a beginner-friendly, step-by-step course that takes you from **“Hello World”** to training a **tiny decoder-only, character-level LLM** in Python. Each lesson is a single, runnable file with clear docstrings, doctests where helpful, and minimal dependencies.
> **Note:** This repository's instructional content was **generated with the assistance of an AI language model**.
---
## What you'll build
- A progression of tiny language models:
  - Count-based bigram model → NumPy softmax toy → PyTorch bigram NN
  - Single-head self-attention → Mini Transformer block
- A tiny decoder-only model trained on a small corpus (e.g., Tiny Shakespeare)
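
As a rough illustration of the first step in that progression, here is a minimal sketch of a count-based character bigram model using only the standard library. The function names `train_bigram_counts` and `generate` are hypothetical and are not taken from the course files; the actual lessons may structure this differently.

```python
import random
from collections import Counter, defaultdict


def train_bigram_counts(text: str) -> dict[str, Counter]:
    """Count how often each character follows each other character."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts


def generate(counts: dict[str, Counter], start: str, length: int, seed: int = 0) -> str:
    """Sample up to `length` characters, each drawn in proportion to its bigram count."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:  # the last character was never followed by anything
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


if __name__ == "__main__":
    model = train_bigram_counts("hello world, hello there")
    print(generate(model, "h", 20))
```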
---
## Who this is for
- Beginners who can run `python script.py` and have written a basic “Hello World”.
- Learners who want a **clear path** to an LLM without heavy math or large datasets.
---
## Course outline (lessons)
1. Read a Text File (with docstrings)
2. Character Frequency Counter
3. Train/Val Split
4. Char Vocabulary + Encode/Decode
5. Uniform Random Text Generator
6. Bigram Counts Language Model
7. Laplace Smoothing (compare w/ and w/o)
8. Temperature & Top-k Sampling
9. Perplexity on Validation
10. NumPy Softmax + Cross-Entropy (toy)
11. PyTorch Tensors 101
12. Autograd Mini-Lab (fit *y = 2x + 3*)
13. Char Bigram Neural LM (PyTorch)
14. Sampling Function (PyTorch)
15. Single-Head Self-Attention (causal mask)
16. Mini Transformer Block (pre-LN)
17. Tiny Decoder-Only Model (1-2 blocks)
18. *(Optional)* Save/Load & CLI Interface

Each lesson includes: **Outcome, Files, Dependencies, Directions, Starter Code with docstrings + doctests, Run, What you learned, Troubleshooting, Mini-exercises, Next lesson.**

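
To give a feel for the "starter code with docstrings + doctests" format, the sketch below shows what a single-file lesson could look like, using Lesson 4 (character vocabulary with encode/decode) as the example. The file name `lesson04_vocab.py` and the function names are assumptions made for illustration, not the course's actual code.

```python
"""Lesson 4 (sketch): character vocabulary with encode/decode.

Run the doctests with:  python -m doctest -v lesson04_vocab.py
(the file name is hypothetical)
"""


def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Map each unique character to an integer id, and each id back to its character.

    >>> stoi, itos = build_vocab("banana")
    >>> stoi
    {'a': 0, 'b': 1, 'n': 2}
    >>> itos[2]
    'n'
    """
    chars = sorted(set(text))  # sorted so ids are stable across runs
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text: str, stoi: dict[str, int]) -> list[int]:
    """Turn a string into a list of integer ids.

    >>> stoi, _ = build_vocab("banana")
    >>> encode("nab", stoi)
    [2, 0, 1]
    """
    return [stoi[ch] for ch in text]


def decode(ids: list[int], itos: dict[int, str]) -> str:
    """Turn a list of integer ids back into a string.

    >>> stoi, itos = build_vocab("banana")
    >>> decode(encode("banana", stoi), itos)
    'banana'
    """
    return "".join(itos[i] for i in ids)


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```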
---
## Requirements
- **Python**: 3.10+
- **OS**: Windows/macOS/Linux (UTF-8 locale recommended)
- **Dependencies**:
  - Stdlib only until Lesson 9
  - **NumPy** for Lessons 8-10
  - **PyTorch** (CPU is fine) from Lesson 11 onward
- **Hardware**: CPU is enough for all lessons; tiny models, short runs

Install the common dependencies when you reach the lessons that need them:
```bash
pip install numpy torch --upgrade
```
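
Before starting a lesson, it can help to confirm that the interpreter and optional packages match the requirements above. The following is a small stdlib-only sketch; the helper name `check_environment` is hypothetical and not part of the repository.

```python
import importlib.util
import sys


def check_environment(packages: tuple[str, ...] = ("numpy", "torch")) -> dict[str, bool]:
    """Report whether Python is 3.10+ and which optional packages can be imported."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

A "missing" entry for `numpy` or `torch` is expected in the early, stdlib-only lessons.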