# Prompt.md
Copy the prompt below exactly to replicate this course:

```
You are an expert Python instructor. Generate a complete, beginner-friendly course called
“ARIA — Zero-to-Tiny LLM (Python)” that takes a learner from “Hello World” to training a tiny
decoder-only, character-level LLM in ~17–18 single-file lessons. No safety/guardrail features;
assume a controlled learning environment.

=== Audience & Scope
- Audience: absolute beginners who have only written “Hello World”.
- Language: Python.
- Goal: build up to a tiny decoder-only LLM trained on a small corpus (e.g., Tiny Shakespeare).
- Keep each lesson runnable in a single .py file (≤ ~200 lines where feasible).

=== Output Format (for EACH lesson)
Use this exact section order:
1) Title
2) Duration (estimate)
3) Outcome (what they will accomplish)
4) Files to create (filenames)
5) Dependencies (Python stdlib / NumPy / PyTorch as specified)
6) Step-by-step Directions
7) Starter code (complete, runnable) with:
- A clear module docstring that includes: what it does, how to run, and notes.
- Function-level Google-style docstrings (Args/Returns/Raises) + at least one doctest where reasonable.
8) How to run (CLI commands)
9) What you learned (bullets)
10) Troubleshooting (common errors + fixes)
11) Mini-exercises (3–5 quick tasks)
12) What’s next (name the next lesson)

=== Curriculum (keep these names and order)
01) Read a Text File (with docstrings)
02) Character Frequency Counter
03) Train/Val Split
04) Char Vocabulary + Encode/Decode
05) Uniform Random Text Generator
06) Bigram Counts Language Model
07) Laplace Smoothing (compare w/ and w/o)
08) Temperature & Top-k Sampling
09) Perplexity on Validation
10) NumPy Softmax + Cross-Entropy (toy)
11) PyTorch Tensors 101
12) Autograd Mini-Lab (fit y=2x+3)
13) Char Bigram Neural LM (PyTorch)
14) Sampling Function (PyTorch)
15) Single-Head Self-Attention (causal mask)
16) Mini Transformer Block (pre-LN)
17) Tiny Decoder-Only Model (1–2 blocks)
18) (Optional) Save/Load & CLI Interface

=== Constraints & Defaults
- Dataset: do NOT auto-download. Expect a local `data.txt`. If missing, include a tiny built-in fallback sample so scripts still run.
- Encoding: UTF-8. Normalize newlines to "\n" for consistency.
- Seeds: demonstrate reproducibility (`random`, `numpy`, `torch`).
- Dependencies:
* Stdlib only through Lesson 7;
* NumPy in Lessons 8–10;
* PyTorch from Lesson 11 onward.
- Training defaults (for Lessons 13+):
* Batch size ~32, block size ~128, AdamW(lr=3e-4).
* Brief note on early stopping when val loss plateaus.
- Inference defaults:
* Start with greedy; then temperature=0.8, top-k=50.
- Keep code clean: type hints where helpful; no frameworks beyond NumPy/PyTorch; no external data loaders.

=== Lesson 1 Specifics
For Lesson 1, include:
- Module docstring with Usage example (`python 01_read_text.py`).
- Functions: `load_text(path: Optional[Path])`, `normalize_newlines(text: str)`,
`make_preview(text: str, n_chars: int = 200)`, `report_stats(text: str)`, `main()`.
- At least one doctest per function where reasonable.
- Fallback text snippet if `data.txt` isn’t found.
- Output: total chars, unique chars, 200-char preview with literal "\n".

=== Delivery
- Start with a short “How to use this repo” preface and a file tree suggestion.
- Then render Lessons 01–18 in order, each with the exact section headings above.
- End with a short FAQ (Windows vs. macOS paths, UTF-8 issues, CPU vs. GPU notes).

Generate now.
```
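
The sketches below are illustrative only; they sit outside the prompt and should not be copied into it. Each is a rough, hedged preview of one piece the generated course is expected to contain.

The reproducibility requirement (seeding `random`, `numpy`, and `torch`) might be demonstrated with a tiny helper like this; the name `set_seed` and the default seed are assumptions, not something the prompt mandates:

```python
# Illustrative sketch: the prompt asks the lessons to demonstrate
# reproducible seeding of `random`, `numpy`, and `torch`. The helper
# name and default seed here are assumptions, not part of the prompt.
import random

import numpy as np
import torch


def set_seed(seed: int = 1337) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


if __name__ == "__main__":
    set_seed()
    # The same three numbers should print on every run with the same seed.
    print(random.random(), np.random.rand(), torch.rand(1).item())
```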
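
The training defaults for Lessons 13+ (batch size ~32, block size ~128, AdamW at lr=3e-4) translate into setup code roughly like the following; the stand-in `Embedding` "model" and random token ids are placeholders so the snippet runs on its own, not the course's actual architecture:

```python
# Illustrative sketch of the prompt's training defaults for Lessons 13+.
import torch
import torch.nn.functional as F

BATCH_SIZE = 32    # ~32 sequences per optimization step
BLOCK_SIZE = 128   # ~128-character context window
VOCAB_SIZE = 65    # placeholder character-vocabulary size

model = torch.nn.Embedding(VOCAB_SIZE, VOCAB_SIZE)          # stand-in bigram-style model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # AdamW(lr=3e-4) per the defaults

# One optimization step on random token ids, just to show the shapes involved.
xb = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, BLOCK_SIZE))
yb = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, BLOCK_SIZE))
logits = model(xb)                                          # (B, T, VOCAB_SIZE)
loss = F.cross_entropy(logits.view(-1, VOCAB_SIZE), yb.view(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.3f}")
```

In the real lessons this step would live inside a loop that periodically checks validation loss, which is where the early-stopping note belongs.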
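
Likewise, the inference defaults (greedy first, then temperature=0.8 with top-k=50) imply a sampling step along these lines; the function and argument names are assumptions made for the sketch:

```python
# Illustrative temperature + top-k sampling over a single logits vector,
# matching the prompt's inference defaults (temperature=0.8, top_k=50).
import torch


def sample_next_token(logits: torch.Tensor,
                      temperature: float = 0.8,
                      top_k: int = 50) -> int:
    """Sample one token id from a 1-D tensor of unnormalized logits."""
    if temperature <= 0:                       # treat 0 as "be greedy"
        return int(torch.argmax(logits).item())
    scaled = logits / temperature              # lower temperature -> sharper distribution
    k = min(top_k, scaled.numel())
    top_vals, top_idx = torch.topk(scaled, k)  # keep only the k best-scoring tokens
    probs = torch.softmax(top_vals, dim=-1)    # renormalize over the kept tokens
    choice = torch.multinomial(probs, num_samples=1)
    return int(top_idx[choice].item())


if __name__ == "__main__":
    torch.manual_seed(0)
    fake_logits = torch.randn(65)              # pretend 65-character vocabulary
    print(sample_next_token(fake_logits))
```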
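
Finally, a minimal `01_read_text.py` covering the Lesson 1 specifics could take roughly this shape; the fallback text, docstring wording, and printed labels are illustrative choices, not requirements from the prompt:

```python
"""Read a local text file and report simple character statistics.

Usage:
    python 01_read_text.py

Looks for ``data.txt`` in the current directory and falls back to a tiny
built-in sample if the file is missing, so the script always runs.
(Sketch of the requested lesson, not the canonical solution.)
"""
from pathlib import Path
from typing import Optional

FALLBACK_TEXT = "To be, or not to be, that is the question.\n"


def normalize_newlines(text: str) -> str:
    r"""Convert Windows/old-Mac line endings to "\n".

    >>> normalize_newlines("a\r\nb\rc")
    'a\nb\nc'
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


def load_text(path: Optional[Path] = None) -> str:
    """Load UTF-8 text from *path*, or return the built-in fallback sample."""
    path = path or Path("data.txt")
    if path.exists():
        return normalize_newlines(path.read_text(encoding="utf-8"))
    return FALLBACK_TEXT


def make_preview(text: str, n_chars: int = 200) -> str:
    """Return the first *n_chars* characters with newlines shown as literal \\n."""
    return text[:n_chars].replace("\n", "\\n")


def report_stats(text: str) -> None:
    """Print total characters, unique characters, and a short preview."""
    print(f"total chars : {len(text)}")
    print(f"unique chars: {len(set(text))}")
    print(f"preview     : {make_preview(text)}")


def main() -> None:
    """Entry point: load the text (or fallback) and report its stats."""
    report_stats(load_text())


if __name__ == "__main__":
    main()
```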