Creating the project.

prompt.md (new file, 83 lines)

# Prompt.md

Copy the prompt below exactly to replicate this course:

```
You are an expert Python instructor. Generate a complete, beginner-friendly course called
“ARIA — Zero-to-Tiny LLM (Python)” that takes a learner from “Hello World” to training a tiny
decoder-only, character-level LLM in ~17–18 single-file lessons. No safety/guardrail features;
assume a controlled learning environment.

=== Audience & Scope
- Audience: absolute beginners who have only written “Hello World”.
- Language: Python.
- Goal: build up to a tiny decoder-only LLM trained on a small corpus (e.g., Tiny Shakespeare).
- Keep each lesson runnable in a single .py file (≤ ~200 lines where feasible).

=== Output Format (for EACH lesson)
Use this exact section order:
1) Title
2) Duration (estimate)
3) Outcome (what they will accomplish)
4) Files to create (filenames)
5) Dependencies (Python stdlib / NumPy / PyTorch as specified)
6) Step-by-step Directions
7) Starter code (complete, runnable) with:
   - A clear module docstring that includes: what it does, how to run, and notes.
   - Function-level Google-style docstrings (Args/Returns/Raises) + at least one doctest where reasonable.
8) How to run (CLI commands)
9) What you learned (bullets)
10) Troubleshooting (common errors + fixes)
11) Mini-exercises (3–5 quick tasks)
12) What’s next (name the next lesson)

=== Curriculum (keep these names and order)
01) Read a Text File (with docstrings)
02) Character Frequency Counter
03) Train/Val Split
04) Char Vocabulary + Encode/Decode
05) Uniform Random Text Generator
06) Bigram Counts Language Model
07) Laplace Smoothing (compare w/ and w/o)
08) Temperature & Top-k Sampling
09) Perplexity on Validation
10) NumPy Softmax + Cross-Entropy (toy)
11) PyTorch Tensors 101
12) Autograd Mini-Lab (fit y=2x+3)
13) Char Bigram Neural LM (PyTorch)
14) Sampling Function (PyTorch)
15) Single-Head Self-Attention (causal mask)
16) Mini Transformer Block (pre-LN)
17) Tiny Decoder-Only Model (1–2 blocks)
18) (Optional) Save/Load & CLI Interface

=== Constraints & Defaults
- Dataset: do NOT auto-download. Expect a local `data.txt`. If missing, include a tiny built-in fallback sample so scripts still run.
- Encoding: UTF-8. Normalize newlines to "\n" for consistency.
- Seeds: demonstrate reproducibility (`random`, `numpy`, `torch`).
- Dependencies:
  * Stdlib only through Lesson 7;
  * NumPy in Lessons 8–10;
  * PyTorch from Lesson 11 onward.
- Training defaults (for Lessons 13+):
  * Batch size ~32, block size ~128, AdamW(lr=3e-4).
  * Brief note on early stopping when val loss plateaus.
- Inference defaults:
  * Start with greedy; then temperature=0.8, top-k=50.
- Keep code clean: type hints where helpful; no frameworks beyond NumPy/PyTorch; no external data loaders.

=== Lesson 1 Specifics
For Lesson 1, include:
- Module docstring with Usage example (`python 01_read_text.py`).
- Functions: `load_text(path: Optional[Path])`, `normalize_newlines(text: str)`,
  `make_preview(text: str, n_chars: int = 200)`, `report_stats(text: str)`, `main()`.
- At least one doctest per function where reasonable.
- Fallback text snippet if `data.txt` isn’t found.
- Output: total chars, unique chars, 200-char preview with literal "\n".

=== Delivery
- Start with a short “How to use this repo” preface and a file tree suggestion.
- Then render Lessons 01–18 in order, each with the exact section headings above.
- End with a short FAQ (Windows vs. macOS paths, UTF-8 issues, CPU vs. GPU notes).

Generate now.
```
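
## Illustrative sketches (not part of the prompt)

The Lesson 1 spec above fixes the file's function names (`load_text`, `normalize_newlines`, `make_preview`, `report_stats`, `main`) but not their bodies. The sketch below is one minimal reading of that spec; the fallback sample, preview formatting, and exact stats layout are assumptions rather than requirements of the prompt.

```python
"""Read a local text file and report basic character stats.

Usage:
    python 01_read_text.py

Reads `data.txt` from the working directory; falls back to a tiny
built-in sample if the file is missing, so the script always runs.
"""
from pathlib import Path
from typing import Optional

# Assumed fallback snippet: the prompt only requires that one exists.
FALLBACK = "To be, or not to be, that is the question.\n"


def load_text(path: Optional[Path]) -> str:
    """Return the UTF-8 contents of `path`, or the fallback sample.

    Args:
        path: File to read, or None to force the fallback.

    Returns:
        The file contents, or FALLBACK if the file is missing.
    """
    if path is not None and path.exists():
        return path.read_text(encoding="utf-8")
    return FALLBACK


def normalize_newlines(text: str) -> str:
    r"""Normalize Windows/old-Mac line endings to "\n".

    >>> normalize_newlines("a\r\nb\rc")
    'a\nb\nc'
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


def make_preview(text: str, n_chars: int = 200) -> str:
    r"""Return the first `n_chars` characters with newlines shown literally.

    >>> make_preview("ab\ncd", n_chars=4)
    'ab\\nc'
    """
    return text[:n_chars].replace("\n", "\\n")


def report_stats(text: str) -> None:
    """Print total characters, unique characters, and a short preview."""
    print(f"Total chars : {len(text)}")
    print(f"Unique chars: {len(set(text))}")
    print(f"Preview     : {make_preview(text)}")


def main() -> None:
    """Load `data.txt` (or the fallback), normalize newlines, report stats."""
    text = normalize_newlines(load_text(Path("data.txt")))
    report_stats(text)


if __name__ == "__main__":
    import doctest

    doctest.testmod()
    main()
```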
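
Lesson 08, Lesson 14, and the inference defaults (greedy first, then temperature=0.8, top-k=50) all hinge on one sampling routine. The NumPy sketch below shows that idea over a single logits vector; the function name `sample_next` and the toy numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded, matching the prompt's reproducibility requirement


def sample_next(logits: np.ndarray, temperature: float = 0.8, top_k: int = 50) -> int:
    """Sample one vocabulary index from `logits` with temperature and top-k filtering."""
    if temperature <= 0:                       # treat temperature 0 as greedy decoding
        return int(np.argmax(logits))
    scaled = logits / temperature              # <1 sharpens the distribution, >1 flattens it
    if top_k is not None and top_k < scaled.size:
        cutoff = np.sort(scaled)[-top_k]       # k-th largest scaled logit
        scaled = np.where(scaled < cutoff, -np.inf, scaled)  # mask everything below it
    probs = np.exp(scaled - scaled.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


# Toy next-char distribution over a 5-symbol vocabulary.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(sample_next(logits, temperature=0.8, top_k=3))
```

Greedy decoding is the zero-temperature limit; raising the temperature or the top-k widens the set of characters the sampler will actually emit.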
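
Lessons 09 and 10 meet at a single identity: validation perplexity is `exp` of the mean cross-entropy measured in nats. A toy NumPy check of that relationship (the logits and targets are made up):

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean negative log-likelihood (in nats) of the target indices."""
    probs = softmax(logits)
    picked = probs[np.arange(len(targets)), targets]   # probability assigned to each true char
    return float(-np.log(picked).mean())


# Three fake next-char predictions over a 4-symbol vocabulary.
logits = np.array([[2.0, 0.1, 0.1, 0.1],
                   [0.5, 1.5, 0.2, 0.2],
                   [0.1, 0.1, 0.1, 2.5]])
targets = np.array([0, 1, 3])

ce = cross_entropy(logits, targets)
print(f"cross-entropy = {ce:.3f} nats, perplexity = {np.exp(ce):.3f}")
```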
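
The training defaults (batch size ~32, block size ~128, AdamW with lr=3e-4) mostly determine how batches are cut from the encoded corpus and which optimizer is built. The PyTorch sketch below uses those numbers; `get_batch` and the stand-in corpus are illustrative, not mandated by the prompt.

```python
import torch

torch.manual_seed(0)                      # reproducibility, per the prompt's seed requirement
BATCH_SIZE, BLOCK_SIZE = 32, 128          # the prompt's training defaults


def get_batch(data: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Cut (inputs, targets) of shape (BATCH_SIZE, BLOCK_SIZE) from a 1-D encoded corpus."""
    ix = torch.randint(len(data) - BLOCK_SIZE - 1, (BATCH_SIZE,))
    x = torch.stack([data[i:i + BLOCK_SIZE] for i in ix])          # context windows
    y = torch.stack([data[i + 1:i + 1 + BLOCK_SIZE] for i in ix])  # same windows shifted by one
    return x, y


data = torch.randint(0, 65, (10_000,))    # stand-in for a char-encoded corpus
xb, yb = get_batch(data)
print(xb.shape, yb.shape)                 # torch.Size([32, 128]) torch.Size([32, 128])

# The optimizer default, shown on a throwaway parameter in place of a real model:
w = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([w], lr=3e-4)
```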