1 Commits

Author SHA1 Message Date
538a44247b feat: add uniform random text generator with reproducible sampling
This commit introduces a new script that generates random text by uniformly sampling characters from the training data's vocabulary. It loads text from train.txt or falls back to data.txt, normalizes line endings, builds a sorted character vocabulary, and samples characters using a fixed RNG seed for reproducibility. The implementation includes command-line arguments for specifying generation length and random seed, making it configurable while maintaining consistent output for the same inputs.
2025-09-23 21:18:57 -04:00