1 Commit

Author SHA1 Message Date
a82efe6ea2 add character frequency counter script with newline normalization and sorted output
This commit introduces a new script (02_char_freq.py) that analyzes character frequencies in text corpora. The script loads text from train.txt or data.txt (with fallback), normalizes all line endings to LF, counts character occurrences, sorts them by frequency (descending) with ties broken by character (ascending), then prints a formatted ASCII table of the 50 most frequent characters. Newlines are displayed as literal "\\n" sequences in the output. The script includes type hints and docstrings with doctests, and it handles missing input files by using a built-in fallback text.
2025-09-23 20:08:30 -04:00
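
The pipeline described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the contents of 02_char_freq.py: the function names, the FALLBACK_TEXT value, the table layout, and the assumption that train.txt is tried before data.txt are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path

# Hypothetical built-in text used when neither input file exists.
FALLBACK_TEXT = "hello world\nhello there\n"


def load_text(candidates: tuple[str, ...] = ("train.txt", "data.txt")) -> str:
    """Return the contents of the first existing candidate file, else the fallback."""
    for name in candidates:
        path = Path(name)
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return FALLBACK_TEXT


def normalize_newlines(text: str) -> str:
    """Convert CRLF and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def char_frequencies(text: str, top_n: int = 50) -> list[tuple[str, int]]:
    """Count characters; sort by count descending, then character ascending."""
    counts = Counter(normalize_newlines(text))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def format_table(rows: list[tuple[str, int]]) -> str:
    """Render rows as a simple ASCII table, showing newline as a literal \\n."""
    lines = ["char | count", "-----+------"]
    for ch, count in rows:
        label = "\\n" if ch == "\n" else ch
        lines.append(f"{label:<4} | {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(char_frequencies(load_text())))
```

Negating the count in the sort key gives descending frequency and ascending character order in a single pass, which keeps the output deterministic when several characters share a count.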