1 Commit

Author SHA1 Message Date
674c53651c feat: add Laplace-smoothed bigram model perplexity computation script
This commit introduces a new script that implements a Laplace-smoothed bigram language model for computing validation perplexity. The implementation includes the following (a rough sketch appears after the list):
- Data loading and splitting functionality (90/10 train/validation split)
- Character vocabulary building from training data only
- Bigram counting and Laplace smoothing with alpha=1.0
- Negative log-likelihood and perplexity computation
- Proper handling of out-of-vocabulary characters during evaluation
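The script itself is not reproduced here, but the model pieces listed above could fit together roughly as in the Python sketch below. Function names, the single <unk> stand-in for out-of-vocabulary characters, and the alpha default are illustrative assumptions rather than the actual implementation.

    import math
    from collections import defaultdict

    def build_vocab(train_text):
        # Vocabulary is built from the training split only, so validation
        # characters unseen in training are treated as out-of-vocabulary.
        return sorted(set(train_text))

    def count_bigrams(train_text):
        # counts[prev][curr] = times curr follows prev in the training data
        counts = defaultdict(lambda: defaultdict(int))
        for prev, curr in zip(train_text, train_text[1:]):
            counts[prev][curr] += 1
        return counts

    def bigram_log_prob(counts, vocab_size, prev, curr, alpha=1.0):
        # Laplace smoothing: add alpha to every count so unseen bigrams
        # still receive non-zero probability mass.
        row = counts.get(prev, {})
        numerator = row.get(curr, 0) + alpha
        denominator = sum(row.values()) + alpha * vocab_size
        return math.log(numerator / denominator)

    def perplexity(val_text, counts, vocab, alpha=1.0):
        vocab_set = set(vocab)
        # Reserve one extra slot for a single <unk> symbol that stands in
        # for every out-of-vocabulary character seen at evaluation time.
        vocab_size = len(vocab) + 1
        mapped = [c if c in vocab_set else "<unk>" for c in val_text]
        nll = 0.0
        for prev, curr in zip(mapped, mapped[1:]):
            nll -= bigram_log_prob(counts, vocab_size, prev, curr, alpha)
        n_bigrams = max(len(mapped) - 1, 1)
        return math.exp(nll / n_bigrams)

    if __name__ == "__main__":
        train = "hello world, hello bigrams"
        val = "hello there"
        vocab = build_vocab(train)
        counts = count_bigrams(train)
        print(f"validation perplexity: {perplexity(val, counts, vocab):.3f}")

With alpha=1.0, each conditional distribution spreads probability over the training vocabulary plus the <unk> slot, so the negative log-likelihood stays finite even for bigrams never observed in training.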

The script uses existing train.txt/val.txt files when present, or automatically splits a data.txt file (90/10) when they are missing, making it self-contained and easy to use for language model evaluation tasks.
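The commit does not show the file-handling code, but the described fallback behavior might look like this minimal sketch (the function name, directory argument, and character-level split point are assumptions):

    import os

    def load_or_split(data_dir=".", split_ratio=0.9):
        # Prefer existing train.txt/val.txt; otherwise split data.txt 90/10.
        train_path = os.path.join(data_dir, "train.txt")
        val_path = os.path.join(data_dir, "val.txt")
        if os.path.exists(train_path) and os.path.exists(val_path):
            with open(train_path, encoding="utf-8") as f:
                train = f.read()
            with open(val_path, encoding="utf-8") as f:
                val = f.read()
            return train, val
        with open(os.path.join(data_dir, "data.txt"), encoding="utf-8") as f:
            text = f.read()
        cut = int(len(text) * split_ratio)
        return text[:cut], text[cut:]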
2025-09-24 00:33:23 -04:00