mystiatech/Aria

Commit Graph

Author	SHA1	Message	Date
Dani	feecf05ee3	feat: add vocabulary encoding/decoding script with character-level tokenization This commit introduces a new script that implements character-level vocabulary building and text encoding/decoding functionality. The script loads text from train.txt or falls back to data.txt, normalizes line endings, builds character-to-id mappings, and includes round-trip encoding/decoding validation. It's designed for CPU-only operation using only Python standard library modules and provides clear error handling for unseen characters during encoding.	2025-09-23 20:57:48 -04:00

1 Commit
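
The commit message above describes the script's behavior but the page does not show its source. The following is a minimal sketch of what such a character-level vocabulary script could look like, using only the Python standard library. The function names (`load_text`, `build_vocab`, `encode`, `decode`), the UTF-8 encoding, and the sorted id assignment are assumptions made for illustration; only the file names `train.txt` and `data.txt` come from the commit message.

```python
from pathlib import Path


def load_text(directory="."):
    """Read train.txt if it exists, otherwise data.txt (hypothetical helper)."""
    base = Path(directory)
    for name in ("train.txt", "data.txt"):
        path = base / name
        if path.is_file():
            # Read bytes so Python's own newline translation does not hide
            # the normalization step; UTF-8 is an assumption.
            raw = path.read_bytes().decode("utf-8")
            # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
            return raw.replace("\r\n", "\n").replace("\r", "\n")
    raise FileNotFoundError(f"neither train.txt nor data.txt found in {base}")


def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))  # sorted so ids are deterministic
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Turn text into a list of ids; fail clearly on unseen characters."""
    ids = []
    for pos, ch in enumerate(text):
        if ch not in stoi:
            raise ValueError(f"unseen character {ch!r} at position {pos}")
        ids.append(stoi[ch])
    return ids


def decode(ids, itos):
    """Turn a list of ids back into text."""
    return "".join(itos[i] for i in ids)


if __name__ == "__main__":
    # In-memory sample so the sketch runs without any data files.
    sample = "hello\r\nworld".replace("\r\n", "\n")
    stoi, itos = build_vocab(sample)
    ids = encode(sample, stoi)
    # Round-trip validation: decoding the ids must reproduce the input.
    assert decode(ids, itos) == sample
    print(f"vocab size: {len(stoi)}, encoded length: {len(ids)}")
```

Raising `ValueError` with the offending character and its position is one way to provide the "clear error handling for unseen characters" the message mentions; the actual script may report the error differently.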