feat: Add train/val split script and update gitignore

- Added 03_train_val_split.py to create deterministic train/validation splits from data.txt or fallback text
- Uncommented the .vscode/ entry in .gitignore so the whole directory is ignored
- Broadened the data.txt ignore pattern to *.txt, which also covers the generated train.txt and val.txt
- Script handles UTF-8 text loading with newline normalization and writes train.txt/val.txt files
- Includes doctest examples and proper type hints
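
The script body is not shown in this view. As a rough illustration of the behaviour listed above, here is a minimal sketch of a deterministic split with UTF-8 loading and newline normalization. All names (`FALLBACK_TEXT`, `normalize_newlines`, `load_text`, `split_text`, `write_splits`), the 10% default validation fraction, and the choice to split at a fixed character offset are assumptions for illustration, not taken from 03_train_val_split.py.

```python
from pathlib import Path

# Hypothetical fallback corpus used when data.txt is absent (assumption).
FALLBACK_TEXT = "The quick brown fox jumps over the lazy dog.\n" * 20


def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def load_text(path: Path, fallback: str = FALLBACK_TEXT) -> str:
    """Read UTF-8 text from path, or use the fallback if the file is missing."""
    if path.exists():
        # Decode raw bytes so newline handling is explicit, not platform-dependent.
        raw = path.read_bytes().decode("utf-8")
    else:
        raw = fallback
    return normalize_newlines(raw)


def split_text(text: str, val_fraction: float = 0.1) -> tuple[str, str]:
    """Split text at a fixed offset: no shuffling, so the result is deterministic.

    >>> split_text("abcdefghij", 0.2)
    ('abcdefgh', 'ij')
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    n_val = round(len(text) * val_fraction)
    cut = len(text) - n_val
    return text[:cut], text[cut:]


def write_splits(train: str, val: str, out_dir: Path) -> None:
    """Write train.txt and val.txt as UTF-8 with LF line endings."""
    for name, content in (("train.txt", train), ("val.txt", val)):
        with open(out_dir / name, "w", encoding="utf-8", newline="\n") as fh:
            fh.write(content)


if __name__ == "__main__":
    corpus = load_text(Path("data.txt"))
    train_part, val_part = split_text(corpus)
    write_splits(train_part, val_part, Path("."))
```

Splitting at a fixed offset keeps the result identical across runs without needing a random seed. A shuffled split with a fixed seed would be another way to get determinism.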
2025-09-23 20:28:22 -04:00
parent a82efe6ea2
commit abba60a798
2 changed files with 81 additions and 2 deletions

.gitignore (vendored): 4 changed lines

@@ -199,7 +199,7 @@ cython_debug/
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
-# .vscode/
+.vscode/
# Ruff stuff:
.ruff_cache/
@@ -216,4 +216,4 @@ __marimo__/
.streamlit/secrets.toml
# Data/Material that should not be synced
-data.txt
+*.txt