Rosie/scripts
Dani 10ccdc2420 feat: add training data collection for Rosie
Personality Dataset (300+ examples):
- Greetings and farewells
- Emotions and reactions
- Physical interactions (pats, drags, touches)
- Questions and answers
- Help and support
- Jokes and entertainment
- Mood-based responses
- Conversation fillers
- Various user intents

Data Download Script:
- Download Project Gutenberg books (public domain)
- Instructions for OpenWebText (~8B tokens)
- Instructions for The Pile (~300B tokens)
- Automatic dataset combination
- Token counting and statistics
- Download progress bars
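
A minimal sketch of how the dataset combination and token statistics steps might work. This is not the actual code in scripts/download_training_data.py: the function names, the {"text": ...} record shape, and the whitespace-based token count are all assumptions made for illustration.

```python
import json
from pathlib import Path


def count_tokens(text):
    # Assumption: whitespace splitting is only a rough proxy for a real tokenizer.
    return len(text.split())


def combine_datasets(sources):
    """Merge named lists of {"text": ...} records, dropping empty and duplicate texts."""
    combined, seen = [], set()
    for name, records in sources.items():
        for record in records:
            text = record.get("text", "").strip()
            if text and text not in seen:
                seen.add(text)
                combined.append({"source": name, "text": text})
    return combined


def dataset_stats(records):
    """Return example count, total tokens, and the longest example in tokens."""
    counts = [count_tokens(r["text"]) for r in records]
    return {
        "examples": len(counts),
        "tokens": sum(counts),
        "max_tokens": max(counts, default=0),
    }


def save_combined(records, path):
    """Write the combined records to a JSON file, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
```

Under these assumptions, the combined records could be written with save_combined(records, "data/combined_training.json"), the path passed to train_rosie.py below. Deduplicating on exact text is a deliberately simple choice; near-duplicate filtering would need a different approach.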

Ready to train:
1. Run: python scripts/download_training_data.py --all
2. Download additional datasets (OpenWebText, The Pile) as needed
3. Run: python train_rosie.py --data_path data/combined_training.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 23:44:36 -04:00