Roadmap: Vivi Speech Translator
Vision: Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.
Phases: 5 | Requirements Covered: 39/39 | Coverage: 100% ✓
Phase Overview
| Phase | Name | Goal | Requirements | Success Criteria |
|---|---|---|---|---|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate to text with auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 5 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 5 |
Phase Breakdown
Phase 1: Foundation (Weeks 1-2)
Goal: Reliably detect when Vivi is communicating via PluralKit and establish database infrastructure.
Why First:
- PluralKit detection is a load-bearing component (foundation for all translation)
- Database setup enables all future phases
- Demonstrates proof of concept that detection works reliably
Requirements Covered:
- DETECT-01: Bot detects Vivi via PluralKit webhook
- DETECT-02: Bot ignores non-Vivi messages (no false positives)
- DETECT-03: Bot works in channels and DMs
- DB-01: Global emoji dictionary persists
- GEN-01: Architecture supports multiple systems (not Vivi-only hardcoded)
Success Criteria:
- Bot reliably logs "Vivi detected" when she sends a test message with PluralKit webhook
- Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
- Database (SQLite for MVP) initializes on bot startup without errors
- Emoji dictionary table and server configuration table created and readable
Key Implementation Tasks:
- Set up discord.py client with correct intents (guilds, members, guild messages; message content where Discord approves it)
- Implement webhook detection (check message.webhook_id)
- Implement PluralKit API query to verify member_id matches Vivi's ID
- Initialize SQLite database with emoji_dictionary and server_config tables
- Test edge cases: Vivi's message edits, reactions, message deletes, DM context
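The database initialization task can be sketched as follows. This is a minimal illustration using the standard-library `sqlite3` module so it runs without extra dependencies; the roadmap's choice, aiosqlite, exposes a similar API behind `await`. The column names (`taught_by`, `updated_at`) and the helper names `init_db` and `list_tables` are assumptions, not part of the roadmap.

```python
import sqlite3
from typing import List

# Hypothetical schema: only the two table names come from the roadmap;
# the columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji      TEXT PRIMARY KEY,
    meaning    TEXT NOT NULL,
    taught_by  INTEGER,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS server_config (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 1
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create both tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the user table names, sorted, to confirm startup succeeded."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Because every statement uses `IF NOT EXISTS`, calling `init_db` on each bot startup is safe and leaves existing data untouched.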
Architecture Notes:
- Use aiosqlite for async database access (MVP)
- Cache PluralKit member list with 1-hour TTL to reduce API calls
- Use nacl.signing to verify PluralKit webhook signatures where possible
- Design message handler as a cog (modular for future systems)
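The 1-hour member-list cache mentioned above could look roughly like this. It is a sketch, not the bot's actual cache: the class name `TTLCache` and the injectable `clock` parameter are assumptions, added so expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        """Store value and record when it should expire."""
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop stale entry so it is refetched
            return None
        return value
```

A caller would check `cache.get(system_id)` first and only query the PluralKit API on a miss, which keeps request volume well below the rate limit during normal operation.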
Risks & Mitigations:
- Risk: PluralKit API rate limits during initialization (10/sec)
- Mitigation: Implement webhook dispatch instead of polling; cache member list
- Risk: Message content intent denied by Discord
- Mitigation: Design for slash commands as primary path; message content as optional enhancement
Dependencies: None (Phase 1 is the foundation)
Phase 2: Translation Engine (Weeks 3-4)
Goal: Parse emoji sequences from Vivi's messages and translate to text automatically.
Why Second:
- Builds on Phase 1's detection foundation (no detection = nothing to translate)
- Delivers core user value: Vivi posts, others understand
- Unblocks Phase 3 (teaching system requires working translation to be useful)
- Enables MVP validation with early users
Requirements Covered:
- TRANS-01: Parse standard emojis (😷, ❌, 2️⃣)
- TRANS-02: Parse custom Discord emojis (`:me1:`, etc.)
- TRANS-03: Translate and reply automatically (auto mode)
- TRANS-04: Read left-to-right composition (preserve emoji order)
- TRANS-05: Plain text responses only (no emoji-only translations)
- TRANS-06: Concise, clear responses (respects message length limits)
- DETECT-04: Reliably detect edits and reactions on Vivi messages
- UNK-01: Skip unknown emojis (only translate known ones)
- ERROR-01: React with ❌ on translation errors (graceful failure)
- CONFIG-02: Default mode is auto (bot translates every Vivi message)
- A11Y-01: Plain text accessible format (no heavy formatting)
- A11Y-04: Use slash commands (better keyboard navigation than buttons)
- DB-02: Emoji meanings shared across all Discord servers (one global dictionary)
Success Criteria:
- Bot translates the `:me1:😷 2️⃣` sequence to "Vivi is sick" and shows which emoji are unknown
- Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
- Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
- Unknown emojis are skipped with gentle prompt to teach ("Unknown emoji; run `/teach emoji meaning`")
- If database query fails, bot reacts with ❌ instead of crashing or spamming errors
Key Implementation Tasks:
- Use `emoji` library 2.11.0 for standard emoji parsing (Unicode 15.0 support)
- Regex pattern for custom Discord emoji detection (`:emoji_name:`)
- Dictionary lookup function (emoji → meaning) with O(1) query time
- Translation composition function (assemble meanings into natural language)
- Error handler that reacts with ❌ and logs error
- Message edit detection (track message ID, compare old vs new emoji)
- Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors
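For the custom-emoji regex task, one possible sketch is below. It assumes the bot reads raw message content, where Discord encodes a custom emoji as `<:name:id>` (or `<a:name:id>` when animated) rather than the `:name:` shorthand users type. The function name `find_custom_emoji` is hypothetical.

```python
import re
from typing import List

# Raw message content carries custom emoji as <:name:id> or <a:name:id>.
# Group 1 is the emoji name; the numeric id is matched but not captured.
CUSTOM_EMOJI = re.compile(r"<a?:([A-Za-z0-9_]+):\d+>")


def find_custom_emoji(content: str) -> List[str]:
    """Return custom emoji as ':name:' keys, in left-to-right order."""
    return [f":{match.group(1)}:" for match in CUSTOM_EMOJI.finditer(content)]
```

Returning `:name:` keys keeps dictionary lookups independent of the per-server emoji id, which fits the global dictionary described in DB-02. Standard Unicode emoji would still go through the `emoji` library rather than this pattern.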
Architecture Notes:
- Translation engine should be stateless (all state in database)
- Compose natural language by concatenating meanings when simple, or identifying patterns (subject + descriptor + descriptor)
- Handle emoji variants correctly (normalize input with NFD if needed)
- Cache frequently-translated emoji in memory (optional for Phase 2; may add in Phase 4)
- Response format: "Vivi says: [translation]" for clarity
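The simplest composition strategy described above, concatenating known meanings in order and skipping unknown emoji, might be sketched like this. The function name `compose_translation` and the sample meanings are illustrative assumptions; pattern-based composition is not shown.

```python
from typing import Dict, List, Tuple


def compose_translation(
    tokens: List[str], dictionary: Dict[str, str]
) -> Tuple[str, List[str]]:
    """Join known meanings left to right; collect unknown emoji separately.

    Returns (reply_text, unknown_tokens). reply_text is empty when no
    token has a known meaning, so the caller can stay silent.
    """
    meanings: List[str] = []
    unknown: List[str] = []
    for token in tokens:
        meaning = dictionary.get(token)
        if meaning is None:
            if token not in unknown:  # report each unknown emoji once
                unknown.append(token)
        else:
            meanings.append(meaning)
    if not meanings:
        return "", unknown
    return "Vivi says: " + " ".join(meanings), unknown
```

The function holds no state of its own: the dictionary is passed in, so the same code works whether meanings come from the database or a cache.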
Risks & Mitigations:
- Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
- Mitigation: Comprehensive test suite with Unicode 15.0 samples; use emoji library instead of manual regex
- Risk: Bot floods channel with translations (rate limiting)
- Mitigation: Track requests per guild; implement cooldown if needed (Phase 4+)
- Risk: Message edits race condition (bot editing response while Vivi edits message)
- Mitigation: Post new translation instead of editing; queue requests with small delay if race detected
Dependencies: Phase 1 (detection + database)
Phase 3: Teaching System (Weeks 5-6)
Goal: Enable users to teach the bot new emoji meanings via simple commands.
Why Third:
- Requires working translation from Phase 2 (teaching emoji that don't translate yet is the whole point)
- Enables the learning system (core differentiator: bot becomes more useful as community teaches it)
- Makes bot valuable to Vivi's community: they drive growth, not predefined dictionary
- Addresses dysgraphia accessibility: teaching interface must be ultra-simple
Requirements Covered:
- TEACH-01: `/teach emoji "meaning"` command
- TEACH-02: Bot confirms what it learned (shows emoji and meaning)
- TEACH-03: Meanings stored globally (shared across all servers)
- TEACH-04: `/meaning emoji` query command
- TEACH-05: `/correct emoji "new meaning"` update command
- TEACH-06: Audit trail (who, what, when for all changes)
- TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
- UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
- UNK-03: Teaching prompt is accessible (simple command: `/teach emoji "meaning"`)
- A11Y-02: Simple command syntax (one-liner, no complex structure)
- A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
- A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
- DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
- GEN-03: Modular code for other systems (teaching system not Vivi-specific)
Success Criteria:
- User runs `/teach 🎭 "happy performance"`, bot confirms "Taught: 🎭 = happy performance" with emoji shown
- Next time Vivi uses 🎭, bot translates it correctly in the translation engine (cache or DB lookup)
- User runs `/meaning 🎭`, bot replies with current meaning (or "Unknown emoji")
- User runs `/correct 🎭 "joyful"`, meaning updates and audit trail records who changed it and when
- `/teach` command fails gracefully if emoji is invalid, meaning is empty, or meaning is too long (with helpful error message)
Key Implementation Tasks:
- Discord.py slash command handler for `/teach emoji "meaning"`
- Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
- Database insert into emoji_dictionary with user_id and timestamp
- `/meaning emoji` query command with error handling
- `/correct emoji "new meaning"` update command with audit trail
- Confirmation messages that show emoji visually (e.g., "🎭" not "PERFORMING_ARTS")
- Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
- Permission checks: log who taught what, but don't restrict based on role (accessibility)
- Test that emojis can be taught, corrected, queried in rapid succession
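The length check from the input-validation task could be sketched as below. Only the 1-200 character limit comes from the roadmap; the function name `validate_meaning`, the whitespace collapsing, and the error wording are assumptions. Emoji validity and punctuation checks are left out.

```python
MAX_MEANING_LENGTH = 200  # upper bound stated in the validation task


def validate_meaning(raw: str) -> str:
    """Normalize a taught meaning and enforce the 1-200 character rule.

    Raises ValueError with a short, user-facing message on failure.
    """
    meaning = " ".join(raw.split())  # trim and collapse runs of whitespace
    if not meaning:
        raise ValueError("Meaning is empty. Try: /teach emoji \"meaning\"")
    if len(meaning) > MAX_MEANING_LENGTH:
        raise ValueError(
            f"Meaning is too long ({len(meaning)} characters; "
            f"limit is {MAX_MEANING_LENGTH})."
        )
    return meaning
```

A command handler would catch the `ValueError` and send its message as the reply, which keeps error responses to a single short sentence.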
Architecture Notes:
- Teaching commands should be accessible to all users (not admin-only; logged for accountability)
- Confirmation should always show the emoji visually (immediate visual feedback)
- Responses must be short (<2 sentences) for dysgraphia accessibility
- Audit trail enables future `/undo` feature (deferred to v2)
- Global dictionary shared across servers; per-server overrides deferred to Phase 4+
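One way to keep the dictionary and the audit trail in step is to write both in a single transaction. The sketch below uses the standard-library `sqlite3` module for brevity. The table name `emoji_audit`, the use of the emoji text itself in place of `emoji_id`, and the function name `set_meaning` are assumptions, and the dictionary table is reduced to two columns.

```python
import sqlite3

# Simplified, hypothetical schema for illustration only.
AUDIT_SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji   TEXT PRIMARY KEY,
    meaning TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS emoji_audit (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    emoji       TEXT NOT NULL,
    old_meaning TEXT,
    new_meaning TEXT NOT NULL,
    user_id     INTEGER NOT NULL,
    action_type TEXT NOT NULL,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def set_meaning(
    conn: sqlite3.Connection, emoji: str, meaning: str, user_id: int
) -> str:
    """Teach or correct a meaning and record the change in the audit table.

    Returns "teach" for a new emoji and "correct" for an update.
    """
    with conn:  # commits on success, rolls back if any statement fails
        row = conn.execute(
            "SELECT meaning FROM emoji_dictionary WHERE emoji = ?", (emoji,)
        ).fetchone()
        old_meaning = row[0] if row else None
        action = "correct" if row else "teach"
        conn.execute(
            "INSERT OR REPLACE INTO emoji_dictionary (emoji, meaning) "
            "VALUES (?, ?)",
            (emoji, meaning),
        )
        conn.execute(
            "INSERT INTO emoji_audit "
            "(emoji, old_meaning, new_meaning, user_id, action_type) "
            "VALUES (?, ?, ?, ?, ?)",
            (emoji, old_meaning, meaning, user_id, action),
        )
    return action
```

Storing `old_meaning` on every change is what would make a later `/undo` possible: reverting is just another `set_meaning` call with the previous value.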
Risks & Mitigations:
- Risk: Users teach wrong/inappropriate meanings (spam, trolling)
- Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
- Risk: Teaching interface too complex for Vivi (with dysgraphia)
- Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
- Risk: Conflicting emoji meanings across servers
- Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides
Dependencies: Phase 2 (translation must work so teaching has value)
Phase 4: Configuration & Scaling (Week 7)
Goal: Add per-server settings and prepare for scaling.
Why Fourth:
- Builds on Phases 1-3 foundation (all core features working)
- Enables server customization without breaking global emoji dictionary
- Prepares architecture for multi-server scaling and future systems
- Adds operator controls (config commands)
Requirements Covered:
- CONFIG-01: `/config auto-translate on|off` toggle per server
- CONFIG-03: Per-server persistence (setting survives bot restart)
- CONFIG-04: Admin-only changes (permission check on `/config` command)
- DB-03: Per-server config table
- GEN-02: Architecture supports per-system overrides (design for future multi-system)
Success Criteria:
- Server admin runs `/config auto-translate off`, future Vivi messages don't auto-translate (bot is silent by default)
- Another server has auto enabled; both work independently (no crosstalk)
- Setting persists across bot restart (database query returns correct value)
- Non-admin user runs `/config auto-translate on`, bot rejects with "Admin only" message
- Default for new servers is auto mode enabled (true by default in code)
Key Implementation Tasks:
- `/config auto-translate <on|off>` slash command
- Permission check (admin-only; use discord.py's default_member_permissions)
- server_config table update/insert (guild_id as PK, auto_translate as boolean)
- Modify message handler to check per-guild setting before auto-translating
- Implement on-demand mode placeholder (manual translation via `/translate` command; full reaction-based mode deferred to v2)
- Cache per-server settings for performance (1-minute TTL)
- Test: change setting, verify immediate effect; verify effect persists after restart
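The per-guild toggle with an auto-enabled default could be sketched as follows, again with the standard-library `sqlite3` module for a dependency-free example. The table and column names follow the task list above; the function names are hypothetical.

```python
import sqlite3


def open_config_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and ensure the server_config table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS server_config ("
        "guild_id INTEGER PRIMARY KEY, "
        "auto_translate INTEGER NOT NULL DEFAULT 1)"
    )
    conn.commit()
    return conn


def set_auto_translate(
    conn: sqlite3.Connection, guild_id: int, enabled: bool
) -> None:
    """Insert or overwrite the setting for one guild."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO server_config (guild_id, auto_translate) "
            "VALUES (?, ?)",
            (guild_id, int(enabled)),
        )


def is_auto_translate(conn: sqlite3.Connection, guild_id: int) -> bool:
    """Return the guild's setting; guilds with no row default to auto mode."""
    row = conn.execute(
        "SELECT auto_translate FROM server_config WHERE guild_id = ?",
        (guild_id,),
    ).fetchone()
    return True if row is None else bool(row[0])
```

Treating a missing row as "auto on" means new servers need no setup step, and one guild's change never touches another guild's row.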
Architecture Notes:
- On-demand mode: bot only replies if explicitly requested (stored in DB but not fully implemented in v1)
- Per-server config indexed by guild_id for O(1) lookup
- Cache server settings in memory with TTL to avoid DB hammering
- Design allows per-system overrides in future (deferred to v2)
Risks & Mitigations:
- Risk: Per-server override conflicts if not careful
- Mitigation: v2 design allows per-system overrides; v1 focuses on global dictionary with server-level auto/on-demand toggle
- Risk: Config command is confusing to admins
- Mitigation: Clear help text; only two options (on/off); simple feedback message
Dependencies: Phases 1-3 (all features working; config customizes them)
Phase 5: Production Polish (Week 8+)
Goal: Production hardening, logging, error handling, and monitoring.
Why Last:
- Follows all feature phases (features must be complete before hardening)
- Improves reliability and debuggability (enables diagnosis of issues in production)
- Prepares for public adoption by other systems or larger communities
Requirements Covered:
- ERROR-02: Bot logging for debugging (structured JSON logs)
- ERROR-03: Exponential backoff for failed DB operations and API calls
Success Criteria:
- Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
- Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
- Bot retries failed database operations (same backoff strategy; max 5 attempts)
- Unhandled exceptions caught and logged (no spam in user channels; clear error reaction)
- All error logs include context (guild_id, user_id, emoji attempted, operation type)
Key Implementation Tasks:
- Set up Python logging with JSON format (structlog or custom JSON formatter)
- Implement retry logic with asyncio.sleep backoff for PluralKit API calls
- Implement retry logic with backoff for database queries
- Add global exception handler (on_error in discord.py)
- Comprehensive error documentation (what errors mean, how to diagnose)
- Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
- Optional: Sentry integration for error tracking (recommended but not required for v1)
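If the custom-formatter route is chosen over structlog, the JSON logging task might start from a sketch like this. The class name `JsonFormatter` and the convention of passing a `context` dict through `extra=` are assumptions, and the duration field is omitted.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # "context" is an assumed convention: callers attach it with
            # logger.error("...", extra={"context": {...}}).
            "context": getattr(record, "context", {}),
        }
        # ensure_ascii=False keeps emoji readable in the log file.
        return json.dumps(payload, ensure_ascii=False)
```

Attaching this formatter to a `logging.handlers.RotatingFileHandler` from the standard library would cover the rotation requirement without extra dependencies.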
Architecture Notes:
- Async error handling must not block the event loop
- Retry logic should use exponential backoff with jitter (avoid thundering herd)
- Logs should include PluralKit request context (duration, status code, member_id) for debugging
- Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps
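Exponential backoff with jitter, as described in the retry notes above, could be sketched like this. The function name `retry_with_backoff`, its parameters, and the 10% jitter range are assumptions; the 1s, 2s, 4s, 8s schedule and the 5-attempt cap come from the success criteria.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await operation(), retrying with exponential backoff plus jitter.

    Waits about 1s, 2s, 4s, 8s between the five default attempts and
    re-raises the last error once attempts are exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:  # a real bot would catch narrower error types
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter spreads out simultaneous retries.
            await sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Because the wait uses an awaitable sleep rather than `time.sleep`, other Discord events continue to be processed while a retry is pending.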
Risks & Mitigations:
- Risk: Logging overhead impacts performance
- Mitigation: Async logging; batch writes; exclude noisy operations (every emoji lookup)
- Risk: Backoff strategy causes noticeable delays when DB is down
- Mitigation: Set reasonable max wait (8s); fail fast if max attempts exceeded; user sees ❌ reaction quickly
Dependencies: Phases 1-4 (all features working; Phase 5 hardens them)
Dependency Chain
Phase 1: Foundation (Discord client, PluralKit detection, database)
↓
Phase 2: Translation (emoji parsing, lookups, auto-translation)
↓
Phase 3: Teaching (commands to add/update meanings; audit trail)
↓
Phase 4: Configuration (per-server auto/on-demand toggle)
↓
Phase 5: Polish (logging, retry logic, production hardening)
Critical Path: Phase 1 → Phase 2 → Phase 3 (core value)
Optional Path: Phase 4 (customization), Phase 5 (hardening)
Requirement Coverage Summary
Total v1 Requirements: 39 | Mapped to Phases: 39 | Unmapped: 0 ✓
Coverage by Category:
- Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
- Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
- Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
- Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
- Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
- Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
- Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
- Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
- Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4
Key Decisions & Rationale
| Decision | Rationale | Phase |
|---|---|---|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. Bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs on-demand); can customize without breaking shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings learned explicitly; avoids false positives and keeps system transparent | Out of Scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |
Key Metrics & Success Indicators
Per Phase:
- Phase 1: >99% detection accuracy, zero false positives in testing
- Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
- Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
- Phase 4: Settings persist across restart, multi-server support verified
- Phase 5: <0.1% error rate, all failures logged and alertable
Overall:
- Time to MVP (Phases 1-2): 4 weeks
- Time to v1 (Phases 1-5): 8+ weeks
- Requirements per phase: 2-14 (Phases 2 and 3 carry the most)
- Value delivered: Incremental (each phase adds core functionality)
Known Limitations & v2+ Backlog
Intentionally v2+:
- Per-server emoji overrides (global dictionary only in v1)
- Reaction-based on-demand translation (slash command placeholder only)
- Analytics dashboard (`/stats`, `/emoji-list`)
- Moderation UI (flag/approve/reject meanings)
- Multi-language emoji meanings
- Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)
Implementation Notes
Tech Stack:
- discord.py 2.6.4 (async-first, native slash commands)
- SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
- aiosqlite (async database access)
- emoji 2.11.0 (Unicode 15.0 support)
- pydantic 2.5.0 (data validation)
- Railway Cloud for hosting (free tier for MVP)
Code Organization:
- `bot.py` - Main discord.py client, event loop
- `cogs/detection.py` - Message event handler, PluralKit detection
- `cogs/translation.py` - Emoji parsing, dictionary lookup, composition
- `cogs/teaching.py` - `/teach`, `/meaning`, `/correct` commands
- `cogs/config.py` - `/config` command, per-server settings
- `database.py` - SQLAlchemy ORM, async queries
- `logging.py` - Structured JSON logging, retry logic
Testing Strategy:
- Unit tests: emoji parsing, dictionary lookups, composition
- Integration tests: PluralKit detection with mock webhooks
- Accessibility tests: response length, emoji names for screen readers
- Load tests: 100+ servers, 1000+ emoji, response times
Roadmap created: 2025-01-29
Ready for Phase 1 planning
Depth: Quick (5 phases, natural boundaries)