# Roadmap: Vivi Speech Translator

**Vision:** Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.

**Phases:** 5 | **Requirements Covered:** 39/39 | **Coverage:** 100% ✓

---

## Phase Overview

| Phase | Name | Goal | Requirements | Success Criteria |
|-------|------|------|--------------|------------------|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate to text with auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 5 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 5 |

---

## Phase Breakdown

### Phase 1: Foundation (Weeks 1-2)

**Goal:** Reliably detect when Vivi is communicating via PluralKit and establish database infrastructure.

**Why First:**
- PluralKit detection is a load-bearing component (foundation for all translation)
- Database setup enables all future phases
- Demonstrates proof of concept that detection works reliably

**Requirements Covered:**
- DETECT-01: Bot detects Vivi via PluralKit webhook
- DETECT-02: Bot ignores non-Vivi messages (no false positives)
- DETECT-03: Bot works in channels and DMs
- DB-01: Global emoji dictionary persists
- GEN-01: Architecture supports multiple systems (not Vivi-only hardcoded)

**Success Criteria:**
1. Bot reliably logs "Vivi detected" when she sends a test message through the PluralKit webhook
2. Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
3. Database (SQLite for MVP) initializes on bot startup without errors
4. Emoji dictionary table and server configuration table are created and readable

**Key Implementation Tasks:**
- Set up discord.py client with correct intents (GUILDS, MEMBERS, GUILD_MESSAGES, MESSAGE_CONTENT)
- Implement webhook detection (check message.webhook_id)
- Implement PluralKit API query to verify member_id matches Vivi's ID
- Initialize SQLite database with emoji_dictionary and server_config tables
- Test edge cases: Vivi's message edits, reactions, message deletes, DM context

**Architecture Notes:**
- Use aiosqlite for async database access (MVP)
- Cache PluralKit member list with 1-hour TTL to reduce API calls
- Use nacl.signing to verify PluralKit webhook signatures where possible
- Design message handler as a cog (modular for future systems)

**Risks & Mitigations:**
- Risk: PluralKit API rate limits during initialization (10/sec)
  - Mitigation: Implement webhook dispatch instead of polling; cache member list
- Risk: Message content intent denied by Discord
  - Mitigation: Design for slash commands as primary path; message content as optional enhancement

**Dependencies:** None (Phase 1 is the foundation)

---

### Phase 2: Translation Engine (Weeks 3-4)

**Goal:** Parse emoji sequences from Vivi's messages and translate to text automatically.

**Why Second:**
- Builds on Phase 1's detection foundation (no detection = nothing to translate)
- Delivers core user value: Vivi posts, others understand
- Unblocks Phase 3 (teaching system requires working translation to be useful)
- Enables MVP validation with early users

**Requirements Covered:**
- TRANS-01: Parse standard emojis (😷, ❌, 2️⃣)
- TRANS-02: Parse custom Discord emojis (`:me1:`, etc.)
- TRANS-03: Translate and reply automatically (auto mode)
- TRANS-04: Read left-to-right composition (preserve emoji order)
- TRANS-05: Plain text responses only (no emoji-only translations)
- TRANS-06: Concise, clear responses (respects message length limits)
- DETECT-04: Reliably detect edits and reactions on Vivi messages
- UNK-01: Skip unknown emojis (only translate known ones)
- ERROR-01: React with ❌ on translation errors (graceful failure)
- CONFIG-02: Default mode is auto (bot translates every Vivi message)
- A11Y-01: Plain text accessible format (no heavy formatting)
- A11Y-04: Use slash commands (better keyboard navigation than buttons)
- DB-02: Emoji meanings shared across all Discord servers (one global dictionary)

**Success Criteria:**
1. Bot translates `:me1:😷 2️⃣` sequence to "Vivi is sick" and shows which emoji are unknown
2. Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
3. Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
4. Unknown emojis are skipped with gentle prompt to teach ("Unknown emoji; run `/teach emoji meaning`")
5. If database query fails, bot reacts with ❌ instead of crashing or spamming errors

**Key Implementation Tasks:**
- Use `emoji` library 2.11.0 for standard emoji parsing (Unicode 15.0 support)
- Regex pattern for custom Discord emoji detection (`:emoji_name:`, which appears in raw message content as `<:emoji_name:id>`)
- Dictionary lookup function (emoji → meaning) with O(1) query time
- Translation composition function (assemble meanings into natural language)
- Error handler that reacts with ❌ and logs error
- Message edit detection (track message ID, compare old vs new emoji)
- Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors

**Architecture Notes:**
- Translation engine should be stateless (all state in database)
- Compose natural language by concatenating meanings when simple, or identifying patterns (subject + descriptor + descriptor)
- Handle emoji variants correctly (normalize input with NFD if needed)
- Cache frequently-translated emoji in memory (optional for Phase 2; may add in Phase 4)
- Response format: "Vivi says: [translation]" for clarity

**Risks & Mitigations:**
- Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
  - Mitigation: Comprehensive test suite with Unicode 15.0 samples; use emoji library instead of manual regex
- Risk: Bot floods channel with translations (rate limiting)
  - Mitigation: Track requests per guild; implement cooldown if needed (Phase 4+)
- Risk: Message edits race condition (bot editing response while Vivi edits message)
  - Mitigation: Post new translation instead of editing; queue requests with small delay if race detected

**Dependencies:** Phase 1 (detection + database)

---

### Phase 3: Teaching System (Weeks 5-6)

**Goal:** Enable users to teach the bot new emoji meanings via simple commands.
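
To make this goal concrete, the sketch below shows one way the storage side of teaching could work: validate the meaning, write it to the shared dictionary, and append an audit row. It is a minimal illustration under stated assumptions, not the planned implementation. It uses the standard-library `sqlite3` module with an in-memory database instead of aiosqlite so that it runs on its own, and the function names (`init_db`, `teach`), the use of the emoji text itself as the key, and the exact column layout are hypothetical. The 1-200 character limit and the audit fields follow this phase's task list.

```python
import sqlite3
from datetime import datetime, timezone

MAX_MEANING_LENGTH = 200  # roadmap validation rule: meaning is 1-200 characters


def init_db(conn: sqlite3.Connection) -> None:
    """Create the dictionary and audit tables (column layout is an assumption)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS emoji_dictionary (
            emoji   TEXT PRIMARY KEY,
            meaning TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS audit_trail (
            emoji       TEXT NOT NULL,
            old_meaning TEXT,
            new_meaning TEXT NOT NULL,
            user_id     INTEGER NOT NULL,
            timestamp   TEXT NOT NULL,
            action_type TEXT NOT NULL
        );
        """
    )


def teach(conn: sqlite3.Connection, emoji: str, meaning: str, user_id: int) -> str:
    """Store a meaning, record who changed it, and return a short confirmation."""
    meaning = meaning.strip()
    if not 1 <= len(meaning) <= MAX_MEANING_LENGTH:
        raise ValueError(f"Meaning must be 1-{MAX_MEANING_LENGTH} characters.")

    # Look up any existing meaning so the audit row can keep the old value.
    row = conn.execute(
        "SELECT meaning FROM emoji_dictionary WHERE emoji = ?", (emoji,)
    ).fetchone()
    old_meaning = row[0] if row else None
    action_type = "correct" if row else "teach"
    timestamp = datetime.now(timezone.utc).isoformat()

    # Both writes share one transaction: either both land or neither does.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO emoji_dictionary (emoji, meaning) VALUES (?, ?)",
            (emoji, meaning),
        )
        conn.execute(
            "INSERT INTO audit_trail "
            "(emoji, old_meaning, new_meaning, user_id, timestamp, action_type) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (emoji, old_meaning, meaning, user_id, timestamp, action_type),
        )
    return f"Taught: {emoji} = {meaning}"
```

With a connection from `sqlite3.connect(":memory:")` and `init_db(conn)`, calling `teach(conn, "🎭", "happy performance", 1)` returns the confirmation text used in this phase's success criteria. A second call for the same emoji is recorded as a correction, with the previous meaning kept in the audit row.
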

**Why Third:**
- Requires working translation from Phase 2 (teaching emoji that don't translate yet is the whole point)
- Enables the learning system (core differentiator: bot becomes more useful as community teaches it)
- Makes bot valuable to Vivi's community: they drive growth, not predefined dictionary
- Addresses dysgraphia accessibility: teaching interface must be ultra-simple

**Requirements Covered:**
- TEACH-01: `/teach emoji "meaning"` command
- TEACH-02: Bot confirms what it learned (shows emoji and meaning)
- TEACH-03: Meanings stored globally (shared across all servers)
- TEACH-04: `/meaning emoji` query command
- TEACH-05: `/correct emoji "new meaning"` update command
- TEACH-06: Audit trail (who, what, when for all changes)
- TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
- UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
- UNK-03: Teaching prompt is accessible (simple command: `/teach emoji "meaning"`)
- A11Y-02: Simple command syntax (one-liner, no complex structure)
- A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
- A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
- DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
- GEN-03: Modular code for other systems (teaching system not Vivi-specific)

**Success Criteria:**
1. User runs `/teach 🎭 "happy performance"`, bot confirms "Taught: 🎭 = happy performance" with emoji shown
2. Next time Vivi uses 🎭, bot translates it correctly in the translation engine (cache or DB lookup)
3. User runs `/meaning 🎭`, bot replies with current meaning (or "Unknown emoji")
4. User runs `/correct 🎭 "joyful"`, meaning updates and audit trail records who changed it and when
5. `/teach` command fails gracefully if emoji is invalid, meaning is empty, or meaning is too long (with helpful error message)

**Key Implementation Tasks:**
- discord.py slash command handler for `/teach emoji "meaning"`
- Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
- Database insert into emoji_dictionary with user_id and timestamp
- `/meaning emoji` query command with error handling
- `/correct emoji "new meaning"` update command with audit trail
- Confirmation messages that show emoji visually (e.g., "🎭" not "PERFORMING_ARTS")
- Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
- Permission checks: log who taught what, but don't restrict based on role (accessibility)
- Test that emojis can be taught, corrected, queried in rapid succession

**Architecture Notes:**
- Teaching commands should be accessible to all users (not admin-only; logged for accountability)
- Confirmation should always show the emoji visually (immediate visual feedback)
- Responses must be short (<2 sentences) for dysgraphia accessibility
- Audit trail enables future `/undo` feature (deferred to v2)
- Global dictionary shared across servers; per-server overrides deferred to Phase 4+

**Risks & Mitigations:**
- Risk: Users teach wrong/inappropriate meanings (spam, trolling)
  - Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
- Risk: Teaching interface too complex for Vivi (with dysgraphia)
  - Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
- Risk: Conflicting emoji meanings across servers
  - Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides

**Dependencies:** Phase 2 (translation must work so teaching has value)

---

### Phase 4: Configuration & Scaling (Week 7)

**Goal:** Add per-server settings and prepare for scaling.
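
One piece of this goal, reading a per-server auto-translate toggle without querying the database on every message, could look roughly like the sketch below. It is an illustration under assumptions rather than the planned design: `GuildSettingsCache`, `load_setting`, and `invalidate` are hypothetical names, and the loader callable stands in for a `server_config` query. The 60-second TTL and the default of auto mode enabled follow the plan stated in this roadmap.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_AUTO_TRANSLATE = True  # roadmap default: new servers start in auto mode
CACHE_TTL_SECONDS = 60.0       # roadmap plan: cache per-server settings for 1 minute


class GuildSettingsCache:
    """In-memory cache of the per-guild auto-translate flag with a TTL."""

    def __init__(
        self,
        load_setting: Callable[[int], Optional[bool]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # load_setting returns the stored flag, or None if the guild has no row.
        self._load = load_setting
        self._clock = clock
        self._entries: Dict[int, Tuple[bool, float]] = {}

    def auto_translate(self, guild_id: int) -> bool:
        """Return the flag, hitting the loader only when the entry is stale."""
        now = self._clock()
        cached = self._entries.get(guild_id)
        if cached is not None and now - cached[1] < CACHE_TTL_SECONDS:
            return cached[0]
        stored = self._load(guild_id)
        value = DEFAULT_AUTO_TRANSLATE if stored is None else stored
        self._entries[guild_id] = (value, now)
        return value

    def invalidate(self, guild_id: int) -> None:
        """Drop a cached entry so the next read reflects a just-saved change."""
        self._entries.pop(guild_id, None)
```

A `/config` handler would write the new value to the database and then call `invalidate(guild_id)`, so the change takes effect immediately instead of after the TTL expires. Injecting the clock keeps the expiry logic testable without sleeping.
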

**Why Fourth:**
- Builds on Phases 1-3 foundation (all core features working)
- Enables server customization without breaking global emoji dictionary
- Prepares architecture for multi-server scaling and future systems
- Adds operator controls (config commands)

**Requirements Covered:**
- CONFIG-01: `/config auto-translate on|off` toggle per server
- CONFIG-03: Per-server persistence (setting survives bot restart)
- CONFIG-04: Admin-only changes (permission check on `/config` command)
- DB-03: Per-server config table
- GEN-02: Architecture supports per-system overrides (design for future multi-system)

**Success Criteria:**
1. Server admin runs `/config auto-translate off`, future Vivi messages don't auto-translate (bot stays silent unless explicitly asked)
2. Another server has auto enabled; both work independently (no crosstalk)
3. Setting persists across bot restart (database query returns correct value)
4. Non-admin user runs `/config auto-translate on`, bot rejects with "Admin only" message
5. Default for new servers is auto mode enabled (true by default in code)

**Key Implementation Tasks:**
- `/config auto-translate on|off` slash command
- Permission check (admin-only; use discord.py's default_member_permissions)
- server_config table update/insert (guild_id as PK, auto_translate as boolean)
- Modify message handler to check per-guild setting before auto-translating
- Implement on-demand mode placeholder (manual translation via `/translate` command; full reaction-based mode deferred to v2)
- Cache per-server settings for performance (1-minute TTL)
- Test: change setting, verify immediate effect; verify effect persists after restart

**Architecture Notes:**
- On-demand mode: bot only replies if explicitly requested (stored in DB but not fully implemented in v1)
- Per-server config indexed by guild_id for O(1) lookup
- Cache server settings in memory with TTL to avoid DB hammering
- Design allows per-system overrides in future (deferred to v2)

**Risks & Mitigations:**
- Risk: Per-server override conflicts if not careful
  - Mitigation: v2 design allows per-system overrides; v1 focuses on global dictionary with server-level auto/on-demand toggle
- Risk: Config command is confusing to admins
  - Mitigation: Clear help text; only two options (on/off); simple feedback message

**Dependencies:** Phases 1-3 (all features working; config customizes them)

---

### Phase 5: Production Polish (Week 8+)

**Goal:** Production hardening, logging, error handling, and monitoring.

**Why Last:**
- Follows all feature phases (features must be complete before hardening)
- Improves reliability and debuggability (enables diagnosis of issues in production)
- Prepares for public adoption by other systems or larger communities

**Requirements Covered:**
- ERROR-02: Bot logging for debugging (structured JSON logs)
- ERROR-03: Exponential backoff for failed DB operations and API calls

**Success Criteria:**
1. Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
2. Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
3. Bot retries failed database operations (same backoff strategy; max 5 attempts)
4. Unhandled exceptions caught and logged (no spam in user channels; clear error reaction)
5. All error logs include context (guild_id, user_id, emoji attempted, operation type)

**Key Implementation Tasks:**
- Set up Python logging with JSON format (structlog or custom JSON formatter)
- Implement retry logic with asyncio.sleep backoff for PluralKit API calls
- Implement retry logic with backoff for database queries
- Add global exception handler (on_error in discord.py)
- Comprehensive error documentation (what errors mean, how to diagnose)
- Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
- Optional: Sentry integration for error tracking (recommended but not required for v1)

**Architecture Notes:**
- Async error handling must not block the event loop
- Retry logic should use exponential backoff with jitter (avoid thundering herd)
- Logs should include PluralKit request context (duration, status code, member_id) for debugging
- Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps

**Risks & Mitigations:**
- Risk: Logging overhead impacts performance
  - Mitigation: Async logging; batch writes; exclude noisy operations (every emoji lookup)
- Risk: Backoff strategy causes noticeable delays when DB is down
  - Mitigation: Set reasonable max wait (8s); fail fast if max attempts exceeded; user sees ❌ reaction quickly

**Dependencies:** Phases 1-4 (all features working; Phase 5 hardens them)

---

## Dependency Chain

```
Phase 1: Foundation (Discord client, PluralKit detection, database)
    ↓
Phase 2: Translation (emoji parsing, lookups, auto-translation)
    ↓
Phase 3: Teaching (commands to add/update meanings; audit trail)
    ↓
Phase 4: Configuration (per-server auto/on-demand toggle)
    ↓
Phase 5: Polish (logging, retry logic, production hardening)
```

**Critical Path:** Phase 1 → Phase 2 → Phase 3 (core value)

**Optional Path:** Phase 4 (customization), Phase 5 (hardening)

---

## Requirement Coverage Summary

**Total v1 Requirements:** 39
**Mapped to Phases:** 39
**Unmapped:** 0 ✓

**Coverage by Category:**
- Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
- Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
- Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
- Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
- Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
- Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
- Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
- Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
- Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4

---

## Key Decisions & Rationale

| Decision | Rationale | Phase |
|----------|-----------|-------|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. Bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs on-demand); can customize without breaking shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings learned explicitly; avoids false positives and keeps system transparent | Out of Scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |

---

## Key Metrics & Success Indicators

**Per Phase:**
- Phase 1: >99% detection accuracy, zero false positives in testing
- Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
- Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
- Phase 4: Settings persist across restart, multi-server support verified
- Phase 5: <0.1% error rate, all failures logged and alertable

**Overall:**
- Time to MVP (Phases 1-2): 3-4 weeks
- Time to v1 (Phases 1-5): 8+ weeks
- Requirements per phase: 2-14 (largest in Phases 2 and 3)
- Value delivered: Incremental (each phase adds core functionality)

---

## Known Limitations & v2+ Backlog

**Intentionally v2+:**
- Per-server emoji overrides (global dictionary only in v1)
- Reaction-based on-demand translation (slash command placeholder only)
- Analytics dashboard (`/stats`, `/emoji-list`)
- Moderation UI (flag/approve/reject meanings)
- Multi-language emoji meanings
- Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)

---

## Implementation Notes

**Tech Stack:**
- discord.py 2.6.4 (async-first, native slash commands)
- SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
- aiosqlite (async database access)
- emoji 2.11.0 (Unicode 15.0 support)
- pydantic 2.5.0 (data validation)
- Railway Cloud for hosting (free tier for MVP)

**Code Organization:**
- `bot.py` - Main discord.py client, event loop
- `cogs/detection.py` - Message event handler, PluralKit detection
- `cogs/translation.py` - Emoji parsing, dictionary lookup, composition
- `cogs/teaching.py` - `/teach`, `/meaning`, `/correct` commands
- `cogs/config.py` - `/config` command, per-server settings
- `database.py` - SQLAlchemy ORM, async queries
- `logging.py` - Structured JSON logging, retry logic

**Testing Strategy:**
- Unit tests: emoji parsing, dictionary lookups, composition
- Integration tests: PluralKit detection with mock webhooks
- Accessibility tests: response length, emoji names for screen readers
- Load tests: 100+ servers, 1000+ emoji, response times

---

**Roadmap created: 2025-01-29**
**Ready for Phase 1 planning**
**Depth: Quick (5 phases, natural boundaries)**