Roadmap: Vivi Speech Translator

Vision: Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.

Phases: 5 | Requirements Covered: 33/33 | Coverage: 100% ✓


Phase Overview

| Phase | Name | Goal | Requirements | Success Criteria |
|-------|------|------|--------------|------------------|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up the database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate to text with auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 4 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 3 |

Phase Breakdown

Phase 1: Foundation (Weeks 1-2)

Goal: Reliably detect when Vivi is communicating via PluralKit and establish database infrastructure.

Why First:

  • PluralKit detection is a load-bearing component (foundation for all translation)
  • Database setup enables all future phases
  • Demonstrates proof of concept that detection works reliably

Requirements Covered:

  • DETECT-01: Bot detects Vivi via PluralKit webhook
  • DETECT-02: Bot ignores non-Vivi messages (no false positives)
  • DETECT-03: Bot works in channels and DMs
  • DB-01: Global emoji dictionary persists
  • GEN-01: Architecture supports multiple systems (not Vivi-only hardcoded)

Success Criteria:

  1. Bot reliably logs "Vivi detected" when she sends a test message via a PluralKit webhook
  2. Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
  3. Database (SQLite for MVP) initializes on bot startup without errors
  4. Emoji dictionary table and server configuration table created and readable

Key Implementation Tasks:

  • Set up the discord.py client with the required intents (guilds, members, and guild_messages)
  • Implement webhook detection (check message.webhook_id)
  • Implement a PluralKit API lookup to verify the proxied message's member ID matches Vivi's (see the detection sketch after this list)
  • Initialize SQLite database with emoji_dictionary and server_config tables
  • Test edge cases: Vivi's message edits, reactions, message deletes, DM context
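
A minimal sketch of this detection path, assuming PluralKit's v2 message-lookup endpoint; VIVI_MEMBER_ID is a hypothetical placeholder, and the real cog would add member-list caching and error handling:

```python
# Sketch: decide whether a webhook message was proxied by Vivi via PluralKit.
# VIVI_MEMBER_ID is a hypothetical placeholder; the endpoint shape is assumed.
import aiohttp
import discord

PLURALKIT_API = "https://api.pluralkit.me/v2"
VIVI_MEMBER_ID = "abcde"  # hypothetical PluralKit member ID


async def is_vivi_message(message: discord.Message, session: aiohttp.ClientSession) -> bool:
    # PluralKit proxies members' messages through webhooks; anything else is ignored.
    if message.webhook_id is None:
        return False
    # Ask PluralKit which system member (if any) proxied this message.
    async with session.get(f"{PLURALKIT_API}/messages/{message.id}") as resp:
        if resp.status != 200:  # not a PluralKit-proxied message, or API error
            return False
        data = await resp.json()
    member = data.get("member") or {}
    return member.get("id") == VIVI_MEMBER_ID
```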

Architecture Notes:

  • Use aiosqlite for async database access (MVP); a schema sketch follows this list
  • Cache PluralKit member list with 1-hour TTL to reduce API calls
  • Use nacl.signing to verify PluralKit webhook signatures where possible
  • Design message handler as a cog (modular for future systems)
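
A minimal schema sketch with aiosqlite; the table and column names are illustrative and expected to evolve during Phase 1:

```python
# Sketch: initialize the MVP SQLite schema. Names are illustrative, not final.
import aiosqlite

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji      TEXT PRIMARY KEY,   -- Unicode emoji or custom emoji identifier
    meaning    TEXT NOT NULL,
    taught_by  TEXT,               -- Discord user ID of the teacher
    taught_at  TEXT                -- ISO-8601 timestamp
);
CREATE TABLE IF NOT EXISTS server_config (
    guild_id        TEXT PRIMARY KEY,
    auto_translate  INTEGER NOT NULL DEFAULT 1   -- 1 = auto mode (CONFIG-02 default)
);
"""


async def init_db(path: str = "vivi.db") -> None:
    async with aiosqlite.connect(path) as db:
        await db.executescript(SCHEMA)
        await db.commit()
```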

Risks & Mitigations:

  • Risk: PluralKit API rate limits during initialization (10/sec)
    • Mitigation: Implement webhook dispatch instead of polling; cache member list
  • Risk: Message content intent denied by Discord
    • Mitigation: Design for slash commands as primary path; message content as optional enhancement

Dependencies: None (Phase 1 is the foundation)


Phase 2: Translation Engine (Weeks 3-4)

Goal: Parse emoji sequences from Vivi's messages and translate to text automatically.

Why Second:

  • Builds on Phase 1's detection foundation (no detection = nothing to translate)
  • Delivers core user value: Vivi posts, others understand
  • Unblocks Phase 3 (teaching system requires working translation to be useful)
  • Enables MVP validation with early users

Requirements Covered:

  • TRANS-01: Parse standard Unicode emojis (e.g., 😷)
  • TRANS-02: Parse custom Discord emojis (:me1:, etc.)
  • TRANS-03: Translate and reply automatically (auto mode)
  • TRANS-04: Read left-to-right composition (preserve emoji order)
  • TRANS-05: Plain text responses only (no emoji-only translations)
  • TRANS-06: Concise, clear responses (respects message length limits)
  • DETECT-04: Reliably detect edits and reactions on Vivi's messages
  • UNK-01: Skip unknown emojis (only translate known ones)
  • ERROR-01: React with an error emoji on translation errors (graceful failure)
  • CONFIG-02: Default mode is auto (bot translates every Vivi message)
  • A11Y-01: Plain text accessible format (no heavy formatting)
  • A11Y-04: Use slash commands (better keyboard navigation than buttons)
  • DB-02: Emoji meanings shared across all Discord servers (one global dictionary)

Success Criteria:

  1. Bot translates :me1:😷 2 sequence to "Vivi is sick" and shows which emoji are unknown
  2. Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
  3. Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
  4. Unknown emojis are skipped with a gentle prompt to teach ("Unknown emoji; run /teach emoji meaning")
  5. If a database query fails, the bot reacts with an error emoji instead of crashing or spamming errors

Key Implementation Tasks:

  • Use the emoji library (2.11.0) for standard emoji parsing (Unicode 15.0 support)
  • Regex pattern for custom Discord emoji detection (<:name:id> as it appears in raw message content); see the parsing sketch after this list
  • Dictionary lookup function (emoji → meaning) with O(1) query time
  • Translation composition function (assemble meanings into natural language)
  • Error handler that reacts with an error emoji and logs the error
  • Message edit detection (track message ID, compare old vs new emoji)
  • Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors
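
A parsing sketch combining the emoji library with a regex for custom Discord emojis; ordering is preserved by sorting matches on their start offset (TRANS-04). This is illustrative, not the final parser:

```python
# Sketch: extract standard and custom emojis from a message, preserving order.
import re
import emoji

# Custom Discord emojis appear in raw content as <:name:id> (or <a:name:id> if animated).
CUSTOM_EMOJI_RE = re.compile(r"<a?:(\w+):(\d+)>")


def extract_emojis(content: str) -> list[str]:
    tokens: list[tuple[int, str]] = []
    # Standard Unicode emojis (handles ZWJ sequences and skin-tone modifiers).
    for match in emoji.emoji_list(content):
        tokens.append((match["match_start"], match["emoji"]))
    # Custom Discord emojis, keyed by name so meanings survive emoji re-uploads.
    for m in CUSTOM_EMOJI_RE.finditer(content):
        tokens.append((m.start(), f":{m.group(1)}:"))
    # Left-to-right order is the composition order (TRANS-04).
    return [tok for _, tok in sorted(tokens)]


# extract_emojis("<:me1:123456789012345678> 😷") -> [":me1:", "😷"]
```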

Architecture Notes:

  • Translation engine should be stateless (all state in database)
  • Compose natural language by concatenating meanings when simple, or by identifying patterns (subject + descriptor + descriptor); see the composition sketch after this list
  • Handle emoji variants correctly (normalize input with NFD if needed)
  • Cache frequently-translated emoji in memory (optional for Phase 2; may add in Phase 4)
  • Response format: "Vivi says: [translation]" for clarity
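
A sketch of the stateless composition step, assuming an async lookup callable backed by the global dictionary; the natural-language join is deliberately naive and would be refined during this phase:

```python
# Sketch: compose a plain-text translation from known meanings, skipping unknowns.
# `lookup` is any async callable mapping an emoji to its meaning (or None).
from typing import Awaitable, Callable, Optional

Lookup = Callable[[str], Awaitable[Optional[str]]]


async def translate(emojis: list[str], lookup: Lookup) -> tuple[str, list[str]]:
    meanings: list[str] = []
    unknown: list[str] = []
    for e in emojis:
        meaning = await lookup(e)
        if meaning is None:
            unknown.append(e)        # UNK-01: skip, report separately
        else:
            meanings.append(meaning)
    # Naive composition: join meanings in order; phrasing refinements come later.
    text = f"Vivi says: {' '.join(meanings)}" if meanings else ""
    return text, unknown
```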

Risks & Mitigations:

  • Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
    • Mitigation: Comprehensive test suite with Unicode 15.0 samples; use emoji library instead of manual regex
  • Risk: Bot floods channel with translations (rate limiting)
    • Mitigation: Track requests per guild; implement cooldown if needed (Phase 4+)
  • Risk: Message edits race condition (bot editing response while Vivi edits message)
    • Mitigation: Post new translation instead of editing; queue requests with small delay if race detected

Dependencies: Phase 1 (detection + database)


Phase 3: Teaching System (Weeks 5-6)

Goal: Enable users to teach the bot new emoji meanings via simple commands.

Why Third:

  • Requires working translation from Phase 2 (teaching the bot emojis it can't yet translate is the whole point)
  • Enables the learning system (core differentiator: bot becomes more useful as community teaches it)
  • Makes bot valuable to Vivi's community: they drive growth, not predefined dictionary
  • Addresses dysgraphia accessibility: teaching interface must be ultra-simple

Requirements Covered:

  • TEACH-01: /teach emoji "meaning" command
  • TEACH-02: Bot confirms what it learned (shows emoji and meaning)
  • TEACH-03: Meanings stored globally (shared across all servers)
  • TEACH-04: /meaning emoji query command
  • TEACH-05: /correct emoji "new meaning" update command
  • TEACH-06: Audit trail (who, what, when for all changes)
  • TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
  • UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
  • UNK-03: Teaching prompt is accessible (simple command: /teach emoji "meaning")
  • A11Y-02: Simple command syntax (one-liner, no complex structure)
  • A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
  • A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
  • DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
  • GEN-03: Modular code for other systems (teaching system not Vivi-specific)

Success Criteria:

  1. User runs /teach 🎭 "happy performance", bot confirms "Taught: 🎭 = happy performance" with emoji shown
  2. Next time Vivi uses 🎭, bot translates it correctly in the translation engine (cache or DB lookup)
  3. User runs /meaning 🎭, bot replies with current meaning (or "Unknown emoji")
  4. User runs /correct 🎭 "joyful", meaning updates and audit trail records who changed it and when
  5. /teach command fails gracefully if emoji is invalid, meaning is empty, or meaning is too long (with helpful error message)

Key Implementation Tasks:

  • discord.py slash command handler for /teach emoji "meaning" (see the command sketch after this list)
  • Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
  • Database insert into emoji_dictionary with user_id and timestamp
  • /meaning emoji query command with error handling
  • /correct emoji "new meaning" update command with audit trail
  • Confirmation messages that show emoji visually (e.g., "🎭" not "PERFORMING_ARTS")
  • Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
  • Permission checks: log who taught what, but don't restrict based on role (accessibility)
  • Test that emojis can be taught, corrected, queried in rapid succession
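
A sketch of the /teach command using discord.py app commands; save_meaning is a hypothetical persistence helper (emoji_dictionary insert plus audit-trail row), shown here only as a stub:

```python
# Sketch: /teach command with basic validation and a visual confirmation.
import discord
from discord import app_commands
from discord.ext import commands


async def save_meaning(emoji_str: str, meaning: str, user_id: int) -> None:
    """Hypothetical helper: insert into emoji_dictionary plus an audit-trail row."""
    ...  # aiosqlite INSERT would go here (see the Phase 1 schema sketch)


class Teaching(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot

    @app_commands.command(name="teach", description="Teach the bot what an emoji means")
    @app_commands.describe(emoji="The emoji to teach", meaning="What it means (1-200 characters)")
    async def teach(self, interaction: discord.Interaction, emoji: str, meaning: str):
        meaning = meaning.strip()
        if not 1 <= len(meaning) <= 200:
            await interaction.response.send_message(
                "Meaning must be 1-200 characters.", ephemeral=True)
            return
        await save_meaning(emoji, meaning, user_id=interaction.user.id)
        # Short, visual confirmation (TEACH-02, A11Y-03): show the emoji itself.
        await interaction.response.send_message(f"Taught: {emoji} = {meaning}")
```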

Architecture Notes:

  • Teaching commands should be accessible to all users (not admin-only; logged for accountability)
  • Confirmation should always show the emoji visually (immediate visual feedback)
  • Responses must be short (<2 sentences) for dysgraphia accessibility
  • Audit trail enables future /undo feature (deferred to v2)
  • Global dictionary shared across servers; per-server overrides deferred to Phase 4+

Risks & Mitigations:

  • Risk: Users teach wrong/inappropriate meanings (spam, trolling)
    • Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
  • Risk: Teaching interface too complex for Vivi (with dysgraphia)
    • Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
  • Risk: Conflicting emoji meanings across servers
    • Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides

Dependencies: Phase 2 (translation must work so teaching has value)


Phase 4: Configuration & Scaling (Week 7)

Goal: Add per-server settings and prepare for scaling.

Why Fourth:

  • Builds on Phases 1-3 foundation (all core features working)
  • Enables server customization without breaking global emoji dictionary
  • Prepares architecture for multi-server scaling and future systems
  • Adds operator controls (config commands)

Requirements Covered:

  • CONFIG-01: /config auto-translate on|off toggle per server
  • CONFIG-03: Per-server persistence (setting survives bot restart)
  • CONFIG-04: Admin-only changes (permission check on /config command)
  • DB-03: Per-server config table
  • GEN-02: Architecture supports per-system overrides (design for future multi-system)

Success Criteria:

  1. Server admin runs /config auto-translate off, future Vivi messages don't auto-translate (bot is silent by default)
  2. Another server has auto enabled; both work independently (no crosstalk)
  3. Setting persists across bot restart (database query returns correct value)
  4. Non-admin user runs /config auto-translate on, bot rejects with "Admin only" message
  5. Default for new servers is auto mode enabled (true by default in code)

Key Implementation Tasks:

  • /config auto-translate <on|off> slash command (see the sketch after this list)
  • Permission check (admin-only; use discord.py's default_member_permissions)
  • server_config table update/insert (guild_id as PK, auto_translate as boolean)
  • Modify message handler to check per-guild setting before auto-translating
  • Implement on-demand mode placeholder (manual translation via /translate command; full reaction-based mode deferred to v2)
  • Cache per-server settings for performance (1-minute TTL)
  • Test: change setting, verify immediate effect; verify effect persists after restart
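
A sketch of the /config command, assuming the server_config table from the Phase 1 schema sketch; it is simplified to a single command with an on/off option rather than a subcommand group, and default_permissions only sets the default visibility (a runtime admin check can be added for defense in depth):

```python
# Sketch: admin-only per-server toggle, persisted with an SQLite UPSERT.
from typing import Literal

import aiosqlite
import discord
from discord import app_commands

DB_PATH = "vivi.db"  # assumed path from the Phase 1 schema sketch


@app_commands.command(name="config", description="Toggle auto-translate for this server")
@app_commands.default_permissions(administrator=True)  # CONFIG-04: admin-only by default
@app_commands.describe(auto_translate="on or off")
async def config(interaction: discord.Interaction, auto_translate: Literal["on", "off"]):
    enabled = auto_translate == "on"
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            """INSERT INTO server_config (guild_id, auto_translate) VALUES (?, ?)
               ON CONFLICT(guild_id) DO UPDATE SET auto_translate = excluded.auto_translate""",
            (str(interaction.guild_id), int(enabled)),
        )
        await db.commit()
    await interaction.response.send_message(f"Auto-translate is now {auto_translate}.")
```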

Architecture Notes:

  • On-demand mode: bot only replies if explicitly requested (stored in DB but not fully implemented in v1)
  • Per-server config indexed by guild_id for O(1) lookup
  • Cache server settings in memory with a TTL to avoid hammering the database (see the cache sketch after this list)
  • Design allows per-system overrides in future (deferred to v2)
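
A minimal in-memory TTL cache sketch for per-guild settings; on a miss or expiry the caller falls back to the database:

```python
# Sketch: tiny TTL cache for per-guild auto-translate settings (illustrative only).
import time
from typing import Optional


class SettingsCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._data: dict[int, tuple[float, bool]] = {}  # guild_id -> (expiry, auto_translate)

    def get(self, guild_id: int) -> Optional[bool]:
        entry = self._data.get(guild_id)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired: caller falls back to the database
        return entry[1]

    def put(self, guild_id: int, auto_translate: bool) -> None:
        self._data[guild_id] = (time.monotonic() + self.ttl, auto_translate)
```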

Risks & Mitigations:

  • Risk: Per-server override conflicts if not careful
    • Mitigation: v2 design allows per-system overrides; v1 focuses on global dictionary with server-level auto/on-demand toggle
  • Risk: Config command is confusing to admins
    • Mitigation: Clear help text; only two options (on/off); simple feedback message

Dependencies: Phases 1-3 (all features working; config customizes them)


Phase 5: Production Polish (Week 8+)

Goal: Production hardening, logging, error handling, and monitoring.

Why Last:

  • Follows all feature phases (features must be complete before hardening)
  • Improves reliability and debuggability (enables diagnosis of issues in production)
  • Prepares for public adoption by other systems or larger communities

Requirements Covered:

  • ERROR-02: Bot logging for debugging (structured JSON logs)
  • ERROR-03: Exponential backoff for failed DB operations and API calls

Success Criteria:

  1. Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
  2. Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
  3. Bot retries failed database operations (same backoff strategy; max 5 attempts)
  4. Unhandled exceptions caught and logged (no spam in user channels; clear error reaction)
  5. All error logs include context (guild_id, user_id, emoji attempted, operation type)

Key Implementation Tasks:

  • Set up Python logging with JSON format (structlog or custom JSON formatter)
  • Implement retry logic with asyncio.sleep backoff for PluralKit API calls
  • Implement retry logic with backoff for database queries (a shared backoff helper is sketched after this list)
  • Add global exception handler (on_error in discord.py)
  • Comprehensive error documentation (what errors mean, how to diagnose)
  • Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
  • Optional: Sentry integration for error tracking (recommended but not required for v1)
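
A sketch of a shared backoff helper implementing the 1s/2s/4s/8s-with-jitter strategy described above; the wrapped operation is any async callable:

```python
# Sketch: retry an async operation with exponential backoff and jitter.
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_backoff(op: Callable[[], Awaitable[T]], max_attempts: int = 5) -> T:
    for attempt in range(max_attempts):
        try:
            return await op()
        except Exception:
            if attempt == max_attempts - 1:
                raise                        # fail fast once max attempts are exhausted
            delay = min(2 ** attempt, 8)     # 1s, 2s, 4s, 8s, capped at 8s
            await asyncio.sleep(delay + random.uniform(0, delay / 2))  # jitter


# Usage (hypothetical callable): result = await with_backoff(lambda: fetch_member(member_id))
```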

Architecture Notes:

  • Async error handling must not block the event loop
  • Retry logic should use exponential backoff with jitter (avoid thundering herd)
  • Logs should include PluralKit request context (duration, status code, member_id) for debugging; a JSON formatter sketch follows this list
  • Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps
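
A minimal stdlib sketch of structured JSON logging with rotation; structlog is an equally valid choice, and the context field names are illustrative:

```python
# Sketch: JSON-formatted logs with rotation using the standard library.
import json
import logging
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Extra context (guild_id, user_id, emoji, operation) passed via `extra=`.
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)


handler = RotatingFileHandler("vivi.log", maxBytes=10_000_000, backupCount=5)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("vivi")
log.addHandler(handler)
log.info("emoji lookup", extra={"context": {"guild_id": "123", "emoji": "🎭", "result": "found"}})
```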

Risks & Mitigations:

  • Risk: Logging overhead impacts performance
    • Mitigation: Async logging; batch writes; exclude noisy operations (every emoji lookup)
  • Risk: Backoff strategy causes noticeable delays when DB is down
    • Mitigation: Set a reasonable max wait (8s); fail fast if max attempts are exceeded; the user sees the error reaction quickly

Dependencies: Phases 1-4 (all features working; Phase 5 hardens them)


Dependency Chain

Phase 1: Foundation (Discord client, PluralKit detection, database)
    ↓
Phase 2: Translation (emoji parsing, lookups, auto-translation)
    ↓
Phase 3: Teaching (commands to add/update meanings; audit trail)
    ↓
Phase 4: Configuration (per-server auto/on-demand toggle)
    ↓
Phase 5: Polish (logging, retry logic, production hardening)

Critical Path: Phase 1 → Phase 2 → Phase 3 (core value)
Optional Path: Phase 4 (customization), Phase 5 (hardening)


Requirement Coverage Summary

Total v1 Requirements: 33 | Mapped to Phases: 33 | Unmapped: 0 ✓

Coverage by Category:

  • Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
  • Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
  • Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
  • Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
  • Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
  • Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
  • Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
  • Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
  • Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4

Key Decisions & Rationale

| Decision | Rationale | Phase |
|----------|-----------|-------|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. The bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs. API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs. on-demand); can customize without breaking the shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings are learned explicitly; avoids false positives and keeps the system transparent | Out of scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |

Key Metrics & Success Indicators

Per Phase:

  • Phase 1: >99% detection accuracy, zero false positives in testing
  • Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
  • Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
  • Phase 4: Settings persist across restart, multi-server support verified
  • Phase 5: <0.1% error rate, all failures logged and alertable

Overall:

  • Time to MVP (Phases 1-2): 3-4 weeks
  • Time to v1 (Phases 1-5): ~8 weeks (per the weekly schedule above; Phase 5 begins in Week 8)
  • Requirements per phase: 3-7 (manageable scope)
  • Value delivered: Incremental (each phase adds core functionality)

Known Limitations & v2+ Backlog

Intentionally v2+:

  • Per-server emoji overrides (global dictionary only in v1)
  • Reaction-based on-demand translation (slash command placeholder only)
  • Analytics dashboard (/stats, /emoji-list)
  • Moderation UI (flag/approve/reject meanings)
  • Multi-language emoji meanings
  • Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)

Implementation Notes

Tech Stack:

  • discord.py 2.6.4 (async-first, native slash commands)
  • SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
  • aiosqlite (async database access)
  • emoji 2.11.0 (Unicode 15.0 support)
  • pydantic 2.5.0 (data validation)
  • Railway Cloud for hosting (free tier for MVP)

Code Organization:

  • bot.py - Main discord.py client, event loop
  • cogs/detection.py - Message event handler, PluralKit detection
  • cogs/translation.py - Emoji parsing, dictionary lookup, composition
  • cogs/teaching.py - /teach, /meaning, /correct commands
  • cogs/config.py - /config command, per-server settings
  • database.py - SQLAlchemy ORM, async queries
  • logging.py - Structured JSON logging, retry logic

Testing Strategy:

  • Unit tests: emoji parsing, dictionary lookups, composition (see the example tests after this list)
  • Integration tests: PluralKit detection with mock webhooks
  • Accessibility tests: response length, emoji names for screen readers
  • Load tests: 100+ servers, 1000+ emoji, response times
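
Example pytest cases against the Phase 2 sketches in this document; the translation module path is hypothetical:

```python
# Sketch: pytest unit tests for the parsing/composition sketches above.
import asyncio

from translation import extract_emojis, translate  # hypothetical module layout


def test_custom_and_standard_emoji_order_preserved():
    assert extract_emojis("<:me1:123456789012345678> 😷") == [":me1:", "😷"]


def test_unknown_emojis_are_skipped():
    async def lookup(e):
        return {"😷": "sick"}.get(e)

    text, unknown = asyncio.run(translate([":me1:", "😷"], lookup))
    assert text == "Vivi says: sick"
    assert unknown == [":me1:"]
```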

Roadmap created: 2025-01-29
Ready for Phase 1 planning
Depth: Quick (5 phases, natural boundaries)