- Phase 1 (Foundation): PluralKit detection + database setup
- Phase 2 (Translation Engine): Emoji parsing + auto-translate
- Phase 3 (Teaching System): User commands to learn emoji meanings
- Phase 4 (Configuration): Per-server settings + scaling
- Phase 5 (Polish): Logging + production hardening

100% requirement coverage: all 39 v1 requirements mapped to exactly one phase.

Dependencies identified: Phase 1 → 2 → 3 → 4 → 5

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
# Roadmap: Vivi Speech Translator

**Vision:** Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.

**Phases:** 5 | **Requirements Covered:** 39/39 | **Coverage:** 100% ✓

---
## Phase Overview

| Phase | Name | Goal | Requirements | Success Criteria (count) |
|-------|------|------|--------------|------------------|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up the database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate them to text in auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 5 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 5 |

---
## Phase Breakdown

### Phase 1: Foundation (Weeks 1-2)

**Goal:** Reliably detect when Vivi is communicating via PluralKit and establish the database infrastructure.

**Why First:**
- PluralKit detection is load-bearing (the foundation for all translation)
- Database setup enables all future phases
- Demonstrates a proof of concept that detection works reliably

**Requirements Covered:**
- DETECT-01: Bot detects Vivi via PluralKit webhook
- DETECT-02: Bot ignores non-Vivi messages (no false positives)
- DETECT-03: Bot works in channels and DMs
- DB-01: Global emoji dictionary persists
- GEN-01: Architecture supports multiple systems (not hardcoded to Vivi)

**Success Criteria:**

1. Bot reliably logs "Vivi detected" when she sends a test message that PluralKit proxies through its webhook
2. Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
3. Database (SQLite for MVP) initializes on bot startup without errors
4. Emoji dictionary table and server configuration table are created and readable

**Key Implementation Tasks:**
- Set up the discord.py client with the intents detection needs: `guilds`, `messages` (guild and DM), and the privileged `message_content` intent, which is required to read the emoji in message text
- Implement webhook detection (check `message.webhook_id`)
- Query the PluralKit API to verify that the proxied message's member ID matches Vivi's ID
- Initialize the SQLite database with `emoji_dictionary` and `server_config` tables
- Test edge cases: Vivi's message edits, reactions, message deletes, and DM context (PluralKit proxies through webhooks, which exist only in server channels, so DMs will not contain proxied messages)

**Architecture Notes:**
- Use aiosqlite for async database access (MVP)
- Cache the PluralKit member list with a 1-hour TTL to reduce API calls
- If PluralKit dispatch webhooks are used, verify each request by checking the `signing_token` in its payload against the token PluralKit issued for the endpoint
- Design the message handler as a cog (modular for future systems)

**Risks & Mitigations:**
- Risk: PluralKit API rate limits during initialization (10/sec)
- Mitigation: Use PluralKit's webhook dispatch instead of polling; cache the member list
- Risk: Message content intent denied by Discord
- Mitigation: Design for slash commands as the primary path; treat message content as an optional enhancement

**Dependencies:** None (Phase 1 is the foundation)

---
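A minimal sketch of the Phase 1 detection check, assuming PluralKit's v2 endpoint `GET /v2/messages/{id}` returns the proxied message with a `member` object whose `id` is the member's short ID. The fetch function is injected (in the bot it could wrap an HTTP request to that endpoint), and `ViviDetector`, `is_target`, and the per-message TTL cache are hypothetical names for illustration; this sketch caches per-message results rather than the member list mentioned in the Phase 1 notes.

```python
import time
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple

# fetch(message_id) -> parsed PluralKit JSON, or None if PluralKit doesn't know the message
FetchFn = Callable[[int], Awaitable[Optional[Dict[str, Any]]]]


class ViviDetector:
    """Decides whether a Discord message was proxied by PluralKit for one target member."""

    def __init__(self, target_member_id: str, fetch: FetchFn,
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.target_member_id = target_member_id  # configurable, not hardcoded (GEN-01)
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # message_id -> (is_target, cached_at); avoids re-querying on edits and reactions
        self._cache: Dict[int, Tuple[bool, float]] = {}

    async def is_target(self, message_id: int, webhook_id: Optional[int]) -> bool:
        # PluralKit proxies through webhooks, so non-webhook messages are rejected up front
        if webhook_id is None:
            return False
        now = self._clock()
        cached = self._cache.get(message_id)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        payload = await self._fetch(message_id)
        member = (payload or {}).get("member") or {}
        result = member.get("id") == self.target_member_id
        self._cache[message_id] = (result, now)
        return result
```

In a discord.py `on_message` handler this would be called as `await detector.is_target(message.id, message.webhook_id)`. A long-running bot would also need to prune or bound `_cache`, which this sketch leaves unbounded.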
### Phase 2: Translation Engine (Weeks 3-4)

**Goal:** Parse emoji sequences from Vivi's messages and translate them to text automatically.

**Why Second:**
- Builds on Phase 1's detection foundation (no detection means nothing to translate)
- Delivers core user value: Vivi posts, others understand
- Unblocks Phase 3 (the teaching system needs working translation to be useful)
- Enables MVP validation with early users

**Requirements Covered:**
- TRANS-01: Parse standard emojis (😷, ❌, 2️⃣)
- TRANS-02: Parse custom Discord emojis (`:me1:`, etc.)
- TRANS-03: Translate and reply automatically (auto mode)
- TRANS-04: Read left-to-right composition (preserve emoji order)
- TRANS-05: Plain text responses only (no emoji-only translations)
- TRANS-06: Concise, clear responses (respects message length limits)
- DETECT-04: Reliably detect edits and reactions on Vivi messages
- UNK-01: Skip unknown emojis (only translate known ones)
- ERROR-01: React with ❌ on translation errors (graceful failure)
- CONFIG-02: Default mode is auto (bot translates every Vivi message)
- A11Y-01: Plain text accessible format (no heavy formatting)
- A11Y-04: Use slash commands (better keyboard navigation than buttons)
- DB-02: Emoji meanings shared across all Discord servers (one global dictionary)

**Success Criteria:**

1. Bot translates the sequence `:me1:😷 2️⃣` to "Vivi is sick" and shows which emoji are unknown
2. Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
3. Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
4. Unknown emojis are skipped with a gentle prompt to teach ("Unknown emoji; run `/teach emoji "meaning"`")
5. If a database query fails, the bot reacts with ❌ instead of crashing or spamming errors

**Key Implementation Tasks:**
- Use the `emoji` library 2.11.0 for standard emoji parsing (Unicode 15.0 support)
- Regex pattern for custom Discord emoji, which appear in raw message content as `<:name:id>` (or `<a:name:id>` when animated) rather than as `:name:`
- Dictionary lookup function (emoji → meaning) keyed on the emoji (a primary-key lookup in SQLite; O(1) when served from an in-memory dict)
- Translation composition function (assemble meanings into natural language)
- Error handler that reacts with ❌ and logs the error
- Message edit detection (track message ID, compare old vs. new emoji)
- Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors

**Architecture Notes:**
- Translation engine should be stateless (all state in the database)
- Compose natural language by concatenating meanings when simple, or by identifying patterns (subject + descriptor + descriptor)
- Handle emoji variants consistently by normalizing lookup keys, e.g. stripping the U+FE0F variation selector so ❤ and ❤️ share one entry (Unicode NFC/NFD normalization does not merge these variants)
- Cache frequently translated emoji in memory (optional for Phase 2; may add in Phase 4)
- Response format: "Vivi says: [translation]" for clarity

**Risks & Mitigations:**
- Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
- Mitigation: Comprehensive test suite with Unicode 15.0 samples; use the emoji library instead of manual regex
- Risk: Bot floods the channel with translations (rate limiting)
- Mitigation: Track requests per guild; implement a cooldown if needed (Phase 4+)
- Risk: Message edit race condition (bot editing its response while Vivi edits her message)
- Mitigation: Post a new translation instead of editing; queue requests with a small delay if a race is detected

**Dependencies:** Phase 1 (detection + database)

---
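The Phase 2 parse-and-compose steps can be sketched with the standard library alone. This is a simplified stand-in for the `emoji` library the roadmap plans to use, not a replacement: it groups a symbol with following skin tone modifiers, keycap marks, and ZWJ joins, but it splits flags (regional-indicator pairs) into two tokens and ignores other Unicode edge cases. It assumes dictionary keys are stored in the same normalized form (U+FE0F removed) and that custom emoji are keyed as `:name:`; `normalize`, `tokenize`, and `translate` are hypothetical names.

```python
import re
import unicodedata
from typing import Dict, List, Tuple

# Raw form of a custom emoji in message content: <:name:id>, or <a:name:id> if animated
CUSTOM_EMOJI_RE = re.compile(r"<a?:(\w+):\d+>")
VS16 = "\ufe0f"    # emoji presentation selector
ZWJ = "\u200d"     # zero width joiner
KEYCAP = "\u20e3"  # combining enclosing keycap


def normalize(text: str) -> str:
    """Drop VS16 so that '❤' and '❤️' map to the same dictionary key."""
    return text.replace(VS16, "")


def _extends_cluster(ch: str) -> bool:
    # Keycap combiner or a skin tone modifier (U+1F3FB..U+1F3FF)
    return ch == KEYCAP or "\U0001F3FB" <= ch <= "\U0001F3FF"


def tokenize(content: str) -> List[str]:
    """Return emoji tokens in left-to-right order (TRANS-04), skipping ordinary text."""
    text = normalize(content)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        m = CUSTOM_EMOJI_RE.match(text, i)
        if m:
            tokens.append(f":{m.group(1)}:")
            i = m.end()
            continue
        j = i + 1
        while j < len(text):
            if _extends_cluster(text[j]):
                j += 1
            elif text[j] == ZWJ and j + 1 < len(text):
                j += 2  # the joiner plus the next emoji in the sequence
            else:
                break
        cluster = text[i:j]
        # Keep multi-code-point clusters and "Symbol, other" characters (most emoji)
        if len(cluster) > 1 or unicodedata.category(cluster) == "So":
            tokens.append(cluster)
        i = j
    return tokens


def translate(content: str, dictionary: Dict[str, str]) -> Tuple[str, List[str]]:
    """Join known meanings in order; return unknown tokens separately (UNK-01)."""
    meanings: List[str] = []
    unknown: List[str] = []
    for token in tokenize(content):
        meaning = dictionary.get(token)
        if meaning is None:
            unknown.append(token)
        else:
            meanings.append(meaning)
    return " ".join(meanings), unknown
```

With a dictionary of `{":me1:": "Vivi", "😷": "is sick"}`, the Phase 2 example `:me1:😷 2️⃣` yields "Vivi is sick" plus the keycap as an unknown token, which the reply can list in its teaching prompt. Simple concatenation is the "when simple" case from the architecture notes; pattern-based composition would replace the `" ".join`.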
### Phase 3: Teaching System (Weeks 5-6)

**Goal:** Enable users to teach the bot new emoji meanings via simple commands.

**Why Third:**
- Requires working translation from Phase 2 (teaching exists to fill in the emoji the translator doesn't know yet)
- Enables the learning system (core differentiator: the bot becomes more useful as the community teaches it)
- Makes the bot valuable to Vivi's community: they drive its growth, not a predefined dictionary
- Addresses dysgraphia accessibility: the teaching interface must be ultra-simple

**Requirements Covered:**
- TEACH-01: `/teach emoji "meaning"` command
- TEACH-02: Bot confirms what it learned (shows emoji and meaning)
- TEACH-03: Meanings stored globally (shared across all servers)
- TEACH-04: `/meaning emoji` query command
- TEACH-05: `/correct emoji "new meaning"` update command
- TEACH-06: Audit trail (who, what, when for all changes)
- TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
- UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
- UNK-03: Teaching prompt is accessible (simple command: `/teach emoji "meaning"`)
- A11Y-02: Simple command syntax (one-liner, no complex structure)
- A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
- A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
- DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
- GEN-03: Modular code for other systems (teaching system not Vivi-specific)

**Success Criteria:**

1. User runs `/teach 🎭 "happy performance"`; bot confirms "Taught: 🎭 = happy performance" with the emoji shown
2. The next time Vivi uses 🎭, the translation engine translates it correctly (cache or DB lookup)
3. User runs `/meaning 🎭`; bot replies with the current meaning (or "Unknown emoji")
4. User runs `/correct 🎭 "joyful"`; the meaning updates and the audit trail records who changed it and when
5. `/teach` fails gracefully with a helpful error message if the emoji is invalid or the meaning is empty or too long

**Key Implementation Tasks:**
- discord.py slash command handler for `/teach emoji "meaning"`
- Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
- Database insert into emoji_dictionary with user_id and timestamp
- `/meaning emoji` query command with error handling
- `/correct emoji "new meaning"` update command with audit trail
- Confirmation messages that show the emoji visually (e.g., "🎭" rather than only its name "PERFORMING ARTS"), with the Unicode name added for screen readers (A11Y-05)
- Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
- Permission checks: log who taught what, but don't restrict based on role (accessibility)
- Test that emojis can be taught, corrected, and queried in rapid succession

**Architecture Notes:**
- Teaching commands should be accessible to all users (not admin-only; logged for accountability)
- Confirmation should always show the emoji visually (immediate visual feedback)
- Responses must be short (<2 sentences) for dysgraphia accessibility
- Audit trail enables a future `/undo` feature (deferred to v2)
- Global dictionary shared across servers; per-server overrides deferred to v2

**Risks & Mitigations:**
- Risk: Users teach wrong or inappropriate meanings (spam, trolling)
- Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
- Risk: Teaching interface too complex for Vivi (with dysgraphia)
- Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
- Risk: Conflicting emoji meanings across servers
- Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides

**Dependencies:** Phase 2 (translation must work so teaching has value)

---
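A minimal sketch of Phase 3's teach/correct/query flow with an audit trail. It uses the standard-library `sqlite3` module so it runs standalone; the bot would issue the same SQL through aiosqlite. Audit columns follow Phase 3's task list, while the table name `emoji_audit`, the function names, `TeachError`, and the rule that `/teach` refuses to overwrite an existing meaning (pointing users to `/correct`) are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji     TEXT PRIMARY KEY,
    meaning   TEXT NOT NULL,
    user_id   INTEGER NOT NULL,  -- who last taught or corrected it
    timestamp TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS emoji_audit (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    emoji_id    TEXT NOT NULL,
    old_meaning TEXT,            -- NULL for the first /teach
    new_meaning TEXT NOT NULL,
    user_id     INTEGER NOT NULL,
    timestamp   TEXT NOT NULL,
    action_type TEXT NOT NULL CHECK (action_type IN ('teach', 'correct'))
);
"""
MAX_MEANING_LEN = 200


class TeachError(ValueError):
    """Validation error whose message is safe to show to the user."""


def _clean(meaning: str) -> str:
    meaning = meaning.strip()
    if not 1 <= len(meaning) <= MAX_MEANING_LEN:
        raise TeachError(f"Meaning must be 1-{MAX_MEANING_LEN} characters.")
    return meaning


def _audit(conn: sqlite3.Connection, emoji: str, old: Optional[str], new: str,
           user_id: int, ts: str, action: str) -> None:
    conn.execute(
        "INSERT INTO emoji_audit (emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (emoji, old, new, user_id, ts, action),
    )


def teach(conn: sqlite3.Connection, emoji: str, meaning: str, user_id: int) -> str:
    meaning = _clean(meaning)
    ts = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: the dictionary row and audit row commit or roll back together
        try:
            conn.execute("INSERT INTO emoji_dictionary VALUES (?, ?, ?, ?)",
                         (emoji, meaning, user_id, ts))
        except sqlite3.IntegrityError:
            raise TeachError(f"{emoji} is already known; use /correct to change it.") from None
        _audit(conn, emoji, None, meaning, user_id, ts, "teach")
    return f"Taught: {emoji} = {meaning}"


def correct(conn: sqlite3.Connection, emoji: str, new_meaning: str, user_id: int) -> str:
    new_meaning = _clean(new_meaning)
    ts = datetime.now(timezone.utc).isoformat()
    with conn:
        row = conn.execute("SELECT meaning FROM emoji_dictionary WHERE emoji = ?", (emoji,)).fetchone()
        if row is None:
            raise TeachError(f"{emoji} is unknown; use /teach first.")
        conn.execute("UPDATE emoji_dictionary SET meaning = ?, user_id = ?, timestamp = ? WHERE emoji = ?",
                     (new_meaning, user_id, ts, emoji))
        _audit(conn, emoji, row[0], new_meaning, user_id, ts, "correct")
    return f"Updated: {emoji} = {new_meaning}"


def meaning_of(conn: sqlite3.Connection, emoji: str) -> Optional[str]:
    row = conn.execute("SELECT meaning FROM emoji_dictionary WHERE emoji = ?", (emoji,)).fetchone()
    return row[0] if row else None
```

Storing `old_meaning` alongside `new_meaning` is what makes the deferred `/undo` feasible: undoing a correction is just re-applying the previous row's `old_meaning`.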
### Phase 4: Configuration & Scaling (Week 7)

**Goal:** Add per-server settings and prepare for scaling.

**Why Fourth:**
- Builds on the Phases 1-3 foundation (all core features working)
- Enables server customization without breaking the global emoji dictionary
- Prepares the architecture for multi-server scaling and future systems
- Adds operator controls (config commands)

**Requirements Covered:**
- CONFIG-01: `/config auto-translate on|off` toggle per server
- CONFIG-03: Per-server persistence (setting survives bot restart)
- CONFIG-04: Admin-only changes (permission check on `/config` command)
- DB-03: Per-server config table
- GEN-02: Architecture supports per-system overrides (design for future multi-system)

**Success Criteria:**

1. Server admin runs `/config auto-translate off`; future Vivi messages in that server are not auto-translated (the bot stays silent unless asked)
2. Another server keeps auto mode enabled; both work independently (no crosstalk)
3. Setting persists across bot restart (database query returns the correct value)
4. Non-admin user runs `/config auto-translate on`; bot rejects it with an "Admin only" message
5. Default for new servers is auto mode enabled (true by default in code)

**Key Implementation Tasks:**
- `/config auto-translate <on|off>` slash command
- Permission check (admin-only): set Discord's `default_member_permissions` through discord.py's `@app_commands.default_permissions(...)` to hide the command from non-admins, plus a runtime check such as `@app_commands.checks.has_permissions(...)` so the "Admin only" rejection still applies if a server overrides the command's visibility
- server_config table update/insert (guild_id as PK, auto_translate as boolean)
- Modify the message handler to check the per-guild setting before auto-translating
- Implement an on-demand mode placeholder (manual translation via a `/translate` command; full reaction-based mode deferred to v2)
- Cache per-server settings for performance (1-minute TTL; invalidate the cached entry when `/config` writes a new value)
- Test: change the setting and verify the immediate effect; verify the effect persists after restart

**Architecture Notes:**
- On-demand mode: bot replies only when explicitly requested (stored in DB but not fully implemented in v1)
- Per-server config keyed by guild_id (primary key), so each lookup is a single indexed read
- Cache server settings in memory with a TTL to avoid hammering the DB
- Design allows per-system overrides in the future (deferred to v2)

**Risks & Mitigations:**
- Risk: Per-server overrides conflict if not handled carefully
- Mitigation: v2 design allows per-system overrides; v1 focuses on a global dictionary with a server-level auto/on-demand toggle
- Risk: Config command is confusing to admins
- Mitigation: Clear help text; only two options (on/off); simple feedback message

**Dependencies:** Phases 1-3 (all features working; config customizes them)

---
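Phase 4's cached per-guild lookup could look like the sketch below, again with the standard-library `sqlite3` standing in for aiosqlite. The class name, the injectable clock, and dropping the cache entry on write (so `/config` takes effect immediately despite the 1-minute TTL) are assumptions. The upsert uses SQLite's `ON CONFLICT ... DO UPDATE` syntax, available since SQLite 3.24; SQLite has no separate boolean type, so the flag is stored as 0/1.

```python
import sqlite3
import time
from typing import Callable, Dict, Tuple


class ServerConfig:
    """Per-guild auto-translate flag backed by SQLite, with a short in-memory TTL cache."""

    def __init__(self, conn: sqlite3.Connection, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.conn = conn
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache: Dict[int, Tuple[bool, float]] = {}  # guild_id -> (value, cached_at)
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS server_config ("
                " guild_id INTEGER PRIMARY KEY,"
                " auto_translate INTEGER NOT NULL DEFAULT 1)"
            )

    def auto_translate(self, guild_id: int) -> bool:
        now = self.clock()
        hit = self._cache.get(guild_id)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]
        row = self.conn.execute(
            "SELECT auto_translate FROM server_config WHERE guild_id = ?", (guild_id,)
        ).fetchone()
        value = True if row is None else bool(row[0])  # new servers default to auto mode
        self._cache[guild_id] = (value, now)
        return value

    def set_auto_translate(self, guild_id: int, enabled: bool) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT INTO server_config (guild_id, auto_translate) VALUES (?, ?) "
                "ON CONFLICT(guild_id) DO UPDATE SET auto_translate = excluded.auto_translate",
                (guild_id, int(enabled)),
            )
        self._cache.pop(guild_id, None)  # make the new value visible on the next lookup
```

Invalidating on write only helps the process that made the change; if the bot ever runs as several instances, the others would see the new value only after their TTL expires.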
### Phase 5: Production Polish (Week 8+)

**Goal:** Production hardening, logging, error handling, and monitoring.

**Why Last:**
- Follows all feature phases (features must be complete before hardening)
- Improves reliability and debuggability (enables diagnosing issues in production)
- Prepares for public adoption by other systems or larger communities

**Requirements Covered:**
- ERROR-02: Bot logging for debugging (structured JSON logs)
- ERROR-03: Exponential backoff for failed DB operations and API calls

**Success Criteria:**

1. Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
2. Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
3. Bot retries failed database operations (same backoff strategy; max 5 attempts)
4. Unhandled exceptions are caught and logged (no spam in user channels; a clear error reaction instead)
5. All error logs include context (guild_id, user_id, emoji attempted, operation type)

**Key Implementation Tasks:**
- Set up Python logging with JSON format (structlog or a custom JSON formatter)
- Implement retry logic with `asyncio.sleep` backoff for PluralKit API calls
- Implement retry logic with backoff for database queries
- Add a global exception handler (`on_error` in discord.py for event handlers; slash command errors go to `CommandTree.on_error`)
- Comprehensive error documentation (what errors mean, how to diagnose)
- Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
- Optional: Sentry integration for error tracking (recommended but not required for v1)

**Architecture Notes:**
- Async error handling must not block the event loop
- Retry logic should use exponential backoff with jitter (avoid thundering herd)
- Logs should include PluralKit request context (duration, status code, member_id) for debugging
- Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps

**Risks & Mitigations:**
- Risk: Logging overhead impacts performance
- Mitigation: Non-blocking logging (e.g., `logging.handlers.QueueHandler` with a `QueueListener`); batch writes; log routine emoji lookups at DEBUG level or as aggregated counts rather than one INFO line each
- Risk: Backoff strategy causes noticeable delays when the DB is down
- Mitigation: Cap each wait at 8s (with 5 attempts, the 1s + 2s + 4s + 8s waits add up to at most about 15s); fail fast once max attempts are exceeded so the user sees the ❌ reaction

**Dependencies:** Phases 1-4 (all features working; Phase 5 hardens them)

---
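The retry behavior in Phase 5's success criteria (waits of 1s, 2s, 4s, 8s with jitter, at most 5 attempts) could be implemented as a small async helper like the one below. It uses "equal jitter", drawing each wait from the upper half of the exponential delay, which is one of several common jitter schemes; the function name, its parameters, and the injectable `sleep` and `rng` hooks are assumptions made so the logic can be tested without real waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Await op(), retrying failures with capped exponential backoff and equal jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: the caller reacts with ❌ and logs
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1, 2, 4, 8 seconds
            # Equal jitter: wait in [delay/2, delay) so clients don't retry in lockstep
            await sleep(delay / 2 + rng() * delay / 2)
    raise AssertionError("unreachable")
```

In practice `retry_on` should be narrowed to transient failures (timeouts, connection errors, HTTP 429/5xx) so that, for example, a "message not found" answer from PluralKit is returned immediately instead of being retried. Because `asyncio.CancelledError` is not an `Exception` subclass, task cancellation is never swallowed by the default.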
## Dependency Chain

```
Phase 1: Foundation (Discord client, PluralKit detection, database)
↓
Phase 2: Translation (emoji parsing, lookups, auto-translation)
↓
Phase 3: Teaching (commands to add/update meanings; audit trail)
↓
Phase 4: Configuration (per-server auto/on-demand toggle)
↓
Phase 5: Polish (logging, retry logic, production hardening)
```

**Critical Path:** Phase 1 → Phase 2 → Phase 3 (core value)

**Optional Path:** Phase 4 (customization), Phase 5 (hardening)

---
## Requirement Coverage Summary

**Total v1 Requirements:** 39

**Mapped to Phases:** 39

**Unmapped:** 0 ✓

**Coverage by Category:**
- Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
- Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
- Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
- Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
- Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
- Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
- Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
- Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
- Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4

---
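The coverage numbers can be recomputed mechanically from the Phase Overview table, which is a cheap way to keep this summary accurate as requirements change. The phase-to-requirement mapping below is copied from that table and the category sizes from the Coverage by Category list; `coverage_report` is an illustrative name, not part of any existing tooling.

```python
from collections import Counter
from typing import Dict, List

PHASES: Dict[int, List[str]] = {
    1: ["DETECT-01", "DETECT-02", "DETECT-03", "DB-01", "GEN-01"],
    2: ["TRANS-01", "TRANS-02", "TRANS-03", "TRANS-04", "TRANS-05", "TRANS-06",
        "DETECT-04", "UNK-01", "ERROR-01", "CONFIG-02", "A11Y-01", "A11Y-04", "DB-02"],
    3: ["TEACH-01", "TEACH-02", "TEACH-03", "TEACH-04", "TEACH-05", "TEACH-06", "TEACH-07",
        "UNK-02", "UNK-03", "A11Y-02", "A11Y-03", "A11Y-05", "DB-04", "GEN-03"],
    4: ["CONFIG-01", "CONFIG-03", "CONFIG-04", "DB-03", "GEN-02"],
    5: ["ERROR-02", "ERROR-03"],
}
CATEGORY_SIZES: Dict[str, int] = {"DETECT": 4, "TRANS": 6, "TEACH": 7, "UNK": 3, "ERROR": 3,
                                  "CONFIG": 4, "DB": 4, "A11Y": 5, "GEN": 3}


def coverage_report(phases: Dict[int, List[str]], category_sizes: Dict[str, int]) -> Dict[str, object]:
    """Check that every expected requirement ID is mapped to exactly one phase."""
    mapped = [req for reqs in phases.values() for req in reqs]
    expected = {f"{cat}-{n:02d}" for cat, size in category_sizes.items() for n in range(1, size + 1)}
    return {
        "total": len(expected),
        "mapped": len(expected & set(mapped)),
        "duplicates": sorted(r for r, count in Counter(mapped).items() if count > 1),
        "missing": sorted(expected - set(mapped)),
        "unknown": sorted(set(mapped) - expected),
        "per_phase": {phase: len(reqs) for phase, reqs in phases.items()},
    }
```

Run against the current table, it reports 39 requirements, each mapped once, with 5, 13, 14, 5, and 2 per phase.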
## Key Decisions & Rationale

| Decision | Rationale | Phase |
|----------|-----------|-------|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. Bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs on-demand); can customize without breaking shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings learned explicitly; avoids false positives and keeps system transparent | Out of Scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |

---
## Key Metrics & Success Indicators

**Per Phase:**
- Phase 1: >99% detection accuracy, zero false positives in testing
- Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
- Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
- Phase 4: Settings persist across restart, multi-server support verified
- Phase 5: <0.1% error rate, all failures logged and alertable

**Overall:**
- Time to MVP (Phases 1-2): 3-4 weeks
- Time to v1 (Phases 1-5): 8+ weeks (Phase 5 starts in week 8)
- Requirements per phase: 2-14 (Phase 5 has 2; Phase 3 has 14)
- Value delivered: Incremental (each phase adds core functionality)

---
## Known Limitations & v2+ Backlog

**Intentionally v2+:**
- Per-server emoji overrides (global dictionary only in v1)
- Reaction-based on-demand translation (slash command placeholder only)
- Analytics dashboard (`/stats`, `/emoji-list`)
- Moderation UI (flag/approve/reject meanings)
- Multi-language emoji meanings
- Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)

---
## Implementation Notes

**Tech Stack:**
- discord.py 2.6.4 (async-first, native slash commands)
- SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
- aiosqlite (async database access)
- emoji 2.11.0 (Unicode 15.0 support)
- pydantic 2.5.0 (data validation)
- Railway Cloud for hosting (free tier for MVP)

**Code Organization:**
- `bot.py` - Main discord.py client, event loop
- `cogs/detection.py` - Message event handler, PluralKit detection
- `cogs/translation.py` - Emoji parsing, dictionary lookup, composition
- `cogs/teaching.py` - `/teach`, `/meaning`, `/correct` commands
- `cogs/config.py` - `/config` command, per-server settings
- `database.py` - SQLAlchemy ORM, async queries
- `logging.py` - Structured JSON logging, retry logic (a top-level module with this name shadows Python's standard `logging` module, so a name such as `bot_logging.py` is safer)

**Testing Strategy:**
- Unit tests: emoji parsing, dictionary lookups, composition
- Integration tests: PluralKit detection with mock webhooks
- Accessibility tests: response length, emoji names for screen readers
- Load tests: 100+ servers, 1000+ emoji, response times

---
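For the structured JSON logs in Phase 5 (the logging module listed under Code Organization), a custom `logging.Formatter` from the standard library is enough to start with. In this sketch, `RotatingFileHandler` provides rotation and context such as `guild_id` is passed through logging's `extra=` argument; the context field list, the `vivi` logger name, and the size limits are assumptions.

```python
import json
import logging
from datetime import datetime, timezone
from logging.handlers import RotatingFileHandler
from typing import IO, Optional

CONTEXT_FIELDS = ("guild_id", "user_id", "emoji", "operation", "duration_ms")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed via logger.info(..., extra={...}) become attributes on the record
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False, default=str)


def configure_logging(path: Optional[str] = None, stream: Optional[IO[str]] = None) -> logging.Logger:
    """Log to a rotating file when a path is given, otherwise to a stream (stderr by default)."""
    handler: logging.Handler
    if path is not None:
        handler = RotatingFileHandler(path, maxBytes=5_000_000, backupCount=5, encoding="utf-8")
    else:
        handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("vivi")
    for old in list(logger.handlers):  # avoid duplicate output if configured twice
        logger.removeHandler(old)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Because the formatter only emits context fields that are present on a record, the same logger serves emoji lookups, PluralKit calls (with `duration_ms`), and errors logged through `logger.exception(...)`, which adds the traceback under `exception`.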
**Roadmap created: 2025-01-29**

**Ready for Phase 1 planning**

**Depth: Quick (5 phases, natural boundaries)**