Vivi-Speech/.planning/ROADMAP.md
Dani B 3e89f67666 docs: create 5-phase roadmap with 33 requirements mapped
- Phase 1 (Foundation): PluralKit detection + database setup
- Phase 2 (Translation Engine): Emoji parsing + auto-translate
- Phase 3 (Teaching System): User commands to learn emoji meanings
- Phase 4 (Configuration): Per-server settings + scaling
- Phase 5 (Polish): Logging + production hardening

100% requirement coverage: all 33 v1 requirements mapped to exactly one phase.
Dependencies identified: Phase 1 → 2 → 3 → 4 → 5

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:08:02 -05:00

# Roadmap: Vivi Speech Translator
**Vision:** Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.
**Phases:** 5 | **Requirements Covered:** 39/39 | **Coverage:** 100% ✓
---
## Phase Overview
| Phase | Name | Goal | Requirements | Success Criteria |
|-------|------|------|--------------|------------------|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate to text with auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 5 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 5 |
---
## Phase Breakdown
### Phase 1: Foundation (Weeks 1-2)
**Goal:** Reliably detect when Vivi is communicating via PluralKit and establish database infrastructure.
**Why First:**
- PluralKit detection is a load-bearing component (foundation for all translation)
- Database setup enables all future phases
- Demonstrates proof of concept that detection works reliably
**Requirements Covered:**
- DETECT-01: Bot detects Vivi via PluralKit webhook
- DETECT-02: Bot ignores non-Vivi messages (no false positives)
- DETECT-03: Bot works in channels and DMs
- DB-01: Global emoji dictionary persists
- GEN-01: Architecture supports multiple systems (not Vivi-only hardcoded)
**Success Criteria:**
1. Bot reliably logs "Vivi detected" when she sends a test message with PluralKit webhook
2. Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
3. Database (SQLite for MVP) initializes on bot startup without errors
4. Emoji dictionary table and server configuration table created and readable
**Key Implementation Tasks:**
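- Set up discord.py client with the intents detection needs (`guilds`, `messages`, and the privileged `message_content`; `members` only if member lookups are required)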
- Implement webhook detection (check message.webhook_id)
- Implement PluralKit API query to verify member_id matches Vivi's ID
- Initialize SQLite database with emoji_dictionary and server_config tables
- Test edge cases: Vivi's message edits, reactions, message deletes, DM context
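
The database initialization task above can be sketched as follows. This is a minimal illustration using the standard library `sqlite3` module so it runs without extra dependencies; the roadmap's aiosqlite exposes a similar API with `await`. The column names other than `guild_id` and `auto_translate` are assumptions, not a confirmed schema.

```python
import sqlite3

# Assumed schema: only the table names, guild_id, and auto_translate
# come from the roadmap. The other columns are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji      TEXT PRIMARY KEY,
    meaning    TEXT NOT NULL,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS server_config (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 1
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create both tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```

Because every statement uses `IF NOT EXISTS`, calling `init_db` on each bot startup is safe and leaves existing rows untouched.
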
**Architecture Notes:**
- Use aiosqlite for async database access (MVP)
- Cache PluralKit member list with 1-hour TTL to reduce API calls
- Verify the `signing_token` included in PluralKit dispatch webhook payloads where dispatch is used
- Design message handler as a cog (modular for future systems)
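
The 1-hour member-list cache in the notes above could look roughly like this. It is a sketch only: `TTLCache` is a hypothetical helper name, and the injectable `clock` parameter is an assumption added to make expiry testable without waiting.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Holds one value (for example the PluralKit member list) for a fixed time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached value, or None if nothing is cached or it expired."""
        if self._expires_at is not None and self._clock() < self._expires_at:
            return self._value
        return None

    def set(self, value: Any) -> None:
        """Store a value and restart the expiry timer."""
        self._value = value
        self._expires_at = self._clock() + self._ttl
```

A caller would try `cache.get()` first and only query the PluralKit API (then `cache.set(...)`) when it returns `None`.
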
**Risks & Mitigations:**
- Risk: PluralKit API rate limits during initialization (10/sec)
- Mitigation: Implement webhook dispatch instead of polling; cache member list
- Risk: Message content intent denied by Discord
- Mitigation: Design for slash commands as primary path; message content as optional enhancement
**Dependencies:** None (Phase 1 is the foundation)
---
### Phase 2: Translation Engine (Weeks 3-4)
**Goal:** Parse emoji sequences from Vivi's messages and translate to text automatically.
**Why Second:**
- Builds on Phase 1's detection foundation (no detection = nothing to translate)
- Delivers core user value: Vivi posts, others understand
- Unblocks Phase 3 (teaching system requires working translation to be useful)
- Enables MVP validation with early users
**Requirements Covered:**
- TRANS-01: Parse standard emojis (😷, ❌, 2⃣)
- TRANS-02: Parse custom Discord emojis (`:me1:`, etc.)
- TRANS-03: Translate and reply automatically (auto mode)
- TRANS-04: Read left-to-right composition (preserve emoji order)
- TRANS-05: Plain text responses only (no emoji-only translations)
- TRANS-06: Concise, clear responses (respects message length limits)
- DETECT-04: Reliably detect edits and reactions on Vivi messages
- UNK-01: Skip unknown emojis (only translate known ones)
- ERROR-01: React with ❌ on translation errors (graceful failure)
- CONFIG-02: Default mode is auto (bot translates every Vivi message)
- A11Y-01: Plain text accessible format (no heavy formatting)
- A11Y-04: Use slash commands (better keyboard navigation than buttons)
- DB-02: Emoji meanings shared across all Discord servers (one global dictionary)
**Success Criteria:**
1. Bot translates `:me1:😷 2⃣` sequence to "Vivi is sick" and shows which emoji are unknown
2. Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
3. Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
4. Unknown emojis are skipped with gentle prompt to teach ("Unknown emoji; run `/teach emoji meaning`")
5. If database query fails, bot reacts with ❌ instead of crashing or spamming errors
**Key Implementation Tasks:**
- Use `emoji` library 2.11.0 for standard emoji parsing (Unicode 15.0 support)
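- Regex pattern for custom Discord emoji detection (typed as `:emoji_name:`, but delivered in message content as `<:emoji_name:id>`, or `<a:emoji_name:id>` when animated)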
- Dictionary lookup function (emoji → meaning) with O(1) query time
- Translation composition function (assemble meanings into natural language)
- Error handler that reacts with ❌ and logs error
- Message edit detection (track message ID, compare old vs new emoji)
- Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors
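
As an illustration of the custom emoji task above, here is a minimal sketch of extracting custom emoji from raw message content in left-to-right order. The names `CUSTOM_EMOJI_RE`, `CustomEmoji`, and `find_custom_emoji` are hypothetical; standard Unicode emoji would still go through the `emoji` library and are not handled here.

```python
import re
from typing import List, NamedTuple

# Custom emoji arrive in message content as <:name:id> or <a:name:id>.
CUSTOM_EMOJI_RE = re.compile(r"<(a?):([A-Za-z0-9_]{2,32}):(\d+)>")


class CustomEmoji(NamedTuple):
    name: str
    emoji_id: int
    animated: bool


def find_custom_emoji(content: str) -> List[CustomEmoji]:
    """Return custom emoji in the order they appear in the message."""
    return [
        CustomEmoji(m.group(2), int(m.group(3)), m.group(1) == "a")
        for m in CUSTOM_EMOJI_RE.finditer(content)
    ]
```

Keying the dictionary on the emoji ID rather than the name would survive renames, at the cost of tying a meaning to one server's emoji; which key to use is a design choice the roadmap leaves open.
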
**Architecture Notes:**
- Translation engine should be stateless (all state in database)
- Compose natural language by concatenating meanings when simple, or identifying patterns (subject + descriptor + descriptor)
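- Handle emoji variants consistently (for example, treat an emoji with and without the U+FE0F variation selector as the same dictionary key; Unicode normalization alone does not remove variation selectors)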
- Cache frequently-translated emoji in memory (optional for Phase 2; may add in Phase 4)
- Response format: "Vivi says: [translation]" for clarity
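
The simplest composition strategy described above (concatenate known meanings in order, skip unknowns) might be sketched like this. The function name `compose_translation` and the sample meanings are assumptions for illustration; pattern-based composition is not shown.

```python
from typing import Dict, List, Optional, Tuple


def compose_translation(
    tokens: List[str], dictionary: Dict[str, str]
) -> Tuple[Optional[str], List[str]]:
    """Join known meanings left to right; collect unknown emoji separately.

    Returns (reply, unknown). reply is None when nothing could be translated.
    """
    known = [dictionary[t] for t in tokens if t in dictionary]
    unknown = [t for t in tokens if t not in dictionary]
    if not known:
        return None, unknown
    return "Vivi says: " + " ".join(known), unknown


# Hypothetical dictionary entries, for illustration only.
sample = {"me1": "I", "\U0001F637": "am sick"}
reply, unknown = compose_translation(["me1", "\U0001F637", "\U0001F3AD"], sample)
```

The function takes the dictionary as an argument and keeps no state of its own, which matches the stateless-engine note: all persistent state stays in the database.
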
**Risks & Mitigations:**
- Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
- Mitigation: Comprehensive test suite with Unicode 15.0 samples; use emoji library instead of manual regex
- Risk: Bot floods channel with translations (rate limiting)
- Mitigation: Track requests per guild; implement cooldown if needed (Phase 4+)
- Risk: Message edits race condition (bot editing response while Vivi edits message)
- Mitigation: Post new translation instead of editing; queue requests with small delay if race detected
**Dependencies:** Phase 1 (detection + database)
---
### Phase 3: Teaching System (Weeks 5-6)
**Goal:** Enable users to teach the bot new emoji meanings via simple commands.
**Why Third:**
- Requires working translation from Phase 2 (teaching emoji that don't translate yet is the whole point)
- Enables the learning system (core differentiator: bot becomes more useful as community teaches it)
- Makes bot valuable to Vivi's community: they drive growth, not predefined dictionary
- Addresses dysgraphia accessibility: teaching interface must be ultra-simple
**Requirements Covered:**
- TEACH-01: `/teach emoji "meaning"` command
- TEACH-02: Bot confirms what it learned (shows emoji and meaning)
- TEACH-03: Meanings stored globally (shared across all servers)
- TEACH-04: `/meaning emoji` query command
- TEACH-05: `/correct emoji "new meaning"` update command
- TEACH-06: Audit trail (who, what, when for all changes)
- TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
- UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
- UNK-03: Teaching prompt is accessible (simple command: `/teach emoji "meaning"`)
- A11Y-02: Simple command syntax (one-liner, no complex structure)
- A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
- A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
- DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
- GEN-03: Modular code for other systems (teaching system not Vivi-specific)
**Success Criteria:**
1. User runs `/teach 🎭 "happy performance"`, bot confirms "Taught: 🎭 = happy performance" with emoji shown
2. Next time Vivi uses 🎭, bot translates it correctly in the translation engine (cache or DB lookup)
3. User runs `/meaning 🎭`, bot replies with current meaning (or "Unknown emoji")
4. User runs `/correct 🎭 "joyful"`, meaning updates and audit trail records who changed it and when
5. `/teach` command fails gracefully if emoji is invalid, meaning is empty, or meaning is too long (with helpful error message)
**Key Implementation Tasks:**
- Discord.py slash command handler for `/teach emoji "meaning"`
- Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
- Database insert into emoji_dictionary with user_id and timestamp
- `/meaning emoji` query command with error handling
- `/correct emoji "new meaning"` update command with audit trail
- Confirmation messages that show emoji visually (e.g., "🎭" not "PERFORMING_ARTS")
- Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
- Permission checks: log who taught what, but don't restrict based on role (accessibility)
- Test that emojis can be taught, corrected, queried in rapid succession
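
The validation and audit-trail tasks above can be sketched together. This is a rough illustration with the standard library `sqlite3` module rather than aiosqlite; the `teach` helper, the `audit_trail` table name, and keying rows by the emoji string instead of an `emoji_id` are assumptions.

```python
import sqlite3

MAX_MEANING_LENGTH = 200

# Assumed schema; the audit columns follow the list in the tasks above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji   TEXT PRIMARY KEY,
    meaning TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS audit_trail (
    id          INTEGER PRIMARY KEY,
    emoji       TEXT NOT NULL,
    old_meaning TEXT,
    new_meaning TEXT NOT NULL,
    user_id     INTEGER NOT NULL,
    action_type TEXT NOT NULL,
    timestamp   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def validate_meaning(raw: str) -> str:
    """Trim the meaning and enforce the 1-200 character rule."""
    meaning = raw.strip()
    if not meaning:
        raise ValueError("Meaning is empty.")
    if len(meaning) > MAX_MEANING_LENGTH:
        raise ValueError(f"Meaning is longer than {MAX_MEANING_LENGTH} characters.")
    return meaning


def teach(conn: sqlite3.Connection, emoji: str, raw_meaning: str, user_id: int) -> str:
    """Store a meaning and record who changed it, in one transaction."""
    meaning = validate_meaning(raw_meaning)
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT meaning FROM emoji_dictionary WHERE emoji = ?", (emoji,)
        ).fetchone()
        old_meaning = row[0] if row else None
        conn.execute(
            "INSERT OR REPLACE INTO emoji_dictionary (emoji, meaning) VALUES (?, ?)",
            (emoji, meaning),
        )
        conn.execute(
            "INSERT INTO audit_trail (emoji, old_meaning, new_meaning, user_id, action_type)"
            " VALUES (?, ?, ?, ?, ?)",
            (emoji, old_meaning, meaning, user_id,
             "teach" if old_meaning is None else "correct"),
        )
    return f"Taught: {emoji} = {meaning}"
```

Writing the dictionary row and the audit row inside the same transaction means a failed write never leaves a change without a record of who made it.
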
**Architecture Notes:**
- Teaching commands should be accessible to all users (not admin-only; logged for accountability)
- Confirmation should always show the emoji visually (immediate visual feedback)
- Responses must be short (<2 sentences) for dysgraphia accessibility
- Audit trail enables future `/undo` feature (deferred to v2)
- Global dictionary shared across servers; per-server overrides deferred to Phase 4+
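
For the screen-reader requirement (A11Y-05), one possible way to derive a readable name for a standard emoji is the standard library `unicodedata` module. This is a sketch under that assumption; `emoji_name` is a hypothetical helper, and custom Discord emoji would use their own names instead.

```python
import unicodedata

# Code points that join or style emoji but carry no meaning of their own.
_INVISIBLE = {"\ufe0f", "\u200d"}  # variation selector-16, zero width joiner


def emoji_name(emoji: str) -> str:
    """Return a lowercase Unicode name for each visible code point."""
    names = [
        unicodedata.name(ch, "unknown").lower()
        for ch in emoji
        if ch not in _INVISIBLE
    ]
    return " + ".join(names)
```

A confirmation could then show both forms, for example the emoji itself followed by its name in parentheses, so sighted users and screen-reader users get the same information.
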
**Risks & Mitigations:**
- Risk: Users teach wrong/inappropriate meanings (spam, trolling)
- Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
- Risk: Teaching interface too complex for Vivi (with dysgraphia)
- Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
- Risk: Conflicting emoji meanings across servers
- Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides
**Dependencies:** Phase 2 (translation must work so teaching has value)
---
### Phase 4: Configuration & Scaling (Week 7)
**Goal:** Add per-server settings and prepare for scaling.
**Why Fourth:**
- Builds on Phases 1-3 foundation (all core features working)
- Enables server customization without breaking global emoji dictionary
- Prepares architecture for multi-server scaling and future systems
- Adds operator controls (config commands)
**Requirements Covered:**
- CONFIG-01: `/config auto-translate on|off` toggle per server
- CONFIG-03: Per-server persistence (setting survives bot restart)
- CONFIG-04: Admin-only changes (permission check on `/config` command)
- DB-03: Per-server config table
- GEN-02: Architecture supports per-system overrides (design for future multi-system)
**Success Criteria:**
1. Server admin runs `/config auto-translate off`, and future Vivi messages in that server are not auto-translated (the bot stays silent unless asked)
2. Another server has auto enabled; both work independently (no crosstalk)
3. Setting persists across bot restart (database query returns correct value)
4. Non-admin user runs `/config auto-translate on`, bot rejects with "Admin only" message
5. Default for new servers is auto mode enabled (true by default in code)
**Key Implementation Tasks:**
- `/config auto-translate <on|off>` slash command
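- Permission check (admin-only; use discord.py's `app_commands.default_permissions` to set Discord's `default_member_permissions`, plus a runtime `app_commands.checks.has_permissions` check so non-admins receive the "Admin only" reply)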
- server_config table update/insert (guild_id as PK, auto_translate as boolean)
- Modify message handler to check per-guild setting before auto-translating
- Implement on-demand mode placeholder (manual translation via `/translate` command; full reaction-based mode deferred to v2)
- Cache per-server settings for performance (1-minute TTL)
- Test: change setting, verify immediate effect; verify effect persists after restart
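
The per-guild setting with an auto-on default could be stored and read roughly as below. The sketch uses the standard library `sqlite3` module instead of aiosqlite, and the helper names are assumptions; only `server_config`, `guild_id`, and `auto_translate` come from the tasks above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS server_config (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 1
);
"""


def get_auto_translate(conn: sqlite3.Connection, guild_id: int) -> bool:
    """Return the guild's setting; guilds with no row default to auto mode."""
    row = conn.execute(
        "SELECT auto_translate FROM server_config WHERE guild_id = ?", (guild_id,)
    ).fetchone()
    return True if row is None else bool(row[0])


def set_auto_translate(conn: sqlite3.Connection, guild_id: int, enabled: bool) -> None:
    """Insert or overwrite the guild's setting."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO server_config (guild_id, auto_translate) VALUES (?, ?)",
            (guild_id, int(enabled)),
        )
```

Treating a missing row as "on" keeps the default in code, as success criterion 5 describes, and avoids writing a row for every server the bot joins.
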
**Architecture Notes:**
- On-demand mode: bot only replies if explicitly requested (stored in DB but not fully implemented in v1)
- Per-server config indexed by guild_id for O(1) lookup
- Cache server settings in memory with TTL to avoid DB hammering
- Design allows per-system overrides in future (deferred to v2)
**Risks & Mitigations:**
- Risk: Per-server override conflicts if not careful
- Mitigation: v2 design allows per-system overrides; v1 focuses on global dictionary with server-level auto/on-demand toggle
- Risk: Config command is confusing to admins
- Mitigation: Clear help text; only two options (on/off); simple feedback message
**Dependencies:** Phases 1-3 (all features working; config customizes them)
---
### Phase 5: Production Polish (Week 8+)
**Goal:** Production hardening, logging, error handling, and monitoring.
**Why Last:**
- Follows all feature phases (features must be complete before hardening)
- Improves reliability and debuggability (enables diagnosis of issues in production)
- Prepares for public adoption by other systems or larger communities
**Requirements Covered:**
- ERROR-02: Bot logging for debugging (structured JSON logs)
- ERROR-03: Exponential backoff for failed DB operations and API calls
**Success Criteria:**
1. Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
2. Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
3. Bot retries failed database operations (same backoff strategy; max 5 attempts)
4. Unhandled exceptions caught and logged (no spam in user channels; clear error reaction)
5. All error logs include context (guild_id, user_id, emoji attempted, operation type)
**Key Implementation Tasks:**
- Set up Python logging with JSON format (structlog or custom JSON formatter)
- Implement retry logic with asyncio.sleep backoff for PluralKit API calls
- Implement retry logic with backoff for database queries
- Add global exception handlers (`Client.on_error` for events, `CommandTree.on_error` for slash commands in discord.py)
- Comprehensive error documentation (what errors mean, how to diagnose)
- Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
- Optional: Sentry integration for error tracking (recommended but not required for v1)
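
The retry tasks above can be illustrated with a small async helper. It is a sketch, not the project's implementation: `retry_with_backoff` is a hypothetical name, the 10% jitter range is an assumption, and the `sleep` parameter exists only so tests can skip real waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run operation, waiting 1s, 2s, 4s, 8s (plus jitter) between failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:  # in real code, catch only retryable errors
            if attempt == max_attempts:
                raise  # give up: let the caller log and react
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # Up to 10% extra wait so many clients do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: loop returns or raises")
```

With the defaults, five attempts produce four waits of roughly 1s, 2s, 4s, and 8s. Since the helper awaits `asyncio.sleep`, other handlers keep running while one operation is backing off.
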
**Architecture Notes:**
- Async error handling must not block the event loop
- Retry logic should use exponential backoff with jitter (avoid thundering herd)
- Logs should include PluralKit request context (duration, status code, member_id) for debugging
- Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps
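
Structured JSON logging with per-record context might be set up along these lines. This sketch uses a custom formatter on the standard `logging` module (one of the two options the tasks mention); the `context` attribute passed via `extra` is an assumed convention, not an existing API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Assumed convention: callers pass extra={"context": {...}}.
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload, ensure_ascii=False)


def build_logger(name: str = "vivi") -> logging.Logger:
    """Attach the JSON formatter to a stream handler (a file handler works the same way)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `logger.error("lookup failed", extra={"context": {"guild_id": 123, "operation": "translate"}})` would then carry the guild and operation in every error line. Swapping `StreamHandler` for `logging.handlers.RotatingFileHandler` would cover the rotation requirement.
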
**Risks & Mitigations:**
- Risk: Logging overhead impacts performance
- Mitigation: Async logging; batch writes; keep high-volume entries such as per-emoji lookups at DEBUG level or sample them so they do not flood production logs
- Risk: Backoff strategy causes noticeable delays when DB is down
- Mitigation: Set reasonable max wait (8s); fail fast if max attempts exceeded; user sees ❌ reaction quickly
**Dependencies:** Phases 1-4 (all features working; Phase 5 hardens them)
---
## Dependency Chain
```
Phase 1: Foundation (Discord client, PluralKit detection, database)
Phase 2: Translation (emoji parsing, lookups, auto-translation)
Phase 3: Teaching (commands to add/update meanings; audit trail)
Phase 4: Configuration (per-server auto/on-demand toggle)
Phase 5: Polish (logging, retry logic, production hardening)
```
**Critical Path:** Phase 1 → Phase 2 → Phase 3 (core value)
**Optional Path:** Phase 4 (customization), Phase 5 (hardening)
---
## Requirement Coverage Summary
**Total v1 Requirements:** 39
**Mapped to Phases:** 39
**Unmapped:** 0 ✓
**Coverage by Category:**
- Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
- Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
- Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
- Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
- Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
- Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
- Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
- Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
- Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4
---
## Key Decisions & Rationale
| Decision | Rationale | Phase |
|----------|-----------|-------|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. Bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs on-demand); can customize without breaking shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings learned explicitly; avoids false positives and keeps system transparent | Out of Scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |
---
## Key Metrics & Success Indicators
**Per Phase:**
- Phase 1: >99% detection accuracy, zero false positives in testing
- Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
- Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
- Phase 4: Settings persist across restart, multi-server support verified
- Phase 5: <0.1% error rate, all failures logged and alertable
**Overall:**
- Time to MVP (Phases 1-2): 4 weeks
- Time to v1 (Phases 1-5): 8+ weeks
- Requirements per phase: 2-14 (Phases 2 and 3 carry the most)
- Value delivered: Incremental (each phase adds core functionality)
---
## Known Limitations & v2+ Backlog
**Intentionally v2+:**
- Per-server emoji overrides (global dictionary only in v1)
- Reaction-based on-demand translation (slash command placeholder only)
- Analytics dashboard (`/stats`, `/emoji-list`)
- Moderation UI (flag/approve/reject meanings)
- Multi-language emoji meanings
- Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)
---
## Implementation Notes
**Tech Stack:**
- discord.py 2.6.4 (async-first, native slash commands)
- SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
- aiosqlite (async database access)
- emoji 2.11.0 (Unicode 15.0 support)
- pydantic 2.5.0 (data validation)
- Railway Cloud for hosting (free tier for MVP)
**Code Organization:**
- `bot.py` - Main discord.py client, event loop
- `cogs/detection.py` - Message event handler, PluralKit detection
- `cogs/translation.py` - Emoji parsing, dictionary lookup, composition
- `cogs/teaching.py` - `/teach`, `/meaning`, `/correct` commands
- `cogs/config.py` - `/config` command, per-server settings
- `database.py` - aiosqlite connection, schema setup, async queries
- `bot_logging.py` - Structured JSON logging, retry logic (not `logging.py`, which would shadow the standard library `logging` module)
**Testing Strategy:**
- Unit tests: emoji parsing, dictionary lookups, composition
- Integration tests: PluralKit detection with mock webhooks
- Accessibility tests: response length, emoji names for screen readers
- Load tests: 100+ servers, 1000+ emoji, response times
---
**Roadmap created: 2026-01-29**
**Ready for Phase 1 planning**
**Depth: Quick (5 phases, natural boundaries)**