Vivi-Speech/.planning/ROADMAP.md

# Roadmap: Vivi Speech Translator

**Vision:** Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.

**Phases:** 5 | **Requirements Covered:** 39/39 | **Coverage:** 100% ✓

---

## Phase Overview

| Phase | Name | Goal | Requirements | Success Criteria |
|-------|------|------|--------------|------------------|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate to text with auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 5 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 5 |

---

## Phase Breakdown

### Phase 1: Foundation (Weeks 1-2)

**Goal:** Reliably detect when Vivi is communicating via PluralKit and establish database infrastructure.

**Why First:**
- PluralKit detection is a load-bearing component (foundation for all translation)
- Database setup enables all future phases
- Demonstrates proof of concept that detection works reliably

**Requirements Covered:**
- DETECT-01: Bot detects Vivi via PluralKit webhook
- DETECT-02: Bot ignores non-Vivi messages (no false positives)
- DETECT-03: Bot works in channels and DMs
- DB-01: Global emoji dictionary persists
- GEN-01: Architecture supports multiple systems (not Vivi-only hardcoded)

**Success Criteria:**

1. Bot reliably logs "Vivi detected" when she sends a test message proxied through a PluralKit webhook
2. Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
3. Database (SQLite for MVP) initializes on bot startup without errors
4. Emoji dictionary table and server configuration table created and readable

**Key Implementation Tasks:**
- Set up discord.py client with the required intents (guilds, guild messages, and message content for reading emoji; members only if member lookups are needed)
- Implement webhook detection (check message.webhook_id)
- Implement PluralKit API query to verify member_id matches Vivi's ID
- Initialize SQLite database with emoji_dictionary and server_config tables
- Test edge cases: Vivi's message edits, reactions, message deletes, DM context
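
The database initialization task above could look roughly like the sketch below. It uses the standard-library `sqlite3` module so it runs on its own; the bot itself would do the same through aiosqlite. The column names (`taught_by`, `updated_at`) and the helper names are assumptions for illustration, not a settled schema.

```python
import sqlite3

# Hypothetical schema: table names come from the roadmap, columns are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji      TEXT PRIMARY KEY,              -- Unicode emoji or custom emoji name
    meaning    TEXT NOT NULL,
    taught_by  INTEGER,                       -- Discord user ID (assumed column)
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS server_config (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 1 -- 1 = auto mode enabled
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create both tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables, sorted, to confirm they are readable."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Because every statement uses `IF NOT EXISTS`, calling `init_db` on each bot startup is safe and leaves existing data untouched.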

**Architecture Notes:**
- Use aiosqlite for async database access (MVP)
- Cache PluralKit member list with 1-hour TTL to reduce API calls
- If PluralKit dispatch webhooks are used, verify the signing token included in each dispatch payload before trusting it
- Design message handler as a cog (modular for future systems)
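
As a rough illustration of the 1-hour member-list cache mentioned above, here is a minimal single-value TTL cache. The class name `TTLCache` and the injectable `clock` parameter are assumptions made for this sketch; the clock is injectable only so expiry can be tested without waiting.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Holds one value and forgets it once the TTL has elapsed."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at = 0.0

    def get(self) -> Optional[T]:
        """Return the cached value, or None if nothing is cached or it expired."""
        if self._value is not None and self._clock() < self._expires_at:
            return self._value
        return None

    def set(self, value: T) -> None:
        """Store a value and restart the TTL countdown."""
        self._value = value
        self._expires_at = self._clock() + self._ttl
```

A caller would try `cache.get()` first and only query the PluralKit API (then call `cache.set(...)`) when it returns `None`.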

**Risks & Mitigations:**
- Risk: PluralKit API rate limits during initialization (10/sec)
  - Mitigation: Implement webhook dispatch instead of polling; cache member list
- Risk: Message content intent denied by Discord
  - Mitigation: Design for slash commands as primary path; message content as optional enhancement

**Dependencies:** None (Phase 1 is the foundation)

---

### Phase 2: Translation Engine (Weeks 3-4)

**Goal:** Parse emoji sequences from Vivi's messages and translate to text automatically.

**Why Second:**
- Builds on Phase 1's detection foundation (no detection = nothing to translate)
- Delivers core user value: Vivi posts, others understand
- Unblocks Phase 3 (teaching system requires working translation to be useful)
- Enables MVP validation with early users

**Requirements Covered:**
- TRANS-01: Parse standard emojis (😷, ❌, 2️⃣)
- TRANS-02: Parse custom Discord emojis (`:me1:`, etc.)
- TRANS-03: Translate and reply automatically (auto mode)
- TRANS-04: Read left-to-right composition (preserve emoji order)
- TRANS-05: Plain text responses only (no emoji-only translations)
- TRANS-06: Concise, clear responses (respects message length limits)
- DETECT-04: Reliably detect edits and reactions on Vivi messages
- UNK-01: Skip unknown emojis (only translate known ones)
- ERROR-01: React with ❌ on translation errors (graceful failure)
- CONFIG-02: Default mode is auto (bot translates every Vivi message)
- A11Y-01: Plain text accessible format (no heavy formatting)
- A11Y-04: Use slash commands (better keyboard navigation than buttons)
- DB-02: Emoji meanings shared across all Discord servers (one global dictionary)

**Success Criteria:**

1. Bot translates `:me1:😷 2️⃣` sequence to "Vivi is sick" and shows which emoji are unknown
2. Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
3. Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
4. Unknown emojis are skipped with a gentle prompt to teach ("Unknown emoji; run `/teach emoji meaning`")
5. If database query fails, bot reacts with ❌ instead of crashing or spamming errors

**Key Implementation Tasks:**
- Use `emoji` library 2.11.0 for standard emoji parsing (Unicode 15.0 support)
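- Regex pattern for custom Discord emoji detection (typed as `:emoji_name:`, but delivered in message content as `<:emoji_name:id>`, or `<a:emoji_name:id>` when animated)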
- Dictionary lookup function (emoji → meaning) with O(1) query time
- Translation composition function (assemble meanings into natural language)
- Error handler that reacts with ❌ and logs error
- Message edit detection (track message ID, compare old vs new emoji)
- Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors
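
For the custom-emoji task above, note that raw message content carries custom emoji as `<:name:id>` (or `<a:name:id>` for animated ones), not as the `:name:` form users type. A minimal extraction sketch follows; `CustomEmoji` and `find_custom_emoji` are illustrative names, and standard Unicode emoji would still go through the `emoji` library.

```python
import re
from typing import List, NamedTuple

# Matches <:name:id> and <a:name:id> as they appear in raw message content.
CUSTOM_EMOJI = re.compile(r"<(a?):([A-Za-z0-9_]+):(\d+)>")


class CustomEmoji(NamedTuple):
    name: str
    emoji_id: int
    animated: bool


def find_custom_emoji(content: str) -> List[CustomEmoji]:
    """Return custom emoji in left-to-right order of appearance."""
    return [
        CustomEmoji(m.group(2), int(m.group(3)), m.group(1) == "a")
        for m in CUSTOM_EMOJI.finditer(content)
    ]
```

`finditer` scans left to right, so the result preserves the emoji order that TRANS-04 requires.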

**Architecture Notes:**
- Translation engine should be stateless (all state in database)
- Compose natural language by concatenating meanings when simple, or identifying patterns (subject + descriptor + descriptor)
- Handle emoji variants correctly (normalize input with NFD if needed)
- Cache frequently-translated emoji in memory (optional for Phase 2; may add in Phase 4)
- Response format: "Vivi says: [translation]" for clarity
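
The simplest composition strategy described above (concatenate known meanings, skip unknowns, prefix with the speaker) might be sketched as follows. The function name `translate`, the token list input, and the sample meanings are assumptions; pattern-based composition would replace the plain join.

```python
from typing import Dict, List, Tuple


def translate(tokens: List[str], dictionary: Dict[str, str],
              speaker: str = "Vivi") -> Tuple[str, List[str]]:
    """Join known meanings in order; collect unknown tokens separately.

    Returns (response_text, unknown_tokens). response_text is empty when
    no token has a known meaning, so the caller can stay silent.
    """
    known: List[str] = []
    unknown: List[str] = []
    for token in tokens:
        meaning = dictionary.get(token)
        if meaning is None:
            if token not in unknown:  # report each unknown emoji once
                unknown.append(token)
        else:
            known.append(meaning)
    if not known:
        return "", unknown
    return f"{speaker} says: {' '.join(known)}", unknown
```

The function holds no state of its own, which matches the stateless-engine note: the dictionary is passed in from the database layer on every call.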

**Risks & Mitigations:**
- Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
  - Mitigation: Comprehensive test suite with Unicode 15.0 samples; use emoji library instead of manual regex
- Risk: Bot floods channel with translations (rate limiting)
  - Mitigation: Track requests per guild; implement cooldown if needed (Phase 4+)
- Risk: Message edits race condition (bot editing response while Vivi edits message)
  - Mitigation: Post new translation instead of editing; queue requests with small delay if race detected

**Dependencies:** Phase 1 (detection + database)

---

### Phase 3: Teaching System (Weeks 5-6)

**Goal:** Enable users to teach the bot new emoji meanings via simple commands.

**Why Third:**
- Requires working translation from Phase 2 (teaching fills the gaps that translation exposes)
- Enables the learning system (core differentiator: bot becomes more useful as community teaches it)
- Makes the bot valuable to Vivi's community: the community drives the dictionary's growth, rather than a predefined word list
- Addresses dysgraphia accessibility: teaching interface must be ultra-simple

**Requirements Covered:**
- TEACH-01: `/teach emoji "meaning"` command
- TEACH-02: Bot confirms what it learned (shows emoji and meaning)
- TEACH-03: Meanings stored globally (shared across all servers)
- TEACH-04: `/meaning emoji` query command
- TEACH-05: `/correct emoji "new meaning"` update command
- TEACH-06: Audit trail (who, what, when for all changes)
- TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
- UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
- UNK-03: Teaching prompt is accessible (simple command: `/teach emoji "meaning"`)
- A11Y-02: Simple command syntax (one-liner, no complex structure)
- A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
- A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
- DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
- GEN-03: Modular code for other systems (teaching system not Vivi-specific)

**Success Criteria:**

1. User runs `/teach 🎭 "happy performance"`, bot confirms "Taught: 🎭 = happy performance" with emoji shown
2. Next time Vivi uses 🎭, bot translates it correctly in the translation engine (cache or DB lookup)
3. User runs `/meaning 🎭`, bot replies with current meaning (or "Unknown emoji")
4. User runs `/correct 🎭 "joyful"`, meaning updates and audit trail records who changed it and when
5. `/teach` command fails gracefully if emoji is invalid, meaning is empty, or meaning is too long (with helpful error message)

**Key Implementation Tasks:**
- discord.py slash command handler for `/teach emoji "meaning"`
- Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
- Database insert into emoji_dictionary with user_id and timestamp
- `/meaning emoji` query command with error handling
- `/correct emoji "new meaning"` update command with audit trail
- Confirmation messages that show the emoji visually, with its name alongside for screen readers (e.g., "🎭 (performing arts)" rather than only "PERFORMING_ARTS")
- Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
- Permission checks: log who taught what, but don't restrict based on role (accessibility)
- Test that emojis can be taught, corrected, queried in rapid succession
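
The meaning-length rule from the validation task above (1-200 characters) could be checked with something like this sketch. `validate_meaning` is a hypothetical helper, the error wording is only a placeholder, and the emoji-existence and punctuation checks are left out.

```python
MAX_MEANING_LENGTH = 200  # upper bound taken from the validation task above


def validate_meaning(raw: str) -> str:
    """Trim and collapse whitespace, then enforce the 1-200 character rule.

    Raises ValueError with a short, user-facing message on invalid input.
    """
    meaning = " ".join(raw.split())
    if not meaning:
        raise ValueError('Meaning is empty. Try: /teach emoji "meaning"')
    if len(meaning) > MAX_MEANING_LENGTH:
        raise ValueError(
            f"Meaning is too long (max {MAX_MEANING_LENGTH} characters)."
        )
    return meaning
```

The slash command handler would catch `ValueError` and send its message back to the user, which keeps the error text short enough for the accessibility requirements.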

**Architecture Notes:**
- Teaching commands should be accessible to all users (not admin-only; logged for accountability)
- Confirmation should always show the emoji visually (immediate visual feedback)
- Responses must be short (<2 sentences) for dysgraphia accessibility
- Audit trail enables future `/undo` feature (deferred to v2)
- Global dictionary shared across servers; per-server overrides deferred to Phase 4+
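
One way to keep the dictionary and the audit trail consistent is to write both in a single transaction, as in the sketch below. It uses standard-library `sqlite3` for runnability and keys rows by the emoji text rather than a separate `emoji_id`; both choices, along with the function names, are assumptions.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the two tables this sketch needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emoji_dictionary (
            emoji   TEXT PRIMARY KEY,
            meaning TEXT NOT NULL
        );
        CREATE TABLE audit_trail (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            emoji       TEXT NOT NULL,
            old_meaning TEXT,
            new_meaning TEXT NOT NULL,
            user_id     INTEGER NOT NULL,
            action_type TEXT NOT NULL,
            timestamp   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
    """)
    return conn


def set_meaning(conn: sqlite3.Connection, emoji: str,
                meaning: str, user_id: int) -> str:
    """Store a meaning and record the change; returns 'teach' or 'correct'."""
    with conn:  # commits both writes together, or rolls both back on error
        row = conn.execute(
            "SELECT meaning FROM emoji_dictionary WHERE emoji = ?", (emoji,)
        ).fetchone()
        old_meaning = row[0] if row else None
        action = "correct" if row else "teach"
        conn.execute(
            "INSERT OR REPLACE INTO emoji_dictionary (emoji, meaning) "
            "VALUES (?, ?)",
            (emoji, meaning),
        )
        conn.execute(
            "INSERT INTO audit_trail "
            "(emoji, old_meaning, new_meaning, user_id, action_type) "
            "VALUES (?, ?, ?, ?, ?)",
            (emoji, old_meaning, meaning, user_id, action),
        )
    return action
```

Storing `old_meaning` on every change is what would let a later `/undo` restore the previous value.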

**Risks & Mitigations:**
- Risk: Users teach wrong/inappropriate meanings (spam, trolling)
  - Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
- Risk: Teaching interface too complex for Vivi (with dysgraphia)
  - Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
- Risk: Conflicting emoji meanings across servers
  - Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides

**Dependencies:** Phase 2 (translation must work so teaching has value)

---

### Phase 4: Configuration & Scaling (Week 7)

**Goal:** Add per-server settings and prepare for scaling.

**Why Fourth:**
- Builds on Phases 1-3 foundation (all core features working)
- Enables server customization without breaking global emoji dictionary
- Prepares architecture for multi-server scaling and future systems
- Adds operator controls (config commands)

**Requirements Covered:**
- CONFIG-01: `/config auto-translate on|off` toggle per server
- CONFIG-03: Per-server persistence (setting survives bot restart)
- CONFIG-04: Admin-only changes (permission check on `/config` command)
- DB-03: Per-server config table
- GEN-02: Architecture supports per-system overrides (design for future multi-system)

**Success Criteria:**

1. Server admin runs `/config auto-translate off`; future Vivi messages in that server are not auto-translated (the bot stays silent unless asked)
2. Another server has auto enabled; both work independently (no crosstalk)
3. Setting persists across bot restart (database query returns correct value)
4. Non-admin user runs `/config auto-translate on`, bot rejects with "Admin only" message
5. Default for new servers is auto mode enabled (true by default in code)

**Key Implementation Tasks:**
- `/config auto-translate <on|off>` slash command
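- Permission check (admin-only; set the command's default member permissions via discord.py's `app_commands.default_permissions` decorator, and re-check permissions in the handler)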
- server_config table update/insert (guild_id as PK, auto_translate as boolean)
- Modify message handler to check per-guild setting before auto-translating
- Implement on-demand mode placeholder (manual translation via `/translate` command; full reaction-based mode deferred to v2)
- Cache per-server settings for performance (1-minute TTL)
- Test: change setting, verify immediate effect; verify effect persists after restart
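
The per-guild toggle with an auto-on default might be stored and read as sketched below. Standard-library `sqlite3` stands in for aiosqlite, and the function names are hypothetical.

```python
import sqlite3

DEFAULT_AUTO_TRANSLATE = True  # new servers start in auto mode


def open_config_db() -> sqlite3.Connection:
    """Create an in-memory server_config table keyed by guild_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE server_config ("
        "guild_id INTEGER PRIMARY KEY, "
        "auto_translate INTEGER NOT NULL)"
    )
    return conn


def set_auto_translate(conn: sqlite3.Connection, guild_id: int,
                       enabled: bool) -> None:
    """Insert or overwrite the setting for one guild."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO server_config (guild_id, auto_translate) "
            "VALUES (?, ?)",
            (guild_id, int(enabled)),
        )


def get_auto_translate(conn: sqlite3.Connection, guild_id: int) -> bool:
    """Return the guild's setting, falling back to the default if unset."""
    row = conn.execute(
        "SELECT auto_translate FROM server_config WHERE guild_id = ?",
        (guild_id,),
    ).fetchone()
    return DEFAULT_AUTO_TRANSLATE if row is None else bool(row[0])
```

Keeping the default in code rather than writing a row for every guild means a server only appears in the table once an admin changes something.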

**Architecture Notes:**
- On-demand mode: bot only replies if explicitly requested (stored in DB but not fully implemented in v1)
- Per-server config indexed by guild_id for O(1) lookup
- Cache server settings in memory with TTL to avoid DB hammering
- Design allows per-system overrides in future (deferred to v2)

**Risks & Mitigations:**
- Risk: Per-server overrides conflict with the global dictionary
  - Mitigation: v1 keeps a single global dictionary with only a server-level auto/on-demand toggle; per-system overrides are deferred to v2
- Risk: Config command is confusing to admins
  - Mitigation: Clear help text; only two options (on/off); simple feedback message

**Dependencies:** Phases 1-3 (all features working; config customizes them)

---

### Phase 5: Production Polish (Week 8+)

**Goal:** Production hardening, logging, error handling, and monitoring.

**Why Last:**
- Follows all feature phases (features must be complete before hardening)
- Improves reliability and debuggability (enables diagnosis of issues in production)
- Prepares for public adoption by other systems or larger communities

**Requirements Covered:**
- ERROR-02: Bot logging for debugging (structured JSON logs)
- ERROR-03: Exponential backoff for failed DB operations and API calls

**Success Criteria:**

1. Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
2. Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
3. Bot retries failed database operations (same backoff strategy; max 5 attempts)
4. Unhandled exceptions caught and logged (no spam in user channels; clear error reaction)
5. All error logs include context (guild_id, user_id, emoji attempted, operation type)

**Key Implementation Tasks:**
- Set up Python logging with JSON format (structlog or custom JSON formatter)
- Implement retry logic with asyncio.sleep backoff for PluralKit API calls
- Implement retry logic with backoff for database queries
- Add global exception handler (on_error in discord.py)
- Comprehensive error documentation (what errors mean, how to diagnose)
- Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
- Optional: Sentry integration for error tracking (recommended but not required for v1)
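
The retry tasks above (1s, 2s, 4s, 8s with jitter, at most 5 attempts) could be covered by one shared helper along these lines. The name `retry_with_backoff`, the 10% jitter factor, and the injectable `sleep` parameter are assumptions; the sleep function is injectable only so the behaviour can be tested without real delays.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await operation(), retrying failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await operation()
        except Exception:  # real code should catch narrower error types
            if attempt == max_attempts:
                raise  # give up: the caller reacts with the error emoji
            # Delays grow 1s, 2s, 4s, 8s and are capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # Up to 10% random jitter spreads out simultaneous retries.
            await sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

Because it awaits `asyncio.sleep` rather than blocking, the event loop keeps handling other messages while one operation waits to retry.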

**Architecture Notes:**
- Async error handling must not block the event loop
- Retry logic should use exponential backoff with jitter (avoid thundering herd)
- Logs should include PluralKit request context (duration, status code, member_id) for debugging
- Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps
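
If a custom formatter is chosen over structlog, a minimal version might look like the sketch below. The `context` attribute is an assumed convention for this example: callers pass it through the standard `extra=` argument of the logging calls.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
            # 'context' is an assumed convention, supplied via extra=...
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload, ensure_ascii=False)
```

A call such as `logger.info("emoji lookup", extra={"context": {"guild_id": guild_id, "result": "unknown"}})` would then carry the guild and lookup result in every line; rotation can come from the standard `logging.handlers.RotatingFileHandler`.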

**Risks & Mitigations:**
- Risk: Logging overhead impacts performance
  - Mitigation: Async logging; batch writes; log per-emoji lookups at DEBUG level or as sampled/aggregated counts so teaching gaps stay visible without flooding the logs
- Risk: Backoff strategy causes noticeable delays when DB is down
  - Mitigation: Set reasonable max wait (8s); fail fast if max attempts exceeded; user sees ❌ reaction quickly

**Dependencies:** Phases 1-4 (all features working; Phase 5 hardens them)

---

## Dependency Chain

```
Phase 1: Foundation (Discord client, PluralKit detection, database)
    ↓
Phase 2: Translation (emoji parsing, lookups, auto-translation)
    ↓
Phase 3: Teaching (commands to add/update meanings; audit trail)
    ↓
Phase 4: Configuration (per-server auto/on-demand toggle)
    ↓
Phase 5: Polish (logging, retry logic, production hardening)
```

**Critical Path:** Phase 1 → Phase 2 → Phase 3 (core value)
**Optional Path:** Phase 4 (customization), Phase 5 (hardening)

---

## Requirement Coverage Summary

**Total v1 Requirements:** 39
**Mapped to Phases:** 39
**Unmapped:** 0 ✓

**Coverage by Category:**
- Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
- Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
- Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
- Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
- Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
- Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
- Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
- Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
- Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4

---

## Key Decisions & Rationale

| Decision | Rationale | Phase |
|----------|-----------|-------|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. Bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs on-demand); can customize without breaking shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings learned explicitly; avoids false positives and keeps system transparent | Out of Scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |

---

## Key Metrics & Success Indicators

**Per Phase:**
- Phase 1: >99% detection accuracy, zero false positives in testing
- Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
- Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
- Phase 4: Settings persist across restart, multi-server support verified
- Phase 5: <0.1% error rate, all failures logged and alertable

**Overall:**
- Time to MVP (Phases 1-2): 4 weeks
- Time to v1 (Phases 1-5): 8+ weeks
- Requirements per phase: 2-14 (Phases 2 and 3 carry the most scope)
- Value delivered: Incremental (each phase adds core functionality)

---

## Known Limitations & v2+ Backlog

**Intentionally v2+:**
- Per-server emoji overrides (global dictionary only in v1)
- Reaction-based on-demand translation (slash command placeholder only)
- Analytics dashboard (`/stats`, `/emoji-list`)
- Moderation UI (flag/approve/reject meanings)
- Multi-language emoji meanings
- Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)

---

## Implementation Notes

**Tech Stack:**
- discord.py 2.6.4 (async-first, native slash commands)
- SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
- aiosqlite (async database access)
- emoji 2.11.0 (Unicode 15.0 support)
- pydantic 2.5.0 (data validation)
- Railway Cloud for hosting (free tier for MVP)

**Code Organization:**
- `bot.py` - Main discord.py client, event loop
- `cogs/detection.py` - Message event handler, PluralKit detection
- `cogs/translation.py` - Emoji parsing, dictionary lookup, composition
- `cogs/teaching.py` - `/teach`, `/meaning`, `/correct` commands
- `cogs/config.py` - `/config` command, per-server settings
- `database.py` - aiosqlite connection handling, async queries
- `logging_config.py` - Structured JSON logging, retry logic (named so it does not shadow Python's standard `logging` module)

**Testing Strategy:**
- Unit tests: emoji parsing, dictionary lookups, composition
- Integration tests: PluralKit detection with mock webhooks
- Accessibility tests: response length, emoji names for screen readers
- Load tests: 100+ servers, 1000+ emoji, response times

---

**Roadmap created: 2025-01-29**
**Ready for Phase 1 planning**
**Depth: Quick (5 phases, natural boundaries)**