Roadmap: Vivi Speech Translator

Vision: Discord bot that translates Vivi's emoji communication into text so her community understands her instantly.

Phases: 5 | Requirements Covered: 33/33 | Coverage: 100% ✓


Phase Overview

| Phase | Name | Goal | Requirements | Success Criteria |
|-------|------|------|--------------|------------------|
| 1 | Foundation | Reliably detect Vivi's PluralKit messages and set up the database | DETECT-01, DETECT-02, DETECT-03, DB-01, GEN-01 | 4 |
| 2 | Translation Engine | Parse emojis and translate to text with auto mode | TRANS-01, TRANS-02, TRANS-03, TRANS-04, TRANS-05, TRANS-06, DETECT-04, UNK-01, ERROR-01, CONFIG-02, A11Y-01, A11Y-04, DB-02 | 5 |
| 3 | Teaching System | Let users teach emoji meanings via simple commands | TEACH-01, TEACH-02, TEACH-03, TEACH-04, TEACH-05, TEACH-06, TEACH-07, UNK-02, UNK-03, A11Y-02, A11Y-03, A11Y-05, DB-04, GEN-03 | 5 |
| 4 | Configuration & Scaling | Add per-server settings and advanced features | CONFIG-01, CONFIG-03, CONFIG-04, DB-03, GEN-02 | 4 |
| 5 | Production Polish | Hardening, logging, and reliability | ERROR-02, ERROR-03 | 3 |

Phase Breakdown

Phase 1: Foundation (Weeks 1-2)

Goal: Reliably detect when Vivi is communicating via PluralKit and establish database infrastructure.

Why First:

  • PluralKit detection is a load-bearing component (foundation for all translation)
  • Database setup enables all future phases
  • Demonstrates proof of concept that detection works reliably

Requirements Covered:

  • DETECT-01: Bot detects Vivi via PluralKit webhook
  • DETECT-02: Bot ignores non-Vivi messages (no false positives)
  • DETECT-03: Bot works in channels and DMs
  • DB-01: Global emoji dictionary persists
  • GEN-01: Architecture supports multiple systems (not Vivi-only hardcoded)

Success Criteria:

  1. Bot reliably logs "Vivi detected" when she sends a test message via a PluralKit webhook
  2. Bot ignores messages from other system members (no false positives on non-Vivi webhooks)
  3. Database (SQLite for MVP) initializes on bot startup without errors
  4. Emoji dictionary table and server configuration table created and readable

Key Implementation Tasks:

  • Set up the discord.py client with the required intents (guilds, members, and guild_messages)
  • Implement webhook detection (check message.webhook_id)
  • Implement a PluralKit API lookup to verify the proxied message's member ID matches Vivi's (see the detection sketch after this list)
  • Initialize SQLite database with emoji_dictionary and server_config tables
  • Test edge cases: Vivi's message edits, reactions, message deletes, DM context
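
A minimal sketch of this detection path, assuming PluralKit's v2 message-lookup endpoint; VIVI_MEMBER_ID is a hypothetical placeholder, and the real cog would add member-list caching and error handling:

```python
# Sketch: decide whether a webhook message was proxied by Vivi via PluralKit.
# VIVI_MEMBER_ID is a hypothetical placeholder; the endpoint shape is assumed.
import aiohttp
import discord

PLURALKIT_API = "https://api.pluralkit.me/v2"
VIVI_MEMBER_ID = "abcde"  # hypothetical PluralKit member ID


async def is_vivi_message(message: discord.Message, session: aiohttp.ClientSession) -> bool:
    # PluralKit proxies members' messages through webhooks; anything else is ignored.
    if message.webhook_id is None:
        return False
    # Ask PluralKit which system member (if any) proxied this message.
    async with session.get(f"{PLURALKIT_API}/messages/{message.id}") as resp:
        if resp.status != 200:  # not a PluralKit-proxied message, or API error
            return False
        data = await resp.json()
    member = data.get("member") or {}
    return member.get("id") == VIVI_MEMBER_ID
```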

Architecture Notes:

  • Use aiosqlite for async database access (MVP); a schema sketch follows this list
  • Cache PluralKit member list with 1-hour TTL to reduce API calls
  • Use nacl.signing to verify PluralKit webhook signatures where possible
  • Design message handler as a cog (modular for future systems)
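
A minimal schema sketch with aiosqlite; the table and column names are illustrative and expected to evolve during Phase 1:

```python
# Sketch: initialize the MVP SQLite schema. Names are illustrative, not final.
import aiosqlite

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji      TEXT PRIMARY KEY,   -- Unicode emoji or custom emoji identifier
    meaning    TEXT NOT NULL,
    taught_by  TEXT,               -- Discord user ID of the teacher
    taught_at  TEXT                -- ISO-8601 timestamp
);
CREATE TABLE IF NOT EXISTS server_config (
    guild_id        TEXT PRIMARY KEY,
    auto_translate  INTEGER NOT NULL DEFAULT 1   -- 1 = auto mode (CONFIG-02 default)
);
"""


async def init_db(path: str = "vivi.db") -> None:
    async with aiosqlite.connect(path) as db:
        await db.executescript(SCHEMA)
        await db.commit()
```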

Risks & Mitigations:

  • Risk: PluralKit API rate limits during initialization (10/sec)
    • Mitigation: Implement webhook dispatch instead of polling; cache member list
  • Risk: Message content intent denied by Discord
    • Mitigation: Design for slash commands as primary path; message content as optional enhancement

Dependencies: None (Phase 1 is the foundation)


Phase 2: Translation Engine (Weeks 3-4)

Goal: Parse emoji sequences from Vivi's messages and translate to text automatically.

Why Second:

  • Builds on Phase 1's detection foundation (no detection = nothing to translate)
  • Delivers core user value: Vivi posts, others understand
  • Unblocks Phase 3 (teaching system requires working translation to be useful)
  • Enables MVP validation with early users

Requirements Covered:

  • TRANS-01: Parse standard Unicode emojis (e.g., 😷)
  • TRANS-02: Parse custom Discord emojis (:me1:, etc.)
  • TRANS-03: Translate and reply automatically (auto mode)
  • TRANS-04: Read left-to-right composition (preserve emoji order)
  • TRANS-05: Plain text responses only (no emoji-only translations)
  • TRANS-06: Concise, clear responses (respects message length limits)
  • DETECT-04: Reliably detect edits and reactions on Vivi's messages
  • UNK-01: Skip unknown emojis (only translate known ones)
  • ERROR-01: React with an error emoji on translation errors (graceful failure)
  • CONFIG-02: Default mode is auto (bot translates every Vivi message)
  • A11Y-01: Plain text accessible format (no heavy formatting)
  • A11Y-04: Use slash commands (better keyboard navigation than buttons)
  • DB-02: Emoji meanings shared across all Discord servers (one global dictionary)

Success Criteria:

  1. Bot translates :me1:😷 2 sequence to "Vivi is sick" and shows which emoji are unknown
  2. Bot translates multi-emoji sequences (5+ emojis) correctly while preserving order and composition
  3. Bot response is plain text, under 3 sentences (accessibility for dysgraphia)
  4. Unknown emojis are skipped with a gentle prompt to teach ("Unknown emoji; run /teach emoji meaning")
  5. If a database query fails, the bot reacts with an error emoji instead of crashing or spamming errors

Key Implementation Tasks:

  • Use the emoji library (2.11.0) for standard emoji parsing (Unicode 15.0 support)
  • Regex pattern for custom Discord emoji detection (<:name:id> as it appears in raw message content); see the parsing sketch after this list
  • Dictionary lookup function (emoji → meaning) with O(1) query time
  • Translation composition function (assemble meanings into natural language)
  • Error handler that reacts with an error emoji and logs the error
  • Message edit detection (track message ID, compare old vs new emoji)
  • Test emoji edge cases: skin tone modifiers, ZWJ sequences, variation selectors
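
A parsing sketch combining the emoji library with a regex for custom Discord emojis; ordering is preserved by sorting matches on their start offset (TRANS-04). This is illustrative, not the final parser:

```python
# Sketch: extract standard and custom emojis from a message, preserving order.
import re
import emoji

# Custom Discord emojis appear in raw content as <:name:id> (or <a:name:id> if animated).
CUSTOM_EMOJI_RE = re.compile(r"<a?:(\w+):(\d+)>")


def extract_emojis(content: str) -> list[str]:
    tokens: list[tuple[int, str]] = []
    # Standard Unicode emojis (handles ZWJ sequences and skin-tone modifiers).
    for match in emoji.emoji_list(content):
        tokens.append((match["match_start"], match["emoji"]))
    # Custom Discord emojis, keyed by name so meanings survive emoji re-uploads.
    for m in CUSTOM_EMOJI_RE.finditer(content):
        tokens.append((m.start(), f":{m.group(1)}:"))
    # Left-to-right order is the composition order (TRANS-04).
    return [tok for _, tok in sorted(tokens)]


# extract_emojis("<:me1:123456789012345678> 😷") -> [":me1:", "😷"]
```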

Architecture Notes:

  • Translation engine should be stateless (all state in database)
  • Compose natural language by concatenating meanings when simple, or by identifying patterns (subject + descriptor + descriptor); see the composition sketch after this list
  • Handle emoji variants correctly (normalize input with NFD if needed)
  • Cache frequently-translated emoji in memory (optional for Phase 2; may add in Phase 4)
  • Response format: "Vivi says: [translation]" for clarity
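
A sketch of the stateless composition step, assuming an async lookup callable backed by the global dictionary; the natural-language join is deliberately naive and would be refined during this phase:

```python
# Sketch: compose a plain-text translation from known meanings, skipping unknowns.
# `lookup` is any async callable mapping an emoji to its meaning (or None).
from typing import Awaitable, Callable, Optional

Lookup = Callable[[str], Awaitable[Optional[str]]]


async def translate(emojis: list[str], lookup: Lookup) -> tuple[str, list[str]]:
    meanings: list[str] = []
    unknown: list[str] = []
    for e in emojis:
        meaning = await lookup(e)
        if meaning is None:
            unknown.append(e)        # UNK-01: skip, report separately
        else:
            meanings.append(meaning)
    # Naive composition: join meanings in order; phrasing refinements come later.
    text = f"Vivi says: {' '.join(meanings)}" if meanings else ""
    return text, unknown
```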

Risks & Mitigations:

  • Risk: Emoji edge cases (combining characters, ZWJ) causing parse failures
    • Mitigation: Comprehensive test suite with Unicode 15.0 samples; use emoji library instead of manual regex
  • Risk: Bot floods channel with translations (rate limiting)
    • Mitigation: Track requests per guild; implement cooldown if needed (Phase 4+)
  • Risk: Message edits race condition (bot editing response while Vivi edits message)
    • Mitigation: Post new translation instead of editing; queue requests with small delay if race detected

Dependencies: Phase 1 (detection + database)


Phase 3: Teaching System (Weeks 5-6)

Goal: Enable users to teach the bot new emoji meanings via simple commands.

Why Third:

  • Requires working translation from Phase 2 (teaching the bot emojis it can't yet translate is the whole point)
  • Enables the learning system (core differentiator: bot becomes more useful as community teaches it)
  • Makes bot valuable to Vivi's community: they drive growth, not predefined dictionary
  • Addresses dysgraphia accessibility: teaching interface must be ultra-simple

Requirements Covered:

  • TEACH-01: /teach emoji "meaning" command
  • TEACH-02: Bot confirms what it learned (shows emoji and meaning)
  • TEACH-03: Meanings stored globally (shared across all servers)
  • TEACH-04: /meaning emoji query command
  • TEACH-05: /correct emoji "new meaning" update command
  • TEACH-06: Audit trail (who, what, when for all changes)
  • TEACH-07: Anyone can teach (no admin restrictions, but logged for accountability)
  • UNK-02: Skip unknowns; prompt to teach (gentle, actionable prompt)
  • UNK-03: Teaching prompt is accessible (simple command: /teach emoji "meaning")
  • A11Y-02: Simple command syntax (one-liner, no complex structure)
  • A11Y-03: Concise responses (under 2 sentences; important for dysgraphia)
  • A11Y-05: Emoji names in responses (for screen readers; show Unicode name)
  • DB-04: Audit trail storage (emoji_id, old_meaning, new_meaning, user_id, timestamp)
  • GEN-03: Modular code for other systems (teaching system not Vivi-specific)

Success Criteria:

  1. User runs /teach 🎭 "happy performance", bot confirms "Taught: 🎭 = happy performance" with emoji shown
  2. Next time Vivi uses 🎭, bot translates it correctly in the translation engine (cache or DB lookup)
  3. User runs /meaning 🎭, bot replies with current meaning (or "Unknown emoji")
  4. User runs /correct 🎭 "joyful", meaning updates and audit trail records who changed it and when
  5. /teach command fails gracefully if emoji is invalid, meaning is empty, or meaning is too long (with helpful error message)

Key Implementation Tasks:

  • discord.py slash command handler for /teach emoji "meaning" (see the command sketch after this list)
  • Input validation: emoji exists, meaning is 1-200 characters, no excessive punctuation
  • Database insert into emoji_dictionary with user_id and timestamp
  • /meaning emoji query command with error handling
  • /correct emoji "new meaning" update command with audit trail
  • Confirmation messages that show emoji visually (e.g., "🎭" not "PERFORMING_ARTS")
  • Audit trail table: emoji_id, old_meaning, new_meaning, user_id, timestamp, action_type
  • Permission checks: log who taught what, but don't restrict based on role (accessibility)
  • Test that emojis can be taught, corrected, queried in rapid succession
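
A sketch of the /teach command using discord.py app commands; save_meaning is a hypothetical persistence helper (emoji_dictionary insert plus audit-trail row), shown here only as a stub:

```python
# Sketch: /teach command with basic validation and a visual confirmation.
import discord
from discord import app_commands
from discord.ext import commands


async def save_meaning(emoji_str: str, meaning: str, user_id: int) -> None:
    """Hypothetical helper: insert into emoji_dictionary plus an audit-trail row."""
    ...  # aiosqlite INSERT would go here (see the Phase 1 schema sketch)


class Teaching(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot

    @app_commands.command(name="teach", description="Teach the bot what an emoji means")
    @app_commands.describe(emoji="The emoji to teach", meaning="What it means (1-200 characters)")
    async def teach(self, interaction: discord.Interaction, emoji: str, meaning: str):
        meaning = meaning.strip()
        if not 1 <= len(meaning) <= 200:
            await interaction.response.send_message(
                "Meaning must be 1-200 characters.", ephemeral=True)
            return
        await save_meaning(emoji, meaning, user_id=interaction.user.id)
        # Short, visual confirmation (TEACH-02, A11Y-03): show the emoji itself.
        await interaction.response.send_message(f"Taught: {emoji} = {meaning}")
```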

Architecture Notes:

  • Teaching commands should be accessible to all users (not admin-only; logged for accountability)
  • Confirmation should always show the emoji visually (immediate visual feedback)
  • Responses must be short (<2 sentences) for dysgraphia accessibility
  • Audit trail enables future /undo feature (deferred to v2)
  • Global dictionary shared across servers; per-server overrides deferred to Phase 4+

Risks & Mitigations:

  • Risk: Users teach wrong/inappropriate meanings (spam, trolling)
    • Mitigation: Audit trail allows corrections; log all changes; v2 adds moderation if needed
  • Risk: Teaching interface too complex for Vivi (with dysgraphia)
    • Mitigation: Ultra-simple one-liner syntax; visual confirmation with emoji; test with Vivi early
  • Risk: Conflicting emoji meanings across servers
    • Mitigation: Global dictionary is by design (shared knowledge); document this limitation; v2 adds per-server overrides

Dependencies: Phase 2 (translation must work so teaching has value)


Phase 4: Configuration & Scaling (Week 7)

Goal: Add per-server settings and prepare for scaling.

Why Fourth:

  • Builds on Phases 1-3 foundation (all core features working)
  • Enables server customization without breaking global emoji dictionary
  • Prepares architecture for multi-server scaling and future systems
  • Adds operator controls (config commands)

Requirements Covered:

  • CONFIG-01: /config auto-translate on|off toggle per server
  • CONFIG-03: Per-server persistence (setting survives bot restart)
  • CONFIG-04: Admin-only changes (permission check on /config command)
  • DB-03: Per-server config table
  • GEN-02: Architecture supports per-system overrides (design for future multi-system)

Success Criteria:

  1. Server admin runs /config auto-translate off, future Vivi messages don't auto-translate (bot is silent by default)
  2. Another server has auto enabled; both work independently (no crosstalk)
  3. Setting persists across bot restart (database query returns correct value)
  4. Non-admin user runs /config auto-translate on, bot rejects with "Admin only" message
  5. Default for new servers is auto mode enabled (true by default in code)

Key Implementation Tasks:

  • /config auto-translate <on|off> slash command (see the sketch after this list)
  • Permission check (admin-only; use discord.py's default_member_permissions)
  • server_config table update/insert (guild_id as PK, auto_translate as boolean)
  • Modify message handler to check per-guild setting before auto-translating
  • Implement on-demand mode placeholder (manual translation via /translate command; full reaction-based mode deferred to v2)
  • Cache per-server settings for performance (1-minute TTL)
  • Test: change setting, verify immediate effect; verify effect persists after restart
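
A sketch of the /config command, assuming the server_config table from the Phase 1 schema sketch; it is simplified to a single command with an on/off option rather than a subcommand group, and default_permissions only sets the default visibility (a runtime admin check can be added for defense in depth):

```python
# Sketch: admin-only per-server toggle, persisted with an SQLite UPSERT.
from typing import Literal

import aiosqlite
import discord
from discord import app_commands

DB_PATH = "vivi.db"  # assumed path from the Phase 1 schema sketch


@app_commands.command(name="config", description="Toggle auto-translate for this server")
@app_commands.default_permissions(administrator=True)  # CONFIG-04: admin-only by default
@app_commands.describe(auto_translate="on or off")
async def config(interaction: discord.Interaction, auto_translate: Literal["on", "off"]):
    enabled = auto_translate == "on"
    async with aiosqlite.connect(DB_PATH) as db:
        await db.execute(
            """INSERT INTO server_config (guild_id, auto_translate) VALUES (?, ?)
               ON CONFLICT(guild_id) DO UPDATE SET auto_translate = excluded.auto_translate""",
            (str(interaction.guild_id), int(enabled)),
        )
        await db.commit()
    await interaction.response.send_message(f"Auto-translate is now {auto_translate}.")
```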

Architecture Notes:

  • On-demand mode: bot only replies if explicitly requested (stored in DB but not fully implemented in v1)
  • Per-server config indexed by guild_id for O(1) lookup
  • Cache server settings in memory with a TTL to avoid hammering the database (see the cache sketch after this list)
  • Design allows per-system overrides in future (deferred to v2)
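
A minimal in-memory TTL cache sketch for per-guild settings; on a miss or expiry the caller falls back to the database:

```python
# Sketch: tiny TTL cache for per-guild auto-translate settings (illustrative only).
import time
from typing import Optional


class SettingsCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._data: dict[int, tuple[float, bool]] = {}  # guild_id -> (expiry, auto_translate)

    def get(self, guild_id: int) -> Optional[bool]:
        entry = self._data.get(guild_id)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired: caller falls back to the database
        return entry[1]

    def put(self, guild_id: int, auto_translate: bool) -> None:
        self._data[guild_id] = (time.monotonic() + self.ttl, auto_translate)
```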

Risks & Mitigations:

  • Risk: Per-server override conflicts if not careful
    • Mitigation: v2 design allows per-system overrides; v1 focuses on global dictionary with server-level auto/on-demand toggle
  • Risk: Config command is confusing to admins
    • Mitigation: Clear help text; only two options (on/off); simple feedback message

Dependencies: Phases 1-3 (all features working; config customizes them)


Phase 5: Production Polish (Week 8+)

Goal: Production hardening, logging, error handling, and monitoring.

Why Last:

  • Follows all feature phases (features must be complete before hardening)
  • Improves reliability and debuggability (enables diagnosis of issues in production)
  • Prepares for public adoption by other systems or larger communities

Requirements Covered:

  • ERROR-02: Bot logging for debugging (structured JSON logs)
  • ERROR-03: Exponential backoff for failed DB operations and API calls

Success Criteria:

  1. Structured JSON logs written to file (timestamp, level, message, context, duration) with rotation
  2. Bot retries failed PluralKit API calls (exponential backoff: 1s, 2s, 4s, 8s with jitter)
  3. Bot retries failed database operations (same backoff strategy; max 5 attempts)
  4. Unhandled exceptions caught and logged (no spam in user channels; clear error reaction)
  5. All error logs include context (guild_id, user_id, emoji attempted, operation type)

Key Implementation Tasks:

  • Set up Python logging with JSON format (structlog or custom JSON formatter)
  • Implement retry logic with asyncio.sleep backoff for PluralKit API calls
  • Implement retry logic with backoff for database queries (a shared backoff helper is sketched after this list)
  • Add global exception handler (on_error in discord.py)
  • Comprehensive error documentation (what errors mean, how to diagnose)
  • Testing edge cases: rate limits, database disconnects, webhook timing issues, missing intents
  • Optional: Sentry integration for error tracking (recommended but not required for v1)
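
A sketch of a shared backoff helper implementing the 1s/2s/4s/8s-with-jitter strategy described above; the wrapped operation is any async callable:

```python
# Sketch: retry an async operation with exponential backoff and jitter.
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_backoff(op: Callable[[], Awaitable[T]], max_attempts: int = 5) -> T:
    for attempt in range(max_attempts):
        try:
            return await op()
        except Exception:
            if attempt == max_attempts - 1:
                raise                        # fail fast once max attempts are exhausted
            delay = min(2 ** attempt, 8)     # 1s, 2s, 4s, 8s, capped at 8s
            await asyncio.sleep(delay + random.uniform(0, delay / 2))  # jitter


# Usage (hypothetical callable): result = await with_backoff(lambda: fetch_member(member_id))
```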

Architecture Notes:

  • Async error handling must not block the event loop
  • Retry logic should use exponential backoff with jitter (avoid thundering herd)
  • Logs should include PluralKit request context (duration, status code, member_id) for debugging; a JSON formatter sketch follows this list
  • Log all emoji lookups (emoji, result: found/unknown) to identify teaching gaps
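
A minimal stdlib sketch of structured JSON logging with rotation; structlog is an equally valid choice, and the context field names are illustrative:

```python
# Sketch: JSON-formatted logs with rotation using the standard library.
import json
import logging
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            # Extra context (guild_id, user_id, emoji, operation) passed via `extra=`.
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)


handler = RotatingFileHandler("vivi.log", maxBytes=10_000_000, backupCount=5)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("vivi")
log.addHandler(handler)
log.info("emoji lookup", extra={"context": {"guild_id": "123", "emoji": "🎭", "result": "found"}})
```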

Risks & Mitigations:

  • Risk: Logging overhead impacts performance
    • Mitigation: Async logging; batch writes; exclude noisy operations (every emoji lookup)
  • Risk: Backoff strategy causes noticeable delays when DB is down
    • Mitigation: Set a reasonable max wait (8s); fail fast if max attempts are exceeded; the user sees the error reaction quickly

Dependencies: Phases 1-4 (all features working; Phase 5 hardens them)


Dependency Chain

Phase 1: Foundation (Discord client, PluralKit detection, database)
    ↓
Phase 2: Translation (emoji parsing, lookups, auto-translation)
    ↓
Phase 3: Teaching (commands to add/update meanings; audit trail)
    ↓
Phase 4: Configuration (per-server auto/on-demand toggle)
    ↓
Phase 5: Polish (logging, retry logic, production hardening)

Critical Path: Phase 1 → Phase 2 → Phase 3 (core value)
Optional Path: Phase 4 (customization), Phase 5 (hardening)


Requirement Coverage Summary

Total v1 Requirements: 33 | Mapped to Phases: 33 | Unmapped: 0 ✓

Coverage by Category:

  • Message Detection (DETECT-01 through DETECT-04): 4/4 mapped → Phases 1, 2
  • Emoji Parsing & Translation (TRANS-01 through TRANS-06): 6/6 mapped → Phase 2
  • Teaching System (TEACH-01 through TEACH-07): 7/7 mapped → Phase 3
  • Unknown Emoji Handling (UNK-01 through UNK-03): 3/3 mapped → Phases 2, 3
  • Error Handling (ERROR-01 through ERROR-03): 3/3 mapped → Phases 2, 5
  • Configuration (CONFIG-01 through CONFIG-04): 4/4 mapped → Phases 2, 4
  • Database & Persistence (DB-01 through DB-04): 4/4 mapped → Phases 1, 2, 3, 4
  • Accessibility (A11Y-01 through A11Y-05): 5/5 mapped → Phases 2, 3
  • Generalization & Multi-System (GEN-01 through GEN-03): 3/3 mapped → Phases 1, 3, 4

Key Decisions & Rationale

| Decision | Rationale | Phase |
|----------|-----------|-------|
| Global emoji dictionary | Vivi's emoji language is consistent across communities; other systems benefit from shared knowledge | 1-3 |
| Learning-based system | Vivi uses many emojis; manual mapping would be unsustainable. The bot learns when taught. | 3 |
| PluralKit webhook integration | Standard plural system tool; webhook dispatch is instant and free (vs. API polling) | 1 |
| Auto mode as default | Translates automatically; more accessible for Vivi (no extra action needed) | 2, 4 |
| Per-server configuration | Different communities have different needs (auto vs. on-demand); can customize without breaking the shared dictionary | 4 |
| Plain text responses only | Accessibility for dysgraphia; avoids confusion with Discord formatting | 2 |
| No context inference | Meanings are learned explicitly; avoids false positives and keeps the system transparent | Out of scope |
| Five-phase structure | Mirrors research recommendations; each phase delivers measurable value | All |

Key Metrics & Success Indicators

Per Phase:

  • Phase 1: >99% detection accuracy, zero false positives in testing
  • Phase 2: <500ms response time, 100% parse accuracy on Unicode 15.0 emoji
  • Phase 3: Teaching interface usable by Vivi (requires user testing), 50+ emoji taught in beta week
  • Phase 4: Settings persist across restart, multi-server support verified
  • Phase 5: <0.1% error rate, all failures logged and alertable

Overall:

  • Time to MVP (Phases 1-2): 3-4 weeks
  • Time to v1 (Phases 1-5): ~8 weeks (per the weekly schedule above; Phase 5 begins in Week 8)
  • Requirements per phase: 3-7 (manageable scope)
  • Value delivered: Incremental (each phase adds core functionality)

Known Limitations & v2+ Backlog

Intentionally v2+:

  • Per-server emoji overrides (global dictionary only in v1)
  • Reaction-based on-demand translation (slash command placeholder only)
  • Analytics dashboard (/stats, /emoji-list)
  • Moderation UI (flag/approve/reject meanings)
  • Multi-language emoji meanings
  • Support for other plural systems beyond PluralKit (architecture designed for it, not enabled)

Implementation Notes

Tech Stack:

  • discord.py 2.6.4 (async-first, native slash commands)
  • SQLite MVP (zero setup; migrate to PostgreSQL if >1000 emoji)
  • aiosqlite (async database access)
  • emoji 2.11.0 (Unicode 15.0 support)
  • pydantic 2.5.0 (data validation)
  • Railway Cloud for hosting (free tier for MVP)

Code Organization:

  • bot.py - Main discord.py client, event loop
  • cogs/detection.py - Message event handler, PluralKit detection
  • cogs/translation.py - Emoji parsing, dictionary lookup, composition
  • cogs/teaching.py - /teach, /meaning, /correct commands
  • cogs/config.py - /config command, per-server settings
  • database.py - SQLAlchemy ORM, async queries
  • logging.py - Structured JSON logging, retry logic

Testing Strategy:

  • Unit tests: emoji parsing, dictionary lookups, composition (see the example tests after this list)
  • Integration tests: PluralKit detection with mock webhooks
  • Accessibility tests: response length, emoji names for screen readers
  • Load tests: 100+ servers, 1000+ emoji, response times
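
Example pytest cases against the Phase 2 sketches in this document; the translation module path is hypothetical:

```python
# Sketch: pytest unit tests for the parsing/composition sketches above.
import asyncio

from translation import extract_emojis, translate  # hypothetical module layout


def test_custom_and_standard_emoji_order_preserved():
    assert extract_emojis("<:me1:123456789012345678> 😷") == [":me1:", "😷"]


def test_unknown_emojis_are_skipped():
    async def lookup(e):
        return {"😷": "sick"}.get(e)

    text, unknown = asyncio.run(translate([":me1:", "😷"], lookup))
    assert text == "Vivi says: sick"
    assert unknown == [":me1:"]
```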

Roadmap created: 2025-01-29
Ready for Phase 1 planning
Depth: Quick (5 phases, natural boundaries)