Vivi-Speech/.planning/research/SUMMARY.md
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:02:32 -05:00


Research Summary: Vivi Speech Translator

Synthesized from: STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md
Date: January 29, 2025
Status: Ready for Requirements Definition


Executive Summary

Vivi Speech Translator is a rule-based emoji-to-text translation Discord bot built for a specific user system (Vivi via PluralKit). The recommended 2025 stack is discord.py 2.6.4 + PostgreSQL/SQLite + webhook-driven PluralKit integration, prioritizing simplicity, reliability, and community-driven learning over complex AI. The core differentiator is transparent, learnable translation with strong accessibility for users with dysgraphia—the bot becomes more valuable as users teach it emoji meanings, creating positive network effects.

The project succeeds by staying focused: detect Vivi's PluralKit-proxied messages, parse emoji sequences, translate via a persistent dictionary, and enable users to grow that dictionary through simple commands. Avoid context inference, cross-Discord generalization, and real-time chat simulation. This narrowly-scoped approach maximizes shipping speed while maintaining high confidence in architectural decisions.

Key Risk: PluralKit webhook detection is load-bearing. Message detection must be bulletproof (Phase 1) before scaling. Secondary risk: keeping the teaching interface simple enough for a user with dysgraphia to adopt comfortably.


Key Findings

From STACK.md: Technology Recommendations

Recommended Stack (2025):

| Component | Choice | Rationale |
|---|---|---|
| Language | Python 3.10+ | Richest emoji/text processing libraries, largest Discord bot community |
| Framework | discord.py 2.6.4 | Actively maintained (October 2025), mature 7-year ecosystem, async-first, native slash commands |
| Database (MVP) | SQLite 3 + aiosqlite | Zero setup, single file, sufficient for MVP testing (<10K emoji mappings) |
| Database (Prod) | PostgreSQL 15+ + asyncpg | Scales to millions of mappings, native array types, connection pooling, $0-15/mo on Railway |
| PluralKit Integration | pluralkit.py + webhook dispatch | Use event-driven webhooks (instant, free) vs API polling (expensive, slow) |
| Hosting | Railway Cloud | $0-5/mo free tier, auto-deploys from Git, built-in PostgreSQL, public webhook URL for PluralKit |
| Key Libraries | emoji 2.11.0, pydantic 2.5.0, aiohttp 3.9.0 | Unicode 15.0 support, async-native, data validation |

Critical Avoids:

  • Pycord (py-cord) — Inactive since 2023, no PyPI releases
  • Message content intent as primary architecture — Design for slash commands, treat intent as optional
  • REST API polling for PluralKit — Use webhook dispatch instead (rate limits: 2 req/sec vs unlimited webhooks)
  • Synchronous database libraries (sqlite3, psycopg2) — Block bot event loop; use aiosqlite/asyncpg
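The last point can be illustrated with the standard library alone. The sketch below is not aiosqlite. It assumes Python 3.9+ and uses `asyncio.to_thread` to move a blocking `sqlite3` query onto a worker thread, which is the same general idea of keeping the event loop free. The helper names and the minimal table layout are illustrative assumptions.

```python
import asyncio
import sqlite3
from typing import Optional


def _lookup_blocking(db_path: str, emoji: str) -> Optional[str]:
    # Blocking work: open, query, and close all happen on the worker thread.
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT meaning FROM emoji_dictionary WHERE emoji_string = ?",
            (emoji,),
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


async def lookup_meaning(db_path: str, emoji: str) -> Optional[str]:
    # The event loop can keep handling Discord events while the query runs.
    return await asyncio.to_thread(_lookup_blocking, db_path, emoji)
```

Calling `_lookup_blocking` directly inside an event handler would stall every other coroutine until the query returned, which is the failure mode this bullet warns about.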

Confidence: VERY HIGH — All recommendations are current, production-proven, and community-standard in early 2025.


From FEATURES.md: What to Build

Table Stakes (Must-Have):

  • Message detection + emoji parsing
  • Reply/response infrastructure
  • Slash command interface (not prefix commands)
  • Per-server configuration (auto vs on-demand mode)
  • Rate limiting + error handling

Differentiators (Should-Have):

  • Learning system: /teach emoji meaning → stores in database
  • Emoji sequence detection (e.g., "👩‍💻📱" = compound concept)
  • Query system: /meaning emoji or /what emoji
  • Correction system: /correct emoji new_meaning
  • Reaction-based feedback (approve/reject reactions on translations)
  • Accessibility: plain text output, no emoji-only responses, visual confirmation

PluralKit Integration (Critical for Scope):

  • Detect webhook proxy via message.webhook_id
  • Verify member_id via GET /v2/messages/{id} API
  • Enable "Vivi says: [translation]" style responses
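The two-step check above (cheap webhook filter first, PluralKit lookup second) might be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes the `GET /v2/messages/{id}` response carries a nested `member` object with an `id` field, and the function names and the example member ID are hypothetical. The HTTP call itself is left out so the decision logic can be tested in isolation.

```python
from typing import Any, Mapping, Optional

PK_API_BASE = "https://api.pluralkit.me/v2"


def pk_message_url(message_id: int) -> str:
    # Endpoint named in this document: GET /v2/messages/{id}
    return f"{PK_API_BASE}/messages/{message_id}"


def is_webhook_message(webhook_id: Optional[int]) -> bool:
    # Step 1: only webhook-sent messages can be PluralKit proxies.
    return webhook_id is not None


def is_from_member(pk_message: Mapping[str, Any], member_id: str) -> bool:
    # Step 2: compare the proxied member's ID with the configured one.
    # Assumption: the payload has the shape {"member": {"id": "..."}}.
    member = pk_message.get("member") or {}
    return member.get("id") == member_id
```

Only messages that pass `is_webhook_message` need the API call, which keeps PluralKit traffic proportional to webhook messages rather than to all messages.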

Never Build (Out of Scope):

  • Context-based inference ("infer emoji meaning from conversation")
  • Cross-Discord emoji translation ("same meaning everywhere")
  • Real-time chat simulation ("bot generates new emoji sequences")
  • Full NLP context analysis ("understand subtle tone shifts")

MVP Feature Set (Phases 1-3):

  • Message detection & emoji parsing
  • Rule-based translation (no ML)
  • /teach, /meaning, /correct commands
  • Auto/on-demand toggle per server
  • Accessible output (plain text, emoji names)

Roadmap Implication: Build in layers. Phase 1-2 deliver value (users see translations). Phase 3 enables growth (users can teach). Phase 4+ adds refinement (caching, stats, multi-server overrides).

Confidence: HIGH — Feature research grounded in Discord bot best practices, accessibility standards, and PluralKit integration patterns.


From ARCHITECTURE.md: Component Design

7-Component System:

  1. Discord Client — Maintains WebSocket, initializes event loop
  2. Message Event Handler — Filters for webhook, queries PluralKit, verifies Vivi
  3. Emoji Parser — Extracts emoji sequences via regex, preserves order
  4. Translation Engine — Looks up emoji meanings, composes natural language
  5. Database Layer — Async SQLAlchemy + PostgreSQL/SQLite
  6. Command Handler (Cogs) — Teaching, configuration, queries
  7. Configuration Layer — Environment variables for secrets

Data Flow (Simplified):

Vivi's Message (via webhook)
  ↓ [PluralKit detection]
Emoji Parser (regex extraction)
  ↓ [order-preserving]
Database Lookup (O(1) via index)
  ↓ [emoji→meaning]
Translation Composition
  ↓ [natural language]
Discord Reply
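The parse, lookup, and compose steps in this flow could look roughly like the sketch below. It deliberately swaps the regex step for a greedy longest-match against the dictionary keys, so that a taught ZWJ sequence wins over its component emoji. That substitution, the function names, and the fallback wording are assumptions made for illustration. An in-memory `dict` stands in for the database.

```python
from typing import Dict, List


def tokenize(text: str, dictionary: Dict[str, str]) -> List[str]:
    """Split text into known emoji keys, longest match first, preserving order."""
    keys = sorted((k for k in dictionary if k), key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = next((k for k in keys if text.startswith(k, i)), None)
        if match is not None:
            tokens.append(match)
            i += len(match)
        else:
            i += 1  # skip anything that is not a dictionary key
    return tokens


def translate(text: str, dictionary: Dict[str, str], speaker: str = "Vivi") -> str:
    meanings = [dictionary[token] for token in tokenize(text, dictionary)]
    if not meanings:
        # Placeholder wording; the real fallback text is a requirements question.
        return f"{speaker} says: (no known emoji)"
    return f"{speaker} says: " + ", ".join(meanings)
```

The linear scan over keys is fine for a small dictionary. A larger one would want a trie or the `emoji` library's tokenizer in front of the lookup.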

Database Schema (Two Core Tables):

  • emoji_dictionary: emoji_string (PK) → meaning + metadata (created_at, updated_by, confidence)
  • server_configuration: guild_id (PK) → auto_translate (boolean) + created_at
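As a rough sketch, the two tables could be created in SQLite as shown below. The column names come from the list above. The column types, defaults, and the `init_db` helper are assumptions that ARCHITECTURE.md may define differently.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    meaning      TEXT NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by   INTEGER,          -- Discord user ID (assumed type)
    confidence   REAL              -- assumed numeric score
);
CREATE TABLE IF NOT EXISTS server_configuration (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 0,   -- 0 = on-demand (assumed default)
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    # executescript runs both CREATE statements in one call.
    conn.executescript(SCHEMA)
```

SQLite stores booleans as integers, so `auto_translate` is an `INTEGER` here. In PostgreSQL the same column would more naturally be `BOOLEAN`.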

Key Design Decisions:

  • Global shared emoji dictionary (Phase 1-3) — simplifies MVP; per-server overrides deferred to Phase 4
  • Async-first (aiosqlite/asyncpg) — prevents blocking bot's event loop
  • Primary key on emoji_string, secondary index on custom_emoji_id — enables O(1) lookups
  • Webhook detection first, API verification second — reduces API calls, catches non-PluralKit webhooks

Suggested Build Order (5 Phases):

  1. Phase 1 (Weeks 1-2): Foundation — Discord client + PluralKit detection + database setup
  2. Phase 2 (Weeks 3-4): Emoji parsing & translation — regex + lookup + reply formatting
  3. Phase 3 (Weeks 5-6): Teaching system — /teach, /meaning, /correct commands
  4. Phase 4 (Week 7): Per-server config — auto/on-demand toggle, /config command
  5. Phase 5 (Week 8+): Polish — caching, logging, edge cases, error handling

Scaling Path:

  • MVP (Single bot, <10 servers): SQLite, local development
  • Production (100-1000 servers): PostgreSQL on Railway, connection pooling
  • Enterprise (1000+ servers): Add Redis caching layer, implement Discord sharding

Confidence: HIGH — Architecture mirrors successful production Discord bots (MEE6, Logiq, etc.). Component boundaries are clean, async patterns are standard.


From PITFALLS.md: Risks to Prevent

Top 8 Pitfalls with Phase Assignment:

  1. Message Detection Reliability (Phase 1 - CRITICAL)

    • Risk: False positives (translates non-Vivi messages) or false negatives (misses Vivi)
    • Cause: Mixing webhook detection methods, edge case in PluralKit proxying
    • Prevention: Use webhook creator ID as source of truth, cache member names, test reproxy edge cases, log failures
    • Cost if ignored: Bot unreliable from day one, loses user trust
  2. Message Content Intent Denial (Phase 1 - CRITICAL)

    • Risk: Bot designed for passive message scanning; approval denied at 75 servers
    • Cause: Assuming Discord approval is guaranteed
    • Prevention: Design for slash commands first (/translate emoji), treat message content intent as optional
    • Cost if ignored: Architectural rewrite mid-project
  3. Dictionary Quality Degradation (Phase 3 - HIGH)

    • Risk: User-taught emoji meanings become nonsensical (typos, trolls, conflicts)
    • Cause: No validation, no audit trail, no approval workflow
    • Prevention: Validate meaning length/content, log every change, flag conflicts, require mod approval for shared meanings
    • Cost if ignored: Translations become unreliable by month 2-3
  4. Teaching Interface Too Complex (Phase 3 - HIGH)

    • Risk: Vivi (with dysgraphia) avoids using teaching system; feature becomes unused
    • Cause: Text-heavy commands, complex syntax, no visual confirmation
    • Prevention: Ultra-simple commands (/teach emoji meaning), show emoji in response, keep responses under 2 sentences
    • Cost if ignored: Bot cannot learn, static dictionary limits usefulness
  5. Rate Limiting (Phase 2+ - MEDIUM)

    • Risk: Bot goes silent during peak usage (Discord or PluralKit API limits hit)
    • Cause: Naive request patterns, no caching, no exponential backoff
    • Prevention: Cache emoji translations, batch lookups, implement exponential backoff, monitor rate limit headers
    • Cost if ignored: Intermittent outages, poor user experience
  6. Emoji Parsing Edge Cases (Phase 2 - MEDIUM)

    • Risk: Complex emoji (skin tones, ZWJ sequences, variation selectors) break parsing
    • Cause: Naive string operations, incorrect regex patterns
    • Prevention: Use emoji library (not manual regex), normalize input (NFD), test with families/skin tones/flags
    • Cost if ignored: Some emoji don't translate or get corrupted
  7. Authorization & Security (Phase 3 - HIGH)

    • Risk: Non-mods can teach emoji, trolls corrupt dictionary, no audit trail
    • Cause: No permission checks, no input validation, no logging
    • Prevention: Whitelist who can teach (Vivi + trusted), validate input, log everything, support /undo or revert
    • Cost if ignored: Dictionary spam, loss of data integrity
  8. Webhook Race Conditions (Phase 2+ - MEDIUM)

    • Risk: Vivi edits her message while bot edits its translation; both fail or corrupt
    • Cause: Simultaneous edits via same webhook
    • Prevention: Post new translation instead of editing; queue requests with 1-sec delay if edit detected
    • Cost if ignored: Occasional translation failures and message corruption
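The exponential backoff recommended under pitfall 5 might be sketched like this. The `RateLimited` exception, the `with_backoff` helper, and the default delays are hypothetical. A real bot would map them onto whatever its HTTP client raises for an HTTP 429 response.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the caller-supplied request function."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    # Delay doubles with each attempt and is capped to avoid unbounded waits.
    return min(cap, base * (2 ** attempt))


async def with_backoff(
    call: Callable[[], Awaitable[T]], retries: int = 5, base: float = 0.5
) -> T:
    for attempt in range(retries):
        try:
            return await call()
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            # Full jitter: sleep a random amount up to the computed delay.
            await asyncio.sleep(random.uniform(0, backoff_delay(attempt, base)))
    raise RuntimeError("retries must be at least 1")
```

Jitter keeps several queued translations from retrying at the same instant after a shared rate limit resets.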

Confidence: MEDIUM-HIGH — Pitfalls are well-documented in Discord bot literature. Phase assignments are defensible but require validation during planning.


Implications for Roadmap

Suggested Phase Structure (5 Phases, ~8 Weeks)

Phase 1: Foundation (Weeks 1-2) — Detect Vivi

  • Goal: Prove we can reliably detect Vivi's PluralKit-proxied messages
  • Features:
    • Discord bot initialization (discord.py, intents, token)
    • Webhook detection (message.webhook_id)
    • PluralKit API verification (GET /v2/messages/{id})
    • Member ID verification (compare to Vivi's ID)
    • Database schema + tables (emoji_dictionary, server_configuration)
  • Deliverable: Bot logs every Vivi message to console (doesn't respond yet)
  • Critical Pitfalls to Avoid: Message detection reliability, message content intent design, authorization design
  • Research Needed: None — STACK.md and PITFALLS.md are definitive
  • Success Criteria:
    • Bot detects Vivi's messages with >99% accuracy
    • No false positives (ignores non-Vivi webhooks)
    • Handles reproxy, edits, and DMs correctly

Phase 2: Emoji Parsing & Translation (Weeks 3-4) — Make Vivi Understood

  • Goal: Turn emoji into natural language; deploy MVP
  • Features:
    • Emoji parser (regex for Unicode + custom emoji)
    • Database lookups (O(1) via primary key)
    • Response composition ("Vivi says: [translation]")
    • Auto-translate toggle per server
    • Basic error handling (unknown emoji, rate limits)
  • Deliverable: Bot translates Vivi's emoji in channels and DMs
  • Critical Pitfalls to Avoid: Emoji edge cases, rate limiting, webhook race conditions
  • Research Needed: None — ARCHITECTURE.md covers this thoroughly
  • Success Criteria:
    • All Unicode emoji parse correctly (including skin tones, ZWJ)
    • Custom Discord emoji supported
    • Translations appear in <500ms
    • Accessible format (plain text, no emoji-only responses)
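For the custom Discord emoji half of the parser, a small regex sketch is shown below. It relies on the `<:name:id>` and `<a:name:id>` forms that custom emoji take in message content. The `CustomEmoji` tuple and `find_custom_emoji` helper are illustrative names. Unicode emoji are intentionally not handled here, since pitfall 6 recommends a dedicated library for those.

```python
import re
from typing import List, NamedTuple

# <:name:id> for static emoji, <a:name:id> for animated ones.
CUSTOM_EMOJI = re.compile(r"<(a?):([A-Za-z0-9_]+):(\d+)>")


class CustomEmoji(NamedTuple):
    name: str
    emoji_id: int
    animated: bool


def find_custom_emoji(content: str) -> List[CustomEmoji]:
    # finditer yields matches left to right, so message order is preserved.
    return [
        CustomEmoji(m.group(2), int(m.group(3)), m.group(1) == "a")
        for m in CUSTOM_EMOJI.finditer(content)
    ]
```

The numeric `emoji_id` is the natural value for the `custom_emoji_id` index mentioned in the design decisions, since a custom emoji's name can be changed while its ID stays the same.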

Phase 3: Teaching System (Weeks 5-6) — Enable Growth

  • Goal: Let users and Vivi teach emoji meanings; enable sustainable growth
  • Features:
    • /teach emoji meaning command
    • /meaning emoji or /what emoji query
    • /correct emoji new_meaning updates
    • Input validation (length, content, duplicates)
    • Audit trail (logged changes with user_id, timestamp)
    • Reaction-based feedback (approve/reject reactions on translations)
    • Permission checks (whitelist who can teach)
  • Deliverable: Users can teach emoji via simple one-liner commands; bot confirms with visual emoji
  • Critical Pitfalls to Avoid: Dictionary degradation, interface complexity (dysgraphia UX), authorization bypass, emoji conflicts
  • Research Needed: ⚠️ Potential research needed on Vivi's specific dysgraphia constraints and optimal UI patterns
  • Success Criteria:
    • Vivi finds teaching interface usable (simple syntax, visual confirmation)
    • 50+ emoji taught in first week of beta
    • No troll edits (proper permission checks)
    • Audit trail enables revert if needed
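The permission check, input validation, and short confirmation for `/teach` could be combined along the lines of the sketch below. The length limit, the reply wording, and every name in it are assumptions pending the Vivi interview. It is written as a pure function so it can be tested without Discord.

```python
from dataclasses import dataclass
from typing import Optional, Set

MAX_MEANING_LENGTH = 100  # assumed limit, not taken from the research files


@dataclass(frozen=True)
class TeachResult:
    ok: bool
    reply: str


def handle_teach(
    user_id: int,
    emoji: str,
    meaning: str,
    allowed: Set[int],
    existing: Optional[str] = None,
) -> TeachResult:
    cleaned = " ".join(meaning.split())  # collapse stray whitespace
    if user_id not in allowed:
        return TeachResult(False, "Sorry, you can't teach emoji yet.")
    if not cleaned:
        return TeachResult(False, f"{emoji} needs a meaning.")
    if len(cleaned) > MAX_MEANING_LENGTH:
        return TeachResult(False, f"{emoji} meaning is too long.")
    if existing is not None and existing != cleaned:
        # Conflicts are flagged, not overwritten; /correct handles changes.
        return TeachResult(False, f'{emoji} already means "{existing}". Use /correct.')
    # Confirmation echoes the emoji and stays to one short line.
    return TeachResult(True, f"{emoji} = {cleaned}. Saved!")
```

Every reply leads with the emoji itself and stays under two sentences, in line with the dysgraphia guidance in pitfall 4.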

Phase 4: Per-Server Configuration & Scaling (Week 7) — Customize & Optimize

  • Goal: Let servers customize translation behavior; add basic caching
  • Features:
    • /config auto-translate [on|off] per-server toggle
    • /translate emoji on-demand command
    • Redis caching layer (optional, for hot emoji)
    • PostgreSQL migration (if MVP showed >1000 emoji, scaling needed)
    • Basic statistics (/emoji-stats)
  • Deliverable: Different servers can choose auto vs on-demand translation; bot performance optimized
  • Critical Pitfalls to Avoid: Global dictionary conflicts (document limitation; defer per-server overrides to Phase 5+)
  • Research Needed: ⚠️ Performance profiling may reveal caching needs earlier than expected
  • Success Criteria:
    • Servers can customize behavior via commands
    • Bot remains responsive at 100+ servers with 1000+ emoji
    • Average response time <250ms (including PluralKit API call)

Phase 5: Polish & Production Hardening (Week 8+) — Stabilize

  • Goal: Make bot production-ready with comprehensive error handling, logging, monitoring
  • Features:
    • Structured logging (all errors, API calls, performance metrics)
    • Sentry or equivalent error tracking
    • Graceful degradation (serve cached meanings if DB down)
    • Edge case handling (message edits, deletions, permission changes)
    • Documentation and runbooks
  • Deliverable: Production-grade bot with >99% uptime, full observability
  • Research Needed: None — standard DevOps practices
  • Success Criteria:
    • <0.1% error rate on translations
    • All errors logged and alertable
    • Can diagnose issues within 5 minutes from logs
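The graceful-degradation feature (serve cached meanings if the database is down) might be sketched as below. The `FallbackLookup` class and its `fetch` callable are hypothetical, and the version shown is synchronous for brevity. In the bot the fetch would be an awaited database call.

```python
from typing import Callable, Dict, Optional


class FallbackLookup:
    """Remembers the last good meaning per emoji and serves it on failure."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}

    def get(self, emoji: str) -> Optional[str]:
        try:
            meaning = self._fetch(emoji)
        except Exception:
            # Database unavailable: fall back to the last known value, if any.
            # Real code should catch the driver's specific error types and log.
            return self._cache.get(emoji)
        if meaning is not None:
            self._cache[emoji] = meaning
        return meaning
```

Only emoji that were looked up successfully at least once can be served during an outage, so this covers frequently used emoji better than rare ones.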

Research Flags & Validation Needs

High-Confidence Areas (Skip Deeper Research)

  • Stack: discord.py 2.6.4, SQLite→PostgreSQL, Railway hosting — all production-proven
  • Architecture: Component design, data flows, build order — standard Discord bot patterns
  • PluralKit Integration: Webhook dispatch vs API polling tradeoff is well-documented

Medium-Confidence Areas (Validate During Planning)

  • Phase 3 UX: Dysgraphia accessibility — validate teaching interface usability with Vivi early
  • Phase 2 Performance: Emoji parser edge cases — comprehensive test suite needed before Phase 2
  • Phase 4 Scaling: PostgreSQL migration point — may happen sooner/later than expected based on emoji volume

Areas Requiring Phase-Specific Research

  • Phase 3: Optimal teaching UX for dysgraphia (interview Vivi, iterate prototype)
  • Phase 4: Per-server override system design (if pursued; currently deferred to Phase 5+)
  • Phase 5: Sentry configuration, structured logging patterns (standard practice, low risk)

Confidence Assessment

| Area | Level | Rationale | Gaps |
|---|---|---|---|
| Tech Stack | VERY HIGH | discord.py 2.6.4, SQLite/PostgreSQL, Railway all production-standard in 2025; no experimental choices | None (all recommendations are proven) |
| Architecture | VERY HIGH | Component design mirrors MEE6, Logiq, other production bots; async patterns well-documented | None (patterns are industry-standard) |
| Features & MVP Scope | HIGH | Rule-based learning is transparent, debuggable, and explicitly preferred over ML; feature scope is tight | Dysgraphia UX needs validation; confirm Vivi's preferences |
| Pitfalls | MEDIUM-HIGH | Most pitfalls are documented in Discord bot literature; prioritization is defensible | Message detection reliability needs testing; rate limiting impact unknown until scale testing |
| Roadmap Phases | HIGH | Build order is logical (detection → translation → teaching → config → polish); each phase delivers value | Phase 3 timing may shift based on Vivi's teaching interface feedback |
| PluralKit Integration | VERY HIGH | Webhook dispatch approach is efficient, well-documented; API endpoints are stable | None (integration is straightforward) |
| Accessibility | MEDIUM | General accessibility principles are sound; dysgraphia-specific UX patterns need user testing | Vivi's exact preferences (command aliases, visual feedback styles, response length) unknown |

Overall Confidence: HIGH

This project has clear requirements, proven technology choices, and manageable scope. The main confidence gap is teaching interface UX for dysgraphia — validate early in Phase 3 planning with Vivi's direct feedback.


Gaps to Address During Requirements Definition

  1. Vivi's Teaching UX Preferences (High Priority)

    • What syntax feels easiest? (/teach emoji meaning vs /learn emoji meaning vs /add emoji meaning?)
    • How should bot confirm back? (Show emoji only? Emoji + text? How many words max?)
    • Are reaction buttons easier than typing? (React ✓ vs type "yes")
    • What emoji naming system? (Unicode names? Custom? Both?)
    • Action: Interview Vivi early in requirements phase; prototype 2-3 UI patterns
  2. Exact Emoji Coverage (Medium Priority)

    • Does Vivi use only standard Unicode emoji, or custom Discord emoji, or both?
    • Are there specific emoji types (families, flags, keycaps) that are critical?
    • Does she use ZWJ sequences (👨‍👩‍👧)?
    • Action: Ask Vivi to share examples of emoji she commonly uses
  3. Moderation & Teaching Permissions (Medium Priority)

    • Who should be allowed to teach emoji? (Only Vivi? Vivi + alters? Trusted friends? Everyone?)
    • How should conflicts be resolved if two people teach different meanings for same emoji?
    • Is there a mod team to approve meanings, or trust-first approach?
    • Action: Clarify with Vivi's community (or proxy representative)
  4. Multi-System Scope (Low Priority)

    • Is this bot only for Vivi's system, or will it serve multiple DID systems?
    • If multiple systems, how do we handle different emoji meanings per system?
    • Action: Clarify scope; if multi-system, defer to Phase 4+ for per-system overrides
  5. Response Format Preferences (Low Priority)

    • Should bot translate emoji-only, or include surrounding context?
    • Example: Vivi posts "😷 2 🍑 " → Bot says "sick, two, peach, no" OR "Vivi: sick, feeling crappy about two things, and definitely not"?
    • Action: Test both formats with Vivi; let her choose style

Cost Breakdown & Go/No-Go Criteria

MVP Monthly Cost:

  • Bot Hosting (Railway): $0 (free tier)
  • Database (SQLite, local): $0
  • Total: $0

Production Monthly Cost (100+ servers):

  • Bot Hosting (Railway): $5
  • PostgreSQL (Railway): $15
  • Logging/Monitoring (optional): $0-50
  • Total: $20-70

Go Criteria (Phase 1 Completion):

  • Message detection >99% accurate
  • No false positives or negatives
  • Database queries <50ms
  • Code reviewed and documented

No-Go Criteria (Stop if True):

  • PluralKit API rate limits prevent scaling to 10+ servers
  • Discord denies message content intent AND no viable slash command path
  • Vivi finds teaching interface unusable after Phase 3 testing

Sources & References

Research Files (Synthesized)

  • .planning/research/STACK.md — Technology recommendations, rationale, alternatives
  • .planning/research/FEATURES.md — Feature scope, learning approach, accessibility, anti-features
  • .planning/research/ARCHITECTURE.md — Component design, data flows, database schema, scaling
  • .planning/research/PITFALLS.md — Common mistakes, prevention strategies, phase assignments


Next Steps for Roadmap Creator

  1. Read this SUMMARY.md (5 min) — Understand the synthesis
  2. Review PITFALLS.md (15 min) — Understand phase-specific risks
  3. Clarify gaps with Vivi (async) — Teaching UX, emoji coverage, permissions
  4. Map phases to sprints — Assign timelines, team, success criteria
  5. Create requirements document — Expand phase descriptions into user stories
  6. Begin Phase 1 development — Foundation: Discord client + PluralKit detection

Status: Ready for Requirements Definition
Synthesized by: GSD Research Synthesizer
Date: January 29, 2025
Confidence Level: HIGH

Proceed to roadmap creation. Prioritize Vivi interview for teaching UX validation.