Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 11:02:32 -05:00

Research Summary: Vivi Speech Translator

Synthesized from: STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md
Date: January 29, 2025
Status: Ready for Requirements Definition

Executive Summary

Vivi Speech Translator is a rule-based emoji-to-text translation Discord bot built for a specific user system (Vivi via PluralKit). The recommended 2025 stack is discord.py 2.6.4 + PostgreSQL/SQLite + webhook-driven PluralKit integration, prioritizing simplicity, reliability, and community-driven learning over complex AI. The core differentiator is transparent, learnable translation with strong accessibility for users with dysgraphia—the bot becomes more valuable as users teach it emoji meanings, creating positive network effects.

The project succeeds by staying focused: detect Vivi's PluralKit-proxied messages, parse emoji sequences, translate via a persistent dictionary, and enable users to grow that dictionary through simple commands. Avoid context inference, cross-Discord generalization, and real-time chat simulation. This narrowly-scoped approach maximizes shipping speed while maintaining high confidence in architectural decisions.

Key Risk: PluralKit webhook detection is load-bearing. Message detection must be bulletproof (Phase 1) before scaling. Secondary risk: keeping the teaching interface simple enough for a user with dysgraphia to adopt comfortably.

Key Findings

From STACK.md: Technology Recommendations

Recommended Stack (2025):

Component	Choice	Rationale
Language	Python 3.10+	Richest emoji/text processing libraries, largest Discord bot community
Framework	discord.py 2.6.4	Actively maintained (October 2025), mature 7-year ecosystem, async-first, native slash commands
Database (MVP)	SQLite 3 + aiosqlite	Zero setup, single file, sufficient for MVP testing (<10K emoji mappings)
Database (Prod)	PostgreSQL 15+ + asyncpg	Scales to millions of mappings, native array types, connection pooling, $0-15/mo on Railway
PluralKit Integration	pluralkit.py + webhook dispatch	Use event-driven webhooks (instant, free) vs API polling (expensive, slow)
Hosting	Railway Cloud	$0-5/mo free tier, auto-deploys from Git, built-in PostgreSQL, public webhook URL for PluralKit
Key Libraries	emoji 2.11.0, pydantic 2.5.0, aiohttp 3.9.0	Unicode 15.0 support, async-native, data validation

Critical Avoids:

❌ Pycord (py-cord) — Inactive since 2023, no PyPI releases
❌ Message content intent as primary architecture — Design for slash commands, treat intent as optional
❌ REST API polling for PluralKit — Use webhook dispatch instead (rate limits: 2 req/sec vs unlimited webhooks)
❌ Synchronous database libraries (sqlite3, psycopg2) — Block bot event loop; use aiosqlite/asyncpg

Confidence: VERY HIGH — All recommendations are current, production-proven, and community-standard in early 2025.

From FEATURES.md: What to Build

Table Stakes (Must-Have):

Message detection + emoji parsing
Reply/response infrastructure
Slash command interface (not prefix commands)
Per-server configuration (auto vs on-demand mode)
Rate limiting + error handling

Differentiators (Should-Have):

Learning system: /teach emoji meaning → stores in database
Emoji sequence detection (e.g., "👩‍💻📱" = compound concept)
Query system: /meaning emoji or /what emoji
Correction system: /correct emoji new_meaning
Reaction-based feedback (✅/❌ on translations)
Accessibility: plain text output, no emoji-only responses, visual confirmation

PluralKit Integration (Critical for Scope):

Detect webhook proxy via message.webhook_id
Verify member_id via GET /v2/messages/{id} API
Enable "Vivi says: [translation]" style responses
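
The two-step check above (webhook filter first, PluralKit lookup second) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `IncomingMessage`, `is_vivi_message`, `fetch_pk_message`, and the placeholder member ID are all hypothetical names, and the lookup is injected as a plain function so the sketch runs without network access. In the bot itself the lookup would be an awaited HTTP call to `GET /v2/messages/{id}`, and the sketch assumes the response carries a `member` object with an `id` field.

```python
from dataclasses import dataclass
from typing import Callable, Optional

VIVI_MEMBER_ID = "abcde"  # hypothetical placeholder for Vivi's PluralKit member ID


@dataclass
class IncomingMessage:
    """Stand-in for the two fields of a Discord message that this check reads."""
    id: int
    webhook_id: Optional[int]


def is_vivi_message(
    message: IncomingMessage,
    fetch_pk_message: Callable[[int], Optional[dict]],
    vivi_member_id: str = VIVI_MEMBER_ID,
) -> bool:
    """Return True only for webhook messages that PluralKit attributes to Vivi."""
    # Step 1: cheap local filter. Ordinary user messages carry no webhook_id,
    # so they never trigger an API call.
    if message.webhook_id is None:
        return False
    # Step 2: ask PluralKit who sent the message (GET /v2/messages/{id}).
    # The injected fetcher is assumed to return None when PluralKit has no
    # record, which is how non-PluralKit webhooks are filtered out.
    record = fetch_pk_message(message.id)
    if record is None:
        return False
    member = record.get("member") or {}
    return member.get("id") == vivi_member_id
```

Injecting the fetcher keeps the decision logic testable with canned responses, which matters here because the reproxy and edit edge cases need repeatable tests.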

Never Build (Out of Scope):

❌ Context-based inference ("infer emoji meaning from conversation")
❌ Cross-Discord emoji translation ("same meaning everywhere")
❌ Real-time chat simulation ("bot generates new emoji sequences")
❌ Full NLP context analysis ("understand subtle tone shifts")

MVP Feature Set (Phases 1-3):

Message detection & emoji parsing
Rule-based translation (no ML)
/teach, /meaning, /correct commands
Auto/on-demand toggle per server
Accessible output (plain text, emoji names)

Roadmap Implication: Build in layers. Phase 1-2 deliver value (users see translations). Phase 3 enables growth (users can teach). Phase 4+ adds refinement (caching, stats, multi-server overrides).

Confidence: HIGH — Feature research grounded in Discord bot best practices, accessibility standards, and PluralKit integration patterns.

From ARCHITECTURE.md: Component Design

7-Component System:

Discord Client — Maintains WebSocket, initializes event loop
Message Event Handler — Filters for webhook, queries PluralKit, verifies Vivi
Emoji Parser — Extracts emoji sequences via regex, preserves order
Translation Engine — Looks up emoji meanings, composes natural language
Database Layer — Async SQLAlchemy + PostgreSQL/SQLite
Command Handler (Cogs) — Teaching, configuration, queries
Configuration Layer — Environment variables for secrets

Data Flow (Simplified):

Vivi's Message (via webhook)
  ↓ [PluralKit detection]
Emoji Parser (regex extraction)
  ↓ [order-preserving]
Database Lookup (O(1) via index)
  ↓ [emoji→meaning]
Translation Composition
  ↓ [natural language]
Discord Reply
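
The lookup and composition steps of this flow might look like the sketch below. It assumes the parser has already produced an ordered list of emoji tokens and that the dictionary is available as an in-memory mapping; `compose_translation` and the `[unknown …]` placeholder format are illustrative choices, not something the research files specify.

```python
from typing import Iterable, Mapping


def compose_translation(
    tokens: Iterable[str],
    dictionary: Mapping[str, str],
    speaker: str = "Vivi",
) -> str:
    """Look up each emoji in order and join the meanings into plain text."""
    parts = []
    for token in tokens:
        meaning = dictionary.get(token)
        # Unknown emoji stay visible (assumed format) so someone can teach them.
        parts.append(meaning if meaning is not None else f"[unknown {token}]")
    if not parts:
        return ""  # nothing to translate, so the bot would stay silent
    return f"{speaker} says: " + ", ".join(parts)
```

With a dictionary that maps four emoji to "sick", "two", "peach", and "no", passing those four tokens in order yields `Vivi says: sick, two, peach, no`, the literal style discussed under Response Format Preferences.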

Database Schema (Two Core Tables):

emoji_dictionary: emoji_string (PK) → meaning + metadata (created_at, updated_by, confidence)
server_configuration: guild_id (PK) → auto_translate (boolean) + created_at
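
As a rough sketch, the two tables could be declared like this. Column types, defaults, and the index name are assumptions; only the table names, keys, and listed columns come from the schema above (plus the `custom_emoji_id` index mentioned under Key Design Decisions). The standard-library `sqlite3` module is used here only so the example runs on its own; the bot would issue the same SQL through aiosqlite or asyncpg to avoid blocking the event loop.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji_string    TEXT PRIMARY KEY,
    custom_emoji_id INTEGER,
    meaning         TEXT NOT NULL,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by      INTEGER,
    confidence      REAL NOT NULL DEFAULT 1.0
);
CREATE INDEX IF NOT EXISTS idx_emoji_custom_id
    ON emoji_dictionary (custom_emoji_id);
CREATE TABLE IF NOT EXISTS server_configuration (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 0,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create both tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def lookup_meaning(conn: sqlite3.Connection, emoji_string: str):
    """Primary-key lookup; returns the stored meaning or None."""
    row = conn.execute(
        "SELECT meaning FROM emoji_dictionary WHERE emoji_string = ?",
        (emoji_string,),
    ).fetchone()
    return row[0] if row else None
```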

Key Design Decisions:

Global shared emoji dictionary (Phase 1-3) — simplifies MVP; per-server overrides deferred to Phase 4
Async-first (aiosqlite/asyncpg) — prevents blocking bot's event loop
Primary key on emoji_string, secondary index on custom_emoji_id — enables O(1) lookups
Webhook detection first, API verification second — reduces API calls, catches non-PluralKit webhooks

Suggested Build Order (5 Phases):

Phase 1 (Weeks 1-2): Foundation — Discord client + PluralKit detection + database setup
Phase 2 (Weeks 3-4): Emoji parsing & translation — regex + lookup + reply formatting
Phase 3 (Weeks 5-6): Teaching system — /teach, /meaning, /correct commands
Phase 4 (Week 7): Per-server config — auto/on-demand toggle, /config command
Phase 5 (Week 8+): Polish — caching, logging, edge cases, error handling

Scaling Path:

MVP (Single bot, <10 servers): SQLite, local development
Production (100-1000 servers): PostgreSQL on Railway, connection pooling
Enterprise (1000+ servers): Add Redis caching layer, implement Discord sharding

Confidence: VERY HIGH (matching the Confidence Assessment table). Architecture mirrors successful production Discord bots (MEE6, Logiq, etc.); component boundaries are clean and async patterns are standard.

From PITFALLS.md: Risks to Prevent

Top 8 Pitfalls with Phase Assignment:

Message Detection Reliability (Phase 1 - CRITICAL)
- Risk: False positives (translates non-Vivi messages) or false negatives (misses Vivi)
- Cause: Mixing webhook detection methods, edge case in PluralKit proxying
- Prevention: Use webhook creator ID as source of truth, cache member names, test reproxy edge cases, log failures
- Cost if ignored: Bot unreliable from day one, loses user trust
Message Content Intent Denial (Phase 1 - CRITICAL)
- Risk: Bot designed for passive message scanning; approval denied at 75 servers
- Cause: Assuming Discord approval is guaranteed
- Prevention: Design for slash commands first (/translate emoji), treat message content intent as optional
- Cost if ignored: Architectural rewrite mid-project
Dictionary Quality Degradation (Phase 3 - HIGH)
- Risk: User-taught emoji meanings become nonsensical (typos, trolls, conflicts)
- Cause: No validation, no audit trail, no approval workflow
- Prevention: Validate meaning length/content, log every change, flag conflicts, require mod approval for shared meanings
- Cost if ignored: Translations become unreliable by month 2-3
Teaching Interface Too Complex (Phase 3 - HIGH)
- Risk: Vivi (with dysgraphia) avoids using teaching system; feature becomes unused
- Cause: Text-heavy commands, complex syntax, no visual confirmation
- Prevention: Ultra-simple commands (/teach emoji meaning), show emoji in response, keep responses under 2 sentences
- Cost if ignored: Bot cannot learn, static dictionary limits usefulness
Rate Limiting (Phase 2+ - MEDIUM)
- Risk: Bot goes silent during peak usage (Discord or PluralKit API limits hit)
- Cause: Naive request patterns, no caching, no exponential backoff
- Prevention: Cache emoji translations, batch lookups, implement exponential backoff, monitor rate limit headers
- Cost if ignored: Intermittent outages, poor user experience
Emoji Parsing Edge Cases (Phase 2 - MEDIUM)
- Risk: Complex emoji (skin tones, ZWJ sequences, variation selectors) break parsing
- Cause: Naive string operations, incorrect regex patterns
- Prevention: Use emoji library (not manual regex), normalize input (NFD), test with families/skin tones/flags
- Cost if ignored: Some emoji don't translate or get corrupted
Authorization & Security (Phase 3 - HIGH)
- Risk: Non-mods can teach emoji, trolls corrupt dictionary, no audit trail
- Cause: No permission checks, no input validation, no logging
- Prevention: Whitelist who can teach (Vivi + trusted), validate input, log everything, support /undo or revert
- Cost if ignored: Dictionary spam, loss of data integrity
Webhook Race Conditions (Phase 2+ - MEDIUM)
- Risk: Vivi edits her message while bot edits its translation; both fail or corrupt
- Cause: Simultaneous edits via same webhook
- Prevention: Post new translation instead of editing; queue requests with 1-sec delay if edit detected
- Cost if ignored: Occasional translation failures and message corruption
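
For the rate limiting pitfall, the exponential backoff mentioned under Prevention could be sketched as below. `RateLimited` and `call_with_backoff` are hypothetical names, the delays are illustrative, and the sleep function is injectable so the retry schedule can be checked without waiting. discord.py already handles Discord's own REST rate limits for requests made through the library, so a wrapper like this would mostly apply to PluralKit API calls.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a caller raises when an API answers HTTP 429."""


async def call_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Waits grow as base_delay * 1, 2, 4, ... between attempts.
            await sleep(base_delay * 2 ** attempt)
    raise AssertionError("loop always returns or raises")
```

A production version would probably add jitter and honor any retry-after value the API returns; both are left out to keep the schedule easy to read.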

Confidence: MEDIUM-HIGH — Pitfalls are well-documented in Discord bot literature. Phase assignments are defensible but require validation during planning.

Implications for Roadmap

Suggested Phase Structure (5 Phases, ~8 Weeks)

Phase 1: Foundation (Weeks 1-2) — Detect Vivi

Goal: Prove we can reliably detect Vivi's PluralKit-proxied messages
Features:
- Discord bot initialization (discord.py, intents, token)
- Webhook detection (message.webhook_id)
- PluralKit API verification (GET /v2/messages/{id})
- Member ID verification (compare to Vivi's ID)
- Database schema + tables (emoji_dictionary, server_configuration)
Deliverable: Bot logs every Vivi message to console (doesn't respond yet)
Critical Pitfalls to Avoid: Message detection reliability, message content intent design, authorization design
Research Needed: ❌ None — STACK.md and PITFALLS.md are definitive
Success Criteria:
- ✅ Bot detects Vivi's messages with >99% accuracy
- ✅ No false positives (ignores non-Vivi webhooks)
- ✅ Handles reproxy, edits, and DMs correctly

Phase 2: Emoji Parsing & Translation (Weeks 3-4) — Make Vivi Understood

Goal: Turn emoji into natural language; deploy MVP
Features:
- Emoji parser (regex for Unicode + custom emoji)
- Database lookups (O(1) via primary key)
- Response composition ("Vivi says: [translation]")
- Auto-translate toggle per server
- Basic error handling (unknown emoji, rate limits)
Deliverable: Bot translates Vivi's emoji in channels and DMs
Critical Pitfalls to Avoid: Emoji edge cases, rate limiting, webhook race conditions
Research Needed: ❌ None — ARCHITECTURE.md covers this thoroughly
Success Criteria:
- ✅ All Unicode emoji parse correctly (including skin tones, ZWJ)
- ✅ Custom Discord emoji supported
- ✅ Translations appear in <500ms
- ✅ Accessible format (plain text, no emoji-only responses)
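
To make the parsing criteria concrete, here is a simplified, standard-library-only tokenizer that keeps ZWJ sequences, skin tone modifiers, keycaps, flags, and custom Discord emoji together as single tokens. It is an approximation written so the example runs without dependencies: the code point ranges are an assumption and do not cover every Unicode emoji, and `extract_emoji` is a hypothetical name. PITFALLS.md recommends the `emoji` library over hand-written patterns, so a sketch like this is mainly useful for building the edge-case test suite.

```python
import re

# Assumed coverage: common symbol and pictograph blocks only.
_BASE = r"[\u2600-\u27BF\U0001F300-\U0001FAFF]"
_TONE = r"[\U0001F3FB-\U0001F3FF]"          # skin tone modifiers
_UNIT = rf"{_BASE}{_TONE}?\uFE0F?"          # base + optional tone + optional VS16

EMOJI_PATTERN = re.compile(
    r"<a?:\w+:\d+>"                          # custom Discord emoji, <:name:id>
    r"|[\U0001F1E6-\U0001F1FF]{2}"           # flag: two regional indicators
    r"|[0-9#*]\uFE0F?\u20E3"                 # keycap such as 2 + VS16 + U+20E3
    rf"|{_UNIT}(?:\u200D{_UNIT})*"           # emoji, optionally ZWJ-joined
)


def extract_emoji(text: str) -> list[str]:
    """Return emoji tokens in the order they appear in the text."""
    return EMOJI_PATTERN.findall(text)
```

Because every group in the pattern is non-capturing, `findall` returns whole matches, so a family or profession sequence comes back as one token rather than being split at each zero-width joiner.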

Phase 3: Teaching System (Weeks 5-6) — Enable Growth

Goal: Let users and Vivi teach emoji meanings; enable sustainable growth
Features:
- /teach emoji meaning command
- /meaning emoji or /what emoji query
- /correct emoji new_meaning updates
- Input validation (length, content, duplicates)
- Audit trail (logged changes with user_id, timestamp)
- Reaction-based feedback (✅/❌ on translations)
- Permission checks (whitelist who can teach)
Deliverable: Users can teach emoji via simple one-liner commands; bot confirms with visual emoji
Critical Pitfalls to Avoid: Dictionary degradation, interface complexity (dysgraphia UX), authorization bypass, emoji conflicts
Research Needed: ⚠️ Potential research needed on Vivi's specific dysgraphia constraints and optimal UI patterns
Success Criteria:
- ✅ Vivi finds teaching interface usable (simple syntax, visual confirmation)
- ✅ 50+ emoji taught in first week of beta
- ✅ No troll edits (proper permission checks)
- ✅ Audit trail enables revert if needed
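
The validation, permission, and audit requirements of this phase can be illustrated with a small in-memory model. Everything here is a sketch under assumptions: `TeachingStore`, the 100-character limit, and the reply wording are hypothetical, and the real bot would persist these records in the database and expose `teach` through a slash command.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_MEANING_LENGTH = 100  # assumed limit; the requirements phase should set it


@dataclass
class TeachingStore:
    allowed_user_ids: set[int]
    meanings: dict[str, str] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def teach(self, user_id: int, emoji: str, meaning: str) -> str:
        """Validate, record an audit entry, store, and return a short reply."""
        if user_id not in self.allowed_user_ids:
            return "Sorry, you are not allowed to teach emoji."
        meaning = " ".join(meaning.split())  # collapse stray whitespace
        if not meaning or len(meaning) > MAX_MEANING_LENGTH:
            return f"Meaning must be 1 to {MAX_MEANING_LENGTH} characters."
        self.audit_log.append({
            "emoji": emoji,
            "old": self.meanings.get(emoji),  # None when the emoji is new
            "new": meaning,
            "user_id": user_id,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.meanings[emoji] = meaning
        # Echo the emoji back so the confirmation is visual and short.
        return f"{emoji} now means: {meaning}"

    def undo_last(self) -> bool:
        """Revert the most recent change; returns False if there is none."""
        if not self.audit_log:
            return False
        # Simplification: a durable audit trail would log the revert as a
        # new entry instead of removing the old one.
        entry = self.audit_log.pop()
        if entry["old"] is None:
            self.meanings.pop(entry["emoji"], None)
        else:
            self.meanings[entry["emoji"]] = entry["old"]
        return True
```

Keeping the previous meaning in each audit entry is what makes the revert criterion cheap to satisfy, and the one-line reply stays within the "under 2 sentences" guidance from PITFALLS.md.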

Phase 4: Per-Server Configuration & Scaling (Week 7) — Customize & Optimize

Goal: Let servers customize translation behavior; add basic caching
Features:
- /config auto-translate [on|off] per-server toggle
- /translate emoji on-demand command
- Redis caching layer (optional, for hot emoji)
- PostgreSQL migration (if MVP showed >1000 emoji, scaling needed)
- Basic statistics (/emoji-stats)
Deliverable: Different servers can choose auto vs on-demand translation; bot performance optimized
Critical Pitfalls to Avoid: Global dictionary conflicts (document limitation; defer per-server overrides to Phase 5+)
Research Needed: ⚠️ Performance profiling may reveal caching needs earlier than expected
Success Criteria:
- ✅ Servers can customize behavior via commands
- ✅ Bot remains responsive at 100+ servers with 1000+ emoji
- ✅ Average response time <250ms (including PluralKit API call)

Phase 5: Polish & Production Hardening (Week 8+) — Stabilize

Goal: Make bot production-ready with comprehensive error handling, logging, monitoring
Features:
- Structured logging (all errors, API calls, performance metrics)
- Sentry or equivalent error tracking
- Graceful degradation (serve cached meanings if DB down)
- Edge case handling (message edits, deletions, permission changes)
- Documentation and runbooks
Deliverable: Production-grade bot with >99% uptime, full observability
Research Needed: ❌ None — standard DevOps practices
Success Criteria:
- ✅ <0.1% error rate on translations
- ✅ All errors logged and alertable
- ✅ Can diagnose issues within 5 minutes from logs
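
One way to read "serve cached meanings if DB down" is a read-through wrapper that remembers the last value it saw for each emoji. The sketch below is only that fallback, not a performance cache: it still queries the database on every call. `CachedDictionary` and the injected `db_lookup` callable are hypothetical, and the broad `except Exception` stands in for whatever connection errors the chosen driver actually raises.

```python
from typing import Callable, Optional


class CachedDictionary:
    """Read-through wrapper that keeps answering when the database lookup fails."""

    def __init__(self, db_lookup: Callable[[str], Optional[str]]):
        self._db_lookup = db_lookup
        self._last_seen: dict[str, str] = {}

    def get(self, emoji: str) -> Optional[str]:
        try:
            meaning = self._db_lookup(emoji)
        except Exception:
            # Database unavailable: fall back to the last value seen, if any.
            return self._last_seen.get(emoji)
        if meaning is not None:
            self._last_seen[emoji] = meaning
        else:
            # The meaning was removed upstream, so forget the stale copy.
            self._last_seen.pop(emoji, None)
        return meaning
```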

Research Flags & Validation Needs

High-Confidence Areas (Skip Deeper Research)

Stack: discord.py 2.6.4, SQLite→PostgreSQL, Railway hosting — all production-proven
Architecture: Component design, data flows, build order — standard Discord bot patterns
PluralKit Integration: Webhook dispatch vs API polling tradeoff is well-documented

Medium-Confidence Areas (Validate During Planning)

Phase 3 UX: Dysgraphia accessibility — validate teaching interface usability with Vivi early
Phase 2 Performance: Emoji parser edge cases — comprehensive test suite needed before Phase 2
Phase 4 Scaling: PostgreSQL migration point — may happen sooner/later than expected based on emoji volume

Areas Requiring Phase-Specific Research

Phase 3: Optimal teaching UX for dysgraphia (interview Vivi, iterate prototype)
Phase 4: Per-server override system design (if pursued; currently deferred to Phase 5+)
Phase 5: Sentry configuration, structured logging patterns (standard practice, low risk)

Confidence Assessment

Area	Level	Rationale	Gaps
Tech Stack	⭐⭐⭐ VERY HIGH	discord.py 2.6.4, SQLite/PostgreSQL, Railway all production-standard in 2025; no experimental choices	None — all recommendations are proven
Architecture	⭐⭐⭐ VERY HIGH	Component design mirrors MEE6, Logiq, other production bots; async patterns well-documented	None — patterns are industry-standard
Features & MVP Scope	⭐⭐⭐ HIGH	Rule-based learning is transparent, debuggable, and explicitly preferred over ML; feature scope is tight	Dysgraphia UX needs validation; confirm Vivi's preferences
Pitfalls	⭐⭐ MEDIUM-HIGH	Most pitfalls are documented in Discord bot literature; prioritization is defensible	Message detection reliability needs testing; rate limiting impact unknown until scale testing
Roadmap Phases	⭐⭐⭐ HIGH	Build order is logical (detection → translation → teaching → config → polish); each phase delivers value	Phase 3 timing may shift based on Vivi's teaching interface feedback
PluralKit Integration	⭐⭐⭐ VERY HIGH	Webhook dispatch approach is efficient, well-documented; API endpoints are stable	None — integration is straightforward
Accessibility	⭐⭐ MEDIUM	General accessibility principles are sound; dysgraphia-specific UX patterns need user testing	Vivi's exact preferences (command aliases, visual feedback styles, response length) unknown

Overall Confidence: ⭐⭐⭐ HIGH

This project has clear requirements, proven technology choices, and manageable scope. The main confidence gap is teaching interface UX for dysgraphia — validate early in Phase 3 planning with Vivi's direct feedback.

Gaps to Address During Requirements Definition

Vivi's Teaching UX Preferences (High Priority)
- What syntax feels easiest? (/teach emoji meaning vs /learn emoji meaning vs /add emoji meaning?)
- How should bot confirm back? (Show emoji only? Emoji + text? How many words max?)
- Are reaction buttons easier than typing? (React ✓ vs type "yes")
- What emoji naming system? (Unicode names? Custom? Both?)
- Action: Interview Vivi early in requirements phase; prototype 2-3 UI patterns
Exact Emoji Coverage (Medium Priority)
- Does Vivi use only standard Unicode emoji, or custom Discord emoji, or both?
- Are there specific emoji types (families, flags, keycaps) that are critical?
- Does she use ZWJ sequences (👨‍👩‍👧)?
- Action: Ask Vivi to share examples of emoji she commonly uses
Moderation & Teaching Permissions (Medium Priority)
- Who should be allowed to teach emoji? (Only Vivi? Vivi + alters? Trusted friends? Everyone?)
- How should conflicts be resolved if two people teach different meanings for same emoji?
- Is there a mod team to approve meanings, or trust-first approach?
- Action: Clarify with Vivi's community (or proxy representative)
Multi-System Scope (Low Priority)
- Is this bot only for Vivi's system, or will it serve multiple DID systems?
- If multiple systems, how do we handle different emoji meanings per system?
- Action: Clarify scope; if multi-system, defer to Phase 4+ for per-system overrides
Response Format Preferences (Low Priority)
- Should bot translate emoji-only, or include surrounding context?
- Example: Vivi posts "😷 2️⃣ 🍑 ❌" → Bot says "sick, two, peach, no" OR "Vivi: sick, feeling crappy about two things, and definitely not"?
- Action: Test both formats with Vivi; let her choose style

Cost Breakdown & Go/No-Go Criteria

MVP Monthly Cost:

Bot Hosting (Railway): $0 (free tier)
Database (SQLite, local): $0
Total: $0

Production Monthly Cost (100+ servers):

Bot Hosting (Railway): $5
PostgreSQL (Railway): $15
Logging/Monitoring (optional): $0-50
Total: $20-70

Go Criteria (Phase 1 Completion):

✅ Message detection >99% accurate
✅ No false positives (non-Vivi webhooks ignored); false negatives within the >99% accuracy target
✅ Database queries <50ms
✅ Code reviewed and documented

No-Go Criteria (Stop if True):

❌ PluralKit API rate limits prevent scaling to 10+ servers
❌ Discord denies message content intent AND no viable slash command path
❌ Vivi finds teaching interface unusable after Phase 3 testing

Sources & References

Research Files (Synthesized)

.planning/research/STACK.md — Technology recommendations, rationale, alternatives
.planning/research/FEATURES.md — Feature scope, learning approach, accessibility, anti-features
.planning/research/ARCHITECTURE.md — Component design, data flows, database schema, scaling
.planning/research/PITFALLS.md — Common mistakes, prevention strategies, phase assignments

External References

discord.py Documentation — Latest 2.6.4
PluralKit API Reference
Railway Cloud Platform
Discord Bot Security Best Practices 2025
Accessibility for Dysgraphia

Next Steps for Roadmap Creator

Read this SUMMARY.md (5 min) — Understand the synthesis
Review PITFALLS.md (15 min) — Understand phase-specific risks
Clarify gaps with Vivi (async) — Teaching UX, emoji coverage, permissions
Map phases to sprints — Assign timelines, team, success criteria
Create requirements document — Expand phase descriptions into user stories
Begin Phase 1 development — Foundation: Discord client + PluralKit detection

Status: ✅ Ready for Requirements Definition
Synthesized by: GSD Research Synthesizer
Date: January 29, 2025
Confidence Level: ⭐⭐⭐ HIGH

Proceed to roadmap creation. Prioritize Vivi interview for teaching UX validation.
