docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Dani B
2026-01-29 11:02:32 -05:00
parent 2f5131434e
commit 901574f8c8
8 changed files with 3559 additions and 0 deletions

@@ -0,0 +1,411 @@
# Research Summary: Vivi Speech Translator
**Synthesized from:** STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md
**Date:** January 29, 2026
**Status:** Ready for Requirements Definition
---
## Executive Summary
Vivi Speech Translator is a rule-based emoji-to-text translation Discord bot built for a specific user system (Vivi via PluralKit). The recommended 2025 stack is **discord.py 2.6.4 + PostgreSQL/SQLite + webhook-driven PluralKit integration**, prioritizing simplicity, reliability, and community-driven learning over complex AI. The core differentiator is transparent, learnable translation with strong accessibility for users with dysgraphia—the bot becomes more valuable as users teach it emoji meanings, creating positive network effects.
The project succeeds by staying focused: detect Vivi's PluralKit-proxied messages, parse emoji sequences, translate via a persistent dictionary, and enable users to grow that dictionary through simple commands. Avoid context inference, cross-Discord generalization, and real-time chat simulation. This narrowly-scoped approach maximizes shipping speed while maintaining high confidence in architectural decisions.
**Key Risk:** PluralKit webhook detection is load-bearing. Message detection must be bulletproof (Phase 1) before scaling. Secondary risk: keeping the teaching interface simple enough for a user with dysgraphia to adopt comfortably.
---
## Key Findings
### From STACK.md: Technology Recommendations
**Recommended Stack (2025):**
| Component | Choice | Rationale |
|-----------|--------|-----------|
| **Language** | Python 3.10+ | Richest emoji/text processing libraries, largest Discord bot community |
| **Framework** | discord.py 2.6.4 | Actively maintained (October 2025), mature 7-year ecosystem, async-first, native slash commands |
| **Database (MVP)** | SQLite 3 + aiosqlite | Zero setup, single file, sufficient for MVP testing (<10K emoji mappings) |
| **Database (Prod)** | PostgreSQL 15+ + asyncpg | Scales to millions of mappings, native array types, connection pooling, $0-15/mo on Railway |
| **PluralKit Integration** | pluralkit.py + webhook dispatch | Use event-driven webhooks (instant, free) vs API polling (expensive, slow) |
| **Hosting** | Railway Cloud | $0-5/mo free tier, auto-deploys from Git, built-in PostgreSQL, public webhook URL for PluralKit |
| **Key Libraries** | emoji 2.11.0, pydantic 2.5.0, aiohttp 3.9.0 | Unicode 15.0 support, async-native, data validation |
**Critical Avoids:**
- ❌ Pycord (py-cord) — Inactive since 2023, no PyPI releases
- ❌ Message content intent as primary architecture — Design for slash commands, treat intent as optional
- ❌ REST API polling for PluralKit — Use webhook dispatch instead (rate limits: 2 req/sec vs unlimited webhooks)
- ❌ Synchronous database libraries (sqlite3, psycopg2) — Block bot event loop; use aiosqlite/asyncpg
**Confidence:** **VERY HIGH** — All recommendations are current, production-proven, and community-standard in early 2025.
---
### From FEATURES.md: What to Build
**Table Stakes (Must-Have):**
- Message detection + emoji parsing
- Reply/response infrastructure
- Slash command interface (not prefix commands)
- Per-server configuration (auto vs on-demand mode)
- Rate limiting + error handling
**Differentiators (Should-Have):**
- Learning system: `/teach emoji meaning` → stores in database
- Emoji sequence detection (e.g., "👩‍💻📱" = compound concept)
- Query system: `/meaning emoji` or `/what emoji`
- Correction system: `/correct emoji new_meaning`
- Reaction-based feedback (✅/❌ on translations)
- Accessibility: plain text output, no emoji-only responses, visual confirmation
**PluralKit Integration (Critical for Scope):**
- Detect webhook proxy via `message.webhook_id`
- Verify member_id via `GET /v2/messages/{id}` API
- Enable "Vivi says: [translation]" style responses
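
The two-step check above (cheap webhook filter first, PluralKit verification second) can be sketched as below. This is a minimal illustration, not the project's implementation: `fetch_pk_message` is a hypothetical injected callable standing in for the HTTP call to `GET /v2/messages/{id}`, `VIVI_MEMBER_ID` is a placeholder, and the assumption that the response carries a `member` object with an `id` field should be confirmed against the PluralKit API reference.

```python
from typing import Callable, Optional

# Placeholder value; the real one is Vivi's PluralKit member ID.
VIVI_MEMBER_ID = "abcde"


def is_vivi_message(
    message_id: int,
    webhook_id: Optional[int],
    fetch_pk_message: Callable[[int], Optional[dict]],
) -> bool:
    """Return True only for webhook messages that PluralKit attributes to Vivi."""
    # Step 1: proxied messages are sent through webhooks, so a message
    # without a webhook_id is rejected without any API call.
    if webhook_id is None:
        return False
    # Step 2: ask PluralKit who sent it. The hypothetical lookup returns
    # None when PluralKit does not know the message (some other webhook).
    payload = fetch_pk_message(message_id)
    if payload is None:
        return False
    # Assumed response shape: {"member": {"id": "..."}}; member may be null.
    member = payload.get("member") or {}
    return member.get("id") == VIVI_MEMBER_ID
```

Injecting the lookup keeps the decision logic testable without network access, which matters given how load-bearing detection is in Phase 1.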
**Never Build (Out of Scope):**
- ❌ Context-based inference ("infer emoji meaning from conversation")
- ❌ Cross-Discord emoji translation ("same meaning everywhere")
- ❌ Real-time chat simulation ("bot generates new emoji sequences")
- ❌ Full NLP context analysis ("understand subtle tone shifts")
**MVP Feature Set (Phases 1-3):**
- Message detection & emoji parsing
- Rule-based translation (no ML)
- `/teach`, `/meaning`, `/correct` commands
- Auto/on-demand toggle per server
- Accessible output (plain text, emoji names)
**Roadmap Implication:** Build in layers. Phase 1-2 deliver value (users see translations). Phase 3 enables growth (users can teach). Phase 4+ adds refinement (caching, stats, multi-server overrides).
**Confidence:** **HIGH** — Feature research grounded in Discord bot best practices, accessibility standards, and PluralKit integration patterns.
---
### From ARCHITECTURE.md: Component Design
**7-Component System:**
1. **Discord Client** — Maintains WebSocket, initializes event loop
2. **Message Event Handler** — Filters for webhook, queries PluralKit, verifies Vivi
3. **Emoji Parser** — Extracts emoji sequences via regex, preserves order
4. **Translation Engine** — Looks up emoji meanings, composes natural language
5. **Database Layer** — Async SQLAlchemy + PostgreSQL/SQLite
6. **Command Handler (Cogs)** — Teaching, configuration, queries
7. **Configuration Layer** — Environment variables for secrets
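
As a rough illustration of the Configuration Layer, the sketch below loads secrets from environment variables and fails fast when one is missing. The variable names (`DISCORD_TOKEN`, `VIVI_MEMBER_ID`, `DATABASE_URL`) and the SQLite default are assumptions made for this example; the research files do not name them.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    vivi_member_id: str
    database_url: str


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build the config from environment variables (names are hypothetical)."""
    required = ("DISCORD_TOKEN", "VIVI_MEMBER_ID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return BotConfig(
        discord_token=env["DISCORD_TOKEN"],
        vivi_member_id=env["VIVI_MEMBER_ID"],
        # Assumed default: a local SQLite file for the MVP.
        database_url=env.get("DATABASE_URL", "sqlite:///vivi.db"),
    )
```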
**Data Flow (Simplified):**
```
Vivi's Message (via webhook)
↓ [PluralKit detection]
Emoji Parser (regex extraction)
↓ [order-preserving]
Database Lookup (O(1) via index)
↓ [emoji→meaning]
Translation Composition
↓ [natural language]
Discord Reply
```
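
The flow above can be sketched end to end in a few lines. This is a simplified illustration: the dictionary is a plain `dict` rather than a database table, and the tokenizer assumes emoji are separated by whitespace, which real messages will not always satisfy. The function names are hypothetical.

```python
from typing import Optional


def tokenize(content: str) -> list[str]:
    """Split on whitespace, preserving order (simplifying assumption)."""
    return content.split()


def translate(content: str, dictionary: dict[str, str]) -> Optional[str]:
    """Look up each token and compose a plain-text reply."""
    tokens = tokenize(content)
    if not tokens:
        return None  # nothing to translate, so no reply is sent
    parts = []
    for token in tokens:
        meaning = dictionary.get(token)
        # Unknown emoji are shown explicitly so users know what to teach.
        parts.append(meaning if meaning is not None else f"[unknown: {token}]")
    return "Vivi says: " + ", ".join(parts)
```

For example, with a dictionary mapping 😷 to "sick", 🍑 to "peach", and ❌ to "no", the message `😷 🍑 ❌` produces `Vivi says: sick, peach, no`.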
**Database Schema (Two Core Tables):**
- **emoji_dictionary**: emoji_string (PK) → meaning + metadata (created_at, updated_by, confidence)
- **server_configuration**: guild_id (PK) → auto_translate (boolean) + created_at
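
One possible rendering of these two tables as SQLite DDL is shown below. Column types, defaults (including `auto_translate` defaulting to on), and the sample row are assumptions for illustration. The synchronous `sqlite3` module is used here only to keep the schema example self-contained; the bot itself would go through an async driver, as recommended in STACK.md.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    meaning      TEXT NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_by   TEXT,
    confidence   REAL
);
CREATE TABLE IF NOT EXISTS server_configuration (
    guild_id       INTEGER PRIMARY KEY,
    auto_translate INTEGER NOT NULL DEFAULT 1,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both core tables if they do not exist yet."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO emoji_dictionary (emoji_string, meaning, updated_by) VALUES (?, ?, ?)",
    ("\U0001F351", "peach", "example-user"),
)
# Primary-key lookup: the access pattern the translation engine relies on.
row = conn.execute(
    "SELECT meaning FROM emoji_dictionary WHERE emoji_string = ?",
    ("\U0001F351",),
).fetchone()
```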
**Key Design Decisions:**
- Global shared emoji dictionary (Phase 1-3) — simplifies MVP; per-server overrides deferred to Phase 4 or later
- Async-first (aiosqlite/asyncpg) — prevents blocking bot's event loop
- Primary key on emoji_string, secondary index on custom_emoji_id — enables O(1) lookups
- Webhook detection first, API verification second — reduces API calls, catches non-PluralKit webhooks
**Suggested Build Order (5 Phases):**
1. **Phase 1 (Weeks 1-2):** Foundation — Discord client + PluralKit detection + database setup
2. **Phase 2 (Weeks 3-4):** Emoji parsing & translation — regex + lookup + reply formatting
3. **Phase 3 (Weeks 5-6):** Teaching system — `/teach`, `/meaning`, `/correct` commands
4. **Phase 4 (Week 7):** Per-server config — auto/on-demand toggle, `/config` command
5. **Phase 5 (Week 8+):** Polish — caching, logging, edge cases, error handling
**Scaling Path:**
- **MVP (Single bot, <10 servers):** SQLite, local development
- **Production (100-1000 servers):** PostgreSQL on Railway, connection pooling
- **Enterprise (1000+ servers):** Add Redis caching layer, implement Discord sharding
**Confidence:** **VERY HIGH** — Architecture mirrors successful production Discord bots (MEE6, Logiq, etc.). Component boundaries are clean, async patterns are standard.
---
### From PITFALLS.md: Risks to Prevent
**Top 8 Pitfalls with Phase Assignment:**
1. **Message Detection Reliability (Phase 1 - CRITICAL)**
   - Risk: False positives (translates non-Vivi messages) or false negatives (misses Vivi)
   - Cause: Mixing webhook detection methods, edge case in PluralKit proxying
   - Prevention: Use webhook creator ID as source of truth, cache member names, test reproxy edge cases, log failures
   - Cost if ignored: Bot unreliable from day one, loses user trust
2. **Message Content Intent Denial (Phase 1 - CRITICAL)**
   - Risk: Bot designed for passive message scanning; approval denied at 75 servers
   - Cause: Assuming Discord approval is guaranteed
   - Prevention: Design for slash commands first (`/translate emoji`), treat message content intent as optional
   - Cost if ignored: Architectural rewrite mid-project
3. **Dictionary Quality Degradation (Phase 3 - HIGH)**
   - Risk: User-taught emoji meanings become nonsensical (typos, trolls, conflicts)
   - Cause: No validation, no audit trail, no approval workflow
   - Prevention: Validate meaning length/content, log every change, flag conflicts, require mod approval for shared meanings
   - Cost if ignored: Translations become unreliable by month 2-3
4. **Teaching Interface Too Complex (Phase 3 - HIGH)**
   - Risk: Vivi (with dysgraphia) avoids using teaching system; feature becomes unused
   - Cause: Text-heavy commands, complex syntax, no visual confirmation
   - Prevention: Ultra-simple commands (`/teach emoji meaning`), show emoji in response, keep responses under 2 sentences
   - Cost if ignored: Bot cannot learn, static dictionary limits usefulness
5. **Rate Limiting (Phase 2+ - MEDIUM)**
   - Risk: Bot goes silent during peak usage (Discord or PluralKit API limits hit)
   - Cause: Naive request patterns, no caching, no exponential backoff
   - Prevention: Cache emoji translations, batch lookups, implement exponential backoff, monitor rate limit headers
   - Cost if ignored: Intermittent outages, poor user experience
6. **Emoji Parsing Edge Cases (Phase 2 - MEDIUM)**
   - Risk: Complex emoji (skin tones, ZWJ sequences, variation selectors) break parsing
   - Cause: Naive string operations, incorrect regex patterns
   - Prevention: Use emoji library (not manual regex), normalize input (NFD), test with families/skin tones/flags
   - Cost if ignored: Some emoji don't translate or get corrupted
7. **Authorization & Security (Phase 3 - HIGH)**
   - Risk: Non-mods can teach emoji, trolls corrupt dictionary, no audit trail
   - Cause: No permission checks, no input validation, no logging
   - Prevention: Whitelist who can teach (Vivi + trusted), validate input, log everything, support `/undo` or revert
   - Cost if ignored: Dictionary spam, loss of data integrity
8. **Webhook Race Conditions (Phase 2+ - MEDIUM)**
   - Risk: Vivi edits her message while bot edits its translation; both fail or corrupt
   - Cause: Simultaneous edits via same webhook
   - Prevention: Post new translation instead of editing; queue requests with 1-sec delay if edit detected
   - Cost if ignored: Occasional translation failures and message corruption
**Confidence:** **MEDIUM-HIGH** — Pitfalls are well-documented in Discord bot literature. Phase assignments are defensible but require validation during planning.
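
To make pitfall 6 concrete, the sketch below shows why iterating over code points breaks complex emoji, and how grouping joiners and modifiers keeps them intact. It is a stdlib-only stand-in written for illustration; it is not the `emoji` library recommended above and it does not cover every Unicode emoji rule. It also does not decide what counts as an emoji: ordinary characters come out as single-character clusters.

```python
import re

ZWJ = "\u200d"      # zero-width joiner, glues emoji into one glyph
VS16 = "\ufe0f"     # variation selector requesting emoji presentation
KEYCAP = "\u20e3"   # combining enclosing keycap
# Discord writes custom emoji in message content as <:name:id> or <a:name:id>.
CUSTOM_EMOJI = re.compile(r"<a?:\w+:\d+>")


def _is_skin_tone(ch: str) -> bool:
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def _is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def split_clusters(text: str) -> list[str]:
    """Split text into user-visible units, skipping whitespace."""
    clusters, i = [], 0
    while i < len(text):
        match = CUSTOM_EMOJI.match(text, i)
        if match:
            clusters.append(match.group())
            i = match.end()
            continue
        if text[i].isspace():
            i += 1
            continue
        j = i + 1
        # A flag is a pair of regional indicator symbols.
        if _is_regional_indicator(text[i]) and j < len(text) and _is_regional_indicator(text[j]):
            j += 1
        while j < len(text):
            nxt = text[j]
            if nxt in (VS16, KEYCAP) or _is_skin_tone(nxt):
                j += 1
            elif nxt == ZWJ and j + 1 < len(text):
                j += 2  # the joiner plus the code point it attaches
            else:
                break
        clusters.append(text[i:j])
        i = j
    return clusters


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man, woman, girl joined by ZWJ
code_points = len(family)           # 5 code points for one visible emoji
clusters = split_clusters(family)   # one cluster containing the whole family
```

The same test inputs (families, skin tones, flags, keycaps, custom emoji) are worth reusing against whichever library the parser ends up built on.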
---
## Implications for Roadmap
### Suggested Phase Structure (5 Phases, ~8 Weeks)
**Phase 1: Foundation (Weeks 1-2) — Detect Vivi**
- **Goal:** Prove we can reliably detect Vivi's PluralKit-proxied messages
- **Features:**
  - Discord bot initialization (discord.py, intents, token)
  - Webhook detection (`message.webhook_id`)
  - PluralKit API verification (`GET /v2/messages/{id}`)
  - Member ID verification (compare to Vivi's ID)
  - Database schema + tables (emoji_dictionary, server_configuration)
- **Deliverable:** Bot logs every Vivi message to console (doesn't respond yet)
- **Critical Pitfalls to Avoid:** Message detection reliability, message content intent design, authorization design
- **Research Needed:** ❌ None — STACK.md and PITFALLS.md are definitive
- **Success Criteria:**
  - ✅ Bot detects Vivi's messages with >99% accuracy
  - ✅ No false positives (ignores non-Vivi webhooks)
  - ✅ Handles reproxy, edits, and DMs correctly
**Phase 2: Emoji Parsing & Translation (Weeks 3-4) — Make Vivi Understood**
- **Goal:** Turn emoji into natural language; deploy MVP
- **Features:**
  - Emoji parser (regex for Unicode + custom emoji)
  - Database lookups (O(1) via primary key)
  - Response composition ("Vivi says: [translation]")
  - Auto-translate toggle per server
  - Basic error handling (unknown emoji, rate limits)
- **Deliverable:** Bot translates Vivi's emoji in channels and DMs
- **Critical Pitfalls to Avoid:** Emoji edge cases, rate limiting, webhook race conditions
- **Research Needed:** ❌ None — ARCHITECTURE.md covers this thoroughly
- **Success Criteria:**
  - ✅ All Unicode emoji parse correctly (including skin tones, ZWJ)
  - ✅ Custom Discord emoji supported
  - ✅ Translations appear in <500ms
  - ✅ Accessible format (plain text, no emoji-only responses)
**Phase 3: Teaching System (Weeks 5-6) — Enable Growth**
- **Goal:** Let users and Vivi teach emoji meanings; enable sustainable growth
- **Features:**
  - `/teach emoji meaning` command
  - `/meaning emoji` or `/what emoji` query
  - `/correct emoji new_meaning` updates
  - Input validation (length, content, duplicates)
  - Audit trail (logged changes with user_id, timestamp)
  - Reaction-based feedback (✅/❌ on translations)
  - Permission checks (whitelist who can teach)
- **Deliverable:** Users can teach emoji via simple one-liner commands; bot confirms with visual emoji
- **Critical Pitfalls to Avoid:** Dictionary degradation, interface complexity (dysgraphia UX), authorization bypass, emoji conflicts
- **Research Needed:** ⚠️ Potential research needed on Vivi's specific dysgraphia constraints and optimal UI patterns
- **Success Criteria:**
  - ✅ Vivi finds teaching interface usable (simple syntax, visual confirmation)
  - ✅ 50+ emoji taught in first week of beta
  - ✅ No troll edits (proper permission checks)
  - ✅ Audit trail enables revert if needed
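
The core logic behind `/teach` (permission check, input validation, audit trail, revert) can be sketched independently of Discord. Everything here is illustrative: the class name, the 100-character limit, and the reply wording are assumptions, and the in-memory containers stand in for database tables.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

MAX_MEANING_LENGTH = 100  # assumed limit; the research does not fix a number


@dataclass
class TeachableDictionary:
    teachers: set[int]  # user IDs allowed to teach (the whitelist)
    meanings: dict[str, str] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def teach(self, user_id: int, emoji: str, meaning: str) -> str:
        """Store a meaning and return a short confirmation or error."""
        if user_id not in self.teachers:
            return "Sorry, you can't teach emoji yet."
        meaning = " ".join(meaning.split())  # collapse stray whitespace
        if not meaning or len(meaning) > MAX_MEANING_LENGTH:
            return f"Meaning must be 1-{MAX_MEANING_LENGTH} characters."
        previous = self.meanings.get(emoji)
        self.meanings[emoji] = meaning
        self.audit_log.append({
            "emoji": emoji, "old": previous, "new": meaning,
            "user_id": user_id,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        # Short confirmation that shows the emoji back to the user.
        return f"{emoji} = {meaning}"

    def undo(self) -> bool:
        """Revert the most recent change.

        For brevity this pops the log entry; a production audit trail
        would record the revert as a new entry instead.
        """
        if not self.audit_log:
            return False
        last = self.audit_log.pop()
        if last["old"] is None:
            self.meanings.pop(last["emoji"], None)
        else:
            self.meanings[last["emoji"]] = last["old"]
        return True
```

Keeping the confirmation to a single short line follows the accessibility goal of responses under two sentences.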
**Phase 4: Per-Server Configuration & Scaling (Week 7) — Customize & Optimize**
- **Goal:** Let servers customize translation behavior; add basic caching
- **Features:**
  - `/config auto-translate [on|off]` per-server toggle
  - `/translate emoji` on-demand command
  - Redis caching layer (optional, for hot emoji)
  - PostgreSQL migration (if MVP showed >1000 emoji, scaling needed)
  - Basic statistics (`/emoji-stats`)
- **Deliverable:** Different servers can choose auto vs on-demand translation; bot performance optimized
- **Critical Pitfalls to Avoid:** Global dictionary conflicts (document limitation; defer per-server overrides to Phase 5+)
- **Research Needed:** ⚠️ Performance profiling may reveal caching needs earlier than expected
- **Success Criteria:**
  - ✅ Servers can customize behavior via commands
  - ✅ Bot remains responsive at 100+ servers with 1000+ emoji
  - ✅ Average response time <250ms (including PluralKit API call)
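
Before reaching for Redis, the caching idea for hot emoji can be prototyped in-process. The sketch below is a small least-recently-used cache wrapped around any slower lookup; the class name and default size are hypothetical, and it is offered as a stand-in for experimentation rather than as the planned caching layer.

```python
from collections import OrderedDict
from typing import Callable, Optional


class HotEmojiCache:
    """In-process LRU cache in front of a slower lookup (e.g. the database)."""

    def __init__(self, lookup: Callable[[str], Optional[str]], max_size: int = 256):
        self._lookup = lookup
        self._max_size = max_size
        self._entries = OrderedDict()

    def get(self, emoji: str) -> Optional[str]:
        if emoji in self._entries:
            self._entries.move_to_end(emoji)  # mark as most recently used
            return self._entries[emoji]
        meaning = self._lookup(emoji)
        self._entries[emoji] = meaning
        if len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used
        return meaning

    def invalidate(self, emoji: str) -> None:
        """Call after a teach or correct so stale meanings are not served."""
        self._entries.pop(emoji, None)
```

Measuring the hit rate of a cache like this during profiling would also show whether a shared Redis layer is needed at all.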
**Phase 5: Polish & Production Hardening (Week 8+) — Stabilize**
- **Goal:** Make bot production-ready with comprehensive error handling, logging, monitoring
- **Features:**
  - Structured logging (all errors, API calls, performance metrics)
  - Sentry or equivalent error tracking
  - Graceful degradation (serve cached meanings if DB down)
  - Edge case handling (message edits, deletions, permission changes)
  - Documentation and runbooks
- **Deliverable:** Production-grade bot with >99% uptime, full observability
- **Research Needed:** ❌ None — standard DevOps practices
- **Success Criteria:**
  - ✅ <0.1% error rate on translations
  - ✅ All errors logged and alertable
  - ✅ Can diagnose issues within 5 minutes from logs
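
Structured logging can be done with the standard library alone. The sketch below emits one JSON object per log line; the extra field names (`guild_id`, `message_id`, `duration_ms`) and the logger name are assumptions chosen for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in ("guild_id", "message_id", "duration_ms"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def build_logger(name: str = "vivi") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger().info("translated %d emoji", 3, extra={"guild_id": 42, "duration_ms": 18})` then produces a line that log tooling can filter by guild or latency.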
---
## Research Flags & Validation Needs
### High-Confidence Areas (Skip Deeper Research)
- **Stack:** discord.py 2.6.4, SQLite→PostgreSQL, Railway hosting — all production-proven
- **Architecture:** Component design, data flows, build order — standard Discord bot patterns
- **PluralKit Integration:** Webhook dispatch vs API polling tradeoff is well-documented
### Medium-Confidence Areas (Validate During Planning)
- **Phase 3 UX:** Dysgraphia accessibility — validate teaching interface usability with Vivi early
- **Phase 2 Performance:** Emoji parser edge cases — comprehensive test suite needed before Phase 2
- **Phase 4 Scaling:** PostgreSQL migration point — may happen sooner/later than expected based on emoji volume
### Areas Requiring Phase-Specific Research
- **Phase 3:** Optimal teaching UX for dysgraphia (interview Vivi, iterate prototype)
- **Phase 4:** Per-server override system design (if pursued; currently deferred to Phase 5+)
- **Phase 5:** Sentry configuration, structured logging patterns (standard practice, low risk)
---
## Confidence Assessment
| Area | Level | Rationale | Gaps |
|------|-------|-----------|------|
| **Tech Stack** | ⭐⭐⭐ VERY HIGH | discord.py 2.6.4, SQLite/PostgreSQL, Railway all production-standard in 2025; no experimental choices | None — all recommendations are proven |
| **Architecture** | ⭐⭐⭐ VERY HIGH | Component design mirrors MEE6, Logiq, other production bots; async patterns well-documented | None — patterns are industry-standard |
| **Features & MVP Scope** | ⭐⭐⭐ HIGH | Rule-based learning is transparent, debuggable, and explicitly preferred over ML; feature scope is tight | Dysgraphia UX needs validation; confirm Vivi's preferences |
| **Pitfalls** | ⭐⭐ MEDIUM-HIGH | Most pitfalls are documented in Discord bot literature; prioritization is defensible | Message detection reliability needs testing; rate limiting impact unknown until scale testing |
| **Roadmap Phases** | ⭐⭐⭐ HIGH | Build order is logical (detection → translation → teaching → config → polish); each phase delivers value | Phase 3 timing may shift based on Vivi's teaching interface feedback |
| **PluralKit Integration** | ⭐⭐⭐ VERY HIGH | Webhook dispatch approach is efficient, well-documented; API endpoints are stable | None — integration is straightforward |
| **Accessibility** | ⭐⭐ MEDIUM | General accessibility principles are sound; dysgraphia-specific UX patterns need user testing | Vivi's exact preferences (command aliases, visual feedback styles, response length) unknown |
**Overall Confidence: ⭐⭐⭐ HIGH**
This project has clear requirements, proven technology choices, and manageable scope. The main confidence gap is **teaching interface UX for dysgraphia** — validate early in Phase 3 planning with Vivi's direct feedback.
---
## Gaps to Address During Requirements Definition
1. **Vivi's Teaching UX Preferences** (High Priority)
   - What syntax feels easiest? (`/teach emoji meaning` vs `/learn emoji meaning` vs `/add emoji meaning`?)
   - How should bot confirm back? (Show emoji only? Emoji + text? How many words max?)
   - Are reaction buttons easier than typing? (React ✓ vs type "yes")
   - What emoji naming system? (Unicode names? Custom? Both?)
   - **Action:** Interview Vivi early in requirements phase; prototype 2-3 UI patterns
2. **Exact Emoji Coverage** (Medium Priority)
   - Does Vivi use only standard Unicode emoji, or custom Discord emoji, or both?
   - Are there specific emoji types (families, flags, keycaps) that are critical?
   - Does she use ZWJ sequences (👨‍👩‍👧)?
   - **Action:** Ask Vivi to share examples of emoji she commonly uses
3. **Moderation & Teaching Permissions** (Medium Priority)
   - Who should be allowed to teach emoji? (Only Vivi? Vivi + alters? Trusted friends? Everyone?)
   - How should conflicts be resolved if two people teach different meanings for same emoji?
   - Is there a mod team to approve meanings, or trust-first approach?
   - **Action:** Clarify with Vivi's community (or proxy representative)
4. **Multi-System Scope** (Low Priority)
   - Is this bot only for Vivi's system, or will it serve multiple DID systems?
   - If multiple systems, how do we handle different emoji meanings per system?
   - **Action:** Clarify scope; if multi-system, defer to Phase 4+ for per-system overrides
5. **Response Format Preferences** (Low Priority)
   - Should bot translate emoji-only, or include surrounding context?
   - Example: Vivi posts "😷 2⃣ 🍑 ❌" → Bot says "sick, two, peach, no" OR "Vivi: sick, feeling crappy about two things, and definitely not"?
   - **Action:** Test both formats with Vivi; let her choose style
---
## Cost Breakdown & Go/No-Go Criteria
**MVP Monthly Cost:**
- Bot Hosting (Railway): $0 (free tier)
- Database (SQLite, local): $0
- **Total: $0**
**Production Monthly Cost (100+ servers):**
- Bot Hosting (Railway): $5
- PostgreSQL (Railway): $15
- Logging/Monitoring (optional): $0-50
- **Total: $20-70**
**Go Criteria (Phase 1 Completion):**
- ✅ Message detection >99% accurate
- ✅ No false positives (non-Vivi webhooks are ignored)
- ✅ Database queries <50ms
- ✅ Code reviewed and documented
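
The detection criteria can be checked mechanically against a hand-labelled sample of messages. The sketch below is one way to do that; it reads the go threshold as accuracy above 99% with zero false positives, which is an interpretation rather than something the research states, and the type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    is_vivi: bool   # ground truth from a hand-labelled sample
    detected: bool  # what the bot decided


def detection_report(results: list[DetectionResult]) -> dict:
    """Summarise detection quality and apply the assumed go threshold."""
    total = len(results)
    correct = sum(r.is_vivi == r.detected for r in results)
    false_positives = sum(r.detected and not r.is_vivi for r in results)
    false_negatives = sum(r.is_vivi and not r.detected for r in results)
    accuracy = correct / total if total else 0.0
    return {
        "accuracy": accuracy,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        # Assumed reading of the go criteria: >99% accurate, no false positives.
        "go": accuracy > 0.99 and false_positives == 0,
    }
```

Since Phase 1 already logs every detected message, those logs are a natural source for the labelled sample.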
**No-Go Criteria (Stop if True):**
- ❌ PluralKit API rate limits prevent scaling to 10+ servers
- ❌ Discord denies message content intent AND no viable slash command path
- ❌ Vivi finds teaching interface unusable after Phase 3 testing
---
## Sources & References
### Research Files (Synthesized)
- `.planning/research/STACK.md` — Technology recommendations, rationale, alternatives
- `.planning/research/FEATURES.md` — Feature scope, learning approach, accessibility, anti-features
- `.planning/research/ARCHITECTURE.md` — Component design, data flows, database schema, scaling
- `.planning/research/PITFALLS.md` — Common mistakes, prevention strategies, phase assignments
### External References
- [discord.py Documentation](https://discordpy.readthedocs.io/) — Latest 2.6.4
- [PluralKit API Reference](https://pluralkit.me/api/)
- [Railway Cloud Platform](https://railway.app/)
- [Discord Bot Security Best Practices 2025](https://friendify.net/blog/discord-bot-security-best-practices-2025.html)
- [Accessibility for Dysgraphia](https://top5accessibility.com/blog/orthographic-dyslexia-dysgraphia/)
---
## Next Steps for Roadmap Creator
1. **Read this SUMMARY.md** (5 min) — Understand the synthesis
2. **Review PITFALLS.md** (15 min) — Understand phase-specific risks
3. **Clarify gaps with Vivi** (async) — Teaching UX, emoji coverage, permissions
4. **Map phases to sprints** — Assign timelines, team, success criteria
5. **Create requirements document** — Expand phase descriptions into user stories
6. **Begin Phase 1 development** — Foundation: Discord client + PluralKit detection
---
**Status:** ✅ Ready for Requirements Definition
**Synthesized by:** GSD Research Synthesizer
**Date:** January 29, 2026
**Confidence Level:** ⭐⭐⭐ HIGH
Proceed to roadmap creation. Prioritize Vivi interview for teaching UX validation.