Pitfalls Research: Vivi Speech Translator
A comprehensive analysis of common mistakes in Discord bot development, with specific focus on PluralKit integration, emoji translation, and learning systems.
PluralKit Integration Pitfalls
Pitfall: Unreliable Vivi Message Detection (Webhook vs Direct Author Check)
What goes wrong: PluralKit proxies messages through webhooks under the member's name. Bots can detect Vivi's messages in several ways:
- Checking whether the author ID matches Vivi's user ID (unreliable when webhook proxying is used, since the author is then the webhook)
- Parsing the webhook username (fragile - it can be changed)
- Checking for proxy tags in the message content (only works with bracket-style proxies)
The core issue: If detection logic mixes these approaches or doesn't account for webhook proxying edge cases, you'll get false positives (bot responds to non-Vivi messages) and false negatives (Vivi's messages don't translate).
Warning signs:
- Bot randomly responds to messages from other PluralKit members
- Vivi's proxied messages don't trigger translation but manually typed messages do
- Bot responds during PluralKit testing/reproxying (the reproxy command)
- Inconsistent detection across channels with different permissions
Prevention:
- Consistent source of truth: Decide on ONE reliable detection method - either webhook creator ID (most reliable) or member username in webhook
- Query PluralKit API: When uncertain, use PluralKit's REST API to verify whether a message came from Vivi's system (a GET lookup, subject to the rate limits noted below; see the sketch after this pitfall)
- Cache member names: Store known proxy tag patterns for Vivi locally to reduce API calls
- Test edge cases: Reproxy, message edits, reactions on webhook messages, and DMs from Vivi
- Log detection failures: When detection fails, log the message author, webhook info, and detected proxy tags for debugging
Which phase should address it: Phase 1 (core message detection) - must be bulletproof before teaching system
API Rate Limit Note: PluralKit enforces:
- 10/second for GET requests (member lookup)
- 3/second for POST/PATCH/DELETE (system updates)
- Use Dispatch Webhooks for event-driven updates instead of polling
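A minimal sketch of this detection flow, assuming discord.py 2.x, `aiohttp`, and PluralKit's public `GET /v2/messages/{id}` endpoint; `VIVI_SYSTEM_ID` and the in-memory cache are placeholders to adapt:

```python
import aiohttp
import discord

PK_API = "https://api.pluralkit.me/v2"
VIVI_SYSTEM_ID = "exmpl"  # assumption: Vivi's five-character PluralKit system ID

# Cache verdicts so reactions/edits on the same message don't re-hit the API.
_verdict_cache: dict[int, bool] = {}

async def is_vivi_message(message: discord.Message) -> bool:
    """Return True if this message was proxied by Vivi's PluralKit system."""
    if message.webhook_id is None:      # PluralKit proxies are always webhook messages
        return False
    if message.id in _verdict_cache:
        return _verdict_cache[message.id]

    # A real bot would reuse one ClientSession instead of opening one per check.
    async with aiohttp.ClientSession() as session:
        async with session.get(f"{PK_API}/messages/{message.id}") as resp:
            if resp.status == 404:        # not a PluralKit proxy at all
                verdict = False
            elif resp.status != 200:      # rate limited or API hiccup: log and don't guess
                print(f"PluralKit lookup failed ({resp.status}) for {message.id}")
                return False
            else:
                data = await resp.json()
                verdict = (data.get("system") or {}).get("id") == VIVI_SYSTEM_ID

    _verdict_cache[message.id] = verdict
    return verdict
```

Caching verdicts by message ID keeps repeated reaction and edit events from burning through the GET budget above.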
Pitfall: Webhook Editing Race Conditions
What goes wrong: When Vivi edits her proxied message, your bot attempts to edit its translation simultaneously. Discord webhooks don't handle concurrent edits well:
- An edit from the original proxy webhook and the bot's edit to its translation can conflict
- Race conditions can cause message state corruption
- Edited messages may revert to old content or show partial updates
- Reactions added during the edit window may be lost
Warning signs:
- Translations occasionally show old/incorrect emoji meanings after Vivi edits her message
- Bot throws 500 errors when trying to edit webhook messages
- Translation accuracy degrades when Vivi edits quickly
- Missing reactions after Vivi's message is edited
Prevention:
- Don't edit webhook messages directly: Instead, post a new translation message and delete the old one
- Add edit detection: Use `message.edited_at` to detect changes, but DON'T race with Vivi
- Queue edit requests: If Vivi edits, queue a job to re-translate after a 1-second delay to avoid simultaneous edits (see the sketch below)
- Handle 500s gracefully: Treat webhook edit failures as "post new translation instead"
- Add edit timestamps: Show "(edited)" in translation to indicate it's a response to an edit
Which phase should address it: Phase 2 (message handling) or Phase 3 (refinement)
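A sketch of the "post a new translation, don't race the edit" approach, assuming discord.py 2.x; `translate()` and the `translations` map are hypothetical stand-ins for the bot's real translation pipeline and storage:

```python
import asyncio
import discord
from discord.ext import commands

# Reading edited content still requires the message content intent (see the next pitfall).
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

# Maps the ID of Vivi's proxied message -> the bot's translation message.
# In the real bot this lives in the database; translate() is a placeholder too.
translations: dict[int, discord.Message] = {}

async def translate(text: str) -> str:
    return text

@bot.event
async def on_message_edit(before: discord.Message, after: discord.Message):
    if after.id not in translations:
        return
    await asyncio.sleep(1)  # let PluralKit's own webhook edit settle first
    stale = translations[after.id]
    try:
        # Post a fresh translation and delete the stale one instead of editing in place.
        fresh = await after.channel.send(f"(edited) {await translate(after.content)}")
        await stale.delete()
        translations[after.id] = fresh
    except discord.HTTPException:
        pass  # if Discord rejects the send/delete, keep the old translation visible
```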
Discord API Pitfalls
Pitfall: Message Content Intent Denial and Architecture Lock-In
What goes wrong:
Discord requires the `message_content` privileged intent to read emoji content in messages. Bots in 100+ servers must be verified and approved for it. Common mistakes:
- Building the bot assuming you'll get approval (you might not)
- Designing architecture around passive message scanning instead of interactions
- Failing to plan alternatives when approval is denied
- Using `@bot.event` / `async def on_message()` without calling `bot.process_commands()`, which silently breaks the bot's own command handling
Warning signs:
- Bot works in testing but loses message content access as it approaches the 100-server verification threshold
- Approval denial with no fallback plan
- Message content suddenly inaccessible mid-development
- Architecture rewrite needed to add slash commands later
Prevention:
- Design for slash commands first: Use `/translate <emoji>` instead of passive scanning (see the sketch below)
- Use the Interactions API: Buttons, select menus, and slash commands don't require the message content intent
- Plan for denial: Have a fallback UI (buttons to trigger translation, not automatic)
- Unverified bots get free access: Stay under 100 servers during development, or use an unverified bot for testing
- Document intent usage: Be ready to explain why your emoji translation bot needs message content (you'll need to when applying for verification at 100+ servers)
- Prepare alternatives: Reactions or buttons as fallback if approval is denied
Which phase should address it: Phase 1 (architectural decisions)
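A slash-command-first sketch using discord.py's `app_commands`, which works without the privileged message content intent; the `MEANINGS` dictionary and the per-character lookup are simplifications (proper emoji segmentation is covered under Emoji Handling Pitfalls):

```python
import discord
from discord import app_commands

intents = discord.Intents.default()   # no privileged message_content intent needed
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

# Placeholder dictionary; the real bot reads from its database.
MEANINGS = {"😀": "happy", "🎉": "celebrating"}

@tree.command(name="translate", description="Translate emoji into words")
@app_commands.describe(emojis="The emoji to translate")
async def translate(interaction: discord.Interaction, emojis: str):
    # Naive per-character lookup; proper emoji segmentation is covered later on.
    words = [MEANINGS.get(ch, ch) for ch in emojis]
    await interaction.response.send_message(" ".join(words), ephemeral=True)

@client.event
async def on_ready():
    await tree.sync()   # register the slash command with Discord

client.run("BOT_TOKEN")  # load the real token from a gitignored .env file
```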
Pitfall: Rate Limiting and Burst Requests
What goes wrong: Discord enforces global (50 req/sec) and per-route rate limits. Vivi Speech can hit limits when:
- Translating messages with many emojis (multiple API lookups)
- Multiple users triggering translations simultaneously
- Teaching system saving entries rapidly
- PluralKit API queries in addition to Discord API calls
Warning signs:
- Bot suddenly goes silent (no responses)
- 429 (Too Many Requests) errors in logs
- Delayed translations (multi-second latency)
- Inconsistent behavior during peak usage
Prevention:
- Cache emoji translations: Store learned meanings in-memory with TTL (time-to-live)
- Batch emoji lookups: If translating a message with 5 emojis, don't make 5 API calls - batch them
- Implement exponential backoff: When rate limited, wait with exponential delays (1s, 2s, 4s...)
- Queue teaching commands: Don't save to database on every teach attempt - queue and batch writes
- Monitor rate limit headers: Parse the `X-RateLimit-Remaining` and `X-RateLimit-Reset-After` headers
- Shard properly: Maintain ~1,000 guilds per shard maximum
- Use caching layers: Redis or in-memory LRU cache for frequently translated emojis
Which phase should address it: Phase 2 (scaling) and Phase 3 (optimization)
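A sketch of the caching and backoff ideas above; discord.py already retries its own 429s internally, so a wrapper like this mainly helps for PluralKit calls and the bot's database writes. Names and the TTL value are assumptions:

```python
import asyncio
import time

class TTLCache:
    """Tiny in-memory cache so repeated emoji lookups skip the database/API."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str | None:
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        self._store.pop(key, None)   # expired or missing
        return None

    def put(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic(), value)

async def with_backoff(call, max_attempts: int = 4):
    """Retry an async call with exponential delays (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:            # in practice: a 429 / rate-limit error from the API client
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)
```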
Pitfall: Privilege and Permission Confusion
What goes wrong: Bots need specific permissions but devs often either:
- Request too many permissions (users won't invite)
- Request insufficient permissions (bot fails silently)
- Don't verify permissions before action (command fails with cryptic error)
- Don't check user permissions before teaching (malicious edits to dictionary)
Warning signs:
- Bot invited but can't send messages
- Teaching commands work in DM but not in server
- Translation attempts fail silently with no error message
- Non-mods can change emoji meanings
Prevention:
- Minimal permission set: Only request `send_messages`, `manage_messages` (for deleting its own messages), and `read_message_history`
- Check before acting: Verify the bot has the required permissions using `message.channel.permissions_for(message.guild.me)` (see the sketch below)
- User permission checks: Only allow trusted users (mods, Vivi herself) to teach emoji meanings
- Clear error messages: "I don't have permission to send messages here" instead of silent failure
- Test on new servers: Invite bot to a test server with minimal permissions and verify all features work
Which phase should address it: Phase 1 (setup) and Phase 3 (moderation)
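A small sketch of the "check before acting" and "clear error messages" points, assuming discord.py 2.x; the `print` call stands in for the bot's real logger:

```python
import discord

async def safe_send(channel: discord.TextChannel, content: str) -> bool:
    """Send only if the bot actually has permission, and surface the failure if not."""
    perms = channel.permissions_for(channel.guild.me)   # the bot's Member in this guild
    if not perms.send_messages:
        print(f"Missing send_messages in #{channel.name} ({channel.guild.name})")
        return False
    await channel.send(content)
    return True
```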
Learning System Pitfalls
Pitfall: Dictionary Quality Degradation Over Time
What goes wrong: User-contributed learning systems fail when:
- Users add typos, slang, or inside jokes as emoji meanings
- Duplicate or conflicting meanings accumulate (😀 = "happy", "smile", "goofy face")
- Rarely-used emojis have outdated or weird meanings
- No audit trail - can't track who broke the dictionary
- Stale entries that were never useful remain forever
Warning signs:
- Translations become nonsensical or off-topic
- Conflicting definitions for same emoji (confusing translations)
- Many emoji with zero meaningful translation
- Teaching system abused by trolls or internal conflicts
Prevention:
- Validation on teach: Check for a minimum length (3 chars), no excessive emoji in the meaning, no URLs (see the sketch below)
- Audit trail: Log every emoji meaning change with `timestamp`, `user_id`, `old_value`, `new_value`
- Review process: For shared systems, flag new meanings for mod approval before going live
- Meaning versioning: Keep multiple meanings, let users vote/rank them (future: Phase 4)
- Freshness markers: Track last-used date; prompt for re-confirmation if unused for 90+ days
- Duplicate detection: Warn if adding a meaning similar to existing ones
- Clear command output: Show current meaning before accepting new one, ask for confirmation
Which phase should address it: Phase 3 (learning) - implement audit trail from day one
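A sketch of teach-time validation plus an append-only audit row, using `sqlite3` as a stand-in for the real database; the `emoji_audit` table and the thresholds are assumptions:

```python
import re
import sqlite3
import time

def validate_meaning(meaning: str) -> str | None:
    """Return an error message if the meaning fails the checks above, else None."""
    if len(meaning.strip()) < 3:
        return "Meaning must be at least 3 characters."
    if len(meaning) > 80:
        return "Keep meanings short (80 characters max)."
    if re.search(r"https?://", meaning):
        return "Meanings can't contain links."
    return None

def record_teach(db: sqlite3.Connection, user_id: int, emoji: str,
                 old_value: str | None, new_value: str) -> None:
    """Append-only audit row for every change; parameterized to avoid injection."""
    db.execute(
        "INSERT INTO emoji_audit (ts, user_id, emoji, old_value, new_value) "
        "VALUES (?, ?, ?, ?, ?)",
        (time.time(), user_id, emoji, old_value, new_value),
    )
    db.commit()
```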
Pitfall: Teaching Interface Too Complex or Text-Heavy
What goes wrong: Learning systems fail when the teaching UI is hard to use:
- Complex command syntax that users forget
- Too many options/flags (overwhelming)
- No confirmation of what was taught
- Text-heavy responses (bad for users with dysgraphia like Vivi)
- No visual feedback (the emoji isn't echoed back in the response)
Warning signs:
- Few emoji meanings actually get added
- Users give up and stop teaching
- Confusion about command syntax
- Vivi avoids using teaching feature
- Other system members always teach instead
Prevention:
- Simple one-liner commands: `/teach 😀 happy`, not `/teach --emoji 😀 --meaning "happy" --priority high` (see the sketch below)
- Visual confirmation: Include the emoji in the response ("Learned: 😀 = happy")
- Show current meaning: "😀 currently means: happy | Update it? Type: `/teach 😀 new meaning`"
- Short responses: Keep bot responses under 2 lines when possible
- Use buttons over typing: React with checkmark/X for confirmation instead of typing "yes/no"
- Emoji picker: If possible, allow selecting an emoji by reaction instead of typing it
- Accessible syntax: Support aliases - `/learn 😀 happy` works the same as `/teach 😀 happy`
Which phase should address it: Phase 3 (learning system design)
Accessibility Note: Vivi has Dysgraphia, which affects writing ability. Keep commands short, use visual confirmation (emoji in responses), and minimize text output.
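A sketch of a low-typing `/teach` slash command with visual confirmation, assuming discord.py 2.x `app_commands`; `save_meaning` and `current_meaning` are hypothetical storage helpers:

```python
import discord
from discord import app_commands

# Hypothetical storage helpers - the real bot backs these with its database.
_dictionary: dict[str, str] = {}

def current_meaning(emoji_char: str) -> str | None:
    return _dictionary.get(emoji_char)

def save_meaning(emoji_char: str, meaning: str) -> None:
    _dictionary[emoji_char] = meaning

@app_commands.command(name="teach", description="Teach the bot what an emoji means")
async def teach(interaction: discord.Interaction, emoji: str, meaning: str):
    old = current_meaning(emoji)
    save_meaning(emoji, meaning)
    # One short line, emoji echoed back, old meaning shown only when it changed.
    note = f" (was: {old})" if old and old != meaning else ""
    await interaction.response.send_message(
        f"Learned: {emoji} = {meaning}{note}", ephemeral=True
    )

# Add to the bot's CommandTree with tree.add_command(teach) and sync;
# a /learn alias can register the same callback under a second name.
```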
Pitfall: Scope Creep in Learning Features
What goes wrong: Learning systems start simple but can grow uncontrollably:
- "Let's add multiple meanings per emoji" → complexity explosion
- "Different meanings for different contexts" → database redesign
- "Per-server emoji dictionaries" → multi-tenancy complexity
- "Emoji meaning versioning and rollback" → audit log nightmare
- "Machine learning to auto-generate meanings" → maintenance burden
Warning signs:
- Feature backlog grows faster than you can implement
- Core translation becomes slow/unreliable
- Code becomes hard to understand and modify
- New features break old ones
- Team gets overwhelmed with edge cases
Prevention:
- MVP scope: Phase 1-3 = simple one-meaning-per-emoji, all servers share same dictionary
- Clear phase boundaries: Document what's in each phase; don't add features mid-phase
- Say no to feature requests: Politely defer to Phase 4 or beyond
- Keep it simple: One meaning per emoji, user teaches it once, it sticks
- Future extensibility: Design database schema to support multiple meanings later, but don't implement it yet
- Regular scope reviews: Every 2 weeks, ask: "Is this feature essential to core translation?"
Which phase should address it: Phase 1 (planning) - establish clear phase gates
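A sketch of the "design for later, implement one meaning now" idea: a SQLite schema (names are assumptions) that stays one-meaning-per-emoji today but leaves room for ranked multiple meanings in a later phase:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_meanings (
    emoji     TEXT    NOT NULL,              -- stored as the full Unicode string
    meaning   TEXT    NOT NULL,
    rank      INTEGER NOT NULL DEFAULT 1,    -- always 1 for now; enables multiple meanings later
    taught_by INTEGER NOT NULL,              -- Discord user ID
    taught_at REAL    NOT NULL,
    PRIMARY KEY (emoji, rank)
);
"""

def init_db(path: str = "vivi_speech.db") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```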
Multi-Server and Data Pitfalls
Pitfall: Global Dictionary Conflicts Across Servers
What goes wrong: A shared emoji dictionary works well until different communities use the same emoji differently:
- Server A uses 🎉 for "party"; Server B uses it for "achievement"
- Emoji meanings don't match community context
- Users from Server B are confused by Server A's translations
- No way to override meanings per-server
Warning signs:
- Users report wrong translations in their server
- Emoji meanings conflict between communities
- No way to customize meanings per-guild
- One server's trolls ruin translations for everyone
Prevention:
- Phase 1-2: Global only: Accept that all servers share one dictionary
- Phase 4 planning: Design a per-server override system (store under a `server_id:emoji` key; see the sketch below)
- Document limitation: "Emoji meanings are shared across all servers - curate carefully"
- Moderation: Have a trusted team that curates the global dictionary
- Community rules: Require consensus or voting before changing popular emoji meanings
- Meaning context: Store both the meaning AND its frequency/reliability (crowdsourced)
Which phase should address it: Phase 4 (advanced) - keep Phase 1-3 global-only
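A sketch of the Phase 4 lookup order: prefer a per-server override keyed by `(server_id, emoji)`, then fall back to the shared global dictionary. The stores shown are hypothetical:

```python
# Hypothetical stores: per-server overrides keyed by (server_id, emoji),
# with the shared global dictionary as the fallback.
GLOBAL_MEANINGS: dict[str, str] = {"🎉": "party"}
SERVER_OVERRIDES: dict[tuple[int, str], str] = {(123456789012345678, "🎉"): "achievement"}

def lookup_meaning(server_id: int, emoji_char: str) -> str | None:
    """Prefer a server-specific meaning, otherwise fall back to the global one."""
    return SERVER_OVERRIDES.get((server_id, emoji_char),
                                GLOBAL_MEANINGS.get(emoji_char))
```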
Emoji Handling Pitfalls
Pitfall: Unicode Representation Edge Cases and Combining Characters
What goes wrong: Emoji are more complex than they appear:
- Some "single" emoji are multi-codepoint: 👨👩👧 (family) = 7 codepoints with zero-width joiners
- Variation selectors (️) change emoji appearance: ❤ vs ❤️
- Skin tone modifiers add extra codepoints: 👋 vs 👋🏻
- Regex fails on complex emoji
- String length in Python != visual emoji count
Warning signs:
- Some emojis don't parse or get corrupted
- Emoji combinations disappear or get split
- Search for specific emoji sometimes fails
- Emoji with skin tones treated as separate emojis
Prevention:
- Use an emoji library: Don't parse manually - use the `emoji` package, which understands combining characters (see the sketch below)
- Normalize input: Normalize emoji to a canonical form before storage/lookup (NFD normalization)
- Test edge cases: Include in test suite:
- Family emoji (👨👩👧)
- Skin tone modifiers (👋🏻 through 👋🏿)
- Gendered variants (👨⚕️ vs 👩⚕️)
- Flags (🇺🇸 = 2 regional indicators)
- Keycap sequences (1️⃣)
- Store as text, not codepoints: Keep emoji as Unicode strings in database, not split into codepoints
- Validate emoji: Check that the input is actually a valid emoji using `emoji.is_emoji()` before storing
- Document supported emoji: Be explicit about which emoji are supported (all Unicode emoji, or a subset?)
Which phase should address it: Phase 2 (core translation)
Reference: Complex emoji are built from multiple code points: the family emoji 👨‍👩‍👧‍👦 is seven (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466), and even the keycap 1️⃣ is three (U+0031 U+FE0F U+20E3). Never assume 1 emoji = 1 character.
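A sketch of emoji extraction and validation with the `emoji` package (v2+ assumed), which keeps ZWJ families, skin tones, and flags intact as single units:

```python
import unicodedata
import emoji   # pip install emoji  (v2+ assumed)

def extract_emoji(text: str) -> list[str]:
    """Return each emoji in a message as one whole unit (ZWJ families, skin tones, flags)."""
    normalized = unicodedata.normalize("NFD", text)   # pick one form and use it everywhere
    return [match["emoji"] for match in emoji.emoji_list(normalized)]

def is_single_emoji(candidate: str) -> bool:
    """Validate teach input: exactly one real emoji, however many codepoints it uses."""
    return emoji.is_emoji(candidate.strip())

# extract_emoji("hi 👨‍👩‍👧‍👦 👋🏻") -> ["👨‍👩‍👧‍👦", "👋🏻"], each kept as a single string.
```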
Security Pitfalls
Pitfall: Hidden Command Privilege Escalation and Authorization Bypass
What goes wrong: Learning systems allow users to modify bot data. Common authorization mistakes:
- No permission check - any user can teach emoji
- No hierarchy check - regular user can override mod's meanings
- Teaching command accepts unsafe input (SQL injection, command injection)
- Audit trail incomplete - can't prove who made unauthorized changes
- Bot token in environment exposed - full compromise
Warning signs:
- Non-mods can modify emoji dictionary
- Troll edits spread before mods notice
- No way to revert malicious changes
- Bot behaves unexpectedly with suspicious permissions
- Dictionary contains offensive or misleading entries
Prevention:
- Permission check every command: Verify the user is a mod/trusted before `/teach` (see the sketch below)
- Whitelist approach: Only specific users (Vivi, trusted friends) can teach, not everyone
- Input validation: Sanitize meaning text - no special characters, max length, filter profanity
- Audit everything: Log `user_id`, `timestamp`, `emoji`, `old_meaning`, `new_meaning`, `was_approved`
- Immutable audit log: Once written, audit entries can't be modified
- Reversibility: Always support `/undo` or `/revert <emoji>` for recent changes
- No bot token exposure: Use a `.env` file (gitignored), not hardcoded secrets
- Rate limit teaching: Prevent spam - one teach per user per 5 seconds
- Approval workflow: For shared systems, new meanings require mod approval before going live
Which phase should address it: Phase 3 (learning system)
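A sketch of the whitelist-plus-cooldown gate described above; the trusted-user set and cooldown length are assumptions:

```python
import time
import discord

TRUSTED_USER_IDS = {111111111111111111}   # assumption: Vivi's account plus trusted mods
TEACH_COOLDOWN_SECONDS = 5
_last_teach: dict[int, float] = {}

def can_teach(user: discord.abc.User) -> tuple[bool, str]:
    """Whitelist plus per-user cooldown, checked before any /teach is accepted."""
    if user.id not in TRUSTED_USER_IDS:
        return False, "Only trusted users can teach emoji meanings."
    now = time.monotonic()
    if now - _last_teach.get(user.id, 0.0) < TEACH_COOLDOWN_SECONDS:
        return False, "Please wait a few seconds between teaches."
    _last_teach[user.id] = now
    return True, ""
```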
Pitfall: PluralKit API Data Privacy and Personal Information Leakage
What goes wrong: When querying PluralKit API to verify Vivi's identity:
- System info becomes visible to other users through bot queries
- Member details (pronouns, description) could be displayed accidentally
- API errors expose system ID in stack traces
- Bot caches PluralKit data but doesn't respect privacy settings
Warning signs:
- System info visible when not intended
- Other users can query Vivi's system through bot commands
- Sensitive member data appears in error messages
- Bot stores outdated PluralKit data
Prevention:
- Minimal API queries: Only fetch what you need (member ID, not full profile)
- Cache respectfully: Store only user ID verification, not personal details
- Error handling: Don't expose system IDs or member names in error messages
- Privacy by default: Don't display any system info unless Vivi explicitly allows it
- Respect privacy settings: If Vivi's system is private, don't query it
- No logging of personal data: Filter logs to remove member names, descriptions
- Clear API use policy: Document what data you collect and why (for Vivi's consent)
Which phase should address it: Phase 1 (architecture) - design for privacy from the start
Translation Quality Pitfalls
Pitfall: Translations Feel Robotic or Lose Context
What goes wrong: Simple concatenation of emoji meanings produces awkward, stilted translations:
- "😀😍🎉" becomes "happy + love + party" (grammatically weird)
- Emoji in sequence don't flow naturally
- Context is lost (is this celebratory? sarcastic? sad?)
- Complex emoji (👨👩👧) get broken into confusing pieces
Warning signs:
- Translations feel hard to read
- Users prefer the original emoji over bot's translation
- Emoji sequences don't combine logically
- Accessibility readers struggle with the output
Prevention:
- Adaptive formatting: Group related emoji:
- Multiple emoji of same type → comma-separated ("happy, joyful, excited")
- Verb + object → natural phrasing ("loves X", "celebrates with")
- Emoji + punctuation → handle specially
- Context awareness: If Vivi teaches "😀 = amused at something", use that context
- Order matters: Preserve emoji order in translation, not alphabetical
- Natural language: Use connectors ("and", "with") not just commas
- Test readability: Read translation aloud - does it sound natural?
- User testing: Show translations to Vivi - do they capture intent?
Which phase should address it: Phase 3 (refinement) or Phase 4 (NLP)
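A sketch of the "natural connectors, preserve order" formatting idea above; the deduplication and connector choices are assumptions to tune with Vivi:

```python
def render_translation(meanings: list[str]) -> str:
    """Join emoji meanings in message order with natural connectors, not '+'."""
    # Collapse consecutive repeats ("😂😂😂" shouldn't read "laughing, laughing, laughing").
    deduped = [m for i, m in enumerate(meanings) if i == 0 or m != meanings[i - 1]]
    if not deduped:
        return ""
    if len(deduped) == 1:
        return deduped[0]
    return ", ".join(deduped[:-1]) + " and " + deduped[-1]

# render_translation(["happy", "love", "party"]) -> "happy, love and party"
```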
Accessibility Pitfalls
Pitfall: Teaching System Too Text-Heavy for Users with Dysgraphia
What goes wrong: Emoji learning systems assume users can easily type complex commands and read long responses. For Vivi (with Dysgraphia):
- Typing is laborious and produces errors
- Long text responses are hard to parse
- No visual confirmation of what was taught
- Complex command syntax is hard to remember
- Alternative input methods not supported
Warning signs:
- Vivi avoids using teaching feature entirely
- Alternate system members always teach instead
- Teaching commands are frequently mistyped
- Vivi asks for repetition of bot responses
- Few emoji get added to dictionary
Prevention:
- Minimal typing required: `/teach 😀 happy`, not `/teach --emoji 😀 --meaning "happy" --tags emotion`
- Visual confirmation: Show the emoji in the response for confirmation (eyes process it faster than text)
- Short responses: Max 1-2 sentences, not paragraphs
- Command aliases: Both `/teach` and `/learn` work for the same function
- Text-to-speech friendly: Use punctuation, avoid abbreviations, keep a clear structure
- Reaction-based UI: "React with ✓ to confirm" instead of "Type 'yes'" (see the sketch below)
- Error recovery: If typo in emoji, bot suggests correction with reaction buttons
- Accessible defaults: Large, clear emoji in responses; emoji codes visible in text form too
- Alternative confirmation: "This emoji now means X. React ✓ to keep, or I'll delete in 5 sec"
Which phase should address it: Phase 3 (learning system UX)
Context: Vivi has Dysgraphia, affecting her ability to write and type. Design for minimal text input and maximum visual feedback.
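A sketch of a reaction-based confirmation flow using discord.py's `wait_for`, so confirming never requires typing; the timeout and fallback behaviour are assumptions:

```python
import asyncio
import discord
from discord.ext import commands

bot = commands.Bot(command_prefix="!", intents=discord.Intents.default())

async def confirm_by_reaction(channel: discord.abc.Messageable,
                              user: discord.abc.User, prompt: str) -> bool:
    """Ask with a single ✅ reaction instead of requiring a typed 'yes'."""
    msg = await channel.send(prompt)
    await msg.add_reaction("✅")

    def check(reaction: discord.Reaction, reactor: discord.User) -> bool:
        return (reaction.message.id == msg.id
                and reactor.id == user.id
                and str(reaction.emoji) == "✅")

    try:
        await bot.wait_for("reaction_add", timeout=15.0, check=check)
        return True
    except asyncio.TimeoutError:
        return False   # pick one meaning for silence (keep or cancel) and stay consistent
```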
Hosting and Infrastructure Pitfalls
Pitfall: Inadequate Infrastructure Leads to Downtime
What goes wrong: Small Discord bots often run on free/cheap hosting that can't handle scale:
- Heroku, Replit, Glitch shut down or have uptime issues
- Database goes down → translations stop working
- No monitoring → crashes go unnoticed for hours
- Single-point failure → bot death = translations unavailable
Warning signs:
- Bot goes offline unpredictably
- Slow response times during peak hours
- Database connection timeouts
- No way to know if bot is running
Prevention:
- Use paid hosting: AWS, Digital Ocean, Google Cloud - reliable infrastructure
- Database backup: Regular automated backups of emoji dictionary
- Health checks: Bot pings itself regularly; alerts if no response
- Logging: All errors logged to persistent storage, not just console
- Redundancy (future): For Phase 4+, consider running bot on 2 servers with failover
- Monitoring: Use tools like Sentry or DataDog to track crashes
- Graceful degradation: If database is down, serve cached emoji meanings
Which phase should address it: Phase 1 (setup) - establish reliable hosting immediately
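A sketch of the graceful-degradation idea: answer from a last-known-good in-memory copy when the database is unreachable. SQLite and the `emoji_meanings` table are stand-ins for the real storage:

```python
import sqlite3

# Last-known-good copy of the dictionary, refreshed on every successful read.
_memory_fallback: dict[str, str] = {}

def get_meaning(db_path: str, emoji_char: str) -> str | None:
    """Read from the database, but keep answering from memory if it is down."""
    try:
        with sqlite3.connect(db_path, timeout=2) as db:
            row = db.execute(
                "SELECT meaning FROM emoji_meanings WHERE emoji = ?", (emoji_char,)
            ).fetchone()
        if row:
            _memory_fallback[emoji_char] = row[0]
            return row[0]
        return None
    except sqlite3.Error:
        return _memory_fallback.get(emoji_char)   # degrade gracefully to the cached copy
```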
Summary: Top Pitfalls to Watch For
1. Message Detection Reliability (Phase 1 - CRITICAL)
Unreliable detection of Vivi's messages through PluralKit webhook proxying leads to missed translations or false positives. Use webhook creator ID as source of truth, implement proper caching, and test edge cases.
2. Message Content Intent and Architecture (Phase 1 - CRITICAL)
Designing bot around passive message scanning when you may not get privileged intent approval. Plan for slash commands and interactions from the start; treat message content as optional.
3. Dictionary Quality Degradation (Phase 3 - HIGH)
User-contributed emoji meanings become nonsensical over time without validation, audit trails, and review processes. Implement these from day one, not as an afterthought.
4. Teaching Interface Complexity (Phase 3 - HIGH)
Text-heavy, complex teaching commands discourage use and frustrate Vivi (who has Dysgraphia). Keep it simple: `/teach <emoji> <meaning>`. Show visual confirmation.
5. Rate Limiting and Scaling (Phase 2+ - MEDIUM)
Discord and PluralKit API limits hit unexpectedly when translating messages with many emoji. Implement caching, batch requests, and exponential backoff from Phase 2 onward.
6. Emoji Edge Cases (Phase 2 - MEDIUM)
Complex emoji with combining characters, skin tones, and zero-width joiners break naive parsing. Use proper emoji library, normalize input, test thoroughly.
7. Authorization and Security (Phase 3 - HIGH)
Teaching system without permission checks or audit trails leads to troll edits and data corruption. Require authentication, validate input, log everything.
8. Webhook Race Conditions (Phase 2+ - MEDIUM)
Simultaneous edits by Vivi and bot translation cause corruption. Post new translations instead of editing; queue requests with delay to avoid races.