Vivi-Speech/.planning/research/PITFALLS.md
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:02:32 -05:00


Pitfalls Research: Vivi Speech Translator

A comprehensive analysis of common mistakes in Discord bot development, with specific focus on PluralKit integration, emoji translation, and learning systems.


PluralKit Integration Pitfalls

Pitfall: Unreliable Vivi Message Detection (Webhook vs Direct Author Check)

What goes wrong: PluralKit proxies messages through webhooks under the member's name. Bots can try to detect Vivi's messages in several ways:

  • Checking the author against Vivi's user ID (fails for proxied messages, whose author is the webhook, not the account)
  • Parsing the webhook username (fragile - display names can be changed)
  • Checking for proxy tags in message content (only works with bracket-style proxies)

The core issue: If detection logic mixes these approaches or doesn't account for webhook proxying edge cases, you'll get false positives (bot responds to non-Vivi messages) and false negatives (Vivi's messages don't translate).

Warning signs:

  • Bot randomly responds to messages from other PluralKit members
  • Vivi's proxied messages don't trigger translation but manually typed messages do
  • Bot responds during PluralKit testing/reproxying (the pk;reproxy command)
  • Inconsistent detection across channels with different permissions

Prevention:

  • Consistent source of truth: Decide on ONE reliable detection method - either webhook creator ID (most reliable) or member username in webhook
  • Query PluralKit API: When uncertain, use PluralKit's REST API message-lookup endpoint to verify a message came from Vivi's system (GET requests are limited to 10/second)
  • Cache member names: Store known proxy tag patterns for Vivi locally to reduce API calls
  • Test edge cases: Reproxy, message edits, reactions on webhook messages, and DMs from Vivi
  • Log detection failures: When detection fails, log the message author, webhook info, and detected proxy tags for debugging
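A minimal sketch of the "one source of truth" decision logic, assuming the webhook-plus-API-lookup approach above. `VIVI_SYSTEM_ID` is a hypothetical placeholder, and the payload shape (a `system` object in PluralKit's GET /v2/messages/{id} response) should be checked against the live API:

```python
VIVI_SYSTEM_ID = "exmpl"  # hypothetical 5-letter PluralKit system ID

def is_vivi_proxy(webhook_id, pk_payload, system_id=VIVI_SYSTEM_ID):
    """Single source of truth for detection.

    webhook_id: discord.py Message.webhook_id (None for normal messages).
    pk_payload: decoded JSON from PluralKit's GET /v2/messages/{id},
                or None if the lookup failed (not a proxied message).
    """
    if webhook_id is None:
        return False  # PluralKit proxies always arrive via webhook
    if not pk_payload:
        return False  # a webhook, but not a PluralKit message (or API error)
    system = pk_payload.get("system") or {}
    return system.get("id") == system_id
```

Pairing this with a per-message cache keeps the PluralKit lookup to at most one API call per message.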

Which phase should address it: Phase 1 (core message detection) - must be bulletproof before teaching system

API Rate Limit Note: PluralKit enforces:

  • 10/second for GET requests (member lookup)
  • 3/second for POST/PATCH/DELETE (system updates)
  • Use Dispatch Webhooks for event-driven updates instead of polling

Pitfall: Webhook Editing Race Conditions

What goes wrong: When Vivi edits her proxied message, your bot attempts to edit its translation simultaneously. Discord webhooks don't handle concurrent edits well:

  • Message edit by original proxy webhook and bot translation edit can conflict
  • Race conditions can cause message state corruption
  • Edited messages may revert to old content or show partial updates
  • Reactions added during the edit window may be lost

Warning signs:

  • Translations occasionally show old/incorrect emoji meanings after Vivi edits her message
  • Bot throws 500 errors when trying to edit webhook messages
  • Translation accuracy degrades when Vivi edits quickly
  • Missing reactions after Vivi's message is edited

Prevention:

  • Don't edit webhook messages directly: Instead, post a new translation message and delete the old one
  • Add edit detection: Use message.edited_at to detect changes, but DON'T race with Vivi
  • Queue edit requests: If Vivi edits, queue a job to re-translate after a 1-second delay to avoid simultaneous edits
  • Handle 500s gracefully: Treat webhook edit failures as "post new translation instead"
  • Add edit timestamps: Show "(edited)" in translation to indicate it's a response to an edit
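The queue-and-delay idea can be sketched with plain asyncio; `post_translation` is a hypothetical stand-in for posting a new Discord message:

```python
import asyncio

translations = []                       # stand-in for messages the bot posts
_pending: dict[int, asyncio.Task] = {}  # message_id -> queued re-translation

async def post_translation(message_id: int, text: str):
    translations.append((message_id, text))  # hypothetical: send a NEW message

def schedule_retranslate(message_id: int, text: str, delay: float = 1.0):
    """Re-translate only after edits settle for `delay` seconds (no racing)."""
    old = _pending.pop(message_id, None)
    if old is not None:
        old.cancel()                    # a newer edit supersedes the queued one
    _pending[message_id] = asyncio.get_running_loop().create_task(
        _retranslate_later(message_id, text, delay))

async def _retranslate_later(message_id: int, text: str, delay: float):
    try:
        await asyncio.sleep(delay)
    except asyncio.CancelledError:
        return                          # superseded by a later edit
    _pending.pop(message_id, None)
    await post_translation(message_id, text)
```

Because each new edit cancels the pending task, a burst of rapid edits produces exactly one re-translation, of the final content.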

Which phase should address it: Phase 2 (message handling) or Phase 3 (refinement)


Discord API Pitfalls

Pitfall: Message Content Intent Denial and Architecture Lock-In

What goes wrong: Discord requires the message_content privileged intent to read emoji content in messages. Bots in 100+ servers must be verified and approved for it. Common mistakes:

  • Building the bot assuming you'll get approval (you might not)
  • Designing architecture around passive message scanning instead of interactions
  • Failing to plan alternatives when approval is denied
  • Overriding on_message without calling bot.process_commands(), which silently breaks prefix-command handling

Warning signs:

  • Bot works in testing but stops working after hitting 100 servers
  • Approval denial with no fallback plan
  • Message content suddenly inaccessible mid-development
  • Architecture rewrite needed to add slash commands later

Prevention:

  • Design for slash commands first: Use /translate <emoji> instead of passive scanning
  • Use Interactions API: Buttons, select menus, and slash commands don't require message content intent
  • Plan for denial: Have a fallback UI (buttons to trigger translation, not automatic)
  • Unverified bots get free access: Stay under 100 servers during development, or use a separate unverified bot for testing
  • Document intent usage: Be ready to explain why your emoji translation bot needs message content (required for verification at 100+ servers)
  • Prepare alternatives: Reactions or buttons as fallback if approval is denied

Which phase should address it: Phase 1 (architectural decisions)


Pitfall: Rate Limiting and Burst Requests

What goes wrong: Discord enforces global (50 req/sec) and per-route rate limits. Vivi Speech can hit limits when:

  • Translating messages with many emojis (multiple API lookups)
  • Multiple users triggering translations simultaneously
  • Teaching system saving entries rapidly
  • PluralKit API queries in addition to Discord API calls

Warning signs:

  • Bot suddenly goes silent (no responses)
  • 429 (Too Many Requests) errors in logs
  • Delayed translations (multi-second latency)
  • Inconsistent behavior during peak usage

Prevention:

  • Cache emoji translations: Store learned meanings in-memory with TTL (time-to-live)
  • Batch emoji lookups: If translating a message with 5 emojis, don't make 5 API calls - batch them
  • Implement exponential backoff: When rate limited, wait with exponential delays (1s, 2s, 4s...)
  • Queue teaching commands: Don't save to database on every teach attempt - queue and batch writes
  • Monitor rate limit headers: Parse X-RateLimit-Remaining and X-RateLimit-Reset-After headers
  • Shard properly: Maintain ~1,000 guilds per shard maximum
  • Use caching layers: Redis or in-memory LRU cache for frequently translated emojis
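A minimal in-memory TTL cache for learned meanings, along the lines suggested above (a sketch; a real deployment might add an LRU size bound or swap in Redis):

```python
import time

class TTLCache:
    """Tiny emoji-meaning cache with per-entry expiry."""
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}                # emoji -> (meaning, stored_at)

    def get(self, emoji):
        entry = self._store.get(emoji)
        if entry is None:
            return None
        meaning, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[emoji]      # expired; force a fresh lookup
            return None
        return meaning

    def put(self, emoji, meaning):
        self._store[emoji] = (meaning, time.monotonic())
```

A cache hit avoids both a database read and, for hot emoji, the burst of lookups that trips rate limits.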

Which phase should address it: Phase 2 (scaling) and Phase 3 (optimization)


Pitfall: Privilege and Permission Confusion

What goes wrong: Bots need specific permissions but devs often either:

  • Request too many permissions (users won't invite)
  • Request insufficient permissions (bot fails silently)
  • Don't verify permissions before action (command fails with cryptic error)
  • Don't check user permissions before teaching (malicious edits to dictionary)

Warning signs:

  • Bot invited but can't send messages
  • Teaching commands work in DM but not in server
  • Translation attempts fail silently with no error message
  • Non-mods can change emoji meanings

Prevention:

  • Minimal permission set: Only request send_messages and read_message_history; add manage_messages only if the bot must delete others' messages (it can always delete its own)
  • Check before acting: Verify the bot has the required permissions using message.channel.permissions_for(message.guild.me) (permissions_for expects a Member, so use guild.me, not bot.user)
  • User permission checks: Only allow trusted users (mods, Vivi herself) to teach emoji meanings
  • Clear error messages: "I don't have permission to send messages here" instead of silent failure
  • Test on new servers: Invite bot to a test server with minimal permissions and verify all features work

Which phase should address it: Phase 1 (setup) and Phase 3 (moderation)


Learning System Pitfalls

Pitfall: Dictionary Quality Degradation Over Time

What goes wrong: User-contributed learning systems fail when:

  • Users add typos, slang, or inside jokes as emoji meanings
  • Duplicate or conflicting meanings accumulate (😀 = "happy", "smile", "goofy face")
  • Rarely-used emojis have outdated or weird meanings
  • No audit trail - can't track who broke the dictionary
  • Stale entries that were never useful remain forever

Warning signs:

  • Translations become nonsensical or off-topic
  • Conflicting definitions for same emoji (confusing translations)
  • Many emoji with zero meaningful translation
  • Teaching system abused by trolls or internal conflicts

Prevention:

  • Validation on teach: Check for minimum length (3 chars), no excessive emojis in meaning, no URLs
  • Audit trail: Log every emoji meaning change with timestamp, user_id, old_value, new_value
  • Review process: For shared systems, flag new meanings for mod approval before going live
  • Meaning versioning: Keep multiple meanings, let users vote/rank them (future: Phase 4)
  • Freshness markers: Track last-used date; prompt for re-confirmation if unused for 90+ days
  • Duplicate detection: Warn if adding a meaning similar to existing ones
  • Clear command output: Show current meaning before accepting new one, ask for confirmation

Which phase should address it: Phase 3 (learning) - implement audit trail from day one


Pitfall: Teaching Interface Too Complex or Text-Heavy

What goes wrong: Learning systems fail when the teaching UI is hard to use:

  • Complex command syntax that users forget
  • Too many options/flags (overwhelming)
  • No confirmation of what was taught
  • Text-heavy responses (bad for users with dysgraphia like Vivi)
  • No visual feedback (the taught emoji isn't echoed back in the response)

Warning signs:

  • Few emoji meanings actually get added
  • Users give up and stop teaching
  • Confusion about command syntax
  • Vivi avoids using teaching feature
  • Other system members always teach instead

Prevention:

  • Simple one-liner commands: /teach 😀 happy not /teach --emoji 😀 --meaning "happy" --priority high
  • Visual confirmation: Include the emoji in the response ("Learned: 😀 = happy")
  • Show current meaning: "😀 currently means: happy | Update it? Type: /teach 😀 new meaning"
  • Short responses: Keep bot responses under 2 lines when possible
  • Use buttons over typing: React with checkmark/X for confirmation instead of "yes/no"
  • Emoji picker: If possible, allow selecting emoji by reaction instead of typing
  • Accessible syntax: Support aliases - /learn 😀 happy same as /teach 😀 happy

Which phase should address it: Phase 3 (learning system design)

Accessibility Note: Vivi has Dysgraphia, which affects writing ability. Keep commands short, use visual confirmation (emoji in responses), and minimize text output.


Pitfall: Scope Creep in Learning Features

What goes wrong: Learning systems start simple but can grow uncontrollably:

  • "Let's add multiple meanings per emoji" → complexity explosion
  • "Different meanings for different contexts" → database redesign
  • "Per-server emoji dictionaries" → multi-tenancy complexity
  • "Emoji meaning versioning and rollback" → audit log nightmare
  • "Machine learning to auto-generate meanings" → maintenance burden

Warning signs:

  • Feature backlog grows faster than you can implement
  • Core translation becomes slow/unreliable
  • Code becomes hard to understand and modify
  • New features break old ones
  • Team gets overwhelmed with edge cases

Prevention:

  • MVP scope: Phase 1-3 = simple one-meaning-per-emoji, all servers share same dictionary
  • Clear phase boundaries: Document what's in each phase; don't add features mid-phase
  • Say no to feature requests: Politely defer to Phase 4 or beyond
  • Keep it simple: One meaning per emoji, user teaches it once, it sticks
  • Future extensibility: Design database schema to support multiple meanings later, but don't implement it yet
  • Regular scope reviews: Every 2 weeks, ask: "Is this feature essential to core translation?"

Which phase should address it: Phase 1 (planning) - establish clear phase gates


Multi-Server and Data Pitfalls

Pitfall: Global Dictionary Conflicts Across Servers

What goes wrong: A shared emoji dictionary works well until different communities use the same emoji differently:

  • Server A uses 🎉 for "party"; Server B uses it for "achievement"
  • Emoji meanings don't match community context
  • Users from Server B are confused by Server A's translations
  • No way to override meanings per-server

Warning signs:

  • Users report wrong translations in their server
  • Emoji meanings conflict between communities
  • No way to customize meanings per-guild
  • One server's trolls ruin translations for everyone

Prevention:

  • Phase 1-2: Global only: Accept that all servers share one dictionary
  • Phase 4 planning: Design per-server override system (store in server_id:emoji key)
  • Document limitation: "Emoji meanings are shared across all servers - curate carefully"
  • Moderation: Have a trusted team that curates the global dictionary
  • Community rules: Require consensus or voting before changing popular emoji meanings
  • Meaning context: Store both the meaning AND its frequency/reliability (crowdsourced)
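When Phase 4 arrives, the override lookup can stay this simple; a sketch with plain dicts standing in for the database tables:

```python
def lookup_meaning(emoji: str, guild_id: int, overrides: dict, global_dict: dict):
    """Per-server override first (Phase 4+), else the shared global meaning."""
    override = overrides.get((guild_id, emoji))
    if override is not None:
        return override
    return global_dict.get(emoji)
```

Designing the lookup around a (server_id, emoji) key now means the Phase 1-3 global-only dictionary needs no migration later, just an empty overrides table.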

Which phase should address it: Phase 4 (advanced) - keep Phase 1-3 global-only


Emoji Handling Pitfalls

Pitfall: Unicode Representation Edge Cases and Combining Characters

What goes wrong: Emoji are more complex than they appear:

  • Some "single" emoji are multi-codepoint: 👨‍👩‍👧 (family) is 5 codepoints joined by zero-width joiners
  • Variation selectors (U+FE0F) change emoji appearance: ❤ vs ❤️
  • Skin tone modifiers add extra codepoints: 👋 vs 👋🏻
  • Regex fails on complex emoji
  • String length in Python != visual emoji count

Warning signs:

  • Some emojis don't parse or get corrupted
  • Emoji combinations disappear or get split
  • Search for specific emoji sometimes fails
  • Emoji with skin tones treated as separate emojis

Prevention:

  • Use emoji library: Don't parse manually - use emoji package which understands combining characters
  • Normalize input: Normalize emoji to canonical form before storage/lookup (NFD normalization)
  • Test edge cases: Include in test suite:
    • Family emoji (👨‍👩‍👧)
    • Skin tone modifiers (👋🏻 through 👋🏿)
    • Gendered variants (👨‍⚕️ vs 👩‍⚕️)
    • Flags (🇺🇸 = 2 regional indicators)
    • Keycap sequences (1️⃣ = '1' + U+FE0F + U+20E3)
  • Store as text, not codepoints: Keep emoji as Unicode strings in database, not split into codepoints
  • Validate emoji: Check if input is actually a valid emoji using emoji.is_emoji() before storing
  • Document supported emoji: Be explicit about which emoji are supported (all Unicode emoji, or subset?)
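A quick stdlib demonstration of why naive length checks mislead:

```python
family = "👨‍👩‍👧‍👦"        # one visible glyph: man + ZWJ + woman + ZWJ + girl + ZWJ + boy
assert len(family) == 7      # ...but seven Python code points
heart = "❤️"                 # U+2764 + variation selector U+FE0F
assert len(heart) == 2
flag = "🇺🇸"                 # two regional indicator symbols
assert len(flag) == 2
# Store and compare emoji as whole strings, never per-"character".
```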

Which phase should address it: Phase 2 (core translation)

Reference: Even the keycap 1️⃣ is three code points ('0x31', '0xfe0f', '0x20e3'), and the family emoji 👨‍👩‍👧‍👦 is seven. Never assume 1 emoji = 1 character.


Security Pitfalls

Pitfall: Hidden Command Privilege Escalation and Authorization Bypass

What goes wrong: Learning systems allow users to modify bot data. Common authorization mistakes:

  • No permission check - any user can teach emoji
  • No hierarchy check - regular user can override mod's meanings
  • Teaching command accepts unsafe input (SQL injection, command injection)
  • Audit trail incomplete - can't prove who made unauthorized changes
  • Bot token in environment exposed - full compromise

Warning signs:

  • Non-mods can modify emoji dictionary
  • Troll edits spread before mods notice
  • No way to revert malicious changes
  • Bot behaves unexpectedly with suspicious permissions
  • Dictionary contains offensive or misleading entries

Prevention:

  • Permission check every command: Verify user is mod/trusted before /teach
  • Whitelist approach: Only specific users (Vivi, trusted friends) can teach, not everyone
  • Input validation: Sanitize meaning text - no special chars, max length, filter profanity
  • Audit everything: Log user_id, timestamp, emoji, old_meaning, new_meaning, was_approved
  • Immutable audit log: Once written, audit entries can't be modified
  • Reversibility: Always support /undo or /revert <emoji> for recent changes
  • No bot token exposure: Use .env file (gitignored), not hardcoded secrets
  • Rate limit teaching: Prevent spam - one teach per user per 5 seconds
  • Approval workflow: For shared systems, new meanings require mod approval before going live

Which phase should address it: Phase 3 (learning system)


Pitfall: PluralKit API Data Privacy and Personal Information Leakage

What goes wrong: When querying PluralKit API to verify Vivi's identity:

  • System info becomes visible to other users through bot queries
  • Member details (pronouns, description) could be displayed accidentally
  • API errors expose system ID in stack traces
  • Bot caches PluralKit data but doesn't respect privacy settings

Warning signs:

  • System info visible when not intended
  • Other users can query Vivi's system through bot commands
  • Sensitive member data appears in error messages
  • Bot stores outdated PluralKit data

Prevention:

  • Minimal API queries: Only fetch what you need (member ID, not full profile)
  • Cache respectfully: Store only user ID verification, not personal details
  • Error handling: Don't expose system IDs or member names in error messages
  • Privacy by default: Don't display any system info unless Vivi explicitly allows it
  • Respect privacy settings: If Vivi's system is private, don't query it
  • No logging of personal data: Filter logs to remove member names, descriptions
  • Clear API use policy: Document what data you collect and why (for Vivi's consent)

Which phase should address it: Phase 1 (architecture) - design for privacy from the start


Translation Quality Pitfalls

Pitfall: Translations Feel Robotic or Lose Context

What goes wrong: Simple concatenation of emoji meanings produces awkward, stilted translations:

  • "😀😍🎉" becomes "happy + love + party" (grammatically weird)
  • Emoji in sequence don't flow naturally
  • Context is lost (is this celebratory? sarcastic? sad?)
  • Complex emoji (👨‍👩‍👧) get broken into confusing pieces

Warning signs:

  • Translations feel hard to read
  • Users prefer the original emoji over bot's translation
  • Emoji sequences don't combine logically
  • Accessibility readers struggle with the output

Prevention:

  • Adaptive formatting: Group related emoji:
    • Multiple emoji of same type → comma-separated ("happy, joyful, excited")
    • Verb + object → natural phrasing ("loves X", "celebrates with")
    • Emoji + punctuation → handle specially
  • Context awareness: If Vivi teaches "😀 = amused at something", use that context
  • Order matters: Preserve emoji order in translation, not alphabetical
  • Natural language: Use connectors ("and", "with") not just commas
  • Test readability: Read translation aloud - does it sound natural?
  • User testing: Show translations to Vivi - do they capture intent?
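Even without NLP, a small joiner reads better than raw concatenation; order is preserved exactly as the emoji appeared:

```python
def join_meanings(meanings: list[str]) -> str:
    """['happy', 'love', 'party'] -> 'happy, love and party'."""
    if not meanings:
        return ""
    if len(meanings) == 1:
        return meanings[0]
    return ", ".join(meanings[:-1]) + " and " + meanings[-1]
```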

Which phase should address it: Phase 3 (refinement) or Phase 4 (NLP)


Accessibility Pitfalls

Pitfall: Teaching System Too Text-Heavy for Users with Dysgraphia

What goes wrong: Emoji learning systems assume users can easily type complex commands and read long responses. For Vivi (with Dysgraphia):

  • Typing is laborious and produces errors
  • Long text responses are hard to parse
  • No visual confirmation of what was taught
  • Complex command syntax is hard to remember
  • Alternative input methods not supported

Warning signs:

  • Vivi avoids using teaching feature entirely
  • Alternate system members always teach instead
  • Teaching commands are frequently mistyped
  • Vivi asks for repetition of bot responses
  • Few emoji get added to dictionary

Prevention:

  • Minimal typing required: /teach 😀 happy not /teach --emoji 😀 --meaning "happy" --tags emotion
  • Visual confirmation: Show emoji in response for confirmation (eyes can process faster than text)
  • Short responses: Max 1-2 sentences, not paragraphs
  • Command aliases: Both /teach and /learn work for same function
  • Text-to-speech friendly: Use punctuation, avoid abbreviations, clear structure
  • Reaction-based UI: "React with ✓ to confirm" instead of "Type 'yes'"
  • Error recovery: If typo in emoji, bot suggests correction with reaction buttons
  • Accessible defaults: Large, clear emoji in responses; emoji codes visible in text form too
  • Alternative confirmation: "This emoji now means X. React ✓ to keep, or I'll delete in 5 sec"

Which phase should address it: Phase 3 (learning system UX)

Context: Vivi has Dysgraphia, affecting her ability to write and type. Design for minimal text input and maximum visual feedback.


Hosting and Infrastructure Pitfalls

Pitfall: Inadequate Infrastructure Leads to Downtime

What goes wrong: Small Discord bots often run on free/cheap hosting that can't handle scale:

  • Free tiers (Heroku, Replit, Glitch) get discontinued or have uptime issues
  • Database goes down → translations stop working
  • No monitoring → crashes go unnoticed for hours
  • Single-point failure → bot death = translations unavailable

Warning signs:

  • Bot goes offline unpredictably
  • Slow response times during peak hours
  • Database connection timeouts
  • No way to know if bot is running

Prevention:

  • Use paid hosting: AWS, Digital Ocean, Google Cloud - reliable infrastructure
  • Database backup: Regular automated backups of emoji dictionary
  • Health checks: Bot pings itself regularly; alerts if no response
  • Logging: All errors logged to persistent storage, not just console
  • Redundancy (future): For Phase 4+, consider running bot on 2 servers with failover
  • Monitoring: Use tools like Sentry or DataDog to track crashes
  • Graceful degradation: If database is down, serve cached emoji meanings
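Graceful degradation in miniature; `db_lookup` is a hypothetical database call that may raise `ConnectionError` when the database is down:

```python
def get_meaning(emoji: str, db_lookup, cache: dict):
    """Prefer the database; fall back to the last cached value if it's down."""
    try:
        meaning = db_lookup(emoji)
    except ConnectionError:
        return cache.get(emoji)         # stale is better than silent failure
    cache[emoji] = meaning              # refresh the cache on every good read
    return meaning
```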

Which phase should address it: Phase 1 (setup) - establish reliable hosting immediately


Summary: Top Pitfalls to Watch For

1. Message Detection Reliability (Phase 1 - CRITICAL)

Unreliable detection of Vivi's messages through PluralKit webhook proxying leads to missed translations or false positives. Use webhook creator ID as source of truth, implement proper caching, and test edge cases.

2. Message Content Intent and Architecture (Phase 1 - CRITICAL)

Designing bot around passive message scanning when you may not get privileged intent approval. Plan for slash commands and interactions from the start; treat message content as optional.

3. Dictionary Quality Degradation (Phase 3 - HIGH)

User-contributed emoji meanings become nonsensical over time without validation, audit trails, and review processes. Implement these from day one, not as an afterthought.

4. Teaching Interface Complexity (Phase 3 - HIGH)

Text-heavy, complex teaching commands discourage use and frustrate Vivi (with Dysgraphia). Keep it simple: /teach emoji meaning. Show visual confirmation.

5. Rate Limiting and Scaling (Phase 2+ - MEDIUM)

Discord and PluralKit API limits hit unexpectedly when translating messages with many emoji. Implement caching, batch requests, and exponential backoff from Phase 2 onward.

6. Emoji Edge Cases (Phase 2 - MEDIUM)

Complex emoji with combining characters, skin tones, and zero-width joiners break naive parsing. Use proper emoji library, normalize input, test thoroughly.

7. Authorization and Security (Phase 3 - HIGH)

Teaching system without permission checks or audit trails leads to troll edits and data corruption. Require authentication, validate input, log everything.

8. Webhook Race Conditions (Phase 2+ - MEDIUM)

Simultaneous edits by Vivi and bot translation cause corruption. Post new translations instead of editing; queue requests with delay to avoid races.


Research Sources