Vivi-Speech/.planning/research/PITFALLS.md
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:02:32 -05:00


Pitfalls Research: Vivi Speech Translator

A comprehensive analysis of common mistakes in Discord bot development, with specific focus on PluralKit integration, emoji translation, and learning systems.


PluralKit Integration Pitfalls

Pitfall: Unreliable Vivi Message Detection (Webhook vs Direct Author Check)

What goes wrong: PluralKit proxies messages through webhooks under the member's name. Bots can try to detect Vivi's messages in several ways:

  • Checking the author against Vivi's user ID (fails for proxied messages, whose author is the webhook, not the account)
  • Parsing the webhook username (fragile - display names can be changed)
  • Checking for proxy tags in message content (only works with bracket-style proxies)

The core issue: If detection logic mixes these approaches or doesn't account for webhook proxying edge cases, you'll get false positives (bot responds to non-Vivi messages) and false negatives (Vivi's messages don't translate).

Warning signs:

  • Bot randomly responds to messages from other PluralKit members
  • Vivi's proxied messages don't trigger translation but manually typed messages do
  • Bot responds during PluralKit testing/reproxying (the pk;reproxy command)
  • Inconsistent detection across channels with different permissions

Prevention:

  • Consistent source of truth: Decide on ONE reliable detection method - either webhook creator ID (most reliable) or member username in webhook
  • Query PluralKit API: When uncertain, use PluralKit's REST API message-lookup endpoint to verify a message came from Vivi's system (GET requests are limited to 10/second)
  • Cache member names: Store known proxy tag patterns for Vivi locally to reduce API calls
  • Test edge cases: Reproxy, message edits, reactions on webhook messages, and DMs from Vivi
  • Log detection failures: When detection fails, log the message author, webhook info, and detected proxy tags for debugging
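A minimal sketch of the "one source of truth" decision logic, assuming the webhook-plus-API-lookup approach above. `VIVI_SYSTEM_ID` is a hypothetical placeholder, and the payload shape (a `system` object in PluralKit's GET /v2/messages/{id} response) should be checked against the live API:

```python
VIVI_SYSTEM_ID = "exmpl"  # hypothetical 5-letter PluralKit system ID

def is_vivi_proxy(webhook_id, pk_payload, system_id=VIVI_SYSTEM_ID):
    """Single source of truth for detection.

    webhook_id: discord.py Message.webhook_id (None for normal messages).
    pk_payload: decoded JSON from PluralKit's GET /v2/messages/{id},
                or None if the lookup failed (not a proxied message).
    """
    if webhook_id is None:
        return False  # PluralKit proxies always arrive via webhook
    if not pk_payload:
        return False  # a webhook, but not a PluralKit message (or API error)
    system = pk_payload.get("system") or {}
    return system.get("id") == system_id
```

Pairing this with a per-message cache keeps the PluralKit lookup to at most one API call per message.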

Which phase should address it: Phase 1 (core message detection) - must be bulletproof before teaching system

API Rate Limit Note: PluralKit enforces:

  • 10/second for GET requests (member lookup)
  • 3/second for POST/PATCH/DELETE (system updates)
  • Use Dispatch Webhooks for event-driven updates instead of polling

Pitfall: Webhook Editing Race Conditions

What goes wrong: When Vivi edits her proxied message, your bot attempts to edit its translation simultaneously. Discord webhooks don't handle concurrent edits well:

  • Message edit by original proxy webhook and bot translation edit can conflict
  • Race conditions can cause message state corruption
  • Edited messages may revert to old content or show partial updates
  • Reactions added during the edit window may be lost

Warning signs:

  • Translations occasionally show old/incorrect emoji meanings after Vivi edits her message
  • Bot throws 500 errors when trying to edit webhook messages
  • Translation accuracy degrades when Vivi edits quickly
  • Missing reactions after Vivi's message is edited

Prevention:

  • Don't edit webhook messages directly: Instead, post a new translation message and delete the old one
  • Add edit detection: Use message.edited_at to detect changes, but DON'T race with Vivi
  • Queue edit requests: If Vivi edits, queue a job to re-translate after a 1-second delay to avoid simultaneous edits
  • Handle 500s gracefully: Treat webhook edit failures as "post new translation instead"
  • Add edit timestamps: Show "(edited)" in translation to indicate it's a response to an edit
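The queue-and-delay idea can be sketched with plain asyncio; `post_translation` is a hypothetical stand-in for posting a new Discord message:

```python
import asyncio

translations = []                       # stand-in for messages the bot posts
_pending: dict[int, asyncio.Task] = {}  # message_id -> queued re-translation

async def post_translation(message_id: int, text: str):
    translations.append((message_id, text))  # hypothetical: send a NEW message

def schedule_retranslate(message_id: int, text: str, delay: float = 1.0):
    """Re-translate only after edits settle for `delay` seconds (no racing)."""
    old = _pending.pop(message_id, None)
    if old is not None:
        old.cancel()                    # a newer edit supersedes the queued one
    _pending[message_id] = asyncio.get_running_loop().create_task(
        _retranslate_later(message_id, text, delay))

async def _retranslate_later(message_id: int, text: str, delay: float):
    try:
        await asyncio.sleep(delay)
    except asyncio.CancelledError:
        return                          # superseded by a later edit
    _pending.pop(message_id, None)
    await post_translation(message_id, text)
```

Because each new edit cancels the pending task, a burst of rapid edits produces exactly one re-translation, of the final content.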

Which phase should address it: Phase 2 (message handling) or Phase 3 (refinement)


Discord API Pitfalls

Pitfall: Message Content Intent Denial and Architecture Lock-In

What goes wrong: Discord requires the message_content privileged intent to read emoji content in messages. Bots in 100+ servers must be verified and approved for it. Common mistakes:

  • Building the bot assuming you'll get approval (you might not)
  • Designing architecture around passive message scanning instead of interactions
  • Failing to plan alternatives when approval is denied
  • Overriding on_message without calling bot.process_commands(), which silently breaks prefix-command handling

Warning signs:

  • Bot works in testing but stops working after hitting 100 servers
  • Approval denial with no fallback plan
  • Message content suddenly inaccessible mid-development
  • Architecture rewrite needed to add slash commands later

Prevention:

  • Design for slash commands first: Use /translate <emoji> instead of passive scanning
  • Use Interactions API: Buttons, select menus, and slash commands don't require message content intent
  • Plan for denial: Have a fallback UI (buttons to trigger translation, not automatic)
  • Unverified bots get free access: Stay under 100 servers during development, or use a separate unverified bot for testing
  • Document intent usage: Be ready to explain why your emoji translation bot needs message content (required for verification at 100+ servers)
  • Prepare alternatives: Reactions or buttons as fallback if approval is denied

Which phase should address it: Phase 1 (architectural decisions)


Pitfall: Rate Limiting and Burst Requests

What goes wrong: Discord enforces global (50 req/sec) and per-route rate limits. Vivi Speech can hit limits when:

  • Translating messages with many emojis (multiple API lookups)
  • Multiple users triggering translations simultaneously
  • Teaching system saving entries rapidly
  • PluralKit API queries in addition to Discord API calls

Warning signs:

  • Bot suddenly goes silent (no responses)
  • 429 (Too Many Requests) errors in logs
  • Delayed translations (multi-second latency)
  • Inconsistent behavior during peak usage

Prevention:

  • Cache emoji translations: Store learned meanings in-memory with TTL (time-to-live)
  • Batch emoji lookups: If translating a message with 5 emojis, don't make 5 API calls - batch them
  • Implement exponential backoff: When rate limited, wait with exponential delays (1s, 2s, 4s...)
  • Queue teaching commands: Don't save to database on every teach attempt - queue and batch writes
  • Monitor rate limit headers: Parse X-RateLimit-Remaining and X-RateLimit-Reset-After headers
  • Shard properly: Maintain ~1,000 guilds per shard maximum
  • Use caching layers: Redis or in-memory LRU cache for frequently translated emojis
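A minimal in-memory TTL cache for learned meanings, along the lines suggested above (a sketch; a real deployment might add an LRU size bound or swap in Redis):

```python
import time

class TTLCache:
    """Tiny emoji-meaning cache with per-entry expiry."""
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}                # emoji -> (meaning, stored_at)

    def get(self, emoji):
        entry = self._store.get(emoji)
        if entry is None:
            return None
        meaning, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[emoji]      # expired; force a fresh lookup
            return None
        return meaning

    def put(self, emoji, meaning):
        self._store[emoji] = (meaning, time.monotonic())
```

A cache hit avoids both a database read and, for hot emoji, the burst of lookups that trips rate limits.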

Which phase should address it: Phase 2 (scaling) and Phase 3 (optimization)


Pitfall: Privilege and Permission Confusion

What goes wrong: Bots need specific permissions but devs often either:

  • Request too many permissions (users won't invite)
  • Request insufficient permissions (bot fails silently)
  • Don't verify permissions before action (command fails with cryptic error)
  • Don't check user permissions before teaching (malicious edits to dictionary)

Warning signs:

  • Bot invited but can't send messages
  • Teaching commands work in DM but not in server
  • Translation attempts fail silently with no error message
  • Non-mods can change emoji meanings

Prevention:

  • Minimal permission set: Only request send_messages and read_message_history; add manage_messages only if the bot must delete others' messages (it can always delete its own)
  • Check before acting: Verify the bot has the required permissions using message.channel.permissions_for(message.guild.me) (permissions_for expects a Member, so use guild.me, not bot.user)
  • User permission checks: Only allow trusted users (mods, Vivi herself) to teach emoji meanings
  • Clear error messages: "I don't have permission to send messages here" instead of silent failure
  • Test on new servers: Invite bot to a test server with minimal permissions and verify all features work

Which phase should address it: Phase 1 (setup) and Phase 3 (moderation)


Learning System Pitfalls

Pitfall: Dictionary Quality Degradation Over Time

What goes wrong: User-contributed learning systems fail when:

  • Users add typos, slang, or inside jokes as emoji meanings
  • Duplicate or conflicting meanings accumulate (😀 = "happy", "smile", "goofy face")
  • Rarely-used emojis have outdated or weird meanings
  • No audit trail - can't track who broke the dictionary
  • Stale entries that were never useful remain forever

Warning signs:

  • Translations become nonsensical or off-topic
  • Conflicting definitions for same emoji (confusing translations)
  • Many emoji with zero meaningful translation
  • Teaching system abused by trolls or internal conflicts

Prevention:

  • Validation on teach: Check for minimum length (3 chars), no excessive emojis in meaning, no URLs
  • Audit trail: Log every emoji meaning change with timestamp, user_id, old_value, new_value
  • Review process: For shared systems, flag new meanings for mod approval before going live
  • Meaning versioning: Keep multiple meanings, let users vote/rank them (future: Phase 4)
  • Freshness markers: Track last-used date; prompt for re-confirmation if unused for 90+ days
  • Duplicate detection: Warn if adding a meaning similar to existing ones
  • Clear command output: Show current meaning before accepting new one, ask for confirmation

Which phase should address it: Phase 3 (learning) - implement audit trail from day one


Pitfall: Teaching Interface Too Complex or Text-Heavy

What goes wrong: Learning systems fail when the teaching UI is hard to use:

  • Complex command syntax that users forget
  • Too many options/flags (overwhelming)
  • No confirmation of what was taught
  • Text-heavy responses (bad for users with dysgraphia like Vivi)
  • No visual feedback (the taught emoji isn't echoed back in the response)

Warning signs:

  • Few emoji meanings actually get added
  • Users give up and stop teaching
  • Confusion about command syntax
  • Vivi avoids using teaching feature
  • Other system members always teach instead

Prevention:

  • Simple one-liner commands: /teach 😀 happy not /teach --emoji 😀 --meaning "happy" --priority high
  • Visual confirmation: Include the emoji in the response ("Learned: 😀 = happy")
  • Show current meaning: "😀 currently means: happy | Update it? Type: /teach 😀 new meaning"
  • Short responses: Keep bot responses under 2 lines when possible
  • Use buttons over typing: React with checkmark/X for confirmation instead of "yes/no"
  • Emoji picker: If possible, allow selecting emoji by reaction instead of typing
  • Accessible syntax: Support aliases - /learn 😀 happy same as /teach 😀 happy

Which phase should address it: Phase 3 (learning system design)

Accessibility Note: Vivi has Dysgraphia, which affects writing ability. Keep commands short, use visual confirmation (emoji in responses), and minimize text output.


Pitfall: Scope Creep in Learning Features

What goes wrong: Learning systems start simple but can grow uncontrollably:

  • "Let's add multiple meanings per emoji" → complexity explosion
  • "Different meanings for different contexts" → database redesign
  • "Per-server emoji dictionaries" → multi-tenancy complexity
  • "Emoji meaning versioning and rollback" → audit log nightmare
  • "Machine learning to auto-generate meanings" → maintenance burden

Warning signs:

  • Feature backlog grows faster than you can implement
  • Core translation becomes slow/unreliable
  • Code becomes hard to understand and modify
  • New features break old ones
  • Team gets overwhelmed with edge cases

Prevention:

  • MVP scope: Phase 1-3 = simple one-meaning-per-emoji, all servers share same dictionary
  • Clear phase boundaries: Document what's in each phase; don't add features mid-phase
  • Say no to feature requests: Politely defer to Phase 4 or beyond
  • Keep it simple: One meaning per emoji, user teaches it once, it sticks
  • Future extensibility: Design database schema to support multiple meanings later, but don't implement it yet
  • Regular scope reviews: Every 2 weeks, ask: "Is this feature essential to core translation?"

Which phase should address it: Phase 1 (planning) - establish clear phase gates


Multi-Server and Data Pitfalls

Pitfall: Global Dictionary Conflicts Across Servers

What goes wrong: A shared emoji dictionary works well until different communities use the same emoji differently:

  • Server A uses 🎉 for "party"; Server B uses it for "achievement"
  • Emoji meanings don't match community context
  • Users from Server B are confused by Server A's translations
  • No way to override meanings per-server

Warning signs:

  • Users report wrong translations in their server
  • Emoji meanings conflict between communities
  • No way to customize meanings per-guild
  • One server's trolls ruin translations for everyone

Prevention:

  • Phase 1-2: Global only: Accept that all servers share one dictionary
  • Phase 4 planning: Design per-server override system (store in server_id:emoji key)
  • Document limitation: "Emoji meanings are shared across all servers - curate carefully"
  • Moderation: Have a trusted team that curates the global dictionary
  • Community rules: Require consensus or voting before changing popular emoji meanings
  • Meaning context: Store both the meaning AND its frequency/reliability (crowdsourced)
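When Phase 4 arrives, the override lookup can stay this simple; a sketch with plain dicts standing in for the database tables:

```python
def lookup_meaning(emoji: str, guild_id: int, overrides: dict, global_dict: dict):
    """Per-server override first (Phase 4+), else the shared global meaning."""
    override = overrides.get((guild_id, emoji))
    if override is not None:
        return override
    return global_dict.get(emoji)
```

Designing the lookup around a (server_id, emoji) key now means the Phase 1-3 global-only dictionary needs no migration later, just an empty overrides table.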

Which phase should address it: Phase 4 (advanced) - keep Phase 1-3 global-only


Emoji Handling Pitfalls

Pitfall: Unicode Representation Edge Cases and Combining Characters

What goes wrong: Emoji are more complex than they appear:

  • Some "single" emoji are multi-codepoint: 👨‍👩‍👧 (family) is 5 codepoints joined by zero-width joiners
  • Variation selectors (U+FE0F) change emoji appearance: ❤ vs ❤️
  • Skin tone modifiers add extra codepoints: 👋 vs 👋🏻
  • Regex fails on complex emoji
  • String length in Python != visual emoji count

Warning signs:

  • Some emojis don't parse or get corrupted
  • Emoji combinations disappear or get split
  • Search for specific emoji sometimes fails
  • Emoji with skin tones treated as separate emojis

Prevention:

  • Use emoji library: Don't parse manually - use emoji package which understands combining characters
  • Normalize input: Normalize emoji to canonical form before storage/lookup (NFD normalization)
  • Test edge cases: Include in test suite:
    • Family emoji (👨‍👩‍👧)
    • Skin tone modifiers (👋🏻 through 👋🏿)
    • Gendered variants (👨‍⚕️ vs 👩‍⚕️)
    • Flags (🇺🇸 = 2 regional indicators)
    • Keycap sequences (1️⃣ = '1' + U+FE0F + U+20E3)
  • Store as text, not codepoints: Keep emoji as Unicode strings in database, not split into codepoints
  • Validate emoji: Check if input is actually a valid emoji using emoji.is_emoji() before storing
  • Document supported emoji: Be explicit about which emoji are supported (all Unicode emoji, or subset?)
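A quick stdlib demonstration of why naive length checks mislead:

```python
family = "👨‍👩‍👧‍👦"        # one visible glyph: man + ZWJ + woman + ZWJ + girl + ZWJ + boy
assert len(family) == 7      # ...but seven Python code points
heart = "❤️"                 # U+2764 + variation selector U+FE0F
assert len(heart) == 2
flag = "🇺🇸"                 # two regional indicator symbols
assert len(flag) == 2
# Store and compare emoji as whole strings, never per-"character".
```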

Which phase should address it: Phase 2 (core translation)

Reference: Even the keycap 1️⃣ is three code points ('0x31', '0xfe0f', '0x20e3'), and the family emoji 👨‍👩‍👧‍👦 is seven. Never assume 1 emoji = 1 character.


Security Pitfalls

Pitfall: Hidden Command Privilege Escalation and Authorization Bypass

What goes wrong: Learning systems allow users to modify bot data. Common authorization mistakes:

  • No permission check - any user can teach emoji
  • No hierarchy check - regular user can override mod's meanings
  • Teaching command accepts unsafe input (SQL injection, command injection)
  • Audit trail incomplete - can't prove who made unauthorized changes
  • Bot token in environment exposed - full compromise

Warning signs:

  • Non-mods can modify emoji dictionary
  • Troll edits spread before mods notice
  • No way to revert malicious changes
  • Bot behaves unexpectedly with suspicious permissions
  • Dictionary contains offensive or misleading entries

Prevention:

  • Permission check every command: Verify user is mod/trusted before /teach
  • Whitelist approach: Only specific users (Vivi, trusted friends) can teach, not everyone
  • Input validation: Sanitize meaning text - no special chars, max length, filter profanity
  • Audit everything: Log user_id, timestamp, emoji, old_meaning, new_meaning, was_approved
  • Immutable audit log: Once written, audit entries can't be modified
  • Reversibility: Always support /undo or /revert <emoji> for recent changes
  • No bot token exposure: Use .env file (gitignored), not hardcoded secrets
  • Rate limit teaching: Prevent spam - one teach per user per 5 seconds
  • Approval workflow: For shared systems, new meanings require mod approval before going live

Which phase should address it: Phase 3 (learning system)


Pitfall: PluralKit API Data Privacy and Personal Information Leakage

What goes wrong: When querying PluralKit API to verify Vivi's identity:

  • System info becomes visible to other users through bot queries
  • Member details (pronouns, description) could be displayed accidentally
  • API errors expose system ID in stack traces
  • Bot caches PluralKit data but doesn't respect privacy settings

Warning signs:

  • System info visible when not intended
  • Other users can query Vivi's system through bot commands
  • Sensitive member data appears in error messages
  • Bot stores outdated PluralKit data

Prevention:

  • Minimal API queries: Only fetch what you need (member ID, not full profile)
  • Cache respectfully: Store only user ID verification, not personal details
  • Error handling: Don't expose system IDs or member names in error messages
  • Privacy by default: Don't display any system info unless Vivi explicitly allows it
  • Respect privacy settings: If Vivi's system is private, don't query it
  • No logging of personal data: Filter logs to remove member names, descriptions
  • Clear API use policy: Document what data you collect and why (for Vivi's consent)

Which phase should address it: Phase 1 (architecture) - design for privacy from the start


Translation Quality Pitfalls

Pitfall: Translations Feel Robotic or Lose Context

What goes wrong: Simple concatenation of emoji meanings produces awkward, stilted translations:

  • "😀😍🎉" becomes "happy + love + party" (grammatically weird)
  • Emoji in sequence don't flow naturally
  • Context is lost (is this celebratory? sarcastic? sad?)
  • Complex emoji (👨‍👩‍👧) get broken into confusing pieces

Warning signs:

  • Translations feel hard to read
  • Users prefer the original emoji over bot's translation
  • Emoji sequences don't combine logically
  • Accessibility readers struggle with the output

Prevention:

  • Adaptive formatting: Group related emoji:
    • Multiple emoji of same type → comma-separated ("happy, joyful, excited")
    • Verb + object → natural phrasing ("loves X", "celebrates with")
    • Emoji + punctuation → handle specially
  • Context awareness: If Vivi teaches "😀 = amused at something", use that context
  • Order matters: Preserve emoji order in translation, not alphabetical
  • Natural language: Use connectors ("and", "with") not just commas
  • Test readability: Read translation aloud - does it sound natural?
  • User testing: Show translations to Vivi - do they capture intent?
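Even without NLP, a small joiner reads better than raw concatenation; order is preserved exactly as the emoji appeared:

```python
def join_meanings(meanings: list[str]) -> str:
    """['happy', 'love', 'party'] -> 'happy, love and party'."""
    if not meanings:
        return ""
    if len(meanings) == 1:
        return meanings[0]
    return ", ".join(meanings[:-1]) + " and " + meanings[-1]
```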

Which phase should address it: Phase 3 (refinement) or Phase 4 (NLP)


Accessibility Pitfalls

Pitfall: Teaching System Too Text-Heavy for Users with Dysgraphia

What goes wrong: Emoji learning systems assume users can easily type complex commands and read long responses. For Vivi (with Dysgraphia):

  • Typing is laborious and produces errors
  • Long text responses are hard to parse
  • No visual confirmation of what was taught
  • Complex command syntax is hard to remember
  • Alternative input methods not supported

Warning signs:

  • Vivi avoids using teaching feature entirely
  • Alternate system members always teach instead
  • Teaching commands are frequently mistyped
  • Vivi asks for repetition of bot responses
  • Few emoji get added to dictionary

Prevention:

  • Minimal typing required: /teach 😀 happy not /teach --emoji 😀 --meaning "happy" --tags emotion
  • Visual confirmation: Show emoji in response for confirmation (eyes can process faster than text)
  • Short responses: Max 1-2 sentences, not paragraphs
  • Command aliases: Both /teach and /learn work for same function
  • Text-to-speech friendly: Use punctuation, avoid abbreviations, clear structure
  • Reaction-based UI: "React with ✓ to confirm" instead of "Type 'yes'"
  • Error recovery: If typo in emoji, bot suggests correction with reaction buttons
  • Accessible defaults: Large, clear emoji in responses; emoji codes visible in text form too
  • Alternative confirmation: "This emoji now means X. React ✓ to keep, or I'll delete in 5 sec"

Which phase should address it: Phase 3 (learning system UX)

Context: Vivi has Dysgraphia, affecting her ability to write and type. Design for minimal text input and maximum visual feedback.


Hosting and Infrastructure Pitfalls

Pitfall: Inadequate Infrastructure Leads to Downtime

What goes wrong: Small Discord bots often run on free/cheap hosting that can't handle scale:

  • Free tiers (Heroku, Replit, Glitch) get discontinued or have uptime issues
  • Database goes down → translations stop working
  • No monitoring → crashes go unnoticed for hours
  • Single-point failure → bot death = translations unavailable

Warning signs:

  • Bot goes offline unpredictably
  • Slow response times during peak hours
  • Database connection timeouts
  • No way to know if bot is running

Prevention:

  • Use paid hosting: AWS, Digital Ocean, Google Cloud - reliable infrastructure
  • Database backup: Regular automated backups of emoji dictionary
  • Health checks: Bot pings itself regularly; alerts if no response
  • Logging: All errors logged to persistent storage, not just console
  • Redundancy (future): For Phase 4+, consider running bot on 2 servers with failover
  • Monitoring: Use tools like Sentry or DataDog to track crashes
  • Graceful degradation: If database is down, serve cached emoji meanings
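Graceful degradation in miniature; `db_lookup` is a hypothetical database call that may raise `ConnectionError` when the database is down:

```python
def get_meaning(emoji: str, db_lookup, cache: dict):
    """Prefer the database; fall back to the last cached value if it's down."""
    try:
        meaning = db_lookup(emoji)
    except ConnectionError:
        return cache.get(emoji)         # stale is better than silent failure
    cache[emoji] = meaning              # refresh the cache on every good read
    return meaning
```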

Which phase should address it: Phase 1 (setup) - establish reliable hosting immediately


Summary: Top Pitfalls to Watch For

1. Message Detection Reliability (Phase 1 - CRITICAL)

Unreliable detection of Vivi's messages through PluralKit webhook proxying leads to missed translations or false positives. Use webhook creator ID as source of truth, implement proper caching, and test edge cases.

2. Message Content Intent and Architecture (Phase 1 - CRITICAL)

Designing bot around passive message scanning when you may not get privileged intent approval. Plan for slash commands and interactions from the start; treat message content as optional.

3. Dictionary Quality Degradation (Phase 3 - HIGH)

User-contributed emoji meanings become nonsensical over time without validation, audit trails, and review processes. Implement these from day one, not as an afterthought.

4. Teaching Interface Complexity (Phase 3 - HIGH)

Text-heavy, complex teaching commands discourage use and frustrate Vivi (with Dysgraphia). Keep it simple: /teach emoji meaning. Show visual confirmation.

5. Rate Limiting and Scaling (Phase 2+ - MEDIUM)

Discord and PluralKit API limits hit unexpectedly when translating messages with many emoji. Implement caching, batch requests, and exponential backoff from Phase 2 onward.

6. Emoji Edge Cases (Phase 2 - MEDIUM)

Complex emoji with combining characters, skin tones, and zero-width joiners break naive parsing. Use proper emoji library, normalize input, test thoroughly.

7. Authorization and Security (Phase 3 - HIGH)

Teaching system without permission checks or audit trails leads to troll edits and data corruption. Require authentication, validate input, log everything.

8. Webhook Race Conditions (Phase 2+ - MEDIUM)

Simultaneous edits by Vivi and bot translation cause corruption. Post new translations instead of editing; queue requests with delay to avoid races.


Research Sources