
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 11:02:32 -05:00

23 KiB


Features Research: Vivi Speech Translator

Executive Summary

The Vivi Speech Translator should combine table-stakes Discord bot features with a hybrid translation approach that starts rule-based but can learn and improve over time. The core differentiator is emoji-to-text translation built around Vivi's unique communication system. It is backed by a learning mechanism that lets users teach the bot new emoji meanings, while behavior for known sequences stays deterministic and debuggable.

Table Stakes Features

These are features Discord bot users expect and that are necessary for basic functionality:

Message Event Handling

Description: Detect and respond to messages containing emojis from Vivi
Complexity: Low
Dependencies: discord.py or discord.js framework
Why Essential: Without this, the bot cannot observe the messages it needs to translate
Implementation: Listen for on_message events and filter for messages containing emoji sequences; reading message text this way requires the privileged Message Content intent

Emoji Parsing

Description: Parse both standard Unicode emojis and Discord custom emojis from message content
Complexity: Low
Dependencies: An emoji library (e.g., the emoji package) plus discord.py's emoji utilities
Why Essential: Must accurately identify which emojis are present to begin translation
Implementation: Use a regex to match custom emoji tokens (<:name:id>, or <a:name:id> for animated emoji) and an emoji library to detect standard Unicode emoji
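A minimal sketch of the custom-emoji half of parsing, using only Python's standard library and the token pattern given under "Implementation: Message Handling". The CustomEmoji type and the find_custom_emoji name are hypothetical. Unicode emoji detection is left to an emoji library and is not shown.

```python
import re
from typing import NamedTuple

# Custom emoji appear in message content as <:name:id>, or <a:name:id> when animated.
CUSTOM_EMOJI_RE = re.compile(r"<(a?):([A-Za-z0-9_]{1,32}):([0-9]{18,20})>")


class CustomEmoji(NamedTuple):
    name: str
    id: int
    animated: bool


def find_custom_emoji(content: str) -> list[CustomEmoji]:
    """Return every custom emoji token in the order it appears in the message."""
    return [
        CustomEmoji(name=m.group(2), id=int(m.group(3)), animated=m.group(1) == "a")
        for m in CUSTOM_EMOJI_RE.finditer(content)
    ]
```

Plain-text shortcodes such as :smile: are not matched. Discord sends custom emoji in the <:name:id> form, so only that form carries a stable ID to key the dictionary on.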

Reply/Response Messages

Description: Send translation responses back to chat or as replies
Complexity: Low
Dependencies: Message API (discord.py's Message.reply() or TextChannel.send())
Why Essential: Users need to see the translation output
Implementation: Format translations as clear, readable text messages; optionally use embeds for rich formatting
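Discord caps a bot's message content at 2000 characters, so a long translation may need to be sent in several replies. The sketch below is one assumed way to split it, breaking at line boundaries where possible. The split_for_discord name is hypothetical. Python's len() counts Unicode code points, so pass a smaller limit if you want headroom.

```python
# Maximum length of a bot message's content on Discord.
DISCORD_MESSAGE_LIMIT = 2000


def split_for_discord(text: str, limit: int = DISCORD_MESSAGE_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring line breaks."""
    chunks: list[str] = []
    remaining = text
    while len(remaining) > limit:
        # Break at the last newline inside the window; hard-cut if there is none.
        cut = remaining.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk would then be sent in order, for example the first with message.reply() and the rest with channel.send().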

Command Interface

Description: Allow teaching the bot new emoji meanings via commands
Complexity: Medium
Dependencies: Command parser, permission checking
Why Essential: Enables the learning system that makes the bot scale without hardcoding every emoji
Implementation: Slash commands or hybrid prefix/slash commands (see "Command Interface: Slash Commands vs Prefix" below)

Per-Server Configuration

Description: Store server-specific settings (translation mode, custom emoji meanings)
Complexity: Medium
Dependencies: Database (SQLite for small scale, PostgreSQL for production)
Why Essential: Different servers have different preferences for how verbose translations should be
Implementation: Simple key-value store per guild_id (server ID)
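A minimal sketch of a per-guild key-value store using the standard-library sqlite3 module, with values stored as JSON. The class, the table name, and the keys used are hypothetical. sqlite3 is blocking, so inside the bot's event loop an async driver or an executor would be the safer choice.

```python
import json
import sqlite3
from typing import Any


class GuildSettingsStore:
    """Per-guild key-value settings backed by SQLite (illustrative sketch)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS guild_settings_kv (
                   guild_id INTEGER NOT NULL,  -- Discord server ID
                   key      TEXT    NOT NULL,
                   value    TEXT    NOT NULL,  -- JSON-encoded value
                   PRIMARY KEY (guild_id, key)
               )"""
        )

    def set(self, guild_id: int, key: str, value: Any) -> None:
        # The primary key makes INSERT OR REPLACE overwrite an existing guild/key pair.
        self.conn.execute(
            "INSERT OR REPLACE INTO guild_settings_kv (guild_id, key, value) VALUES (?, ?, ?)",
            (guild_id, key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, guild_id: int, key: str, default: Any = None) -> Any:
        row = self.conn.execute(
            "SELECT value FROM guild_settings_kv WHERE guild_id = ? AND key = ?",
            (guild_id, key),
        ).fetchone()
        return json.loads(row[0]) if row else default
```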

Rate Limiting & Error Handling

Description: Gracefully handle Discord API limits and network errors
Complexity: Medium
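Dependencies: Framework error handling (discord.py already waits out and retries rate-limited Discord API requests internally)
Why Essential: Prevents bot crashes and keeps the service reliable
Implementation: Catch failed sends (e.g., missing permissions) and timeouts; use exponential backoff when retrying transient failures, including calls to external services such as the PluralKit API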
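A minimal asyncio sketch of exponential backoff with jitter for transient failures. The with_backoff helper, the default retry count, the base delay, and the set of exceptions treated as retryable are all assumptions to tune. Discord's own rate limits do not need this wrapper, because discord.py handles them.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    retries: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (asyncio.TimeoutError, ConnectionError),
) -> T:
    """Await call(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller log or report the error
            # Delays grow base_delay * 1, 2, 4, 8... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```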

Differentiating Features

These features set Vivi's translator apart from generic Discord bots:

Learning System (Rule-Based Foundation with Growth)

Description: Bot learns emoji meanings when explicitly taught by users
Complexity: Medium
Why This Differentiates: Makes the translator sustainable for Vivi's unique and evolving emoji language without requiring the developer to manually hardcode every sequence
Constraints:
- No implicit learning: Never infer emoji meanings from context—require explicit teaching
- Explicit confirmation: Always confirm back what was learned so users can verify correctness
- Easy correction: Provide /correct :emoji: new_meaning for when meanings change
- Community contribution: Allow anyone in the server to teach, not just Vivi
Database: Store emoji meanings with metadata (who taught it, when, how many uses)
Feedback loop: Track which translations are most helpful, identify ambiguous emojis

Emoji Sequence Detection

Description: Recognize sequences of emojis that form compound meanings (e.g., 👩‍💻📱 = "coding on phone")
Complexity: Medium
Why This Differentiates: Vivi likely uses emoji clusters; simple one-to-one mapping isn't enough
Implementation Approach:
- Define sequence patterns (consecutive emojis with or without separators)
- Store meanings for common sequences
- Fall back to individual emoji meanings if no sequence match
- Allow users to teach sequences via /teach :emoji1: :emoji2: compound_meaning
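The sequence-then-fallback approach above can be sketched as greedy longest-match lookup. It assumes the message has already been split into a list of emoji tokens and that meanings are keyed by tuples of tokens. The max_len cap and the "[unknown: ...]" marker are hypothetical choices. Unknown emoji are kept visible rather than guessed, in line with the no-implicit-learning constraint.

```python
from typing import Sequence


def translate_tokens(
    tokens: Sequence[str],
    meanings: dict[tuple[str, ...], str],
    max_len: int = 4,
) -> list[str]:
    """Greedy longest-match translation: prefer taught sequences, then single emoji."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate sequence first, shrinking down to a single emoji.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i : i + size])
            if key in meanings:
                out.append(meanings[key])
                i += size
                break
        else:
            # No taught meaning: show the emoji instead of inventing a translation.
            out.append(f"[unknown: {tokens[i]}]")
            i += 1
    return out
```

With meanings for ("👩‍💻", "📱"), ("👩‍💻",) and ("📱",), the pair is translated as one compound, while "📱" followed by "👩‍💻" falls back to the two individual meanings.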

Context-Aware Translation Formatting

Description: Vary translation output based on conversational context
Complexity: High
Why This Differentiates: Translations that feel natural, not robotic; adapts tone to channel context
Examples:
- In #general: "Vivi is having fun 😊"
- In #serious-discussion: "Expressing contentment and readiness"
- Response variation: Sometimes expand, sometimes summarize based on recent context
Implementation: Store per-channel context settings (e.g., casual vs. formal); limit any use of surrounding messages to simple hints rather than full tone analysis (see Anti-Feature 4)

PluralKit Integration (Optional but Recommended)

Description: Detect which system member (alter) is speaking when a message is proxied through PluralKit
Complexity: High
Why This Differentiates: Essential for communities where Vivi shares an account with headmates; respects system identity
How It Works:
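- PluralKit proxies a message by deleting the original and re-sending its text through a channel webhook that uses the member's name (plus the system tag, if set) and avatar
- Bot can recognize proxied messages by their webhook_id and look up the system and member through PluralKit's API (GET /v2/messages/{message_id})
- Enables "Vivi says: [translation]" vs generic bot response
Limitations: Requires PluralKit to be active in server; only works with PluralKit proxy format; the original message is visible briefly before PluralKit deletes it, so the bot must avoid translating it twice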
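A standard-library sketch of the attribution step. It assumes the JSON returned by PluralKit's v2 message endpoint has been fetched separately (for example with aiohttp, which discord.py already depends on) and parsed into a dict. The function names are hypothetical. The sample response in the usage below is trimmed to the member fields this code reads.

```python
from typing import Any, Optional

# Base URL of PluralKit's public API (v2).
PK_API = "https://api.pluralkit.me/v2"


def needs_pk_lookup(webhook_id: Optional[int]) -> bool:
    """Proxied messages are re-sent through a webhook, so only those need a lookup."""
    return webhook_id is not None


def pk_message_url(message_id: int) -> str:
    """Endpoint describing the system and member behind a proxied message."""
    return f"{PK_API}/messages/{message_id}"


def speaker_name(pk_message: Optional[dict[str, Any]]) -> Optional[str]:
    member = (pk_message or {}).get("member") or {}
    # Prefer the member's display name; fall back to their member name.
    return member.get("display_name") or member.get("name")


def attribute(pk_message: Optional[dict[str, Any]], translation: str) -> str:
    """Prefix the translation with the speaker's name when PluralKit identified one."""
    name = speaker_name(pk_message)
    return f"{name} says: {translation}" if name else translation
```

For example, attribute({"member": {"name": "vivi", "display_name": "Vivi"}}, "coding on phone") yields "Vivi says: coding on phone". Without PluralKit data, the translation is returned unchanged.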

Translation Quality Tracking

Description: Monitor which translations get positive vs negative reactions
Complexity: Medium
Why This Differentiates: Enables continuous improvement and identifies ambiguous emojis needing clarification
Implementation:
- Store emoji usage statistics (frequency, accuracy ratings)
- Provide admin dashboard /emoji-stats to see problematic emojis
- Optionally flag ambiguous emojis for human review
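One way to turn ✅/❌ reactions into the 0-1 accuracy rating and a review flag, as a sketch. The EmojiStats class is hypothetical, and so are the minimum vote count of 5 and the 0.6 threshold. Requiring a minimum number of votes keeps a single ❌ from flagging a new emoji.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmojiStats:
    """Reaction tallies for translations that used one emoji (or sequence)."""

    positive: int = 0  # ✅ reactions
    negative: int = 0  # ❌ reactions

    def accuracy(self) -> Optional[float]:
        """Share of positive reactions, or None before anyone has reacted."""
        total = self.positive + self.negative
        return self.positive / total if total else None

    def needs_review(self, min_votes: int = 5, threshold: float = 0.6) -> bool:
        """Flag for human review once enough votes show mostly negative feedback."""
        acc = self.accuracy()
        total = self.positive + self.negative
        return acc is not None and total >= min_votes and acc < threshold
```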

Global Dictionary Option (Future Differentiator)

Description: Share emoji meanings across servers with opt-in
Complexity: High
Why This Differentiates: Could benefit other systems and communities; positions Vivi's system as a standard
Constraints:
- Privacy-first: Only share if opted-in by server admin
- Vivi-specific: Focus on Vivi's emoji system, not generic emoji translation
- Conflict resolution: Last-teaching-wins or voting system for disagreements

Translation Approaches Analyzed

Rule-Based Pattern Matching

How it works: Explicit rules define emoji → text mappings. New mappings must be explicitly added.

Pros:

Fully deterministic and debuggable
Fast performance (no ML overhead)
Transparent to users (users understand why emoji means what)
Easy to correct mistakes (just update the rule)

Cons:

Requires explicit teaching for every emoji/sequence
Doesn't generalize to patterns the bot hasn't seen
Becomes unwieldy with hundreds of emoji combinations

Statistical/ML-Based Approach

How it works: Train a model on known emoji → text pairs, predict meanings for unknown emojis using similarity or context.

Pros:

Can handle novel emoji combinations through pattern inference
Generalizes from limited training data
Captures semantic relationships between emoji meanings

Cons:

Black-box behavior ("why did the bot translate it that way?")
Requires significant training data to work well
Harder to debug when translations are wrong
Users don't understand or trust the mechanism

Recommended: Hybrid Approach (Rule-Based + Fallback)

Phase 1 (MVP): Rule-Based with Community Learning

Start with hardcoded emoji meanings for common sequences
Allow users to teach new emojis via /teach command
100% transparent: users know exactly why each translation happens
Fast, reliable, debuggable

Phase 2 (Future): Statistical Fallback

Analyze emoji usage patterns in learned meanings
If emoji appears in multiple compounds, infer partial meanings
Use embedding-based similarity to suggest translations for unknown emoji sequences
Always show confidence scores; require confirmation before using inferred meanings
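A sketch of the "suggest, never apply" rule using cosine similarity. The document does not choose an embedding model, so the vectors here are placeholders from some hypothetical source. The 0.8 cutoff is an assumed value. The function only returns a candidate and its score for a human to confirm.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest(
    unknown_vec: Sequence[float],
    known: dict[str, tuple[Sequence[float], str]],
    min_confidence: float = 0.8,
) -> Optional[tuple[str, str, float]]:
    """Return (closest known emoji, its meaning, similarity) as a suggestion only."""
    best = max(known.items(), key=lambda kv: cosine(unknown_vec, kv[1][0]), default=None)
    if best is None:
        return None
    emoji, (vec, meaning) = best
    score = cosine(unknown_vec, vec)
    # Below the threshold, offer nothing rather than a weak guess.
    return (emoji, meaning, score) if score >= min_confidence else None
```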

Phase 3 (Long-term): Continuous Learning

Track user corrections and positive/negative reactions
Retrain fallback model on accepted vs rejected translations
Identify consistently ambiguous emojis for human review
Adjust translation format based on what's most helpful per server

Why This Is Best:

Starts simple and user-friendly
Scales to hundreds of emojis through learning
Maintains trust through transparency
Enables improvement over time without requiring ML expertise

Message Handling Modes

Discord bots can respond to messages in different ways. Choose the approach that best serves Vivi's community:

Mode 1: Automatic Translation (On Every Message)

How it works: Bot automatically translates every message from Vivi (or messages with emoji content)

Pros:

Instant understanding without extra steps
No friction for casual readers
Good for channels where translation is the main purpose

Cons:

Can be noisy in mixed-audience channels
Spoils the "reading Vivi directly" experience for community members who prefer it
Uses Discord API quota faster

Best For: Translation channels (#vivi-translated or similar)

Mode 2: On-Demand Translation (Reaction or Command)

How it works: Users react with a specific emoji or use /translate command to request translation

Pros:

Keeps channels clean by default
Respects users who want to interpret emojis themselves
Lower API usage
More intentional interaction

Cons:

Extra step for users
May miss important messages if people forget to request
Less discoverable for new community members

Best For: Social channels where emoji is part of fun, not solely for understanding

Mode 3: Toggle (Per-Server Setting)

How it works: Server admins choose between automatic or on-demand via /settings

Pros:

Respects different community preferences
Maximizes adoption across servers with different cultures
Can differentiate channels (auto in #vivi-translations, on-demand in #general)

Cons:

More complex to implement
Users must learn about settings

Recommendation for V1: Implement Mode 3 with default Mode 1 (automatic). Let server admins customize via:

/settings translation-mode [auto|on-demand]
/settings auto-channels [list of channels] for auto mode
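A sketch of how the two settings could combine into a per-message decision. The field names mirror the translation_mode and auto_channels settings listed under "Server Settings to Store". The class and function names are hypothetical. Treating an empty channel list as "every channel" is an assumption, not something the settings above specify.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationSettings:
    translation_mode: str = "auto"  # "auto" or "on-demand"; V1 default is auto
    auto_channels: list[int] = field(default_factory=list)


def should_auto_translate(settings: TranslationSettings, channel_id: int) -> bool:
    if settings.translation_mode != "auto":
        return False  # on-demand: wait for a reaction or /translate
    # Assumption: an empty list means auto-translate in every channel.
    return not settings.auto_channels or channel_id in settings.auto_channels
```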

Implementation: Message Handling

Event: Discord on_message event
Filter: Check for emoji content using regex: <a?:[a-zA-Z0-9_]{1,32}:[0-9]{18,20}> (custom) and Unicode emoji
Action: Call translation engine, format output, send reply
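The filter step above can be sketched as a pure function, so it can be tested without a Discord connection. In discord.py, its arguments would come from message.author.id, message.author.bot, message.webhook_id, message.content and client.user.id. The function name and the pluggable Unicode detector are hypothetical. Keeping webhook messages lets PluralKit proxies through, while skipping the bot's own replies avoids feedback loops.

```python
import re
from typing import Callable, Optional

# Custom emoji pattern from the Filter line above.
CUSTOM_EMOJI_RE = re.compile(r"<a?:[a-zA-Z0-9_]{1,32}:[0-9]{18,20}>")


def should_handle(
    *,
    author_id: int,
    author_is_bot: bool,
    webhook_id: Optional[int],
    content: str,
    own_id: int,
    contains_unicode_emoji: Callable[[str], bool] = lambda _: False,
) -> bool:
    if author_id == own_id:
        return False  # never translate our own replies
    if author_is_bot and webhook_id is None:
        return False  # ignore other bots, but keep webhook messages such as PluralKit proxies
    return bool(CUSTOM_EMOJI_RE.search(content)) or contains_unicode_emoji(content)
```

contains_unicode_emoji is a stand-in for whatever emoji library the bot adopts. The default only recognizes custom emoji.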

Command Interface: Slash Commands vs Prefix

Recommendation: Use slash commands as primary, offer hybrid support

Why slash commands:

Modern Discord standard (easier discoverability)
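No Message Content intent required for the commands themselves (better privacy); automatic translation still reads messages via on_message and needs that intent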
Built-in autocomplete for parameters
Better for /teach :emoji: meaning (emoji picker integration)

Key slash commands for Vivi:

/teach :emoji: meaning — Add emoji to dictionary
/translate [emoji-string] — Manually trigger translation
/what :emoji: — Look up emoji meaning
/correct :emoji: new_meaning — Fix a taught emoji
/settings — Server configuration
/emoji-stats — View accuracy/usage statistics

Learning Interface & Feedback System

The learning system is crucial for scalability and adoption. Here's the recommended approach:

Teaching Commands

/teach :emoji1: meaning
→ "Learned! 🎓 I'll translate :emoji1: as 'meaning'"

/teach :emoji1: :emoji2: compound meaning
→ "Learned! 🎓 I'll translate :emoji1: :emoji2: as 'compound meaning'"

Correction System

/correct :emoji: new_meaning
→ "Updated! ✏️ I'll now translate :emoji: as 'new_meaning' (was 'old meaning')"

Query System

/what :emoji:
→ "Emoji meaning for :emoji: is 'definition'\nTaught by @User on 2024-01-15\nUsed 47 times this month"
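An in-memory sketch of /teach, /correct and /what that produces the confirmation formats shown above. The class and method names are hypothetical. Refusing to overwrite an existing meaning in /teach is an assumed design choice: it routes every change through /correct so the audit trail of earlier meanings is kept. Usage counts and persistence are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Meaning:
    text: str
    taught_by: int  # Discord user ID
    taught_at: datetime
    history: list[str] = field(default_factory=list)  # earlier meanings, oldest first


class EmojiDictionary:
    """In-memory sketch of /teach, /correct and /what for one server."""

    def __init__(self) -> None:
        self.entries: dict[tuple[str, ...], Meaning] = {}

    @staticmethod
    def _label(emojis: tuple[str, ...]) -> str:
        return " ".join(emojis)

    def teach(self, emojis: tuple[str, ...], text: str, user_id: int) -> str:
        if emojis in self.entries:
            # Don't silently overwrite: changes go through /correct so history is kept.
            return f"I already know {self._label(emojis)}. Use /correct to change it."
        self.entries[emojis] = Meaning(text, user_id, datetime.now(timezone.utc))
        return f"Learned! 🎓 I'll translate {self._label(emojis)} as '{text}'"

    def correct(self, emojis: tuple[str, ...], text: str, user_id: int) -> str:
        entry = self.entries.get(emojis)
        if entry is None:
            return f"I don't know {self._label(emojis)} yet. Try /teach first."
        old = entry.text
        entry.history.append(old)
        entry.text, entry.taught_by, entry.taught_at = text, user_id, datetime.now(timezone.utc)
        return f"Updated! ✏️ I'll now translate {self._label(emojis)} as '{text}' (was '{old}')"

    def what(self, emojis: tuple[str, ...]) -> str:
        entry = self.entries.get(emojis)
        if entry is None:
            return f"No meaning taught yet for {self._label(emojis)}."
        # <@id> is Discord's user-mention syntax.
        return (
            f"Emoji meaning for {self._label(emojis)} is '{entry.text}'\n"
            f"Taught by <@{entry.taught_by}> on {entry.taught_at:%Y-%m-%d}"
        )
```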

Validation & Confirmation

Always repeat back what was learned
Show who taught it and when (build community recognition)
Optionally show confidence if from ML fallback
Highlight ambiguous meanings if same emoji has multiple teachings

Feedback Mechanisms

Reaction-based: Users react with ✅/❌ to translations
- Track which emojis get positive vs negative reactions
- Identify consistently wrong translations for correction
Correction Commands: /correct :emoji: new_meaning explicitly fixes errors
- Creates audit trail of meaning changes
- Enables tracking learning over time
Conflict Resolution: If multiple teachings for same emoji
- Show all known meanings with vote counts
- Use most recent teaching by default, surface conflicts
- Option: /disambiguate :emoji: choose_meaning to select preferred one
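A sketch of the conflict rules listed above. The most recent teaching wins by default, a meaning chosen via /disambiguate (the pinned argument) overrides that, and every known meaning is summarized with its vote count. The Teaching type and the summary format are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Teaching:
    meaning: str
    taught_at: datetime
    votes: int = 0


def has_conflict(teachings: list[Teaching]) -> bool:
    """True when the same emoji has been taught more than one distinct meaning."""
    return len({t.meaning for t in teachings}) > 1


def resolve(teachings: list[Teaching], pinned: Optional[str] = None) -> tuple[str, list[str]]:
    """Return (meaning to use, summary line for every known meaning)."""
    if not teachings:
        raise ValueError("no teachings recorded for this emoji")
    by_meaning = {t.meaning: t for t in teachings}
    if pinned in by_meaning:
        current = by_meaning[pinned]  # chosen explicitly via /disambiguate
    else:
        current = max(teachings, key=lambda t: t.taught_at)  # most recent teaching wins
    summary = [
        f"{t.meaning} ({t.votes} votes)" + (" [current]" if t is current else "")
        for t in sorted(teachings, key=lambda t: t.votes, reverse=True)
    ]
    return current.meaning, summary
```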

Best Practices

Community over individual: Encourage anyone to teach, not just Vivi or admins
Transparency: Always show source of taught meanings
Auditability: Maintain history of meaning changes
Disambiguation: Flag emojis with conflicting meanings early
Escalation: Provide /report-ambiguous :emoji: for admin review

Accessibility Considerations

Translation bots serve accessibility functions, so they must be accessible themselves:

Text Accessibility

Plain text output: Always provide plain text translations, not just embeds
No emoji-only responses: Never respond with just emoji; always include text
Clear language: Use simple, direct language in translations (avoid jargon)
Consistent formatting: Same emoji always translates the same way (aids screen reader prediction)

Discord Accessibility

Slash commands: Easier for keyboard navigation than prefix commands
Accessible embeds: If using embeds for formatted output:
- Include plain text alternative in message content
- Avoid using embeds for critical information
- Note: Discord embeds cannot have alt-text for images—only use text-based embeds

Screen Reader Compatibility

Emoji descriptions: Include what each emoji is called (e.g., "woman technologist emoji")
Sequence clarity: When translating compound sequences, explain the combination
No hidden information: Never put crucial meaning in embed footers or nested fields

Examples of Accessible Responses

Good:

Vivi: 👩‍💻📱
Bot: (woman technologist emoji, mobile phone emoji)
Translates to: "coding on phone" or "responding to work messages"

Bad:

Vivi: 👩‍💻📱
Bot: [embed with only emoji in footer, no text explanation]
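A sketch that produces the "Good" plain-text format above. Single-code-point emoji are named with the standard-library unicodedata.name(). ZWJ sequences such as 👩‍💻 have no single Unicode character name, so their spoken names (e.g., "woman technologist") are assumed to come from a hypothetical names lookup, and custom server emoji would need entries there too. Without an entry, the fallback simply names each code point.

```python
import unicodedata

ZWJ = "\u200d"   # zero-width joiner used in emoji sequences
VS16 = "\ufe0f"  # variation selector requesting emoji presentation


def describe_emoji(emoji: str, names: dict[str, str]) -> str:
    """Spoken-style name for an emoji, e.g. the mobile phone emoji -> 'mobile phone emoji'."""
    if emoji in names:
        return f"{names[emoji]} emoji"
    # Fallback: name each code point, skipping joiners and variation selectors.
    parts = [unicodedata.name(ch, "unknown").lower() for ch in emoji if ch not in (ZWJ, VS16)]
    return " ".join(parts) + " emoji"


def accessible_translation(emojis: list[str], meaning: str, names: dict[str, str]) -> str:
    """Plain-text output: emoji names first, then the translation."""
    described = ", ".join(describe_emoji(e, names) for e in emojis)
    return f"({described})\nTranslates to: \"{meaning}\""
```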

Implementation

Test output with screen readers (NVDA, JAWS)
Provide a verbose output format for complex sequences via an option on /translate (slash commands take named options, not --flags)
Include emoji names in debug/development output

Anti-Features (What NOT to Build)

These features sound good but should be avoided:

Anti-Feature 1: Persistent Context Learning

What it is: Bot infers emoji meanings from conversation context without explicit teaching
Why not:
- Creates non-deterministic behavior (same emoji means different things in different contexts)
- Impossible to debug or correct
- Users don't understand the bot's logic
- High error rate leads to loss of trust
Better approach: Explicit /teach commands only

Anti-Feature 2: Cross-Discord Emoji Translation

What it is: Translate emojis the same way across all Discord servers
Why not:
- Emoji meanings are highly personal and context-dependent
- Vivi's system is specific to her community
- Would bloat the dictionary with conflicting meanings
- Not scalable for other users' emoji systems
Better approach: Per-server dictionaries with optional public sharing for Vivi's specific system

Anti-Feature 3: Real-Time Chat Simulation

What it is: Bot attempts to continue Vivi's conversation or generate new emoji sequences
Why not:
- Out of scope (translation, not generation)
- Risk of impersonation
- Confusion about what Vivi actually said vs what bot generated
- Community prefers Vivi's authentic communication
Better approach: Stick to translating Vivi's actual messages

Anti-Feature 4: Full NLP Context Analysis

What it is: Use complex NLP to understand message context and vary translations
Why not:
- Over-engineering for the problem
- Adds maintenance burden
- Makes behavior unpredictable
- Initial rule-based approach is more trustworthy
Better approach: Simple context hints (channel type, time of day) with explicit teaching

Configuration Per-Server

Different servers may have different translation preferences:

Server Settings to Store

guild_id: 12345678
settings:
  translation_mode: "auto"  # or "on-demand"
  auto_channels: [chan_id_1, chan_id_2]  # channels where auto-translation is enabled
  verbose_translations: false  # expand vs summarize
  show_confidence: false  # show certainty for learned meanings
  allow_community_teaching: true  # can non-mods teach emoji meanings?
  default_language: "en"  # future: support other languages
  include_emoji_names: true  # include emoji name for accessibility

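Database Schema

Discord IDs (guild, channel, user) are 64-bit snowflakes, so int ID columns must be 64-bit (e.g., BIGINT in PostgreSQL).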

Emoji_Meanings:
  id: UUID
  guild_id: int
  emoji_unicode: str (or custom_emoji_id)
  meaning: str
  taught_by_user_id: int
  taught_at: timestamp
  usage_count: int
  accuracy_rating: float (0-1, from reactions)
  is_sequence: bool
  confidence: float (1.0 for taught, < 1.0 for inferred)

Guild_Settings:
  guild_id: int
  translation_mode: str
  auto_channels: json
  ...

Emoji_Statistics:
  emoji_id: UUID
  guild_id: int
  total_uses: int
  positive_reactions: int
  negative_reactions: int
  conflicts_count: int
  last_updated: timestamp
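A SQLite rendering of the Emoji_Meanings and Emoji_Statistics tables, as a sketch. Guild_Settings is omitted because its fields are only partly specified above. The emoji_key column name, storing UUIDs and timestamps as text, and the lookup index are assumptions. The index is deliberately not unique, since one emoji may carry several conflicting teachings. SQLite's INTEGER is 64-bit, so snowflake IDs fit.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_meanings (
    id                TEXT PRIMARY KEY,            -- UUID stored as text
    guild_id          INTEGER NOT NULL,            -- Discord snowflake
    emoji_key         TEXT NOT NULL,               -- Unicode emoji, custom emoji ID, or a sequence
    meaning           TEXT NOT NULL,
    taught_by_user_id INTEGER NOT NULL,
    taught_at         TEXT NOT NULL,               -- ISO-8601 timestamp
    usage_count       INTEGER NOT NULL DEFAULT 0,
    accuracy_rating   REAL CHECK (accuracy_rating BETWEEN 0 AND 1),
    is_sequence       INTEGER NOT NULL DEFAULT 0,  -- SQLite has no BOOLEAN type
    confidence        REAL NOT NULL DEFAULT 1.0    -- 1.0 taught, < 1.0 inferred
);
-- Not unique: one emoji may have several (conflicting) teachings.
CREATE INDEX IF NOT EXISTS idx_meanings_lookup ON emoji_meanings (guild_id, emoji_key);

CREATE TABLE IF NOT EXISTS emoji_statistics (
    emoji_id           TEXT NOT NULL REFERENCES emoji_meanings (id),
    guild_id           INTEGER NOT NULL,
    total_uses         INTEGER NOT NULL DEFAULT 0,
    positive_reactions INTEGER NOT NULL DEFAULT 0,
    negative_reactions INTEGER NOT NULL DEFAULT 0,
    conflicts_count    INTEGER NOT NULL DEFAULT 0,
    last_updated       TEXT NOT NULL,
    PRIMARY KEY (emoji_id, guild_id)
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn
```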

Database Choice Recommendation

Development/Small Scale (< 100 servers): SQLite, accessed asynchronously (e.g., via aiosqlite)
Production (100+ servers): PostgreSQL with connection pooling
Real-Time Stats (future): Redis for caching popular emoji definitions

Configuration Commands

/settings translation-mode [auto|on-demand]
/settings auto-channels [#channel-list]
/settings verbose [true|false]
/settings allow-teaching [true|false]

Implementation Roadmap

MVP (Phase 1): Foundation

Features: Message detection, basic emoji dictionary, /teach command, on-demand translation
Complexity: Medium
Timeline: 2-3 weeks
Core: Rule-based translations only; no ML

V1 (Phase 2): Polished & Learning-Ready

Add: Per-server settings, /what query, /correct command, reaction-based feedback
Add: Accessibility improvements (emoji names, plain text)
Add: Basic statistics (/emoji-stats)
Timeline: 3-4 weeks
Focus: Community testing and meaning refinement

V2 (Phase 3): Smart Fallback (Optional)

Add: Statistical fallback for unknown emoji sequences
Add: Confidence scores for inferred meanings
Add: Emoji conflict detection and disambiguation
Add: Optional global dictionary sharing
Timeline: 4-6 weeks
Focus: Scalability and reduced manual maintenance

Summary

The Vivi Speech Translator should be built as a hybrid system:

Start with deterministic, rule-based translation that's fully transparent and debuggable
Enable community learning via simple /teach commands that grow the dictionary organically
Provide feedback mechanisms (reactions, corrections) to improve accuracy over time
Remain focused on Vivi's specific emoji system, not generic emoji translation
Prioritize accessibility since translation itself is an accessibility feature
Leave room for future ML enhancement but don't build it until needed

The core differentiator is not sophisticated AI, but intentional design for learning and community participation. The bot becomes more valuable as more people teach it, creating network effects that benefit the whole community.

Recommended Feature Set for V1

✅ Message detection and emoji parsing
✅ Rule-based translation with /teach command
✅ Per-server configuration (auto vs on-demand mode)
✅ Correction and query commands (/what, /correct)
✅ Reaction-based feedback (✅/❌)
✅ Accessible output (plain text, emoji names)
✅ PluralKit integration (if Vivi's community uses it)
⏭️ Statistics dashboard (Phase 2)
⏭️ ML fallback (Phase 3+, if needed)
