Vivi-Speech/.planning/research/FEATURES.md
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:02:32 -05:00


# Features Research: Vivi Speech Translator
## Executive Summary
The Vivi Speech Translator should combine table-stakes Discord bot features with a hybrid translation approach that starts rule-based but can learn and improve over time. The core differentiator is intelligent emoji-to-text translation that understands Vivi's unique communication system, enabled by a learning mechanism that teaches the bot new emoji meanings while maintaining deterministic, debuggable behavior for known sequences.
## Table Stakes Features
These are features Discord bot users expect and that are necessary for basic functionality:
### Message Event Handling
- **Description**: Detect and respond to messages containing emojis from Vivi
- **Complexity**: Low
- **Dependencies**: Discord.py or discord.js framework
- **Why Essential**: Without this, the bot cannot observe the messages it needs to translate
- **Implementation**: Monitor `on_message` events and filter for messages containing emoji sequences
### Emoji Parsing
- **Description**: Parse both standard Unicode emojis and Discord custom emojis from message content
- **Complexity**: Low
- **Dependencies**: Emoji library (emoji, discord.py emoji utilities)
- **Why Essential**: Must accurately identify which emojis are present to begin translation
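- **Implementation**: Use regex patterns to find custom emoji tokens (`<:name:id>`, or `<a:name:id>` for animated emoji, where `id` is a numeric snowflake matched by `\d+`) and standard Unicode emoji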
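
A minimal sketch of the parsing step, assuming custom emoji appear in message content as `<:name:id>` tokens and that a handful of Unicode ranges is a good enough approximation for standard emoji. The names `EMOJI_TOKEN` and `extract_emojis` are hypothetical, the 17-20 digit ID length is an assumption, and the ranges do not cover every emoji (flags, for example, are missed); a maintained emoji library would be more complete.

```python
import re

# Custom emoji appear in message content as <:name:id>, or <a:name:id> when animated.
# The 17-20 digit ID length is an assumption about snowflake sizes.
CUSTOM = r"<a?:[A-Za-z0-9_]{1,32}:[0-9]{17,20}>"

# Rough pictographic ranges; an approximation, not the full Unicode emoji set.
BASE = "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
# A base may be followed by skin-tone modifiers, a variation selector,
# or a zero-width joiner plus another base (e.g. woman + ZWJ + laptop).
UNICODE = BASE + "(?:[\U0001F3FB-\U0001F3FF]|\uFE0F|\u200D" + BASE + ")*"

EMOJI_TOKEN = re.compile(CUSTOM + "|" + UNICODE)


def extract_emojis(content: str) -> list[str]:
    """Return emoji tokens in the order they appear in the message."""
    return EMOJI_TOKEN.findall(content)
```

Keeping joined sequences such as 👩‍💻 as a single token matters later, because the dictionary is keyed on whole emoji rather than on individual code points.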
### Reply/Response Messages
- **Description**: Send translation responses back to chat or as replies
- **Complexity**: Low
- **Dependencies**: Message API (discord.py `message.reply()` or `channel.send()`; both wrap Discord's Create Message endpoint)
- **Why Essential**: Users need to see the translation output
- **Implementation**: Format translations as clear, readable text messages; optionally use embeds for rich formatting
### Command Interface
- **Description**: Allow teaching the bot new emoji meanings via commands
- **Complexity**: Medium
- **Dependencies**: Command parser, permission checking
- **Why Essential**: Enables the learning system that makes the bot scale without hardcoding every emoji
- **Implementation**: Slash commands or hybrid prefix/slash commands (see "Message Handling Modes" below)
### Per-Server Configuration
- **Description**: Store server-specific settings (translation mode, custom emoji meanings)
- **Complexity**: Medium
- **Dependencies**: Database (SQLite for small scale, PostgreSQL for production)
- **Why Essential**: Different servers have different preferences for how verbose translations should be
- **Implementation**: Simple key-value store per guild_id (server ID)
### Rate Limiting & Error Handling
- **Description**: Gracefully handle Discord API limits and network errors
- **Complexity**: Medium
- **Dependencies**: Framework error handling
- **Why Essential**: Prevents bot crashes and makes service reliable
- **Implementation**: Exponential backoff for failed API calls, catch timeouts
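
The backoff idea can be sketched with the standard library alone. This is an illustrative helper, not part of discord.py: `with_backoff`, its parameters, and the choice of which exceptions count as transient are all assumptions.

```python
import asyncio
import random


async def with_backoff(call, *, retries=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(asyncio.TimeoutError, ConnectionError)):
    """Await call() and retry transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts; let the caller decide what to do
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little jitter keeps concurrent tasks from retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

A caller would wrap a single outbound request, for example `await with_backoff(lambda: fetch_something())`, where `fetch_something` is whatever coroutine function performs the request.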
---
## Differentiating Features
These features set Vivi's translator apart from generic Discord bots:
### Learning System (Rule-Based Foundation with Growth)
- **Description**: Bot learns emoji meanings when explicitly taught by users
- **Complexity**: Medium
- **Why This Differentiates**: Makes the translator sustainable for Vivi's unique and evolving emoji language without requiring the developer to manually hardcode every sequence
- **Constraints**:
  - **No implicit learning**: Never infer emoji meanings from context; require explicit teaching
  - **Explicit confirmation**: Always confirm back what was learned so users can verify correctness
  - **Easy correction**: Provide `/correct :emoji: new_meaning` for when meanings change
  - **Community contribution**: Allow anyone in the server to teach, not just Vivi
- **Database**: Store emoji meanings with metadata (who taught it, when, how many uses)
- **Feedback loop**: Track which translations are most helpful, identify ambiguous emojis
### Emoji Sequence Detection
- **Description**: Recognize sequences of emojis that form compound meanings (e.g., 👩‍💻📱 = "coding on phone")
- **Complexity**: Medium
- **Why This Differentiates**: Vivi likely uses emoji clusters; simple one-to-one mapping isn't enough
- **Implementation Approach**:
  - Define sequence patterns (consecutive emojis with or without separators)
  - Store meanings for common sequences
  - Fall back to individual emoji meanings if no sequence match
  - Allow users to teach sequences via `/teach :emoji1: :emoji2: compound_meaning`
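
One way to realize "sequence first, then individual emoji" is a greedy longest-match scan over the parsed tokens. The sketch below assumes meanings are stored in a dict keyed by tuples of emoji tokens; `translate_tokens`, `max_len`, and the `[unknown: ...]` placeholder are hypothetical choices.

```python
def translate_tokens(tokens: list[str],
                     meanings: dict[tuple[str, ...], str],
                     max_len: int = 4) -> list[str]:
    """Translate emoji tokens, preferring the longest taught sequence."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, shrinking down to a single emoji.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in meanings:
                out.append(meanings[key])
                i += size
                break
        else:
            # Nothing taught for this emoji: mark it instead of guessing.
            out.append(f"[unknown: {tokens[i]}]")
            i += 1
    return out
```

Greedy matching is deterministic and easy to explain to users, at the cost of occasionally preferring a long sequence where two shorter ones were intended.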
### Context-Aware Translation Formatting
- **Description**: Vary translation output based on conversational context
- **Complexity**: High
- **Why This Differentiates**: Translations that feel natural, not robotic; adapts tone to channel context
- **Examples**:
  - In #general: "Vivi is having fun 😊"
  - In #serious-discussion: "Expressing contentment and readiness"
  - Response variation: Sometimes expand, sometimes summarize based on recent context
- **Implementation**: Store channel context settings; analyze surrounding messages for tone
### PluralKit Integration (Optional but Recommended)
- **Description**: Detect which alter is communicating via PluralKit webhook metadata
- **Complexity**: High
- **Why This Differentiates**: Essential for communities where Vivi shares an account with headmates; respects system identity
- **How It Works**:
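  - PluralKit deletes the original message and re-posts it through a channel webhook, so proxied messages arrive as webhook messages with the member's name as the display name
  - The display name alone is not a reliable identifier; the bot can look the message up through PluralKit's API (`GET /v2/messages/{message_id}`), which returns the system and member behind a proxied message
  - Enables "Vivi says: [translation]" vs generic bot response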
- **Limitations**: Requires PluralKit to be active in server; only works with PluralKit proxy format
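
A hedged sketch of the attribution step. It assumes PluralKit's public API exposes a message lookup at `https://api.pluralkit.me/v2/messages/{message_id}` and that the response carries a `member` object with `name` and `display_name` fields; both should be checked against PluralKit's API documentation before relying on them. The helper names are hypothetical, and the HTTP request itself is left out so the logic stays testable offline.

```python
from typing import Optional

# Assumed endpoint; verify against PluralKit's API reference.
PK_MESSAGE_URL = "https://api.pluralkit.me/v2/messages/{message_id}"


def pluralkit_lookup_url(message_id: int) -> str:
    """Build the lookup URL for a proxied Discord message ID."""
    return PK_MESSAGE_URL.format(message_id=message_id)


def speaker_name(pk_message: Optional[dict], fallback: str) -> str:
    """Pick a name from a decoded lookup response, or fall back."""
    if not pk_message:
        return fallback  # not a proxied message, or the lookup failed
    member = pk_message.get("member") or {}
    # Field names are assumptions about the response shape.
    return member.get("display_name") or member.get("name") or fallback


def attribute(translation: str, pk_message: Optional[dict],
              fallback: str = "Someone") -> str:
    """Prefix a translation with the member who sent the original."""
    return f"{speaker_name(pk_message, fallback)} says: {translation}"
```

Falling back to a neutral name when the lookup returns nothing keeps the bot usable in servers without PluralKit.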
### Translation Quality Tracking
- **Description**: Monitor which translations get positive vs negative reactions
- **Complexity**: Medium
- **Why This Differentiates**: Enables continuous improvement and identifies ambiguous emojis needing clarification
- **Implementation**:
  - Store emoji usage statistics (frequency, accuracy ratings)
  - Provide admin dashboard `/emoji-stats` to see problematic emojis
  - Optionally flag ambiguous emojis for human review
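
As an illustration of how reaction counts could become an accuracy rating, the sketch below scores each emoji as the share of positive reactions and flags low scorers once enough votes exist. `EmojiStats`, `needs_review`, and the `min_votes` and `threshold` defaults are assumptions, not values from this research.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmojiStats:
    emoji: str
    positive: int = 0  # count of positive reactions
    negative: int = 0  # count of negative reactions

    @property
    def accuracy(self) -> Optional[float]:
        """Share of positive reactions, or None with no votes yet."""
        votes = self.positive + self.negative
        return self.positive / votes if votes else None


def needs_review(stats, min_votes: int = 5, threshold: float = 0.6) -> list[str]:
    """Return emojis with enough votes and accuracy below the threshold."""
    flagged = []
    for s in stats:
        acc = s.accuracy
        if acc is not None and s.positive + s.negative >= min_votes and acc < threshold:
            flagged.append(s.emoji)
    return flagged
```

Requiring a minimum number of votes avoids flagging an emoji because of a single stray reaction.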
### Global Dictionary Option (Future Differentiator)
- **Description**: Share emoji meanings across servers with opt-in
- **Complexity**: High
- **Why This Differentiates**: Could benefit other systems and communities; positions Vivi's system as a standard
- **Constraints**:
  - Privacy-first: Only share if opted-in by server admin
  - Vivi-specific: Focus on Vivi's emoji system, not generic emoji translation
  - Conflict resolution: Last-teaching-wins or voting system for disagreements
---
## Translation Approaches Analyzed
### Rule-Based Pattern Matching
**How it works**: Explicit rules define emoji → text mappings. New mappings must be explicitly added.
**Pros**:
- Fully deterministic and debuggable
- Fast performance (no ML overhead)
- Transparent to users (users understand why emoji means what)
- Easy to correct mistakes (just update the rule)
**Cons**:
- Requires explicit teaching for every emoji/sequence
- Doesn't generalize to patterns the bot hasn't seen
- Becomes unwieldy with hundreds of emoji combinations
### Statistical/ML-Based Approach
**How it works**: Train a model on known emoji → text pairs, predict meanings for unknown emojis using similarity or context.
**Pros**:
- Can handle novel emoji combinations through pattern inference
- Generalizes from limited training data
- Captures semantic relationships between emoji meanings
**Cons**:
- Black-box behavior ("why did the bot translate it that way?")
- Requires significant training data to work well
- Harder to debug when translations are wrong
- Users don't understand or trust the mechanism
### Recommended: Hybrid Approach (Rule-Based + Fallback)
**Phase 1 (MVP): Rule-Based with Community Learning**
- Start with hardcoded emoji meanings for common sequences
- Allow users to teach new emojis via `/teach` command
- 100% transparent: users know exactly why each translation happens
- Fast, reliable, debuggable
**Phase 2 (Future): Statistical Fallback**
- Analyze emoji usage patterns in learned meanings
- If emoji appears in multiple compounds, infer partial meanings
- Use embedding-based similarity to suggest translations for unknown emoji sequences
- Always show confidence scores; require confirmation before using inferred meanings
**Phase 3 (Long-term): Continuous Learning**
- Track user corrections and positive/negative reactions
- Retrain fallback model on accepted vs rejected translations
- Identify consistently ambiguous emojis for human review
- Adjust translation format based on what's most helpful per server
**Why This Is Best**:
- Starts simple and user-friendly
- Scales to hundreds of emojis through learning
- Maintains trust through transparency
- Enables improvement over time without requiring ML expertise
---
## Message Handling Modes
Discord bots can respond to messages in different ways. Choose the approach that best serves Vivi's community:
### Mode 1: Automatic Translation (On Every Message)
**How it works**: Bot automatically translates every message from Vivi (or messages with emoji content)
**Pros**:
- Instant understanding without extra steps
- No friction for casual readers
- Good for channels where translation is the main purpose
**Cons**:
- Can be noisy in mixed-audience channels
- Spoils the "reading Vivi directly" experience for community members who prefer it
- Uses Discord API quota faster
**Best For**: Translation channels (#vivi-translated or similar)
### Mode 2: On-Demand Translation (Reaction or Command)
**How it works**: Users react with a specific emoji or use `/translate` command to request translation
**Pros**:
- Keeps channels clean by default
- Respects users who want to interpret emojis themselves
- Lower API usage
- More intentional interaction
**Cons**:
- Extra step for users
- May miss important messages if people forget to request
- Less discoverable for new community members
**Best For**: Social channels where emoji is part of fun, not solely for understanding
### Mode 3: Toggle (Per-Server Setting)
**How it works**: Server admins choose between automatic or on-demand via `/settings`
**Pros**:
- Respects different community preferences
- Maximizes adoption across servers with different cultures
- Can differentiate channels (auto in #vivi-translations, on-demand in #general)
**Cons**:
- More complex to implement
- Users must learn about settings
**Recommendation for V1**: Implement Mode 3 with default Mode 1 (automatic). Let server admins customize via:
- `/settings translation-mode [auto|on-demand]`
- `/settings translate-channels [list of channel IDs]` for auto mode
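
The toggle can be reduced to one small decision function that the message handler consults before translating. This is a sketch: `GuildSettings`, `should_auto_translate`, and the rule that an empty channel list means "every channel" are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class GuildSettings:
    translation_mode: str = "auto"  # "auto" or "on-demand"
    auto_channels: set[int] = field(default_factory=set)


def should_auto_translate(settings: GuildSettings, channel_id: int,
                          has_emoji: bool) -> bool:
    """Decide whether a message should be translated without being asked."""
    if not has_emoji or settings.translation_mode != "auto":
        return False
    # Assumption: an empty channel list means auto mode applies everywhere.
    return not settings.auto_channels or channel_id in settings.auto_channels
```

In on-demand mode this always returns `False`, and translation happens only through `/translate` or a reaction.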
### Implementation: Message Handling
- **Event**: Discord `on_message` event
- **Filter**: Check for emoji content using regex: `<a?:[a-zA-Z0-9_]{1,32}:[0-9]{17,20}>` (custom; IDs are snowflakes, and older ones can be as short as 17 digits) and Unicode emoji
- **Action**: Call translation engine, format output, send reply
### Command Interface: Slash Commands vs Prefix
**Recommendation**: Use **slash commands** as primary, offer hybrid support
**Why slash commands**:
- Modern Discord standard (easier discoverability)
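- No Message Content intent required for the commands themselves (better privacy); automatic translation via `on_message` still needs that privileged intent to read message text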
- Built-in autocomplete for parameters
- Better for `/teach :emoji: meaning` (emoji picker integration)
**Key slash commands** for Vivi:
- `/teach :emoji: meaning` — Add emoji to dictionary
- `/translate [emoji-string]` — Manually trigger translation
- `/what :emoji:` — Look up emoji meaning
- `/correct :emoji: new_meaning` — Fix a taught emoji
- `/settings` — Server configuration
- `/emoji-stats` — View accuracy/usage statistics
---
## Learning Interface & Feedback System
The learning system is crucial for scalability and adoption. Here's the recommended approach:
### Teaching Commands
```
/teach :emoji1: meaning
→ "Learned! 🎓 I'll translate :emoji1: as 'meaning'"
/teach :emoji1: :emoji2: compound meaning
→ "Learned! 🎓 I'll translate :emoji1: :emoji2: as 'compound meaning'"
```
### Correction System
```
/correct :emoji: new_meaning
→ "Updated! ✏️ I'll now translate :emoji: as 'new_meaning' (was 'old meaning')"
```
### Query System
```
/what :emoji:
→ "Emoji meaning for :emoji: is 'definition'\nTaught by @User on 2024-01-15\nUsed 47 times this month"
```
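
The three exchanges above can be modeled with an in-memory dictionary before any Discord or database wiring exists. This sketch is framework-free; `EmojiDictionary`, `Entry`, and the exact reply strings are illustrative, and the usage line omits the "this month" window shown in the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    meaning: str
    taught_by: str
    taught_on: date
    uses: int = 0


class EmojiDictionary:
    def __init__(self):
        self._entries: dict[str, Entry] = {}

    def teach(self, emoji: str, meaning: str, user: str, today: date) -> str:
        self._entries[emoji] = Entry(meaning, user, today)
        # Always repeat back what was learned so the teacher can verify it.
        return f"Learned! 🎓 I'll translate {emoji} as '{meaning}'"

    def correct(self, emoji: str, new_meaning: str, user: str, today: date) -> str:
        old = self._entries.get(emoji)
        if old is None:
            return f"I don't know {emoji} yet. Use /teach first."
        # Keep the usage count; only the meaning and attribution change.
        self._entries[emoji] = Entry(new_meaning, user, today, old.uses)
        return (f"Updated! ✏️ I'll now translate {emoji} as "
                f"'{new_meaning}' (was '{old.meaning}')")

    def what(self, emoji: str) -> str:
        e = self._entries.get(emoji)
        if e is None:
            return f"I don't know {emoji} yet."
        return (f"Emoji meaning for {emoji} is '{e.meaning}'\n"
                f"Taught by {e.taught_by} on {e.taught_on.isoformat()}\n"
                f"Used {e.uses} times")
```

Keeping the reply text in plain functions like these makes the confirmation wording easy to test and to adjust for accessibility.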
### Validation & Confirmation
- Always repeat back what was learned
- Show who taught it and when (build community recognition)
- Optionally show confidence if from ML fallback
- Highlight ambiguous meanings if same emoji has multiple teachings
### Feedback Mechanisms
1. **Reaction-based**: Users react with ✅/❌ to translations
   - Track which emojis get positive vs negative reactions
   - Identify consistently wrong translations for correction
2. **Correction Commands**: `/correct :emoji: new_meaning` explicitly fixes errors
   - Creates audit trail of meaning changes
   - Enables tracking learning over time
3. **Conflict Resolution**: If multiple teachings for same emoji
   - Show all known meanings with vote counts
   - Use most recent teaching by default, surface conflicts
   - Option: `/disambiguate :emoji: choose_meaning` to select preferred one
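
The conflict rule (most recent teaching wins, with all meanings and their counts surfaced) fits in a few lines. The sketch assumes each teaching is a `(meaning, taught_at)` pair with comparable timestamps; `resolve_conflict` is a hypothetical helper.

```python
from collections import Counter


def resolve_conflict(teachings):
    """Return (default_meaning, votes) for one emoji's teachings.

    teachings: list of (meaning, taught_at) pairs; taught_at must be comparable.
    """
    if not teachings:
        return None, Counter()
    # How many times each distinct meaning was taught.
    votes = Counter(meaning for meaning, _ in teachings)
    # Most recent teaching wins by default.
    latest = max(teachings, key=lambda t: t[1])[0]
    return latest, votes
```

A caller can treat `len(votes) > 1` as the signal to show the conflict to users or to suggest `/disambiguate`.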
### Best Practices
- **Community over individual**: Encourage anyone to teach, not just Vivi or admins
- **Transparency**: Always show source of taught meanings
- **Auditability**: Maintain history of meaning changes
- **Disambiguation**: Flag emojis with conflicting meanings early
- **Escalation**: Provide `/report-ambiguous :emoji:` for admin review
---
## Accessibility Considerations
Translation bots serve accessibility functions, so they must be accessible themselves:
### Text Accessibility
- **Plain text output**: Always provide plain text translations, not just embeds
- **No emoji-only responses**: Never respond with just emoji; always include text
- **Clear language**: Use simple, direct language in translations (avoid jargon)
- **Consistent formatting**: Same emoji always translates the same way (aids screen reader prediction)
### Discord Accessibility
- **Slash commands**: Easier for keyboard navigation than prefix commands
- **Accessible embeds**: If using embeds for formatted output:
  - Include plain text alternative in message content
  - Avoid using embeds for critical information
  - Note: Discord embeds cannot have alt-text for images, so only use text-based embeds
### Screen Reader Compatibility
- **Emoji descriptions**: Include what each emoji is called (e.g., "woman technologist emoji")
- **Sequence clarity**: When translating compound sequences, explain the combination
- **No hidden information**: Never put crucial meaning in embed footers or nested fields
### Examples of Accessible Responses
**Good**:
```
Vivi: 👩‍💻📱
Bot: (woman technologist emoji, mobile phone emoji)
Translates to: "coding on phone" or "responding to work messages"
```
**Bad**:
```
Vivi: 👩‍💻📱
Bot: [embed with only emoji in footer, no text explanation]
```
### Implementation
- Test output with screen readers (NVDA, JAWS)
- Provide alternative text format via `/translate --verbose` for complex sequences
- Include emoji names in debug/development output
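
The "Good" response format can be produced by a small formatter. Because joined emoji such as 👩‍💻 have no single Unicode character name, this sketch assumes the bot keeps its own name table; `format_accessible` and the `names` mapping are hypothetical.

```python
def format_accessible(emojis: list[str], names: dict[str, str], meaning: str) -> str:
    """Build a plain-text reply that names each emoji before translating."""
    # Fall back to "unnamed" rather than leaving a bare emoji for screen readers.
    described = ", ".join(f"{names.get(e, 'unnamed')} emoji" for e in emojis)
    return f'({described})\nTranslates to: "{meaning}"'
```

The output is plain message content, so it reads the same with or without an embed attached.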
---
## Anti-Features (What NOT to Build)
These features sound good but should be avoided:
### Anti-Feature 1: Persistent Context Learning
- **What it is**: Bot infers emoji meanings from conversation context without explicit teaching
- **Why not**:
  - Creates non-deterministic behavior (same emoji means different things in different contexts)
  - Impossible to debug or correct
  - Users don't understand the bot's logic
  - High error rate leads to loss of trust
- **Better approach**: Explicit `/teach` commands only
### Anti-Feature 2: Cross-Discord Emoji Translation
- **What it is**: Translate emojis the same way across all Discord servers
- **Why not**:
  - Emoji meanings are highly personal and context-dependent
  - Vivi's system is specific to her community
  - Would bloat the dictionary with conflicting meanings
  - Not scalable for other users' emoji systems
- **Better approach**: Per-server dictionaries with optional public sharing for Vivi's specific system
### Anti-Feature 3: Real-Time Chat Simulation
- **What it is**: Bot attempts to continue Vivi's conversation or generate new emoji sequences
- **Why not**:
  - Out of scope (translation, not generation)
  - Risk of impersonation
  - Confusion about what Vivi actually said vs what bot generated
  - Community prefers Vivi's authentic communication
- **Better approach**: Stick to translating Vivi's actual messages
### Anti-Feature 4: Full NLP Context Analysis
- **What it is**: Use complex NLP to understand message context and vary translations
- **Why not**:
  - Over-engineering for the problem
  - Adds maintenance burden
  - Makes behavior unpredictable
  - Initial rule-based approach is more trustworthy
- **Better approach**: Simple context hints (channel type, time of day) with explicit teaching
---
## Configuration Per-Server
Different servers may have different translation preferences:
### Server Settings to Store
```yaml
guild_id: 12345678
settings:
  translation_mode: "auto"               # or "on-demand"
  auto_channels: [chan_id_1, chan_id_2]  # channels where auto-translation is enabled
  verbose_translations: false            # expand vs summarize
  show_confidence: false                 # show certainty for learned meanings
  allow_community_teaching: true         # can non-mods teach emoji meanings?
  default_language: "en"                 # future: support other languages
  include_emoji_names: true              # include emoji name for accessibility
```
### Database Schema
```
Emoji_Meanings:
  id: UUID
  guild_id: int
  emoji_unicode: str (or custom_emoji_id)
  meaning: str
  taught_by_user_id: int
  taught_at: timestamp
  usage_count: int
  accuracy_rating: float (0-1, from reactions)
  is_sequence: bool
  confidence: float (1.0 for taught, < 1.0 for inferred)
Guild_Settings:
  guild_id: int
  translation_mode: str
  auto_channels: json
  ...
Emoji_Statistics:
  emoji_id: UUID
  guild_id: int
  total_uses: int
  positive_reactions: int
  negative_reactions: int
  conflicts_count: int
  last_updated: timestamp
```
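
For the small-scale case, the schema above could map onto SQLite roughly as follows. This is a sketch using Python's built-in `sqlite3`: the table and column names follow the outline, while `emoji_key` (one text column holding either the Unicode emoji or the custom emoji ID), the index, and the helpers `open_db`, `teach`, and `lookup` are assumptions. The statistics table is omitted for brevity.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_meanings (
    id TEXT PRIMARY KEY,
    guild_id INTEGER NOT NULL,
    emoji_key TEXT NOT NULL,
    meaning TEXT NOT NULL,
    taught_by_user_id INTEGER NOT NULL,
    taught_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    usage_count INTEGER NOT NULL DEFAULT 0,
    accuracy_rating REAL,
    is_sequence INTEGER NOT NULL DEFAULT 0,
    confidence REAL NOT NULL DEFAULT 1.0
);
CREATE INDEX IF NOT EXISTS idx_meanings_lookup
    ON emoji_meanings (guild_id, emoji_key);
CREATE TABLE IF NOT EXISTS guild_settings (
    guild_id INTEGER PRIMARY KEY,
    translation_mode TEXT NOT NULL DEFAULT 'auto',
    auto_channels TEXT NOT NULL DEFAULT '[]'
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def teach(conn, guild_id: int, emoji_key: str, meaning: str, user_id: int) -> None:
    """Record a new teaching; older rows stay as an audit trail."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO emoji_meanings "
            "(id, guild_id, emoji_key, meaning, taught_by_user_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), guild_id, emoji_key, meaning, user_id),
        )


def lookup(conn, guild_id: int, emoji_key: str):
    """Return the most recently taught meaning, or None."""
    row = conn.execute(
        "SELECT meaning FROM emoji_meanings "
        "WHERE guild_id = ? AND emoji_key = ? "
        "ORDER BY taught_at DESC, rowid DESC LIMIT 1",
        (guild_id, emoji_key),
    ).fetchone()
    return row[0] if row else None
```

Inserting a new row per teaching, rather than updating in place, is one way to get the audit trail of meaning changes without a separate history table.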
### Database Choice Recommendation
- **Development/Small Scale** (< 100 servers): SQLite (the Keyv abstraction applies only to a discord.js build; a discord.py build would use a Python SQLite driver instead)
- **Production** (100+ servers): PostgreSQL with connection pooling
- **Real-Time Stats** (future): Redis for caching popular emoji definitions
### Configuration Commands
```
/settings translation-mode [auto|on-demand]
/settings auto-channels [#channel-list]
/settings verbose [true|false]
/settings allow-teaching [true|false]
```
---
## Implementation Roadmap
### MVP (Phase 1): Foundation
- **Features**: Message detection, basic emoji dictionary, `/teach` command, on-demand translation
- **Complexity**: Medium
- **Timeline**: 2-3 weeks
- **Core**: Rule-based translations only; no ML
### V1 (Phase 2): Polished & Learning-Ready
- **Add**: Per-server settings, `/what` query, `/correct` command, reaction-based feedback
- **Add**: Accessibility improvements (emoji names, plain text)
- **Add**: Basic statistics (`/emoji-stats`)
- **Timeline**: 3-4 weeks
- **Focus**: Community testing and meaning refinement
### V2 (Phase 3): Smart Fallback (Optional)
- **Add**: Statistical fallback for unknown emoji sequences
- **Add**: Confidence scores for inferred meanings
- **Add**: Emoji conflict detection and disambiguation
- **Add**: Optional global dictionary sharing
- **Timeline**: 4-6 weeks
- **Focus**: Scalability and reduced manual maintenance
---
## Summary
The Vivi Speech Translator should be built as a **hybrid system**:
1. **Start with deterministic, rule-based translation** that's fully transparent and debuggable
2. **Enable community learning** via simple `/teach` commands that grow the dictionary organically
3. **Provide feedback mechanisms** (reactions, corrections) to improve accuracy over time
4. **Remain focused** on Vivi's specific emoji system, not generic emoji translation
5. **Prioritize accessibility** since translation itself is an accessibility feature
6. **Leave room for future ML enhancement** but don't build it until needed
The core differentiator is not sophisticated AI, but **intentional design for learning and community participation**. The bot becomes more valuable as more people teach it, creating network effects that benefit the whole community.
### Recommended Feature Set for V1
- ✅ Message detection and emoji parsing
- ✅ Rule-based translation with `/teach` command
- ✅ Per-server configuration (auto vs on-demand mode)
- ✅ Correction and query commands (`/what`, `/correct`)
- ✅ Reaction-based feedback (✅/❌)
- ✅ Accessible output (plain text, emoji names)
- ✅ PluralKit integration (if Vivi's community uses it)
- ⏭️ Statistics dashboard (Phase 2)
- ⏭️ ML fallback (Phase 3+, if needed)
---
## Sources & References
### Discord Bot Development
- [Best Discord Bots in 2026: Complete Guide](https://blog.communityone.io/best-discord-bots/)
- [Storing data with Keyv | discord.js Guide](https://discordjs.guide/keyv/)
- [Awaiting Messages & Reactions · A Guide to Discord Bots](https://maah.gitbooks.io/discord-bots/content/getting-started/awaiting-messages-and-reactions.html)
- [Discord.js - Responding to Messages](https://cratecode.com/info/discordjs-responding-to-messages)
### Slash Commands & Command Interfaces
- [Slash command prefixes · discord/discord-api-docs](https://github.com/discord/discord-api-docs/discussions/3744)
- [Discord Interactions | Pycord Guide](https://guide.pycord.dev/interactions)
### Emoji & NLP Processing
- [NLP Series: Day 5 — Handling Emojis: Strategies and Code Implementation](https://medium.com/@ebimsv/nlp-series-day-5-handling-emojis-strategies-and-code-implementation-0f8e77e3a25c)
- [Assessing Emoji Use in Modern Text Processing Tools](https://arxiv.org/pdf/2101.00430)
- [Emojinize: Enriching Any Text with Emoji Translations](https://arxiv.org/html/2403.03857v2)
### Translation Approaches
- [Hybrid Translation with Classification: Revisiting Rule-Based and Neural Machine Translation](https://www.mdpi.com/2079-9282/9/2/201)
- [Rule-Based Machine Translation - Wikipedia](https://en.wikipedia.org/wiki/Rule-based-machine-translation)
- [Rule Based Approach in NLP - GeeksforGeeks](https://www.geeksforgeeks.org/nlp/rule-based-approach-in-nlp/)
### Accessibility
- [Discord: Accessibility in Web Apps Done Right](https://a11yup.com/articles/discord-accessibility-in-web-apps-done-right/)
- [Discord Accessibility for blind users](https://support.discord.com/hc/en-us/community/posts/360032435152-Discord-Accessibility-for-blind-users)
- [Using a Screen Reader on Discord](https://support.discord.com/hc/en-us/articles/7180791233559-Using-a-Screen-Reader-on-Discord)
- [GitHub - 9vult/Raiha: Raiha Discord Accessibility Bot](https://github.com/9vult/Raiha)
### PluralKit Integration
- [PluralKit - System Management Bot](https://pluralkit.me/)
- [Navigating PluralKit: A Guide to Discord's Unique Bot for System Management](https://www.oreateai.com/blog/navigating-pluralkit-a-guide-to-discord-unique-bot-for-system-management-31ce1863fda39661189c6b8c031c864b)
- [GitHub - PluralKit/PluralKit](https://github.com/PluralKit/PluralKit)
### Database & Configuration
- [How to Create a Database for Your Discord Bot](https://cybrancee.com/learn/knowledge-base/how-to-create-a-database-for-your-discord-bot/)
- [How I Host a Bot in 45,000 Discord Servers For Free](https://dev.to/mistval/how-i-host-a-bot-in-45000-discord-servers-for-free-5bk9)
### Feedback Systems
- [Automating User Feedback Monitoring on Discord Using AI](https://cohere.com/blog/automating-user-feedback-monitoring-on-discord-using-ai)
- [From Discord Chaos to Organized Feedback](https://betahub.io/blog/guides/2025/07/16/discord-community-feedback.html)