Vivi-Speech/.planning/research/FEATURES.md

# Features Research: Vivi Speech Translator

## Executive Summary

The Vivi Speech Translator should combine table-stakes Discord bot features with a hybrid translation approach that starts rule-based but can learn and improve over time. The core differentiator is intelligent emoji-to-text translation that understands Vivi's unique communication system, enabled by a learning mechanism that teaches the bot new emoji meanings while maintaining deterministic, debuggable behavior for known sequences.

## Table Stakes Features

These are features Discord bot users expect and that are necessary for basic functionality:

### Message Event Handling
- **Description**: Detect and respond to messages containing emojis from Vivi
- **Complexity**: Low
- **Dependencies**: discord.py or discord.js framework
- **Why Essential**: Without this, the bot cannot observe the messages it needs to translate
- **Implementation**: Monitor `on_message` events and filter for messages containing emoji sequences

### Emoji Parsing
- **Description**: Parse both standard Unicode emojis and Discord custom emojis from message content
- **Complexity**: Low
- **Dependencies**: Emoji library (emoji, discord.py emoji utilities)
- **Why Essential**: Must accurately identify which emojis are present to begin translation
- **Implementation**: Use a regex to match Discord's custom emoji markup (`<:name:id>`, or `<a:name:id>` for animated emojis) and an emoji library or Unicode ranges to find standard emojis
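
A minimal sketch of this parsing step, assuming Python is the implementation language. `parse_emojis` is a hypothetical helper name, and the code-point ranges are only a rough approximation of standard emoji, not the full Unicode emoji set; a dedicated emoji library would be more accurate.

```python
import re

# Custom emojis appear in message content as <:name:id> or <a:name:id> (animated).
CUSTOM_EMOJI_RE = re.compile(r"<a?:[A-Za-z0-9_]{1,32}:[0-9]{17,20}>")

ZWJ = "\u200d"    # zero-width joiner, glues emojis such as woman + laptop
VS16 = "\ufe0f"   # variation selector requesting emoji presentation


def _is_emoji_char(ch: str) -> bool:
    """Approximate check (assumption): covers the main emoji blocks only."""
    cp = ord(ch)
    return 0x1F000 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def parse_emojis(content: str) -> list:
    """Return custom and Unicode emoji tokens in the order they appear."""
    tokens = []
    i = 0
    while i < len(content):
        match = CUSTOM_EMOJI_RE.match(content, i)
        if match:
            tokens.append(match.group(0))
            i = match.end()
            continue
        if _is_emoji_char(content[i]):
            j = i + 1
            while j < len(content):
                nxt = content[j]
                if nxt == VS16 or 0x1F3FB <= ord(nxt) <= 0x1F3FF:
                    j += 1  # keep variation selectors and skin tones attached
                elif nxt == ZWJ and j + 1 < len(content) and _is_emoji_char(content[j + 1]):
                    j += 2  # keep ZWJ sequences together as one token
                else:
                    break
            tokens.append(content[i:j])
            i = j
            continue
        i += 1
    return tokens
```

Keeping ZWJ sequences together matters here because an emoji such as 👩‍💻 is stored as three code points but should be looked up as one dictionary key.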

### Reply/Response Messages
- **Description**: Send translation responses back to chat or as replies
- **Complexity**: Low
- **Dependencies**: Message API (discord.py `message.reply()` or the REST Create Message endpoint)
- **Why Essential**: Users need to see the translation output
- **Implementation**: Format translations as clear, readable text messages; optionally use embeds for rich formatting

### Command Interface
- **Description**: Allow teaching the bot new emoji meanings via commands
- **Complexity**: Medium
- **Dependencies**: Command parser, permission checking
- **Why Essential**: Enables the learning system that makes the bot scale without hardcoding every emoji
- **Implementation**: Slash commands or hybrid prefix/slash commands (see "Message Handling Modes" below)

### Per-Server Configuration
- **Description**: Store server-specific settings (translation mode, custom emoji meanings)
- **Complexity**: Medium
- **Dependencies**: Database (SQLite for small scale, PostgreSQL for production)
- **Why Essential**: Different servers have different preferences for how verbose translations should be
- **Implementation**: Simple key-value store per guild_id (server ID)

### Rate Limiting & Error Handling
- **Description**: Gracefully handle Discord API limits and network errors
- **Complexity**: Medium
- **Dependencies**: Framework error handling
- **Why Essential**: Prevents bot crashes and makes service reliable
- **Implementation**: Exponential backoff for failed API calls, catch timeouts
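
The backoff idea can be sketched as a small async helper. This is an illustration under assumptions: `call_with_backoff` and its parameters are hypothetical names, and the set of retryable exceptions would need to match whatever the chosen framework raises. Both discord.py and discord.js are understood to handle Discord's own HTTP 429 rate-limit responses internally, so a helper like this would mostly wrap other flaky calls (database, external APIs).

```python
import asyncio
import random


async def call_with_backoff(
    make_call,
    *,
    retries=4,
    base_delay=0.5,
    max_delay=8.0,
    retry_on=(asyncio.TimeoutError, ConnectionError),
    sleep=asyncio.sleep,
):
    """Await make_call(), retrying with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller log or report it
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so many failures do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable only so the helper can be exercised in tests without real waiting.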

---

## Differentiating Features

These features set Vivi's translator apart from generic Discord bots:

### Learning System (Rule-Based Foundation with Growth)
- **Description**: Bot learns emoji meanings when explicitly taught by users
- **Complexity**: Medium
- **Why This Differentiates**: Makes the translator sustainable for Vivi's unique and evolving emoji language without requiring the developer to manually hardcode every sequence
- **Constraints**:
  - **No implicit learning**: Never infer emoji meanings from context—require explicit teaching
  - **Explicit confirmation**: Always confirm back what was learned so users can verify correctness
  - **Easy correction**: Provide `/correct :emoji: new_meaning` for when meanings change
  - **Community contribution**: Allow anyone in the server to teach, not just Vivi
- **Database**: Store emoji meanings with metadata (who taught it, when, how many uses)
- **Feedback loop**: Track which translations are most helpful, identify ambiguous emojis

### Emoji Sequence Detection
- **Description**: Recognize sequences of emojis that form compound meanings (e.g., 👩‍💻📱 = "coding on phone")
- **Complexity**: Medium
- **Why This Differentiates**: Vivi likely uses emoji clusters; simple one-to-one mapping isn't enough
- **Implementation Approach**:
  - Define sequence patterns (consecutive emojis with or without separators)
  - Store meanings for common sequences
  - Fall back to individual emoji meanings if no sequence match
  - Allow users to teach sequences via `/teach :emoji1: :emoji2: compound_meaning`
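
One way to realize "sequence first, then fall back to individual emojis" is a greedy longest-match scan. The sketch below assumes emojis have already been split into tokens and that the dictionary is keyed by tuples of tokens; `translate_tokens` and `max_len` are hypothetical names chosen for illustration.

```python
def translate_tokens(tokens, dictionary, max_len=4):
    """Greedy longest-match translation over a list of emoji tokens.

    dictionary maps tuples of tokens to meanings, e.g. ("👩‍💻", "📱") -> "coding on phone".
    """
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate sequence first, shrinking down to one emoji.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in dictionary:
                out.append(dictionary[key])
                i += size
                break
        else:
            # No sequence or single-emoji rule matched: surface it, never guess.
            out.append(f"[unknown: {tokens[i]}]")
            i += 1
    return out


meanings = {
    ("👩‍💻", "📱"): "coding on phone",
    ("👩‍💻",): "working",
    ("😊",): "happy",
}
print(translate_tokens(["👩‍💻", "📱", "😊"], meanings))
```

Because lookups are plain dictionary hits, the behavior stays deterministic: the same token list always yields the same translation.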

### Context-Aware Translation Formatting
- **Description**: Vary translation output based on conversational context
- **Complexity**: High
- **Why This Differentiates**: Translations that feel natural, not robotic; adapts tone to channel context
- **Examples**:
  - In #general: "Vivi is having fun 😊"
  - In #serious-discussion: "Expressing contentment and readiness"
  - Response variation: Sometimes expand, sometimes summarize based on recent context
- **Implementation**: Store channel context settings; analyze surrounding messages for tone

### PluralKit Integration (Optional but Recommended)
- **Description**: Detect which alter is communicating via PluralKit webhook metadata
- **Complexity**: High
- **Why This Differentiates**: Essential for communities where Vivi shares an account with headmates; respects system identity
- **How It Works**:
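  - PluralKit deletes the original message and reposts it through a webhook whose username is the member's display name (plus the system tag, if one is set)
  - The bot can detect webhook-authored messages and look up the proxied message by ID through the PluralKit API to identify the system and member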
  - Enables "Vivi says: [translation]" vs generic bot response
- **Limitations**: Requires PluralKit to be active in server; only works with PluralKit proxy format
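
A hedged sketch of the lookup side, kept free of network code so it can be tested in isolation. It assumes PluralKit's public v2 API exposes a message lookup at `/v2/messages/{id}` whose JSON includes a `member` object with `name` and `display_name`; those field names are recalled from the PluralKit documentation and should be verified there. `pk_message_url` and `speaker_label` are hypothetical helpers.

```python
from typing import Optional

# Assumed base URL of the PluralKit v2 API; verify against the official docs.
PK_API_BASE = "https://api.pluralkit.me/v2"


def pk_message_url(message_id: int) -> str:
    """Build the lookup URL for a proxied message (endpoint path is an assumption)."""
    return f"{PK_API_BASE}/messages/{message_id}"


def speaker_label(payload: Optional[dict], fallback: str = "Someone") -> str:
    """Pick a name for 'X says: ...' from a message lookup response.

    Falls back gracefully when the message was not proxied or the member is private.
    """
    member = (payload or {}).get("member") or {}
    return member.get("display_name") or member.get("name") or fallback


# Example with a hand-written payload shaped like the assumed response.
sample = {"member": {"name": "Vivi", "display_name": None}}
print(f"{speaker_label(sample)} says: [translation]")
```

The HTTP request itself would go through whatever async client the bot already uses; only the response handling is shown here.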

### Translation Quality Tracking
- **Description**: Monitor which translations get positive vs negative reactions
- **Complexity**: Medium
- **Why This Differentiates**: Enables continuous improvement and identifies ambiguous emojis needing clarification
- **Implementation**:
  - Store emoji usage statistics (frequency, accuracy ratings)
  - Provide an admin command, `/emoji-stats`, that lists problematic emojis
  - Optionally flag ambiguous emojis for human review
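
As an illustration of how reaction counts could become an accuracy rating, the sketch below uses a smoothed ratio so that a single ❌ does not drop an emoji to 0. The smoothing, the thresholds, and the names `accuracy_rating` and `needs_review` are assumptions, not part of the source design.

```python
def accuracy_rating(positive: int, negative: int, prior_weight: int = 2) -> float:
    """Smoothed share of positive reactions, between 0 and 1.

    With no reactions the rating is 0.5; it moves toward the raw ratio as votes accumulate.
    """
    return (positive + prior_weight / 2) / (positive + negative + prior_weight)


def needs_review(positive: int, negative: int, min_votes: int = 5, threshold: float = 0.6) -> bool:
    """Flag an emoji for human review once enough votes point to a poor translation."""
    total = positive + negative
    return total >= min_votes and accuracy_rating(positive, negative) < threshold


print(accuracy_rating(8, 2))  # 0.75
print(needs_review(2, 6))     # True
```

Requiring a minimum number of votes keeps brand-new meanings from being flagged after one stray reaction.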

### Global Dictionary Option (Future Differentiator)
- **Description**: Share emoji meanings across servers with opt-in
- **Complexity**: High
- **Why This Differentiates**: Could benefit other systems and communities; positions Vivi's system as a standard
- **Constraints**:
  - Privacy-first: Only share if opted-in by server admin
  - Vivi-specific: Focus on Vivi's emoji system, not generic emoji translation
  - Conflict resolution: Last-teaching-wins or voting system for disagreements

---

## Translation Approaches Analyzed

### Rule-Based Pattern Matching
**How it works**: Explicit rules define emoji → text mappings. New mappings must be explicitly added.

**Pros**:
- Fully deterministic and debuggable
- Fast performance (no ML overhead)
- Transparent to users (they can see why each emoji maps to its meaning)
- Easy to correct mistakes (just update the rule)

**Cons**:
- Requires explicit teaching for every emoji/sequence
- Doesn't generalize to patterns the bot hasn't seen
- Becomes unwieldy with hundreds of emoji combinations

### Statistical/ML-Based Approach
**How it works**: Train a model on known emoji → text pairs, predict meanings for unknown emojis using similarity or context.

**Pros**:
- Can handle novel emoji combinations through pattern inference
- Generalizes from limited training data
- Captures semantic relationships between emoji meanings

**Cons**:
- Black-box behavior ("why did the bot translate it that way?")
- Requires significant training data to work well
- Harder to debug when translations are wrong
- Users don't understand or trust the mechanism

### Recommended: Hybrid Approach (Rule-Based + Fallback)

**Phase 1 (MVP): Rule-Based with Community Learning**
- Start with hardcoded emoji meanings for common sequences
- Allow users to teach new emojis via `/teach` command
- 100% transparent: users know exactly why each translation happens
- Fast, reliable, debuggable

**Phase 2 (Future): Statistical Fallback**
- Analyze emoji usage patterns in learned meanings
- If an emoji appears in multiple compounds, infer partial meanings
- Use embedding-based similarity to suggest translations for unknown emoji sequences
- Always show confidence scores; require confirmation before using inferred meanings

**Phase 3 (Long-term): Continuous Learning**
- Track user corrections and positive/negative reactions
- Retrain fallback model on accepted vs rejected translations
- Identify consistently ambiguous emojis for human review
- Adjust translation format based on what's most helpful per server

**Why This Is Best**:
- Starts simple and user-friendly
- Scales to hundreds of emojis through learning
- Maintains trust through transparency
- Enables improvement over time without requiring ML expertise

---

## Message Handling Modes

Discord bots can respond to messages in different ways. Choose the approach that best serves Vivi's community:

### Mode 1: Automatic Translation (On Every Message)
**How it works**: Bot automatically translates every message from Vivi (or messages with emoji content)

**Pros**:
- Instant understanding without extra steps
- No friction for casual readers
- Good for channels where translation is the main purpose

**Cons**:
- Can be noisy in mixed-audience channels
- Spoils the "reading Vivi directly" experience for community members who prefer it
- Sends more messages, so the bot reaches Discord's rate limits sooner

**Best For**: Translation channels (#vivi-translated or similar)

### Mode 2: On-Demand Translation (Reaction or Command)
**How it works**: Users react with a specific emoji or use `/translate` command to request translation

**Pros**:
- Keeps channels clean by default
- Respects users who want to interpret emojis themselves
- Lower API usage
- More intentional interaction

**Cons**:
- Extra step for users
- May miss important messages if people forget to request
- Less discoverable for new community members

**Best For**: Social channels where emojis are part of the fun, not used solely for understanding

### Mode 3: Toggle (Per-Server Setting)
**How it works**: Server admins choose between automatic or on-demand via `/settings`

**Pros**:
- Respects different community preferences
- Maximizes adoption across servers with different cultures
- Can differentiate channels (auto in #vivi-translations, on-demand in #general)

**Cons**:
- More complex to implement
- Users must learn about settings

**Recommendation for V1**: Implement Mode 3 with default Mode 1 (automatic). Let server admins customize via:
- `/settings translation-mode [auto|on-demand]`
- `/settings translate-channels [list of channel IDs]` for auto mode
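
The mode decision can be reduced to a small pure function, sketched here under two assumptions that the source does not settle: an empty channel list means "auto-translate everywhere", and the bot should skip its own messages. `should_auto_translate` is a hypothetical name.

```python
def should_auto_translate(mode, auto_channels, channel_id, has_emoji, from_self=False):
    """Decide whether an incoming message gets an automatic translation.

    mode          -- "auto" or "on-demand" (the per-server translation-mode setting)
    auto_channels -- collection of channel IDs; empty means all channels (assumption)
    from_self     -- True when the bot itself sent the message
    """
    if from_self or not has_emoji:
        return False
    if mode != "auto":
        return False  # on-demand servers only translate via reaction or /translate
    return not auto_channels or channel_id in auto_channels


print(should_auto_translate("auto", {111, 222}, 111, has_emoji=True))   # True
print(should_auto_translate("on-demand", set(), 111, has_emoji=True))   # False
```

The check filters on "sent by this bot" rather than "sent by any bot" on purpose: messages proxied through webhooks would otherwise be ignored.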

### Implementation: Message Handling
- **Event**: Discord `on_message` event
- **Filter**: Check for emoji content using the regex `<a?:[a-zA-Z0-9_]{1,32}:[0-9]{17,20}>` for custom emojis (older snowflake IDs have 17 digits) plus a Unicode emoji check for standard emojis
- **Action**: Call translation engine, format output, send reply

### Command Interface: Slash Commands vs Prefix
**Recommendation**: Use **slash commands** as primary, offer hybrid support

**Why slash commands**:
- Modern Discord standard (easier discoverability)
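- No Message Content intent required for the commands themselves (better privacy); automatic translation (Mode 1) still needs that intent because the bot must read message text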
- Built-in autocomplete for parameters
- Better for `/teach :emoji: meaning` (emoji picker integration)

**Key slash commands** for Vivi:
- `/teach :emoji: meaning` — Add emoji to dictionary
- `/translate [emoji-string]` — Manually trigger translation
- `/what :emoji:` — Look up emoji meaning
- `/correct :emoji: new_meaning` — Fix a taught emoji
- `/settings` — Server configuration
- `/emoji-stats` — View accuracy/usage statistics

---

## Learning Interface & Feedback System

The learning system is crucial for scalability and adoption. Here's the recommended approach:

### Teaching Commands
```
/teach :emoji1: meaning
→ "Learned! 🎓 I'll translate :emoji1: as 'meaning'"

/teach :emoji1: :emoji2: compound meaning
→ "Learned! 🎓 I'll translate :emoji1: :emoji2: as 'compound meaning'"
```

### Correction System
```
/correct :emoji: new_meaning
→ "Updated! ✏️ I'll now translate :emoji: as 'new_meaning' (was 'old meaning')"
```

### Query System
```
/what :emoji:
→ "Emoji meaning for :emoji: is 'definition'\nTaught by @User on 2024-01-15\nUsed 47 times this month"
```
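
The three command flows above can be prototyped against an in-memory store before any Discord wiring exists. This is a sketch only: `EmojiDictionary` and `Meaning` are hypothetical names, and a real bot would persist entries per server instead of holding them in a dict.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Meaning:
    text: str
    taught_by: str
    taught_at: datetime
    usage_count: int = 0


class EmojiDictionary:
    """In-memory stand-in for the per-server dictionary (illustrative only)."""

    def __init__(self):
        self._entries = {}  # (guild_id, emoji) -> Meaning

    def teach(self, guild_id, emoji, meaning, user):
        self._entries[(guild_id, emoji)] = Meaning(meaning, user, datetime.now(timezone.utc))
        return f"Learned! I'll translate {emoji} as '{meaning}'"

    def correct(self, guild_id, emoji, new_meaning, user):
        old = self._entries.get((guild_id, emoji))
        if old is None:
            return f"I don't know {emoji} yet. Use /teach first."
        # Keep the usage count so statistics survive a correction.
        self._entries[(guild_id, emoji)] = Meaning(
            new_meaning, user, datetime.now(timezone.utc), old.usage_count
        )
        return f"Updated! I'll now translate {emoji} as '{new_meaning}' (was '{old.text}')"

    def what(self, guild_id, emoji):
        entry = self._entries.get((guild_id, emoji))
        if entry is None:
            return f"I don't know {emoji} yet. Use /teach to add it."
        return (
            f"Emoji meaning for {emoji} is '{entry.text}'\n"
            f"Taught by {entry.taught_by} on {entry.taught_at:%Y-%m-%d}\n"
            f"Used {entry.usage_count} times"
        )
```

Every method returns the confirmation text, which keeps the "always repeat back what was learned" rule in one place.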

### Validation & Confirmation
- Always repeat back what was learned
- Show who taught it and when (build community recognition)
- Optionally show confidence if from ML fallback
- Highlight ambiguous meanings if the same emoji has multiple teachings

### Feedback Mechanisms
1. **Reaction-based**: Users react with ✅/❌ to translations
   - Track which emojis get positive vs negative reactions
   - Identify consistently wrong translations for correction

2. **Correction Commands**: `/correct :emoji: new_meaning` explicitly fixes errors
   - Creates audit trail of meaning changes
   - Enables tracking learning over time

3. **Conflict Resolution**: If the same emoji has multiple teachings
   - Show all known meanings with vote counts
   - Use most recent teaching by default, surface conflicts
   - Option: `/disambiguate :emoji: choose_meaning` to select preferred one
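
A small sketch of the "most recent wins, but surface conflicts" rule. It assumes each teaching is stored as a `(meaning, taught_at)` pair and treats the number of times a meaning was taught as its vote count, which is one possible reading of "vote counts". `resolve_meaning` is a hypothetical name.

```python
from collections import Counter


def resolve_meaning(teachings):
    """Return (default_meaning, conflicts) for one emoji.

    teachings -- list of (meaning, taught_at) pairs; taught_at is any sortable timestamp
    conflicts -- {meaning: times_taught}, empty when everyone agrees
    """
    if not teachings:
        return None, {}
    # Default to the most recent teaching.
    latest = max(teachings, key=lambda t: t[1])[0]
    counts = Counter(meaning for meaning, _ in teachings)
    conflicts = dict(counts) if len(counts) > 1 else {}
    return latest, conflicts


default, conflicts = resolve_meaning([("happy", 1), ("excited", 3), ("happy", 2)])
print(default)    # excited
print(conflicts)  # {'happy': 2, 'excited': 1}
```

A non-empty `conflicts` result is the signal to show all meanings to users or to prompt for `/disambiguate`.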

### Best Practices
- **Community over individual**: Encourage anyone to teach, not just Vivi or admins
- **Transparency**: Always show source of taught meanings
- **Auditability**: Maintain history of meaning changes
- **Disambiguation**: Flag emojis with conflicting meanings early
- **Escalation**: Provide `/report-ambiguous :emoji:` for admin review

---

## Accessibility Considerations

Translation bots serve accessibility functions, so they must be accessible themselves:

### Text Accessibility
- **Plain text output**: Always provide plain text translations, not just embeds
- **No emoji-only responses**: Never respond with just emoji; always include text
- **Clear language**: Use simple, direct language in translations (avoid jargon)
- **Consistent formatting**: Same emoji always translates the same way (aids screen reader prediction)

### Discord Accessibility
- **Slash commands**: Easier for keyboard navigation than prefix commands
- **Accessible embeds**: If using embeds for formatted output:
  - Include plain text alternative in message content
  - Avoid using embeds for critical information
  - Note: Discord embeds cannot have alt-text for images—only use text-based embeds

### Screen Reader Compatibility
- **Emoji descriptions**: Include what each emoji is called (e.g., "woman technologist emoji")
- **Sequence clarity**: When translating compound sequences, explain the combination
- **No hidden information**: Never put crucial meaning in embed footers or nested fields

### Examples of Accessible Responses

**Good**:
```
Vivi: 👩‍💻📱
Bot: (woman technologist emoji, mobile phone emoji)
Translates to: "coding on phone" or "responding to work messages"
```

**Bad**:
```
Vivi: 👩‍💻📱
Bot: [embed with only emoji in footer, no text explanation]
```

### Implementation
- Test output with screen readers (NVDA, JAWS)
- Provide a more detailed text format for complex sequences through a `verbose` option on `/translate` (slash commands take named options rather than `--flags`)
- Include emoji names in debug/development output
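
The "Good" response format shown earlier can be produced by a plain-text formatter like the sketch below. The `names` lookup table is an assumption: in practice the names would come from an emoji library or a maintained table, and `format_accessible` is a hypothetical helper.

```python
def format_accessible(tokens, names, meaning):
    """Build a plain-text reply that names each emoji before giving the translation.

    tokens  -- emoji tokens in message order
    names   -- mapping from emoji to a spoken name (assumed to be supplied elsewhere)
    meaning -- the translated text
    """
    parts = []
    for token in tokens:
        name = names.get(token)
        # Fall back to the raw emoji so nothing is silently dropped.
        parts.append(f"{name} emoji" if name else f"unnamed emoji {token}")
    return f"({', '.join(parts)})\nTranslates to: \"{meaning}\""


names = {"👩‍💻": "woman technologist", "📱": "mobile phone"}
print(format_accessible(["👩‍💻", "📱"], names, "coding on phone"))
```

Because the output is ordinary message text, it stays readable for screen readers even when embeds are disabled or skipped.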

---

## Anti-Features (What NOT to Build)

These features sound good but should be avoided:

### Anti-Feature 1: Persistent Context Learning
- **What it is**: Bot infers emoji meanings from conversation context without explicit teaching
- **Why not**:
  - Creates non-deterministic behavior (same emoji means different things in different contexts)
  - Impossible to debug or correct
  - Users don't understand the bot's logic
  - High error rate leads to loss of trust
- **Better approach**: Explicit `/teach` commands only

### Anti-Feature 2: Cross-Discord Emoji Translation
- **What it is**: Translate emojis the same way across all Discord servers
- **Why not**:
  - Emoji meanings are highly personal and context-dependent
  - Vivi's system is specific to her community
  - Would bloat the dictionary with conflicting meanings
  - Not scalable for other users' emoji systems
- **Better approach**: Per-server dictionaries with optional public sharing for Vivi's specific system

### Anti-Feature 3: Real-Time Chat Simulation
- **What it is**: Bot attempts to continue Vivi's conversation or generate new emoji sequences
- **Why not**:
  - Out of scope (translation, not generation)
  - Risk of impersonation
  - Confusion about what Vivi actually said vs what bot generated
  - Community prefers Vivi's authentic communication
- **Better approach**: Stick to translating Vivi's actual messages

### Anti-Feature 4: Full NLP Context Analysis
- **What it is**: Use complex NLP to understand message context and vary translations
- **Why not**:
  - Over-engineering for the problem
  - Adds maintenance burden
  - Makes behavior unpredictable
  - Initial rule-based approach is more trustworthy
- **Better approach**: Simple context hints (channel type, time of day) with explicit teaching

---

## Configuration Per-Server

Different servers may have different translation preferences:

### Server Settings to Store
```yaml
guild_id: 12345678
settings:
  translation_mode: "auto"  # or "on-demand"
  auto_channels: [chan_id_1, chan_id_2]  # channels where auto-translation is enabled
  verbose_translations: false  # expand vs summarize
  show_confidence: false  # show certainty for learned meanings
  allow_community_teaching: true  # can non-mods teach emoji meanings?
  default_language: "en"  # future: support other languages
  include_emoji_names: true  # include emoji name for accessibility
```
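
In code, these settings could be modeled as a dataclass whose defaults mirror the YAML above. This is a sketch: `GuildSettings` and `settings_from_dict` are hypothetical names, and only `translation_mode` is validated here.

```python
from dataclasses import dataclass, field, fields

VALID_MODES = {"auto", "on-demand"}


@dataclass
class GuildSettings:
    guild_id: int
    translation_mode: str = "auto"
    auto_channels: list = field(default_factory=list)
    verbose_translations: bool = False
    show_confidence: bool = False
    allow_community_teaching: bool = True
    default_language: str = "en"
    include_emoji_names: bool = True

    def __post_init__(self):
        if self.translation_mode not in VALID_MODES:
            raise ValueError(f"translation_mode must be one of {sorted(VALID_MODES)}")


def settings_from_dict(guild_id, raw):
    """Build settings from stored key-value data, ignoring unknown keys."""
    known = {f.name for f in fields(GuildSettings)} - {"guild_id"}
    return GuildSettings(guild_id=guild_id, **{k: v for k, v in raw.items() if k in known})


settings = settings_from_dict(12345678, {"translation_mode": "on-demand", "legacy_key": 1})
print(settings.translation_mode, settings.include_emoji_names)
```

Ignoring unknown keys lets older stored settings load cleanly after a field is renamed or removed.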

### Database Schema
```
Emoji_Meanings:
  id: UUID
  guild_id: int (64-bit Discord snowflake)
  emoji_unicode: str (or custom_emoji_id)
  meaning: str
  taught_by_user_id: int (64-bit Discord snowflake)
  taught_at: timestamp
  usage_count: int
  accuracy_rating: float (0-1, from reactions)
  is_sequence: bool
  confidence: float (1.0 for taught, < 1.0 for inferred)

Guild_Settings:
  guild_id: int (64-bit Discord snowflake)
  translation_mode: str
  auto_channels: json
  ...

Emoji_Statistics:
  emoji_id: UUID (references Emoji_Meanings.id)
  guild_id: int (64-bit Discord snowflake)
  total_uses: int
  positive_reactions: int
  negative_reactions: int
  conflicts_count: int
  last_updated: timestamp
```
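
For the small-scale SQLite option, part of the schema above might translate to something like the following. This is a sketch using Python's built-in `sqlite3`: table and column names are adapted from the outline, `emoji_key` stands in for "Unicode emoji or custom emoji ID", and keeping every teaching as its own row (latest wins on lookup) is an assumption made so that a history of changes is preserved.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_meanings (
    id                TEXT PRIMARY KEY,   -- UUID stored as text
    guild_id          INTEGER NOT NULL,   -- SQLite INTEGER holds 64-bit snowflakes
    emoji_key         TEXT NOT NULL,      -- Unicode emoji, custom emoji ID, or sequence
    meaning           TEXT NOT NULL,
    taught_by_user_id INTEGER NOT NULL,
    taught_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    usage_count       INTEGER NOT NULL DEFAULT 0,
    is_sequence       INTEGER NOT NULL DEFAULT 0,
    confidence        REAL NOT NULL DEFAULT 1.0
);
CREATE INDEX IF NOT EXISTS idx_meanings_lookup
    ON emoji_meanings (guild_id, emoji_key);
"""


def init_db(conn):
    conn.executescript(SCHEMA)


def teach(conn, guild_id, emoji_key, meaning, user_id):
    """Record a teaching as a new row so earlier meanings remain auditable."""
    conn.execute(
        "INSERT INTO emoji_meanings (id, guild_id, emoji_key, meaning, taught_by_user_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), guild_id, emoji_key, meaning, user_id),
    )
    conn.commit()


def lookup(conn, guild_id, emoji_key):
    """Return the most recently taught meaning for this server, or None."""
    row = conn.execute(
        "SELECT meaning FROM emoji_meanings "
        "WHERE guild_id = ? AND emoji_key = ? ORDER BY rowid DESC LIMIT 1",
        (guild_id, emoji_key),
    ).fetchone()
    return row[0] if row else None
```

Scoping every query by `guild_id` is what keeps dictionaries per-server; moving to PostgreSQL later would mainly change column types and the connection handling.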

### Database Choice Recommendation
- **Development/Small Scale** (< 100 servers): SQLite (with the Keyv abstraction if building on discord.js)
- **Production** (100+ servers): PostgreSQL with connection pooling
- **Real-Time Stats** (future): Redis for caching popular emoji definitions

### Configuration Commands
```
/settings translation-mode [auto|on-demand]
/settings auto-channels [#channel-list]
/settings verbose [true|false]
/settings allow-teaching [true|false]
```

---

## Implementation Roadmap

### MVP (Phase 1): Foundation
- **Features**: Message detection, basic emoji dictionary, `/teach` command, on-demand translation
- **Complexity**: Medium
- **Timeline**: 2-3 weeks
- **Core**: Rule-based translations only; no ML

### V1 (Phase 2): Polished & Learning-Ready
- **Add**: Per-server settings, `/what` query, `/correct` command, reaction-based feedback
- **Add**: Accessibility improvements (emoji names, plain text)
- **Add**: Basic statistics (`/emoji-stats`)
- **Timeline**: 3-4 weeks
- **Focus**: Community testing and meaning refinement

### V2 (Phase 3): Smart Fallback (Optional)
- **Add**: Statistical fallback for unknown emoji sequences
- **Add**: Confidence scores for inferred meanings
- **Add**: Emoji conflict detection and disambiguation
- **Add**: Optional global dictionary sharing
- **Timeline**: 4-6 weeks
- **Focus**: Scalability and reduced manual maintenance

---

## Summary

The Vivi Speech Translator should be built as a **hybrid system**:

1. **Start with deterministic, rule-based translation** that's fully transparent and debuggable
2. **Enable community learning** via simple `/teach` commands that grow the dictionary organically
3. **Provide feedback mechanisms** (reactions, corrections) to improve accuracy over time
4. **Remain focused** on Vivi's specific emoji system, not generic emoji translation
5. **Prioritize accessibility** since translation itself is an accessibility feature
6. **Leave room for future ML enhancement** but don't build it until needed

The core differentiator is not sophisticated AI, but **intentional design for learning and community participation**. The bot becomes more valuable as more people teach it, creating network effects that benefit the whole community.

### Recommended Feature Set for V1
- ✅ Message detection and emoji parsing
- ✅ Rule-based translation with `/teach` command
- ✅ Per-server configuration (auto vs on-demand mode)
- ✅ Correction and query commands (`/what`, `/correct`)
- ✅ Reaction-based feedback (✅/❌)
- ✅ Accessible output (plain text, emoji names)
- ✅ PluralKit integration (if Vivi's community uses it)
- ⏭️ Statistics dashboard beyond the basic `/emoji-stats` command (after V1)
- ⏭️ ML fallback (Phase 3+, if needed)

---

## Sources & References

### Discord Bot Development
- [Best Discord Bots in 2026: Complete Guide](https://blog.communityone.io/best-discord-bots/)
- [Storing data with Keyv | discord.js Guide](https://discordjs.guide/keyv/)
- [Awaiting Messages & Reactions · A Guide to Discord Bots](https://maah.gitbooks.io/discord-bots/content/getting-started/awaiting-messages-and-reactions.html)
- [Discord.js - Responding to Messages](https://cratecode.com/info/discordjs-responding-to-messages)

### Slash Commands & Command Interfaces
- [Slash command prefixes · discord/discord-api-docs](https://github.com/discord/discord-api-docs/discussions/3744)
- [Discord Interactions | Pycord Guide](https://guide.pycord.dev/interactions)

### Emoji & NLP Processing
- [NLP Series: Day 5 — Handling Emojis: Strategies and Code Implementation](https://medium.com/@ebimsv/nlp-series-day-5-handling-emojis-strategies-and-code-implementation-0f8e77e3a25c)
- [Assessing Emoji Use in Modern Text Processing Tools](https://arxiv.org/pdf/2101.00430)
- [Emojinize: Enriching Any Text with Emoji Translations](https://arxiv.org/html/2403.03857v2)

### Translation Approaches
- [Hybrid Translation with Classification: Revisiting Rule-Based and Neural Machine Translation](https://www.mdpi.com/2079-9282/9/2/201)
- [Rule-Based Machine Translation - Wikipedia](https://en.wikipedia.org/wiki/Rule-based_machine_translation)
- [Rule Based Approach in NLP - GeeksforGeeks](https://www.geeksforgeeks.org/nlp/rule-based-approach-in-nlp/)

### Accessibility
- [Discord: Accessibility in Web Apps Done Right](https://a11yup.com/articles/discord-accessibility-in-web-apps-done-right/)
- [Discord Accessibility for blind users](https://support.discord.com/hc/en-us/community/posts/360032435152-Discord-Accessibility-for-blind-users)
- [Using a Screen Reader on Discord](https://support.discord.com/hc/en-us/articles/7180791233559-Using-a-Screen-Reader-on-Discord)
- [GitHub - 9vult/Raiha: Raiha Discord Accessibility Bot](https://github.com/9vult/Raiha)

### PluralKit Integration
- [PluralKit - System Management Bot](https://pluralkit.me/)
- [Navigating PluralKit: A Guide to Discord's Unique Bot for System Management](https://www.oreateai.com/blog/navigating-pluralkit-a-guide-to-discord-unique-bot-for-system-management-31ce1863fda39661189c6b8c031c864b)
- [GitHub - PluralKit/PluralKit](https://github.com/PluralKit/PluralKit)

### Database & Configuration
- [How to Create a Database for Your Discord Bot](https://cybrancee.com/learn/knowledge-base/how-to-create-a-database-for-your-discord-bot/)
- [How I Host a Bot in 45,000 Discord Servers For Free](https://dev.to/mistval/how-i-host-a-bot-in-45000-discord-servers-for-free-5bk9)

### Feedback Systems
- [Automating User Feedback Monitoring on Discord Using AI](https://cohere.com/blog/automating-user-feedback-monitoring-on-discord-using-ai)
- [From Discord Chaos to Organized Feedback](https://betahub.io/blog/guides/2025/07/16/discord-community-feedback.html)