docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
Dani B
2026-01-29 11:02:32 -05:00
parent 2f5131434e
commit 901574f8c8
8 changed files with 3559 additions and 0 deletions

View File

@@ -0,0 +1,774 @@
# Architecture Research: Vivi Speech Translator
## Overview
Vivi Speech Translator is a Discord bot that detects emoji-based messages proxied by PluralKit, parses emoji sequences, looks up their meanings in a persistent global dictionary, and replies with natural language translations. The bot must operate across multiple servers, handle both channel and DM messages, and learn new emoji meanings over time.
This document outlines the recommended high-level architecture, component responsibilities, data flows, and scaling strategies.
---
## Core Components
### 1. Discord Client
**Responsibility:** Establish and maintain the connection to Discord's API and WebSocket.
**Key Details:**
- Uses `discord.Client` or `discord.ext.commands.Bot` from discord.py library
- Requires `Intents` configuration to specify which events the bot listens for:
- `message_content` intent: Required to read message text (privileged intent, requires approval)
- `guilds` intent: Track guild membership and changes
- `direct_messages` intent: Listen for DMs
- `dm_messages` intent: Read DM message content
- Initializes on startup and runs the main event loop via `client.run(token)`
- Handles connection failures and automatic reconnection
**Why This Matters:** Discord's event-driven architecture means the Client is the foundation—without it, the bot cannot receive any messages or respond to events.
---
### 2. Message Event Handler
**Responsibility:** Receive all messages, filter for relevance, and route to downstream processors.
**Key Details:**
- Implements `on_message` event in discord.py (async callback)
- Filters for:
1. **Webhook Detection:** Check if `message.webhook_id` is not None (indicates a proxied message)
2. **PluralKit Verification:** Query PluralKit API to confirm message was proxied by PluralKit (not another webhook system)
3. **Vivi Detection:** Check if the `member_id` in the PluralKit response matches Vivi's registered member ID
4. **Bot Self-Filter:** Ignore messages from Vivi Speech Translator bot itself
- Routes confirmed Vivi messages to the Emoji Parser
- Handles both guild channels and DMs
**PluralKit Detection Approach:**
When a message is received, the bot can query the PluralKit API using the message ID:
```
GET https://api.pluralkit.me/v2/messages/{message_id}
```
This returns a Message object containing:
- `member`: The member object that proxied the message (contains member_id, name, avatar, etc.)
- `sender`: The original user ID that sent the command (the account owner)
- `system`: The system that manages the members
- `timestamp`: When the message was sent
- `guild`: The guild ID where the message was sent
- `channel`: The channel ID where the message was sent
By checking if `response.member.id == vivi_member_id`, the bot can verify Vivi specifically sent the message.
**Rate Limiting:** PluralKit API has a 10/second rate limit for message lookups. The bot should handle rate limit responses gracefully with exponential backoff.
---
### 3. Emoji Parser
**Responsibility:** Extract and categorize emojis from a message into a structured sequence.
**Key Details:**
- Receives the confirmed Vivi message text from the Message Event Handler
- Uses regex patterns to extract:
1. **Unicode Emojis:** Standard emoji characters (😷, ❌, etc.)
- Pattern: `\p{Extended_Pictographic}` (matches full Unicode emoji range)
- Alternative Python regex: `([\u00a9\u00ae\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff]|\ufe0f)`
2. **Custom Server Emojis:** Discord custom emoji format `<:emoji_name:emoji_id>` or `<a:emoji_name:emoji_id>` (for animated)
- Pattern: `<a?:[^:\s]+:\d+>`
- Preserves order of emojis as they appear left-to-right
- Returns a structured list like: `[{type: "emoji", value: "😷", id: None}, {type: "custom", value: "me1", id: "123456789"}]`
- Handles edge cases:
- Emoji skin tone modifiers
- Zero-width joiners (ZWJ sequences like family emojis)
- Emoji variations
**Why This Order Matters:** The project spec notes that emoji sequences are compositional and context-dependent. Preserving order and distinguishing types allows the Translation Engine to understand the full intended meaning.
---
### 4. Translation Engine
**Responsibility:** Convert emoji sequences into natural language using the emoji dictionary.
**Key Details:**
- Receives structured emoji list from Emoji Parser
- For each emoji:
1. Look up its meaning in the Emoji Dictionary (database)
2. Handle three cases:
- **Known emoji:** Include its meaning in output
- **Unknown emoji:** Display the emoji itself with a placeholder or skip
- **Custom emoji:** Look up by custom emoji ID in database
- Generates natural language output:
- If all emojis are known: Compose as a sentence ("Vivi is sick, but not in the sinuses")
- If some are unknown: Format as: "Known meanings: ... [Unknown emoji] ..."
- If none are known: Reply: "I don't know what these emojis mean yet. You can teach me with the `/teach` command."
- Considers emoji context (e.g., combination of emojis might have a specific meaning)
**Output Format:** The bot should reply in a Discord message, either in the same channel (if public) or as a DM (if DM context).
---
### 5. Database Layer
**Responsibility:** Store and retrieve persistent data (emoji dictionary and server configurations).
**Key Details:**
- **Tech Stack:** SQLAlchemy ORM with PostgreSQL for production reliability
- **Async Support:** Use `sqlalchemy.ext.asyncio` or `asyncpg` to avoid blocking the Discord event loop
- **Initialization:** Override `Bot.start()` or use a `setup_hook` to connect to database on startup
- **Connection Pooling:** Configure connection pool to handle concurrent requests from message handlers
**Two Core Tables:**
1. **emoji_dictionary**
- `emoji_string` (TEXT, PRIMARY KEY): The emoji character(s) or custom emoji format
- `custom_emoji_id` (BIGINT, NULLABLE): Discord custom emoji ID (if custom emoji)
- `meaning` (TEXT): The learned meaning
- `created_at` (TIMESTAMP): When first learned
- `updated_at` (TIMESTAMP): Last update time
- `updated_by_user_id` (BIGINT, NULLABLE): User ID of who taught/corrected this
- `updated_by_member_id` (TEXT, NULLABLE): PluralKit member ID (e.g., Vivi's ID)
- `created_in_guild` (BIGINT, NULLABLE): Guild ID where first learned (for tracking origin, optional)
- Indexes: emoji_string (for fast lookups), custom_emoji_id (for custom emoji queries)
2. **server_configuration**
- `guild_id` (BIGINT, PRIMARY KEY): Discord server ID
- `auto_translate` (BOOLEAN, DEFAULT TRUE): Auto-translate all Vivi messages or require `/translate` command
- `created_at` (TIMESTAMP): When server config created
- Updated by Configuration Command Handler
**Important Design Decisions:**
- **Global dictionary:** Emoji meanings are shared across all servers. Different systems can update meanings, but there's a single source of truth per emoji.
- **Per-server config:** Each server has its own settings (auto vs. on-demand mode).
- **User attribution:** Track who taught each emoji for transparency and conflict resolution.
- **No per-server emoji variants:** The spec intends a global dictionary, so "😷" means the same thing everywhere. Per-server overrides could be added later if needed.
---
### 6. Command Handler (Teaching & Configuration)
**Responsibility:** Process bot commands for teaching emojis and configuring server behavior.
**Key Details:**
- **Tech:** Discord.py `commands.Cog` extension for modular command organization
- **Commands to Implement:**
1. `/teach <emoji_sequence> <meaning>`
- Extract emojis from the sequence using the Emoji Parser
- Insert or update each emoji in the database
- Confirm: "Learned: 😷 = sick, 2⃣ = two, etc."
- Only Vivi or approved users should be able to teach (can be restricted by user role or system authentication)
2. `/forget <emoji>`
- Delete emoji from dictionary
- Confirm deletion
3. `/meaning <emoji>`
- Look up and reply with the meaning of a specific emoji
- If unknown, reply: "I don't know that one yet."
4. `/config auto-translate <on|off>`
- Update `server_configuration.auto_translate` in database
- Only server admins can change this
- Requires guild context (won't work in DMs)
5. `/translate <emoji_sequence>` (On-demand mode)
- Manually trigger translation of an emoji sequence
- Works in both channels and DMs
- **Error Handling:**
- Graceful failures if database is unavailable
- Clear user feedback for invalid emoji sequences
- Require proper permissions for sensitive commands
---
### 7. Configuration Layer
**Responsibility:** Load bot configuration (token, database connection string, etc.) at startup.
**Key Details:**
- Use environment variables for secrets: `DISCORD_TOKEN`, `DATABASE_URL`, `PLURALKIT_TOKEN` (optional, for user-specific API calls)
- Configuration file (e.g., `config.json` or `.env`) for non-secret settings:
- Vivi's member ID (to filter for her messages specifically)
- Default auto-translate mode
- Logging level
- Initialize Config before starting the bot
---
## Data Flow
### Message Reception to Response
```
┌──────────────────────────────────────────────────────────────┐
│ Discord Message Event │
│ (user posts message proxied by PluralKit webhook) │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Message Event Handler │
│ - Check: webhook_id != None? │
│ - Query: PluralKit API for message info │
│ - Verify: member_id == Vivi's ID? │
│ - Filter: Ignore self-messages │
└──────────────────────────────────────────────────────────────┘
(YES, Vivi's message)
┌──────────────────────────────────────────────────────────────┐
│ Emoji Parser │
│ - Extract emojis with regex │
│ - Categorize: Unicode vs. custom │
│ - Preserve order │
│ - Output: [{type, value, id}, ...] │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Translation Engine │
│ - For each emoji: lookup in database │
│ - Compose natural language │
│ - Handle unknown emojis │
│ - Format response │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Database (emoji_dictionary) │
│ - O(1) lookup by emoji_string (hash indexed) │
│ - Return: meaning, metadata │
└──────────────────────────────────────────────────────────────┘
(Lookup Results)
┌──────────────────────────────────────────────────────────────┐
│ Response Formatting │
│ - Compose message │
│ - Check context: channel vs. DM │
│ - Apply server config: auto-translate mode │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Discord API Response │
│ - Send reply to channel or DM │
│ - Handle rate limits │
│ - Log interaction │
└──────────────────────────────────────────────────────────────┘
```
### Teaching Flow (Command-Driven)
```
┌──────────────────────────────────────────────────────────────┐
│ User runs: /teach 😷 2⃣ "Vivi is sick, not sinuses" │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Command Handler │
│ - Parse command arguments │
│ - Authenticate: Is user authorized to teach? │
│ - Extract emojis from sequence │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Database Layer │
│ - INSERT or UPDATE emoji_dictionary │
│ - Set: emoji_string, meaning, updated_by, timestamp │
│ - Commit transaction │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Confirmation Reply │
│ - "Learned: 😷 = sick, 2⃣ = two" │
│ - Post in same context (channel or DM) │
└──────────────────────────────────────────────────────────────┘
```
---
## PluralKit Integration Details
### Detection Approach
1. **Webhook Detection (First Filter):**
- Check `message.webhook_id` property in discord.py
- If not None, message was sent via webhook (PluralKit proxy)
2. **PluralKit API Query (Confirmation):**
- Query endpoint: `GET https://api.pluralkit.me/v2/messages/{message_id}`
- The `message_id` can be the webhook message ID or the original message ID (original works for 30 minutes)
- Parse response to get `member` object
3. **Member Verification:**
- Extract `member.id` from API response
- Compare with Vivi's known member ID (from config)
- If match: Process as Vivi's message
- If not match: Ignore (message from another member)
4. **Alternative: Member Names (Backup):**
- If using member ID fails, fall back to checking `member.name`
- Look for "Vivi" or configured member name
### API Endpoints Used
| Endpoint | Purpose | Rate Limit | Response |
|----------|---------|-----------|----------|
| `GET /v2/messages/{message}` | Get proxied message info | 10/sec | Message object with member, sender, guild, channel, timestamp |
| `GET /v2/systems/@me` | Get authenticated system info | 10/sec | Full system + members (requires token) |
| `GET /v2/members/{member}` | Get specific member info | 10/sec | Member object with proxy tags, avatar, etc. |
**Authentication (Optional):**
- Public queries (member lookup) don't require authentication
- System-specific queries (private member settings) require system token via `Authorization: Bearer {token}` header
- For Vivi's system, store the system token in environment variable `PLURALKIT_TOKEN` for authenticated access
### Implementation in discord.py
```python
import aiohttp
async def check_vivi_message(message: discord.Message, vivi_member_id: str) -> bool:
"""Check if message was proxied by Vivi via PluralKit."""
# Step 1: Check if message is from a webhook
if message.webhook_id is None:
return False # Not proxied
# Step 2: Query PluralKit API
async with aiohttp.ClientSession() as session:
try:
async with session.get(
f"https://api.pluralkit.me/v2/messages/{message.id}"
) as resp:
if resp.status != 200:
return False # Not a PluralKit message
data = await resp.json()
# Step 3: Check member ID matches Vivi
if data.get("member", {}).get("id") == vivi_member_id:
return True
else:
return False
except Exception as e:
# Log error, but don't crash
print(f"PluralKit API error: {e}")
return False
```
---
## Database Schema
### emoji_dictionary Table
```sql
CREATE TABLE emoji_dictionary (
emoji_string TEXT PRIMARY KEY,
custom_emoji_id BIGINT NULLABLE,
meaning TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_by_user_id BIGINT NULLABLE,
updated_by_member_id TEXT NULLABLE,
created_in_guild BIGINT NULLABLE
);
CREATE INDEX idx_emoji_string ON emoji_dictionary(emoji_string);
CREATE INDEX idx_custom_emoji_id ON emoji_dictionary(custom_emoji_id);
```
### server_configuration Table
```sql
CREATE TABLE server_configuration (
guild_id BIGINT PRIMARY KEY,
auto_translate BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_guild_id ON server_configuration(guild_id);
```
### Alternative: SQLAlchemy ORM Definitions
```python
from sqlalchemy import Column, String, BigInteger, Boolean, DateTime, Text
from sqlalchemy.ext.declarative import declarative_base
from datetime import datetime
Base = declarative_base()
class EmojiDictionary(Base):
__tablename__ = "emoji_dictionary"
emoji_string = Column(String, primary_key=True)
custom_emoji_id = Column(BigInteger, nullable=True)
meaning = Column(Text, nullable=False)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
updated_by_user_id = Column(BigInteger, nullable=True)
updated_by_member_id = Column(String, nullable=True)
created_in_guild = Column(BigInteger, nullable=True)
class ServerConfiguration(Base):
__tablename__ = "server_configuration"
guild_id = Column(BigInteger, primary_key=True)
auto_translate = Column(Boolean, default=True)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
```
---
## Suggested Build Order
### Phase 1: Foundation (Week 1-2)
**Goal:** Get Vivi messages detected and logged.
1. **Set up Discord bot:**
- Create Discord application and token
- Initialize discord.py Client/Bot with required Intents
- Implement basic `on_message` event handler
- Test basic logging
2. **Implement PluralKit detection:**
- Add webhook detection (check `message.webhook_id`)
- Add PluralKit API query and member verification
- Log when Vivi messages are detected
- Handle API errors gracefully
3. **Database initialization:**
- Set up PostgreSQL database
- Create emoji_dictionary and server_configuration tables
- Test connection from bot
**Deliverables:** Bot logs every Vivi message to console; doesn't respond yet.
---
### Phase 2: Emoji Parsing & Translation (Week 3-4)
**Goal:** Translate Vivi's emojis to text.
1. **Emoji parsing:**
- Implement regex patterns for Unicode and custom emojis
- Extract emoji sequences in order
- Test with various emoji types
2. **Basic emoji lookup:**
- Query emoji_dictionary table
- Return meanings for known emojis
- Handle unknown emojis
3. **Response formatting:**
- Compose natural language from emoji meanings
- Send reply to channel/DM
- Handle edge cases (no emojis, all unknown)
4. **Manual testing:**
- Create test emojis in database
- Post Vivi messages and verify translations
**Deliverables:** Bot translates Vivi messages; appears in channels and DMs.
---
### Phase 3: Teaching Commands (Week 5-6)
**Goal:** Allow users to teach the bot emoji meanings.
1. **Implement `/teach` command:**
- Parse emoji sequences
- Insert into database with metadata
- Confirm to user
2. **Implement `/meaning` command:**
- Look up single emoji
- Reply with meaning or "not learned yet"
3. **Implement `/forget` command:**
- Delete emoji from database
- Require admin or Vivi permission
4. **Permission system:**
- Restrict teaching to authorized users (Vivi + alters)
- Use Discord roles or user ID allowlist
**Deliverables:** Users can teach and query emoji meanings via commands.
---
### Phase 4: Per-Server Configuration (Week 7)
**Goal:** Allow servers to opt into/out of auto-translation.
1. **Implement `/config auto-translate` command:**
- Toggle auto-translate on/off per server
- Requires admin permission
- Only works in guild context (not DMs)
2. **Update message handler:**
- Check server config before auto-translating
- Only reply if auto_translate == TRUE
- In DMs, always translate when `/translate` is used
3. **On-demand translation:**
- `/translate` command for manual translation
- Works in any context
**Deliverables:** Servers can control translation behavior; bot respects preferences.
---
### Phase 5: Polish & Edge Cases (Week 8+)
**Goal:** Handle real-world complexity.
1. **Natural language formatting:**
- Improve composition of translations
- Handle emoji modifiers (skin tones, ZWJ sequences)
- Custom emoji descriptions
2. **Error handling & resilience:**
- Database unavailability
- PluralKit API failures
- Rate limiting with exponential backoff
- Graceful degradation
3. **Logging & monitoring:**
- Structured logging for debugging
- Monitor database performance
- Track API error rates
4. **Codebase refactoring:**
- Move commands to separate Cogs
- Organize into modules: `cogs/teaching.py`, `cogs/config.py`, etc.
- Add docstrings and type hints
5. **Testing:**
- Unit tests for emoji parsing
- Integration tests for database queries
- End-to-end tests with Discord
**Deliverables:** Robust, maintainable codebase ready for production.
---
## Scaling Considerations
### Multi-Server Architecture
**Challenge:** Bot will operate in many Discord servers simultaneously, each with potentially thousands of members.
**Solution:**
1. **Shared Emoji Dictionary:**
- Single global PostgreSQL database with all emoji meanings
- All servers query the same emoji_dictionary table
- Updates are reflected across all servers immediately
- Reduces redundancy and keeps meanings consistent
2. **Per-Server Configuration:**
- Each guild has its own row in server_configuration
- Fast lookup by guild_id (indexed)
- Allows servers to choose auto-translate vs. on-demand
3. **Connection Pooling:**
- SQLAlchemy async engine with `pool_size=20, max_overflow=10` (tunable)
- Reuses database connections across handlers
- Prevents connection exhaustion under load
### Performance Optimization
1. **Emoji Lookup Performance:**
- Primary key index on emoji_dictionary.emoji_string for O(1) lookup
- Secondary index on custom_emoji_id for custom emoji queries
- Consider in-memory cache (Redis) if lookups become bottleneck:
- Query Redis first (1ms latency)
- Fall back to PostgreSQL
- Invalidate cache on updates
2. **Caching Strategy (Optional, Post-MVP):**
- Use Redis for frequently accessed emojis
- TTL: 1 hour (emoji meanings change rarely)
- Invalidate cache when `/teach` or `/forget` commands update dictionary
- Benefits: Reduced database load, lower latency
3. **Rate Limiting:**
- PluralKit API: 10 requests/second (already enforced by API)
- Discord API: 50 requests/minute per channel (built into discord.py)
- Implement local rate limiting with `asyncio.Semaphore` for PluralKit queries:
```python
semaphore = asyncio.Semaphore(5) # Max 5 concurrent PluralKit queries
```
4. **Message Handler Optimization:**
- Webhook detection (local check): ~0ms
- PluralKit API query: ~100-200ms (async, non-blocking)
- Emoji parsing (regex): ~1-5ms
- Database lookup: ~1-50ms
- **Total:** ~100-250ms per message (acceptable, happens in background)
### Scaling Beyond 2,000 Guilds
**Discord Requirement:** Bots with 2,000+ guilds must implement sharding.
**Sharding in discord.py:**
- discord.py handles sharding automatically if configured
- Bot distributes connections across multiple "shards" to different Discord servers
- Each shard handles a subset of guilds
- Emoji dictionary remains shared across all shards (single database)
**Example Configuration:**
```python
intents = discord.Intents.default()
bot = discord.AutoShardedBot(intents=intents) # Automatic sharding
# Bot will shard automatically based on guild count
```
### Database Scaling
**For Millions of Emojis:**
- Table partitioning by emoji language/category (if dict grows huge)
- Read replicas for queries (if read-heavy)
- Consider denormalization (e.g., cache popular emoji meanings in memory)
**Current Recommendation:** Single PostgreSQL database is sufficient for MVP. Scale if needed post-launch.
---
## Component Interaction Diagram
```
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Discord Server (Guild) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ #general │ │
│ │ Vivi (proxied): 😷 2⃣ 🍑 ❌ 🤧 │ │
│ │ Vivi Speech Translator: "Vivi is sick, but not in ..." │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
↑ (message event)
┌──────────────────────────┴──────────────────────────────────────┐
│ Discord.py Bot Framework │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Discord Client │ │
│ │ - Maintains WebSocket connection │ │
│ │ - Routes events to handlers │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Message Event Handler (on_message) │ │
│ │ - Filter: webhook_id? │ │
│ │ - Query: PluralKit API │ │
│ │ - Verify: member_id == Vivi? │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Emoji Parser │ │
│ │ - Extract emojis (regex) │ │
│ │ - Categorize (Unicode/custom) │ │
│ │ - Preserve order │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Translation Engine │ │
│ │ - Lookup emojis in database │ │
│ │ - Compose natural language │ │
│ │ - Format response │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Command Handler (Cogs) │ │
│ │ - /teach: Learn emoji meanings │ │
│ │ - /meaning: Look up emoji │ │
│ │ - /forget: Delete emoji │ │
│ │ - /config: Server preferences │ │
│ │ - /translate: Manual translation │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────┬──────────────────────────────────┘
↓ (database queries/updates)
┌──────────────────────────────────────────────────────────────────┐
│ Database Layer │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ SQLAlchemy ORM + asyncio │ │
│ │ - Async connection pool │ │
│ │ - Connection reuse │ │
│ │ - Transaction management │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL Database │ │
│ │ │ │
│ │ emoji_dictionary: server_configuration: │ │
│ │ ├─ 😷 → sick ├─ guild_123 → auto ON │ │
│ │ ├─ 2⃣ → two ├─ guild_456 → auto OFF │ │
│ │ ├─ 🍑 → peach └─ guild_789 → auto ON │ │
│ │ ├─ ❌ → no │ │
│ │ └─ :me1: → Vivi (+ metadata, timestamps) │ │
│ │ (+ metadata, timestamps) │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
External APIs:
┌────────────────────────────────────────────────────────────┐
│ PluralKit API (api.pluralkit.me) │
│ - GET /v2/messages/{id} → member info │
└────────────────────────────────────────────────────────────┘
```
---
## Technology Stack Recommendations
| Layer | Component | Technology | Why |
|-------|-----------|-----------|-----|
| **Bot Framework** | Discord Integration | discord.py 2.x | Async-native, active community, rich feature set |
| **Database ORM** | Persistence | SQLAlchemy 2.0 + asyncio | Async support, type-safe, widely adopted |
| **Database** | Data Store | PostgreSQL | Reliable, open-source, JSONB for future extensibility |
| **Async Runtime** | Concurrency | asyncio (built-in) | Lightweight, integrated with discord.py |
| **Caching** | Performance (Phase 5+) | Redis | Fast in-memory lookups, TTL support, distributed |
| **Logging** | Debugging | Python logging module | Built-in, structured logging can extend |
| **API Requests** | HTTP Calls | aiohttp | Async-native, connection pooling |
| **Testing** | Quality Assurance | pytest + pytest-asyncio | Async test support, fixtures |
| **Deployment** | Hosting | Docker + systemd or cloud | Reproducible environment, easy updates |
---
## Summary
**Recommended Architecture:**
Vivi Speech Translator is a modular Discord bot with a clear separation of concerns:
1. **Discord Client** listens for messages and routes them through a detection pipeline
2. **Message Event Handler** identifies when Vivi speaks (via PluralKit webhook + API verification)
3. **Emoji Parser** extracts emoji sequences while preserving order and type information
4. **Translation Engine** looks up meanings and composes responses
5. **Database Layer** (PostgreSQL + SQLAlchemy) stores a shared global emoji dictionary and per-server configurations
6. **Command Handler** (discord.py Cogs) allows teaching, querying, and configuration
The bot prioritizes:
- **Reliability:** Graceful error handling, retry logic, database transactions
- **Performance:** O(1) emoji lookups via indexing, async operations to avoid blocking, caching for scale
- **Scalability:** Shared emoji dictionary, per-server configs, optional Redis caching, Discord sharding support
- **Maintainability:** Modular Cog architecture, clear component boundaries, comprehensive logging
Build in phases: detection → parsing → translation → teaching → configuration → polish. This delivers value early (Phase 2) while establishing the foundation for features.
The bot can grow from a single server to thousands, limited primarily by PluralKit API rate limits (easily worked around) and database performance (PostgreSQL handles millions of rows efficiently).
---
## Related Documentation
- [PluralKit API Reference](https://pluralkit.me/api/)
- [discord.py Documentation](https://discordpy.readthedocs.io/)
- [SQLAlchemy Async Documentation](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html)
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
- [Redis Documentation](https://redis.io/documentation)

View File

@@ -0,0 +1,546 @@
# Features Research: Vivi Speech Translator
## Executive Summary
The Vivi Speech Translator should combine table-stakes Discord bot features with a hybrid translation approach that starts rule-based but can learn and improve over time. The core differentiator is intelligent emoji-to-text translation that understands Vivi's unique communication system, enabled by a learning mechanism that teaches the bot new emoji meanings while maintaining deterministic, debuggable behavior for known sequences.
## Table Stakes Features
These are features Discord bot users expect and that are necessary for basic functionality:
### Message Event Handling
- **Description**: Detect and respond to messages containing emojis from Vivi
- **Complexity**: Low
- **Dependencies**: Discord.py or discord.js framework
- **Why Essential**: Without this, the bot cannot observe the messages it needs to translate
- **Implementation**: Monitor `on_message` events and filter for messages containing emoji sequences
### Emoji Parsing
- **Description**: Parse both standard Unicode emojis and Discord custom emojis from message content
- **Complexity**: Low
- **Dependencies**: Emoji library (emoji, discord.py emoji utilities)
- **Why Essential**: Must accurately identify which emojis are present to begin translation
- **Implementation**: Use regex patterns to find `\d+` custom emoji IDs and standard emoji Unicode
### Reply/Response Messages
- **Description**: Send translation responses back to chat or as replies
- **Complexity**: Low
- **Dependencies**: Message API (Discord.py message.reply() or create_message())
- **Why Essential**: Users need to see the translation output
- **Implementation**: Format translations as clear, readable text messages; optionally use embeds for rich formatting
### Command Interface
- **Description**: Allow teaching the bot new emoji meanings via commands
- **Complexity**: Medium
- **Dependencies**: Command parser, permission checking
- **Why Essential**: Enables the learning system that makes the bot scale without hardcoding every emoji
- **Implementation**: Slash commands or hybrid prefix/slash commands (see "Message Handling Modes" below)
### Per-Server Configuration
- **Description**: Store server-specific settings (translation mode, custom emoji meanings)
- **Complexity**: Medium
- **Dependencies**: Database (SQLite for small scale, PostgreSQL for production)
- **Why Essential**: Different servers have different preferences for how verbose translations should be
- **Implementation**: Simple key-value store per guild_id (server ID)
### Rate Limiting & Error Handling
- **Description**: Gracefully handle Discord API limits and network errors
- **Complexity**: Medium
- **Dependencies**: Framework error handling
- **Why Essential**: Prevents bot crashes and makes service reliable
- **Implementation**: Exponential backoff for failed API calls, catch timeouts
---
## Differentiating Features
These features set Vivi's translator apart from generic Discord bots:
### Learning System (Rule-Based Foundation with Growth)
- **Description**: Bot learns emoji meanings when explicitly taught by users
- **Complexity**: Medium
- **Why This Differentiates**: Makes the translator sustainable for Vivi's unique and evolving emoji language without requiring the developer to manually hardcode every sequence
- **Constraints**:
- **No implicit learning**: Never infer emoji meanings from context—require explicit teaching
- **Explicit confirmation**: Always confirm back what was learned so users can verify correctness
- **Easy correction**: Provide `/correct :emoji: new_meaning` for when meanings change
- **Community contribution**: Allow anyone in the server to teach, not just Vivi
- **Database**: Store emoji meanings with metadata (who taught it, when, how many uses)
- **Feedback loop**: Track which translations are most helpful, identify ambiguous emojis
### Emoji Sequence Detection
- **Description**: Recognize sequences of emojis that form compound meanings (e.g., 👩‍💻📱 = "coding on phone")
- **Complexity**: Medium
- **Why This Differentiates**: Vivi likely uses emoji clusters; simple one-to-one mapping isn't enough
- **Implementation Approach**:
- Define sequence patterns (consecutive emojis with or without separators)
- Store meanings for common sequences
- Fall back to individual emoji meanings if no sequence match
- Allow users to teach sequences via `/teach :emoji1: :emoji2: compound_meaning`
### Context-Aware Translation Formatting
- **Description**: Vary translation output based on conversational context
- **Complexity**: High
- **Why This Differentiates**: Translations that feel natural, not robotic; adapts tone to channel context
- **Examples**:
- In #general: "Vivi is having fun 😊"
- In #serious-discussion: "Expressing contentment and readiness"
- Response variation: Sometimes expand, sometimes summarize based on recent context
- **Implementation**: Store channel context settings; analyze surrounding messages for tone
### PluralKit Integration (Optional but Recommended)
- **Description**: Detect which alter is communicating via PluralKit webhook metadata
- **Complexity**: High
- **Why This Differentiates**: Essential for communities where Vivi shares an account with headmates; respects system identity
- **How It Works**:
- PluralKit creates webhooks with system/member metadata in username
- Bot can parse webhook source_guild_id and custom embed footers to identify system member
- Enables "Vivi says: [translation]" vs generic bot response
- **Limitations**: Requires PluralKit to be active in server; only works with PluralKit proxy format
### Translation Quality Tracking
- **Description**: Monitor which translations get positive vs negative reactions
- **Complexity**: Medium
- **Why This Differentiates**: Enables continuous improvement and identifies ambiguous emojis needing clarification
- **Implementation**:
- Store emoji usage statistics (frequency, accuracy ratings)
- Provide admin dashboard `/emoji_stats` to see problematic emojis
- Optionally flag ambiguous emojis for human review
### Global Dictionary Option (Future Differentiator)
- **Description**: Share emoji meanings across servers with opt-in
- **Complexity**: High
- **Why This Differentiates**: Could benefit other systems and communities; positions Vivi's system as a standard
- **Constraints**:
- Privacy-first: Only share if opted-in by server admin
- Vivi-specific: Focus on Vivi's emoji system, not generic emoji translation
- Conflict resolution: Last-teaching-wins or voting system for disagreements
---
## Translation Approaches Analyzed
### Rule-Based Pattern Matching
**How it works**: Explicit rules define emoji → text mappings. New mappings must be explicitly added.
**Pros**:
- Fully deterministic and debuggable
- Fast performance (no ML overhead)
- Transparent to users (users understand why emoji means what)
- Easy to correct mistakes (just update the rule)
**Cons**:
- Requires explicit teaching for every emoji/sequence
- Doesn't generalize to patterns the bot hasn't seen
- Becomes unwieldy with hundreds of emoji combinations
### Statistical/ML-Based Approach
**How it works**: Train a model on known emoji → text pairs, predict meanings for unknown emojis using similarity or context.
**Pros**:
- Can handle novel emoji combinations through pattern inference
- Generalizes from limited training data
- Captures semantic relationships between emoji meanings
**Cons**:
- Black-box behavior ("why did the bot translate it that way?")
- Requires significant training data to work well
- Harder to debug when translations are wrong
- Users don't understand or trust the mechanism
### Recommended: Hybrid Approach (Rule-Based + Fallback)
**Phase 1 (MVP): Rule-Based with Community Learning**
- Start with hardcoded emoji meanings for common sequences
- Allow users to teach new emojis via `/teach` command
- 100% transparent: users know exactly why each translation happens
- Fast, reliable, debuggable
**Phase 2 (Future): Statistical Fallback**
- Analyze emoji usage patterns in learned meanings
- If emoji appears in multiple compounds, infer partial meanings
- Use embedding-based similarity to suggest translations for unknown emoji sequences
- Always show confidence scores; require confirmation before using inferred meanings
**Phase 3 (Long-term): Continuous Learning**
- Track user corrections and positive/negative reactions
- Retrain fallback model on accepted vs rejected translations
- Identify consistently ambiguous emojis for human review
- Adjust translation format based on what's most helpful per server
**Why This Is Best**:
- Starts simple and user-friendly
- Scales to hundreds of emojis through learning
- Maintains trust through transparency
- Enables improvement over time without requiring ML expertise
---
## Message Handling Modes
Discord bots can respond to messages in different ways. Choose the approach that best serves Vivi's community:
### Mode 1: Automatic Translation (On Every Message)
**How it works**: Bot automatically translates every message from Vivi (or messages with emoji content)
**Pros**:
- Instant understanding without extra steps
- No friction for casual readers
- Good for channels where translation is the main purpose
**Cons**:
- Can be noisy in mixed-audience channels
- Spoils the "reading Vivi directly" experience for community members who prefer it
- Uses Discord API quota faster
**Best For**: Translation channels (#vivi-translated or similar)
### Mode 2: On-Demand Translation (Reaction or Command)
**How it works**: Users react with a specific emoji or use `/translate` command to request translation
**Pros**:
- Keeps channels clean by default
- Respects users who want to interpret emojis themselves
- Lower API usage
- More intentional interaction
**Cons**:
- Extra step for users
- May miss important messages if people forget to request
- Less discoverable for new community members
**Best For**: Social channels where emoji is part of fun, not solely for understanding
### Mode 3: Toggle (Per-Server Setting)
**How it works**: Server admins choose between automatic or on-demand via `/settings`
**Pros**:
- Respects different community preferences
- Maximizes adoption across servers with different cultures
- Can differentiate channels (auto in #vivi-translations, on-demand in #general)
**Cons**:
- More complex to implement
- Users must learn about settings
**Recommendation for V1**: Implement Mode 3 with default Mode 1 (automatic). Let server admins customize via:
- `/settings translation-mode [auto|on-demand]`
- `/settings translate-channels [list of channel IDs]` for auto mode
### Implementation: Message Handling
- **Event**: Discord `on_message` event
- **Filter**: Check for emoji content using regex: `<a?:[a-zA-Z0-9_]{1,32}:[0-9]{18,20}>` (custom) and Unicode emoji
- **Action**: Call translation engine, format output, send reply
### Command Interface: Slash Commands vs Prefix
**Recommendation**: Use **slash commands** as primary, offer hybrid support
**Why slash commands**:
- Modern Discord standard (easier discoverability)
- No Message Content intent required (better privacy)
- Built-in autocomplete for parameters
- Better for `/teach :emoji: meaning` (emoji picker integration)
**Key slash commands** for Vivi:
- `/teach :emoji: meaning` — Add emoji to dictionary
- `/translate [emoji-string]` — Manually trigger translation
- `/what :emoji:` — Look up emoji meaning
- `/correct :emoji: new_meaning` — Fix a taught emoji
- `/settings` — Server configuration
- `/emoji-stats` — View accuracy/usage statistics
---
## Learning Interface & Feedback System
The learning system is crucial for scalability and adoption. Here's the recommended approach:
### Teaching Commands
```
/teach :emoji1: meaning
→ "Learned! 🎓 I'll translate :emoji1: as 'meaning'"
/teach :emoji1: :emoji2: compound meaning
→ "Learned! 🎓 I'll translate :emoji1: :emoji2: as 'compound meaning'"
```
### Correction System
```
/correct :emoji: new_meaning
→ "Updated! ✏️ I'll now translate :emoji: as 'new_meaning' (was 'old meaning')"
```
### Query System
```
/what :emoji:
→ "Emoji meaning for :emoji: is 'definition'\nTaught by @User on 2024-01-15\nUsed 47 times this month"
```
### Validation & Confirmation
- Always repeat back what was learned
- Show who taught it and when (build community recognition)
- Optionally show confidence if from ML fallback
- Highlight ambiguous meanings if same emoji has multiple teachings
### Feedback Mechanisms
1. **Reaction-based**: Users react with ✅/❌ to translations
- Track which emojis get positive vs negative reactions
- Identify consistently wrong translations for correction
2. **Correction Commands**: `/correct :emoji: new_meaning` explicitly fixes errors
- Creates audit trail of meaning changes
- Enables tracking learning over time
3. **Conflict Resolution**: If multiple teachings for same emoji
- Show all known meanings with vote counts
- Use most recent teaching by default, surface conflicts
- Option: `/disambiguate :emoji: choose_meaning` to select preferred one
### Best Practices
- **Community over individual**: Encourage anyone to teach, not just Vivi or admins
- **Transparency**: Always show source of taught meanings
- **Auditability**: Maintain history of meaning changes
- **Disambiguation**: Flag emojis with conflicting meanings early
- **Escalation**: Provide `/report-ambiguous :emoji:` for admin review
---
## Accessibility Considerations
Translation bots serve accessibility functions, so they must be accessible themselves:
### Text Accessibility
- **Plain text output**: Always provide plain text translations, not just embeds
- **No emoji-only responses**: Never respond with just emoji; always include text
- **Clear language**: Use simple, direct language in translations (avoid jargon)
- **Consistent formatting**: Same emoji always translates the same way (aids screen reader prediction)
### Discord Accessibility
- **Slash commands**: Easier for keyboard navigation than prefix commands
- **Accessible embeds**: If using embeds for formatted output:
- Include plain text alternative in message content
- Avoid using embeds for critical information
- Note: Discord embeds cannot have alt-text for images—only use text-based embeds
### Screen Reader Compatibility
- **Emoji descriptions**: Include what each emoji is called (e.g., "woman technologist emoji")
- **Sequence clarity**: When translating compound sequences, explain the combination
- **No hidden information**: Never put crucial meaning in embed footers or nested fields
### Examples of Accessible Responses
**Good**:
```
Vivi: 👩‍💻📱
Bot: (woman technologist emoji, mobile phone emoji)
Translates to: "coding on phone" or "responding to work messages"
```
**Bad**:
```
Vivi: 👩‍💻📱
Bot: [embed with only emoji in footer, no text explanation]
```
### Implementation
- Test output with screen readers (NVDA, JAWS)
- Provide alternative text format via `/translate --verbose` for complex sequences
- Include emoji names in debug/development output
---
## Anti-Features (What NOT to Build)
These features sound good but should be avoided:
### Anti-Feature 1: Persistent Context Learning
- **What it is**: Bot infers emoji meanings from conversation context without explicit teaching
- **Why not**:
- Creates non-deterministic behavior (same emoji means different things in different contexts)
- Impossible to debug or correct
- Users don't understand the bot's logic
- High error rate leads to loss of trust
- **Better approach**: Explicit `/teach` commands only
### Anti-Feature 2: Cross-Discord Emoji Translation
- **What it is**: Translate emojis the same way across all Discord servers
- **Why not**:
- Emoji meanings are highly personal and context-dependent
- Vivi's system is specific to her community
- Would bloat the dictionary with conflicting meanings
- Not scalable for other users' emoji systems
- **Better approach**: Per-server dictionaries with optional public sharing for Vivi's specific system
### Anti-Feature 3: Real-Time Chat Simulation
- **What it is**: Bot attempts to continue Vivi's conversation or generate new emoji sequences
- **Why not**:
- Out of scope (translation, not generation)
- Risk of impersonation
- Confusion about what Vivi actually said vs what bot generated
- Community prefers Vivi's authentic communication
- **Better approach**: Stick to translating Vivi's actual messages
### Anti-Feature 4: Full NLP Context Analysis
- **What it is**: Use complex NLP to understand message context and vary translations
- **Why not**:
- Over-engineering for the problem
- Adds maintenance burden
- Makes behavior unpredictable
- Initial rule-based approach is more trustworthy
- **Better approach**: Simple context hints (channel type, time of day) with explicit teaching
---
## Configuration Per-Server
Different servers may have different translation preferences:
### Server Settings to Store
```yaml
guild_id: 12345678
settings:
translation_mode: "auto" # or "on-demand"
auto_channels: [chan_id_1, chan_id_2] # channels where auto-translation is enabled
verbose_translations: false # expand vs summarize
show_confidence: false # show certainty for learned meanings
allow_community_teaching: true # can non-mods teach emoji meanings?
default_language: "en" # future: support other languages
include_emoji_names: true # include emoji name for accessibility
```
### Database Schema
```
Emoji_Meanings:
id: UUID
guild_id: int
emoji_unicode: str (or custom_emoji_id)
meaning: str
taught_by_user_id: int
taught_at: timestamp
usage_count: int
accuracy_rating: float (0-1, from reactions)
is_sequence: bool
confidence: float (1.0 for taught, < 1.0 for inferred)
Guild_Settings:
guild_id: int
translation_mode: str
auto_channels: json
...
Emoji_Statistics:
emoji_id: UUID
guild_id: int
total_uses: int
positive_reactions: int
negative_reactions: int
conflicts_count: int
last_updated: timestamp
```
### Database Choice Recommendation
- **Development/Small Scale** (< 100 servers): SQLite with Keyv abstraction
- **Production** (100+ servers): PostgreSQL with connection pooling
- **Real-Time Stats** (future): Redis for caching popular emoji definitions
### Configuration Commands
```
/settings translation-mode [auto|on-demand]
/settings auto-channels [#channel-list]
/settings verbose [true|false]
/settings allow-teaching [true|false]
```
---
## Implementation Roadmap
### MVP (Phase 1): Foundation
- **Features**: Message detection, basic emoji dictionary, `/teach` command, on-demand translation
- **Complexity**: Medium
- **Timeline**: 2-3 weeks
- **Core**: Rule-based translations only; no ML
### V1 (Phase 2): Polished & Learning-Ready
- **Add**: Per-server settings, `/what` query, `/correct` command, reaction-based feedback
- **Add**: Accessibility improvements (emoji names, plain text)
- **Add**: Basic statistics (`/emoji-stats`)
- **Timeline**: 3-4 weeks
- **Focus**: Community testing and meaning refinement
### V2 (Phase 3): Smart Fallback (Optional)
- **Add**: Statistical fallback for unknown emoji sequences
- **Add**: Confidence scores for inferred meanings
- **Add**: Emoji conflict detection and disambiguation
- **Add**: Optional global dictionary sharing
- **Timeline**: 4-6 weeks
- **Focus**: Scalability and reduced manual maintenance
---
## Summary
The Vivi Speech Translator should be built as a **hybrid system**:
1. **Start with deterministic, rule-based translation** that's fully transparent and debuggable
2. **Enable community learning** via simple `/teach` commands that grow the dictionary organically
3. **Provide feedback mechanisms** (reactions, corrections) to improve accuracy over time
4. **Remain focused** on Vivi's specific emoji system, not generic emoji translation
5. **Prioritize accessibility** since translation itself is an accessibility feature
6. **Leave room for future ML enhancement** but don't build it until needed
The core differentiator is not sophisticated AI, but **intentional design for learning and community participation**. The bot becomes more valuable as more people teach it, creating network effects that benefit the whole community.
### Recommended Feature Set for V1
- ✅ Message detection and emoji parsing
- ✅ Rule-based translation with `/teach` command
- ✅ Per-server configuration (auto vs on-demand mode)
- ✅ Correction and query commands (`/what`, `/correct`)
- ✅ Reaction-based feedback (✅/❌)
- ✅ Accessible output (plain text, emoji names)
- ✅ PluralKit integration (if Vivi's community uses it)
- ⏭️ Statistics dashboard (Phase 2)
- ⏭️ ML fallback (Phase 3+, if needed)
---
## Sources & References
### Discord Bot Development
- [Best Discord Bots in 2026: Complete Guide](https://blog.communityone.io/best-discord-bots/)
- [Storing data with Keyv | discord.js Guide](https://discordjs.guide/keyv/)
- [Awaiting Messages & Reactions · A Guide to Discord Bots](https://maah.gitbooks.io/discord-bots/content/getting-started/awaiting-messages-and-reactions.html)
- [Discord.js - Responding to Messages](https://cratecode.com/info/discordjs-responding-to-messages)
### Slash Commands & Command Interfaces
- [Slash command prefixes · discord/discord-api-docs](https://github.com/discord/discord-api-docs/discussions/3744)
- [Discord Interactions | Pycord Guide](https://guide.pycord.dev/interactions)
### Emoji & NLP Processing
- [NLP Series: Day 5 — Handling Emojis: Strategies and Code Implementation](https://medium.com/@ebimsv/nlp-series-day-5-handling-emojis-strategies-and-code-implementation-0f8e77e3a25c)
- [Assessing Emoji Use in Modern Text Processing Tools](https://arxiv.org/pdf/2101.00430)
- [Emojinize: Enriching Any Text with Emoji Translations](https://arxiv.org/html/2403.03857v2)
### Translation Approaches
- [Hybrid Translation with Classification: Revisiting Rule-Based and Neural Machine Translation](https://www.mdpi.com/2079-9282/9/2/201)
- [Rule-Based Machine Translation - Wikipedia](https://en.wikipedia.org/wiki/Rule-based-machine-translation)
- [Rule Based Approach in NLP - GeeksforGeeks](https://www.geeksforgeeks.org/nlp/rule-based-approach-in-nlp/)
### Accessibility
- [Discord: Accessibility in Web Apps Done Right](https://a11yup.com/articles/discord-accessibility-in-web-apps-done-right/)
- [Discord Accessibility for blind users](https://support.discord.com/hc/en-us/community/posts/360032435152-Discord-Accessibility-for-blind-users)
- [Using a Screen Reader on Discord](https://support.discord.com/hc/en-us/articles/7180791233559-Using-a-Screen-Reader-on-Discord)
- [GitHub - 9vult/Raiha: Raiha Discord Accessibility Bot](https://github.com/9vult/Raiha)
### PluralKit Integration
- [PluralKit - System Management Bot](https://pluralkit.me/)
- [Navigating PluralKit: A Guide to Discord's Unique Bot for System Management](https://www.oreateai.com/blog/navigating-pluralkit-a-guide-to-discord-unique-bot-for-system-management-31ce1863fda39661189c6b8c031c864b)
- [GitHub - PluralKit/PluralKit](https://github.com/PluralKit/PluralKit)
### Database & Configuration
- [How to Create a Database for Your Discord Bot](https://cybrancee.com/learn/knowledge-base/how-to-create-a-database-for-your-discord-bot/)
- [How I Host a Bot in 45,000 Discord Servers For Free](https://dev.to/mistval/how-i-host-a-bot-in-45000-discord-servers-for-free-5bk9)
### Feedback Systems
- [Automating User Feedback Monitoring on Discord Using AI](https://cohere.com/blog/automating-user-feedback-monitoring-on-discord-using-ai)
- [From Discord Chaos to Organized Feedback](https://betahub.io/blog/guides/2025/07/16/discord-community-feedback.html)

248
.planning/research/INDEX.md Normal file
View File

@@ -0,0 +1,248 @@
# Vivi Speech Translator - Research Index
**Research Date:** January 29, 2025
**Status:** Complete
**Total Documentation:** 2,900 lines across 7 files
---
## Quick Navigation
### I Want to...
**Get a quick overview fast** → Read [QUICK_REFERENCE.md](./QUICK_REFERENCE.md) (5 min read)
- Installation checklist
- One-page tech summary
- Code examples
- Common commands
**Understand the detailed reasoning** → Read [STACK.md](./STACK.md) (15 min read)
- Comprehensive analysis of all 7 research questions
- Production-ready recommendations
- Anti-patterns to avoid
- Implementation roadmap
- Cost breakdown
**Review existing files in project** → Check other *.md files in this directory
- [ARCHITECTURE.md](./ARCHITECTURE.md) - Project structure planning
- [FEATURES.md](./FEATURES.md) - Feature specifications
- [PITFALLS.md](./PITFALLS.md) - Common mistakes to avoid
- [README.md](./README.md) - Overview
---
## The 7 Key Questions - Quick Answers
| Question | Answer | Confidence | Location |
|----------|--------|------------|----------|
| **1. Best Discord bot framework?** | discord.py 2.6.4 (Python) | ✅ HIGH | STACK.md §1 |
| **2. Best language to use?** | Python 3.10+ | ✅ HIGH | STACK.md §2 |
| **3. Database for emoji mappings?** | SQLite (MVP) → PostgreSQL (prod) | ✅ HIGH | STACK.md §3 |
| **4. How to integrate PluralKit?** | pluralkit.py + webhook dispatch | ✅ HIGH | STACK.md §4 |
| **5. Standard emoji/text libraries?** | emoji 2.11.0+, pydantic 2.5.0+ | ✅ HIGH | STACK.md §5 |
| **6. Best hosting option?** | Railway ($0-5/mo) or Oracle Cloud (free) | ✅ HIGH | STACK.md §6 |
| **7. Authentication best practices?** | Slash commands, no MessageContent intent | ✅ HIGH | STACK.md §7 |
---
## File Guide
### STACK.md (690 lines) - PRIMARY RESEARCH DOCUMENT
**Read this for:** Complete analysis with full rationale
**Key Sections:**
- Executive Summary (2-3 sentence overview)
- Discord Bot Framework analysis (includes why NOT Pycord)
- Language comparison (Python vs JS vs Rust vs Go)
- Database schema and comparison (SQLite vs PostgreSQL)
- PluralKit integration mechanism (how webhooks work)
- Key libraries table (with versions and installation)
- Hosting options (Railway, Oracle Cloud, self-hosted)
- Authentication/Permissions guide (intents, OAuth2)
- Anti-patterns (7 major mistakes to avoid)
- Implementation roadmap (4 phases over 6-8 weeks)
- Cost breakdown (monthly expenses by phase)
- References (10 authoritative sources)
**Use this to:**
- Explain tech choices to stakeholders
- Understand production requirements
- Plan implementation phases
- Avoid common pitfalls
---
### QUICK_REFERENCE.md (351 lines) - DEVELOPER START GUIDE
**Read this for:** Get started immediately with code examples
**Key Sections:**
- Installation checklist (copy-paste pip commands)
- Database quick start (SQLite code, PostgreSQL connection string)
- PluralKit integration checklist (7-step verification)
- Slash commands example (vs prefix commands)
- Environment variables template (.env setup)
- Bot intents configuration (what to enable/disable)
- Hosting setup instructions (Railway CLI commands)
- Common development commands
- Anti-patterns summary (7 DON'Ts)
- Testing examples (pytest code)
- Troubleshooting table (6 common issues)
- Next steps checklist
**Use this to:**
- Set up local development environment
- Copy code templates
- Find command syntax
- Quick-fix common problems
- Deploy to production
---
### Other Files in Directory
**ARCHITECTURE.md** (774 lines)
- Existing project structure planning
- System design diagrams
- Module organization
- Data flow
**FEATURES.md** (546 lines)
- Feature specifications
- User stories
- Requirements breakdown
- Acceptance criteria
**PITFALLS.md** (506 lines)
- Common mistakes teams make
- Why each pitfall happens
- How to avoid/fix them
**README.md** (33 lines)
- Directory overview
- File purposes
---
## Tech Stack at a Glance
```
Bot Framework: discord.py 2.6.4 (Python)
Database: SQLite (MVP) → PostgreSQL (prod)
PluralKit: pluralkit.py + webhook dispatch
Hosting: Railway ($0-5/mo) or Oracle Cloud (free)
Commands: Slash commands (/)
Cost (MVP): $0/month
Cost (Prod): $0-20/month
Confidence: VERY HIGH (all 2025-current)
```
---
## Implementation Phases
### Phase 1: MVP (Weeks 1-2) - $0/month
- discord.py + SQLite
- `/learn 🎭 "meaning"` and `/translate` commands
- Local testing
### Phase 2: PluralKit (Weeks 3-4) - $5/mo
- Webhook endpoint
- Member detection
- System caching
### Phase 3: Production (Weeks 5-6) - $0-15/mo
- PostgreSQL migration
- Deploy to Railway
- Error tracking
### Phase 4: Scaling (Weeks 7+) - $20-50/mo
- Global emoji dictionary
- Per-server overrides
- Analytics dashboard
- Redis caching (optional)
---
## Before You Start
### Required Setup
- [ ] Python 3.10+ installed locally
- [ ] Create Discord application (Discord Developer Portal)
- [ ] Get PluralKit system token (run `pk;token` in Discord)
- [ ] Read STACK.md and QUICK_REFERENCE.md
- [ ] Understand anti-patterns section
### Avoid These Common Mistakes
1. Don't use Pycord (unmaintained since 2023)
2. Don't store bot token in code (use .env)
3. Don't request MessageContent intent (use slash commands)
4. Don't poll PluralKit API (use webhooks instead)
5. Don't use synchronous database calls
6. Don't write custom webhook signature code
7. Don't let emoji history table grow unbounded
### Recommended First Step
1. Read QUICK_REFERENCE.md (10 min)
2. Set up Python environment (15 min)
3. Create Discord bot in Developer Portal (5 min)
4. Get PluralKit token (1 min)
5. Start coding MVP (then reference STACK.md for details)
---
## High-Confidence Decisions
These recommendations have **VERY HIGH confidence** because they are:
✓ Based on 2025 current releases (discord.py 2.6.4, released Oct 2025)
✓ Production-proven (used by 1000+ Discord bots)
✓ Actively maintained (discord.py: new version every 3 months)
✓ Community-tested (largest Python Discord bot community)
✓ Cost-effective (minimal infrastructure required)
✓ Future-proof (no deprecated technologies)
None of the recommendations are speculative or "experimental."
---
## What NOT in This Research
This research focuses on **2025 production-ready tech stack**, not:
- ❌ Machine learning for emoji translation (beyond scope)
- ❌ Advanced NLP processing (emoji library sufficient)
- ❌ Real-time synchronization across 1000+ servers (future phase)
- ❌ Custom Discord bot framework creation (use existing)
- ❌ Legacy Python 2.7 support (use Python 3.10+)
---
## Questions?
If you have questions about these recommendations:
1. **Specific library question?** → Check "Key Libraries" table in STACK.md
2. **How does PluralKit work?** → See "PluralKit Integration" section in STACK.md
3. **How to set up locally?** → Follow "Installation Checklist" in QUICK_REFERENCE.md
4. **What to avoid?** → Review "Anti-patterns" section in STACK.md
5. **Need example code?** → See code examples in QUICK_REFERENCE.md
6. **Cost concerns?** → Check "Cost Breakdown" in STACK.md
---
## Summary
The **2025 tech stack for Vivi Speech Translator** is:
- **Framework:** discord.py 2.6.4 (Python 3.10+)
- **Database:** SQLite (MVP) → PostgreSQL (production)
- **Integration:** pluralkit.py + webhook dispatch
- **Hosting:** Railway Cloud ($0-5/month)
- **Confidence:** Very High (all current, production-proven)
**Start here:** Read QUICK_REFERENCE.md, then reference STACK.md for deep dives.
---
*Last Updated: January 29, 2025*
*Maintained for: Vivi Speech Translator Project*

View File

@@ -0,0 +1,506 @@
# Pitfalls Research: Vivi Speech Translator
A comprehensive analysis of common mistakes in Discord bot development, with specific focus on PluralKit integration, emoji translation, and learning systems.
---
## PluralKit Integration Pitfalls
### Pitfall: Unreliable Vivi Message Detection (Webhook vs Direct Author Check)
**What goes wrong:**
PluralKit proxies messages through webhooks with the member's name. Bots can detect Vivi's messages by either:
- Checking if the author is the user ID (unreliable if webhook proxying is used)
- Parsing the webhook username (fragile - can be modified)
- Checking for proxy tags in message content (only works with bracket-style proxies)
The core issue: If detection logic mixes these approaches or doesn't account for webhook proxying edge cases, you'll get false positives (bot responds to non-Vivi messages) and false negatives (Vivi's messages don't translate).
**Warning signs:**
- Bot randomly responds to messages from other PluralKit members
- Vivi's proxied messages don't trigger translation but manually typed messages do
- Bot responds during PluralKit testing/repoxying (reproxy command)
- Inconsistent detection across channels with different permissions
**Prevention:**
- **Consistent source of truth**: Decide on ONE reliable detection method - either webhook creator ID (most reliable) or member username in webhook
- **Query PluralKit API**: When uncertain, use PluralKit's REST API to verify if a message came from Vivi's system (3/second rate limit for updates)
- **Cache member names**: Store known proxy tag patterns for Vivi locally to reduce API calls
- **Test edge cases**: Reproxy, message edits, reactions on webhook messages, and DMs from Vivi
- **Log detection failures**: When detection fails, log the message author, webhook info, and detected proxy tags for debugging
**Which phase should address it:** Phase 1 (core message detection) - must be bulletproof before teaching system
**API Rate Limit Note:** PluralKit enforces:
- 10/second for GET requests (member lookup)
- 3/second for POST/PATCH/DELETE (system updates)
- Use Dispatch Webhooks for event-driven updates instead of polling
---
### Pitfall: Webhook Editing Race Conditions
**What goes wrong:**
When Vivi edits her proxied message, your bot attempts to edit its translation simultaneously. Discord webhooks don't handle concurrent edits well:
- Message edit by original proxy webhook and bot translation edit can conflict
- Race conditions can cause message state corruption
- Edited messages may revert to old content or show partial updates
- Reactions added during the edit window may be lost
**Warning signs:**
- Translations occasionally show old/incorrect emoji meanings after Vivi edits her message
- Bot throws 500 errors when trying to edit webhook messages
- Translation accuracy degrades when Vivi edits quickly
- Missing reactions after Vivi's message is edited
**Prevention:**
- **Don't edit webhook messages directly**: Instead, post a new translation message and delete the old one
- **Add edit detection**: Use `message.edited_at` to detect changes, but DON'T race with Vivi
- **Queue edit requests**: If Vivi edits, queue a job to re-translate after a 1-second delay to avoid simultaneous edits
- **Handle 500s gracefully**: Treat webhook edit failures as "post new translation instead"
- **Add edit timestamps**: Show "(edited)" in translation to indicate it's a response to an edit
**Which phase should address it:** Phase 2 (message handling) or Phase 3 (refinement)
---
## Discord API Pitfalls
### Pitfall: Message Content Intent Denial and Architecture Lock-In
**What goes wrong:**
Discord requires `message_content` privileged intent to read emoji content in messages. Bots in 75+ servers must apply for approval. Common mistakes:
- Building the bot assuming you'll get approval (you might not)
- Designing architecture around passive message scanning instead of interactions
- Failing to plan alternatives when approval is denied
- Using `@bot.event async def on_message()` which conflicts with slash command handlers
**Warning signs:**
- Bot works in testing but stops working after hitting 75 servers
- Approval denial with no fallback plan
- Message content suddenly inaccessible mid-development
- Architecture rewrite needed to add slash commands later
**Prevention:**
- **Design for slash commands first**: Use `/translate <emoji>` instead of passive scanning
- **Use Interactions API**: Buttons, select menus, and slash commands don't require message content intent
- **Plan for denial**: Have a fallback UI (buttons to trigger translation, not automatic)
- **Unverified bots get free access**: Stay under 75 servers during development, or use unverified bot for testing
- **Document intent usage**: Be ready to explain why your emoji translation bot needs message content (you'll need to for 75+ servers)
- **Prepare alternatives**: Reactions or buttons as fallback if approval is denied
**Which phase should address it:** Phase 1 (architectural decisions)
---
### Pitfall: Rate Limiting and Burst Requests
**What goes wrong:**
Discord enforces global (50 req/sec) and per-route rate limits. Vivi Speech can hit limits when:
- Translating messages with many emojis (multiple API lookups)
- Multiple users triggering translations simultaneously
- Teaching system saving entries rapidly
- PluralKit API queries in addition to Discord API calls
**Warning signs:**
- Bot suddenly goes silent (no responses)
- 429 (Too Many Requests) errors in logs
- Delayed translations (multi-second latency)
- Inconsistent behavior during peak usage
**Prevention:**
- **Cache emoji translations**: Store learned meanings in-memory with TTL (time-to-live)
- **Batch emoji lookups**: If translating a message with 5 emojis, don't make 5 API calls - batch them
- **Implement exponential backoff**: When rate limited, wait with exponential delays (1s, 2s, 4s...)
- **Queue teaching commands**: Don't save to database on every teach attempt - queue and batch writes
- **Monitor rate limit headers**: Parse `X-RateLimit-Remaining` and `X-RateLimit-Reset-After` headers
- **Shard properly**: Maintain ~1,000 guilds per shard maximum
- **Use caching layers**: Redis or in-memory LRU cache for frequently translated emojis
**Which phase should address it:** Phase 2 (scaling) and Phase 3 (optimization)
---
### Pitfall: Privilege and Permission Confusion
**What goes wrong:**
Bots need specific permissions but devs often either:
- Request too many permissions (users won't invite)
- Request insufficient permissions (bot fails silently)
- Don't verify permissions before action (command fails with cryptic error)
- Don't check user permissions before teaching (malicious edits to dictionary)
**Warning signs:**
- Bot invited but can't send messages
- Teaching commands work in DM but not in server
- Translation attempts fail silently with no error message
- Non-mods can change emoji meanings
**Prevention:**
- **Minimal permission set**: Only request `send_messages`, `manage_messages` (for deleting own messages), `read_message_history`
- **Check before acting**: Verify bot has required permissions using `message.channel.permissions_for(bot.user)`
- **User permission checks**: Only allow trusted users (mods, Vivi herself) to teach emoji meanings
- **Clear error messages**: "I don't have permission to send messages here" instead of silent failure
- **Test on new servers**: Invite bot to a test server with minimal permissions and verify all features work
**Which phase should address it:** Phase 1 (setup) and Phase 3 (moderation)
---
## Learning System Pitfalls
### Pitfall: Dictionary Quality Degradation Over Time
**What goes wrong:**
User-contributed learning systems fail when:
- Users add typos, slang, or inside jokes as emoji meanings
- Duplicate or conflicting meanings accumulate (😀 = "happy", "smile", "goofy face")
- Rarely-used emojis have outdated or weird meanings
- No audit trail - can't track who broke the dictionary
- Stale entries that were never useful remain forever
**Warning signs:**
- Translations become nonsensical or off-topic
- Conflicting definitions for same emoji (confusing translations)
- Many emoji with zero meaningful translation
- Teaching system abused by trolls or internal conflicts
**Prevention:**
- **Validation on teach**: Check for minimum length (3 chars), no excessive emojis in meaning, no URLs
- **Audit trail**: Log every emoji meaning change with `timestamp`, `user_id`, `old_value`, `new_value`
- **Review process**: For shared systems, flag new meanings for mod approval before going live
- **Meaning versioning**: Keep multiple meanings, let users vote/rank them (future: Phase 4)
- **Freshness markers**: Track last-used date; prompt for re-confirmation if unused for 90+ days
- **Duplicate detection**: Warn if adding a meaning similar to existing ones
- **Clear command output**: Show current meaning before accepting new one, ask for confirmation
**Which phase should address it:** Phase 3 (learning) - implement audit trail from day one
---
### Pitfall: Teaching Interface Too Complex or Text-Heavy
**What goes wrong:**
Learning systems fail when the teaching UI is hard to use:
- Complex command syntax that users forget
- Too many options/flags (overwhelming)
- No confirmation of what was taught
- Text-heavy responses (bad for users with dysgraphia like Vivi)
- No visual feedback (emoji shown in response)
**Warning signs:**
- Few emoji meanings actually get added
- Users give up and stop teaching
- Confusion about command syntax
- Vivi avoids using teaching feature
- Other system members always teach instead
**Prevention:**
- **Simple one-liner commands**: `/teach 😀 happy` not `/teach --emoji 😀 --meaning "happy" --priority high`
- **Visual confirmation**: Include the emoji in the response ("Learned: 😀 = happy")
- **Show current meaning**: "😀 currently means: happy | Update it? Type: `/teach 😀 new meaning`"
- **Short responses**: Keep bot responses under 2 lines when possible
- **Use buttons over typing**: React with checkmark/X for confirmation instead of "yes/no"
- **Emoji picker**: If possible, allow selecting emoji by reaction instead of typing
- **Accessible syntax**: Support aliases - `/learn 😀 happy` same as `/teach 😀 happy`
**Which phase should address it:** Phase 3 (learning system design)
**Accessibility Note:** Vivi has Dysgraphia, which affects writing ability. Keep commands short, use visual confirmation (emoji in responses), and minimize text output.
---
### Pitfall: Scope Creep in Learning Features
**What goes wrong:**
Learning systems start simple but can grow uncontrollably:
- "Let's add multiple meanings per emoji" → complexity explosion
- "Different meanings for different contexts" → database redesign
- "Per-server emoji dictionaries" → multi-tenancy complexity
- "Emoji meaning versioning and rollback" → audit log nightmare
- "Machine learning to auto-generate meanings" → maintenance burden
**Warning signs:**
- Feature backlog grows faster than you can implement
- Core translation becomes slow/unreliable
- Code becomes hard to understand and modify
- New features break old ones
- Team gets overwhelmed with edge cases
**Prevention:**
- **MVP scope**: Phase 1-3 = simple one-meaning-per-emoji, all servers share same dictionary
- **Clear phase boundaries**: Document what's in each phase; don't add features mid-phase
- **Say no to feature requests**: Politely defer to Phase 4 or beyond
- **Keep it simple**: One meaning per emoji, user teaches it once, it sticks
- **Future extensibility**: Design database schema to support multiple meanings later, but don't implement it yet
- **Regular scope reviews**: Every 2 weeks, ask: "Is this feature essential to core translation?"
**Which phase should address it:** Phase 1 (planning) - establish clear phase gates
---
## Multi-Server and Data Pitfalls
### Pitfall: Global Dictionary Conflicts Across Servers
**What goes wrong:**
A shared emoji dictionary works well until different communities use the same emoji differently:
- Server A uses 🎉 for "party"; Server B uses it for "achievement"
- Emoji meanings don't match community context
- Users from Server B are confused by Server A's translations
- No way to override meanings per-server
**Warning signs:**
- Users report wrong translations in their server
- Emoji meanings conflict between communities
- No way to customize meanings per-guild
- One server's trolls ruin translations for everyone
**Prevention:**
- **Phase 1-2: Global only**: Accept that all servers share one dictionary
- **Phase 4 planning**: Design per-server override system (store in `server_id:emoji` key)
- **Document limitation**: "Emoji meanings are shared across all servers - curate carefully"
- **Moderation**: Have a trusted team that curates the global dictionary
- **Community rules**: Require consensus or voting before changing popular emoji meanings
- **Meaning context**: Store both the meaning AND its frequency/reliability (crowdsourced)
**Which phase should address it:** Phase 4 (advanced) - keep Phase 1-3 global-only
---
## Emoji Handling Pitfalls
### Pitfall: Unicode Representation Edge Cases and Combining Characters
**What goes wrong:**
Emoji are more complex than they appear:
- Some "single" emoji are multi-codepoint: 👨‍👩‍👧 (family) = 7 codepoints with zero-width joiners
- Variation selectors () change emoji appearance: ❤ vs ❤️
- Skin tone modifiers add extra codepoints: 👋 vs 👋🏻
- Regex fails on complex emoji
- String length in Python != visual emoji count
**Warning signs:**
- Some emojis don't parse or get corrupted
- Emoji combinations disappear or get split
- Search for specific emoji sometimes fails
- Emoji with skin tones treated as separate emojis
**Prevention:**
- **Use emoji library**: Don't parse manually - use `emoji` package which understands combining characters
- **Normalize input**: Normalize emoji to canonical form before storage/lookup (NFD normalization)
- **Test edge cases**: Include in test suite:
- Family emoji (👨‍👩‍👧)
- Skin tone modifiers (👋🏻 through 👋🏿)
- Gendered variants (👨‍⚕️ vs 👩‍⚕️)
- Flags (🇺🇸 = 2 regional indicators)
- Keycap sequences (1⃣)
- **Store as text, not codepoints**: Keep emoji as Unicode strings in database, not split into codepoints
- **Validate emoji**: Check if input is actually a valid emoji using `emoji.is_emoji()` before storing
- **Document supported emoji**: Be explicit about which emoji are supported (all Unicode emoji, or subset?)
**Which phase should address it:** Phase 2 (core translation)
**Reference:** Complex emoji like 👨‍👩‍👧‍👦 (family) consist of multiple code points: `['0x00000031', '0x0000fe0f', '0x000020e3']` pattern. Never assume 1 emoji = 1 character.
---
## Security Pitfalls
### Pitfall: Hidden Command Privilege Escalation and Authorization Bypass
**What goes wrong:**
Learning systems allow users to modify bot data. Common authorization mistakes:
- No permission check - any user can teach emoji
- No hierarchy check - regular user can override mod's meanings
- Teaching command accepts unsafe input (SQL injection, command injection)
- Audit trail incomplete - can't prove who made unauthorized changes
- Bot token in environment exposed - full compromise
**Warning signs:**
- Non-mods can modify emoji dictionary
- Troll edits spread before mods notice
- No way to revert malicious changes
- Bot behaves unexpectedly with suspicious permissions
- Dictionary contains offensive or misleading entries
**Prevention:**
- **Permission check every command**: Verify user is mod/trusted before `/teach`
- **Whitelist approach**: Only specific users (Vivi, trusted friends) can teach, not everyone
- **Input validation**: Sanitize meaning text - no special chars, max length, filter profanity
- **Audit everything**: Log `user_id`, `timestamp`, `emoji`, `old_meaning`, `new_meaning`, `was_approved`
- **Immutable audit log**: Once written, audit entries can't be modified
- **Reversibility**: Always support `/undo` or `/revert <emoji>` for recent changes
- **No bot token exposure**: Use `.env` file (gitignored), not hardcoded secrets
- **Rate limit teaching**: Prevent spam - one teach per user per 5 seconds
- **Approval workflow**: For shared systems, new meanings require mod approval before going live
**Which phase should address it:** Phase 3 (learning system)
---
### Pitfall: PluralKit API Data Privacy and Personal Information Leakage
**What goes wrong:**
When querying PluralKit API to verify Vivi's identity:
- System info becomes visible to other users through bot queries
- Member details (pronouns, description) could be displayed accidentally
- API errors expose system ID in stack traces
- Bot caches PluralKit data but doesn't respect privacy settings
**Warning signs:**
- System info visible when not intended
- Other users can query Vivi's system through bot commands
- Sensitive member data appears in error messages
- Bot stores outdated PluralKit data
**Prevention:**
- **Minimal API queries**: Only fetch what you need (member ID, not full profile)
- **Cache respectfully**: Store only user ID verification, not personal details
- **Error handling**: Don't expose system IDs or member names in error messages
- **Privacy by default**: Don't display any system info unless Vivi explicitly allows it
- **Respect privacy settings**: If Vivi's system is private, don't query it
- **No logging of personal data**: Filter logs to remove member names, descriptions
- **Clear API use policy**: Document what data you collect and why (for Vivi's consent)
**Which phase should address it:** Phase 1 (architecture) - design for privacy from the start
---
## Translation Quality Pitfalls
### Pitfall: Translations Feel Robotic or Lose Context
**What goes wrong:**
Simple concatenation of emoji meanings produces awkward, stilted translations:
- "😀😍🎉" becomes "happy + love + party" (grammatically weird)
- Emoji in sequence don't flow naturally
- Context is lost (is this celebratory? sarcastic? sad?)
- Complex emoji (👨‍👩‍👧) get broken into confusing pieces
**Warning signs:**
- Translations feel hard to read
- Users prefer the original emoji over bot's translation
- Emoji sequences don't combine logically
- Accessibility readers struggle with the output
**Prevention:**
- **Adaptive formatting**: Group related emoji:
- Multiple emoji of same type → comma-separated ("happy, joyful, excited")
- Verb + object → natural phrasing ("loves X", "celebrates with")
- Emoji + punctuation → handle specially
- **Context awareness**: If Vivi teaches "😀 = amused at something", use that context
- **Order matters**: Preserve emoji order in translation, not alphabetical
- **Natural language**: Use connectors ("and", "with") not just commas
- **Test readability**: Read translation aloud - does it sound natural?
- **User testing**: Show translations to Vivi - do they capture intent?
**Which phase should address it:** Phase 3 (refinement) or Phase 4 (NLP)
---
## Accessibility Pitfalls
### Pitfall: Teaching System Too Text-Heavy for Users with Dysgraphia
**What goes wrong:**
Emoji learning systems assume users can easily type complex commands and read long responses. For Vivi (with Dysgraphia):
- Typing is laborious and produces errors
- Long text responses are hard to parse
- No visual confirmation of what was taught
- Complex command syntax is hard to remember
- Alternative input methods not supported
**Warning signs:**
- Vivi avoids using teaching feature entirely
- Alternate system members always teach instead
- Teaching commands are frequently mistyped
- Vivi asks for repetition of bot responses
- Few emoji get added to dictionary
**Prevention:**
- **Minimal typing required**: `/teach 😀 happy` not `/teach --emoji 😀 --meaning "happy" --tags emotion`
- **Visual confirmation**: Show emoji in response for confirmation (eyes can process faster than text)
- **Short responses**: Max 1-2 sentences, not paragraphs
- **Command aliases**: Both `/teach` and `/learn` work for same function
- **Text-to-speech friendly**: Use punctuation, avoid abbreviations, clear structure
- **Reaction-based UI**: "React with ✓ to confirm" instead of "Type 'yes'"
- **Error recovery**: If typo in emoji, bot suggests correction with reaction buttons
- **Accessible defaults**: Large, clear emoji in responses; emoji codes visible in text form too
- **Alternative confirmation**: "This emoji now means X. React ✓ to keep, or I'll delete in 5 sec"
**Which phase should address it:** Phase 3 (learning system UX)
**Context:** Vivi has Dysgraphia, affecting her ability to write and type. Design for minimal text input and maximum visual feedback.
---
## Hosting and Infrastructure Pitfalls
### Pitfall: Inadequate Infrastructure Leads to Downtime
**What goes wrong:**
Small Discord bots often run on free/cheap hosting that can't handle scale:
- Heroku, Replit, Glitch shut down or have uptime issues
- Database goes down → translations stop working
- No monitoring → crashes go unnoticed for hours
- Single-point failure → bot death = translations unavailable
**Warning signs:**
- Bot goes offline unpredictably
- Slow response times during peak hours
- Database connection timeouts
- No way to know if bot is running
**Prevention:**
- **Use paid hosting**: AWS, Digital Ocean, Google Cloud - reliable infrastructure
- **Database backup**: Regular automated backups of emoji dictionary
- **Health checks**: Bot pings itself regularly; alerts if no response
- **Logging**: All errors logged to persistent storage, not just console
- **Redundancy (future)**: For Phase 4+, consider running bot on 2 servers with failover
- **Monitoring**: Use tools like Sentry or DataDog to track crashes
- **Graceful degradation**: If database is down, serve cached emoji meanings
**Which phase should address it:** Phase 1 (setup) - establish reliable hosting immediately
---
## Summary: Top Pitfalls to Watch For
### 1. **Message Detection Reliability** (Phase 1 - CRITICAL)
Unreliable detection of Vivi's messages through PluralKit webhook proxying leads to missed translations or false positives. Use webhook creator ID as source of truth, implement proper caching, and test edge cases.
### 2. **Message Content Intent and Architecture** (Phase 1 - CRITICAL)
Designing bot around passive message scanning when you may not get privileged intent approval. Plan for slash commands and interactions from the start; treat message content as optional.
### 3. **Dictionary Quality Degradation** (Phase 3 - HIGH)
User-contributed emoji meanings become nonsensical over time without validation, audit trails, and review processes. Implement these from day one, not as an afterthought.
### 4. **Teaching Interface Complexity** (Phase 3 - HIGH)
Text-heavy, complex teaching commands discourage use and frustrate Vivi (with Dysgraphia). Keep it simple: `/teach emoji meaning`. Show visual confirmation.
### 5. **Rate Limiting and Scaling** (Phase 2+ - MEDIUM)
Discord and PluralKit API limits hit unexpectedly when translating messages with many emoji. Implement caching, batch requests, and exponential backoff from Phase 2 onward.
### 6. **Emoji Edge Cases** (Phase 2 - MEDIUM)
Complex emoji with combining characters, skin tones, and zero-width joiners break naive parsing. Use proper emoji library, normalize input, test thoroughly.
### 7. **Authorization and Security** (Phase 3 - HIGH)
Teaching system without permission checks or audit trails leads to troll edits and data corruption. Require authentication, validate input, log everything.
### 8. **Webhook Race Conditions** (Phase 2+ - MEDIUM)
Simultaneous edits by Vivi and bot translation cause corruption. Post new translations instead of editing; queue requests with delay to avoid races.
---
## Research Sources
- [Discord Bot Development Guide 2025](https://cybrancee.com/blog/how-to-make-a-simple-discord-bot-ultimate-2025-guide/)
- [PluralKit Documentation](https://pluralkit.me/)
- [Discord Rate Limiting Guide](https://discord.com/developers/docs/topics/rate-limits)
- [Message Content Intent FAQ](https://support-dev.discord.com/hc/en-us/articles/4404772028055-Message-Content-Privileged-Intent-FAQ)
- [Discord.py Emoji Handling](https://github.com/Rapptz/discord.py)
- [Emoji Library Documentation](https://carpedm20.github.io/emoji/docs/index.html)
- [Discord Webhook Best Practices](https://hookdeck.com/webhooks/platforms/guide-to-discord-webhooks-features-and-best-practices)
- [Discord Bot Security Best Practices 2025](https://friendify.net/blog/discord-bot-security-best-practices-2025.html)
- [Accessibility for Dysgraphia](https://top5accessibility.com/blog/orthographic-dyslexia-dysgraphia/)

View File

@@ -0,0 +1,351 @@
# Quick Reference: Tech Stack Decisions
**For comprehensive rationale, see [STACK.md](./STACK.md)**
## One-Page Summary
```
Bot Framework: discord.py 2.6.4 (Python 3.10+)
Database: SQLite (MVP) / PostgreSQL (production)
PluralKit: pluralkit.py + webhook dispatch
Hosting: Railway ($0-5/month) or Oracle Cloud (free)
Command Style: Slash commands (/translate 🎭)
```
---
## Installation Checklist (Fresh Project)
```bash
# Create project
mkdir vivi-speech-translator
cd vivi-speech-translator
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Core dependencies
pip install discord.py==2.6.4
pip install pluralkit==1.1.5
pip install aiosqlite==0.19.0 # MVP database
pip install aiohttp==3.9.0 # Webhook server
pip install emoji==2.11.0 # Emoji handling
pip install pydantic==2.5.0 # Data validation
pip install python-dotenv==1.0.0
# Optional (production)
pip install asyncpg==0.29.0 # PostgreSQL driver
pip install sentry-sdk==1.45.0 # Error tracking
# Testing
pip install pytest==7.4.0
pip install pytest-asyncio==0.23.0
```
---
## Database Quick Start
### MVP (SQLite)
```python
import aiosqlite
# Create database
async with aiosqlite.connect('emoji.db') as db:
await db.execute('''
CREATE TABLE IF NOT EXISTS emoji_mappings (
id INTEGER PRIMARY KEY,
emoji TEXT UNIQUE NOT NULL,
meaning TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
''')
await db.commit()
# Use in bot
async def learn_emoji(emoji_str, meaning):
async with aiosqlite.connect('emoji.db') as db:
await db.execute(
'INSERT INTO emoji_mappings (emoji, meaning) VALUES (?, ?)',
(emoji_str, meaning)
)
await db.commit()
```
### Production (PostgreSQL)
```python
import asyncpg
pool = await asyncpg.create_pool('postgresql://user:pwd@host/db')
async def learn_emoji(emoji_str, meaning):
async with pool.acquire() as conn:
await conn.execute(
'INSERT INTO emoji_mappings (emoji, meaning) VALUES ($1, $2)',
emoji_str, meaning
)
```
---
## PluralKit Integration Checklist
- [ ] Get system token: Run `pk;token` in Discord with PluralKit bot
- [ ] Store token in `.env`: `PLURALKIT_TOKEN=pk_...`
- [ ] Create webhook endpoint in your bot
- [ ] Register webhook URL with PluralKit API
- [ ] Set up webhook signature verification (use nacl library)
- [ ] Cache member list in memory on startup
- [ ] Update cache on webhook events (CREATE_MEMBER, UPDATE_MEMBER, DELETE_MEMBER)
**Webhook Endpoint Example:**
```python
from aiohttp import web
async def handle_pk_webhook(request):
body = await request.text()
signature = request.headers.get('X-Signature-Ed25519')
# Verify signature
if not verify_signature(body, signature):
return web.Response(status=401)
data = await request.json()
# Update emoji mappings based on member changes
return web.Response(status=200)
# Add to bot setup
app.router.add_post('/webhooks/pluralkit', handle_pk_webhook)
```
---
## Slash Commands vs Prefix Commands
### ✅ Use Slash Commands (Recommended)
```python
@bot.slash_command(name="learn")
async def learn(ctx, emoji: str, meaning: str):
"""Learn what an emoji means"""
await learn_emoji(emoji, meaning)
await ctx.respond(f"Learned: {emoji} = {meaning}")
# No MessageContent intent needed!
# Users see autocomplete, clearer UI
```
### ❌ Don't Use Prefix Commands
```python
@bot.command(name="learn")
async def learn(ctx, emoji: str, meaning: str):
"""Learn what an emoji means"""
# Requires MessageContent intent ❌
# Harder to use, poor UX
```
---
## Environment Variables (.env)
```bash
# .env (DO NOT COMMIT - add to .gitignore)
DISCORD_TOKEN=your_bot_token_here
PLURALKIT_TOKEN=pk_your_system_token_here
DATABASE_URL=postgresql://user:password@localhost/emojidb
WEBHOOK_SECRET=your_webhook_secret_from_pk
# Optional
SENTRY_DSN=https://...@sentry.io/...
LOG_LEVEL=INFO
```
**Add to .gitignore:**
```
.env
*.db
*.sqlite3
__pycache__/
venv/
```
---
## Bot Intents Configuration
```python
intents = discord.Intents.default()
intents.guilds = True # Required
intents.members = True # For member info
intents.message_content = False # NOT needed (use slash commands)
bot = discord.Bot(intents=intents)
```
**Don't enable message_content unless you parse raw text messages.**
---
## Permissions URL
```
https://discord.com/oauth2/authorize
?client_id=YOUR_CLIENT_ID
&scope=bot
&permissions=536996928
```
**Permissions included:**
- Read Messages/View Channels (1024)
- Send Messages (2048)
- Embed Links (16384)
- Read Message History (65536)
---
## Hosting Setup
### Railway (Recommended)
```bash
# Install Railway CLI
curl -fsSL https://railway.app/install.sh | bash
# Initialize
railway init
# Deploy from GitHub
# (auto-deploys when you push to main)
# View logs
railway logs
# Set environment variables
railway variables set DISCORD_TOKEN=...
```
### Cost
- First $5/month free credit
- $5 for 1GB RAM for bot
- $15 for PostgreSQL (or get free tier)
- **Total: $0-20/month for small bot**
### Oracle Cloud (Free Alternative)
- 4 CPU, 24GB RAM, 200GB storage VM (always free)
- Run bot + PostgreSQL on same VM
- Note: May delete instances after 60 days inactivity
- Requires Linux/Docker knowledge
---
## Common Commands During Development
```bash
# Run locally
python main.py
# Run tests
pytest
# Format code
black .
# Check types
mypy .
# View database (SQLite)
sqlite3 emoji.db
> SELECT * FROM emoji_mappings;
# View logs on Railway
railway logs -f
# Deploy to Railway
git push origin main # auto-deploys
```
---
## Anti-Patterns to Avoid
❌ Using Pycord (py-cord) - unmaintained since 2023
❌ Storing bot token in code - use `.env`
❌ Requesting MessageContent intent for slash commands
❌ Polling PluralKit API - use webhooks instead
❌ Synchronous database calls - use async/await
❌ Custom webhook signature verification - use nacl library
❌ Unbounded emoji history table - set expiry policies
---
## Testing Emoji Detection
```python
# test_emoji.py
import pytest
import emoji
def test_emoji_detection():
test_emoji = "😊"
assert emoji.is_emoji(test_emoji)
demojized = emoji.demojize(test_emoji)
assert demojized == ":smiling_face_with_smiling_eyes:"
@pytest.mark.asyncio
async def test_learn_emoji():
await learn_emoji("🎭", "happy performance")
meanings = await get_emoji_meanings("🎭")
assert "happy performance" in meanings
```
---
## Monitoring in Production
**Recommended additions after MVP:**
1. **Error Tracking:** Sentry for automatic error alerts
2. **Logging:** Structured logging to file or cloud
3. **Metrics:** Track command usage, emoji diversity, member growth
4. **Health Checks:** Endpoint that Railway monitors (`GET /health`)
```python
# Simple health check
@app.route('/health')
async def health(request):
return web.Response(text='OK', status=200)
```
---
## Troubleshooting Quick Fixes
| Problem | Solution |
|---------|----------|
| Bot doesn't start | Check DISCORD_TOKEN in .env |
| Slash commands not appearing | Call `await bot.sync_commands()` |
| Database locked error (SQLite) | Only one writer at a time - use PostgreSQL for scaling |
| PluralKit webhook not received | Check public URL is reachable, signature verification |
| Slow emoji lookups | Add database indices, implement Redis cache |
| Bot unresponsive | Check for sync calls blocking event loop |
---
## Next Steps
1. **Read full research:** [STACK.md](./STACK.md)
2. **Setup development environment** using checklist above
3. **Start with MVP:** SQLite, slash commands, Vivi user detection
4. **Implement PluralKit:** Webhook dispatch, member caching
5. **Test locally:** Private Discord server for development
6. **Deploy to Railway:** Connect GitHub, set environment variables
7. **Monitor in production:** Sentry, logs, metrics
8. **Iterate:** Gather feedback, add features
---
## Resources
- [discord.py Docs](https://discordpy.readthedocs.io/)
- [PluralKit API](https://pluralkit.me/api/)
- [Railway Getting Started](https://docs.railway.app/)
- [Pydantic Validation](https://docs.pydantic.dev/)
- [emoji Library](https://carpedm20.github.io/emoji/)

View File

@@ -0,0 +1,33 @@
# Research Documents
This directory contains research and analysis for the Vivi Speech Translator project.
## Documents
### PITFALLS.md
Comprehensive analysis of common mistakes in Discord bot development, with specific focus on:
- **PluralKit Integration Pitfalls** - Message detection, webhook reliability
- **Discord API Pitfalls** - Message content intent, rate limiting, permissions
- **Learning System Pitfalls** - Dictionary quality, teaching interface UX, scope creep
- **Multi-Server Issues** - Dictionary conflicts across servers
- **Emoji Handling** - Unicode edge cases, combining characters
- **Security** - Authorization, privilege escalation, data privacy
- **Translation Quality** - Making translations feel natural
- **Accessibility** - Text-heavy interfaces for users with Dysgraphia
- **Infrastructure** - Hosting and reliability
Each pitfall includes:
- What goes wrong (the problem)
- Warning signs (how to detect it)
- Prevention strategies (how to avoid it)
- Which phase should address it
**Top 8 Critical Pitfalls to Watch For:**
1. Message Detection Reliability (Phase 1)
2. Message Content Intent Architecture (Phase 1)
3. Dictionary Quality Degradation (Phase 3)
4. Teaching Interface Complexity (Phase 3)
5. Rate Limiting and Scaling (Phase 2+)
6. Emoji Edge Cases (Phase 2)
7. Authorization and Security (Phase 3)
8. Webhook Race Conditions (Phase 2+)

690
.planning/research/STACK.md Normal file
View File

@@ -0,0 +1,690 @@
# Stack Research: Vivi Speech Translator
**Last Updated:** January 29, 2025
**Research Scope:** Production-ready 2025 tech stack for Discord bot with PluralKit/Tupperbox integration
---
## Executive Summary
For the Vivi Speech Translator project, the recommended 2025 stack is **discord.py 2.6.4 (Python)** with **PostgreSQL/SQLite** for emoji mapping storage, **pluralkit.py** for PluralKit integration via webhook dispatch, and **Railway** or **Oracle Cloud** for hosting. This combination offers mature frameworks, proven ecosystem integration, and cost-effectiveness while avoiding deprecated or unmaintained projects.
---
## Discord Bot Framework
**Recommendation:** discord.py 2.6.4 (Python)
**Why:**
- **Actively Maintained:** Latest version 2.6.4 released October 8, 2025 with healthy release cadence (new versions every 3 months)
- **Mature Ecosystem:** 7+ years of development, largest Python Discord bot community, extensive documentation and third-party libraries
- **Slash Commands:** Built-in support for modern Discord interactions without requiring message content intent for command parsing
- **Async-First Design:** Native asyncio support essential for handling multiple concurrent API calls (PluralKit queries, webhook processing)
- **Production Proven:** Powers many enterprise Discord communities with robust error handling and performance
**Alternatives:**
- **Pycord (py-cord):** Fork of discord.py with enhanced UI components, but no new releases to PyPI in 12+ months - marked as inactive/discontinued as of 2025. Not recommended for greenfield projects.
- **discord.js (TypeScript/JavaScript):** Popular but slower than Python at CPU-bound tasks. Better for teams comfortable with Node.js ecosystem.
- **Serenity/Twilight (Rust):** Excellent performance but steep learning curve, overkill for a learning/utility bot, smaller community.
- **Go (discordgo):** Good performance but emoji/text processing libraries less mature than Python ecosystem.
**Confidence:** High - discord.py is the de facto standard for Python Discord bot development in 2025.
---
## Language
**Recommendation:** Python 3.10+
**Why:**
- **Rich Text Processing:** Python has the most mature emoji handling libraries (emoji 2.x, regex, unicode support)
- **Data Validation:** Pydantic ecosystem dominates for structured data (emoji mappings, system configs)
- **Community Resources:** Largest Discord bot community uses Python, easiest to find tutorials and debugging help
- **Rapid Prototyping:** Fast iteration on emoji detection/translation logic before optimization
- **Integration Libraries:** pluralkit.py, aiosqlite, and asyncpg all have high-quality Python implementations
**Version Specifics:**
- Minimum: Python 3.8 (discord.py requirement)
- Recommended: Python 3.10 or 3.11 (pattern matching, better async, better type hints)
- Support through: Python 3.12 confirmed by discord.py
**Alternatives:**
- **JavaScript/TypeScript:** discord.js is feature-complete, but text emoji processing slower. Consider if team prefers TypeScript for type safety.
- **Rust:** serenity/twilight offer 5-10x performance gains if emoji translation becomes CPU-bound with millions of mappings. Not needed initially.
- **Go:** discordgo is simpler than Rust but emoji libraries less mature than Python.
**Confidence:** High - Python is the optimal choice for this project's text processing and ecosystem needs.
---
## Database
**Recommendation:** **PostgreSQL 15+** (production/scaling) or **SQLite 3** (MVP/single-instance)
**Schema Overview:**
```sql
-- Global emoji dictionary
CREATE TABLE emoji_mappings (
id SERIAL PRIMARY KEY,
emoji TEXT NOT NULL UNIQUE,
meanings TEXT[] NOT NULL, -- array of translations
created_at TIMESTAMP DEFAULT NOW(),
confidence FLOAT DEFAULT 0.5,
usage_count INT DEFAULT 0
);
-- Per-server overrides (future feature)
CREATE TABLE server_overrides (
id SERIAL PRIMARY KEY,
server_id BIGINT NOT NULL,
emoji TEXT NOT NULL,
custom_meaning TEXT NOT NULL,
created_by BIGINT NOT NULL,
UNIQUE(server_id, emoji)
);
-- PluralKit system tracking
CREATE TABLE pk_systems (
id SERIAL PRIMARY KEY,
pk_system_id TEXT NOT NULL UNIQUE,
discord_user_id BIGINT NOT NULL,
last_synced TIMESTAMP DEFAULT NOW(),
member_count INT DEFAULT 0
);
-- Learning history for future model training
CREATE TABLE translation_history (
id SERIAL PRIMARY KEY,
emoji TEXT NOT NULL,
translation TEXT NOT NULL,
system_id BIGINT,
context TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
```
### Detailed Comparison
**PostgreSQL (Recommended for Production)**
**Advantages:**
- Handles complex queries for learning/analytics (emoji co-occurrence, translation frequency)
- Supports array types natively (efficient emoji->meanings mappings)
- JSONB support for extensible emoji metadata
- Scales to millions of emoji mappings across thousands of servers
- Transaction support ensures data consistency during learning updates
- Free tier available on Railway, Render, or self-hosted
**Setup:**
```bash
# Using asyncpg (async driver for discord.py)
pip install asyncpg
```
**Considerations:**
- Requires external database service if cloud-hosted ($5-15/month)
- Overkill for MVP with <10 servers, <1000 emoji mappings
- Network latency adds 5-50ms per query (mitigated with caching)
---
**SQLite (Recommended for MVP)**
**Advantages:**
- Zero setup: single file database, no server needed
- Free and embedded
- Fast for <10K emoji mappings and <100 concurrent users
- Migrate to PostgreSQL later without API changes (SQLAlchemy compatibility)
- Excellent for local development and testing
**Setup:**
```bash
# Using aiosqlite (async driver for discord.py)
pip install aiosqlite
```
**Limitations:**
- One writer at a time (concurrent updates block)
- No network access (bot must run on same machine)
- Not suitable if bot replicates across multiple servers
- No native array types (serialize to JSON)
**Use SQLite when:**
- MVP with single bot instance
- <1000 servers, <50K emoji mappings
- Learning phase before optimization
---
**Decision Framework:**
| Scenario | Recommendation | Rationale |
|----------|---|---|
| **MVP (Weeks 1-4)** | SQLite + aiosqlite | Fast iteration, zero ops overhead |
| **Public Bot (Month 2+)** | PostgreSQL + asyncpg | Scale across communities, learn patterns |
| **Enterprise (100+ servers)** | PostgreSQL + Redis cache layer | Millions of mappings, sub-100ms response |
**Confidence:** High - This structure mirrors successful Discord bot implementations (Logiq, MEE6, others).
---
## PluralKit Integration
### How PluralKit Works
PluralKit uses **Discord webhook proxying** to detect and rewrite messages:
1. User configures bracket patterns (e.g., `[Name]` for member "Name")
2. User sends: `[Name] 🎭💫 means "happy performance"`
3. PluralKit intercepts, detects brackets, replaces message under webhook as "Name" profile
4. **Result:** Message appears as if sent by that member's profile
### Detection Mechanisms
**Option A: Webhook Dispatch Events (Recommended)**
- PluralKit sends JSON webhooks when members are created/updated/deleted
- Webhook payload includes member ID, modified fields, system ID
- Signing token for security validation
- No message content parsing required
**Payload Example:**
```json
{
"id": "webhook-event-id",
"type": "UPDATE_MEMBER",
"system": "system-id",
"key": "member-id",
"data": {
"name": "Vivi",
"avatar_url": "https://..."
},
"signing_token": "verify-this"
}
```
**Option B: Message Content Intent (Fallback)**
- Listen for all messages, check for PluralKit proxy brackets
- Requires `MESSAGE_CONTENT` privileged intent
- Higher latency, more complex parsing
- Use only if webhook dispatch unavailable
### Implementation Approach for discord.py
```python
# 1. Create webhook listener endpoint
from aiohttp import web
async def pk_webhook_handler(request):
"""Receive PluralKit dispatch webhooks"""
data = await request.json()
signing_token = request.headers.get('X-Signature-Ed25519')
# Verify signature
if not verify_signature(data, signing_token, PK_SECRET):
return web.Response(status=401, text='Unauthorized')
# Handle event types
if data['type'] == 'UPDATE_MEMBER':
await update_emoji_mappings(data['system'], data['key'])
return web.Response(text='OK')
# 2. Register webhook with PluralKit API
async def register_pk_webhook():
"""Call PluralKit API to register webhook URL"""
async with aiohttp.ClientSession() as session:
headers = {'Authorization': PK_SYSTEM_TOKEN}
payload = {
'url': 'https://your-bot-domain.com/webhooks/pk',
'events': ['UPDATE_MEMBER', 'DELETE_MEMBER', 'CREATE_MEMBER']
}
await session.post(
'https://api.pluralkit.me/v2/systems/webhooks',
json=payload,
headers=headers
)
# 3. Query system info for Vivi
from pluralkit import Client
async def get_system_members(system_id):
"""Fetch Vivi's system members using pluralkit.py library"""
client = Client(token=PK_SYSTEM_TOKEN)
system = await client.get_system(system_id)
members = await client.get_system_members(system_id)
return members
# 4. Detect Vivi's messages
async def on_message(message):
"""Intercept all messages, check if from Vivi's system"""
if message.author.id == VIVI_USER_ID:
# Check if this is a proxied message using PluralKit API
try:
proxied = await client.get_message(message.id)
if proxied and proxied.system:
await handle_vivi_message(message)
except Exception:
pass # Not a proxied message
```
### Integration Libraries
- **pluralkit.py:** Client library for PluralKit API v2 (GitHub: PluralKit/PluralKit)
- Install: `pip install pluralkit`
- Handles auth, models, rate limiting
- Current version: 1.1.5+
### API Endpoints Needed
| Endpoint | Purpose | Frequency |
|----------|---------|-----------|
| `GET /systems/{id}` | Fetch system info | On startup, cache for 1 hour |
| `GET /systems/{id}/members` | List all members | On startup, update on webhook event |
| `GET /messages/{id}` | Query if message proxied | Per message (optional, high quota cost) |
| `POST /systems/webhooks` | Register webhook | On startup |
### Rate Limits
- Standard: 2 requests/second
- Burst: 10 requests/second
- Message endpoint: Separate 1 request/second quota
- Webhook dispatch: No rate limits, server-initiated
**Recommendation:** Cache member lists in-memory with 1-hour TTL, update only on webhook events. Avoid polling `GET /messages/{id}` for every message (expensive quota).
---
## Key Libraries
| Purpose | Library | Version | Installation | Notes |
|---------|---------|---------|--------------|-------|
| **Discord API** | discord.py | 2.6.4+ | `pip install discord.py` | Modern interactions, slash commands, intents |
| **PluralKit API** | pluralkit.py | 1.1.5+ | `pip install pluralkit` | Type-safe member/system models |
| **Async Database** | aiosqlite | 0.19.0+ | `pip install aiosqlite` | SQLite with asyncio (MVP) |
| **Async Database** | asyncpg | 0.29.0+ | `pip install asyncpg` | PostgreSQL with asyncio (production) |
| **Emoji Handling** | emoji | 2.11.0+ | `pip install emoji` | Convert emoji ↔ names, demojize/emojize |
| **Data Validation** | pydantic | 2.5.0+ | `pip install pydantic` | Validate emoji mappings, system configs |
| **HTTP Requests** | aiohttp | 3.9.0+ | `pip install aiohttp` | Async webhook server for PluralKit |
| **Environment Config** | python-dotenv | 1.0.0+ | `pip install python-dotenv` | Manage tokens, API keys safely |
| **JSON Handling** | jsonschema | 4.20.0+ | `pip install jsonschema` | Validate PluralKit webhook payloads |
### Why These Specific Libraries
**emoji 2.11.0+:**
- Supports full Unicode 15.0 emoji set (2025 standard)
- `emoji.demojize()` → emoji to `:name:` codes
- `emoji.emojize()` → codes to emoji
- Handles variant selectors and skin tone modifiers
- Example: `emoji.demojize("😊")``":smiling_face_with_smiling_eyes:"`
**pydantic 2.5.0+:**
- Runtime type validation (catch invalid emoji mappings before DB save)
- Auto-generate JSON schemas for API documentation
- Configuration management for bot settings
- Example:
```python
from pydantic import BaseModel, validator
class EmojiMapping(BaseModel):
emoji: str
meanings: list[str]
@validator('emoji')
def validate_emoji(cls, v):
if not emoji.is_emoji(v):
raise ValueError('Invalid emoji')
return v
```
**asyncpg over psycopg2:**
- Native async/await (required for discord.py bot loop)
- 2-3x faster than sync driver in async context
- Connection pooling built-in
- No threading overhead
---
## Hosting & Deployment
### Recommended Approach: Cloud PaaS (Hybrid Model)
**Primary Recommendation:** Railway + PostgreSQL (Managed)
**Setup:**
1. Discord bot code hosted on Railway
2. PostgreSQL database also on Railway
3. Public URL for webhook endpoint (PluralKit dispatch)
4. $5/month free credits, ~$0-10/month if modest usage
**Why Railway:**
- Automatic deployments from GitHub (git push = live update)
- Built-in PostgreSQL add-on ($15/month or included in free tier for small projects)
- Environment variables for secrets (tokens, API keys)
- Good uptime (99.95%), supports long-running processes
- Easy scaling if needed later
- Free domain with SSL certificate
**Setup Commands:**
```bash
# Install Railway CLI
curl -fsSL https://railway.app/install.sh | bash
# Login
railway login
# Initialize project
railway init
# Deploy
git push # automatic if GitHub connected
# View logs
railway logs
```
### Alternative Options
#### Option 2: Oracle Cloud (Free Tier) + Self-Hosted Bot
**Services:**
- Oracle Cloud Always-Free VM (4 CPU, 24GB RAM, 200GB storage) - runs bot + PostgreSQL
- Bot code in Docker container
- Systemd or supervisor for process management
**Advantages:**
- Completely free for life
- Plenty of resources for 1000+ emoji mappings
- Full control over environment
**Disadvantages:**
- Oracle may delete instances after 60 days of inactivity (unpredictable)
- Requires Linux/Docker knowledge
- Manual SSL certificate renewal (Let's Encrypt)
- No automatic redeploys
#### Option 3: Render (Free Tier Deprecated)
**Status:** Render removed free tier in 2024. Not recommended for budget projects.
#### Option 4: Self-Hosted on Raspberry Pi / Home Server
**Setup:**
- Raspberry Pi 5 ($80 hardware) or old laptop
- SQLite database
- Systemd service runner
- NGINX reverse proxy for webhooks
- Dynamic DNS for public URL (Cloudflare, DuckDNS)
**Cost:** Electricity only (~$10/year)
**Reliability:** Depends on home internet uptime
**Best for:** Learning/hobby projects, not community-facing bots
---
## Authentication & Permissions (Discord OAuth2/Intents)
### Required Intents
```python
intents = discord.Intents.default()
intents.guilds = True # Guild events (joins, member counts)
intents.members = True # Member info for presence checks
# intents.message_content = True # ONLY if using prefix commands or parsing raw messages
# For slash commands: NOT REQUIRED
bot = discord.Bot(intents=intents)
```
### Why NOT to Request Message Content Intent
**Slash Commands Don't Need It:**
- `/translate 🎭` → Works without message content intent
- Discord sends interaction object with full data
**Don't Use Prefix Commands:**
- Prefix commands (e.g., `!translate 🎭`) require message_content intent
- Adds compliance burden (privacy concern)
- Slash commands are standard for new bots in 2025
### Required Permissions
```
Bot Invite URL Permissions (decimal: 536996928):
- Read Messages/View Channels (1024)
- Send Messages (2048)
- Embed Links (16384)
- Read Message History (65536)
- Use Slash Commands (274877906944) # auto-included in interactions
Don't request:
- Manage Messages (edit other users' messages) - not needed
- Administrator - major red flag, users won't add bot
```
### OAuth2 Setup
1. **Register Bot in Discord Developer Portal:**
- Create application → Create bot user
- Copy bot token, store in `.env`
- Enable Intents: GUILD_MEMBERS, GUILDS
2. **Generate Invite Link:**
- Use Discord Permissions Calculator
- Share: `https://discord.com/oauth2/authorize?client_id={CLIENT_ID}&scope=bot&permissions=536996928`
3. **Bot Token Management:**
```python
import os
from dotenv import load_dotenv
load_dotenv()
TOKEN = os.getenv('DISCORD_TOKEN')
bot.run(TOKEN)
```
### MFA Requirement
If bot has elevated permissions (marked with asterisk in permissions list) and added to guild with MFA enabled, **bot owner must enable 2FA on Discord account**. Plan for this before public release.
---
## Related Anti-patterns
### ❌ Using Pycord in 2025
**Why Not:**
- No new PyPI releases since November 2023 (12+ months)
- Actively marked as "discontinued" or "low priority maintenance"
- discord.py 2.6.4 is more stable and has better community support
- Migration from Pycord → discord.py requires minimal changes (compatible imports)
**If you inherit Pycord code:** Plan migration to discord.py, but it's not urgent.
---
### ❌ Storing Bot Token in Code
**Why Not:**
- GitHub will scan and revoke tokens automatically (good) but bot will be compromised
- Attacker gets full bot access, can impersonate, delete, spam communities
**Correct Approach:**
```python
# ✅ Use environment variables
from dotenv import load_dotenv
import os
load_dotenv()
TOKEN = os.getenv('DISCORD_TOKEN')
# ✅ Git should ignore .env
echo ".env" >> .gitignore
```
---
### ❌ Requesting MessageContent Intent "Just in Case"
**Why Not:**
- Discord tracks intent abuse (compliance review for 100+ guilds)
- Shows poor design (should use slash commands instead)
- Privacy red flag for communities
- Adds API request latency for every message
**When you actually need it:**
- Prefix commands ONLY (not applicable for Vivi bot)
- Raw message parsing (not needed for emoji detection via webhooks)
- Chat bots that need to understand full conversation
---
### ❌ Syncing Emoji Mappings via REST API Polling
**Why Not:**
- PluralKit rate limits API calls (2 requests/sec)
- Polling every 30 seconds across 100 members = 200+ API calls/30s (throttled, errors)
- High latency (5+ second delay to sync new member)
**Correct Approach:**
- Use webhook dispatch (PluralKit pushes updates to you)
- Cache member list in-memory
- Update only on webhook events
---
### ❌ Building Custom PluralKit Webhook Signature Verification
**Why Not:**
- Ed25519 signature verification is cryptographically complex
- One mistake = accepts forged webhooks (security vulnerability)
**Correct Approach:**
```python
# Use library instead
from nacl.signing import VerifyKey
from nacl.exceptions import BadSignatureError
def verify_pk_signature(body: bytes, signature: str, public_key: str) -> bool:
try:
verify_key = VerifyKey(public_key)
verify_key.verify(body, bytes.fromhex(signature))
return True
except BadSignatureError:
return False
```
---
### ❌ Storing Full Emoji History Without Expiry
**Why Not:**
- Unbounded table growth (millions of rows/month)
- Query performance degrades over time
- Storage costs balloon on cloud databases
**Correct Approach:**
```sql
-- Archive old data monthly
INSERT INTO emoji_translation_archive
SELECT * FROM translation_history
WHERE created_at < NOW() - INTERVAL '3 months';
DELETE FROM translation_history
WHERE created_at < NOW() - INTERVAL '3 months';
CREATE INDEX idx_created_at ON translation_history(created_at);
```
---
### ❌ Using Synchronous Libraries (requests, sqlite3)
**Why Not:**
- Blocks Discord bot event loop
- One slow query = all slash commands freeze
- Unresponsive bot experience
**Correct Approach:**
```python
# ❌ DON'T
import sqlite3
conn = sqlite3.connect('emoji.db') # Blocks entire bot!
# ✅ DO
import aiosqlite
async with aiosqlite.connect('emoji.db') as db:
cursor = await db.execute('SELECT ...')
```
---
## Implementation Roadmap (Greenfield)
### Phase 1: MVP (Weeks 1-2)
- **Tech:** discord.py 2.6.4 + SQLite + slash commands
- **Features:**
- `/learn 🎭 "happy performance"` - store emoji → meaning
- `/translate 🎭💫 ...` - look up emoji meanings
- Detect Vivi's user ID, listen for messages
- **Testing:** Local development, manual testing in private Discord server
### Phase 2: PluralKit Integration (Weeks 3-4)
- Add webhook endpoint for PluralKit dispatch events
- Cache system members in-memory
- Detect "from Vivi's system" vs "from other users"
- Store per-system learned mappings
### Phase 3: Production Prep (Weeks 5-6)
- Migrate SQLite → PostgreSQL
- Deploy to Railway
- Set up logging and error tracking (Sentry, optional)
- Public bot invite link, documentation
### Phase 4: Scaling (Weeks 7+)
- Global emoji dictionary learning across all servers
- Per-server overrides for custom meanings
- Analytics dashboard (most common emoji, growth trends)
- Redis cache layer if needed
---
## Cost Breakdown (Monthly)
| Component | Free Option | Production Option | Cost |
|-----------|-------------|-------------------|------|
| **Bot Hosting** | Railway free tier | Railway | $0-5 |
| **Database** | SQLite (local) | PostgreSQL (Railway) | $0 (included) |
| **PluralKit API** | Free (webhook only) | Free | $0 |
| **Logging** (optional) | stdout | Sentry | $0-50 |
| **Custom Domain** | discord.bot.app | your-domain.com | $12+ |
| **TOTAL** | **$0** | **$0-20** | - |
---
## Summary
The 2025 recommended stack for Vivi Speech Translator is:
**Framework:** discord.py 2.6.4 (Python 3.10+)
**Database:** SQLite (MVP) → PostgreSQL (production) with asyncpg/aiosqlite
**PluralKit:** pluralkit.py library + webhook dispatch events
**Hosting:** Railway Cloud ($0-5/month)
**Libraries:** emoji 2.11.0+, pydantic 2.5.0+, aiohttp 3.9.0+
This stack prioritizes **maintainability** (discord.py is actively maintained), **ecosystem maturity** (largest Python Discord community), **cost-effectiveness** (free tier sufficient), and **reliability** (proven in production by 1000+ bots).
Start with MVP (SQLite, local development) to validate emoji detection logic, then migrate to PostgreSQL on Railway for multi-server deployment. Avoid Pycord (unmaintained), don't request message content intent (use slash commands instead), and leverage webhook dispatch for efficient PluralKit integration.
---
## References & Sources
1. [discord.py Official Docs](https://discordpy.readthedocs.io/) - Latest 2.6.4
2. [PluralKit API Reference](https://pluralkit.me/api/)
3. [PluralKit Webhook Dispatch](https://pluralkit.me/api/dispatch/)
4. [pluralkit.py Library Docs](https://pluralkit.readthedocs.io/)
5. [Discord Bot Hosting Guide 2025](https://www.mambahost.com/blog/discord-bot-hosting-free-vs-paid/)
6. [Discord Intents & OAuth2 Documentation](https://discord.com/developers/docs/topics/oauth2)
7. [Railway Cloud Platform](https://railway.app/)
8. [Pydantic v2 Documentation](https://docs.pydantic.dev/)
9. [emoji Library (carpedm20)](https://carpedm20.github.io/emoji/)
10. [SQLite vs PostgreSQL Comparison 2025](https://friendify.net/blog/discord-bot-database-choices-sqlite-postgres-mongo-2025.html)

View File

@@ -0,0 +1,411 @@
# Research Summary: Vivi Speech Translator
**Synthesized from:** STACK.md, FEATURES.md, ARCHITECTURE.md, PITFALLS.md
**Date:** January 29, 2025
**Status:** Ready for Requirements Definition
---
## Executive Summary
Vivi Speech Translator is a rule-based emoji-to-text translation Discord bot built for a specific user system (Vivi via PluralKit). The recommended 2025 stack is **discord.py 2.6.4 + PostgreSQL/SQLite + webhook-driven PluralKit integration**, prioritizing simplicity, reliability, and community-driven learning over complex AI. The core differentiator is transparent, learnable translation with strong accessibility for users with dysgraphia—the bot becomes more valuable as users teach it emoji meanings, creating positive network effects.
The project succeeds by staying focused: detect Vivi's PluralKit-proxied messages, parse emoji sequences, translate via a persistent dictionary, and enable users to grow that dictionary through simple commands. Avoid context inference, cross-Discord generalization, and real-time chat simulation. This narrowly-scoped approach maximizes shipping speed while maintaining high confidence in architectural decisions.
**Key Risk:** PluralKit webhook detection is load-bearing. Message detection must be bulletproof (Phase 1) before scaling. Secondary risk: keeping the teaching interface simple enough for a user with dysgraphia to adopt comfortably.
---
## Key Findings
### From STACK.md: Technology Recommendations
**Recommended Stack (2025):**
| Component | Choice | Rationale |
|-----------|--------|-----------|
| **Language** | Python 3.10+ | Richest emoji/text processing libraries, largest Discord bot community |
| **Framework** | discord.py 2.6.4 | Actively maintained (October 2025), mature 7-year ecosystem, async-first, native slash commands |
| **Database (MVP)** | SQLite 3 + aiosqlite | Zero setup, single file, sufficient for MVP testing (<10K emoji mappings) |
| **Database (Prod)** | PostgreSQL 15+ + asyncpg | Scales to millions of mappings, native array types, connection pooling, $0-15/mo on Railway |
| **PluralKit Integration** | pluralkit.py + webhook dispatch | Use event-driven webhooks (instant, free) vs API polling (expensive, slow) |
| **Hosting** | Railway Cloud | $0-5/mo free tier, auto-deploys from Git, built-in PostgreSQL, public webhook URL for PluralKit |
| **Key Libraries** | emoji 2.11.0, pydantic 2.5.0, aiohttp 3.9.0 | Unicode 15.0 support, async-native, data validation |
**Critical Avoids:**
- ❌ Pycord (py-cord) — Inactive since 2023, no PyPI releases
- ❌ Message content intent as primary architecture — Design for slash commands, treat intent as optional
- ❌ REST API polling for PluralKit — Use webhook dispatch instead (rate limits: 2 req/sec vs unlimited webhooks)
- ❌ Synchronous database libraries (sqlite3, psycopg2) — Block bot event loop; use aiosqlite/asyncpg
**Confidence:** **VERY HIGH** — All recommendations are current, production-proven, and community-standard in early 2025.
---
### From FEATURES.md: What to Build
**Table Stakes (Must-Have):**
- Message detection + emoji parsing
- Reply/response infrastructure
- Slash command interface (not prefix commands)
- Per-server configuration (auto vs on-demand mode)
- Rate limiting + error handling
**Differentiators (Should-Have):**
- Learning system: `/teach emoji meaning` → stores in database
- Emoji sequence detection (e.g., "👩‍💻📱" = compound concept)
- Query system: `/meaning emoji` or `/what emoji`
- Correction system: `/correct emoji new_meaning`
- Reaction-based feedback (✅/❌ on translations)
- Accessibility: plain text output, no emoji-only responses, visual confirmation
**PluralKit Integration (Critical for Scope):**
- Detect webhook proxy via `message.webhook_id`
- Verify member_id via `GET /v2/messages/{id}` API
- Enable "Vivi says: [translation]" style responses
**Never Build (Out of Scope):**
- ❌ Context-based inference ("infer emoji meaning from conversation")
- ❌ Cross-Discord emoji translation ("same meaning everywhere")
- ❌ Real-time chat simulation ("bot generates new emoji sequences")
- ❌ Full NLP context analysis ("understand subtle tone shifts")
**MVP Feature Set (Phases 1-3):**
- Message detection & emoji parsing
- Rule-based translation (no ML)
- `/teach`, `/meaning`, `/correct` commands
- Auto/on-demand toggle per server
- Accessible output (plain text, emoji names)
**Roadmap Implication:** Build in layers. Phase 1-2 deliver value (users see translations). Phase 3 enables growth (users can teach). Phase 4+ adds refinement (caching, stats, multi-server overrides).
**Confidence:** **HIGH** — Feature research grounded in Discord bot best practices, accessibility standards, and PluralKit integration patterns.
---
### From ARCHITECTURE.md: Component Design
**7-Component System:**
1. **Discord Client** — Maintains WebSocket, initializes event loop
2. **Message Event Handler** — Filters for webhook, queries PluralKit, verifies Vivi
3. **Emoji Parser** — Extracts emoji sequences via regex, preserves order
4. **Translation Engine** — Looks up emoji meanings, composes natural language
5. **Database Layer** — Async SQLAlchemy + PostgreSQL/SQLite
6. **Command Handler (Cogs)** — Teaching, configuration, queries
7. **Configuration Layer** — Environment variables for secrets
**Data Flow (Simplified):**
```
Vivi's Message (via webhook)
↓ [PluralKit detection]
Emoji Parser (regex extraction)
↓ [order-preserving]
Database Lookup (O(1) via index)
↓ [emoji→meaning]
Translation Composition
↓ [natural language]
Discord Reply
```
**Database Schema (Two Core Tables):**
- **emoji_dictionary**: emoji_string (PK) → meaning + metadata (created_at, updated_by, confidence)
- **server_configuration**: guild_id (PK) → auto_translate (boolean) + created_at
**Key Design Decisions:**
- Global shared emoji dictionary (Phase 1-3) — simplifies MVP; per-server overrides deferred to Phase 4
- Async-first (aiosqlite/asyncpg) — prevents blocking bot's event loop
- Primary key on emoji_string, secondary index on custom_emoji_id — enables O(1) lookups
- Webhook detection first, API verification second — reduces API calls, catches non-PluralKit webhooks
**Suggested Build Order (5 Phases):**
1. **Phase 1 (Weeks 1-2):** Foundation — Discord client + PluralKit detection + database setup
2. **Phase 2 (Weeks 3-4):** Emoji parsing & translation — regex + lookup + reply formatting
3. **Phase 3 (Weeks 5-6):** Teaching system — `/teach`, `/meaning`, `/correct` commands
4. **Phase 4 (Week 7):** Per-server config — auto/on-demand toggle, `/config` command
5. **Phase 5 (Week 8+):** Polish — caching, logging, edge cases, error handling
**Scaling Path:**
- **MVP (Single bot, <10 servers):** SQLite, local development
- **Production (100-1000 servers):** PostgreSQL on Railway, connection pooling
- **Enterprise (1000+ servers):** Add Redis caching layer, implement Discord sharding
**Confidence:** **HIGH** — Architecture mirrors successful production Discord bots (MEE6, Logiq, etc.). Component boundaries are clean, async patterns are standard.
---
### From PITFALLS.md: Risks to Prevent
**Top 8 Pitfalls with Phase Assignment:**
1. **Message Detection Reliability (Phase 1 - CRITICAL)**
- Risk: False positives (translates non-Vivi messages) or false negatives (misses Vivi)
- Cause: Mixing webhook detection methods, edge case in PluralKit proxying
- Prevention: Use webhook creator ID as source of truth, cache member names, test reproxy edge cases, log failures
- Cost if ignored: Bot unreliable from day one, loses user trust
2. **Message Content Intent Denial (Phase 1 - CRITICAL)**
- Risk: Bot designed for passive message scanning; approval denied at 75 servers
- Cause: Assuming Discord approval is guaranteed
- Prevention: Design for slash commands first (`/translate emoji`), treat message content intent as optional
- Cost if ignored: Architectural rewrite mid-project
3. **Dictionary Quality Degradation (Phase 3 - HIGH)**
- Risk: User-taught emoji meanings become nonsensical (typos, trolls, conflicts)
- Cause: No validation, no audit trail, no approval workflow
- Prevention: Validate meaning length/content, log every change, flag conflicts, require mod approval for shared meanings
- Cost if ignored: Translations become unreliable by month 2-3
4. **Teaching Interface Too Complex (Phase 3 - HIGH)**
- Risk: Vivi (with dysgraphia) avoids using teaching system; feature becomes unused
- Cause: Text-heavy commands, complex syntax, no visual confirmation
- Prevention: Ultra-simple commands (`/teach emoji meaning`), show emoji in response, keep responses under 2 sentences
- Cost if ignored: Bot cannot learn, static dictionary limits usefulness
5. **Rate Limiting (Phase 2+ - MEDIUM)**
- Risk: Bot goes silent during peak usage (Discord or PluralKit API limits hit)
- Cause: Naive request patterns, no caching, no exponential backoff
- Prevention: Cache emoji translations, batch lookups, implement exponential backoff, monitor rate limit headers
- Cost if ignored: Intermittent outages, poor user experience
6. **Emoji Parsing Edge Cases (Phase 2 - MEDIUM)**
- Risk: Complex emoji (skin tones, ZWJ sequences, variation selectors) break parsing
- Cause: Naive string operations, incorrect regex patterns
- Prevention: Use emoji library (not manual regex), normalize input (NFD), test with families/skin tones/flags
- Cost if ignored: Some emoji don't translate or get corrupted
7. **Authorization & Security (Phase 3 - HIGH)**
- Risk: Non-mods can teach emoji, trolls corrupt dictionary, no audit trail
- Cause: No permission checks, no input validation, no logging
- Prevention: Whitelist who can teach (Vivi + trusted), validate input, log everything, support `/undo` or revert
- Cost if ignored: Dictionary spam, loss of data integrity
8. **Webhook Race Conditions (Phase 2+ - MEDIUM)**
- Risk: Vivi edits her message while bot edits its translation; both fail or corrupt
- Cause: Simultaneous edits via same webhook
- Prevention: Post new translation instead of editing; queue requests with 1-sec delay if edit detected
- Cost if ignored: Occasional translation failures and message corruption
**Confidence:** **MEDIUM-HIGH** — Pitfalls are well-documented in Discord bot literature. Phase assignments are defensible but require validation during planning.
---
## Implications for Roadmap
### Suggested Phase Structure (5 Phases, ~8 Weeks)
**Phase 1: Foundation (Weeks 1-2) — Detect Vivi**
- **Goal:** Prove we can reliably detect Vivi's PluralKit-proxied messages
- **Features:**
- Discord bot initialization (discord.py, intents, token)
- Webhook detection (`message.webhook_id`)
- PluralKit API verification (`GET /v2/messages/{id}`)
- Member ID verification (compare to Vivi's ID)
- Database schema + tables (emoji_dictionary, server_configuration)
- **Deliverable:** Bot logs every Vivi message to console (doesn't respond yet)
- **Critical Pitfalls to Avoid:** Message detection reliability, message content intent design, authorization design
- **Research Needed:** ❌ None — STACK.md and PITFALLS.md are definitive
- **Success Criteria:**
- ✅ Bot detects Vivi's messages with >99% accuracy
- ✅ No false positives (ignores non-Vivi webhooks)
- ✅ Handles reproxy, edits, and DMs correctly
**Phase 2: Emoji Parsing & Translation (Weeks 3-4) — Make Vivi Understood**
- **Goal:** Turn emoji into natural language; deploy MVP
- **Features:**
- Emoji parser (regex for Unicode + custom emoji)
- Database lookups (O(1) via primary key)
- Response composition ("Vivi says: [translation]")
- Auto-translate toggle per server
- Basic error handling (unknown emoji, rate limits)
- **Deliverable:** Bot translates Vivi's emoji in channels and DMs
- **Critical Pitfalls to Avoid:** Emoji edge cases, rate limiting, webhook race conditions
- **Research Needed:** ❌ None — ARCHITECTURE.md covers this thoroughly
- **Success Criteria:**
- ✅ All Unicode emoji parse correctly (including skin tones, ZWJ)
- ✅ Custom Discord emoji supported
- ✅ Translations appear in <500ms
- ✅ Accessible format (plain text, no emoji-only responses)
**Phase 3: Teaching System (Weeks 5-6) — Enable Growth**
- **Goal:** Let users and Vivi teach emoji meanings; enable sustainable growth
- **Features:**
- `/teach emoji meaning` command
- `/meaning emoji` or `/what emoji` query
- `/correct emoji new_meaning` updates
- Input validation (length, content, duplicates)
- Audit trail (logged changes with user_id, timestamp)
- Reaction-based feedback (✅/❌ on translations)
- Permission checks (whitelist who can teach)
- **Deliverable:** Users can teach emoji via simple one-liner commands; bot confirms with visual emoji
- **Critical Pitfalls to Avoid:** Dictionary degradation, interface complexity (dysgraphia UX), authorization bypass, emoji conflicts
- **Research Needed:** ⚠️ Potential research needed on Vivi's specific dysgraphia constraints and optimal UI patterns
- **Success Criteria:**
- ✅ Vivi finds teaching interface usable (simple syntax, visual confirmation)
- ✅ 50+ emoji taught in first week of beta
- ✅ No troll edits (proper permission checks)
- ✅ Audit trail enables revert if needed
**Phase 4: Per-Server Configuration & Scaling (Week 7) — Customize & Optimize**
- **Goal:** Let servers customize translation behavior; add basic caching
- **Features:**
- `/config auto-translate [on|off]` per-server toggle
- `/translate emoji` on-demand command
- Redis caching layer (optional, for hot emoji)
- PostgreSQL migration (if MVP showed >1000 emoji, scaling needed)
- Basic statistics (`/emoji-stats`)
- **Deliverable:** Different servers can choose auto vs on-demand translation; bot performance optimized
- **Critical Pitfalls to Avoid:** Global dictionary conflicts (document limitation; defer per-server overrides to Phase 5+)
- **Research Needed:** ⚠️ Performance profiling may reveal caching needs earlier than expected
- **Success Criteria:**
- ✅ Servers can customize behavior via commands
- ✅ Bot remains responsive at 100+ servers with 1000+ emoji
- ✅ Average response time <250ms (including PluralKit API call)
**Phase 5: Polish & Production Hardening (Week 8+) — Stabilize**
- **Goal:** Make bot production-ready with comprehensive error handling, logging, monitoring
- **Features:**
- Structured logging (all errors, API calls, performance metrics)
- Sentry or equivalent error tracking
- Graceful degradation (serve cached meanings if DB down)
- Edge case handling (message edits, deletions, permission changes)
- Documentation and runbooks
- **Deliverable:** Production-grade bot with <99% uptime, full observability
- **Research Needed:** ❌ None — standard DevOps practices
- **Success Criteria:**
- ✅ <0.1% error rate on translations
- ✅ All errors logged and alertable
- ✅ Can diagnose issues within 5 minutes from logs
---
## Research Flags & Validation Needs
### High-Confidence Areas (Skip Deeper Research)
- **Stack:** discord.py 2.6.4, SQLite→PostgreSQL, Railway hosting — all production-proven
- **Architecture:** Component design, data flows, build order — standard Discord bot patterns
- **PluralKit Integration:** Webhook dispatch vs API polling tradeoff is well-documented
### Medium-Confidence Areas (Validate During Planning)
- **Phase 3 UX:** Dysgraphia accessibility — validate teaching interface usability with Vivi early
- **Phase 2 Performance:** Emoji parser edge cases — comprehensive test suite needed before Phase 2
- **Phase 4 Scaling:** PostgreSQL migration point — may happen sooner/later than expected based on emoji volume
### Areas Requiring Phase-Specific Research
- **Phase 3:** Optimal teaching UX for dysgraphia (interview Vivi, iterate prototype)
- **Phase 4:** Per-server override system design (if pursued; currently deferred to Phase 5+)
- **Phase 5:** Sentry configuration, structured logging patterns (standard practice, low risk)
---
## Confidence Assessment
| Area | Level | Rationale | Gaps |
|------|-------|-----------|------|
| **Tech Stack** | ⭐⭐⭐ VERY HIGH | discord.py 2.6.4, SQLite/PostgreSQL, Railway all production-standard in 2025; no experimental choices | None — all recommendations are proven |
| **Architecture** | ⭐⭐⭐ VERY HIGH | Component design mirrors MEE6, Logiq, other production bots; async patterns well-documented | None — patterns are industry-standard |
| **Features & MVP Scope** | ⭐⭐⭐ HIGH | Rule-based learning is transparent, debuggable, and explicitly preferred over ML; feature scope is tight | Dysgraphia UX needs validation; confirm Vivi's preferences |
| **Pitfalls** | ⭐⭐ MEDIUM-HIGH | Most pitfalls are documented in Discord bot literature; prioritization is defensible | Message detection reliability needs testing; rate limiting impact unknown until scale testing |
| **Roadmap Phases** | ⭐⭐⭐ HIGH | Build order is logical (detection → translation → teaching → config → polish); each phase delivers value | Phase 3 timing may shift based on Vivi's teaching interface feedback |
| **PluralKit Integration** | ⭐⭐⭐ VERY HIGH | Webhook dispatch approach is efficient, well-documented; API endpoints are stable | None — integration is straightforward |
| **Accessibility** | ⭐⭐ MEDIUM | General accessibility principles are sound; dysgraphia-specific UX patterns need user testing | Vivi's exact preferences (command aliases, visual feedback styles, response length) unknown |
**Overall Confidence: ⭐⭐⭐ HIGH**
This project has clear requirements, proven technology choices, and manageable scope. The main confidence gap is **teaching interface UX for dysgraphia** — validate early in Phase 3 planning with Vivi's direct feedback.
---
## Gaps to Address During Requirements Definition
1. **Vivi's Teaching UX Preferences** (High Priority)
- What syntax feels easiest? (`/teach emoji meaning` vs `/learn emoji meaning` vs `/add emoji meaning`?)
- How should bot confirm back? (Show emoji only? Emoji + text? How many words max?)
- Are reaction buttons easier than typing? (React ✓ vs type "yes")
- What emoji naming system? (Unicode names? Custom? Both?)
- **Action:** Interview Vivi early in requirements phase; prototype 2-3 UI patterns
2. **Exact Emoji Coverage** (Medium Priority)
- Does Vivi use only standard Unicode emoji, or custom Discord emoji, or both?
- Are there specific emoji types (families, flags, keycaps) that are critical?
- Does she use ZWJ sequences (👨‍👩‍👧)?
- **Action:** Ask Vivi to share examples of emoji she commonly uses
3. **Moderation & Teaching Permissions** (Medium Priority)
- Who should be allowed to teach emoji? (Only Vivi? Vivi + alters? Trusted friends? Everyone?)
- How should conflicts be resolved if two people teach different meanings for same emoji?
- Is there a mod team to approve meanings, or trust-first approach?
- **Action:** Clarify with Vivi's community (or proxy representative)
4. **Multi-System Scope** (Low Priority)
- Is this bot only for Vivi's system, or will it serve multiple DID systems?
- If multiple systems, how do we handle different emoji meanings per system?
- **Action:** Clarify scope; if multi-system, defer to Phase 4+ for per-system overrides
5. **Response Format Preferences** (Low Priority)
- Should bot translate emoji-only, or include surrounding context?
- Example: Vivi posts "😷 2⃣ 🍑 ❌" → Bot says "sick, two, peach, no" OR "Vivi: sick, feeling crappy about two things, and definitely not"?
- **Action:** Test both formats with Vivi; let her choose style
---
## Cost Breakdown & Go/No-Go Criteria
**MVP Monthly Cost:**
- Bot Hosting (Railway): $0 (free tier)
- Database (SQLite, local): $0
- **Total: $0**
**Production Monthly Cost (100+ servers):**
- Bot Hosting (Railway): $5
- PostgreSQL (Railway): $15
- Logging/Monitoring (optional): $0-50
- **Total: $20-70**
**Go Criteria (Phase 1 Completion):**
- ✅ Message detection >99% accurate
- ✅ No false positives or negatives
- ✅ Database queries <50ms
- ✅ Code reviewed and documented
**No-Go Criteria (Stop if True):**
- ❌ PluralKit API rate limits prevent scaling to 10+ servers
- ❌ Discord denies message content intent AND no viable slash command path
- ❌ Vivi finds teaching interface unusable after Phase 3 testing
---
## Sources & References
### Research Files (Synthesized)
- `.planning/research/STACK.md` — Technology recommendations, rationale, alternatives
- `.planning/research/FEATURES.md` — Feature scope, learning approach, accessibility, anti-features
- `.planning/research/ARCHITECTURE.md` — Component design, data flows, database schema, scaling
- `.planning/research/PITFALLS.md` — Common mistakes, prevention strategies, phase assignments
### External References
- [discord.py Documentation](https://discordpy.readthedocs.io/) — Latest 2.6.4
- [PluralKit API Reference](https://pluralkit.me/api/)
- [Railway Cloud Platform](https://railway.app/)
- [Discord Bot Security Best Practices 2025](https://friendify.net/blog/discord-bot-security-best-practices-2025.html)
- [Accessibility for Dysgraphia](https://top5accessibility.com/blog/orthographic-dyslexia-dysgraphia/)
---
## Next Steps for Roadmap Creator
1. **Read this SUMMARY.md** (5 min) — Understand the synthesis
2. **Review PITFALLS.md** (15 min) — Understand phase-specific risks
3. **Clarify gaps with Vivi** (async) — Teaching UX, emoji coverage, permissions
4. **Map phases to sprints** — Assign timelines, team, success criteria
5. **Create requirements document** — Expand phase descriptions into user stories
6. **Begin Phase 1 development** — Foundation: Discord client + PluralKit detection
---
**Status:** ✅ Ready for Requirements Definition
**Synthesized by:** GSD Research Synthesizer
**Date:** January 29, 2025
**Confidence Level:** ⭐⭐⭐ HIGH
Proceed to roadmap creation. Prioritize Vivi interview for teaching UX validation.