# Architecture Research: Vivi Speech Translator

## Overview

Vivi Speech Translator is a Discord bot that detects emoji-based messages proxied by PluralKit, parses emoji sequences, looks up their meanings in a persistent global dictionary, and replies with natural language translations. The bot must operate across multiple servers, handle both channel and DM messages, and learn new emoji meanings over time.

This document outlines the recommended high-level architecture, component responsibilities, data flows, and scaling strategies.

---

## Core Components

### 1. Discord Client

**Responsibility:** Establish and maintain the connection to Discord's API and WebSocket gateway.

**Key Details:**
- Uses `discord.Client` or `discord.ext.commands.Bot` from the discord.py library
- Requires `Intents` configuration to specify which events the bot listens for:
  - `message_content` intent: Required to read message text in guild channels (privileged; must be enabled in the Developer Portal and requires Discord's approval for bots in 100 or more servers)
  - `guilds` intent: Track guild membership and changes
  - `guild_messages` intent: Receive message events in guild channels
  - `dm_messages` intent: Receive message events in DMs
- Initializes on startup and runs the main event loop via `client.run(token)`
- Handles connection failures and automatic reconnection

**Why This Matters:** Discord's architecture is event-driven, so the Client is the foundation: without it, the bot cannot receive any messages or respond to events.


---

### 2. Message Event Handler

**Responsibility:** Receive all messages, filter for relevance, and route to downstream processors.

**Key Details:**
- Implements the `on_message` event in discord.py (async callback)
- Filters for:
  1. **Webhook Detection:** Check that `message.webhook_id` is not `None` (PluralKit proxies through webhooks, so any message without a webhook ID can be skipped)
  2. **PluralKit Verification:** Query the PluralKit API to confirm the message was proxied by PluralKit (not another webhook system)
  3. **Vivi Detection:** Check that the member ID in the PluralKit response matches Vivi's registered member ID
  4. **Bot Self-Filter:** Ignore messages from the Vivi Speech Translator bot itself
- Routes confirmed Vivi messages to the Emoji Parser
- Handles both guild channels and DMs (webhooks do not exist in DMs, so DM translation relies on the `/translate` command)

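
The filter order above can be sketched as a small pure function. This is only an illustration under stated assumptions: `IncomingMessage` is a hypothetical stand-in for the few `discord.Message` fields the filter reads, and `lookup_member` is a hypothetical callable wrapping the PluralKit query that returns a member ID or `None`. The point it shows is that the local checks run first, so the API is only called for webhook messages.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical stand-in for the discord.Message fields the filter needs."""
    message_id: int
    author_id: int
    webhook_id: Optional[int]


def should_process(
    msg: IncomingMessage,
    bot_user_id: int,
    vivi_member_id: str,
    lookup_member: Callable[[int], Optional[str]],
) -> bool:
    """Run the cheap local checks first, then the (slow) PluralKit lookup."""
    if msg.author_id == bot_user_id:
        return False  # self-filter: never react to our own replies
    if msg.webhook_id is None:
        return False  # not sent through a webhook, so not proxied
    member_id = lookup_member(msg.message_id)  # None: not a PluralKit message
    return member_id is not None and member_id == vivi_member_id
```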
**PluralKit Detection Approach:**
When a message is received, the bot can query the PluralKit API using the message ID:
```
GET https://api.pluralkit.me/v2/messages/{message_id}
```
This returns a Message object containing:
- `member`: The member object that proxied the message (contains the member `id`, name, avatar, etc.)
- `sender`: The Discord user ID of the account that sent the original message (the account owner)
- `system`: The system that manages the members
- `timestamp`: When the message was sent
- `guild`: The guild ID where the message was sent
- `channel`: The channel ID where the message was sent

By checking whether the `id` of the returned `member` equals `vivi_member_id`, the bot can verify that Vivi specifically sent the message.

**Rate Limiting:** The PluralKit API has a rate limit of 10 requests per second for message lookups. The bot should handle rate-limit responses (HTTP 429) gracefully with exponential backoff.

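
A minimal sketch of the backoff behaviour described above, assuming a hypothetical `fetch` coroutine that returns `(http_status, parsed_json_or_None)`. The attempt count and base delay are illustrative values, not taken from PluralKit's documentation. Only HTTP 429 is retried; any other non-200 status (such as 404 for a message PluralKit does not know) returns immediately.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical signature: fetch() performs one HTTP request.
Fetch = Callable[[], Awaitable[Tuple[int, Optional[dict]]]]


async def fetch_with_backoff(
    fetch: Fetch,
    max_attempts: int = 4,
    base_delay: float = 0.5,
) -> Optional[dict]:
    """Retry on HTTP 429, doubling the wait each time; give up after max_attempts."""
    for attempt in range(max_attempts):
        status, body = await fetch()
        if status == 200:
            return body
        if status != 429:
            return None  # retrying will not help for other statuses
        if attempt < max_attempts - 1:
            # base, 2*base, 4*base, ... plus a little jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            await asyncio.sleep(delay)
    return None
```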

---

### 3. Emoji Parser

**Responsibility:** Extract and categorize emojis from a message into a structured sequence.

**Key Details:**
- Receives the confirmed Vivi message text from the Message Event Handler
- Uses regex patterns to extract:
  1. **Unicode Emojis:** Standard emoji characters (😷, ❌, etc.)
     - Pattern: `\p{Extended_Pictographic}` (matches the Unicode emoji property; Python's built-in `re` module does not support `\p{...}`, so this requires the third-party `regex` package)
     - Alternative for the built-in `re` module: a character class over explicit code-point ranges, e.g. `[\u00a9\u00ae\u2000-\u3300\U0001F000-\U0001FBFF\ufe0f]`. This is approximate and also matches some non-emoji symbols. UTF-16 surrogate-pair patterns such as `\ud83d[\ud000-\udfff]` come from JavaScript and do not match in Python 3, where strings are sequences of code points.
  2. **Custom Server Emojis:** Discord custom emoji format `<:emoji_name:emoji_id>` or `<a:emoji_name:emoji_id>` (for animated)
     - Pattern: `<a?:[^:\s]+:\d+>`
- Preserves order of emojis as they appear left-to-right
- Returns a structured list like: `[{"type": "emoji", "value": "😷", "id": None}, {"type": "custom", "value": "me1", "id": "123456789"}]`
- Handles edge cases:
  - Emoji skin tone modifiers
  - Zero-width joiners (ZWJ sequences like family emojis)
  - Emoji variations (variation selectors such as U+FE0F)

**Why This Order Matters:** The project spec notes that emoji sequences are compositional and context-dependent. Preserving order and distinguishing types allows the Translation Engine to understand the full intended meaning.

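
The following sketch shows one way the parser could produce the structured list using only the built-in `re` module. The code-point ranges are an assumption chosen for illustration: they cover common pictographs, keycaps, flags, skin tones, and ZWJ sequences, but they are not a complete emoji table. A production parser would more likely use the `regex` package or a maintained emoji library.

```python
import re
from typing import Dict, List, Optional

# Discord custom emoji: <:name:id> or <a:name:id>
CUSTOM = r"<a?:(?P<name>[^:\s]+):(?P<id>\d+)>"

# Approximate building blocks (assumed ranges, not exhaustive).
_BASE = r"[\u2600-\u27BF\U0001F300-\U0001FAFF]"   # one pictograph
_MOD = r"[\ufe0f\U0001F3FB-\U0001F3FF]*"          # variation selector / skin tone

UNICODE = (
    r"(?P<uni>"
    r"[\U0001F1E6-\U0001F1FF]{2}"                 # flag: two regional indicators
    r"|[0-9#*]\ufe0f?\u20e3"                      # keycap, e.g. digit + U+20E3
    + "|" + _BASE + _MOD
    + r"(?:\u200d" + _BASE + _MOD + r")*"         # ZWJ-joined continuation
    + r")"
)

TOKEN = re.compile(CUSTOM + "|" + UNICODE)


def parse_emojis(text: str) -> List[Dict[str, Optional[str]]]:
    """Return emoji tokens in left-to-right order; other text is ignored."""
    tokens: List[Dict[str, Optional[str]]] = []
    for match in TOKEN.finditer(text):
        if match.group("uni") is not None:
            tokens.append({"type": "emoji", "value": match.group("uni"), "id": None})
        else:
            tokens.append(
                {"type": "custom", "value": match.group("name"), "id": match.group("id")}
            )
    return tokens
```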

---

### 4. Translation Engine

**Responsibility:** Convert emoji sequences into natural language using the emoji dictionary.

**Key Details:**
- Receives the structured emoji list from the Emoji Parser
- For each emoji:
  1. Look up its meaning in the Emoji Dictionary (database)
  2. Handle three cases:
     - **Known emoji:** Include its meaning in output
     - **Unknown emoji:** Display the emoji itself with a placeholder or skip
     - **Custom emoji:** Look up by custom emoji ID in database
- Generates natural language output:
  - If all emojis are known: Compose as a sentence ("Vivi is sick, but not in the sinuses")
  - If some are unknown: Format as: "Known meanings: ... [Unknown emoji] ..."
  - If none are known: Reply: "I don't know what these emojis mean yet. You can teach me with the `/teach` command."
- Considers emoji context (e.g., a combination of emojis might have a specific meaning)

**Output Format:** The bot should reply in a Discord message, either in the same channel (if public) or as a DM (if DM context).

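
A minimal sketch of the three output cases, assuming the dictionary has already been loaded into a plain `dict` (keyed by the emoji character for Unicode emojis and by the emoji ID for custom ones; that keying is an assumption for this example). Joining meanings with spaces is a placeholder for whatever sentence composition the project settles on.

```python
from typing import Dict, List, Optional

NONE_KNOWN = (
    "I don't know what these emojis mean yet. "
    "You can teach me with the `/teach` command."
)


def translate(
    tokens: List[Dict[str, Optional[str]]],
    dictionary: Dict[str, str],
) -> str:
    """Compose a reply from parsed emoji tokens and known meanings."""
    parts: List[str] = []
    known = 0
    for token in tokens:
        # Custom emojis are looked up by ID, Unicode emojis by the character.
        key = token["id"] if token["type"] == "custom" else token["value"]
        meaning = dictionary.get(key) if key is not None else None
        if meaning is None:
            parts.append(f"[{token['value']}]")  # unknown: show as placeholder
        else:
            parts.append(meaning)
            known += 1
    if known == 0:
        return NONE_KNOWN
    if known == len(tokens):
        return " ".join(parts)
    return "Known meanings: " + " ".join(parts)
```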

---

### 5. Database Layer

**Responsibility:** Store and retrieve persistent data (emoji dictionary and server configurations).

**Key Details:**
- **Tech Stack:** SQLAlchemy ORM with PostgreSQL for production reliability
- **Async Support:** Use `sqlalchemy.ext.asyncio` with the `asyncpg` driver (or `asyncpg` directly) to avoid blocking the Discord event loop
- **Initialization:** Connect to the database on startup in discord.py's `setup_hook` (or by overriding `Bot.start()`)
- **Connection Pooling:** Configure connection pool to handle concurrent requests from message handlers

**Two Core Tables:**

1. **emoji_dictionary**
   - `emoji_string` (TEXT, PRIMARY KEY): The emoji character(s) or custom emoji format
   - `custom_emoji_id` (BIGINT, NULLABLE): Discord custom emoji ID (if custom emoji)
   - `meaning` (TEXT): The learned meaning
   - `created_at` (TIMESTAMP): When first learned
   - `updated_at` (TIMESTAMP): Last update time
   - `updated_by_user_id` (BIGINT, NULLABLE): User ID of who taught/corrected this
   - `updated_by_member_id` (TEXT, NULLABLE): PluralKit member ID (e.g., Vivi's ID)
   - `created_in_guild` (BIGINT, NULLABLE): Guild ID where first learned (for tracking origin, optional)
   - Indexes: `emoji_string` (provided by the primary key, for fast lookups), `custom_emoji_id` (for custom emoji queries)

2. **server_configuration**
   - `guild_id` (BIGINT, PRIMARY KEY): Discord server ID
   - `auto_translate` (BOOLEAN, DEFAULT TRUE): Auto-translate all Vivi messages or require `/translate` command
   - `created_at` (TIMESTAMP): When server config created
   - `updated_at` (TIMESTAMP): Last update time
   - Updated by Configuration Command Handler

**Important Design Decisions:**
- **Global dictionary:** Emoji meanings are shared across all servers. Different systems can update meanings, but there's a single source of truth per emoji.
- **Per-server config:** Each server has its own settings (auto vs. on-demand mode).
- **User attribution:** Track who taught each emoji for transparency and conflict resolution.
- **No per-server emoji variants:** The spec intends a global dictionary, so "😷" means the same thing everywhere. Per-server overrides could be added later if needed.

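
The "single source of truth per emoji" decision implies an insert-or-update (upsert) on the primary key. The sketch below uses the standard-library `sqlite3` module purely as a stand-in so it runs without a server; the table is a trimmed-down version of `emoji_dictionary`. PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` clause, though the placeholder syntax depends on the driver.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL in this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE emoji_dictionary (
        emoji_string TEXT PRIMARY KEY,
        meaning TEXT NOT NULL,
        updated_by_user_id INTEGER,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    """
)

UPSERT = """
INSERT INTO emoji_dictionary (emoji_string, meaning, updated_by_user_id)
VALUES (?, ?, ?)
ON CONFLICT (emoji_string) DO UPDATE SET
    meaning = excluded.meaning,
    updated_by_user_id = excluded.updated_by_user_id,
    updated_at = CURRENT_TIMESTAMP
"""


def teach(emoji: str, meaning: str, user_id: int) -> None:
    """Insert a new meaning or overwrite the existing one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (emoji, meaning, user_id))


def lookup(emoji: str):
    """Return the stored meaning, or None if the emoji is unknown."""
    row = conn.execute(
        "SELECT meaning FROM emoji_dictionary WHERE emoji_string = ?", (emoji,)
    ).fetchone()
    return row[0] if row else None
```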

---

### 6. Command Handler (Teaching & Configuration)

**Responsibility:** Process bot commands for teaching emojis and configuring server behavior.

**Key Details:**
- **Tech:** discord.py `commands.Cog` extension for modular command organization
- **Commands to Implement:**

  1. `/teach <emoji_sequence> <meaning>`
     - Extract emojis from the sequence using the Emoji Parser
     - Insert or update each emoji in the database
     - Confirm: "Learned: 😷 = sick, 2️⃣ = two, etc."
     - Only Vivi or approved users should be able to teach (can be restricted by user role or system authentication)

  2. `/forget <emoji>`
     - Delete emoji from dictionary
     - Confirm deletion

  3. `/meaning <emoji>`
     - Look up and reply with the meaning of a specific emoji
     - If unknown, reply: "I don't know that one yet."

  4. `/config auto-translate <on|off>`
     - Update `server_configuration.auto_translate` in database
     - Only server admins can change this
     - Requires guild context (won't work in DMs)

  5. `/translate <emoji_sequence>` (On-demand mode)
     - Manually trigger translation of an emoji sequence
     - Works in both channels and DMs

- **Error Handling:**
  - Graceful failures if database is unavailable
  - Clear user feedback for invalid emoji sequences
  - Require proper permissions for sensitive commands

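
The permission rules listed for these commands could be centralised in one check. This is a hedged sketch rather than the project's permission model (which the document leaves open): `Caller` and `Policy` are hypothetical types, and letting guild admins use `/forget` is an assumption drawn from the build-order notes.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Caller:
    """Hypothetical summary of the invoking user."""
    user_id: int
    role_ids: FrozenSet[int] = frozenset()
    is_guild_admin: bool = False
    in_guild: bool = True


@dataclass(frozen=True)
class Policy:
    """Hypothetical allowlists for who may teach."""
    teacher_user_ids: FrozenSet[int] = frozenset()
    teacher_role_ids: FrozenSet[int] = frozenset()


def is_allowed(command: str, caller: Caller, policy: Policy) -> bool:
    """Decide whether the caller may run the given command name."""
    is_teacher = (
        caller.user_id in policy.teacher_user_ids
        or bool(caller.role_ids & policy.teacher_role_ids)
    )
    if command in ("meaning", "translate"):
        return True  # read-only commands are open to everyone
    if command == "teach":
        return is_teacher
    if command == "forget":
        return is_teacher or caller.is_guild_admin
    if command == "config":
        return caller.in_guild and caller.is_guild_admin  # no config in DMs
    return False  # unknown commands are denied
```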

---

### 7. Configuration Layer

**Responsibility:** Load bot configuration (token, database connection string, etc.) at startup.

**Key Details:**
- Use environment variables for secrets: `DISCORD_TOKEN`, `DATABASE_URL`, `PLURALKIT_TOKEN` (optional, for user-specific API calls)
- Configuration file (e.g., `config.json` or `.env`) for non-secret settings:
  - Vivi's member ID (to filter for her messages specifically)
  - Default auto-translate mode
  - Logging level
- Initialize Config before starting the bot

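
A minimal sketch of loading this configuration and failing fast when something required is missing. For brevity it reads every setting from one mapping (the process environment by default). The names `VIVI_MEMBER_ID`, `DEFAULT_AUTO_TRANSLATE`, and `LOG_LEVEL` are assumptions made for the example; only `DISCORD_TOKEN`, `DATABASE_URL`, and `PLURALKIT_TOKEN` are named in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DISCORD_TOKEN", "DATABASE_URL", "VIVI_MEMBER_ID")
TRUTHY = ("1", "true", "yes", "on")


@dataclass(frozen=True)
class Config:
    discord_token: str
    database_url: str
    vivi_member_id: str
    pluralkit_token: Optional[str] = None
    default_auto_translate: bool = True
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from a mapping, raising if a required key is absent."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Config(
        discord_token=env["DISCORD_TOKEN"],
        database_url=env["DATABASE_URL"],
        vivi_member_id=env["VIVI_MEMBER_ID"],
        pluralkit_token=env.get("PLURALKIT_TOKEN") or None,
        default_auto_translate=env.get("DEFAULT_AUTO_TRANSLATE", "true").lower() in TRUTHY,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```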

---

## Data Flow

### Message Reception to Response

```
┌──────────────────────────────────────────────────────────────┐
│ Discord Message Event                                        │
│ (user posts message proxied by PluralKit webhook)            │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Message Event Handler                                        │
│ - Check: webhook_id != None?                                 │
│ - Query: PluralKit API for message info                      │
│ - Verify: member_id == Vivi's ID?                            │
│ - Filter: Ignore self-messages                               │
└──────────────────────────────────────────────────────────────┘
                               ↓
                     (YES, Vivi's message)
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Emoji Parser                                                 │
│ - Extract emojis with regex                                  │
│ - Categorize: Unicode vs. custom                             │
│ - Preserve order                                             │
│ - Output: [{type, value, id}, ...]                           │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Translation Engine                                           │
│ - For each emoji: lookup in database                         │
│ - Compose natural language                                   │
│ - Handle unknown emojis                                      │
│ - Format response                                            │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Database (emoji_dictionary)                                  │
│ - O(log n) lookup by emoji_string (primary-key index)        │
│ - Return: meaning, metadata                                  │
└──────────────────────────────────────────────────────────────┘
                               ↓
                        (Lookup Results)
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Response Formatting                                          │
│ - Compose message                                            │
│ - Check context: channel vs. DM                              │
│ - Apply server config: auto-translate mode                   │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Discord API Response                                         │
│ - Send reply to channel or DM                                │
│ - Handle rate limits                                         │
│ - Log interaction                                            │
└──────────────────────────────────────────────────────────────┘
```

### Teaching Flow (Command-Driven)

```
┌──────────────────────────────────────────────────────────────┐
│ User runs: /teach 😷 2️⃣ "Vivi is sick, not sinuses"          │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Command Handler                                              │
│ - Parse command arguments                                    │
│ - Authenticate: Is user authorized to teach?                 │
│ - Extract emojis from sequence                               │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Database Layer                                               │
│ - INSERT or UPDATE emoji_dictionary                          │
│ - Set: emoji_string, meaning, updated_by, timestamp          │
│ - Commit transaction                                         │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ Confirmation Reply                                           │
│ - "Learned: 😷 = sick, 2️⃣ = two"                             │
│ - Post in same context (channel or DM)                       │
└──────────────────────────────────────────────────────────────┘
```


---

## PluralKit Integration Details

### Detection Approach

1. **Webhook Detection (First Filter):**
   - Check the `message.webhook_id` property in discord.py
   - If it is not `None`, the message was sent via a webhook (a candidate PluralKit proxy; other webhook integrations also set this field)

2. **PluralKit API Query (Confirmation):**
   - Query endpoint: `GET https://api.pluralkit.me/v2/messages/{message_id}`
   - The `message_id` can be the webhook message ID or the original message ID (original works for 30 minutes)
   - Parse response to get `member` object

3. **Member Verification:**
   - Extract `member.id` from API response
   - Compare with Vivi's known member ID (from config)
   - If match: Process as Vivi's message
   - If not match: Ignore (message from another member)

4. **Alternative: Member Names (Backup):**
   - If matching on member ID fails, fall back to checking `member.name`
   - Look for "Vivi" or the configured member name

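
Steps 3 and 4 can be sketched as one function over the parsed JSON response. The response is treated as a plain `dict`; `fallback_name` is a hypothetical config value. Here the name comparison is only used when the response carries no usable member ID, which is one possible reading of "backup"; member names are not unique, so an ID match should stay the primary check.

```python
from typing import Optional


def is_vivi(
    pk_message: dict,
    vivi_member_id: str,
    fallback_name: Optional[str] = None,
) -> bool:
    """Return True if the PluralKit message response points at Vivi."""
    member = pk_message.get("member") or {}  # "member" may be absent or null
    member_id = member.get("id")
    if member_id is not None:
        return member_id == vivi_member_id
    # Backup path: no ID in the response, compare names case-insensitively.
    if fallback_name is None:
        return False
    return (member.get("name") or "").casefold() == fallback_name.casefold()
```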
### API Endpoints Used

| Endpoint | Purpose | Rate Limit | Response |
|----------|---------|-----------|----------|
| `GET /v2/messages/{message}` | Get proxied message info | 10/sec | Message object with member, sender, guild, channel, timestamp |
| `GET /v2/systems/@me` | Get authenticated system info | 10/sec | System object (requires token) |
| `GET /v2/members/{member}` | Get specific member info | 10/sec | Member object with proxy tags, avatar, etc. |

**Authentication (Optional):**
- Public queries (member lookup) don't require authentication
- System-specific queries (private member settings) require the system token in the `Authorization: {token}` header (PluralKit expects the raw token, without a `Bearer` prefix)
- For Vivi's system, store the system token in environment variable `PLURALKIT_TOKEN` for authenticated access

### Implementation in discord.py

```python
import aiohttp
import discord


async def check_vivi_message(message: discord.Message, vivi_member_id: str) -> bool:
    """Check if message was proxied by Vivi via PluralKit."""

    # Step 1: Check if message is from a webhook
    if message.webhook_id is None:
        return False  # Not proxied

    # Step 2: Query PluralKit API
    # (a long-running bot would normally reuse one ClientSession
    # instead of opening a new one per message)
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(
                f"https://api.pluralkit.me/v2/messages/{message.id}"
            ) as resp:
                if resp.status != 200:
                    # 404: not a PluralKit message; 429: rate limited
                    return False

                data = await resp.json()

                # Step 3: Check member ID matches Vivi
                # ("member" can be null, so avoid calling .get on None)
                member = data.get("member") or {}
                return member.get("id") == vivi_member_id
        except Exception as e:
            # Log error, but don't crash
            print(f"PluralKit API error: {e}")
            return False
```


---

## Database Schema

### emoji_dictionary Table

```sql
CREATE TABLE emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    custom_emoji_id BIGINT,  -- columns are nullable unless declared NOT NULL
    meaning TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- must be set explicitly on UPDATE
    updated_by_user_id BIGINT,
    updated_by_member_id TEXT,
    created_in_guild BIGINT
);

-- The primary key already indexes emoji_string; only custom_emoji_id needs its own index.
CREATE INDEX idx_custom_emoji_id ON emoji_dictionary(custom_emoji_id);
```

### server_configuration Table

```sql
CREATE TABLE server_configuration (
    guild_id BIGINT PRIMARY KEY,
    auto_translate BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- No separate index is needed: the primary key already indexes guild_id.
```

### Alternative: SQLAlchemy ORM Definitions

```python
from sqlalchemy import BigInteger, Boolean, Column, DateTime, String, Text, func
from sqlalchemy.orm import declarative_base  # sqlalchemy.ext.declarative is the legacy location

Base = declarative_base()


class EmojiDictionary(Base):
    __tablename__ = "emoji_dictionary"

    emoji_string = Column(String, primary_key=True)
    custom_emoji_id = Column(BigInteger, nullable=True, index=True)
    meaning = Column(Text, nullable=False)
    # func.now() lets the database supply timestamps
    # (datetime.utcnow is deprecated as of Python 3.12)
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
    updated_by_user_id = Column(BigInteger, nullable=True)
    updated_by_member_id = Column(String, nullable=True)
    created_in_guild = Column(BigInteger, nullable=True)


class ServerConfiguration(Base):
    __tablename__ = "server_configuration"

    guild_id = Column(BigInteger, primary_key=True)
    auto_translate = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
```


---

## Suggested Build Order

### Phase 1: Foundation (Week 1-2)
**Goal:** Get Vivi messages detected and logged.

1. **Set up Discord bot:**
   - Create Discord application and token
   - Initialize discord.py Client/Bot with required Intents
   - Implement basic `on_message` event handler
   - Test basic logging

2. **Implement PluralKit detection:**
   - Add webhook detection (check `message.webhook_id`)
   - Add PluralKit API query and member verification
   - Log when Vivi messages are detected
   - Handle API errors gracefully

3. **Database initialization:**
   - Set up PostgreSQL database
   - Create emoji_dictionary and server_configuration tables
   - Test connection from bot

**Deliverables:** Bot logs every Vivi message to console; doesn't respond yet.


---

### Phase 2: Emoji Parsing & Translation (Week 3-4)
**Goal:** Translate Vivi's emojis to text.

1. **Emoji parsing:**
   - Implement regex patterns for Unicode and custom emojis
   - Extract emoji sequences in order
   - Test with various emoji types

2. **Basic emoji lookup:**
   - Query emoji_dictionary table
   - Return meanings for known emojis
   - Handle unknown emojis

3. **Response formatting:**
   - Compose natural language from emoji meanings
   - Send reply to channel/DM
   - Handle edge cases (no emojis, all unknown)

4. **Manual testing:**
   - Create test emojis in database
   - Post Vivi messages and verify translations

**Deliverables:** Bot translates Vivi messages; translations appear in channels and DMs.


---

### Phase 3: Teaching Commands (Week 5-6)
**Goal:** Allow users to teach the bot emoji meanings.

1. **Implement `/teach` command:**
   - Parse emoji sequences
   - Insert into database with metadata
   - Confirm to user

2. **Implement `/meaning` command:**
   - Look up single emoji
   - Reply with meaning or "not learned yet"

3. **Implement `/forget` command:**
   - Delete emoji from database
   - Require admin or Vivi permission

4. **Permission system:**
   - Restrict teaching to authorized users (Vivi + alters)
   - Use Discord roles or user ID allowlist

**Deliverables:** Users can teach and query emoji meanings via commands.


---

### Phase 4: Per-Server Configuration (Week 7)
**Goal:** Allow servers to opt into/out of auto-translation.

1. **Implement `/config auto-translate` command:**
   - Toggle auto-translate on/off per server
   - Requires admin permission
   - Only works in guild context (not DMs)

2. **Update message handler:**
   - Check server config before auto-translating
   - Only reply if auto_translate == TRUE
   - In DMs, always translate when `/translate` is used

3. **On-demand translation:**
   - `/translate` command for manual translation
   - Works in any context

**Deliverables:** Servers can control translation behavior; bot respects preferences.

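
The handler change in step 2 reduces to one decision, sketched below. The per-guild settings are passed in as a plain mapping of `guild_id` to `auto_translate`, standing in for a `server_configuration` lookup. Treating DMs as command-only is an assumption based on the list above.

```python
from typing import Mapping, Optional


def should_translate(
    guild_id: Optional[int],
    guild_settings: Mapping[int, bool],
    via_command: bool,
    default_auto: bool = True,
) -> bool:
    """Decide whether the bot should reply with a translation."""
    if via_command:
        return True  # /translate always works, in guilds and in DMs
    if guild_id is None:
        return False  # DM without a command: assumed not auto-translated
    # Guilds without a stored row fall back to the default mode.
    return guild_settings.get(guild_id, default_auto)
```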

---

### Phase 5: Polish & Edge Cases (Week 8+)
**Goal:** Handle real-world complexity.

1. **Natural language formatting:**
   - Improve composition of translations
   - Handle emoji modifiers (skin tones, ZWJ sequences)
   - Custom emoji descriptions

2. **Error handling & resilience:**
   - Database unavailability
   - PluralKit API failures
   - Rate limiting with exponential backoff
   - Graceful degradation

3. **Logging & monitoring:**
   - Structured logging for debugging
   - Monitor database performance
   - Track API error rates

4. **Codebase refactoring:**
   - Move commands to separate Cogs
   - Organize into modules: `cogs/teaching.py`, `cogs/config.py`, etc.
   - Add docstrings and type hints

5. **Testing:**
   - Unit tests for emoji parsing
   - Integration tests for database queries
   - End-to-end tests with Discord

**Deliverables:** Robust, maintainable codebase ready for production.


---

## Scaling Considerations

### Multi-Server Architecture

**Challenge:** Bot will operate in many Discord servers simultaneously, each with potentially thousands of members.

**Solution:**

1. **Shared Emoji Dictionary:**
   - Single global PostgreSQL database with all emoji meanings
   - All servers query the same emoji_dictionary table
   - Updates are reflected across all servers immediately
   - Reduces redundancy and keeps meanings consistent

2. **Per-Server Configuration:**
   - Each guild has its own row in server_configuration
   - Fast lookup by guild_id (indexed)
   - Allows servers to choose auto-translate vs. on-demand

3. **Connection Pooling:**
   - SQLAlchemy async engine with `pool_size=20, max_overflow=10` (tunable)
   - Reuses database connections across handlers
   - Prevents connection exhaustion under load

### Performance Optimization

1. **Emoji Lookup Performance:**
   - The primary key index on `emoji_dictionary.emoji_string` gives O(log n) B-tree lookups, which is effectively constant time at this table size
   - Secondary index on custom_emoji_id for custom emoji queries
   - Consider an in-memory cache (Redis) if lookups become a bottleneck:
     - Query Redis first (~1 ms latency)
     - Fall back to PostgreSQL
     - Invalidate cache on updates

2. **Caching Strategy (Optional, Post-MVP):**
   - Use Redis for frequently accessed emojis
   - TTL: 1 hour (emoji meanings change rarely)
   - Invalidate cache when `/teach` or `/forget` commands update dictionary
   - Benefits: Reduced database load, lower latency

3. **Rate Limiting:**
   - PluralKit API: 10 requests/second (enforced by the API)
   - Discord API: a global limit of 50 requests/second plus per-route limits; discord.py handles both automatically
   - Bound concurrent PluralKit queries locally with `asyncio.Semaphore` (this caps in-flight requests; it does not by itself cap requests per second):
     ```python
     semaphore = asyncio.Semaphore(5)  # Max 5 concurrent PluralKit queries
     ```

4. **Message Handler Optimization:**
   - Webhook detection (local check): ~0ms
   - PluralKit API query: ~100-200ms (async, non-blocking)
   - Emoji parsing (regex): ~1-5ms
   - Database lookup: ~1-50ms
   - **Total:** ~100-250ms per message (acceptable, happens in background)

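
Because a semaphore limits concurrency rather than throughput, a per-second cap needs some form of pacing. The sketch below is one simple approach (evenly spaced slots), offered as an illustration rather than a tuned limiter. `PerSecondLimiter` is a hypothetical helper, and it should be created inside the running event loop, for example in `setup_hook`.

```python
import asyncio
import time


class PerSecondLimiter:
    """Allow at most `rate` acquisitions per second by spacing them evenly."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def acquire(self) -> None:
        async with self._lock:  # waiters queue up and are served one by one
            now = time.monotonic()
            wait = self._next_slot - now
            if wait > 0:
                await asyncio.sleep(wait)
                now = time.monotonic()
            self._next_slot = max(now, self._next_slot) + self._interval


async def demo() -> float:
    """Make six paced acquisitions at 50/second and return the elapsed time."""
    limiter = PerSecondLimiter(rate=50)
    start = time.monotonic()
    for _ in range(6):
        await limiter.acquire()  # call this before each PluralKit request
    return time.monotonic() - start
```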
### Scaling Beyond 2,500 Guilds

**Discord Requirement:** Bots in 2,500 or more guilds must use sharding.

**Sharding in discord.py:**
- discord.py handles sharding automatically when an auto-sharded client class is used
- The bot opens multiple gateway connections ("shards") to Discord
- Each shard handles a subset of guilds
- Emoji dictionary remains shared across all shards (single database)

**Example Configuration:**
```python
import discord
from discord.ext import commands

intents = discord.Intents.default()
bot = commands.AutoShardedBot(command_prefix="!", intents=intents)  # Automatic sharding

# The shard count is chosen automatically at startup
```

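
For reference, Discord assigns each guild to a shard with the formula `(guild_id >> 22) % num_shards`. The helper below only illustrates that arithmetic; discord.py performs the routing itself, so the bot does not need to call anything like this.

```python
def shard_for_guild(guild_id: int, shard_count: int) -> int:
    """Return the shard index Discord's formula assigns to a guild."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # The bits above the lowest 22 of a snowflake ID hold its timestamp.
    return (guild_id >> 22) % shard_count
```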
### Database Scaling

**For Millions of Emojis:**
- Table partitioning by emoji language/category (if the dictionary grows very large)
- Read replicas for queries (if read-heavy)
- Consider denormalization (e.g., cache popular emoji meanings in memory)

**Current Recommendation:** Single PostgreSQL database is sufficient for MVP. Scale if needed post-launch.

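
Caching popular meanings in memory can be prototyped without Redis. The sketch below is a process-local cache with a time-to-live and explicit invalidation, mirroring the caching strategy described earlier. `TTLCache` and `cached_lookup` are hypothetical names, and `load` stands in for the database query. The clock is injectable so the expiry logic can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (value, self._clock() + self._ttl)

    def invalidate(self, key: str) -> None:
        """Call after /teach or /forget changes this emoji."""
        self._store.pop(key, None)


def cached_lookup(cache: TTLCache, key: str, load: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the cached meaning, loading and caching it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        if value is not None:
            cache.set(key, value)
    return value
```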

---

## Component Interaction Diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                                                                  │
│  Discord Server (Guild)                                          │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ #general                                                   │  │
│  │ Vivi (proxied): 😷 2️⃣ 🍑 ❌ 🤧                             │  │
│  │ Vivi Speech Translator: "Vivi is sick, but not in ..."     │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                                 ↑ (message event)
                                 │
┌────────────────────────────────┴─────────────────────────────────┐
│  discord.py Bot Framework                                        │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Discord Client                                             │  │
│  │ - Maintains WebSocket connection                           │  │
│  │ - Routes events to handlers                                │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                ↓                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Message Event Handler (on_message)                         │  │
│  │ - Filter: webhook_id?                                      │  │
│  │ - Query: PluralKit API                                     │  │
│  │ - Verify: member_id == Vivi?                               │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                ↓                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Emoji Parser                                               │  │
│  │ - Extract emojis (regex)                                   │  │
│  │ - Categorize (Unicode/custom)                              │  │
│  │ - Preserve order                                           │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                ↓                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Translation Engine                                         │  │
│  │ - Lookup emojis in database                                │  │
│  │ - Compose natural language                                 │  │
│  │ - Format response                                          │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                ↓                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ Command Handler (Cogs)                                     │  │
│  │ - /teach: Learn emoji meanings                             │  │
│  │ - /meaning: Look up emoji                                  │  │
│  │ - /forget: Delete emoji                                    │  │
│  │ - /config: Server preferences                              │  │
│  │ - /translate: Manual translation                           │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└────────────────────────────────┬─────────────────────────────────┘
                                 ↓ (database queries/updates)
┌──────────────────────────────────────────────────────────────────┐
│  Database Layer                                                  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ SQLAlchemy ORM + asyncio                                   │  │
│  │ - Async connection pool                                    │  │
│  │ - Connection reuse                                         │  │
│  │ - Transaction management                                   │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                ↓                                 │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ PostgreSQL Database                                        │  │
│  │                                                            │  │
│  │ emoji_dictionary:           server_configuration:          │  │
│  │ ├─ 😷 → sick                ├─ guild_123 → auto ON         │  │
│  │ ├─ 2️⃣ → two                 ├─ guild_456 → auto OFF        │  │
│  │ ├─ 🍑 → peach               └─ guild_789 → auto ON         │  │
│  │ ├─ ❌ → no                                                 │  │
│  │ └─ :me1: → Vivi                                            │  │
│  │ (+ metadata, timestamps)                                   │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

External APIs:
┌────────────────────────────────────────────────────────────┐
│ PluralKit API (api.pluralkit.me)                           │
│ - GET /v2/messages/{id} → member info                      │
└────────────────────────────────────────────────────────────┘
```


---

## Technology Stack Recommendations

| Layer | Component | Technology | Why |
|-------|-----------|-----------|-----|
| **Bot Framework** | Discord Integration | discord.py 2.x | Async-native, active community, rich feature set |
| **Database ORM** | Persistence | SQLAlchemy 2.0 + asyncio | Async support, type-safe, widely adopted |
| **Database** | Data Store | PostgreSQL | Reliable, open-source, JSONB for future extensibility |
| **Async Runtime** | Concurrency | asyncio (built-in) | Lightweight, integrated with discord.py |
| **Caching** | Performance (Phase 5+) | Redis | Fast in-memory lookups, TTL support, distributed |
| **Logging** | Debugging | Python logging module | Built-in; can be extended to structured logging |
| **API Requests** | HTTP Calls | aiohttp | Async-native, connection pooling |
| **Testing** | Quality Assurance | pytest + pytest-asyncio | Async test support, fixtures |
| **Deployment** | Hosting | Docker + systemd or cloud | Reproducible environment, easy updates |


---

## Summary

**Recommended Architecture:**

Vivi Speech Translator is a modular Discord bot with a clear separation of concerns:

1. **Discord Client** listens for messages and routes them through a detection pipeline
2. **Message Event Handler** identifies when Vivi speaks (via PluralKit webhook + API verification)
3. **Emoji Parser** extracts emoji sequences while preserving order and type information
4. **Translation Engine** looks up meanings and composes responses
5. **Database Layer** (PostgreSQL + SQLAlchemy) stores a shared global emoji dictionary and per-server configurations
6. **Command Handler** (discord.py Cogs) allows teaching, querying, and configuration
7. **Configuration Layer** loads the token, database connection string, and Vivi's member ID at startup

The bot prioritizes:
- **Reliability:** Graceful error handling, retry logic, database transactions
- **Performance:** Indexed emoji lookups, async operations to avoid blocking, caching for scale
- **Scalability:** Shared emoji dictionary, per-server configs, optional Redis caching, Discord sharding support
- **Maintainability:** Modular Cog architecture, clear component boundaries, comprehensive logging

Build in phases: detection → parsing and translation → teaching → configuration → polish. This delivers value early (Phase 2) while establishing the foundation for later features.

The bot can grow from a single server to thousands, limited primarily by PluralKit API rate limits (reduced by checking `webhook_id` locally before any API call) and database performance (PostgreSQL handles millions of rows efficiently).


---

## Related Documentation

- [PluralKit API Reference](https://pluralkit.me/api/)
- [discord.py Documentation](https://discordpy.readthedocs.io/)
- [SQLAlchemy Async Documentation](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html)
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
- [Redis Documentation](https://redis.io/documentation)