docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents.

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
774
.planning/research/ARCHITECTURE.md
Normal file
@@ -0,0 +1,774 @@
|
||||
# Architecture Research: Vivi Speech Translator
|
||||
|
||||
## Overview
|
||||
|
||||
Vivi Speech Translator is a Discord bot that detects emoji-based messages proxied by PluralKit, parses emoji sequences, looks up their meanings in a persistent global dictionary, and replies with natural language translations. The bot must operate across multiple servers, handle both channel and DM messages, and learn new emoji meanings over time.
|
||||
|
||||
This document outlines the recommended high-level architecture, component responsibilities, data flows, and scaling strategies.
|
||||
|
||||
---
|
||||
|
||||
## Core Components
|
||||
|
||||
### 1. Discord Client
|
||||
|
||||
**Responsibility:** Establish and maintain the connection to Discord's API and WebSocket.
|
||||
|
||||
**Key Details:**
|
||||
- Uses `discord.Client` or `discord.ext.commands.Bot` from discord.py library
|
||||
- Requires `Intents` configuration to specify which events the bot listens for:
- `message_content` intent: Required to read message text (privileged; must be enabled in the Developer Portal, and bots in 100 or more servers need Discord's approval)
- `guilds` intent: Track guild membership and changes
- `guild_messages` intent: Receive message events in server channels
- `dm_messages` intent: Receive message events in DMs
|
||||
- Initializes on startup and runs the main event loop via `client.run(token)`
|
||||
- Handles connection failures and automatic reconnection
|
||||
|
||||
**Why This Matters:** Discord's event-driven architecture means the Client is the foundation: without it, the bot cannot receive any messages or respond to events.
|
||||
|
||||
---
|
||||
|
||||
### 2. Message Event Handler
|
||||
|
||||
**Responsibility:** Receive all messages, filter for relevance, and route to downstream processors.
|
||||
|
||||
**Key Details:**
|
||||
- Implements `on_message` event in discord.py (async callback)
|
||||
- Filters for:
|
||||
1. **Webhook Detection:** Check if `message.webhook_id` is not None (indicates a webhook message, which may or may not be a PluralKit proxy)
|
||||
2. **PluralKit Verification:** Query PluralKit API to confirm message was proxied by PluralKit (not another webhook system)
|
||||
3. **Vivi Detection:** Check if the `member_id` in the PluralKit response matches Vivi's registered member ID
|
||||
4. **Bot Self-Filter:** Ignore messages from Vivi Speech Translator bot itself
|
||||
- Routes confirmed Vivi messages to the Emoji Parser
|
||||
- Handles both guild channels and DMs
|
||||
|
||||
**PluralKit Detection Approach:**
|
||||
When a message is received, the bot can query the PluralKit API using the message ID:
|
||||
```
|
||||
GET https://api.pluralkit.me/v2/messages/{message_id}
|
||||
```
|
||||
This returns a Message object containing:
|
||||
- `member`: The member object that proxied the message (contains member_id, name, avatar, etc.)
|
||||
- `sender`: The Discord user ID of the account that sent the original (pre-proxy) message, i.e. the account owner
|
||||
- `system`: The system that manages the members
|
||||
- `timestamp`: When the message was sent
|
||||
- `guild`: The guild ID where the message was sent
|
||||
- `channel`: The channel ID where the message was sent
|
||||
|
||||
By checking if `response.member.id == vivi_member_id`, the bot can verify Vivi specifically sent the message.
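
The member check can be kept as a small pure function that is easy to unit test. This is a minimal sketch, assuming the response has already been decoded into a dict; the function names are hypothetical, and the handling of a missing or null `member` field is a defensive assumption rather than documented behaviour.

```python
from typing import Any, Mapping, Optional


def proxied_member_id(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the member ID from a decoded message payload, or None.

    `member` is treated as optional: it may be absent or null, so both
    cases are normalised to an empty dict before reading `id`.
    """
    member = payload.get("member") or {}
    return member.get("id")


def is_vivi_message(payload: Mapping[str, Any], vivi_member_id: str) -> bool:
    """True only when the payload names Vivi as the proxying member."""
    return proxied_member_id(payload) == vivi_member_id
```

Keeping this logic separate from the HTTP call means the network layer can change (retries, caching) without touching the comparison.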
|
||||
|
||||
**Rate Limiting:** PluralKit API has a 10/second rate limit for message lookups. The bot should handle rate limit responses gracefully with exponential backoff.
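
One way to sketch the backoff behaviour is shown below. It assumes a hypothetical `fetch` callable that performs a single request and returns `(status, body)`; the attempt count, base delay, and jitter amount are illustrative values, not tuned recommendations.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical callable: performs one request and returns (status, body).
Fetch = Callable[[], Awaitable[Tuple[int, Optional[dict]]]]


async def fetch_with_backoff(
    fetch: Fetch,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[dict]:
    """Retry on HTTP 429 with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        status, body = await fetch()
        if status == 200:
            return body
        if status != 429:
            return None  # e.g. 404: not a PluralKit message, so do not retry
        if attempt < max_attempts - 1:
            # Delays of 0.5s, 1s, 2s, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            await sleep(delay + random.uniform(0, delay * 0.1))
    return None  # still rate limited after the final attempt
```

The `sleep` parameter exists only so tests can substitute a fake and avoid real waiting.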
|
||||
|
||||
---
|
||||
|
||||
### 3. Emoji Parser
|
||||
|
||||
**Responsibility:** Extract and categorize emojis from a message into a structured sequence.
|
||||
|
||||
**Key Details:**
|
||||
- Receives the confirmed Vivi message text from the Message Event Handler
|
||||
- Uses regex patterns to extract:
|
||||
1. **Unicode Emojis:** Standard emoji characters (😷, ❌, etc.)
- Pattern: `\p{Extended_Pictographic}` (a Unicode property class; Python's built-in `re` module does not support `\p{...}`, so this requires the third-party `regex` package)
- Alternative for built-in `re`: explicit code point ranges such as `[\u00a9\u00ae\u2600-\u27bf\U0001F000-\U0001FAFF]`. Surrogate-pair patterns (`\ud83c[...]`, `\ud83d[...]`) are a JavaScript/UTF-16 idiom and do not match in Python 3, where strings are sequences of code points.
2. **Custom Server Emojis:** Discord custom emoji format `<:emoji_name:emoji_id>` or `<a:emoji_name:emoji_id>` (for animated)
- Pattern: `<a?:[^:\s]+:\d+>`
|
||||
- Preserves order of emojis as they appear left-to-right
|
||||
- Returns a structured list like: `[{type: "emoji", value: "😷", id: None}, {type: "custom", value: "me1", id: "123456789"}]`
|
||||
- Handles edge cases:
|
||||
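- Emoji skin tone modifiers
- Zero-width joiners (ZWJ sequences like family emojis)
- Emoji variation selectors (U+FE0F)
- Keycap sequences such as 2️⃣ (digit + U+FE0F + U+20E3), which `Extended_Pictographic` alone does not match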
|
||||
|
||||
**Why This Order Matters:** The project spec notes that emoji sequences are compositional and context-dependent. Preserving order and distinguishing types allows the Translation Engine to understand the full intended meaning.
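
The parser described above could be sketched with the built-in `re` module as follows. The code point ranges are an approximation chosen for this example (an assumption, not the full Unicode emoji set), and regional-indicator flag pairs are not grouped; the `regex` package with `\p{Extended_Pictographic}` would give broader coverage.

```python
import re
from typing import List, Optional, TypedDict


class EmojiToken(TypedDict):
    type: str           # "emoji" (Unicode) or "custom" (Discord server emoji)
    value: str          # the emoji characters, or the custom emoji name
    id: Optional[str]   # Discord emoji ID for custom emojis, else None


# Discord custom emoji markup: <:name:id> or <a:name:id> (animated).
CUSTOM = r"<a?:(?P<name>[^:\s]+):(?P<id>\d+)>"

# Approximate Unicode coverage via code point ranges (an assumption).
_BASE = r"[\u00a9\u00ae\u2600-\u27bf\U0001F000-\U0001FAFF]"
_MOD = r"[\ufe0f\U0001F3FB-\U0001F3FF]*"   # variation selector, skin tones
_KEYCAP = r"[0-9#*]\ufe0f?\u20e3"          # keycap sequences such as 2 + FE0F + 20E3
UNICODE = rf"(?:{_KEYCAP}|{_BASE}{_MOD}(?:\u200d{_BASE}{_MOD})*)"

TOKEN = re.compile(rf"{CUSTOM}|(?P<unicode>{UNICODE})")


def parse_emojis(text: str) -> List[EmojiToken]:
    """Return emoji tokens in the order they appear, left to right."""
    tokens: List[EmojiToken] = []
    for m in TOKEN.finditer(text):
        if m.group("unicode") is not None:
            tokens.append({"type": "emoji", "value": m.group("unicode"), "id": None})
        else:
            tokens.append({"type": "custom", "value": m.group("name"), "id": m.group("id")})
    return tokens
```

Because both kinds of emoji are matched by a single alternation, `finditer` preserves the original ordering without a separate merge step, and ZWJ or skin-tone sequences come back as one token each.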
|
||||
|
||||
---
|
||||
|
||||
### 4. Translation Engine
|
||||
|
||||
**Responsibility:** Convert emoji sequences into natural language using the emoji dictionary.
|
||||
|
||||
**Key Details:**
|
||||
- Receives structured emoji list from Emoji Parser
|
||||
- For each emoji:
|
||||
1. Look up its meaning in the Emoji Dictionary (database)
|
||||
2. Handle three cases:
|
||||
- **Known emoji:** Include its meaning in output
|
||||
- **Unknown emoji:** Display the emoji itself with a placeholder or skip
|
||||
- **Custom emoji:** Look up by custom emoji ID in database
|
||||
- Generates natural language output:
|
||||
- If all emojis are known: Compose as a sentence ("Vivi is sick, but not in the sinuses")
|
||||
- If some are unknown: Format as: "Known meanings: ... [Unknown emoji] ..."
|
||||
- If none are known: Reply: "I don't know what these emojis mean yet. You can teach me with the `/teach` command."
|
||||
- Considers emoji context (e.g., combination of emojis might have a specific meaning)
|
||||
|
||||
**Output Format:** The bot should reply in a Discord message, either in the same channel (if public) or as a DM (if DM context).
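
The three lookup cases can be illustrated with a small composition function. This is a simplified sketch: it joins meanings with spaces rather than building real sentences, and the key scheme (the emoji itself for Unicode emojis, the Discord emoji ID for custom ones) is an assumption made for the example.

```python
from typing import Dict, List, Optional

NO_KNOWN = ("I don't know what these emojis mean yet. "
            "You can teach me with the `/teach` command.")


def translate(tokens: List[dict], meanings: Dict[str, str]) -> Optional[str]:
    """Compose a reply from parsed tokens and a key -> meaning mapping."""
    if not tokens:
        return None  # nothing to translate, so stay silent
    parts: List[str] = []
    known = 0
    for tok in tokens:
        # Assumed key scheme: custom emojis are keyed by ID, Unicode by value.
        key = tok["id"] if tok["type"] == "custom" else tok["value"]
        meaning = meanings.get(key)
        if meaning is None:
            # Unknown: show the emoji (or :name: for custom) as a placeholder.
            shown = f":{tok['value']}:" if tok["type"] == "custom" else tok["value"]
            parts.append(f"[{shown}]")
        else:
            parts.append(meaning)
            known += 1
    if known == 0:
        return NO_KNOWN
    text = " ".join(parts)
    return text if known == len(tokens) else f"Known meanings: {text}"
```

Returning `None` for an empty token list lets the caller skip replying to messages that contain no emojis at all.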
|
||||
|
||||
---
|
||||
|
||||
### 5. Database Layer
|
||||
|
||||
**Responsibility:** Store and retrieve persistent data (emoji dictionary and server configurations).
|
||||
|
||||
**Key Details:**
|
||||
- **Tech Stack:** SQLAlchemy ORM with PostgreSQL for production reliability
|
||||
- **Async Support:** Use `sqlalchemy.ext.asyncio` or `asyncpg` to avoid blocking the Discord event loop
|
||||
- **Initialization:** Override `Bot.start()` or use a `setup_hook` to connect to database on startup
|
||||
- **Connection Pooling:** Configure connection pool to handle concurrent requests from message handlers
|
||||
|
||||
**Two Core Tables:**
|
||||
|
||||
1. **emoji_dictionary**
|
||||
- `emoji_string` (TEXT, PRIMARY KEY): The emoji character(s) or custom emoji format
|
||||
- `custom_emoji_id` (BIGINT, NULLABLE): Discord custom emoji ID (if custom emoji)
|
||||
- `meaning` (TEXT): The learned meaning
|
||||
- `created_at` (TIMESTAMP): When first learned
|
||||
- `updated_at` (TIMESTAMP): Last update time
|
||||
- `updated_by_user_id` (BIGINT, NULLABLE): User ID of who taught/corrected this
|
||||
- `updated_by_member_id` (TEXT, NULLABLE): PluralKit member ID (e.g., Vivi's ID)
|
||||
- `created_in_guild` (BIGINT, NULLABLE): Guild ID where first learned (for tracking origin, optional)
|
||||
- Indexes: the primary key on emoji_string already provides the index used for lookups; add a secondary index on custom_emoji_id for custom emoji queries
|
||||
|
||||
2. **server_configuration**
|
||||
- `guild_id` (BIGINT, PRIMARY KEY): Discord server ID
|
||||
- `auto_translate` (BOOLEAN, DEFAULT TRUE): Auto-translate all Vivi messages or require `/translate` command
|
||||
- `created_at` (TIMESTAMP): When server config created
- `updated_at` (TIMESTAMP): When server config last changed
- Rows are written by the `/config` command in the Command Handler
|
||||
|
||||
**Important Design Decisions:**
|
||||
- **Global dictionary:** Emoji meanings are shared across all servers. Different systems can update meanings, but there's a single source of truth per emoji.
|
||||
- **Per-server config:** Each server has its own settings (auto vs. on-demand mode).
|
||||
- **User attribution:** Track who taught each emoji for transparency and conflict resolution.
|
||||
- **No per-server emoji variants:** The spec intends a global dictionary, so "😷" means the same thing everywhere. Per-server overrides could be added later if needed.
|
||||
|
||||
---
|
||||
|
||||
### 6. Command Handler (Teaching & Configuration)
|
||||
|
||||
**Responsibility:** Process bot commands for teaching emojis and configuring server behavior.
|
||||
|
||||
**Key Details:**
|
||||
- **Tech:** Discord.py `commands.Cog` extension for modular command organization
|
||||
- **Commands to Implement:**
|
||||
|
||||
1. `/teach <emoji_sequence> <meaning>`
|
||||
- Extract emojis from the sequence using the Emoji Parser
|
||||
- Insert or update each emoji in the database
|
||||
- Confirm: "Learned: 😷 = sick, 2️⃣ = two, etc."
|
||||
- Only Vivi or approved users should be able to teach (can be restricted by user role or system authentication)
|
||||
|
||||
2. `/forget <emoji>`
|
||||
- Delete emoji from dictionary
|
||||
- Confirm deletion
|
||||
|
||||
3. `/meaning <emoji>`
|
||||
- Look up and reply with the meaning of a specific emoji
|
||||
- If unknown, reply: "I don't know that one yet."
|
||||
|
||||
4. `/config auto-translate <on|off>`
|
||||
- Update `server_configuration.auto_translate` in database
|
||||
- Only server admins can change this
|
||||
- Requires guild context (won't work in DMs)
|
||||
|
||||
5. `/translate <emoji_sequence>` (On-demand mode)
|
||||
- Manually trigger translation of an emoji sequence
|
||||
- Works in both channels and DMs
|
||||
|
||||
- **Error Handling:**
|
||||
- Graceful failures if database is unavailable
|
||||
- Clear user feedback for invalid emoji sequences
|
||||
- Require proper permissions for sensitive commands
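
The teaching permission check could be as small as the sketch below. The permissions model is still an open question in this research, so the allowlist-plus-roles rule and every name here are assumptions for illustration only.

```python
from typing import Iterable, Set


def can_teach(
    user_id: int,
    role_ids: Iterable[int],
    allowed_users: Set[int],
    teacher_roles: Set[int],
) -> bool:
    """Allow teaching for allowlisted users or holders of a teacher role.

    Both sets are hypothetical configuration values: Discord user IDs
    and Discord role IDs respectively.
    """
    if user_id in allowed_users:
        return True
    return any(role_id in teacher_roles for role_id in role_ids)
```

In a DM there are no roles, so passing an empty `role_ids` falls back to the user allowlist alone.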
|
||||
|
||||
---
|
||||
|
||||
### 7. Configuration Layer
|
||||
|
||||
**Responsibility:** Load bot configuration (token, database connection string, etc.) at startup.
|
||||
|
||||
**Key Details:**
|
||||
- Use environment variables for secrets: `DISCORD_TOKEN`, `DATABASE_URL`, `PLURALKIT_TOKEN` (optional, for user-specific API calls)
|
||||
- Configuration file (e.g., `config.json` or `.env`) for non-secret settings:
|
||||
- Vivi's member ID (to filter for her messages specifically)
|
||||
- Default auto-translate mode
|
||||
- Logging level
|
||||
- Initialize Config before starting the bot
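
A startup loader along these lines would fail fast when a required setting is missing. For brevity this sketch reads every setting from environment variables; `VIVI_MEMBER_ID`, `DEFAULT_AUTO_TRANSLATE`, and `LOG_LEVEL` are hypothetical names, and the non-secret values could equally come from a config file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DISCORD_TOKEN", "DATABASE_URL", "VIVI_MEMBER_ID")
TRUTHY = ("1", "true", "on", "yes")


@dataclass(frozen=True)
class Config:
    discord_token: str
    database_url: str
    vivi_member_id: str
    pluralkit_token: Optional[str] = None
    default_auto_translate: bool = True
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment-style settings, failing fast."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Config(
        discord_token=env["DISCORD_TOKEN"],
        database_url=env["DATABASE_URL"],
        vivi_member_id=env["VIVI_MEMBER_ID"],
        pluralkit_token=env.get("PLURALKIT_TOKEN") or None,  # optional
        default_auto_translate=env.get("DEFAULT_AUTO_TRANSLATE", "true").lower() in TRUTHY,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader testable without touching the real environment.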
|
||||
|
||||
---
|
||||
|
||||
## Data Flow
|
||||
|
||||
### Message Reception to Response
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Discord Message Event │
|
||||
│ (user posts message proxied by PluralKit webhook) │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Message Event Handler │
|
||||
│ - Check: webhook_id != None? │
|
||||
│ - Query: PluralKit API for message info │
|
||||
│ - Verify: member_id == Vivi's ID? │
|
||||
│ - Filter: Ignore self-messages │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
(YES, Vivi's message)
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Emoji Parser │
|
||||
│ - Extract emojis with regex │
|
||||
│ - Categorize: Unicode vs. custom │
|
||||
│ - Preserve order │
|
||||
│ - Output: [{type, value, id}, ...] │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Translation Engine │
|
||||
│ - For each emoji: lookup in database │
|
||||
│ - Compose natural language │
|
||||
│ - Handle unknown emojis │
|
||||
│ - Format response │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Database (emoji_dictionary) │
|
||||
│ - Indexed lookup by emoji_string (primary key, B-tree) │
|
||||
│ - Return: meaning, metadata │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
(Lookup Results)
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Response Formatting │
|
||||
│ - Compose message │
|
||||
│ - Check context: channel vs. DM │
|
||||
│ - Apply server config: auto-translate mode │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Discord API Response │
|
||||
│ - Send reply to channel or DM │
|
||||
│ - Handle rate limits │
|
||||
│ - Log interaction │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Teaching Flow (Command-Driven)
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ User runs: /teach 😷 2️⃣ "Vivi is sick, not sinuses" │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Command Handler │
|
||||
│ - Parse command arguments │
|
||||
│ - Authenticate: Is user authorized to teach? │
|
||||
│ - Extract emojis from sequence │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Database Layer │
|
||||
│ - INSERT or UPDATE emoji_dictionary │
|
||||
│ - Set: emoji_string, meaning, updated_by, timestamp │
|
||||
│ - Commit transaction │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
↓
|
||||
┌──────────────────────────────────────────────────────────────┐
|
||||
│ Confirmation Reply │
|
||||
│ - "Learned: 😷 = sick, 2️⃣ = two" │
|
||||
│ - Post in same context (channel or DM) │
|
||||
└──────────────────────────────────────────────────────────────┘
|
||||
```
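
The "INSERT or UPDATE" step in the teaching flow maps naturally onto an upsert. The sketch below uses an in-memory SQLite database purely so it runs without a server; the table is a trimmed copy of `emoji_dictionary`, the helper names are hypothetical, and a PostgreSQL driver would use its own placeholder style with the same `ON CONFLICT` clause.

```python
import sqlite3

UPSERT = """
INSERT INTO emoji_dictionary (emoji_string, meaning, updated_by_user_id)
VALUES (?, ?, ?)
ON CONFLICT (emoji_string) DO UPDATE SET
    meaning = excluded.meaning,
    updated_by_user_id = excluded.updated_by_user_id,
    updated_at = CURRENT_TIMESTAMP
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the emoji_dictionary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE emoji_dictionary (
               emoji_string TEXT PRIMARY KEY,
               meaning TEXT NOT NULL,
               created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
               updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
               updated_by_user_id BIGINT
           )"""
    )
    return conn


def teach(conn: sqlite3.Connection, emoji: str, meaning: str, user_id: int) -> None:
    """Insert a new meaning or overwrite the existing one in one statement."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(UPSERT, (emoji, meaning, user_id))
```

A single upsert avoids the race between a separate existence check and the write when two users teach the same emoji at once.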
|
||||
|
||||
---
|
||||
|
||||
## PluralKit Integration Details
|
||||
|
||||
### Detection Approach
|
||||
|
||||
1. **Webhook Detection (First Filter):**
|
||||
- Check `message.webhook_id` property in discord.py
|
||||
- If not None, the message was sent via a webhook (possibly, but not necessarily, a PluralKit proxy)
|
||||
|
||||
2. **PluralKit API Query (Confirmation):**
|
||||
- Query endpoint: `GET https://api.pluralkit.me/v2/messages/{message_id}`
|
||||
- The `message_id` can be the webhook message ID or the original message ID (original works for 30 minutes)
|
||||
- Parse response to get `member` object
|
||||
|
||||
3. **Member Verification:**
|
||||
- Extract `member.id` from API response
|
||||
- Compare with Vivi's known member ID (from config)
|
||||
- If match: Process as Vivi's message
|
||||
- If not match: Ignore (message from another member)
|
||||
|
||||
4. **Alternative: Member Names (Backup):**
|
||||
- If using member ID fails, fall back to checking `member.name`
|
||||
- Look for "Vivi" or configured member name
|
||||
|
||||
### API Endpoints Used
|
||||
|
||||
| Endpoint | Purpose | Rate Limit | Response |
|
||||
|----------|---------|-----------|----------|
|
||||
| `GET /v2/messages/{message}` | Get proxied message info | 10/sec | Message object with member, sender, guild, channel, timestamp |
|
||||
| `GET /v2/systems/@me` | Get authenticated system info | 10/sec | System object (requires token) |
|
||||
| `GET /v2/members/{member}` | Get specific member info | 10/sec | Member object with proxy tags, avatar, etc. |
|
||||
|
||||
**Authentication (Optional):**
|
||||
- Public queries (member lookup) don't require authentication
|
||||
- System-specific queries (private member settings) require the system token in the `Authorization: {token}` header (the raw token, without a `Bearer` prefix)
|
||||
- For Vivi's system, store the system token in environment variable `PLURALKIT_TOKEN` for authenticated access
|
||||
|
||||
### Implementation in discord.py
|
||||
|
||||
```python
import asyncio
import logging

import aiohttp
import discord

log = logging.getLogger(__name__)


async def check_vivi_message(message: discord.Message, vivi_member_id: str) -> bool:
    """Check if message was proxied by Vivi via PluralKit."""

    # Step 1: Check if message is from a webhook
    if message.webhook_id is None:
        return False  # Not proxied

    # Step 2: Query PluralKit API
    # (a long-lived shared session would avoid reconnecting on every message)
    async with aiohttp.ClientSession() as session:
        try:
            async with session.get(
                f"https://api.pluralkit.me/v2/messages/{message.id}"
            ) as resp:
                if resp.status != 200:
                    return False  # Not a PluralKit message (or an API error)

                data = await resp.json()
        except (aiohttp.ClientError, asyncio.TimeoutError) as e:
            # Log error, but don't crash
            log.warning("PluralKit API error: %s", e)
            return False

    # Step 3: Check member ID matches Vivi (`member` may be missing or null)
    member = data.get("member") or {}
    return member.get("id") == vivi_member_id
```
|
||||
|
||||
---
|
||||
|
||||
## Database Schema
|
||||
|
||||
### emoji_dictionary Table
|
||||
|
||||
```sql
CREATE TABLE emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,  -- primary key is indexed automatically
    custom_emoji_id BIGINT,         -- columns are nullable unless marked NOT NULL
    meaning TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by_user_id BIGINT,
    updated_by_member_id TEXT,
    created_in_guild BIGINT
);

CREATE INDEX idx_custom_emoji_id ON emoji_dictionary(custom_emoji_id);
```
|
||||
|
||||
### server_configuration Table
|
||||
|
||||
```sql
CREATE TABLE server_configuration (
    guild_id BIGINT PRIMARY KEY,  -- primary key is indexed automatically
    auto_translate BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
|
||||
|
||||
### Alternative: SQLAlchemy ORM Definitions
|
||||
|
||||
```python
from sqlalchemy import BigInteger, Boolean, Column, DateTime, String, Text, func
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+/2.0 location

Base = declarative_base()


class EmojiDictionary(Base):
    __tablename__ = "emoji_dictionary"

    emoji_string = Column(String, primary_key=True)
    custom_emoji_id = Column(BigInteger, nullable=True, index=True)
    meaning = Column(Text, nullable=False)
    # Timestamps come from the database; datetime.utcnow is deprecated in Python 3.12+
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
    updated_by_user_id = Column(BigInteger, nullable=True)
    updated_by_member_id = Column(String, nullable=True)
    created_in_guild = Column(BigInteger, nullable=True)


class ServerConfiguration(Base):
    __tablename__ = "server_configuration"

    guild_id = Column(BigInteger, primary_key=True)
    auto_translate = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
```
|
||||
|
||||
---
|
||||
|
||||
## Suggested Build Order
|
||||
|
||||
### Phase 1: Foundation (Week 1-2)
|
||||
**Goal:** Get Vivi messages detected and logged.
|
||||
|
||||
1. **Set up Discord bot:**
|
||||
- Create Discord application and token
|
||||
- Initialize discord.py Client/Bot with required Intents
|
||||
- Implement basic `on_message` event handler
|
||||
- Test basic logging
|
||||
|
||||
2. **Implement PluralKit detection:**
|
||||
- Add webhook detection (check `message.webhook_id`)
|
||||
- Add PluralKit API query and member verification
|
||||
- Log when Vivi messages are detected
|
||||
- Handle API errors gracefully
|
||||
|
||||
3. **Database initialization:**
|
||||
- Set up PostgreSQL database
|
||||
- Create emoji_dictionary and server_configuration tables
|
||||
- Test connection from bot
|
||||
|
||||
**Deliverables:** Bot logs every Vivi message to console; doesn't respond yet.
|
||||
|
||||
---
|
||||
|
||||
### Phase 2: Emoji Parsing & Translation (Week 3-4)
|
||||
**Goal:** Translate Vivi's emojis to text.
|
||||
|
||||
1. **Emoji parsing:**
|
||||
- Implement regex patterns for Unicode and custom emojis
|
||||
- Extract emoji sequences in order
|
||||
- Test with various emoji types
|
||||
|
||||
2. **Basic emoji lookup:**
|
||||
- Query emoji_dictionary table
|
||||
- Return meanings for known emojis
|
||||
- Handle unknown emojis
|
||||
|
||||
3. **Response formatting:**
|
||||
- Compose natural language from emoji meanings
|
||||
- Send reply to channel/DM
|
||||
- Handle edge cases (no emojis, all unknown)
|
||||
|
||||
4. **Manual testing:**
|
||||
- Create test emojis in database
|
||||
- Post Vivi messages and verify translations
|
||||
|
||||
**Deliverables:** Bot translates Vivi messages; appears in channels and DMs.
|
||||
|
||||
---
|
||||
|
||||
### Phase 3: Teaching Commands (Week 5-6)
|
||||
**Goal:** Allow users to teach the bot emoji meanings.
|
||||
|
||||
1. **Implement `/teach` command:**
|
||||
- Parse emoji sequences
|
||||
- Insert into database with metadata
|
||||
- Confirm to user
|
||||
|
||||
2. **Implement `/meaning` command:**
|
||||
- Look up single emoji
|
||||
- Reply with meaning or "not learned yet"
|
||||
|
||||
3. **Implement `/forget` command:**
|
||||
- Delete emoji from database
|
||||
- Require admin or Vivi permission
|
||||
|
||||
4. **Permission system:**
|
||||
- Restrict teaching to authorized users (Vivi + alters)
|
||||
- Use Discord roles or user ID allowlist
|
||||
|
||||
**Deliverables:** Users can teach and query emoji meanings via commands.
|
||||
|
||||
---
|
||||
|
||||
### Phase 4: Per-Server Configuration (Week 7)
|
||||
**Goal:** Allow servers to opt into/out of auto-translation.
|
||||
|
||||
1. **Implement `/config auto-translate` command:**
|
||||
- Toggle auto-translate on/off per server
|
||||
- Requires admin permission
|
||||
- Only works in guild context (not DMs)
|
||||
|
||||
2. **Update message handler:**
|
||||
- Check server config before auto-translating
|
||||
- Only reply if auto_translate == TRUE
|
||||
- In DMs, always translate when `/translate` is used
|
||||
|
||||
3. **On-demand translation:**
|
||||
- `/translate` command for manual translation
|
||||
- Works in any context
|
||||
|
||||
**Deliverables:** Servers can control translation behavior; bot respects preferences.
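
The handler-side check can be reduced to one small decision function. This sketch assumes the per-guild settings have already been loaded into a mapping and that DMs never auto-translate (they rely on `/translate`), which is one reading of the bullets above; the function name is hypothetical.

```python
from typing import Mapping, Optional


def should_auto_translate(
    guild_id: Optional[int],
    guild_settings: Mapping[int, bool],
    default: bool = True,
) -> bool:
    """Decide whether a detected Vivi message gets an automatic reply.

    guild_settings maps guild_id -> auto_translate; guilds without a row
    fall back to the default. A guild_id of None means a DM context.
    """
    if guild_id is None:
        return False  # assumption: DMs use /translate only
    return guild_settings.get(guild_id, default)
```

Falling back to the default means a server does not need a configuration row until an admin changes the setting.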
|
||||
|
||||
---
|
||||
|
||||
### Phase 5: Polish & Edge Cases (Week 8+)
|
||||
**Goal:** Handle real-world complexity.
|
||||
|
||||
1. **Natural language formatting:**
|
||||
- Improve composition of translations
|
||||
- Handle emoji modifiers (skin tones, ZWJ sequences)
|
||||
- Custom emoji descriptions
|
||||
|
||||
2. **Error handling & resilience:**
|
||||
- Database unavailability
|
||||
- PluralKit API failures
|
||||
- Rate limiting with exponential backoff
|
||||
- Graceful degradation
|
||||
|
||||
3. **Logging & monitoring:**
|
||||
- Structured logging for debugging
|
||||
- Monitor database performance
|
||||
- Track API error rates
|
||||
|
||||
4. **Codebase refactoring:**
|
||||
- Move commands to separate Cogs
|
||||
- Organize into modules: `cogs/teaching.py`, `cogs/config.py`, etc.
|
||||
- Add docstrings and type hints
|
||||
|
||||
5. **Testing:**
|
||||
- Unit tests for emoji parsing
|
||||
- Integration tests for database queries
|
||||
- End-to-end tests with Discord
|
||||
|
||||
**Deliverables:** Robust, maintainable codebase ready for production.
|
||||
|
||||
---
|
||||
|
||||
## Scaling Considerations
|
||||
|
||||
### Multi-Server Architecture
|
||||
|
||||
**Challenge:** Bot will operate in many Discord servers simultaneously, each with potentially thousands of members.
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. **Shared Emoji Dictionary:**
|
||||
- Single global PostgreSQL database with all emoji meanings
|
||||
- All servers query the same emoji_dictionary table
|
||||
- Updates are reflected across all servers immediately
|
||||
- Reduces redundancy and keeps meanings consistent
|
||||
|
||||
2. **Per-Server Configuration:**
|
||||
- Each guild has its own row in server_configuration
|
||||
- Fast lookup by guild_id (indexed)
|
||||
- Allows servers to choose auto-translate vs. on-demand
|
||||
|
||||
3. **Connection Pooling:**
|
||||
- SQLAlchemy async engine with `pool_size=20, max_overflow=10` (tunable)
|
||||
- Reuses database connections across handlers
|
||||
- Prevents connection exhaustion under load
|
||||
|
||||
### Performance Optimization
|
||||
|
||||
1. **Emoji Lookup Performance:**
|
||||
- Primary key index on emoji_dictionary.emoji_string gives O(log n) B-tree lookups (effectively constant at this table size)
|
||||
- Secondary index on custom_emoji_id for custom emoji queries
|
||||
- Consider in-memory cache (Redis) if lookups become bottleneck:
|
||||
- Query Redis first (1ms latency)
|
||||
- Fall back to PostgreSQL
|
||||
- Invalidate cache on updates
|
||||
|
||||
2. **Caching Strategy (Optional, Post-MVP):**
|
||||
- Use Redis for frequently accessed emojis
|
||||
- TTL: 1 hour (emoji meanings change rarely)
|
||||
- Invalidate cache when `/teach` or `/forget` commands update dictionary
|
||||
- Benefits: Reduced database load, lower latency
|
||||
|
||||
3. **Rate Limiting:**
|
||||
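- PluralKit API: 10 requests/second (already enforced by API)
- Discord API: global limit of 50 requests/second plus stricter per-route limits (handled automatically by discord.py)
- Cap concurrent PluralKit queries locally with `asyncio.Semaphore` (this bounds in-flight requests rather than requests per second, so pair it with backoff on HTTP 429):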
|
||||
```python
|
||||
semaphore = asyncio.Semaphore(5) # Max 5 concurrent PluralKit queries
|
||||
```
|
||||
|
||||
4. **Message Handler Optimization:**
|
||||
- Webhook detection (local check): ~0ms
|
||||
- PluralKit API query: ~100-200ms (async, non-blocking)
|
||||
- Emoji parsing (regex): ~1-5ms
|
||||
- Database lookup: ~1-50ms
|
||||
- **Total:** ~100-250ms per message (acceptable, happens in background)
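
Before introducing Redis, the caching strategy above can be prototyped in-process. The class below is a hypothetical stand-in that mirrors the same behaviour (one-hour TTL, explicit invalidation on `/teach` or `/forget`); it is not shared across shards or processes.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        """Return the cached meaning, or None if absent or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def put(self, key: str, value: str) -> None:
        """Store a meaning that expires ttl_seconds from now."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        """Remove an entry, e.g. after /teach or /forget changes it."""
        self._entries.pop(key, None)
```

On a cache miss the caller would query PostgreSQL and `put` the result, which keeps the database as the single source of truth.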
|
||||
|
||||
### Scaling Beyond 2,500 Guilds

**Discord Requirement:** Bots in 2,500 or more guilds must use sharding.
|
||||
|
||||
**Sharding in discord.py:**
|
||||
- discord.py handles sharding automatically when the bot is built on `discord.AutoShardedClient` or `commands.AutoShardedBot`
- Each shard is a separate gateway (WebSocket) connection to Discord
|
||||
- Each shard handles a subset of guilds
|
||||
- Emoji dictionary remains shared across all shards (single database)
|
||||
|
||||
**Example Configuration:**
|
||||
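```python
import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # privileged intent; needed to read message text

# Automatic sharding: discord.py uses the shard count Discord recommends.
# command_prefix is required by the constructor; "!" is only a placeholder.
bot = commands.AutoShardedBot(command_prefix="!", intents=intents)
```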
|
||||
|
||||
### Database Scaling
|
||||
|
||||
**For Millions of Emojis:**
|
||||
- Table partitioning by emoji language/category (if dict grows huge)
|
||||
- Read replicas for queries (if read-heavy)
|
||||
- Consider denormalization (e.g., cache popular emoji meanings in memory)
|
||||
|
||||
**Current Recommendation:** Single PostgreSQL database is sufficient for MVP. Scale if needed post-launch.
|
||||
|
||||
---
|
||||
|
||||
## Component Interaction Diagram
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ │
|
||||
│ Discord Server (Guild) │
|
||||
│ ┌──────────────────────────────────────────────────────────┐ │
|
||||
│ │ #general │ │
|
||||
│ │ Vivi (proxied): 😷 2️⃣ 🍑 ❌ 🤧 │ │
|
||||
│ │ Vivi Speech Translator: "Vivi is sick, but not in ..." │ │
|
||||
│ └──────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
↑ (message event)
|
||||
│
|
||||
┌──────────────────────────┴──────────────────────────────────────┐
|
||||
│ Discord.py Bot Framework │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ Discord Client │ │
|
||||
│ │ - Maintains WebSocket connection │ │
|
||||
│ │ - Routes events to handlers │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ ↓ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ Message Event Handler (on_message) │ │
|
||||
│ │ - Filter: webhook_id? │ │
|
||||
│ │ - Query: PluralKit API │ │
|
||||
│ │ - Verify: member_id == Vivi? │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ ↓ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ Emoji Parser │ │
|
||||
│ │ - Extract emojis (regex) │ │
|
||||
│ │ - Categorize (Unicode/custom) │ │
|
||||
│ │ - Preserve order │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ ↓ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ Translation Engine │ │
|
||||
│ │ - Lookup emojis in database │ │
|
||||
│ │ - Compose natural language │ │
|
||||
│ │ - Format response │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ ↓ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ Command Handler (Cogs) │ │
|
||||
│ │ - /teach: Learn emoji meanings │ │
|
||||
│ │ - /meaning: Look up emoji │ │
|
||||
│ │ - /forget: Delete emoji │ │
|
||||
│ │ - /config: Server preferences │ │
|
||||
│ │ - /translate: Manual translation │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└──────────────────────────────┬──────────────────────────────────┘
|
||||
↓ (database queries/updates)
|
||||
┌──────────────────────────────────────────────────────────────────┐
|
||||
│ Database Layer │
|
||||
│ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ SQLAlchemy ORM + asyncio │ │
|
||||
│ │ - Async connection pool │ │
|
||||
│ │ - Connection reuse │ │
|
||||
│ │ - Transaction management │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ ↓ │
|
||||
│ ┌────────────────────────────────────────────────────────┐ │
|
||||
│ │ PostgreSQL Database │ │
|
||||
│ │ │ │
|
||||
│ │ emoji_dictionary: server_configuration: │ │
|
||||
│ │ ├─ 😷 → sick ├─ guild_123 → auto ON │ │
|
||||
│ │ ├─ 2️⃣ → two ├─ guild_456 → auto OFF │ │
|
||||
│ │ ├─ 🍑 → peach └─ guild_789 → auto ON │ │
|
||||
│ │ ├─ ❌ → no │ │
|
||||
│ │ └─ :me1: → Vivi (+ metadata, timestamps) │ │
|
||||
│ │ (+ metadata, timestamps) │ │
|
||||
│ └────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────────────┘
|
||||
|
||||
External APIs:
|
||||
┌────────────────────────────────────────────────────────────┐
|
||||
│ PluralKit API (api.pluralkit.me) │
|
||||
│ - GET /v2/messages/{id} → member info │
|
||||
└────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Technology Stack Recommendations
|
||||
|
||||
| Layer | Component | Technology | Why |
|
||||
|-------|-----------|-----------|-----|
|
||||
| **Bot Framework** | Discord Integration | discord.py 2.x | Async-native, active community, rich feature set |
|
||||
| **Database ORM** | Persistence | SQLAlchemy 2.0 + asyncio | Async support, type-safe, widely adopted |
|
||||
| **Database** | Data Store | PostgreSQL | Reliable, open-source, JSONB for future extensibility |
|
||||
| **Async Runtime** | Concurrency | asyncio (built-in) | Lightweight, integrated with discord.py |
|
||||
| **Caching** | Performance (Phase 5+) | Redis | Fast in-memory lookups, TTL support, distributed |
|
||||
| **Logging** | Debugging | Python logging module | Built-in, structured logging can extend |
|
||||
| **API Requests** | HTTP Calls | aiohttp | Async-native, connection pooling |
|
||||
| **Testing** | Quality Assurance | pytest + pytest-asyncio | Async test support, fixtures |
|
||||
| **Deployment** | Hosting | Docker + systemd or cloud | Reproducible environment, easy updates |
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
**Recommended Architecture:**
|
||||
|
||||
Vivi Speech Translator is a modular Discord bot with a clear separation of concerns:
|
||||
|
||||
1. **Discord Client** listens for messages and routes them through a detection pipeline
|
||||
2. **Message Event Handler** identifies when Vivi speaks (via PluralKit webhook + API verification)
|
||||
3. **Emoji Parser** extracts emoji sequences while preserving order and type information
|
||||
4. **Translation Engine** looks up meanings and composes responses
|
||||
5. **Database Layer** (PostgreSQL + SQLAlchemy) stores a shared global emoji dictionary and per-server configurations
|
||||
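6. **Command Handler** (discord.py Cogs) allows teaching, querying, and configuration
7. **Configuration Layer** loads the token, database connection string, and Vivi's member ID at startup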
|
||||
|
||||
The bot prioritizes:
|
||||
- **Reliability:** Graceful error handling, retry logic, database transactions
|
||||
- **Performance:** Indexed emoji lookups, async operations to avoid blocking, caching for scale
|
||||
- **Scalability:** Shared emoji dictionary, per-server configs, optional Redis caching, Discord sharding support
|
||||
- **Maintainability:** Modular Cog architecture, clear component boundaries, comprehensive logging
|
||||
|
||||
Build in phases: detection → parsing and translation → teaching → configuration → polish. This delivers value early (Phase 2) while establishing the foundation for later features.
|
||||
|
||||
The bot can grow from a single server to thousands, limited primarily by PluralKit API rate limits (mitigated by checking `webhook_id` locally before any API call and by backing off on rate-limit responses) and database performance (PostgreSQL handles millions of rows efficiently).
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [PluralKit API Reference](https://pluralkit.me/api/)
|
||||
- [discord.py Documentation](https://discordpy.readthedocs.io/)
|
||||
- [SQLAlchemy Async Documentation](https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html)
|
||||
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
|
||||
- [Redis Documentation](https://redis.io/documentation)
|
||||