Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Architecture Research: Vivi Speech Translator
Overview
Vivi Speech Translator is a Discord bot that detects emoji-based messages proxied by PluralKit, parses emoji sequences, looks up their meanings in a persistent global dictionary, and replies with natural language translations. The bot must operate across multiple servers, handle both channel and DM messages, and learn new emoji meanings over time.
This document outlines the recommended high-level architecture, component responsibilities, data flows, and scaling strategies.
Core Components
1. Discord Client
Responsibility: Establish and maintain the connection to Discord's API and WebSocket.
Key Details:
- Uses `discord.Client` or `discord.ext.commands.Bot` from the discord.py library
- Requires `Intents` configuration to specify which events the bot listens for:
  - `message_content` intent: Required to read message text in guilds (privileged intent: it must be enabled in the Developer Portal, and bots in 100 or more servers also need Discord's approval)
  - `guilds` intent: Track guild membership and changes
  - `guild_messages` intent: Receive message events in guild channels
  - `dm_messages` intent: Receive message events in DMs
- Initializes on startup and runs the main event loop via `client.run(token)`
- Handles connection failures and automatic reconnection
Why This Matters: Discord's event-driven architecture means the Client is the foundation—without it, the bot cannot receive any messages or respond to events.
2. Message Event Handler
Responsibility: Receive all messages, filter for relevance, and route to downstream processors.
Key Details:
- Implements the `on_message` event in discord.py (async callback)
- Filters for:
  - Webhook Detection: Check if `message.webhook_id` is not None (indicates a proxied message)
  - PluralKit Verification: Query PluralKit API to confirm message was proxied by PluralKit (not another webhook system)
  - Vivi Detection: Check if the `member_id` in the PluralKit response matches Vivi's registered member ID
  - Bot Self-Filter: Ignore messages from Vivi Speech Translator bot itself
- Routes confirmed Vivi messages to the Emoji Parser
- Handles both guild channels and DMs
PluralKit Detection Approach: When a message is received, the bot can query the PluralKit API using the message ID:
GET https://api.pluralkit.me/v2/messages/{message_id}
This returns a Message object containing:
- `member`: The member object that proxied the message (contains `id`, `name`, avatar, etc.)
- `sender`: The original user ID that sent the message (the account owner)
- `system`: The system that manages the members
- `timestamp`: When the message was sent
- `guild`: The guild ID where the message was sent
- `channel`: The channel ID where the message was sent

By checking if `response.member.id == vivi_member_id`, the bot can verify Vivi specifically sent the message.
Rate Limiting: PluralKit API has a 10/second rate limit for message lookups. The bot should handle rate limit responses gracefully with exponential backoff.
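The backoff behaviour described above could look roughly like the following sketch. It uses only the standard library; `RateLimited`, `with_backoff`, and the delay parameters are hypothetical names and values chosen for illustration, and the callable passed in is assumed to raise `RateLimited` whenever the API answers with HTTP 429.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional


class RateLimited(Exception):
    """Hypothetical error raised by the fetch callable on an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


async def with_backoff(
    fetch: Callable[[], Awaitable[dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> dict:
    """Call fetch(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            if exc.retry_after is not None:
                delay = exc.retry_after  # prefer the server's own hint
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
            # A little random jitter keeps concurrent handlers from retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")
```

A message handler would wrap its PluralKit lookup in `with_backoff(...)` and treat a final `RateLimited` as "skip this message" rather than crashing.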
3. Emoji Parser
Responsibility: Extract and categorize emojis from a message into a structured sequence.
Key Details:
- Receives the confirmed Vivi message text from the Message Event Handler
- Uses regex patterns to extract:
  - Unicode Emojis: Standard emoji characters (😷, ❌, etc.)
    - Pattern: `\p{Extended_Pictographic}` (matches the Unicode emoji range; requires the third-party `regex` module, because Python's built-in `re` does not support `\p{...}` property classes)
    - Alternative for the built-in `re` module: explicit code point ranges such as `[\u00a9\u00ae\u2600-\u27bf\U0001f300-\U0001faff]\ufe0f?` (approximate; Python strings are sequences of code points, so JavaScript-style surrogate-pair ranges like `\ud83d[\ud000-\udfff]` do not match)
  - Custom Server Emojis: Discord custom emoji format `<:emoji_name:emoji_id>` or `<a:emoji_name:emoji_id>` (for animated)
    - Pattern: `<a?:[^:\s]+:\d+>`
- Preserves order of emojis as they appear left-to-right
- Returns a structured list like: `[{type: "emoji", value: "😷", id: None}, {type: "custom", value: "me1", id: "123456789"}]`
- Handles edge cases:
  - Emoji skin tone modifiers
  - Zero-width joiners (ZWJ sequences like family emojis)
  - Emoji variations
Why This Order Matters: The project spec notes that emoji sequences are compositional and context-dependent. Preserving order and distinguishing types allows the Translation Engine to understand the full intended meaning.
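As a minimal sketch of the parser described above, the following uses only Python's built-in `re` module. The code point ranges are an approximation rather than the full Unicode emoji set (flags, for example, are not covered), and `parse_emojis`, `EmojiToken`, and the pattern names are hypothetical.

```python
import re
from typing import List, Optional, TypedDict


class EmojiToken(TypedDict):
    type: str           # "emoji" (Unicode) or "custom" (Discord server emoji)
    value: str          # the emoji characters, or the custom emoji's name
    id: Optional[str]   # Discord emoji ID for custom emojis, otherwise None


# One emoji "unit": a keycap (such as 2️⃣) or a pictograph followed by an optional
# variation selector or skin-tone modifier. These ranges are approximate.
_UNIT = (
    r"(?:[0-9#*]\ufe0f?\u20e3"
    r"|[\u2600-\u27bf\U0001f300-\U0001faff](?:\ufe0f|[\U0001f3fb-\U0001f3ff])?)"
)
# Units joined by zero-width joiners form a single emoji (family emojis, etc.).
_UNICODE = rf"{_UNIT}(?:\u200d{_UNIT})*"
_TOKEN = re.compile(rf"<a?:(?P<name>[^:\s]+):(?P<id>\d+)>|(?P<uni>{_UNICODE})")


def parse_emojis(text: str) -> List[EmojiToken]:
    """Return emojis in left-to-right order, ignoring all other text."""
    tokens: List[EmojiToken] = []
    for match in _TOKEN.finditer(text):
        if match.group("id") is not None:
            tokens.append(
                {"type": "custom", "value": match.group("name"), "id": match.group("id")}
            )
        else:
            tokens.append({"type": "emoji", "value": match.group("uni"), "id": None})
    return tokens
```

Matching custom and Unicode emojis with one alternation keeps the original ordering without a separate merge step.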
4. Translation Engine
Responsibility: Convert emoji sequences into natural language using the emoji dictionary.
Key Details:
- Receives structured emoji list from Emoji Parser
- For each emoji:
  - Look up its meaning in the Emoji Dictionary (database)
  - Handle three cases:
    - Known emoji: Include its meaning in output
    - Unknown emoji: Display the emoji itself with a placeholder or skip
    - Custom emoji: Look up by custom emoji ID in database
- Generates natural language output:
  - If all emojis are known: Compose as a sentence ("Vivi is sick, but not in the sinuses")
  - If some are unknown: Format as: "Known meanings: ... [Unknown emoji] ..."
  - If none are known: Reply: "I don't know what these emojis mean yet. You can teach me with the `/teach` command."
- Considers emoji context (e.g., combination of emojis might have a specific meaning)
Output Format: The bot should reply in a Discord message, either in the same channel (if public) or as a DM (if DM context).
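The three output cases above can be sketched as a pure function. This is only an illustration: the dictionary is a plain in-memory `dict` standing in for the database, the `custom:<id>` key scheme is an assumption, and joining meanings with spaces is a placeholder for real sentence composition.

```python
from typing import Dict, List

UNKNOWN_REPLY = (
    "I don't know what these emojis mean yet. "
    "You can teach me with the /teach command."
)


def lookup_key(token: dict) -> str:
    # Assumption: custom emojis are keyed by their Discord ID,
    # Unicode emojis by the emoji characters themselves.
    if token["type"] == "custom":
        return f"custom:{token['id']}"
    return token["value"]


def translate(tokens: List[dict], dictionary: Dict[str, str]) -> str:
    """Compose a reply from parsed tokens, in their original order."""
    if not tokens:
        return ""
    parts = []
    known = 0
    for token in tokens:
        meaning = dictionary.get(lookup_key(token))
        if meaning is not None:
            parts.append(meaning)
            known += 1
        else:
            # Show the untranslated emoji so readers can see what is missing.
            shown = f":{token['value']}:" if token["type"] == "custom" else token["value"]
            parts.append(f"[{shown}?]")
    if known == 0:
        return UNKNOWN_REPLY
    if known == len(tokens):
        return " ".join(parts)
    return "Known meanings: " + " ".join(parts)
```

Keeping this function free of Discord and database calls makes the composition rules easy to unit test on their own.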
5. Database Layer
Responsibility: Store and retrieve persistent data (emoji dictionary and server configurations).
Key Details:
- Tech Stack: SQLAlchemy ORM with PostgreSQL for production reliability
- Async Support: Use `sqlalchemy.ext.asyncio` or `asyncpg` to avoid blocking the Discord event loop
- Initialization: Override `Bot.start()` or use a `setup_hook` to connect to the database on startup
- Connection Pooling: Configure connection pool to handle concurrent requests from message handlers
Two Core Tables:
- `emoji_dictionary`
  - `emoji_string` (TEXT, PRIMARY KEY): The emoji character(s) or custom emoji format
  - `custom_emoji_id` (BIGINT, NULLABLE): Discord custom emoji ID (if custom emoji)
  - `meaning` (TEXT): The learned meaning
  - `created_at` (TIMESTAMP): When first learned
  - `updated_at` (TIMESTAMP): Last update time
  - `updated_by_user_id` (BIGINT, NULLABLE): User ID of who taught/corrected this
  - `updated_by_member_id` (TEXT, NULLABLE): PluralKit member ID (e.g., Vivi's ID)
  - `created_in_guild` (BIGINT, NULLABLE): Guild ID where first learned (for tracking origin, optional)
  - Indexes: `emoji_string` (for fast lookups), `custom_emoji_id` (for custom emoji queries)
- `server_configuration`
  - `guild_id` (BIGINT, PRIMARY KEY): Discord server ID
  - `auto_translate` (BOOLEAN, DEFAULT TRUE): Auto-translate all Vivi messages or require the `/translate` command
  - `created_at` (TIMESTAMP): When server config created
  - Updated by Configuration Command Handler
Important Design Decisions:
- Global dictionary: Emoji meanings are shared across all servers. Different systems can update meanings, but there's a single source of truth per emoji.
- Per-server config: Each server has its own settings (auto vs. on-demand mode).
- User attribution: Track who taught each emoji for transparency and conflict resolution.
- No per-server emoji variants: The spec intends a global dictionary, so "😷" means the same thing everywhere. Per-server overrides could be added later if needed.
6. Command Handler (Teaching & Configuration)
Responsibility: Process bot commands for teaching emojis and configuring server behavior.
Key Details:
- Tech: discord.py `commands.Cog` extension for modular command organization
- Commands to Implement:
  - `/teach <emoji_sequence> <meaning>`
    - Extract emojis from the sequence using the Emoji Parser
    - Insert or update each emoji in the database
    - Confirm: "Learned: 😷 = sick, 2️⃣ = two, etc."
    - Only Vivi or approved users should be able to teach (can be restricted by user role or system authentication)
  - `/forget <emoji>`
    - Delete emoji from dictionary
    - Confirm deletion
  - `/meaning <emoji>`
    - Look up and reply with the meaning of a specific emoji
    - If unknown, reply: "I don't know that one yet."
  - `/config auto-translate <on|off>`
    - Update `server_configuration.auto_translate` in database
    - Only server admins can change this
    - Requires guild context (won't work in DMs)
  - `/translate <emoji_sequence>` (On-demand mode)
    - Manually trigger translation of an emoji sequence
    - Works in both channels and DMs
- Error Handling:
  - Graceful failures if database is unavailable
  - Clear user feedback for invalid emoji sequences
  - Require proper permissions for sensitive commands
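One possible shape for the teaching permission check mentioned above, kept independent of discord.py so it can be tested in isolation. The function name and the idea of combining a user ID allowlist with a set of allowed role IDs are assumptions; the actual permission model is still an open requirement.

```python
from typing import AbstractSet, Iterable


def can_teach(
    user_id: int,
    role_ids: Iterable[int],
    allowed_user_ids: AbstractSet[int],
    allowed_role_ids: AbstractSet[int],
) -> bool:
    """Return True if the caller may run /teach or /forget."""
    # Explicitly allowlisted accounts (e.g. the account behind Vivi's system) always pass.
    if user_id in allowed_user_ids:
        return True
    # Otherwise the caller needs at least one of the configured roles.
    return any(role_id in allowed_role_ids for role_id in role_ids)
```

In a command callback, `role_ids` would come from the invoking member's roles; in DMs there are no roles, so only the allowlist applies.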
7. Configuration Layer
Responsibility: Load bot configuration (token, database connection string, etc.) at startup.
Key Details:
- Use environment variables for secrets: `DISCORD_TOKEN`, `DATABASE_URL`, `PLURALKIT_TOKEN` (optional, for user-specific API calls)
- Configuration file (e.g., `config.json` or `.env`) for non-secret settings:
  - Vivi's member ID (to filter for her messages specifically)
  - Default auto-translate mode
  - Logging level
- Initialize Config before starting the bot
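A small loader along these lines could validate configuration before the bot starts. `DISCORD_TOKEN`, `DATABASE_URL`, and `PLURALKIT_TOKEN` come from the list above; `VIVI_MEMBER_ID`, `AUTO_TRANSLATE_DEFAULT`, `LOG_LEVEL`, and the `BotConfig` class are hypothetical names used only for this sketch, which reads everything from environment variables for simplicity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_REQUIRED = ("DISCORD_TOKEN", "DATABASE_URL", "VIVI_MEMBER_ID")
_TRUTHY = ("1", "true", "yes", "on")


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    database_url: str
    vivi_member_id: str
    pluralkit_token: Optional[str] = None
    auto_translate_default: bool = True
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build the config, failing fast if a required setting is missing."""
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return BotConfig(
        discord_token=env["DISCORD_TOKEN"],
        database_url=env["DATABASE_URL"],
        vivi_member_id=env["VIVI_MEMBER_ID"],
        pluralkit_token=env.get("PLURALKIT_TOKEN") or None,
        auto_translate_default=env.get("AUTO_TRANSLATE_DEFAULT", "true").lower() in _TRUTHY,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Accepting the environment as a parameter means tests can pass a plain dictionary instead of mutating `os.environ`.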
Data Flow
Message Reception to Response
┌──────────────────────────────────────────────────────────────┐
│ Discord Message Event │
│ (user posts message proxied by PluralKit webhook) │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Message Event Handler │
│ - Check: webhook_id != None? │
│ - Query: PluralKit API for message info │
│ - Verify: member_id == Vivi's ID? │
│ - Filter: Ignore self-messages │
└──────────────────────────────────────────────────────────────┘
↓
(YES, Vivi's message)
↓
┌──────────────────────────────────────────────────────────────┐
│ Emoji Parser │
│ - Extract emojis with regex │
│ - Categorize: Unicode vs. custom │
│ - Preserve order │
│ - Output: [{type, value, id}, ...] │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Translation Engine │
│ - For each emoji: lookup in database │
│ - Compose natural language │
│ - Handle unknown emojis │
│ - Format response │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Database (emoji_dictionary) │
│ - Indexed lookup by emoji_string (B-tree PK)                 │
│ - Return: meaning, metadata │
└──────────────────────────────────────────────────────────────┘
↓
(Lookup Results)
↓
┌──────────────────────────────────────────────────────────────┐
│ Response Formatting │
│ - Compose message │
│ - Check context: channel vs. DM │
│ - Apply server config: auto-translate mode │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Discord API Response │
│ - Send reply to channel or DM │
│ - Handle rate limits │
│ - Log interaction │
└──────────────────────────────────────────────────────────────┘
Teaching Flow (Command-Driven)
┌──────────────────────────────────────────────────────────────┐
│ User runs: /teach 😷 2️⃣ "Vivi is sick, not sinuses" │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Command Handler │
│ - Parse command arguments │
│ - Authenticate: Is user authorized to teach? │
│ - Extract emojis from sequence │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Database Layer │
│ - INSERT or UPDATE emoji_dictionary │
│ - Set: emoji_string, meaning, updated_by, timestamp │
│ - Commit transaction │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Confirmation Reply │
│ - "Learned: 😷 = sick, 2️⃣ = two" │
│ - Post in same context (channel or DM) │
└──────────────────────────────────────────────────────────────┘
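The "INSERT or UPDATE" step in the teaching flow can be expressed as a single upsert statement. The sketch below uses the standard library's `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` clause, though the placeholder syntax depends on the driver. `make_demo_db` and `teach` are hypothetical helper names, and the table is a trimmed version of `emoji_dictionary`.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO emoji_dictionary (emoji_string, meaning, updated_by_user_id)
VALUES (?, ?, ?)
ON CONFLICT(emoji_string) DO UPDATE SET
    meaning = excluded.meaning,
    updated_by_user_id = excluded.updated_by_user_id,
    updated_at = CURRENT_TIMESTAMP
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a reduced emoji_dictionary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE emoji_dictionary (
            emoji_string TEXT PRIMARY KEY,
            meaning TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updated_by_user_id BIGINT
        )
        """
    )
    return conn


def teach(conn: sqlite3.Connection, emoji_string: str, meaning: str, user_id: int) -> None:
    """Insert a new meaning, or overwrite the existing one for this emoji."""
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(UPSERT_SQL, (emoji_string, meaning, user_id))
```

Teaching the same emoji twice leaves a single row holding the latest meaning and the ID of whoever taught it last, while `created_at` keeps its original value.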
PluralKit Integration Details
Detection Approach
- Webhook Detection (First Filter):
  - Check the `message.webhook_id` property in discord.py
  - If not None, the message was sent via a webhook (potentially a PluralKit proxy)
- PluralKit API Query (Confirmation):
  - Query endpoint: `GET https://api.pluralkit.me/v2/messages/{message_id}`
  - The `message_id` can be the webhook message ID or the original message ID (original works for 30 minutes)
  - Parse response to get the `member` object
- Member Verification:
  - Extract `member.id` from API response
  - Compare with Vivi's known member ID (from config)
  - If match: Process as Vivi's message
  - If not match: Ignore (message from another member)
- Alternative: Member Names (Backup):
  - If using member ID fails, fall back to checking `member.name`
  - Look for "Vivi" or configured member name
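The member verification and name fallback above can be sketched as a pure function over the decoded JSON response. The function name is hypothetical, and the payload is treated defensively: `member` is assumed to be possibly missing or null. The name comparison is a weaker check than the ID, since names are not guaranteed to be unique.

```python
from typing import Any, Mapping, Optional


def is_vivi(
    payload: Mapping[str, Any],
    member_id: str,
    fallback_name: Optional[str] = None,
) -> bool:
    """Decide whether a PluralKit message payload belongs to the configured member."""
    member = payload.get("member") or {}  # tolerate a missing or null member
    if member.get("id") == member_id:
        return True
    if fallback_name is not None:
        # Backup check: case-insensitive match on the member's name.
        name = member.get("name") or ""
        return name.casefold() == fallback_name.casefold()
    return False
```

Leaving `fallback_name` unset keeps the check strict, which is probably the safer default when several systems share a server.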
API Endpoints Used
| Endpoint | Purpose | Rate Limit | Response |
|---|---|---|---|
| `GET /v2/messages/{message}` | Get proxied message info | 10/sec | Message object with member, sender, guild, channel, timestamp |
| `GET /v2/systems/@me` | Get authenticated system info | 10/sec | Full system + members (requires token) |
| `GET /v2/members/{member}` | Get specific member info | 10/sec | Member object with proxy tags, avatar, etc. |
Authentication (Optional):
- Public queries (member lookup) don't require authentication
- System-specific queries (private member settings) require the system token in the `Authorization` header (PluralKit expects the raw token value, without a `Bearer` prefix)
- For Vivi's system, store the system token in the environment variable `PLURALKIT_TOKEN` for authenticated access
Implementation in discord.py
```python
import asyncio

import aiohttp
import discord


async def check_vivi_message(message: discord.Message, vivi_member_id: str) -> bool:
    """Check if message was proxied by Vivi via PluralKit."""
    # Step 1: Check if message is from a webhook
    if message.webhook_id is None:
        return False  # Not proxied

    # Step 2: Query PluralKit API
    # (a long-lived shared ClientSession would avoid reconnecting on every message)
    url = f"https://api.pluralkit.me/v2/messages/{message.id}"
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                if resp.status != 200:
                    return False  # Not a PluralKit message, or the API is unavailable
                data = await resp.json()
    except (aiohttp.ClientError, asyncio.TimeoutError) as e:
        # Log error, but don't crash
        print(f"PluralKit API error: {e}")
        return False

    # Step 3: Check member ID matches Vivi (tolerate a null "member" field)
    member = data.get("member") or {}
    return member.get("id") == vivi_member_id
```
Database Schema
emoji_dictionary Table
```sql
CREATE TABLE emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    custom_emoji_id BIGINT,          -- nullable: only set for custom emojis
    meaning TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by_user_id BIGINT,       -- nullable
    updated_by_member_id TEXT,       -- nullable
    created_in_guild BIGINT          -- nullable
);

-- The PRIMARY KEY already creates a unique index on emoji_string,
-- so only custom_emoji_id needs an explicit index.
CREATE INDEX idx_custom_emoji_id ON emoji_dictionary(custom_emoji_id);
```
server_configuration Table
```sql
CREATE TABLE server_configuration (
    guild_id BIGINT PRIMARY KEY,     -- the primary key is indexed automatically
    auto_translate BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
Alternative: SQLAlchemy ORM Definitions
```python
from sqlalchemy import BigInteger, Boolean, Column, DateTime, String, Text, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class EmojiDictionary(Base):
    __tablename__ = "emoji_dictionary"

    emoji_string = Column(String, primary_key=True)
    custom_emoji_id = Column(BigInteger, nullable=True, index=True)
    meaning = Column(Text, nullable=False)
    # Timestamps are filled in by the database, matching the SQL defaults above
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
    updated_by_user_id = Column(BigInteger, nullable=True)
    updated_by_member_id = Column(String, nullable=True)
    created_in_guild = Column(BigInteger, nullable=True)


class ServerConfiguration(Base):
    __tablename__ = "server_configuration"

    guild_id = Column(BigInteger, primary_key=True)
    auto_translate = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
```
Suggested Build Order
Phase 1: Foundation (Week 1-2)
Goal: Get Vivi messages detected and logged.
- Set up Discord bot:
  - Create Discord application and token
  - Initialize discord.py Client/Bot with required Intents
  - Implement basic `on_message` event handler
  - Test basic logging
- Implement PluralKit detection:
  - Add webhook detection (check `message.webhook_id`)
  - Add PluralKit API query and member verification
  - Log when Vivi messages are detected
  - Handle API errors gracefully
- Database initialization:
  - Set up PostgreSQL database
  - Create `emoji_dictionary` and `server_configuration` tables
  - Test connection from bot
Deliverables: Bot logs every Vivi message to console; doesn't respond yet.
Phase 2: Emoji Parsing & Translation (Week 3-4)
Goal: Translate Vivi's emojis to text.
- Emoji parsing:
  - Implement regex patterns for Unicode and custom emojis
  - Extract emoji sequences in order
  - Test with various emoji types
- Basic emoji lookup:
  - Query `emoji_dictionary` table
  - Return meanings for known emojis
  - Handle unknown emojis
- Response formatting:
  - Compose natural language from emoji meanings
  - Send reply to channel/DM
  - Handle edge cases (no emojis, all unknown)
- Manual testing:
  - Create test emojis in database
  - Post Vivi messages and verify translations
Deliverables: Bot translates Vivi messages; appears in channels and DMs.
Phase 3: Teaching Commands (Week 5-6)
Goal: Allow users to teach the bot emoji meanings.
- Implement `/teach` command:
  - Parse emoji sequences
  - Insert into database with metadata
  - Confirm to user
- Implement `/meaning` command:
  - Look up single emoji
  - Reply with meaning or "not learned yet"
- Implement `/forget` command:
  - Delete emoji from database
  - Require admin or Vivi permission
- Permission system:
  - Restrict teaching to authorized users (Vivi + alters)
  - Use Discord roles or user ID allowlist
Deliverables: Users can teach and query emoji meanings via commands.
Phase 4: Per-Server Configuration (Week 7)
Goal: Allow servers to opt into/out of auto-translation.
- Implement `/config auto-translate` command:
  - Toggle auto-translate on/off per server
  - Requires admin permission
  - Only works in guild context (not DMs)
- Update message handler:
  - Check server config before auto-translating
  - Only reply if `auto_translate == TRUE`
  - In DMs, always translate when `/translate` is used
- On-demand translation:
  - `/translate` command for manual translation
  - Works in any context
Deliverables: Servers can control translation behavior; bot respects preferences.
Phase 5: Polish & Edge Cases (Week 8+)
Goal: Handle real-world complexity.
- Natural language formatting:
  - Improve composition of translations
  - Handle emoji modifiers (skin tones, ZWJ sequences)
  - Custom emoji descriptions
- Error handling & resilience:
  - Database unavailability
  - PluralKit API failures
  - Rate limiting with exponential backoff
  - Graceful degradation
- Logging & monitoring:
  - Structured logging for debugging
  - Monitor database performance
  - Track API error rates
- Codebase refactoring:
  - Move commands to separate Cogs
  - Organize into modules: `cogs/teaching.py`, `cogs/config.py`, etc.
  - Add docstrings and type hints
- Testing:
  - Unit tests for emoji parsing
  - Integration tests for database queries
  - End-to-end tests with Discord
Deliverables: Robust, maintainable codebase ready for production.
Scaling Considerations
Multi-Server Architecture
Challenge: Bot will operate in many Discord servers simultaneously, each with potentially thousands of members.
Solution:
- Shared Emoji Dictionary:
  - Single global PostgreSQL database with all emoji meanings
  - All servers query the same `emoji_dictionary` table
  - Updates are reflected across all servers immediately
  - Reduces redundancy and keeps meanings consistent
- Per-Server Configuration:
  - Each guild has its own row in `server_configuration`
  - Fast lookup by `guild_id` (indexed)
  - Allows servers to choose auto-translate vs. on-demand
- Connection Pooling:
  - SQLAlchemy async engine with `pool_size=20, max_overflow=10` (tunable)
  - Reuses database connections across handlers
  - Prevents connection exhaustion under load
Performance Optimization
- Emoji Lookup Performance:
  - Primary key index on `emoji_dictionary.emoji_string` for fast lookups (a B-tree index, so O(log n), effectively constant at this scale)
  - Secondary index on `custom_emoji_id` for custom emoji queries
  - Consider in-memory cache (Redis) if lookups become a bottleneck:
    - Query Redis first (~1ms latency)
    - Fall back to PostgreSQL
    - Invalidate cache on updates
- Caching Strategy (Optional, Post-MVP):
  - Use Redis for frequently accessed emojis
  - TTL: 1 hour (emoji meanings change rarely)
  - Invalidate cache when `/teach` or `/forget` commands update the dictionary
  - Benefits: Reduced database load, lower latency
- Rate Limiting:
  - PluralKit API: 10 requests/second (enforced by the API)
  - Discord API: global limit of 50 requests/second plus per-route limits (handled automatically by discord.py)
  - Bound concurrent PluralKit queries locally with `asyncio.Semaphore`, e.g. `semaphore = asyncio.Semaphore(5)  # Max 5 concurrent PluralKit queries`. This caps concurrency rather than requests per second, so backoff on rate-limit responses is still needed.
- Message Handler Optimization:
  - Webhook detection (local check): ~0ms
  - PluralKit API query: ~100-200ms (async, non-blocking)
  - Emoji parsing (regex): ~1-5ms
  - Database lookup: ~1-50ms
  - Total: ~100-250ms per message (acceptable, happens in background)
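To illustrate the cache-first lookup with TTL and invalidation, here is a sketch that swaps Redis for a plain in-process dictionary. It shows the same pattern (check cache, fall back to the database, invalidate on updates) without an external service; `TTLCache` and `cached_lookup` are hypothetical names, and `load` stands for whatever function queries PostgreSQL.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """A tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)


def cached_lookup(cache: TTLCache, key: str,
                  load: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the cached meaning, falling back to load(key) on a miss."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        if value is not None:
            cache.set(key, value)
    return value
```

The `/teach` and `/forget` handlers would call `invalidate` for each emoji they change. An in-process cache only stays consistent within one bot process, which is one reason a shared cache such as Redis becomes attractive once the bot runs as several processes.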
Scaling Beyond 2,500 Guilds
Discord Requirement: Bots in 2,500 or more guilds must implement sharding.
Sharding in discord.py:
- discord.py handles sharding automatically if configured
- The bot splits its gateway connection into multiple "shards"
- Each shard handles a subset of guilds
- Emoji dictionary remains shared across all shards (single database)
Example Configuration:
```python
import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # privileged; needed to read message text
bot = commands.AutoShardedBot(command_prefix=commands.when_mentioned, intents=intents)  # shards automatically based on guild count
```
Database Scaling
For Millions of Emojis:
- Table partitioning by emoji language/category (if dict grows huge)
- Read replicas for queries (if read-heavy)
- Consider denormalization (e.g., cache popular emoji meanings in memory)
Current Recommendation: Single PostgreSQL database is sufficient for MVP. Scale if needed post-launch.
Component Interaction Diagram
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Discord Server (Guild) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ #general │ │
│ │ Vivi (proxied): 😷 2️⃣ 🍑 ❌ 🤧 │ │
│ │ Vivi Speech Translator: "Vivi is sick, but not in ..." │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
↑ (message event)
│
┌──────────────────────────┴──────────────────────────────────────┐
│ Discord.py Bot Framework │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Discord Client │ │
│ │ - Maintains WebSocket connection │ │
│ │ - Routes events to handlers │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Message Event Handler (on_message) │ │
│ │ - Filter: webhook_id? │ │
│ │ - Query: PluralKit API │ │
│ │ - Verify: member_id == Vivi? │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Emoji Parser │ │
│ │ - Extract emojis (regex) │ │
│ │ - Categorize (Unicode/custom) │ │
│ │ - Preserve order │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Translation Engine │ │
│ │ - Lookup emojis in database │ │
│ │ - Compose natural language │ │
│ │ - Format response │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Command Handler (Cogs) │ │
│ │ - /teach: Learn emoji meanings │ │
│ │ - /meaning: Look up emoji │ │
│ │ - /forget: Delete emoji │ │
│ │ - /config: Server preferences │ │
│ │ - /translate: Manual translation │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────┬──────────────────────────────────┘
↓ (database queries/updates)
┌──────────────────────────────────────────────────────────────────┐
│ Database Layer │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ SQLAlchemy ORM + asyncio │ │
│ │ - Async connection pool │ │
│ │ - Connection reuse │ │
│ │ - Transaction management │ │
│ └────────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ PostgreSQL Database │ │
│ │ │ │
│ │ emoji_dictionary: server_configuration: │ │
│ │ ├─ 😷 → sick ├─ guild_123 → auto ON │ │
│ │ ├─ 2️⃣ → two ├─ guild_456 → auto OFF │ │
│ │ ├─ 🍑 → peach └─ guild_789 → auto ON │ │
│ │ ├─ ❌ → no │ │
│ │ └─ :me1: → Vivi (+ metadata, timestamps) │ │
│ │ (+ metadata, timestamps) │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
External APIs:
┌────────────────────────────────────────────────────────────┐
│ PluralKit API (api.pluralkit.me) │
│ - GET /v2/messages/{id} → member info │
└────────────────────────────────────────────────────────────┘
Technology Stack Recommendations
| Layer | Component | Technology | Why |
|---|---|---|---|
| Bot Framework | Discord Integration | discord.py 2.x | Async-native, active community, rich feature set |
| Database ORM | Persistence | SQLAlchemy 2.0 + asyncio | Async support, type-safe, widely adopted |
| Database | Data Store | PostgreSQL | Reliable, open-source, JSONB for future extensibility |
| Async Runtime | Concurrency | asyncio (built-in) | Lightweight, integrated with discord.py |
| Caching | Performance (Phase 5+) | Redis | Fast in-memory lookups, TTL support, distributed |
| Logging | Debugging | Python logging module | Built-in, structured logging can extend |
| API Requests | HTTP Calls | aiohttp | Async-native, connection pooling |
| Testing | Quality Assurance | pytest + pytest-asyncio | Async test support, fixtures |
| Deployment | Hosting | Docker + systemd or cloud | Reproducible environment, easy updates |
Summary
Recommended Architecture:
Vivi Speech Translator is a modular Discord bot with a clear separation of concerns:
- Discord Client listens for messages and routes them through a detection pipeline
- Message Event Handler identifies when Vivi speaks (via PluralKit webhook + API verification)
- Emoji Parser extracts emoji sequences while preserving order and type information
- Translation Engine looks up meanings and composes responses
- Database Layer (PostgreSQL + SQLAlchemy) stores a shared global emoji dictionary and per-server configurations
- Command Handler (discord.py Cogs) allows teaching, querying, and configuration
The bot prioritizes:
- Reliability: Graceful error handling, retry logic, database transactions
- Performance: Indexed emoji lookups, async operations to avoid blocking, caching for scale
- Scalability: Shared emoji dictionary, per-server configs, optional Redis caching, Discord sharding support
- Maintainability: Modular Cog architecture, clear component boundaries, comprehensive logging
Build in phases: detection → parsing and translation → teaching → configuration → polish. This delivers value early (Phase 2) while establishing the foundation for later features.
The bot can grow from a single server to thousands, limited primarily by PluralKit API rate limits (manageable with local throttling and backoff) and database performance (PostgreSQL handles millions of rows efficiently).