Vivi-Speech/.planning/research/ARCHITECTURE.md
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:02:32 -05:00


Architecture Research: Vivi Speech Translator

Overview

Vivi Speech Translator is a Discord bot that detects emoji-based messages proxied by PluralKit, parses emoji sequences, looks up their meanings in a persistent global dictionary, and replies with natural language translations. The bot must operate across multiple servers, handle both channel and DM messages, and learn new emoji meanings over time.

This document outlines the recommended high-level architecture, component responsibilities, data flows, and scaling strategies.


Core Components

1. Discord Client

Responsibility: Establish and maintain the connection to Discord's API and WebSocket.

Key Details:

  • Uses discord.Client or discord.ext.commands.Bot from discord.py library
  • Requires Intents configuration to specify which events the bot listens for:
    • message_content intent: Required to read message text (privileged intent; must be enabled in the Developer Portal, and verified bots need Discord's approval)
    • guilds intent: Track guild membership and changes
    • guild_messages intent: Receive message events in guild channels
    • dm_messages intent: Receive message events in DMs
  • Initializes on startup and runs the main event loop via client.run(token)
  • Handles connection failures and automatic reconnection

Why This Matters: Discord's event-driven architecture means the Client is the foundation—without it, the bot cannot receive any messages or respond to events.


2. Message Event Handler

Responsibility: Receive all messages, filter for relevance, and route to downstream processors.

Key Details:

  • Implements on_message event in discord.py (async callback)
  • Filters for:
    1. Webhook Detection: Check if message.webhook_id is not None (indicates a proxied message)
    2. PluralKit Verification: Query PluralKit API to confirm message was proxied by PluralKit (not another webhook system)
    3. Vivi Detection: Check if the member_id in the PluralKit response matches Vivi's registered member ID
    4. Bot Self-Filter: Ignore messages from Vivi Speech Translator bot itself
  • Routes confirmed Vivi messages to the Emoji Parser
  • Handles both guild channels and DMs

PluralKit Detection Approach: When a message is received, the bot can query the PluralKit API using the message ID:

GET https://api.pluralkit.me/v2/messages/{message_id}

This returns a Message object containing:

  • member: The member object that proxied the message (contains id, name, avatar_url, etc.)
  • sender: The Discord user ID of the account that sent the original message (the account owner)
  • system: The system that manages the members
  • timestamp: When the message was sent
  • guild: The guild ID where the message was sent
  • channel: The channel ID where the message was sent

By checking if response.member.id == vivi_member_id, the bot can verify Vivi specifically sent the message.

Rate Limiting: PluralKit API has a 10/second rate limit for message lookups. The bot should handle rate limit responses gracefully with exponential backoff.
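
A minimal sketch of the backoff behaviour described above, using only the standard library. The `RateLimited` exception and the `with_backoff` helper are hypothetical names introduced for illustration; the sketch assumes the HTTP layer raises `RateLimited` when it sees a 429 response, optionally carrying the server's retry hint.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error a fetcher raises on an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after


async def with_backoff(
    fetch: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call fetch(), retrying with exponential backoff when rate limited."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # Out of attempts; let the caller decide what to do.
            # Prefer the server's hint; otherwise double the delay each time.
            delay = exc.retry_after or min(max_delay, base_delay * 2 ** attempt)
            # Small jitter so concurrent handlers do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists only so the delay can be replaced in tests; in the bot it would stay at its default.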


3. Emoji Parser

Responsibility: Extract and categorize emojis from a message into a structured sequence.

Key Details:

  • Receives the confirmed Vivi message text from the Message Event Handler
  • Uses regex patterns to extract:
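    1. Unicode Emojis: Standard emoji characters (e.g., 😷)
      • Pattern: \p{Extended_Pictographic} (covers most pictographic emoji; requires the third-party regex module, since the built-in re module does not support \p{...}; keycap sequences such as 2⃣ need a separate pattern)
      • Alternative for the built-in re module (Python 3 strings are sequences of code points, so use code point ranges rather than UTF-16 surrogate pairs): [\u00a9\u00ae\u2000-\u3300\U0001F000-\U0001FBFF\ufe0f]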
    2. Custom Server Emojis: Discord custom emoji format <:emoji_name:emoji_id> or <a:emoji_name:emoji_id> (for animated)
      • Pattern: <a?:[^:\s]+:\d+>
  • Preserves order of emojis as they appear left-to-right
  • Returns a structured list like: [{type: "emoji", value: "😷", id: None}, {type: "custom", value: "me1", id: "123456789"}]
  • Handles edge cases:
    • Emoji skin tone modifiers
    • Zero-width joiners (ZWJ sequences like family emojis)
    • Emoji variations

Why This Order Matters: The project spec notes that emoji sequences are compositional and context-dependent. Preserving order and distinguishing types allows the Translation Engine to understand the full intended meaning.
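
The parsing steps above could be sketched as follows with the built-in `re` module. The code point ranges in `PICTO` are a deliberately narrow approximation chosen for this example, not the full Unicode emoji set, and `parse_emojis` is a hypothetical function name. Flags built from regional indicators are not grouped by this sketch.

```python
import re
from typing import Dict, List, Optional

# Custom Discord emoji: <:name:id> or <a:name:id> (animated).
CUSTOM = r"<a?:(?P<name>[^:\s]+):(?P<id>\d+)>"
# Keycap sequence: digit, # or *, optional variation selector, then U+20E3.
KEYCAP = r"[0-9#*]\ufe0f?\u20e3"
# Approximate pictographic ranges (an assumption for this sketch), followed
# by an optional variation selector or skin-tone modifier.
PICTO = r"[\u2600-\u27bf\U0001F000-\U0001FAFF](?:\ufe0f|[\U0001F3FB-\U0001F3FF])?"
# Pictographs joined by zero-width joiners (U+200D) count as one emoji.
UNICODE = PICTO + r"(?:\u200d" + PICTO + r")*"

TOKEN = re.compile(
    "(?P<custom>" + CUSTOM + ")|(?P<unicode>" + KEYCAP + "|" + UNICODE + ")"
)


def parse_emojis(text: str) -> List[Dict[str, Optional[str]]]:
    """Return emojis in left-to-right order, tagged as 'emoji' or 'custom'."""
    tokens: List[Dict[str, Optional[str]]] = []
    for match in TOKEN.finditer(text):
        if match.group("custom"):
            tokens.append(
                {"type": "custom", "value": match.group("name"), "id": match.group("id")}
            )
        else:
            tokens.append({"type": "emoji", "value": match.group("unicode"), "id": None})
    return tokens
```

Because `finditer` scans left to right and the alternatives never overlap, the output order matches the order in the message, which is what the Translation Engine relies on.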


4. Translation Engine

Responsibility: Convert emoji sequences into natural language using the emoji dictionary.

Key Details:

  • Receives structured emoji list from Emoji Parser
  • For each emoji:
    1. Look up its meaning in the Emoji Dictionary (database)
    2. Handle three cases:
      • Known emoji: Include its meaning in output
      • Unknown emoji: Display the emoji itself with a placeholder or skip
      • Custom emoji: Look up by custom emoji ID in database
  • Generates natural language output:
    • If all emojis are known: Compose as a sentence ("Vivi is sick, but not in the sinuses")
    • If some are unknown: Format as: "Known meanings: ... [Unknown emoji] ..."
    • If none are known: Reply: "I don't know what these emojis mean yet. You can teach me with the /teach command."
  • Considers emoji context (e.g., combination of emojis might have a specific meaning)

Output Format: The bot should reply in a Discord message, either in the same channel (if public) or as a DM (if DM context).
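
As a rough illustration of the three lookup cases, the sketch below works on an in-memory mapping instead of the database. The `custom:<id>` key scheme, the bracketed placeholder for unknown emojis, and the function names are assumptions made for this example; joining meanings with spaces is a stand-in for real sentence composition.

```python
from typing import Dict, List, Mapping, Optional

Token = Dict[str, Optional[str]]

UNKNOWN_ALL = (
    "I don't know what these emojis mean yet. "
    "You can teach me with the /teach command."
)


def lookup_key(token: Token) -> str:
    """Custom emojis are keyed by ID, Unicode emojis by their characters."""
    if token["type"] == "custom":
        return f"custom:{token['id']}"
    return str(token["value"])


def translate(tokens: List[Token], dictionary: Mapping[str, str]) -> str:
    """Compose a reply from known meanings, marking unknown emojis."""
    if not tokens:
        return ""  # Nothing to translate.
    parts: List[str] = []
    known = 0
    for token in tokens:
        meaning = dictionary.get(lookup_key(token))
        if meaning is None:
            parts.append(f"[{token['value']}]")  # Placeholder for unknown emoji.
        else:
            parts.append(meaning)
            known += 1
    if known == 0:
        return UNKNOWN_ALL
    joined = " ".join(parts)
    return joined if known == len(tokens) else f"Known meanings: {joined}"
```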


5. Database Layer

Responsibility: Store and retrieve persistent data (emoji dictionary and server configurations).

Key Details:

  • Tech Stack: SQLAlchemy ORM with PostgreSQL for production reliability
  • Async Support: Use sqlalchemy.ext.asyncio with the asyncpg driver to avoid blocking the Discord event loop
  • Initialization: Override Bot.start() or use a setup_hook to connect to database on startup
  • Connection Pooling: Configure connection pool to handle concurrent requests from message handlers

Two Core Tables:

  1. emoji_dictionary

    • emoji_string (TEXT, PRIMARY KEY): The emoji character(s) or custom emoji format
    • custom_emoji_id (BIGINT, NULLABLE): Discord custom emoji ID (if custom emoji)
    • meaning (TEXT): The learned meaning
    • created_at (TIMESTAMP): When first learned
    • updated_at (TIMESTAMP): Last update time
    • updated_by_user_id (BIGINT, NULLABLE): User ID of who taught/corrected this
    • updated_by_member_id (TEXT, NULLABLE): PluralKit member ID (e.g., Vivi's ID)
    • created_in_guild (BIGINT, NULLABLE): Guild ID where first learned (for tracking origin, optional)
    • Indexes: emoji_string is indexed implicitly by its primary key; add a secondary index on custom_emoji_id (for custom emoji queries)
  2. server_configuration

    • guild_id (BIGINT, PRIMARY KEY): Discord server ID
    • auto_translate (BOOLEAN, DEFAULT TRUE): Auto-translate all Vivi messages or require /translate command
    • created_at (TIMESTAMP): When server config created
    • Updated by Configuration Command Handler

Important Design Decisions:

  • Global dictionary: Emoji meanings are shared across all servers. Different systems can update meanings, but there's a single source of truth per emoji.
  • Per-server config: Each server has its own settings (auto vs. on-demand mode).
  • User attribution: Track who taught each emoji for transparency and conflict resolution.
  • No per-server emoji variants: The spec intends a global dictionary, so "😷" means the same thing everywhere. Per-server overrides could be added later if needed.

6. Command Handler (Teaching & Configuration)

Responsibility: Process bot commands for teaching emojis and configuring server behavior.

Key Details:

  • Tech: discord.py commands.Cog extension for modular command organization

  • Commands to Implement:

    1. /teach <emoji_sequence> <meaning>

      • Extract emojis from the sequence using the Emoji Parser
      • Insert or update each emoji in the database
      • Confirm: "Learned: 😷 = sick, 2⃣ = two, etc."
      • Only Vivi or approved users should be able to teach (can be restricted by user role or system authentication)
    2. /forget <emoji>

      • Delete emoji from dictionary
      • Confirm deletion
    3. /meaning <emoji>

      • Look up and reply with the meaning of a specific emoji
      • If unknown, reply: "I don't know that one yet."
    4. /config auto-translate <on|off>

      • Update server_configuration.auto_translate in database
      • Only server admins can change this
      • Requires guild context (won't work in DMs)
    5. /translate <emoji_sequence> (On-demand mode)

      • Manually trigger translation of an emoji sequence
      • Works in both channels and DMs
  • Error Handling:

    • Graceful failures if database is unavailable
    • Clear user feedback for invalid emoji sequences
    • Require proper permissions for sensitive commands
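
The teaching restriction can be reduced to a small pure function that a command check calls before touching the database. This is only a sketch: `can_teach`, the allowlist, and the role set are hypothetical, and how those sets are populated (config file, database, Discord roles) is left open.

```python
from typing import Iterable, Set


def can_teach(
    user_id: int,
    role_ids: Iterable[int],
    allowed_users: Set[int],
    allowed_roles: Set[int],
) -> bool:
    """Allow teaching for allowlisted users or holders of an allowed role."""
    if user_id in allowed_users:
        return True
    # isdisjoint is False as soon as the user holds one allowed role.
    return not allowed_roles.isdisjoint(role_ids)
```

Keeping the decision separate from the Discord objects makes it easy to unit test without a running bot.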

7. Configuration Layer

Responsibility: Load bot configuration (token, database connection string, etc.) at startup.

Key Details:

  • Use environment variables for secrets: DISCORD_TOKEN, DATABASE_URL, PLURALKIT_TOKEN (optional, for user-specific API calls)
  • Configuration file (e.g., config.json or .env) for non-secret settings:
    • Vivi's member ID (to filter for her messages specifically)
    • Default auto-translate mode
    • Logging level
  • Initialize Config before starting the bot
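
One possible shape for the configuration loader, assuming every setting is read from environment variables. `DISCORD_TOKEN`, `DATABASE_URL`, and `PLURALKIT_TOKEN` come from the list above; `VIVI_MEMBER_ID`, `DEFAULT_AUTO_TRANSLATE`, and `LOG_LEVEL` are names invented for this sketch, as are `BotConfig` and `load_config`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DISCORD_TOKEN", "DATABASE_URL", "VIVI_MEMBER_ID")
TRUTHY = ("1", "true", "yes", "on")


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    database_url: str
    vivi_member_id: str
    pluralkit_token: Optional[str] = None
    default_auto_translate: bool = True
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build the config once at startup and fail fast on missing settings."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return BotConfig(
        discord_token=env["DISCORD_TOKEN"],
        database_url=env["DATABASE_URL"],
        vivi_member_id=env["VIVI_MEMBER_ID"],
        pluralkit_token=env.get("PLURALKIT_TOKEN") or None,
        default_auto_translate=env.get("DEFAULT_AUTO_TRANSLATE", "true").lower() in TRUTHY,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing the environment in as a mapping keeps the loader testable; in production the default `os.environ` applies.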

Data Flow

Message Reception to Response

┌──────────────────────────────────────────────────────────────┐
│  Discord Message Event                                       │
│  (user posts message proxied by PluralKit webhook)          │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Message Event Handler                                       │
│  - Check: webhook_id != None?                               │
│  - Query: PluralKit API for message info                    │
│  - Verify: member_id == Vivi's ID?                         │
│  - Filter: Ignore self-messages                            │
└──────────────────────────────────────────────────────────────┘
                            ↓
                     (YES, Vivi's message)
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Emoji Parser                                                │
│  - Extract emojis with regex                                │
│  - Categorize: Unicode vs. custom                           │
│  - Preserve order                                           │
│  - Output: [{type, value, id}, ...]                        │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Translation Engine                                          │
│  - For each emoji: lookup in database                       │
│  - Compose natural language                                 │
│  - Handle unknown emojis                                    │
│  - Format response                                          │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Database (emoji_dictionary)                                 │
│  - Indexed lookup by emoji_string (B-tree PK)              │
│  - Return: meaning, metadata                               │
└──────────────────────────────────────────────────────────────┘
                            ↓
                  (Lookup Results)
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Response Formatting                                         │
│  - Compose message                                           │
│  - Check context: channel vs. DM                            │
│  - Apply server config: auto-translate mode                 │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Discord API Response                                        │
│  - Send reply to channel or DM                              │
│  - Handle rate limits                                       │
│  - Log interaction                                          │
└──────────────────────────────────────────────────────────────┘

Teaching Flow (Command-Driven)

┌──────────────────────────────────────────────────────────────┐
│  User runs: /teach 😷 2⃣ "Vivi is sick, not sinuses"      │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Command Handler                                             │
│  - Parse command arguments                                  │
│  - Authenticate: Is user authorized to teach?              │
│  - Extract emojis from sequence                             │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Database Layer                                              │
│  - INSERT or UPDATE emoji_dictionary                        │
│  - Set: emoji_string, meaning, updated_by, timestamp       │
│  - Commit transaction                                       │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Confirmation Reply                                          │
│  - "Learned: 😷 = sick, 2⃣ = two"                           │
│  - Post in same context (channel or DM)                     │
└──────────────────────────────────────────────────────────────┘
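
The "INSERT or UPDATE" step in the flow above is an upsert. The sketch below demonstrates it with the standard-library `sqlite3` module and a trimmed-down table so it runs without a server; this is a stand-in for the PostgreSQL setup, not the project's database code. PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` form, though placeholder syntax depends on the driver. `teach` and `meaning_of` are hypothetical helper names.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    meaning TEXT NOT NULL,
    updated_by_user_id INTEGER,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

UPSERT = """
INSERT INTO emoji_dictionary (emoji_string, meaning, updated_by_user_id)
VALUES (?, ?, ?)
ON CONFLICT(emoji_string) DO UPDATE SET
    meaning = excluded.meaning,
    updated_by_user_id = excluded.updated_by_user_id,
    updated_at = CURRENT_TIMESTAMP
"""


def teach(conn: sqlite3.Connection, emoji: str, meaning: str, user_id: int) -> None:
    """Insert a new meaning or overwrite the existing one in one statement."""
    with conn:  # Commits on success, rolls back if the statement raises.
        conn.execute(UPSERT, (emoji, meaning, user_id))


def meaning_of(conn: sqlite3.Connection, emoji: str) -> Optional[str]:
    row = conn.execute(
        "SELECT meaning FROM emoji_dictionary WHERE emoji_string = ?", (emoji,)
    ).fetchone()
    return row[0] if row else None
```

A single upsert avoids the race between a separate existence check and the write when two users teach the same emoji at once.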

PluralKit Integration Details

Detection Approach

  1. Webhook Detection (First Filter):

    • Check message.webhook_id property in discord.py
    • If not None, message was sent via webhook (PluralKit proxy)
  2. PluralKit API Query (Confirmation):

    • Query endpoint: GET https://api.pluralkit.me/v2/messages/{message_id}
    • The message_id can be the webhook message ID or the original message ID (original works for 30 minutes)
    • Parse response to get member object
  3. Member Verification:

    • Extract member.id from API response
    • Compare with Vivi's known member ID (from config)
    • If match: Process as Vivi's message
    • If not match: Ignore (message from another member)
  4. Alternative: Member Names (Backup):

    • If using member ID fails, fall back to checking member.name
    • Look for "Vivi" or configured member name
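
Steps 3 and 4 can be expressed as a pure function over the decoded API response. In this sketch the name fallback applies only when the response carries no member ID, on the assumption that member names are not unique and should not override an ID mismatch; `is_vivi` and `fallback_name` are hypothetical names.

```python
from typing import Any, Mapping, Optional


def is_vivi(
    payload: Mapping[str, Any],
    vivi_member_id: str,
    fallback_name: Optional[str] = None,
) -> bool:
    """Decide whether a decoded PluralKit message payload belongs to Vivi."""
    member = payload.get("member") or {}  # Tolerate a missing or null member.
    member_id = member.get("id")
    if member_id is not None:
        # An ID is authoritative: a mismatch is never rescued by the name.
        return member_id == vivi_member_id
    if fallback_name is None:
        return False
    name = member.get("name") or ""
    return name.casefold() == fallback_name.casefold()
```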

API Endpoints Used

| Endpoint | Purpose | Rate Limit | Response |
|---|---|---|---|
| GET /v2/messages/{message} | Get proxied message info | 10/sec | Message object with member, sender, guild, channel, timestamp |
| GET /v2/systems/@me | Get authenticated system info | 10/sec | System object (requires token); members are listed via /v2/systems/@me/members |
| GET /v2/members/{member} | Get specific member info | 10/sec | Member object with proxy tags, avatar, etc. |

Authentication (Optional):

  • Public queries (member lookup) don't require authentication
  • System-specific queries (private member settings) require the system token, sent as the raw value of the Authorization header (Authorization: {token}, with no Bearer prefix)
  • For Vivi's system, store the system token in environment variable PLURALKIT_TOKEN for authenticated access

Implementation in discord.py

import asyncio
import logging

import aiohttp
import discord

log = logging.getLogger(__name__)

async def check_vivi_message(message: discord.Message, vivi_member_id: str) -> bool:
    """Check if message was proxied by Vivi via PluralKit."""

    # Step 1: Check if message is from a webhook
    if message.webhook_id is None:
        return False  # Not proxied

    # Step 2: Query PluralKit API
    # (A long-running bot should reuse one ClientSession instead of opening one per message.)
    url = f"https://api.pluralkit.me/v2/messages/{message.id}"
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                if resp.status != 200:
                    return False  # Not a PluralKit message (or the lookup failed)
                data = await resp.json()
    except (aiohttp.ClientError, asyncio.TimeoutError) as e:
        # Log error, but don't crash
        log.warning("PluralKit API error: %s", e)
        return False

    # Step 3: Check member ID matches Vivi
    # ("member" may be null in the response, so guard before reading its ID)
    member = data.get("member") or {}
    return member.get("id") == vivi_member_id

Database Schema

emoji_dictionary Table

CREATE TABLE emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    custom_emoji_id BIGINT,
    meaning TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by_user_id BIGINT,
    updated_by_member_id TEXT,
    created_in_guild BIGINT
);

-- Columns are nullable unless marked NOT NULL.
-- emoji_string is already indexed by its PRIMARY KEY constraint.
CREATE INDEX idx_custom_emoji_id ON emoji_dictionary(custom_emoji_id);

server_configuration Table

CREATE TABLE server_configuration (
    guild_id BIGINT PRIMARY KEY,
    auto_translate BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- guild_id is already indexed by its PRIMARY KEY constraint; no extra index is needed.

Alternative: SQLAlchemy ORM Definitions

from sqlalchemy import Column, String, BigInteger, Boolean, DateTime, Text
from sqlalchemy.orm import declarative_base
from datetime import datetime

Base = declarative_base()

class EmojiDictionary(Base):
    __tablename__ = "emoji_dictionary"

    emoji_string = Column(String, primary_key=True)
    custom_emoji_id = Column(BigInteger, nullable=True)
    meaning = Column(Text, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    updated_by_user_id = Column(BigInteger, nullable=True)
    updated_by_member_id = Column(String, nullable=True)
    created_in_guild = Column(BigInteger, nullable=True)

class ServerConfiguration(Base):
    __tablename__ = "server_configuration"

    guild_id = Column(BigInteger, primary_key=True)
    auto_translate = Column(Boolean, default=True)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

Suggested Build Order

Phase 1: Foundation (Week 1-2)

Goal: Get Vivi messages detected and logged.

  1. Set up Discord bot:

    • Create Discord application and token
    • Initialize discord.py Client/Bot with required Intents
    • Implement basic on_message event handler
    • Test basic logging
  2. Implement PluralKit detection:

    • Add webhook detection (check message.webhook_id)
    • Add PluralKit API query and member verification
    • Log when Vivi messages are detected
    • Handle API errors gracefully
  3. Database initialization:

    • Set up PostgreSQL database
    • Create emoji_dictionary and server_configuration tables
    • Test connection from bot

Deliverables: Bot logs every Vivi message to console; doesn't respond yet.


Phase 2: Emoji Parsing & Translation (Week 3-4)

Goal: Translate Vivi's emojis to text.

  1. Emoji parsing:

    • Implement regex patterns for Unicode and custom emojis
    • Extract emoji sequences in order
    • Test with various emoji types
  2. Basic emoji lookup:

    • Query emoji_dictionary table
    • Return meanings for known emojis
    • Handle unknown emojis
  3. Response formatting:

    • Compose natural language from emoji meanings
    • Send reply to channel/DM
    • Handle edge cases (no emojis, all unknown)
  4. Manual testing:

    • Create test emojis in database
    • Post Vivi messages and verify translations

Deliverables: Bot translates Vivi messages; appears in channels and DMs.


Phase 3: Teaching Commands (Week 5-6)

Goal: Allow users to teach the bot emoji meanings.

  1. Implement /teach command:

    • Parse emoji sequences
    • Insert into database with metadata
    • Confirm to user
  2. Implement /meaning command:

    • Look up single emoji
    • Reply with meaning or "not learned yet"
  3. Implement /forget command:

    • Delete emoji from database
    • Require admin or Vivi permission
  4. Permission system:

    • Restrict teaching to authorized users (Vivi + alters)
    • Use Discord roles or user ID allowlist

Deliverables: Users can teach and query emoji meanings via commands.


Phase 4: Per-Server Configuration (Week 7)

Goal: Allow servers to opt into/out of auto-translation.

  1. Implement /config auto-translate command:

    • Toggle auto-translate on/off per server
    • Requires admin permission
    • Only works in guild context (not DMs)
  2. Update message handler:

    • Check server config before auto-translating
    • Only reply if auto_translate == TRUE
    • In DMs, always translate when /translate is used
  3. On-demand translation:

    • /translate command for manual translation
    • Works in any context

Deliverables: Servers can control translation behavior; bot respects preferences.


Phase 5: Polish & Edge Cases (Week 8+)

Goal: Handle real-world complexity.

  1. Natural language formatting:

    • Improve composition of translations
    • Handle emoji modifiers (skin tones, ZWJ sequences)
    • Custom emoji descriptions
  2. Error handling & resilience:

    • Database unavailability
    • PluralKit API failures
    • Rate limiting with exponential backoff
    • Graceful degradation
  3. Logging & monitoring:

    • Structured logging for debugging
    • Monitor database performance
    • Track API error rates
  4. Codebase refactoring:

    • Move commands to separate Cogs
    • Organize into modules: cogs/teaching.py, cogs/config.py, etc.
    • Add docstrings and type hints
  5. Testing:

    • Unit tests for emoji parsing
    • Integration tests for database queries
    • End-to-end tests with Discord

Deliverables: Robust, maintainable codebase ready for production.


Scaling Considerations

Multi-Server Architecture

Challenge: Bot will operate in many Discord servers simultaneously, each with potentially thousands of members.

Solution:

  1. Shared Emoji Dictionary:

    • Single global PostgreSQL database with all emoji meanings
    • All servers query the same emoji_dictionary table
    • Updates are reflected across all servers immediately
    • Reduces redundancy and keeps meanings consistent
  2. Per-Server Configuration:

    • Each guild has its own row in server_configuration
    • Fast lookup by guild_id (indexed)
    • Allows servers to choose auto-translate vs. on-demand
  3. Connection Pooling:

    • SQLAlchemy async engine with pool_size=20, max_overflow=10 (tunable)
    • Reuses database connections across handlers
    • Prevents connection exhaustion under load

Performance Optimization

  1. Emoji Lookup Performance:

    • The primary key on emoji_dictionary.emoji_string provides a B-tree index, so lookups are O(log n) and effectively constant at this table size
    • Secondary index on custom_emoji_id for custom emoji queries
    • Consider in-memory cache (Redis) if lookups become bottleneck:
      • Query Redis first (1ms latency)
      • Fall back to PostgreSQL
      • Invalidate cache on updates
  2. Caching Strategy (Optional, Post-MVP):

    • Use Redis for frequently accessed emojis
    • TTL: 1 hour (emoji meanings change rarely)
    • Invalidate cache when /teach or /forget commands update dictionary
    • Benefits: Reduced database load, lower latency
  3. Rate Limiting:

    • PluralKit API: 10 requests/second (already enforced by API)
    • Discord API: global limit of 50 requests/second plus per-route limits (handled automatically by discord.py)
    • Cap concurrent PluralKit queries locally with asyncio.Semaphore (this bounds requests in flight, not requests per second):
      semaphore = asyncio.Semaphore(5)  # Max 5 concurrent PluralKit queries
      
  4. Message Handler Optimization:

    • Webhook detection (local check): ~0ms
    • PluralKit API query: ~100-200ms (async, non-blocking)
    • Emoji parsing (regex): ~1-5ms
    • Database lookup: ~1-50ms
    • Total: ~100-250ms per message (acceptable, happens in background)
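
Since a semaphore alone bounds concurrency rather than request rate, one way to respect a per-second limit is to pair it with evenly spaced start times. The class below is a sketch under that assumption; `PluralKitLimiter` and its parameters are hypothetical, and the defaults simply mirror the figures quoted above.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class PluralKitLimiter:
    """Caps requests in flight and spaces request starts evenly over time."""

    def __init__(self, max_concurrent: int = 5, per_second: float = 10.0) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._interval = 1.0 / per_second
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def run(self, request: Callable[[], Awaitable[T]]) -> T:
        async with self._semaphore:  # At most max_concurrent in flight.
            async with self._lock:  # Serialize the start-time bookkeeping.
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                # Reserve the next slot one interval after this start.
                self._next_start = max(now, self._next_start) + self._interval
            return await request()
```

Create the limiter inside the running event loop (for example in the bot's setup hook) so its synchronization primitives bind to the right loop on older Python versions.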

Scaling Beyond 2,500 Guilds

Discord Requirement: Bots in 2,500 or more guilds must implement sharding.

Sharding in discord.py:

  • discord.py handles sharding automatically when an auto-sharded client or bot class is used
  • Each shard is a separate gateway connection to Discord
  • Each shard handles a subset of guilds
  • Emoji dictionary remains shared across all shards (single database)

Example Configuration:

import discord
from discord.ext import commands

intents = discord.Intents.default()
bot = commands.AutoShardedBot(command_prefix="!", intents=intents)  # Automatic sharding

# Bot will shard automatically based on guild count

Database Scaling

For Millions of Emojis:

  • Table partitioning by emoji language/category (if dict grows huge)
  • Read replicas for queries (if read-heavy)
  • Consider denormalization (e.g., cache popular emoji meanings in memory)
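
For the in-memory option, a small time-to-live cache in front of the database is often enough before reaching for Redis. The sketch below is an in-process substitute, not a Redis client; `TTLCache` and its injectable `clock` are names introduced here, and the one-hour default follows the TTL suggested earlier.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # Expired: drop it and report a miss.
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def invalidate(self, key: str) -> None:
        """Call after /teach or /forget changes the stored meaning."""
        self._entries.pop(key, None)
```

With sharding across multiple processes each process would hold its own copy, which is the point at which a shared cache such as Redis becomes the better fit.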

Current Recommendation: Single PostgreSQL database is sufficient for MVP. Scale if needed post-launch.


Component Interaction Diagram

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Discord Server (Guild)                                         │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ #general                                                 │  │
│  │ Vivi (proxied): 😷 2⃣ 🍑 ❌ 🤧                         │  │
│  │ Vivi Speech Translator: "Vivi is sick, but not in ..."  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                            ↑ (message event)
                            │
┌──────────────────────────┴──────────────────────────────────────┐
│  Discord.py Bot Framework                                       │
│                                                                 │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Discord Client                                         │    │
│  │ - Maintains WebSocket connection                       │    │
│  │ - Routes events to handlers                            │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Message Event Handler (on_message)                     │    │
│  │ - Filter: webhook_id?                                 │    │
│  │ - Query: PluralKit API                                │    │
│  │ - Verify: member_id == Vivi?                          │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Emoji Parser                                           │    │
│  │ - Extract emojis (regex)                              │    │
│  │ - Categorize (Unicode/custom)                         │    │
│  │ - Preserve order                                      │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Translation Engine                                     │    │
│  │ - Lookup emojis in database                           │    │
│  │ - Compose natural language                            │    │
│  │ - Format response                                     │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Command Handler (Cogs)                                 │    │
│  │ - /teach: Learn emoji meanings                        │    │
│  │ - /meaning: Look up emoji                             │    │
│  │ - /forget: Delete emoji                               │    │
│  │ - /config: Server preferences                         │    │
│  │ - /translate: Manual translation                      │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                 │
└──────────────────────────────┬──────────────────────────────────┘
                               ↓ (database queries/updates)
┌──────────────────────────────────────────────────────────────────┐
│  Database Layer                                                  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ SQLAlchemy ORM + asyncio                               │     │
│  │ - Async connection pool                                │     │
│  │ - Connection reuse                                     │     │
│  │ - Transaction management                              │     │
│  └────────────────────────────────────────────────────────┘     │
│                            ↓                                      │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ PostgreSQL Database                                    │     │
│  │                                                        │     │
│  │  emoji_dictionary:          server_configuration:     │     │
│  │  ├─ 😷 → sick               ├─ guild_123 → auto ON   │     │
│  │  ├─ 2⃣ → two               ├─ guild_456 → auto OFF  │     │
│  │  ├─ 🍑 → peach             └─ guild_789 → auto ON    │     │
│  │  ├─ ❌ → no                                            │     │
│  │  └─ :me1: → Vivi           (+ metadata, timestamps)   │     │
│  │  (+ metadata, timestamps)                             │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

External APIs:
┌────────────────────────────────────────────────────────────┐
│ PluralKit API (api.pluralkit.me)                           │
│ - GET /v2/messages/{id} → member info                     │
└────────────────────────────────────────────────────────────┘

Technology Stack Recommendations

| Layer | Component | Technology | Why |
|---|---|---|---|
| Bot Framework | Discord Integration | discord.py 2.x | Async-native, active community, rich feature set |
| Database ORM | Persistence | SQLAlchemy 2.0 + asyncio | Async support, type-safe, widely adopted |
| Database | Data Store | PostgreSQL | Reliable, open-source, JSONB for future extensibility |
| Async Runtime | Concurrency | asyncio (built-in) | Lightweight, integrated with discord.py |
| Caching | Performance (Phase 5+) | Redis | Fast in-memory lookups, TTL support, distributed |
| Logging | Debugging | Python logging module | Built-in, structured logging can extend |
| API Requests | HTTP Calls | aiohttp | Async-native, connection pooling |
| Testing | Quality Assurance | pytest + pytest-asyncio | Async test support, fixtures |
| Deployment | Hosting | Docker + systemd or cloud | Reproducible environment, easy updates |

Summary

Recommended Architecture:

Vivi Speech Translator is a modular Discord bot with a clear separation of concerns:

  1. Discord Client listens for messages and routes them through a detection pipeline
  2. Message Event Handler identifies when Vivi speaks (via PluralKit webhook + API verification)
  3. Emoji Parser extracts emoji sequences while preserving order and type information
  4. Translation Engine looks up meanings and composes responses
  5. Database Layer (PostgreSQL + SQLAlchemy) stores a shared global emoji dictionary and per-server configurations
  6. Command Handler (discord.py Cogs) allows teaching, querying, and configuration

The bot prioritizes:

  • Reliability: Graceful error handling, retry logic, database transactions
  • Performance: Indexed emoji lookups, async operations to avoid blocking, caching for scale
  • Scalability: Shared emoji dictionary, per-server configs, optional Redis caching, Discord sharding support
  • Maintainability: Modular Cog architecture, clear component boundaries, comprehensive logging

Build in phases: detection → parsing and translation → teaching → configuration → polish. This delivers value early (Phase 2) while establishing the foundation for later features.

The bot can grow from a single server to thousands, limited primarily by PluralKit API rate limits (easily worked around) and database performance (PostgreSQL handles millions of rows efficiently).