
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 11:02:32 -05:00


Architecture Research: Vivi Speech Translator

Overview

Vivi Speech Translator is a Discord bot that detects emoji-based messages proxied by PluralKit, parses emoji sequences, looks up their meanings in a persistent global dictionary, and replies with natural language translations. The bot must operate across multiple servers, handle both channel and DM messages, and learn new emoji meanings over time.

This document outlines the recommended high-level architecture, component responsibilities, data flows, and scaling strategies.

Core Components

1. Discord Client

Responsibility: Establish and maintain the connection to Discord's API and WebSocket.

Key Details:

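Uses discord.Client or discord.ext.commands.Bot from the discord.py library
Requires Intents configuration to specify which events the bot listens for:
- message_content intent: Required to read message text in guild channels (privileged intent: it must be enabled in the Developer Portal, and bots in 100 or more servers must be approved for it)
- guilds intent: Track guild membership and changes
- guild_messages intent: Receive message events from guild channels
- dm_messages intent: Receive message events from DMs
discord.Intents.default() already enables guilds, guild_messages, and dm_messages; message_content must be switched on explicitly
Initializes on startup and runs the main event loop via client.run(token)
Handles connection failures and automatic reconnection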

Why This Matters: Discord's event-driven architecture means the Client is the foundation—without it, the bot cannot receive any messages or respond to events.

2. Message Event Handler

Responsibility: Receive all messages, filter for relevance, and route to downstream processors.

Key Details:

Implements on_message event in discord.py (async callback)
Filters for:
1. Webhook Detection: Check if message.webhook_id is not None (the message came from a webhook, which may be a PluralKit proxy)
2. PluralKit Verification: Query PluralKit API to confirm message was proxied by PluralKit (not another webhook system)
3. Vivi Detection: Check if the member_id in the PluralKit response matches Vivi's registered member ID
4. Bot Self-Filter: Ignore messages from Vivi Speech Translator bot itself
Routes confirmed Vivi messages to the Emoji Parser
Handles both guild channels and DMs

PluralKit Detection Approach: When a message is received, the bot can query the PluralKit API using the message ID:

GET https://api.pluralkit.me/v2/messages/{message_id}

This returns a Message object containing:

member: The member object that proxied the message (contains id, uuid, name, avatar_url, etc.)
sender: The Discord user ID of the account that sent the original message (the account owner)
system: The system that manages the members
timestamp: When the message was sent
guild: The guild ID where the message was sent
channel: The channel ID where the message was sent

By checking if response.member.id == vivi_member_id, the bot can verify Vivi specifically sent the message.

Rate Limiting: PluralKit API has a 10/second rate limit for message lookups. The bot should handle rate limit responses gracefully with exponential backoff.
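The backoff behavior can be sketched without tying it to a specific HTTP client. This minimal sketch assumes the caller wraps one PluralKit request in a coroutine that returns `(status, payload)`. The names `fetch_with_backoff` and `fetch` are hypothetical, and treating HTTP 429 as the only retryable status is also an assumption. Honoring a `Retry-After` header, if PluralKit sends one, would be a natural refinement.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Optional, Tuple

# Hypothetical shape: fetch() performs one API request and returns (status, payload)
Fetch = Callable[[], Awaitable[Tuple[int, Any]]]


async def fetch_with_backoff(
    fetch: Fetch,
    max_retries: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
) -> Optional[Any]:
    """Call fetch(), retrying on HTTP 429 with exponential backoff and jitter.

    Returns the payload on HTTP 200 and None for any other final outcome.
    """
    for attempt in range(max_retries + 1):
        status, payload = await fetch()
        if status == 200:
            return payload
        if status != 429 or attempt == max_retries:
            return None  # not a rate-limit response, or out of retries
        # The delay ceiling doubles per attempt (0.5s, 1s, 2s, ...) up to max_delay
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        # "Full jitter": sleep a random time below the ceiling so concurrent
        # handlers don't all retry at the same instant
        await asyncio.sleep(random.uniform(0, ceiling))
    return None
```

In the message handler, `fetch` would wrap the request to /v2/messages/{message_id}. Non-429 failures return immediately, because retrying a 404 will not change the answer.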

3. Emoji Parser

Responsibility: Extract and categorize emojis from a message into a structured sequence.

Key Details:

Receives the confirmed Vivi message text from the Message Event Handler
Uses regex patterns to extract:
1. Unicode Emojis: Standard emoji characters (😷, ❌, etc.)
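  - Pattern: \p{Extended_Pictographic} (matches individual pictographic characters; supported by the third-party regex module, not by Python's built-in re)
  - Built-in re alternative: re has no \p{...} support, so use explicit code-point ranges (e.g. U+1F000-U+1FAFF, U+2600-U+27BF) or the emoji package. Surrogate-pair patterns like \ud83c[\udf00-\udfff] come from JavaScript's UTF-16 strings and never match in Python 3, whose strings hold whole code points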
2. Custom Server Emojis: Discord custom emoji format <:emoji_name:emoji_id> or <a:emoji_name:emoji_id> (for animated)
  - Pattern: <a?:[^:\s]+:\d+>
Preserves order of emojis as they appear left-to-right
Returns a structured list like: [{type: "emoji", value: "😷", id: None}, {type: "custom", value: "me1", id: "123456789"}]
Handles edge cases:
- Emoji skin tone modifiers
- Zero-width joiners (ZWJ sequences like family emojis)
- Emoji variations

Why This Order Matters: The project spec notes that emoji sequences are compositional and context-dependent. Preserving order and distinguishing types allows the Translation Engine to understand the full intended meaning.
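The parser can be prototyped with only the standard-library re module. This is a minimal sketch. The code-point ranges are an approximation chosen for illustration, not the full Unicode emoji specification. `parse_emojis`, the extra `animated` key, and the 2-32 character `\w` rule for custom emoji names are all assumptions. The third-party regex module or the emoji package would give more complete coverage. The sketch handles the edge cases listed above (skin tones, ZWJ sequences, variation selectors) plus keycaps and flags, and it keeps left-to-right order.

```python
import re

# Approximate emoji base ranges (assumption): U+1F000-U+1FAFF covers most
# pictographs; the BMP ranges cover misc technical, misc symbols, dingbats,
# and misc symbols/arrows
_BASE = "[\U0001F000-\U0001FAFF\u2300-\u23FF\u2600-\u27BF\u2B00-\u2BFF]"
# Optional skin-tone modifier, then optional variation selector-16
_MODS = "[\U0001F3FB-\U0001F3FF]?\uFE0F?"

EMOJI_RE = re.compile(
    # Custom Discord emoji: <:name:id> or <a:name:id> (animated)
    r"<(?P<animated>a?):(?P<name>\w{2,32}):(?P<id>\d+)>"
    # Keycaps such as 2️⃣: digit/#/*, optional VS-16, combining enclosing keycap
    + "|[0-9#*]\uFE0F?\u20E3"
    # Flags: a pair of regional indicator symbols
    + "|[\U0001F1E6-\U0001F1FF]{2}"
    # Everything else: base emoji + modifiers, optionally ZWJ-joined (U+200D)
    + "|" + _BASE + _MODS + "(?:\u200D" + _BASE + _MODS + ")*"
)


def parse_emojis(text: str) -> list[dict]:
    """Return emojis in left-to-right order as {type, value, id} dicts."""
    tokens = []
    for match in EMOJI_RE.finditer(text):
        if match.group("id") is not None:
            tokens.append({
                "type": "custom",
                "value": match.group("name"),
                "id": int(match.group("id")),
                "animated": match.group("animated") == "a",
            })
        else:
            tokens.append({"type": "emoji", "value": match.group(0), "id": None})
    return tokens
```

Alternation order matters here. Custom emoji, keycaps, and flags are tried before the general pattern, so a regional-indicator pair stays one token instead of splitting into two.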

4. Translation Engine

Responsibility: Convert emoji sequences into natural language using the emoji dictionary.

Key Details:

Receives structured emoji list from Emoji Parser
For each emoji:
1. Look up its meaning in the Emoji Dictionary (database)
2. Handle three cases:
  - Known emoji: Include its meaning in output
  - Unknown emoji: Display the emoji itself with a placeholder or skip
  - Custom emoji: Look up by custom emoji ID in database
Generates natural language output:
- If all emojis are known: Compose as a sentence ("Vivi is sick, but not in the sinuses")
- If some are unknown: Format as: "Known meanings: ... [Unknown emoji] ..."
- If none are known: Reply: "I don't know what these emojis mean yet. You can teach me with the /teach command."
Considers emoji context (e.g., combination of emojis might have a specific meaning)
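The three output cases can be sketched as a pure function. This minimal sketch assumes the relevant dictionary rows have already been fetched into a plain mapping (for example, by one batched query) and that tokens use the parser's {type, value, id} shape. `compose_translation`, `lookup_key`, and the comma-joined output are hypothetical. Joining meanings in order is only a placeholder for real sentence composition.

```python
from typing import Mapping, Optional

UNKNOWN_ALL = (
    "I don't know what these emojis mean yet. "
    "You can teach me with the /teach command."
)


def lookup_key(token: dict) -> str:
    """Hypothetical keying: custom emoji by Discord ID, Unicode emoji by string."""
    return str(token["id"]) if token["type"] == "custom" else token["value"]


def compose_translation(tokens: list[dict], dictionary: Mapping[str, str]) -> Optional[str]:
    """Turn parsed emoji tokens into reply text covering the three cases."""
    if not tokens:
        return None  # no emojis: nothing to translate
    parts = []
    known = 0
    for token in tokens:
        meaning = dictionary.get(lookup_key(token))
        if meaning is None:
            # Unknown: show the emoji itself inside a placeholder
            parts.append(f"[unknown: {token['value']}]")
        else:
            parts.append(meaning)
            known += 1
    if known == 0:
        return UNKNOWN_ALL
    if known == len(tokens):
        # Naive composition: meanings in order. Grammar and context-dependent
        # combinations would replace this step.
        return ", ".join(parts)
    return "Known meanings: " + ", ".join(parts)
```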

Output Format: The bot should reply with a Discord message in the same context as the original: in the guild channel for server messages, or in the DM for DM messages.

5. Database Layer

Responsibility: Store and retrieve persistent data (emoji dictionary and server configurations).

Key Details:

Tech Stack: SQLAlchemy ORM with PostgreSQL for production reliability
Async Support: Use sqlalchemy.ext.asyncio or asyncpg to avoid blocking the Discord event loop
Initialization: Override Bot.start() or use a setup_hook to connect to database on startup
Connection Pooling: Configure connection pool to handle concurrent requests from message handlers

Two Core Tables:

emoji_dictionary
- emoji_string (TEXT, PRIMARY KEY): The emoji character(s) or custom emoji format
- custom_emoji_id (BIGINT, NULLABLE): Discord custom emoji ID (if custom emoji)
- meaning (TEXT): The learned meaning
- created_at (TIMESTAMP): When first learned
- updated_at (TIMESTAMP): Last update time
- updated_by_user_id (BIGINT, NULLABLE): User ID of who taught/corrected this
- updated_by_member_id (TEXT, NULLABLE): PluralKit member ID (e.g., Vivi's ID)
- created_in_guild (BIGINT, NULLABLE): Guild ID where first learned (for tracking origin, optional)
- Indexes: emoji_string is already indexed by its PRIMARY KEY; add a secondary index on custom_emoji_id for custom emoji queries
server_configuration
- guild_id (BIGINT, PRIMARY KEY): Discord server ID
- auto_translate (BOOLEAN, DEFAULT TRUE): Auto-translate all Vivi messages or require /translate command
- created_at (TIMESTAMP): When server config created
- Updated via the /config command in the Command Handler

Important Design Decisions:

Global dictionary: Emoji meanings are shared across all servers. Different systems can update meanings, but there's a single source of truth per emoji.
Per-server config: Each server has its own settings (auto vs. on-demand mode).
User attribution: Track who taught each emoji for transparency and conflict resolution.
No per-server emoji variants: The spec intends a global dictionary, so "😷" means the same thing everywhere. Per-server overrides could be added later if needed.

6. Command Handler (Teaching & Configuration)

Responsibility: Process bot commands for teaching emojis and configuring server behavior.

Key Details:

Tech: discord.py commands.Cog classes for modular command organization, with slash commands registered through discord.app_commands
Commands to Implement:
1. /teach <emoji_sequence> <meaning>
  - Extract emojis from the sequence using the Emoji Parser
  - Insert or update each emoji in the database
  - Confirm: "Learned: 😷 = sick, 2️⃣ = two, etc."
  - Only Vivi or approved users should be able to teach (can be restricted by user role or system authentication)
2. /forget <emoji>
  - Delete emoji from dictionary
  - Confirm deletion
3. /meaning <emoji>
  - Look up and reply with the meaning of a specific emoji
  - If unknown, reply: "I don't know that one yet."
4. /config auto-translate <on|off>
  - Update server_configuration.auto_translate in database
  - Only server admins can change this
  - Requires guild context (won't work in DMs)
5. /translate <emoji_sequence> (On-demand mode)
  - Manually trigger translation of an emoji sequence
  - Works in both channels and DMs
Error Handling:
- Graceful failures if database is unavailable
- Clear user feedback for invalid emoji sequences
- Require proper permissions for sensitive commands
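The "Vivi or approved users" rule can be kept separate from Discord objects, which makes it easy to test. This sketch assumes the allowlists (Discord user IDs and role IDs) come from the Configuration Layer. `TeachPolicy` and its fields are hypothetical names, not a discord.py API. Inside a Cog's /teach or /forget handler, the inputs would be `interaction.user.id` and the IDs of the invoking member's roles (an empty list in DMs).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TeachPolicy:
    """Hypothetical allowlist policy for /teach and /forget."""

    allowed_user_ids: frozenset = frozenset()
    allowed_role_ids: frozenset = frozenset()

    def can_teach(self, user_id: int, role_ids: Iterable[int] = ()) -> bool:
        # Explicitly allowlisted accounts (e.g. the account running Vivi's system)
        if user_id in self.allowed_user_ids:
            return True
        # Otherwise any allowlisted role is enough; roles only exist in guilds
        return any(role_id in self.allowed_role_ids for role_id in role_ids)
```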

7. Configuration Layer

Responsibility: Load bot configuration (token, database connection string, etc.) at startup.

Key Details:

Use environment variables for secrets: DISCORD_TOKEN, DATABASE_URL, PLURALKIT_TOKEN (optional, for user-specific API calls)
Configuration file (e.g., config.json or .env) for non-secret settings:
- Vivi's member ID (to filter for her messages specifically)
- Default auto-translate mode
- Logging level
Initialize Config before starting the bot
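A minimal sketch of loading configuration once at startup and failing fast on missing secrets. The variable names VIVI_MEMBER_ID, DEFAULT_AUTO_TRANSLATE, and LOG_LEVEL are hypothetical. For brevity, this sketch reads non-secret settings from the same mapping instead of a separate config file. A .env file could populate the environment beforehand (for example, with python-dotenv's load_dotenv(), if that package is used).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    database_url: str
    vivi_member_id: str
    pluralkit_token: Optional[str] = None
    default_auto_translate: bool = True
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build the bot config from environment variables (hypothetical names)."""
    required = ("DISCORD_TOKEN", "DATABASE_URL", "VIVI_MEMBER_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail before connecting to Discord instead of at the first message
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    flag = env.get("DEFAULT_AUTO_TRANSLATE", "true").strip().lower()
    return BotConfig(
        discord_token=env["DISCORD_TOKEN"],
        database_url=env["DATABASE_URL"],
        vivi_member_id=env["VIVI_MEMBER_ID"],
        pluralkit_token=env.get("PLURALKIT_TOKEN") or None,  # optional
        default_auto_translate=flag in ("1", "true", "yes", "on"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```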

Data Flow

Message Reception to Response

┌──────────────────────────────────────────────────────────────┐
│  Discord Message Event                                       │
│  (user posts message proxied by PluralKit webhook)          │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Message Event Handler                                       │
│  - Check: webhook_id != None?                               │
│  - Query: PluralKit API for message info                    │
│  - Verify: member_id == Vivi's ID?                         │
│  - Filter: Ignore self-messages                            │
└──────────────────────────────────────────────────────────────┘
                            ↓
                     (YES, Vivi's message)
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Emoji Parser                                                │
│  - Extract emojis with regex                                │
│  - Categorize: Unicode vs. custom                           │
│  - Preserve order                                           │
│  - Output: [{type, value, id}, ...]                        │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Translation Engine                                          │
│  - For each emoji: lookup in database                       │
│  - Compose natural language                                 │
│  - Handle unknown emojis                                    │
│  - Format response                                          │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Database (emoji_dictionary)                                 │
│  - B-tree lookup by emoji_string (primary key)             │
│  - Return: meaning, metadata                               │
└──────────────────────────────────────────────────────────────┘
                            ↓
                  (Lookup Results)
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Response Formatting                                         │
│  - Compose message                                           │
│  - Check context: channel vs. DM                            │
│  - Apply server config: auto-translate mode                 │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Discord API Response                                        │
│  - Send reply to channel or DM                              │
│  - Handle rate limits                                       │
│  - Log interaction                                          │
└──────────────────────────────────────────────────────────────┘

Teaching Flow (Command-Driven)

┌──────────────────────────────────────────────────────────────┐
│  User runs: /teach 😷 2️⃣ "Vivi is sick, not sinuses"      │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Command Handler                                             │
│  - Parse command arguments                                  │
│  - Authenticate: Is user authorized to teach?              │
│  - Extract emojis from sequence                             │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Database Layer                                              │
│  - INSERT or UPDATE emoji_dictionary                        │
│  - Set: emoji_string, meaning, updated_by, timestamp       │
│  - Commit transaction                                       │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│  Confirmation Reply                                          │
│  - "Learned: 😷 = sick, 2️⃣ = two"                           │
│  - Post in same context (channel or DM)                     │
└──────────────────────────────────────────────────────────────┘

PluralKit Integration Details

Detection Approach

Webhook Detection (First Filter):
- Check message.webhook_id property in discord.py
- If not None, message was sent via webhook (PluralKit proxy)
PluralKit API Query (Confirmation):
- Query endpoint: GET https://api.pluralkit.me/v2/messages/{message_id}
- The message_id can be the webhook message ID or the original message ID (original works for 30 minutes)
- Parse response to get member object
Member Verification:
- Extract member.id from API response
- Compare with Vivi's known member ID (from config)
- If match: Process as Vivi's message
- If not match: Ignore (message from another member)
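Alternative: Member Names (Backup):
- If the member ID check cannot be used, fall back to checking member.name
- Look for "Vivi" or the configured member name; names are not unique across PluralKit systems, so also compare system.id against Vivi's system ID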

API Endpoints Used

Endpoint	Purpose	Rate Limit	Response
`GET /v2/messages/{message}`	Get proxied message info	10/sec	Message object with member, sender, guild, channel, timestamp
`GET /v2/systems/@me`	Get authenticated system info	10/sec	System object (requires token); members are listed at /v2/systems/@me/members
`GET /v2/members/{member}`	Get specific member info	10/sec	Member object with proxy tags, avatar, etc.

Authentication (Optional):

Public queries (member lookup) don't require authentication
System-specific queries (private member settings) require the system token in the Authorization header (PluralKit's API expects the raw token, without a Bearer prefix)
For Vivi's system, store the system token in environment variable PLURALKIT_TOKEN for authenticated access

Implementation in discord.py

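import asyncio
import logging

import aiohttp
import discord

log = logging.getLogger(__name__)

PK_MESSAGE_URL = "https://api.pluralkit.me/v2/messages/{message_id}"


async def check_vivi_message(
    message: discord.Message,
    vivi_member_id: str,
    session: aiohttp.ClientSession,
) -> bool:
    """Check if message was proxied by Vivi via PluralKit.

    Pass one long-lived ClientSession (e.g. created in setup_hook) instead of
    opening a new session for every message.
    """

    # Step 1: PluralKit proxies through webhooks; anything else is not Vivi
    if message.webhook_id is None:
        return False

    # Step 2: Query PluralKit API for this message ID
    url = PK_MESSAGE_URL.format(message_id=message.id)
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
            if resp.status != 200:
                return False  # e.g. 404: PluralKit has no record of this message
            data = await resp.json()
    except (aiohttp.ClientError, asyncio.TimeoutError) as e:
        # Log the error, but don't crash the event handler
        log.warning("PluralKit API error for message %s: %s", message.id, e)
        return False

    # Step 3: Check member ID matches Vivi ("member" may be missing or null)
    member = data.get("member") or {}
    return member.get("id") == vivi_member_id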

Database Schema

emoji_dictionary Table

CREATE TABLE emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,              -- PRIMARY KEY creates a unique index
    custom_emoji_id BIGINT,                     -- columns are nullable by default
    meaning TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- set explicitly on UPDATE
    updated_by_user_id BIGINT,
    updated_by_member_id TEXT,
    created_in_guild BIGINT
);

-- No separate index on emoji_string: the primary key already provides one
CREATE INDEX idx_custom_emoji_id ON emoji_dictionary(custom_emoji_id);

server_configuration Table

CREATE TABLE server_configuration (
    guild_id BIGINT PRIMARY KEY,                -- indexed via the primary key
    auto_translate BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP   -- set explicitly on UPDATE
);
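The teaching flow's "INSERT or UPDATE" step maps to an upsert. This minimal sketch uses the standard-library sqlite3 module so it runs without a database server (SQLite is the lightweight option named in the stack, and the ON CONFLICT ... DO UPDATE upsert needs SQLite 3.24 or newer). The same clause is valid PostgreSQL, although asyncpg uses $1, $2, ... placeholders instead of ?. `teach_emoji` and `lookup_meaning` are hypothetical helpers.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS emoji_dictionary (
    emoji_string TEXT PRIMARY KEY,
    custom_emoji_id BIGINT,
    meaning TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_by_user_id BIGINT,
    updated_by_member_id TEXT,
    created_in_guild BIGINT
)
"""

# On conflict, overwrite meaning and attribution but keep the first-learned
# created_at and created_in_guild values
UPSERT = """
INSERT INTO emoji_dictionary (emoji_string, custom_emoji_id, meaning,
                              updated_by_user_id, updated_by_member_id,
                              created_in_guild)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (emoji_string) DO UPDATE SET
    meaning = excluded.meaning,
    updated_at = CURRENT_TIMESTAMP,
    updated_by_user_id = excluded.updated_by_user_id,
    updated_by_member_id = excluded.updated_by_member_id
"""


def teach_emoji(
    conn: sqlite3.Connection,
    emoji_string: str,
    meaning: str,
    user_id: Optional[int] = None,
    member_id: Optional[str] = None,
    guild_id: Optional[int] = None,
    custom_emoji_id: Optional[int] = None,
) -> None:
    """Insert or update one emoji meaning (hypothetical helper)."""
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            UPSERT,
            (emoji_string, custom_emoji_id, meaning, user_id, member_id, guild_id),
        )


def lookup_meaning(conn: sqlite3.Connection, emoji_string: str) -> Optional[str]:
    row = conn.execute(
        "SELECT meaning FROM emoji_dictionary WHERE emoji_string = ?",
        (emoji_string,),
    ).fetchone()
    return row[0] if row else None
```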

Alternative: SQLAlchemy ORM Definitions

from sqlalchemy import BigInteger, Boolean, Column, DateTime, Text, func
from sqlalchemy.orm import declarative_base  # sqlalchemy.ext.declarative is deprecated in 2.0

Base = declarative_base()

class EmojiDictionary(Base):
    __tablename__ = "emoji_dictionary"

    emoji_string = Column(Text, primary_key=True)
    custom_emoji_id = Column(BigInteger, nullable=True, index=True)
    meaning = Column(Text, nullable=False)
    # Database-side timestamps, matching DEFAULT CURRENT_TIMESTAMP in the SQL schema
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())
    updated_by_user_id = Column(BigInteger, nullable=True)
    updated_by_member_id = Column(Text, nullable=True)
    created_in_guild = Column(BigInteger, nullable=True)

class ServerConfiguration(Base):
    __tablename__ = "server_configuration"

    guild_id = Column(BigInteger, primary_key=True)
    auto_translate = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())
    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now())

Suggested Build Order

Phase 1: Foundation (Week 1-2)

Goal: Get Vivi messages detected and logged.

Set up Discord bot:
- Create Discord application and token
- Initialize discord.py Client/Bot with required Intents
- Implement basic on_message event handler
- Test basic logging
Implement PluralKit detection:
- Add webhook detection (check message.webhook_id)
- Add PluralKit API query and member verification
- Log when Vivi messages are detected
- Handle API errors gracefully
Database initialization:
- Set up PostgreSQL database
- Create emoji_dictionary and server_configuration tables
- Test connection from bot

Deliverables: Bot logs every Vivi message to console; doesn't respond yet.

Phase 2: Emoji Parsing & Translation (Week 3-4)

Goal: Translate Vivi's emojis to text.

Emoji parsing:
- Implement regex patterns for Unicode and custom emojis
- Extract emoji sequences in order
- Test with various emoji types
Basic emoji lookup:
- Query emoji_dictionary table
- Return meanings for known emojis
- Handle unknown emojis
Response formatting:
- Compose natural language from emoji meanings
- Send reply to channel/DM
- Handle edge cases (no emojis, all unknown)
Manual testing:
- Create test emojis in database
- Post Vivi messages and verify translations

Deliverables: Bot translates Vivi messages; appears in channels and DMs.

Phase 3: Teaching Commands (Week 5-6)

Goal: Allow users to teach the bot emoji meanings.

Implement /teach command:
- Parse emoji sequences
- Insert into database with metadata
- Confirm to user
Implement /meaning command:
- Look up single emoji
- Reply with meaning or "not learned yet"
Implement /forget command:
- Delete emoji from database
- Require admin or Vivi permission
Permission system:
- Restrict teaching to authorized users (Vivi + alters)
- Use Discord roles or user ID allowlist

Deliverables: Users can teach and query emoji meanings via commands.

Phase 4: Per-Server Configuration (Week 7)

Goal: Allow servers to opt into/out of auto-translation.

Implement /config auto-translate command:
- Toggle auto-translate on/off per server
- Requires admin permission
- Only works in guild context (not DMs)
Update message handler:
- Check server config before auto-translating
- Only reply if auto_translate == TRUE
- In DMs, always translate when /translate is used
On-demand translation:
- /translate command for manual translation
- Works in any context

Deliverables: Servers can control translation behavior; bot respects preferences.
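The message-handler check in this phase reduces to a small decision function. This sketch makes several assumptions. `configs` stands in for auto_translate values loaded from server_configuration. A guild with no row falls back to the default auto-translate mode from the Configuration Layer. DMs, which have no guild, use that same default. The function name is hypothetical. /translate bypasses this check because it always translates on request.

```python
from typing import Mapping, Optional


def should_auto_translate(
    guild_id: Optional[int],
    configs: Mapping[int, bool],
    default: bool = True,
) -> bool:
    """Decide whether to translate a Vivi message without being asked."""
    if guild_id is None:
        return default  # DM: no server_configuration row applies
    # Guilds without a row fall back to the default, mirroring DEFAULT TRUE
    return configs.get(guild_id, default)
```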

Phase 5: Polish & Edge Cases (Week 8+)

Goal: Handle real-world complexity.

Natural language formatting:
- Improve composition of translations
- Handle emoji modifiers (skin tones, ZWJ sequences)
- Custom emoji descriptions
Error handling & resilience:
- Database unavailability
- PluralKit API failures
- Rate limiting with exponential backoff
- Graceful degradation
Logging & monitoring:
- Structured logging for debugging
- Monitor database performance
- Track API error rates
Codebase refactoring:
- Move commands to separate Cogs
- Organize into modules: cogs/teaching.py, cogs/config.py, etc.
- Add docstrings and type hints
Testing:
- Unit tests for emoji parsing
- Integration tests for database queries
- End-to-end tests with Discord

Deliverables: Robust, maintainable codebase ready for production.

Scaling Considerations

Multi-Server Architecture

Challenge: Bot will operate in many Discord servers simultaneously, each with potentially thousands of members.

Solution:

Shared Emoji Dictionary:
- Single global PostgreSQL database with all emoji meanings
- All servers query the same emoji_dictionary table
- Updates are reflected across all servers immediately
- Reduces redundancy and keeps meanings consistent
Per-Server Configuration:
- Each guild has its own row in server_configuration
- Fast lookup by guild_id (indexed)
- Allows servers to choose auto-translate vs. on-demand
Connection Pooling:
- SQLAlchemy async engine with pool_size=20, max_overflow=10 (tunable)
- Reuses database connections across handlers
- Prevents connection exhaustion under load

Performance Optimization

Emoji Lookup Performance:
- Primary key (B-tree) index on emoji_dictionary.emoji_string for fast O(log n) lookups
- Secondary index on custom_emoji_id for custom emoji queries
- Consider in-memory cache (Redis) if lookups become bottleneck:
  - Query Redis first (1ms latency)
  - Fall back to PostgreSQL
  - Invalidate cache on updates
Caching Strategy (Optional, Post-MVP):
- Use Redis for frequently accessed emojis
- TTL: 1 hour (emoji meanings change rarely)
- Invalidate cache when /teach or /forget commands update dictionary
- Benefits: Reduced database load, lower latency
Rate Limiting:
- PluralKit API: 10 requests/second (already enforced by API)
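- Discord API: a global limit of 50 requests/second plus per-route limits (such as message sends per channel); discord.py tracks and waits on these automatically
- Cap concurrent PluralKit queries locally with asyncio.Semaphore (a semaphore limits how many requests are in flight, not requests per second):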
```
semaphore = asyncio.Semaphore(5)  # Max 5 concurrent PluralKit queries
```
Message Handler Optimization:
- Webhook detection (local check): ~0ms
- PluralKit API query: ~100-200ms (async, non-blocking)
- Emoji parsing (regex): ~1-5ms
- Database lookup: ~1-50ms
- Total: ~100-250ms per message (acceptable, happens in background)
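A semaphore only caps how many requests are in flight, so keeping PluralKit lookups under a requests-per-second budget also needs a rate limiter. This minimal sketch assumes a single bot process, where an in-process limiter is enough. `RateLimiter` is a hypothetical helper, not an asyncio or aiohttp API. It spaces calls at least 1/rate seconds apart.

```python
import asyncio
import time


class RateLimiter:
    """Allow at most `rate` acquisitions per second by spacing them evenly."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._next_slot = 0.0  # monotonic time at which the next call may start
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            wait = self._next_slot - now
            # Reserve a slot before sleeping so concurrent callers queue in order
            self._next_slot = max(now, self._next_slot) + self._interval
        if wait > 0:
            await asyncio.sleep(wait)
```

In the handler, `await limiter.acquire()` would run just before each PluralKit request, alongside the semaphore. A limiter created with rate=10 would match the documented 10/second budget. Create it inside the running event loop (for example, in setup_hook) to stay compatible with older Python versions.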

Scaling Beyond 2,500 Guilds

Discord Requirement: Bots in 2,500 or more guilds must use sharding (each shard can serve at most 2,500 guilds).

Sharding in discord.py:

discord.py handles sharding automatically if configured
The bot opens multiple Gateway connections ("shards"), and each shard receives events for its own subset of guilds
Emoji dictionary remains shared across all shards (single database)

Example Configuration:

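import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True  # privileged intent, needed to read message text

# Uses the shard count Discord recommends for the bot's guild count
bot = commands.AutoShardedBot(command_prefix=commands.when_mentioned, intents=intents)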

Database Scaling

For Millions of Emojis:

Table partitioning by emoji language/category (if dict grows huge)
Read replicas for queries (if read-heavy)
Consider denormalization (e.g., cache popular emoji meanings in memory)

Current Recommendation: Single PostgreSQL database is sufficient for MVP. Scale if needed post-launch.
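The in-memory caching idea above can be prototyped before adding Redis. This sketch uses a swapped-in technique, not the Redis setup recommended earlier: a minimal in-process TTL cache that assumes a single bot process. Each shard process would hold its own copy, which is why Redis is the option for scale. `MeaningCache` is a hypothetical name. The one-hour default TTL mirrors the caching strategy above, and /teach and /forget would call `invalidate`.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MeaningCache:
    """In-process TTL cache for emoji meanings (a stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, key: str) -> Tuple[bool, Optional[str]]:
        """Return (hit, meaning); expired entries count as misses."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        expires_at, meaning = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return False, None
        return True, meaning

    def put(self, key: str, meaning: Optional[str]) -> None:
        # Caching None lets repeated unknown emojis skip the database as well
        self._entries[key] = (self._clock() + self._ttl, meaning)

    def invalidate(self, key: str) -> None:
        # Call after /teach or /forget changes this emoji
        self._entries.pop(key, None)
```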

Component Interaction Diagram

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Discord Server (Guild)                                         │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ #general                                                 │  │
│  │ Vivi (proxied): 😷 2️⃣ 🍑 ❌ 🤧                         │  │
│  │ Vivi Speech Translator: "Vivi is sick, but not in ..."  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                            ↑ (message event)
                            │
┌──────────────────────────┴──────────────────────────────────────┐
│  Discord.py Bot Framework                                       │
│                                                                 │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Discord Client                                         │    │
│  │ - Maintains WebSocket connection                       │    │
│  │ - Routes events to handlers                            │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Message Event Handler (on_message)                     │    │
│  │ - Filter: webhook_id?                                 │    │
│  │ - Query: PluralKit API                                │    │
│  │ - Verify: member_id == Vivi?                          │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Emoji Parser                                           │    │
│  │ - Extract emojis (regex)                              │    │
│  │ - Categorize (Unicode/custom)                         │    │
│  │ - Preserve order                                      │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Translation Engine                                     │    │
│  │ - Lookup emojis in database                           │    │
│  │ - Compose natural language                            │    │
│  │ - Format response                                     │    │
│  └────────────────────────────────────────────────────────┘    │
│                            ↓                                     │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Command Handler (Cogs)                                 │    │
│  │ - /teach: Learn emoji meanings                        │    │
│  │ - /meaning: Look up emoji                             │    │
│  │ - /forget: Delete emoji                               │    │
│  │ - /config: Server preferences                         │    │
│  │ - /translate: Manual translation                      │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                 │
└──────────────────────────────┬──────────────────────────────────┘
                               ↓ (database queries/updates)
┌──────────────────────────────────────────────────────────────────┐
│  Database Layer                                                  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ SQLAlchemy ORM + asyncio                               │     │
│  │ - Async connection pool                                │     │
│  │ - Connection reuse                                     │     │
│  │ - Transaction management                              │     │
│  └────────────────────────────────────────────────────────┘     │
│                            ↓                                      │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ PostgreSQL Database                                    │     │
│  │                                                        │     │
│  │  emoji_dictionary:          server_configuration:     │     │
│  │  ├─ 😷 → sick               ├─ guild_123 → auto ON   │     │
│  │  ├─ 2️⃣ → two               ├─ guild_456 → auto OFF  │     │
│  │  ├─ 🍑 → peach             └─ guild_789 → auto ON    │     │
│  │  ├─ ❌ → no                                            │     │
│  │  └─ :me1: → Vivi           (+ metadata, timestamps)   │     │
│  │  (+ metadata, timestamps)                             │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

External APIs:
┌────────────────────────────────────────────────────────────┐
│ PluralKit API (api.pluralkit.me)                           │
│ - GET /v2/messages/{id} → member info                     │
└────────────────────────────────────────────────────────────┘

Technology Stack Recommendations

Layer	Component	Technology	Why
Bot Framework	Discord Integration	discord.py 2.x	Async-native, active community, rich feature set
Database ORM	Persistence	SQLAlchemy 2.0 + asyncio	Async support, type-safe, widely adopted
Database	Data Store	PostgreSQL	Reliable, open-source, JSONB for future extensibility
Async Runtime	Concurrency	asyncio (built-in)	Lightweight, integrated with discord.py
Caching	Performance (Phase 5+)	Redis	Fast in-memory lookups, TTL support, distributed
Logging	Debugging	Python logging module	Built-in, structured logging can extend
API Requests	HTTP Calls	aiohttp	Async-native, connection pooling
Testing	Quality Assurance	pytest + pytest-asyncio	Async test support, fixtures
Deployment	Hosting	Docker + systemd or cloud	Reproducible environment, easy updates

Summary

Recommended Architecture:

Vivi Speech Translator is a modular Discord bot with a clear separation of concerns:

Discord Client listens for messages and routes them through a detection pipeline
Message Event Handler identifies when Vivi speaks (via PluralKit webhook + API verification)
Emoji Parser extracts emoji sequences while preserving order and type information
Translation Engine looks up meanings and composes responses
Database Layer (PostgreSQL + SQLAlchemy) stores a shared global emoji dictionary and per-server configurations
Command Handler (discord.py Cogs) allows teaching, querying, and configuration

The bot prioritizes:

Reliability: Graceful error handling, retry logic, database transactions
Performance: Indexed (B-tree) emoji lookups, async operations to avoid blocking, caching for scale
Scalability: Shared emoji dictionary, per-server configs, optional Redis caching, Discord sharding support
Maintainability: Modular Cog architecture, clear component boundaries, comprehensive logging

Build in phases: detection → parsing and translation → teaching → configuration → polish. This delivers value early (Phase 2) while establishing the foundation for later features.

The bot can grow from a single server to thousands, limited primarily by PluralKit API rate limits (kept manageable by querying only webhook messages and throttling lookups) and database performance (PostgreSQL handles millions of rows efficiently).
