Files
Vivi-Speech/.planning/research/STACK.md
Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 11:02:32 -05:00

23 KiB

Stack Research: Vivi Speech Translator

Last Updated: January 29, 2025 Research Scope: Production-ready 2025 tech stack for Discord bot with PluralKit/Tupperbox integration


Executive Summary

For the Vivi Speech Translator project, the recommended 2025 stack is discord.py 2.6.4 (Python) with PostgreSQL/SQLite for emoji mapping storage, pluralkit.py for PluralKit integration via webhook dispatch, and Railway or Oracle Cloud for hosting. This combination offers mature frameworks, proven ecosystem integration, and cost-effectiveness while avoiding deprecated or unmaintained projects.


Discord Bot Framework

Recommendation: discord.py 2.6.4 (Python)

Why:

  • Actively Maintained: Latest version 2.6.4 released October 8, 2025 with healthy release cadence (new versions every 3 months)
  • Mature Ecosystem: 7+ years of development, largest Python Discord bot community, extensive documentation and third-party libraries
  • Slash Commands: Built-in support for modern Discord interactions without requiring message content intent for command parsing
  • Async-First Design: Native asyncio support essential for handling multiple concurrent API calls (PluralKit queries, webhook processing)
  • Production Proven: Powers many enterprise Discord communities with robust error handling and performance

Alternatives:

  • Pycord (py-cord): Fork of discord.py with enhanced UI components, but no new releases to PyPI in 12+ months - marked as inactive/discontinued as of 2025. Not recommended for greenfield projects.
  • discord.js (TypeScript/JavaScript): Popular but slower than Python at CPU-bound tasks. Better for teams comfortable with Node.js ecosystem.
  • Serenity/Twilight (Rust): Excellent performance but steep learning curve, overkill for a learning/utility bot, smaller community.
  • Go (discordgo): Good performance but emoji/text processing libraries less mature than Python ecosystem.

Confidence: High - discord.py is the de facto standard for Python Discord bot development in 2025.


Language

Recommendation: Python 3.10+

Why:

  • Rich Text Processing: Python has the most mature emoji handling libraries (emoji 2.x, regex, unicode support)
  • Data Validation: Pydantic ecosystem dominates for structured data (emoji mappings, system configs)
  • Community Resources: Largest Discord bot community uses Python, easiest to find tutorials and debugging help
  • Rapid Prototyping: Fast iteration on emoji detection/translation logic before optimization
  • Integration Libraries: pluralkit.py, aiosqlite, and asyncpg all have high-quality Python implementations

Version Specifics:

  • Minimum: Python 3.8 (discord.py requirement)
  • Recommended: Python 3.10 or 3.11 (pattern matching, better async, better type hints)
  • Support through: Python 3.12 confirmed by discord.py

Alternatives:

  • JavaScript/TypeScript: discord.js is feature-complete, but text emoji processing slower. Consider if team prefers TypeScript for type safety.
  • Rust: serenity/twilight offer 5-10x performance gains if emoji translation becomes CPU-bound with millions of mappings. Not needed initially.
  • Go: discordgo is simpler than Rust but emoji libraries less mature than Python.

Confidence: High - Python is the optimal choice for this project's text processing and ecosystem needs.


Database

Recommendation: PostgreSQL 15+ (production/scaling) or SQLite 3 (MVP/single-instance)

Schema Overview:

-- Global emoji dictionary
CREATE TABLE emoji_mappings (
    id SERIAL PRIMARY KEY,
    emoji TEXT NOT NULL UNIQUE,
    meanings TEXT[] NOT NULL,  -- array of translations
    created_at TIMESTAMP DEFAULT NOW(),
    confidence FLOAT DEFAULT 0.5,
    usage_count INT DEFAULT 0
);

-- Per-server overrides (future feature)
CREATE TABLE server_overrides (
    id SERIAL PRIMARY KEY,
    server_id BIGINT NOT NULL,
    emoji TEXT NOT NULL,
    custom_meaning TEXT NOT NULL,
    created_by BIGINT NOT NULL,
    UNIQUE(server_id, emoji)
);

-- PluralKit system tracking
CREATE TABLE pk_systems (
    id SERIAL PRIMARY KEY,
    pk_system_id TEXT NOT NULL UNIQUE,
    discord_user_id BIGINT NOT NULL,
    last_synced TIMESTAMP DEFAULT NOW(),
    member_count INT DEFAULT 0
);

-- Learning history for future model training
CREATE TABLE translation_history (
    id SERIAL PRIMARY KEY,
    emoji TEXT NOT NULL,
    translation TEXT NOT NULL,
    system_id BIGINT,
    context TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

Detailed Comparison

PostgreSQL (Recommended for Production)

Advantages:

  • Handles complex queries for learning/analytics (emoji co-occurrence, translation frequency)
  • Supports array types natively (efficient emoji->meanings mappings)
  • JSONB support for extensible emoji metadata
  • Scales to millions of emoji mappings across thousands of servers
  • Transaction support ensures data consistency during learning updates
  • Free tier available on Railway, Render, or self-hosted

Setup:

# Using asyncpg (async driver for discord.py)
pip install asyncpg

Considerations:

  • Requires external database service if cloud-hosted ($5-15/month)
  • Overkill for MVP with <10 servers, <1000 emoji mappings
  • Network latency adds 5-50ms per query (mitigated with caching)

SQLite (Recommended for MVP)

Advantages:

  • Zero setup: single file database, no server needed
  • Free and embedded
  • Fast for <10K emoji mappings and <100 concurrent users
  • Migrate to PostgreSQL later without API changes (SQLAlchemy compatibility)
  • Excellent for local development and testing

Setup:

# Using aiosqlite (async driver for discord.py)
pip install aiosqlite

Limitations:

  • One writer at a time (concurrent updates block)
  • No network access (bot must run on same machine)
  • Not suitable if bot replicates across multiple servers
  • No native array types (serialize to JSON)

Use SQLite when:

  • MVP with single bot instance
  • <1000 servers, <50K emoji mappings
  • Learning phase before optimization

Decision Framework:

Scenario Recommendation Rationale
MVP (Weeks 1-4) SQLite + aiosqlite Fast iteration, zero ops overhead
Public Bot (Month 2+) PostgreSQL + asyncpg Scale across communities, learn patterns
Enterprise (100+ servers) PostgreSQL + Redis cache layer Millions of mappings, sub-100ms response

Confidence: High - This structure mirrors successful Discord bot implementations (Logiq, MEE6, others).


PluralKit Integration

How PluralKit Works

PluralKit uses Discord webhook proxying to detect and rewrite messages:

  1. User configures bracket patterns (e.g., [Name] for member "Name")
  2. User sends: [Name] 🎭💫 means "happy performance"
  3. PluralKit intercepts, detects brackets, replaces message under webhook as "Name" profile
  4. Result: Message appears as if sent by that member's profile

Detection Mechanisms

Option A: Webhook Dispatch Events (Recommended)

  • PluralKit sends JSON webhooks when members are created/updated/deleted
  • Webhook payload includes member ID, modified fields, system ID
  • Signing token for security validation
  • No message content parsing required

Payload Example:

{
  "id": "webhook-event-id",
  "type": "UPDATE_MEMBER",
  "system": "system-id",
  "key": "member-id",
  "data": {
    "name": "Vivi",
    "avatar_url": "https://..."
  },
  "signing_token": "verify-this"
}

Option B: Message Content Intent (Fallback)

  • Listen for all messages, check for PluralKit proxy brackets
  • Requires MESSAGE_CONTENT privileged intent
  • Higher latency, more complex parsing
  • Use only if webhook dispatch unavailable

Implementation Approach for discord.py

# 1. Create webhook listener endpoint
from aiohttp import web

async def pk_webhook_handler(request):
    """Receive PluralKit dispatch webhooks"""
    data = await request.json()
    signing_token = request.headers.get('X-Signature-Ed25519')

    # Verify signature
    if not verify_signature(data, signing_token, PK_SECRET):
        return web.Response(status=401, text='Unauthorized')

    # Handle event types
    if data['type'] == 'UPDATE_MEMBER':
        await update_emoji_mappings(data['system'], data['key'])

    return web.Response(text='OK')

# 2. Register webhook with PluralKit API
async def register_pk_webhook():
    """Call PluralKit API to register webhook URL"""
    async with aiohttp.ClientSession() as session:
        headers = {'Authorization': PK_SYSTEM_TOKEN}
        payload = {
            'url': 'https://your-bot-domain.com/webhooks/pk',
            'events': ['UPDATE_MEMBER', 'DELETE_MEMBER', 'CREATE_MEMBER']
        }
        await session.post(
            'https://api.pluralkit.me/v2/systems/webhooks',
            json=payload,
            headers=headers
        )

# 3. Query system info for Vivi
from pluralkit import Client

async def get_system_members(system_id):
    """Fetch Vivi's system members using pluralkit.py library"""
    client = Client(token=PK_SYSTEM_TOKEN)
    system = await client.get_system(system_id)
    members = await client.get_system_members(system_id)
    return members

# 4. Detect Vivi's messages
async def on_message(message):
    """Intercept all messages, check if from Vivi's system"""
    if message.author.id == VIVI_USER_ID:
        # Check if this is a proxied message using PluralKit API
        try:
            proxied = await client.get_message(message.id)
            if proxied and proxied.system:
                await handle_vivi_message(message)
        except Exception:
            pass  # Not a proxied message

Integration Libraries

  • pluralkit.py: Client library for PluralKit API v2 (GitHub: PluralKit/PluralKit)
    • Install: pip install pluralkit
    • Handles auth, models, rate limiting
    • Current version: 1.1.5+

API Endpoints Needed

Endpoint Purpose Frequency
GET /systems/{id} Fetch system info On startup, cache for 1 hour
GET /systems/{id}/members List all members On startup, update on webhook event
GET /messages/{id} Query if message proxied Per message (optional, high quota cost)
POST /systems/webhooks Register webhook On startup

Rate Limits

  • Standard: 2 requests/second
  • Burst: 10 requests/second
  • Message endpoint: Separate 1 request/second quota
  • Webhook dispatch: No rate limits, server-initiated

Recommendation: Cache member lists in-memory with 1-hour TTL, update only on webhook events. Avoid polling GET /messages/{id} for every message (expensive quota).


Key Libraries

Purpose Library Version Installation Notes
Discord API discord.py 2.6.4+ pip install discord.py Modern interactions, slash commands, intents
PluralKit API pluralkit.py 1.1.5+ pip install pluralkit Type-safe member/system models
Async Database aiosqlite 0.19.0+ pip install aiosqlite SQLite with asyncio (MVP)
Async Database asyncpg 0.29.0+ pip install asyncpg PostgreSQL with asyncio (production)
Emoji Handling emoji 2.11.0+ pip install emoji Convert emoji ↔ names, demojize/emojize
Data Validation pydantic 2.5.0+ pip install pydantic Validate emoji mappings, system configs
HTTP Requests aiohttp 3.9.0+ pip install aiohttp Async webhook server for PluralKit
Environment Config python-dotenv 1.0.0+ pip install python-dotenv Manage tokens, API keys safely
JSON Handling jsonschema 4.20.0+ pip install jsonschema Validate PluralKit webhook payloads

Why These Specific Libraries

emoji 2.11.0+:

  • Supports full Unicode 15.0 emoji set (2025 standard)
  • emoji.demojize() → emoji to :name: codes
  • emoji.emojize() → codes to emoji
  • Handles variant selectors and skin tone modifiers
  • Example: emoji.demojize("😊")":smiling_face_with_smiling_eyes:"

pydantic 2.5.0+:

  • Runtime type validation (catch invalid emoji mappings before DB save)
  • Auto-generate JSON schemas for API documentation
  • Configuration management for bot settings
  • Example:
from pydantic import BaseModel, validator

class EmojiMapping(BaseModel):
    emoji: str
    meanings: list[str]

    @validator('emoji')
    def validate_emoji(cls, v):
        if not emoji.is_emoji(v):
            raise ValueError('Invalid emoji')
        return v

asyncpg over psycopg2:

  • Native async/await (required for discord.py bot loop)
  • 2-3x faster than sync driver in async context
  • Connection pooling built-in
  • No threading overhead

Hosting & Deployment

Primary Recommendation: Railway + PostgreSQL (Managed)

Setup:

  1. Discord bot code hosted on Railway
  2. PostgreSQL database also on Railway
  3. Public URL for webhook endpoint (PluralKit dispatch)
  4. $5/month free credits, ~$0-10/month if modest usage

Why Railway:

  • Automatic deployments from GitHub (git push = live update)
  • Built-in PostgreSQL add-on ($15/month or included in free tier for small projects)
  • Environment variables for secrets (tokens, API keys)
  • Good uptime (99.95%), supports long-running processes
  • Easy scaling if needed later
  • Free domain with SSL certificate

Setup Commands:

# Install Railway CLI
curl -fsSL https://railway.app/install.sh | bash

# Login
railway login

# Initialize project
railway init

# Deploy
git push  # automatic if GitHub connected

# View logs
railway logs

Alternative Options

Option 2: Oracle Cloud (Free Tier) + Self-Hosted Bot

Services:

  • Oracle Cloud Always-Free VM (4 CPU, 24GB RAM, 200GB storage) - runs bot + PostgreSQL
  • Bot code in Docker container
  • Systemd or supervisor for process management

Advantages:

  • Completely free for life
  • Plenty of resources for 1000+ emoji mappings
  • Full control over environment

Disadvantages:

  • Oracle may delete instances after 60 days of inactivity (unpredictable)
  • Requires Linux/Docker knowledge
  • Manual SSL certificate renewal (Let's Encrypt)
  • No automatic redeploys

Option 3: Render (Free Tier Deprecated)

Status: Render removed free tier in 2024. Not recommended for budget projects.

Option 4: Self-Hosted on Raspberry Pi / Home Server

Setup:

  • Raspberry Pi 5 ($80 hardware) or old laptop
  • SQLite database
  • Systemd service runner
  • NGINX reverse proxy for webhooks
  • Dynamic DNS for public URL (Cloudflare, DuckDNS)

Cost: Electricity only (~$10/year) Reliability: Depends on home internet uptime Best for: Learning/hobby projects, not community-facing bots


Authentication & Permissions (Discord OAuth2/Intents)

Required Intents

intents = discord.Intents.default()
intents.guilds = True              # Guild events (joins, member counts)
intents.members = True             # Member info for presence checks
# intents.message_content = True   # ONLY if using prefix commands or parsing raw messages
# For slash commands: NOT REQUIRED

bot = discord.Bot(intents=intents)

Why NOT to Request Message Content Intent

Slash Commands Don't Need It:

  • /translate 🎭 → Works without message content intent
  • Discord sends interaction object with full data

Don't Use Prefix Commands:

  • Prefix commands (e.g., !translate 🎭) require message_content intent
  • Adds compliance burden (privacy concern)
  • Slash commands are standard for new bots in 2025

Required Permissions

Bot Invite URL Permissions (decimal: 536996928):
- Read Messages/View Channels (1024)
- Send Messages (2048)
- Embed Links (16384)
- Read Message History (65536)
- Use Slash Commands (274877906944)  # auto-included in interactions

Don't request:
- Manage Messages (edit other users' messages) - not needed
- Administrator - major red flag, users won't add bot

OAuth2 Setup

  1. Register Bot in Discord Developer Portal:

    • Create application → Create bot user
    • Copy bot token, store in .env
    • Enable Intents: GUILD_MEMBERS, GUILDS
  2. Generate Invite Link:

    • Use Discord Permissions Calculator
    • Share: https://discord.com/oauth2/authorize?client_id={CLIENT_ID}&scope=bot&permissions=536996928
  3. Bot Token Management:

    import os
    from dotenv import load_dotenv
    
    load_dotenv()
    TOKEN = os.getenv('DISCORD_TOKEN')
    bot.run(TOKEN)
    

MFA Requirement

If bot has elevated permissions (marked with asterisk in permissions list) and added to guild with MFA enabled, bot owner must enable 2FA on Discord account. Plan for this before public release.


Using Pycord in 2025

Why Not:

  • No new PyPI releases since November 2023 (12+ months)
  • Actively marked as "discontinued" or "low priority maintenance"
  • discord.py 2.6.4 is more stable and has better community support
  • Migration from Pycord → discord.py requires minimal changes (compatible imports)

If you inherit Pycord code: Plan migration to discord.py, but it's not urgent.


Storing Bot Token in Code

Why Not:

  • GitHub will scan and revoke tokens automatically (good) but bot will be compromised
  • Attacker gets full bot access, can impersonate, delete, spam communities

Correct Approach:

# ✅ Use environment variables
from dotenv import load_dotenv
import os
load_dotenv()
TOKEN = os.getenv('DISCORD_TOKEN')

# ✅ Git should ignore .env
echo ".env" >> .gitignore

Requesting MessageContent Intent "Just in Case"

Why Not:

  • Discord tracks intent abuse (compliance review for 100+ guilds)
  • Shows poor design (should use slash commands instead)
  • Privacy red flag for communities
  • Adds API request latency for every message

When you actually need it:

  • Prefix commands ONLY (not applicable for Vivi bot)
  • Raw message parsing (not needed for emoji detection via webhooks)
  • Chat bots that need to understand full conversation

Syncing Emoji Mappings via REST API Polling

Why Not:

  • PluralKit rate limits API calls (2 requests/sec)
  • Polling every 30 seconds across 100 members = 200+ API calls/30s (throttled, errors)
  • High latency (5+ second delay to sync new member)

Correct Approach:

  • Use webhook dispatch (PluralKit pushes updates to you)
  • Cache member list in-memory
  • Update only on webhook events

Building Custom PluralKit Webhook Signature Verification

Why Not:

  • Ed25519 signature verification is cryptographically complex
  • One mistake = accepts forged webhooks (security vulnerability)

Correct Approach:

# Use library instead
from nacl.signing import VerifyKey
from nacl.exceptions import BadSignatureError

def verify_pk_signature(body: bytes, signature: str, public_key: str) -> bool:
    try:
        verify_key = VerifyKey(public_key)
        verify_key.verify(body, bytes.fromhex(signature))
        return True
    except BadSignatureError:
        return False

Storing Full Emoji History Without Expiry

Why Not:

  • Unbounded table growth (millions of rows/month)
  • Query performance degrades over time
  • Storage costs balloon on cloud databases

Correct Approach:

-- Archive old data monthly
INSERT INTO emoji_translation_archive
  SELECT * FROM translation_history
  WHERE created_at < NOW() - INTERVAL '3 months';

DELETE FROM translation_history
WHERE created_at < NOW() - INTERVAL '3 months';

CREATE INDEX idx_created_at ON translation_history(created_at);

Using Synchronous Libraries (requests, sqlite3)

Why Not:

  • Blocks Discord bot event loop
  • One slow query = all slash commands freeze
  • Unresponsive bot experience

Correct Approach:

# ❌ DON'T
import sqlite3
conn = sqlite3.connect('emoji.db')  # Blocks entire bot!

# ✅ DO
import aiosqlite
async with aiosqlite.connect('emoji.db') as db:
    cursor = await db.execute('SELECT ...')

Implementation Roadmap (Greenfield)

Phase 1: MVP (Weeks 1-2)

  • Tech: discord.py 2.6.4 + SQLite + slash commands
  • Features:
    • /learn 🎭 "happy performance" - store emoji → meaning
    • /translate 🎭💫 ... - look up emoji meanings
    • Detect Vivi's user ID, listen for messages
  • Testing: Local development, manual testing in private Discord server

Phase 2: PluralKit Integration (Weeks 3-4)

  • Add webhook endpoint for PluralKit dispatch events
  • Cache system members in-memory
  • Detect "from Vivi's system" vs "from other users"
  • Store per-system learned mappings

Phase 3: Production Prep (Weeks 5-6)

  • Migrate SQLite → PostgreSQL
  • Deploy to Railway
  • Set up logging and error tracking (Sentry, optional)
  • Public bot invite link, documentation

Phase 4: Scaling (Weeks 7+)

  • Global emoji dictionary learning across all servers
  • Per-server overrides for custom meanings
  • Analytics dashboard (most common emoji, growth trends)
  • Redis cache layer if needed

Cost Breakdown (Monthly)

Component Free Option Production Option Cost
Bot Hosting Railway free tier Railway $0-5
Database SQLite (local) PostgreSQL (Railway) $0 (included)
PluralKit API Free (webhook only) Free $0
Logging (optional) stdout Sentry $0-50
Custom Domain discord.bot.app your-domain.com $12+
TOTAL $0 $0-20 -

Summary

The 2025 recommended stack for Vivi Speech Translator is:

Framework: discord.py 2.6.4 (Python 3.10+) Database: SQLite (MVP) → PostgreSQL (production) with asyncpg/aiosqlite PluralKit: pluralkit.py library + webhook dispatch events Hosting: Railway Cloud ($0-5/month) Libraries: emoji 2.11.0+, pydantic 2.5.0+, aiohttp 3.9.0+

This stack prioritizes maintainability (discord.py is actively maintained), ecosystem maturity (largest Python Discord community), cost-effectiveness (free tier sufficient), and reliability (proven in production by 1000+ bots).

Start with MVP (SQLite, local development) to validate emoji detection logic, then migrate to PostgreSQL on Railway for multi-server deployment. Avoid Pycord (unmaintained), don't request message content intent (use slash commands instead), and leverage webhook dispatch for efficient PluralKit integration.


References & Sources

  1. discord.py Official Docs - Latest 2.6.4
  2. PluralKit API Reference
  3. PluralKit Webhook Dispatch
  4. pluralkit.py Library Docs
  5. Discord Bot Hosting Guide 2025
  6. Discord Intents & OAuth2 Documentation
  7. Railway Cloud Platform
  8. Pydantic v2 Documentation
  9. emoji Library (carpedm20)
  10. SQLite vs PostgreSQL Comparison 2025