Dani B 901574f8c8 docs: complete project research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

Synthesized research findings from 4 parallel researcher agents:

Key Findings:
- Stack: discord.py 2.6.4 + PostgreSQL/SQLite with webhook-driven PluralKit integration
- Architecture: 7-component system with clear separation of concerns, async-native
- Features: Rule-based learning system starting simple, avoiding context inference and ML
- Pitfalls: 8 critical risks identified with phase assignments and prevention strategies

Recommended Approach:
- 5-phase build order (detection → translation → teaching → config → polish)
- Focus on dysgraphia accessibility for teaching interface
- Start with message detection reliability (Phase 1, load-bearing)
- Shared emoji dictionary (Phase 1-3); per-server overrides deferred to Phase 4+

Confidence Levels:
- Tech Stack: VERY HIGH (all production-proven, no experimental choices)
- Architecture: VERY HIGH (mirrors successful production bots)
- Features: HIGH (tight scope, transparent approach)
- Roadmap: HIGH (logical phase progression with value delivery)

Gaps to Address in Requirements:
- Vivi's teaching UX preferences (dysgraphia-specific patterns)
- Exact emoji coverage and naming conventions
- Moderation/teaching permissions model
- Multi-system scope and per-system customization needs

Ready for requirements definition and roadmap creation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 11:02:32 -05:00

Stack Research: Vivi Speech Translator

Last Updated: January 29, 2026
Research Scope: Production-ready 2025 tech stack for Discord bot with PluralKit/Tupperbox integration

Executive Summary

For the Vivi Speech Translator project, the recommended 2025 stack is discord.py 2.6.4 (Python) with PostgreSQL/SQLite for emoji mapping storage, pluralkit.py for PluralKit integration via webhook dispatch, and Railway or Oracle Cloud for hosting. This combination offers mature frameworks, proven ecosystem integration, and cost-effectiveness while avoiding deprecated or unmaintained projects.

Discord Bot Framework

Recommendation: discord.py 2.6.4 (Python)

Why:

Actively Maintained: Latest version 2.6.4 released October 8, 2025 with healthy release cadence (new versions every 3 months)
Mature Ecosystem: 7+ years of development, largest Python Discord bot community, extensive documentation and third-party libraries
Slash Commands: Built-in support for modern Discord interactions without requiring message content intent for command parsing
Async-First Design: Native asyncio support essential for handling multiple concurrent API calls (PluralKit queries, webhook processing)
Production Proven: Powers many enterprise Discord communities with robust error handling and performance

Alternatives:

Pycord (py-cord): Fork of discord.py with enhanced UI components, but no new releases to PyPI in 12+ months - marked as inactive/discontinued as of 2025. Not recommended for greenfield projects.
discord.js (TypeScript/JavaScript): Popular and feature-complete, but it would put the bot outside the Python libraries this project relies on (emoji, pluralkit.py). Better for teams already comfortable with the Node.js ecosystem.
Serenity/Twilight (Rust): Excellent performance but steep learning curve, overkill for a learning/utility bot, smaller community.
Go (discordgo): Good performance but emoji/text processing libraries less mature than Python ecosystem.

Confidence: High - discord.py is the de facto standard for Python Discord bot development in 2025.

Language

Recommendation: Python 3.10+

Why:

Rich Text Processing: Python has the most mature emoji handling libraries (emoji 2.x, regex, unicode support)
Data Validation: Pydantic ecosystem dominates for structured data (emoji mappings, system configs)
Community Resources: Largest Discord bot community uses Python, easiest to find tutorials and debugging help
Rapid Prototyping: Fast iteration on emoji detection/translation logic before optimization
Integration Libraries: pluralkit.py, aiosqlite, and asyncpg all have high-quality Python implementations

Version Specifics:

Minimum: Python 3.8 (discord.py requirement)
Recommended: Python 3.10 or 3.11 (pattern matching, better async, better type hints)
Support through: Python 3.12 confirmed by discord.py

Alternatives:

JavaScript/TypeScript: discord.js is feature-complete, but the emoji and PluralKit libraries chosen here are Python packages. Consider if team prefers TypeScript for type safety.
Rust: serenity/twilight offer 5-10x performance gains if emoji translation becomes CPU-bound with millions of mappings. Not needed initially.
Go: discordgo is simpler than Rust but emoji libraries less mature than Python.

Confidence: High - Python is the optimal choice for this project's text processing and ecosystem needs.

Database

Recommendation: PostgreSQL 15+ (production/scaling) or SQLite 3 (MVP/single-instance)

Schema Overview:

-- Global emoji dictionary
CREATE TABLE emoji_mappings (
    id SERIAL PRIMARY KEY,
    emoji TEXT NOT NULL UNIQUE,
    meanings TEXT[] NOT NULL,  -- array of translations
    created_at TIMESTAMP DEFAULT NOW(),
    confidence FLOAT DEFAULT 0.5,
    usage_count INT DEFAULT 0
);

-- Per-server overrides (future feature)
CREATE TABLE server_overrides (
    id SERIAL PRIMARY KEY,
    server_id BIGINT NOT NULL,
    emoji TEXT NOT NULL,
    custom_meaning TEXT NOT NULL,
    created_by BIGINT NOT NULL,
    UNIQUE(server_id, emoji)
);

-- PluralKit system tracking
CREATE TABLE pk_systems (
    id SERIAL PRIMARY KEY,
    pk_system_id TEXT NOT NULL UNIQUE,
    discord_user_id BIGINT NOT NULL,
    last_synced TIMESTAMP DEFAULT NOW(),
    member_count INT DEFAULT 0
);

-- Learning history for future model training
CREATE TABLE translation_history (
    id SERIAL PRIMARY KEY,
    emoji TEXT NOT NULL,
    translation TEXT NOT NULL,
    system_id BIGINT,
    context TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

Detailed Comparison

PostgreSQL (Recommended for Production)

Advantages:

Handles complex queries for learning/analytics (emoji co-occurrence, translation frequency)
Supports array types natively (efficient emoji->meanings mappings)
JSONB support for extensible emoji metadata
Scales to millions of emoji mappings across thousands of servers
Transaction support ensures data consistency during learning updates
Free tier available on Railway, Render, or self-hosted

Setup:

# Using asyncpg (async driver for discord.py)
pip install asyncpg

Considerations:

Requires external database service if cloud-hosted ($5-15/month)
Overkill for MVP with <10 servers, <1000 emoji mappings
Network latency adds 5-50ms per query (mitigated with caching)

SQLite (Recommended for MVP)

Advantages:

Zero setup: single file database, no server needed
Free and embedded
Fast for <10K emoji mappings and <100 concurrent users
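Migrate to PostgreSQL later with few code changes if queries go through an abstraction layer such as SQLAlchemy; raw aiosqlite and asyncpg calls differ (for example, ? vs $1 placeholders)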
Excellent for local development and testing

Setup:

# Using aiosqlite (async driver for discord.py)
pip install aiosqlite

Limitations:

One writer at a time (concurrent updates block)
No network access (bot must run on same machine)
Not suitable if the bot runs as multiple instances on different machines
No native array types (serialize to JSON)
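
Because SQLite has no array columns, the meanings list from the PostgreSQL schema can be stored as JSON text. The following is a minimal sketch, assuming the emoji_mappings table described earlier; the helper names (open_db, learn, lookup) are hypothetical. It uses the standard-library sqlite3 module only so the example runs on its own; inside the bot the same statements would go through aiosqlite.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Create the SQLite variant of emoji_mappings (meanings stored as JSON text)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS emoji_mappings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            emoji TEXT NOT NULL UNIQUE,
            meanings TEXT NOT NULL,          -- JSON array, e.g. '["happy"]'
            confidence REAL DEFAULT 0.5,
            usage_count INTEGER DEFAULT 0
        )
        """
    )
    return conn


def learn(conn, emoji_char, meaning):
    """Add a meaning to an emoji, creating the row if it does not exist yet."""
    row = conn.execute(
        "SELECT meanings FROM emoji_mappings WHERE emoji = ?", (emoji_char,)
    ).fetchone()
    meanings = json.loads(row[0]) if row else []
    if meaning not in meanings:
        meanings.append(meaning)
    # Upsert: insert a new row or overwrite the JSON list of an existing one.
    conn.execute(
        "INSERT INTO emoji_mappings (emoji, meanings) VALUES (?, ?) "
        "ON CONFLICT(emoji) DO UPDATE SET meanings = excluded.meanings",
        (emoji_char, json.dumps(meanings, ensure_ascii=False)),
    )
    conn.commit()


def lookup(conn, emoji_char):
    """Return the list of meanings for an emoji, or an empty list."""
    row = conn.execute(
        "SELECT meanings FROM emoji_mappings WHERE emoji = ?", (emoji_char,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Keeping the JSON encoding inside two small helpers means a later move to a PostgreSQL TEXT[] column only touches these functions.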

Use SQLite when:

MVP with single bot instance
<1000 servers, <50K emoji mappings
Learning phase before optimization

Decision Framework:

Scenario	Recommendation	Rationale
MVP (Weeks 1-4)	SQLite + aiosqlite	Fast iteration, zero ops overhead
Public Bot (Month 2+)	PostgreSQL + asyncpg	Scale across communities, learn patterns
Enterprise (100+ servers)	PostgreSQL + Redis cache layer	Millions of mappings, sub-100ms response

Confidence: High - This structure mirrors successful Discord bot implementations (Logiq, MEE6, others).

PluralKit Integration

How PluralKit Works

PluralKit proxies messages by reposting them through Discord webhooks:

User configures proxy tags for each member (e.g., the prefix [Name] for member "Name")
User sends: [Name] 🎭💫 means "happy performance"
PluralKit matches the proxy tags, deletes the original message, and reposts the text through a channel webhook using the member's name and avatar
Result: The message appears to come from that member's profile, and its author is a webhook rather than the user's own account
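
The tag-matching step can be illustrated in a few lines. This is a sketch under assumptions, not PluralKit's own code: it assumes each member is a dict with a name and a proxy_tags list of prefix/suffix pairs (the shape used by the PluralKit member model), and the function names are hypothetical.

```python
from typing import Optional


def strip_proxy_tags(text: str, prefix: Optional[str], suffix: Optional[str]) -> Optional[str]:
    """Return the inner text if `text` is wrapped by the tag pair, else None."""
    prefix = prefix or ""
    suffix = suffix or ""
    if not (prefix or suffix):
        return None  # an empty tag pair would match every message
    if len(text) < len(prefix) + len(suffix):
        return None  # too short to contain both tags without overlap
    if text.startswith(prefix) and text.endswith(suffix):
        return text[len(prefix): len(text) - len(suffix)].strip()
    return None


def match_member(text, members):
    """Find the first member whose proxy tags match; returns (name, inner) or None."""
    for member in members:
        for tag in member.get("proxy_tags", []):
            inner = strip_proxy_tags(text, tag.get("prefix"), tag.get("suffix"))
            if inner is not None:
                return member["name"], inner
    return None
```

A matcher like this is mainly useful for tests or for the message-content fallback; with webhook dispatch the bot does not need to parse tags itself.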

Detection Mechanisms

Option A: Webhook Dispatch Events (Recommended)

PluralKit sends JSON webhooks when members are created/updated/deleted
Webhook payload includes member ID, modified fields, system ID
Signing token for security validation
No message content parsing required

Payload Example:

{
  "type": "UPDATE_MEMBER",
  "signing_token": "verify-this",
  "system_id": "system-id",
  "id": "member-id",
  "data": {
    "name": "Vivi",
    "avatar_url": "https://..."
  }
}

Option B: Message Content Intent (Fallback)

Listen for all messages, check for PluralKit proxy brackets
Requires MESSAGE_CONTENT privileged intent
Higher latency, more complex parsing
Use only if webhook dispatch unavailable

Implementation Approach for discord.py

# 1. Create webhook listener endpoint
import hmac
from aiohttp import web

async def pk_webhook_handler(request):
    """Receive PluralKit dispatch webhooks"""
    event = await request.json()

    # The signing token travels in the JSON body (see the payload example),
    # not in a signature header. Compare in constant time; reject with 401.
    # Field names follow PluralKit's dispatch event model; verify them
    # against the current dispatch documentation.
    token = str(event.get('signing_token', ''))
    if not hmac.compare_digest(token.encode(), PK_SECRET.encode()):
        return web.Response(status=401, text='Unauthorized')

    # Handle event types
    if event['type'] == 'UPDATE_MEMBER':
        await update_emoji_mappings(event['system_id'], event['id'])

    return web.Response(text='OK')

# 2. Register the dispatch URL with PluralKit
# Assumption, based on PluralKit's dispatch documentation: the system owner
# registers the URL from Discord rather than over REST, e.g.
#   pk;system webhook https://your-bot-domain.com/webhooks/pk
# PluralKit then issues the signing token to store as PK_SECRET.

# 3. Query system info for Vivi
import aiohttp

PK_API = 'https://api.pluralkit.me/v2'

async def get_system_members(system_id):
    """Fetch Vivi's system members from GET /systems/{id}/members.

    pluralkit.py wraps the same endpoint; the raw call is shown so the
    sketch does not depend on that library's method names.
    """
    headers = {'Authorization': PK_SYSTEM_TOKEN}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(f'{PK_API}/systems/{system_id}/members') as resp:
            resp.raise_for_status()
            return await resp.json()

# 4. Detect Vivi's messages
async def on_message(message):
    """Check whether a message was proxied by PluralKit for Vivi."""
    # Proxied messages are reposted through a webhook, so the author is the
    # webhook rather than Vivi's account; skip everything else.
    if message.webhook_id is None:
        return
    async with aiohttp.ClientSession() as session:
        async with session.get(f'{PK_API}/messages/{message.id}') as resp:
            if resp.status != 200:
                return  # 404: not a PluralKit-proxied message
            proxied = await resp.json()
    # "sender" is the Discord account that sent the original message
    if proxied.get('sender') == str(VIVI_USER_ID):
        await handle_vivi_message(message)
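
The validation and routing in step 1 can be separated from aiohttp so it is testable without a web server. The sketch below is one possible shape, not PluralKit's or the project's actual code: handle_dispatch and the handlers mapping are hypothetical names, and it relies only on the type and signing_token fields shown in the payload example.

```python
import hmac
import json


def handle_dispatch(raw_body: bytes, expected_token: str, handlers: dict):
    """Validate a dispatch payload and route it by event type.

    Returns an (http_status, handler_result) pair.
    """
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400, None  # body is not valid JSON
    if not isinstance(event, dict):
        return 400, None
    token = event.get("signing_token")
    # Constant-time comparison avoids leaking how many characters matched.
    if not isinstance(token, str) or not hmac.compare_digest(
        token.encode(), expected_token.encode()
    ):
        return 401, None
    handler = handlers.get(event.get("type"))
    if handler is None:
        return 200, None  # acknowledge event types the bot does not use
    return 200, handler(event)
```

The aiohttp handler then only has to read the request body, call this function, and turn the status code into a web.Response.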

Integration Libraries

pluralkit.py: Third-party async client library for PluralKit API v2 (maintained separately from the PluralKit/PluralKit bot repository)
- Install: pip install pluralkit
- Handles auth, models, rate limiting
- Current version: 1.1.5+

API Endpoints Needed

Endpoint	Purpose	Frequency
`GET /systems/{id}`	Fetch system info	On startup, cache for 1 hour
`GET /systems/{id}/members`	List all members	On startup, update on webhook event
`GET /messages/{id}`	Query if message proxied	Per message (optional, high quota cost)
`pk;system webhook <url>` (bot command)	Register dispatch URL	Once, during setup

Rate Limits

Standard: 2 requests/second
Burst: 10 requests/second
Message endpoint: Separate 1 request/second quota
Webhook dispatch: No rate limits, server-initiated
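
A client-side throttle keeps the bot under a requests-per-second budget instead of relying on error responses. This is a minimal sketch assuming the 2 requests/second figure above; the Throttle class is hypothetical, and the clock and sleep parameters exist only so the behaviour can be tested without real waiting.

```python
import asyncio
import time


class Throttle:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float, clock=time.monotonic, sleep=asyncio.sleep):
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0

    async def wait(self) -> float:
        """Wait until the next slot is free; returns the delay that was applied."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        # Reserve the slot before sleeping so concurrent callers queue behind it.
        # No lock is needed: there is no await between the read and the update.
        self._next_slot = max(now, self._next_slot) + self._interval
        if delay:
            await self._sleep(delay)
        return delay
```

Usage would be a single shared instance, for example pk_throttle = Throttle(2), with await pk_throttle.wait() placed before each PluralKit API request.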

Recommendation: Cache member lists in-memory with 1-hour TTL, update only on webhook events. Avoid polling GET /messages/{id} for every message (expensive quota).
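
The caching recommendation can be sketched as a small in-memory store with a time-to-live and explicit invalidation. The class and method names are hypothetical, and the injectable clock is an assumption added for testability.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it so the caller refetches
            return None
        return value

    def put(self, key, value):
        """Store a value and restart its TTL."""
        self._entries[key] = (self._clock(), value)

    def invalidate(self, key):
        """Drop an entry, e.g. when a dispatch event reports a member change."""
        self._entries.pop(key, None)
```

A dispatch handler for UPDATE_MEMBER would call invalidate with the system ID, so the next lookup refetches the member list once rather than on every message.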

Key Libraries

Purpose	Library	Version	Installation	Notes
Discord API	discord.py	2.6.4+	`pip install discord.py`	Modern interactions, slash commands, intents
PluralKit API	pluralkit.py	1.1.5+	`pip install pluralkit`	Type-safe member/system models
Async Database	aiosqlite	0.19.0+	`pip install aiosqlite`	SQLite with asyncio (MVP)
Async Database	asyncpg	0.29.0+	`pip install asyncpg`	PostgreSQL with asyncio (production)
Emoji Handling	emoji	2.11.0+	`pip install emoji`	Convert emoji ↔ names, demojize/emojize
Data Validation	pydantic	2.5.0+	`pip install pydantic`	Validate emoji mappings, system configs
HTTP Requests	aiohttp	3.9.0+	`pip install aiohttp`	Async webhook server for PluralKit
Environment Config	python-dotenv	1.0.0+	`pip install python-dotenv`	Manage tokens, API keys safely
JSON Handling	jsonschema	4.20.0+	`pip install jsonschema`	Validate PluralKit webhook payloads

Why These Specific Libraries

emoji 2.11.0+:

Supports the full Unicode 15.0 emoji set; later releases of the library add newer Unicode versions
emoji.demojize() → emoji to :name: codes
emoji.emojize() → codes to emoji
Handles variant selectors and skin tone modifiers
Example: emoji.demojize("😊") → ":smiling_face_with_smiling_eyes:"

pydantic 2.5.0+:

Runtime type validation (catch invalid emoji mappings before DB save)
Auto-generate JSON schemas for API documentation
Configuration management for bot settings
Example:

import emoji
from pydantic import BaseModel, field_validator

class EmojiMapping(BaseModel):
    emoji: str
    meanings: list[str]

    # Pydantic 2.x: field_validator replaces the deprecated v1 @validator
    @field_validator('emoji')
    @classmethod
    def validate_emoji(cls, v):
        if not emoji.is_emoji(v):
            raise ValueError('Invalid emoji')
        return v

asyncpg over psycopg2:

Native async/await (required for discord.py bot loop)
2-3x faster than sync driver in async context
Connection pooling built-in
No threading overhead

Hosting & Deployment

Recommended Approach: Cloud PaaS (Hybrid Model)

Primary Recommendation: Railway + PostgreSQL (Managed)

Setup:

Discord bot code hosted on Railway
PostgreSQL database also on Railway
Public URL for webhook endpoint (PluralKit dispatch)
$5/month free credits, ~$0-10/month if modest usage

Why Railway:

Automatic deployments from GitHub (git push = live update)
Built-in PostgreSQL add-on ($15/month or included in free tier for small projects)
Environment variables for secrets (tokens, API keys)
Good uptime (99.95%), supports long-running processes
Easy scaling if needed later
Free domain with SSL certificate

Setup Commands:

# Install Railway CLI
curl -fsSL https://railway.app/install.sh | bash

# Login
railway login

# Initialize project
railway init

# Deploy
git push  # automatic if GitHub connected

# View logs
railway logs

Alternative Options

Option 2: Oracle Cloud (Free Tier) + Self-Hosted Bot

Services:

Oracle Cloud Always-Free VM (4 CPU, 24GB RAM, 200GB storage) - runs bot + PostgreSQL
Bot code in Docker container
Systemd or supervisor for process management

Advantages:

Completely free for life
Plenty of resources for 1000+ emoji mappings
Full control over environment

Disadvantages:

Oracle may delete instances after 60 days of inactivity (unpredictable)
Requires Linux/Docker knowledge
Manual SSL certificate renewal (Let's Encrypt)
No automatic redeploys

Option 3: Render (Free Tier Deprecated)

Status: Render removed free tier in 2024. Not recommended for budget projects.

Option 4: Self-Hosted on Raspberry Pi / Home Server

Setup:

Raspberry Pi 5 ($80 hardware) or old laptop
SQLite database
Systemd service runner
NGINX reverse proxy for webhooks
Dynamic DNS for public URL (Cloudflare, DuckDNS)

Cost: Electricity only (~$10/year)
Reliability: Depends on home internet uptime
Best for: Learning/hobby projects, not community-facing bots

Authentication & Permissions (Discord OAuth2/Intents)

Required Intents

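import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.guilds = True              # Guild events (joins, member counts); already on by default
intents.members = True             # Privileged: member info for presence checks
# intents.message_content = True   # ONLY if using prefix commands or parsing raw messages
# For slash commands: NOT REQUIRED

# discord.py has no discord.Bot (that is Pycord's API); use commands.Bot
bot = commands.Bot(command_prefix=commands.when_mentioned, intents=intents)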

Why NOT to Request Message Content Intent

✅ Slash Commands Don't Need It:

/translate 🎭 → Works without message content intent
Discord sends interaction object with full data

❌ Don't Use Prefix Commands:

Prefix commands (e.g., !translate 🎭) require message_content intent
Adds compliance burden (privacy concern)
Slash commands are standard for new bots in 2025

Required Permissions

Bot Invite URL Permissions (decimal: 84992):
- Read Messages/View Channels (1024)
- Send Messages (2048)
- Embed Links (16384)
- Read Message History (65536)

Slash commands are enabled by the applications.commands OAuth2 scope, not by a permission bit.

Don't request:
- Manage Messages (delete or pin other users' messages) - not needed
- Administrator - major red flag, users won't add bot

OAuth2 Setup

Register Bot in Discord Developer Portal:
- Create application → Create bot user
- Copy bot token, store in .env
- Enable the privileged Server Members intent (GUILD_MEMBERS); GUILDS is not privileged and needs no toggle
Generate Invite Link:
- Use Discord Permissions Calculator
- Share: https://discord.com/oauth2/authorize?client_id={CLIENT_ID}&scope=bot+applications.commands&permissions=84992
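
The permissions integer in an invite link is the bitwise OR of the individual permission bits, so it can be computed rather than copied from a calculator. A small sketch, assuming the four permissions listed in this section; the dictionary and function names are hypothetical.

```python
from urllib.parse import urlencode

# Bit values for the permissions this bot asks for
PERMISSION_BITS = {
    "view_channel": 1 << 10,          # 1024
    "send_messages": 1 << 11,         # 2048
    "embed_links": 1 << 14,           # 16384
    "read_message_history": 1 << 16,  # 65536
}


def permission_value(names):
    """Combine named permissions into the integer used in invite URLs."""
    value = 0
    for name in names:
        value |= PERMISSION_BITS[name]
    return value


def invite_url(client_id, names):
    """Build an invite link requesting the bot and slash-command scopes."""
    query = urlencode({
        "client_id": client_id,
        "scope": "bot applications.commands",  # space is encoded as '+'
        "permissions": permission_value(names),
    })
    return f"https://discord.com/oauth2/authorize?{query}"
```

Computing the value this way also makes it easy to check that an integer found elsewhere does not silently include extra bits such as Manage Messages (8192).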

Bot Token Management:

import os
from dotenv import load_dotenv

load_dotenv()
TOKEN = os.getenv('DISCORD_TOKEN')
bot.run(TOKEN)
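
os.getenv returns None when the variable is missing, which only surfaces later as a confusing login error. A hedged sketch of a fail-fast helper (require_env is a hypothetical name; the environ parameter is an assumption added so the function can be tested without touching the real environment):

```python
import os


def require_env(name, environ=os.environ):
    """Return a required setting, or raise a clear error at startup."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

With this helper, the token line becomes TOKEN = require_env('DISCORD_TOKEN'), placed after load_dotenv().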

MFA Requirement

If the bot has elevated permissions (marked with an asterisk in the permissions list) and is added to a guild that requires MFA for moderation, the bot owner must enable 2FA on their Discord account. Plan for this before public release.

Related Anti-patterns

❌ Using Pycord in 2025

Why Not:

No new PyPI releases since November 2023 (12+ months)
Actively marked as "discontinued" or "low priority maintenance"
discord.py 2.6.4 is more stable and has better community support
Migration from Pycord → discord.py keeps the import discord namespace, but Pycord-specific APIs such as discord.Bot and its slash-command decorators must be rewritten

If you inherit Pycord code: Plan migration to discord.py, but it's not urgent.

❌ Storing Bot Token in Code

Why Not:

GitHub secret scanning can detect a committed token and trigger its reset (good), but anyone who copied it first may already have used the bot
Attacker gets full bot access, can impersonate, delete, spam communities

Correct Approach:

# ✅ Use environment variables
from dotenv import load_dotenv
import os
load_dotenv()
TOKEN = os.getenv('DISCORD_TOKEN')

# ✅ Git should ignore .env
echo ".env" >> .gitignore

❌ Requesting MessageContent Intent "Just in Case"

Why Not:

Discord tracks intent abuse (compliance review for 100+ guilds)
Shows poor design (should use slash commands instead)
Privacy red flag for communities
Delivers the content of every message to the bot, adding gateway traffic and per-message processing

When you actually need it:

Prefix commands ONLY (not applicable for Vivi bot)
Raw message parsing (not needed for emoji detection via webhooks)
Chat bots that need to understand full conversation

❌ Syncing Emoji Mappings via REST API Polling

Why Not:

PluralKit rate limits API calls (2 requests/sec)
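Polling each of 100 members every 30 seconds = 100 API calls per 30 s (about 3.3 requests/sec), which exceeds the limit and gets throttled
High latency (a new or changed member is only picked up at the next poll, up to 30 seconds later)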

Correct Approach:

Use webhook dispatch (PluralKit pushes updates to you)
Cache member list in-memory
Update only on webhook events

❌ Building Custom PluralKit Webhook Signature Verification

Why Not:

Ed25519 signature verification is cryptographically complex
One mistake = accepts forged webhooks (security vulnerability)

Correct Approach:

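# Use library instead
# Note: PluralKit dispatch itself authenticates with the signing_token in the
# payload; Ed25519 applies to endpoints such as Discord's interactions webhook.
from nacl.signing import VerifyKey
from nacl.exceptions import BadSignatureError

def verify_pk_signature(body: bytes, signature: str, public_key: str) -> bool:
    try:
        # VerifyKey expects raw key bytes, so decode the hex string first
        verify_key = VerifyKey(bytes.fromhex(public_key))
        verify_key.verify(body, bytes.fromhex(signature))
        return True
    except (BadSignatureError, ValueError):
        # ValueError covers malformed hex or a signature of the wrong length
        return False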

❌ Storing Full Emoji History Without Expiry

Why Not:

Unbounded table growth (millions of rows/month)
Query performance degrades over time
Storage costs balloon on cloud databases

Correct Approach:

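-- One-time setup: archive table with the same columns, plus an index for the date filter
CREATE TABLE IF NOT EXISTS emoji_translation_archive (LIKE translation_history INCLUDING ALL);
CREATE INDEX IF NOT EXISTS idx_created_at ON translation_history(created_at);

-- Archive old data monthly; one transaction so both statements see the same NOW()
BEGIN;

INSERT INTO emoji_translation_archive
  SELECT * FROM translation_history
  WHERE created_at < NOW() - INTERVAL '3 months';

DELETE FROM translation_history
WHERE created_at < NOW() - INTERVAL '3 months';

COMMIT;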

❌ Using Synchronous Libraries (requests, sqlite3)

Why Not:

Blocks Discord bot event loop
One slow query = all slash commands freeze
Unresponsive bot experience

Correct Approach:

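# ❌ DON'T
import sqlite3
conn = sqlite3.connect('emoji.db')  # Blocks entire bot!

# ✅ DO
import aiosqlite

async def fetch_mappings():
    # "async with" is only valid inside a coroutine
    async with aiosqlite.connect('emoji.db') as db:
        async with db.execute('SELECT emoji, meanings FROM emoji_mappings') as cursor:
            return await cursor.fetchall()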
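
When a synchronous call cannot be replaced by an async library, it can be moved to a worker thread so the event loop stays responsive. The sketch below demonstrates the effect with stand-in functions (slow_blocking_query and heartbeat are hypothetical); asyncio.to_thread requires Python 3.9 or newer.

```python
import asyncio
import time


def slow_blocking_query():
    """Stand-in for a synchronous call such as sqlite3 or requests."""
    time.sleep(0.2)
    return "done"


async def heartbeat(ticks):
    """Simulates other bot work; it only advances while the event loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run_with_offload():
    """Run the blocking call in a thread and count heartbeats that ran meanwhile."""
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    result = await asyncio.to_thread(slow_blocking_query)
    beat.cancel()
    return result, len(ticks)
```

Calling slow_blocking_query() directly inside the coroutine would freeze the heartbeat for the full 0.2 seconds; offloaded, the heartbeat keeps ticking while the query runs.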

Implementation Roadmap (Greenfield)

Phase 1: MVP (Weeks 1-2)

Tech: discord.py 2.6.4 + SQLite + slash commands
Features:
- /learn 🎭 "happy performance" - store emoji → meaning
- /translate 🎭💫 ... - look up emoji meanings
- Detect Vivi's user ID, listen for messages
Testing: Local development, manual testing in private Discord server
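
The /translate lookup can be prototyped without Discord at all. This is a rough sketch, assuming mappings are held in a plain dict of emoji to meanings; translate is a hypothetical helper. It uses greedy longest-match so multi-codepoint emoji (for example, skin-tone variants) are preferred over their base emoji.

```python
def translate(text, mappings):
    """Greedy longest-match lookup of known emoji in `text`.

    Returns a list of (emoji, meanings) pairs in order of appearance;
    characters that start no known emoji are skipped.
    """
    # Try longer sequences first; ignore empty keys, which would never advance.
    keys = sorted((k for k in mappings if k), key=len, reverse=True)
    results = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                results.append((key, mappings[key]))
                i += len(key)
                break
        else:
            i += 1  # unknown character: move on
    return results
```

Because the function takes a plain dict, it can be unit-tested on its own and later fed from the database layer.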

Phase 2: PluralKit Integration (Weeks 3-4)

Add webhook endpoint for PluralKit dispatch events
Cache system members in-memory
Detect "from Vivi's system" vs "from other users"
Store per-system learned mappings

Phase 3: Production Prep (Weeks 5-6)

Migrate SQLite → PostgreSQL
Deploy to Railway
Set up logging and error tracking (Sentry, optional)
Public bot invite link, documentation

Phase 4: Scaling (Weeks 7+)

Global emoji dictionary learning across all servers
Per-server overrides for custom meanings
Analytics dashboard (most common emoji, growth trends)
Redis cache layer if needed

Cost Breakdown (Monthly)

Component	Free Option	Production Option	Cost
Bot Hosting	Railway free tier	Railway	$0-5
Database	SQLite (local)	PostgreSQL (Railway)	$0 (included)
PluralKit API	Free (webhook only)	Free	$0
Logging (optional)	stdout	Sentry	$0-50
Custom Domain	discord.bot.app	your-domain.com	$12+
TOTAL	$0	$0-20	-

Summary

The 2025 recommended stack for Vivi Speech Translator is:

Framework: discord.py 2.6.4 (Python 3.10+)
Database: SQLite (MVP) → PostgreSQL (production) with asyncpg/aiosqlite
PluralKit: pluralkit.py library + webhook dispatch events
Hosting: Railway Cloud ($0-5/month)
Libraries: emoji 2.11.0+, pydantic 2.5.0+, aiohttp 3.9.0+

This stack prioritizes maintainability (discord.py is actively maintained), ecosystem maturity (largest Python Discord community), cost-effectiveness (free tier sufficient), and reliability (proven in production by 1000+ bots).

Start with MVP (SQLite, local development) to validate emoji detection logic, then migrate to PostgreSQL on Railway for multi-server deployment. Avoid Pycord (unmaintained), don't request message content intent (use slash commands instead), and leverage webhook dispatch for efficient PluralKit integration.
