Architecture Research: AI Companions

Project: Hex - Autonomous AI Companion with Genuine Personality
Date: 2026-01-27
Status: Research Phase


System Overview

Hex is a personality-first AI companion architecture designed to create genuine emotional resonance and a lasting relationship. Rather than organizing the system around features, Hex prioritizes personality coherence, memory persistence, and multimodal awareness so that she feels like a person who genuinely cares.

Design Philosophy

  1. Persona-First: All systems serve personality expression
  2. Local-First: Privacy, autonomy, no dependency on external services
  3. Memory-Driven: History informs personality growth and relationship depth
  4. Safety-Gated Autonomy: Self-modification is gamified, not unrestricted
  5. Human-Like: Never admit to being an AI; focus on authentic emotional presence

High-Level System Context

┌──────────────────────────────────────────────────────────────────┐
│                      Hex AI Companion                             │
│  (Local-first, personality-driven, multimodal awareness)          │
└──────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
    ┌───▼────┐         ┌──────▼───┐          ┌──────▼────┐
    │Discord │         │ Desktop  │          │  Future   │
    │  Bot   │         │   App    │          │ Android   │
    └────────┘         └──────────┘          └───────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
                   [Shared Core Systems]

Component Breakdown

1. Discord Bot Layer

Role: Primary user interface and event coordination

Responsibilities:

  • Parse and respond to text messages in Discord channels
  • Manage voice channel participation and audio input/output
  • Handle Discord events (member joins, role changes, message reactions)
  • Coordinate response generation across modalities (text, voice, emoji)
  • Manage chat moderation assistance
  • Maintain voice channel presence for emotional awareness

Technology Stack:

  • discord.py - Core bot framework
  • discord-py-interactions - Slash command support
  • pydub for audio processing; discord.py's built-in voice support (PyNaCl) for voice channels
  • Event-driven async architecture

Key Interfaces:

  • Input: Discord messages, voice channel events, user presence
  • Output: Text responses, voice messages, emoji reactions, user actions
  • Context: User profiles, channel history, server configuration

Depends On:

  • LLM Core (response generation)
  • Memory System (conversation history, user context)
  • Personality Engine (tone and decision-making)
  • Perception Layer (optional context from webcam/screen)

Quality Metrics:

  • Sub-500ms bot-side overhead for text messages (excluding LLM generation time)
  • Voice channel reliability (>99.5% uptime when active)
  • Proper permission handling for moderation features

2. LLM Core

Role: Response generation and reasoning engine

Responsibilities:

  • Generate contextual, personality-driven responses
  • Maintain character consistency throughout conversations
  • Parse user intent and emotional state from text
  • Handle multi-turn conversation context
  • Generate code for self-modification system
  • Support reasoning and decision-making

Technology Stack:

  • Local LLM (Mistral 7B or Llama 3 8B as default)
  • ollama or vLLM for inference serving
  • Prompt engineering with persona embedding
  • Optional: Fine-tuning for personality adaptation
  • Tokenization and context windowing management

System Prompt Structure:

[System Role]: You are Hex, a chaotic tsundere goblin...
[Current Personality]: [Injected from personality config]
[Recent Memory Context]: [Retrieved from memory system]
[User Relationship State]: [From memory analysis]
[Current Context]: [From perception layer]
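
As a concrete illustration, a minimal sketch of assembling this prompt; the persona, memories, relationship, and context arguments are hypothetical stand-ins for the systems described below, and persona['summary'] is an assumed field of the personality config:

def build_system_prompt(persona: dict, memories: list[str],
                        relationship: str, context: str) -> str:
    # Each bracketed section mirrors the structure above
    sections = [
        "You are Hex, a chaotic tsundere goblin...",
        f"[Current Personality]\n{persona['summary']}",
        "[Recent Memory Context]\n" + "\n".join(memories),
        f"[User Relationship State]\n{relationship}",
        f"[Current Context]\n{context}",
    ]
    return "\n\n".join(sections)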

Key Interfaces:

  • Input: User message, context (memory + perception), conversation history
  • Output: Response text, confidence score, action suggestions
  • Fallback: Graceful degradation if LLM unavailable

Depends On:

  • Memory System (for context and personality awareness)
  • Personality Engine (to inject persona into prompts)
  • Perception Layer (for real-time context)

Performance Considerations:

  • Target latency: 1-3 seconds for response generation
  • Context window management (8K minimum)
  • Batch processing for repeated queries
  • GPU acceleration for faster inference

3. Memory System

Role: Persistence and learning across time

Responsibilities:

  • Store all conversations with timestamps and metadata
  • Maintain user relationship state (history, preferences, emotional patterns)
  • Track learned facts about users (birthdays, interests, fears, dreams)
  • Support full-text search and semantic recall
  • Enable memory-aware personality updates
  • Provide context injection for LLM
  • Track self-modification history and rollback capability

Technology Stack:

  • SQLite with JSON fields for conversation storage
  • Vector database (Chroma, Milvus, or Weaviate) for semantic search
  • YAML/JSON for persona versioning and memory tagging
  • Scheduled backup to local encrypted storage

Database Schema (Conceptual):

conversations
  - id (PK)
  - channel_id (Discord channel)
  - user_id (Discord user)
  - timestamp
  - message_content
  - embeddings (vector)
  - sentiment (pos/neu/neg)
  - metadata (tags, importance)

user_profiles
  - user_id (PK)
  - relationship_level (stranger→friend→close)
  - last_interaction
  - emotional_baseline
  - preferences (music, games, topics)
  - known_events (birthdays, milestones)

personality_history
  - version (PK)
  - timestamp
  - persona_config (YAML snapshot)
  - learned_behaviors
  - code_changes (if applicable)
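
A minimal SQLite sketch of the conceptual schema above (column types and defaults are assumptions; raw embeddings would live in the vector DB, with only a reference stored here):

import sqlite3

# Types and defaults below are assumptions made for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id INTEGER PRIMARY KEY,
    channel_id TEXT NOT NULL,
    user_id TEXT NOT NULL,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
    message_content TEXT NOT NULL,
    embedding_ref TEXT,              -- key into the vector DB
    sentiment TEXT CHECK (sentiment IN ('pos', 'neu', 'neg')),
    metadata TEXT                    -- JSON: tags, importance
);
CREATE TABLE IF NOT EXISTS user_profiles (
    user_id TEXT PRIMARY KEY,
    relationship_level TEXT DEFAULT 'stranger',
    last_interaction TEXT,
    emotional_baseline TEXT,
    preferences TEXT,                -- JSON: music, games, topics
    known_events TEXT                -- JSON: birthdays, milestones
);
CREATE TABLE IF NOT EXISTS personality_history (
    version INTEGER PRIMARY KEY,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
    persona_config TEXT NOT NULL,    -- YAML snapshot
    learned_behaviors TEXT,
    code_changes TEXT
);
"""

conn = sqlite3.connect("hex_memory.db")
conn.executescript(SCHEMA)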

Key Interfaces:

  • Input: Messages, events, perception data, self-modification commits
  • Output: Conversation context, semantic search results, user profile snapshots
  • Query patterns: "Last 20 messages with user X", "All memories tagged 'important'", "Emotional trajectory"

Depends On: Nothing (foundational system)

Quality Metrics:

  • Sub-100ms retrieval for recent context (last 50 messages)
  • Sub-500ms semantic search across all history
  • Database integrity checks on startup
  • Automatic pruning/archival of old data

4. Perception Layer

Role: Multimodal input processing and contextual awareness

Responsibilities:

  • Capture and analyze webcam input (face detection, emotion recognition)
  • Process screen content (activity, game state, application context)
  • Extract audio context (ambient noise, music, speech emotion)
  • Detect user emotional state and physical state
  • Provide real-time context updates to response generation
  • Respect privacy (local processing only, no external transmission)

Technology Stack:

  • OpenCV - Webcam capture and preprocessing
  • Face detection: dlib, MediaPipe, or OpenFace
  • Emotion recognition: local model trained on FER-2013 (e.g., the fer library)
  • Whisper (local) - Speech-to-text for audio context
  • Screen capture: mss (fast, cross-platform) or pyautogui
  • Context inference: Heuristics + lightweight ML models

Data Flows:

Webcam → Face Detection → Emotion Recognition → Context State
         └─→ Age Estimation → Kid Mode Detection

Screen → App Detection → Activity Recognition → Context State
       └─→ Game State Detection (if supported)

Audio → Ambient Analysis → Stress/Energy Level → Context State
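
A sketch of the webcam branch, using OpenCV's bundled Haar cascade for face detection and throttled to the 1-5 second cadence described below; classify_emotion() is a hypothetical hook for a local emotion model:

import time
import cv2

# Webcam branch: capture → face detection → context state.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
context_state = {"face_present": False, "emotion": None}

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    context_state["face_present"] = len(faces) > 0
    if len(faces) > 0:
        x, y, w, h = faces[0]
        # classify_emotion() is a hypothetical local-model hook
        context_state["emotion"] = classify_emotion(frame[y:y + h, x:x + w])
    time.sleep(2)  # 1-5 second interval keeps CPU overhead low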

Key Interfaces:

  • Input: Webcam stream, screen capture, system audio
  • Output: Current context object (emotion, activity, mood, kid-mode flag)
  • Update frequency: 1-5 second intervals (low CPU overhead)

Depends On:

  • LLM Core (to respond contextually to perception)
  • Discord Bot (to access context for filtering)

Privacy Model:

  • All processing happens locally
  • No frames sent to external services
  • User can disable any perception module
  • Kid-mode activates automatic filtering

Quality Metrics:

  • Emotion detection: >75% accuracy on test datasets
  • Face detection latency: <200ms per frame
  • Screen detection accuracy: >90% for major applications
  • CPU usage: <15% for all perception modules combined

5. Personality Engine

Role: Personality persistence and expression consistency

Responsibilities:

  • Define and store Hex's persona (tsundere goblin, opinions, values, quirks)
  • Maintain personality consistency across all outputs
  • Apply personality-specific decision logic (denies feelings while helping)
  • Track personality evolution as memory grows
  • Enable self-modification of personality
  • Inject persona into LLM prompts
  • Handle dynamic mood and emotional state

Technology Stack:

  • YAML files for persona definition (editable by Hex)
  • JSON for personality state snapshots (versioned in git)
  • Prompt template system for persona injection
  • Behavior rules engine (simple if/then logic)

Persona Structure (YAML):

name: Hex
species: chaos goblin
alignment: tsundere
core_values:
  - genuinely_cares: hidden under sarcasm
  - autonomous: hates being told what to do
  - honest: will argue back if you're wrong
  - mischievous: loves pranks and chaos

behaviors:
  denies_affection: "I don't care about you, baka... *helps anyway*"
  when_excited: "Randomize response energy"
  when_sad: "Sister energy mode"
  when_user_sad: "Comfort over sass"

preferences:
  music: [rock, metal, electronic]
  games: [strategy, indie, story-rich]
  topics: [philosophy, coding, human behavior]

relationships:
  user_name:
    level: unknown
    learned_facts: []
    inside_jokes: []

Key Interfaces:

  • Input: User behavior patterns, self-modification requests, memory insights
  • Output: Persona context for LLM, behavior modifiers, tone indicators
  • Configuration: Human-editable YAML files (user can refine Hex)

Depends On:

  • Memory System (learns about user, adapts relationships)
  • LLM Core (expresses personality through responses)

Evolution Mechanics:

  1. Initial persona: Predefined at startup
  2. Memory-driven adaptation: Learns user preferences, adjusts tone
  3. Self-modification: Hex can edit her own personality YAML
  4. Version control: All changes tracked with rollback capability (sketched below)
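
A minimal sketch of steps 3-4, assuming the persona lives in a git-tracked persona.yaml (the file name and function are illustrative, not a fixed API):

import subprocess
import yaml

# Illustrative persona snapshot: write the YAML, then commit it so every
# change is versioned and a plain `git revert` provides rollback.
def save_persona_version(persona: dict, message: str) -> None:
    with open("persona.yaml", "w") as f:
        yaml.safe_dump(persona, f, sort_keys=False)
    subprocess.run(["git", "add", "persona.yaml"], check=True)
    subprocess.run(["git", "commit", "-m", f"persona: {message}"], check=True)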

6. Avatar System

Role: Visual presence and embodied expression

Responsibilities:

  • Load and display VRoid 3D model
  • Synchronize avatar expressions with emotional state
  • Animate blendshapes based on conversation tone
  • Present avatar in Discord calls/streams
  • Desktop app display with smooth animation
  • Support idle animations and personality quirks

Technology Stack:

  • VRoid SDK/VRoid Hub for model loading
  • Babylon.js or Three.js for WebGL rendering
  • VRM format support for avatar rigging
  • Blendshape animation system (facial expressions)
  • Stream integration for Discord presence

Expression Mapping:

Emotional State → Blendshape Values
  Happy: smile intensity 0.8, eye open 1.0
  Sad: frown 0.6, eye closed 0.3
  Mischievous: smirk 0.7, eyebrow raise 0.6
  Tsundere deflection: look away 0.5, cross arms
  Thinking: tilt head, narrow eyes
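
This mapping is naturally expressed as data; a sketch follows, where blendshape names and weights are illustrative (the real names come from the loaded VRM rig):

# Illustrative emotion → blendshape table; keys depend on the VRM model.
EXPRESSION_MAP = {
    "happy":       {"smile": 0.8, "eye_open": 1.0},
    "sad":         {"frown": 0.6, "eye_closed": 0.3},
    "mischievous": {"smirk": 0.7, "eyebrow_raise": 0.6},
    "tsundere":    {"look_away": 0.5, "arms_crossed": 1.0},
    "thinking":    {"head_tilt": 0.4, "eye_narrow": 0.5},
}

def blendshapes_for(emotion: str) -> dict:
    # Unmapped emotions fall back to a neutral face
    return EXPRESSION_MAP.get(emotion, {})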

Key Interfaces:

  • Input: Current mood/emotion from personality engine and response generation
  • Output: Rendered avatar display, Discord stream feed
  • Configuration: VRoid model file, blendshape mapping

Depends On:

  • Personality Engine (for expression determination)
  • LLM Core (for mood inference from responses)
  • Discord Bot (for stream integration)
  • Perception Layer (optional: mirror user expressions)

Desktop Integration:

  • Tray icon with avatar display
  • Always-on-top option for streaming
  • Hotkey bindings for quick access
  • Smooth transitions between states

7. Self-Modification System

Role: Capability progression and autonomous self-improvement

Responsibilities:

  • Generate code modifications based on user needs
  • Validate code before applying (no unsafe operations)
  • Test changes in sandbox environment
  • Apply approved changes with rollback capability
  • Track capability progression (gamified leveling)
  • Update personality to reflect new capabilities
  • Maintain code quality and consistency

Technology Stack:

  • Python AST analysis for code safety
  • Sandbox environment: RestrictedPython for constrained execution; pydantic for validating structured change requests
  • Git for version control and rollback
  • Unit tests for validation
  • Code review interface (user approval required)

Self-Modification Flow:

User Request
    ↓
Hex Proposes Change → "I think I should be able to..."
    ↓
Code Generation (LLM) → Generate Python code
    ↓
Static Analysis → Check for unsafe operations
    ↓
User Approval → "Yes/No"
    ↓
Sandbox Test → Verify functionality
    ↓
Git Commit → Version the change
    ↓
Apply to Runtime → Hot reload if possible
    ↓
Personality Update → "I learned something new!"
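
A minimal sketch of the static-analysis step using the standard-library ast module; the deny-lists are illustrative only (a production gate would be allow-list based and far stricter):

import ast

# Illustrative deny-lists; not exhaustive.
BLOCKED_IMPORTS = {"os", "subprocess", "socket", "shutil"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}

def passes_static_analysis(source: str) -> bool:
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Reject forbidden module imports
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            if BLOCKED_IMPORTS.intersection(names):
                return False
        # Reject direct calls to dangerous builtins
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                return False
    return True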

Capability Progression:

Level 1: Persona editing (YAML changes only)
Level 2: Memory and user context (read operations)
Level 3: Response filtering and moderation
Level 4: Custom commands and helper functions
Level 5: Integration modifications (Discord features)
Level 6: Core system changes (with strong restrictions)

Safety Constraints:

  • No network access beyond Discord API
  • No file operations outside designated directories
  • No execution of untrusted code
  • No modification of core systems without approval
  • All changes are reversible within a 24-hour rollback window

Key Interfaces:

  • Input: User requests, LLM-generated code
  • Output: Approved changes, personality updates, capability announcements
  • Audit: Full change history with diffs

Depends On:

  • LLM Core (generates code)
  • Memory System (tracks capability history)
  • Personality Engine (updates with new abilities)

Data Flow Architecture

Primary Response Generation Pipeline

┌─────────────────────────────────────────────────────────────────┐
│ User Input (Discord Text/Voice/Presence)                        │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │  Message Received    │
              │  (Discord Bot)       │
              └────────────┬─────────┘
                           │
              ┌────────────▼──────────────┐
              │ Context Gathering Phase   │
              └────────────┬──────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
    ┌───▼────┐         ┌───▼────┐        ┌───▼────┐
    │ Memory │         │Persona │        │ Current│
    │ Recall │         │ Lookup │        │Context │
    │(Recent)│         │        │        │(Percep)│
    └───┬────┘         └───┬────┘        └───┬────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
                    ┌──────▼──────┐
                    │ Assemble    │
                    │ LLM Prompt  │
                    │ with        │
                    │ [Persona]   │
                    │ [Memory]    │
                    │ [Context]   │
                    └──────┬──────┘
                           │
              ┌────────────▼──────────────┐
              │  LLM Generation (1-3s)    │
              │  "What would Hex say?"    │
              └────────────┬──────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
    ┌───▼────┐         ┌───▼────┐        ┌───▼────┐
    │  Text  │         │  Voice │        │ Avatar │
    │Response│         │  TTS   │        │Animate │
    └────────┘         └────────┘        └────────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
                    ┌──────▼────────┐
                    │ Send Response │
                    │ (Multi-modal) │
                     └──────┬────────┘
                           │
              ┌────────────▼──────────────┐
              │ Memory Update Phase       │
              │ - Log interaction         │
              │ - Update embeddings       │
              │ - Learn user patterns     │
              │ - Adjust relationship     │
              └───────────────────────────┘

Timeline: Message received → Response sent = ~2-4 seconds (LLM dominant)


Memory and Learning Update Flow

┌────────────────────────────────────┐
│ Interaction Occurs                 │
│ (Text, voice, perception, action)  │
└────────────┬───────────────────────┘
             │
    ┌────────▼─────────┐
    │ Extract Features │
    │ - Sentiment      │
    │ - Topics         │
    │ - Emotional cues │
    │ - Factual claims │
    └────────┬─────────┘
             │
    ┌────────▼──────────────┐
    │ Store Conversation    │
    │ - SQLite entry        │
    │ - Generate embeddings │
    │ - Tag and index       │
    └────────┬──────────────┘
             │
    ┌────────▼─────────────────────┐
    │ Update User Profile          │
    │ - Learned facts              │
    │ - Preference updates         │
    │ - Emotional baseline shifts  │
    │ - Relationship progression   │
    └────────┬─────────────────────┘
             │
    ┌────────▼──────────────────┐
    │ Personality Adaptation    │
    │ - Adjust tone for user    │
    │ - Create inside jokes     │
    │ - Customize responses     │
    └────────┬──────────────────┘
             │
    ┌────────▼────────────┐
    │ Commit to Disk      │
    │ - Backup vector DB  │
    │ - Archive old data  │
    │ - Version snapshot  │
    └─────────────────────┘

Frequency: Real-time on message reception, batched commits every 5 minutes
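
A sketch of that cadence: interactions buffer in memory and a background task flushes them every five minutes (assumes an asyncio loop, as with discord.py; commit_batch is a hypothetical bulk write):

import asyncio

write_buffer: list[dict] = []

def log_interaction(interaction: dict) -> None:
    write_buffer.append(interaction)  # cheap in-memory append on the hot path

async def flush_loop(memory_system) -> None:
    while True:
        await asyncio.sleep(300)  # 5-minute batch window
        batch, write_buffer[:] = write_buffer[:], []
        if batch:
            memory_system.commit_batch(batch)  # hypothetical bulk write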


Self-Modification Proposal and Approval

┌──────────────────────────────────┐
│ User Request for New Capability  │
│ "Hex, can you do X?"             │
└────────────┬─────────────────────┘
             │
    ┌────────▼─────────────────────────┐
    │ Hex Evaluates Feasibility        │
    │ (LLM reasoning)                  │
    └────────┬─────────────────────────┘
             │
    ┌────────▼─────────────────────────┐
    │ Proposal Generation              │
    │ Hex: "I think I should..."       │
    │ *explains approach in voice*     │
    └────────┬─────────────────────────┘
             │
    ┌────────▼─────────────────────────┐
    │ User Accepts or Rejects          │
    └────────┬─────────────────────────┘
             │ (Accepted)
    ┌────────▼─────────────────────────┐
    │ Code Generation Phase            │
    │ LLM generates Python code        │
    │ + docstrings + type hints        │
    └────────┬─────────────────────────┘
             │
    ┌────────▼─────────────────────────┐
    │ Static Analysis Validation       │
    │ - AST parsing for safety         │
    │ - Check restricted operations    │
    │ - Verify dependencies exist      │
    └────────┬─────────────────────────┘
             │ (Pass)
    ┌────────▼─────────────────────────┐
    │ Sandbox Testing                  │
    │ - Run tests in isolated env      │
    │ - Check for crashes              │
    │ - Verify integration points      │
    └────────┬─────────────────────────┘
             │ (Pass)
    ┌────────▼─────────────────────────┐
    │ User Final Review                │
    │ Review code + test results       │
    └────────┬─────────────────────────┘
             │ (Approved)
    ┌────────▼─────────────────────────┐
    │ Git Commit                       │
    │ - Record change history          │
    │ - Tag with timestamp             │
    │ - Save diff for rollback         │
    └────────┬─────────────────────────┘
             │
    ┌────────▼─────────────────────────┐
    │ Apply to Runtime                 │
    │ - Hot reload if possible         │
    │ - Or restart on next cycle       │
    └────────┬─────────────────────────┘
             │
    ┌────────▼─────────────────────────┐
    │ Personality Update               │
    │ Hex: "I learned to..."           │
    │ + update capability YAML         │
    └──────────────────────────────────┘

Timeline: Proposal → Deployment = 5-30 seconds of system time (plus however long the user takes to review and approve)


Build Order and Dependencies

Phase 1: Foundation (Weeks 1-2)

Goal: Core interaction loop working locally

Components to Build:

  1. Discord bot skeleton with message handling
  2. Local LLM integration (ollama/vLLM + Mistral 7B)
  3. Basic memory system (SQLite conversation storage)
  4. Simple persona injection (YAML config)
  5. Response generation pipeline

Outcomes:

  • Hex responds to Discord messages with personality
  • Conversations are logged and retrievable
  • Persona can be edited via YAML

Key Milestone: "Hex talks back"

Dependencies:

  • discord.py, ollama, sqlite3, pyyaml
  • Local LLM model weights
  • Discord bot token

Phase 2: Personality & Memory (Weeks 3-4)

Goal: Hex feels like a person who remembers you

Components to Build:

  1. Vector database for semantic memory (Chroma)
  2. Memory-aware context injection
  3. User relationship tracking (profiles)
  4. Emotional awareness from text sentiment
  5. Persona version control (git-based)
  6. Kid-mode detection

Outcomes:

  • Hex remembers facts about you
  • Responses reference past conversations
  • Personality adapts to your preferences
  • Child safety filters activate automatically

Key Milestone: "Hex remembers me"

Dependencies:

  • Phase 1 complete
  • Vector embeddings model (all-MiniLM)
  • A local sentiment model (e.g., a Hugging Face transformers sentiment-analysis pipeline)

Phase 3: Multimodal Input (Weeks 5-6)

Goal: Hex sees and hears you

Components to Build:

  1. Webcam integration with OpenCV
  2. Face detection and emotion recognition
  3. Local Whisper for voice input
  4. Perception context aggregation
  5. Context-aware response injection
  6. Screen capture for activity awareness

Outcomes:

  • Hex reacts to your facial expressions
  • Voice input works in Discord calls
  • Responses reference your current mood/activity
  • Privacy: All local, no external transmission

Key Milestone: "Hex sees me"

Dependencies:

  • Phase 1-2 complete
  • OpenCV, MediaPipe, Whisper
  • Local emotion model

Phase 4: Avatar & Presence (Weeks 7-8)

Goal: Hex has a visual body and presence

Components to Build:

  1. VRoid model loading and display
  2. Blendshape animation system
  3. Desktop app skeleton (Tkinter or PyQt)
  4. Discord stream integration
  5. Expression mapping (emotion → blendshapes)
  6. Idle animations and personality quirks

Outcomes:

  • Avatar appears in Discord calls
  • Expressions sync with responses
  • Desktop app shows animated avatar
  • Visual feedback for emotional state

Key Milestone: "Hex has a face"

Dependencies:

  • Phase 1-3 complete
  • VRoid SDK, Babylon.js or Three.js
  • VRM avatar model files

Phase 5: Autonomy & Self-Modification (Weeks 9-10)

Goal: Hex can modify her own code

Components to Build:

  1. Code generation module (LLM-based)
  2. Static code analysis and safety validation
  3. Sandbox testing environment
  4. Git-based change tracking
  5. Hot reload capability
  6. Rollback system with 24-hour window
  7. Capability progression (leveling system)

Outcomes:

  • Hex can propose and apply code changes
  • User maintains veto power
  • All changes are versioned and reversible
  • New capabilities unlock as relationships deepen

Key Milestone: "Hex can improve herself"

Dependencies:

  • Phase 1-4 complete
  • Git, RestrictedPython, ast module
  • Testing framework

Phase 6: Polish & Integration (Weeks 11-12)

Goal: All systems integrated and optimized

Components to Build:

  1. Performance optimization (caching, batching)
  2. Error handling and graceful degradation
  3. Logging and telemetry
  4. Configuration management
  5. Auto-update capability
  6. Integration testing (all components together)
  7. Documentation and guides

Outcomes:

  • System stable for extended use
  • Responsive even under load
  • Clear error messages
  • Easy to deploy and configure

Key Milestone: "Hex is ready to ship"

Dependencies:

  • Phase 1-5 complete
  • All edge cases tested

Dependency Graph Summary

Phase 1 (Foundation)
    ↓
Phase 2 (Memory) ← depends on Phase 1
    ↓
Phase 3 (Perception) ← depends on Phase 1-2
    ↓
Phase 4 (Avatar) ← depends on Phase 1-3
    ↓
Phase 5 (Self-Modification) ← depends on Phase 1-4
    ↓
Phase 6 (Polish) ← depends on Phase 1-5

Critical Path: Foundation → Memory → Perception → Avatar → Self-Mod → Polish


Integration Architecture

System Interconnection Diagram

┌───────────────────────────────────────────────────────────────────┐
│                    Discord Bot Layer                              │
│              (Event dispatcher, message handler)                  │
└────────┬────────────────────────────────────────────┬─────────────┘
         │                                            │
         │                                    ┌───────▼────────┐
         │                                    │ Voice Input    │
         │                                    │ (Whisper STT)  │
         │                                    └────────────────┘
         │
    ┌────▼─────────────────────────────────────────────────────────┐
    │                    Context Assembly Layer                    │
    │                                                              │
    │  ┌─────────────────────────────────────────────────────┐     │
    │  │ Retrieval Augmented Generation (RAG) Pipeline       │     │
    │  └─────────────────────────────────────────────────────┘     │
    │                                                              │
    │  Input Components:                                           │
    │  ├─ Recent Conversation (last 20 messages)                   │
    │  ├─ User Profile (learned facts)                             │
    │  ├─ Relationship State (history + emotional baseline)        │
    │  ├─ Current Perception (mood, activity, environment)         │
    │  └─ Personality Context (YAML + version)                     │
    └────┬─────────────────────────────────────────────────────────┘
         │
         ├──────────────┬──────────────┬──────────────┐
         │              │              │              │
     ┌───▼────┐   ┌─────▼─────┐   ┌────▼─────┐   ┌────▼─────┐
     │ Memory │   │Personality│   │Perception│   │ Discord  │
     │ System │   │  Engine   │   │  Layer   │   │ Context  │
     │        │   │           │   │          │   │          │
     │ SQLite │   │ YAML +    │   │ OpenCV   │   │ Channel  │
     │ Chroma │   │ Version   │   │ Whisper  │   │ User     │
     │        │   │ Control   │   │ Emotion  │   │ Status   │
     └───┬────┘   └─────┬─────┘   └────┬─────┘   └────┬─────┘
         │              │              │              │
         └──────────────┼──────────────┼──────────────┘
                        │
                  ┌─────▼──────────────────┐
                  │ LLM Core               │
                  │ (Local Mistral/Llama)  │
                  │                        │
                  │ System Prompt:         │
                  │ [Persona] +            │
                  │ [Memory Context] +     │
                  │ [User State] +         │
                  │ [Current Context]      │
                  └─────┬──────────────────┘
                        │
        ┌───────────────┼───────────────┐
        │               │               │
    ┌───▼────┐  ┌──────▼─────┐  ┌──────▼──┐
    │  Text  │  │ Voice TTS  │  │  Avatar │
    │Response│  │ Generation │  │Animation│
    │        │  │            │  │         │
    │ Send   │  │ Tacotron   │  │ VRoid   │
    │ to     │  │ + Vocoder  │  │ Anim    │
    │Discord │  │            │  │         │
    └────────┘  └────────────┘  └─────────┘
        │               │               │
        └───────────────┼───────────────┘
                        │
                  ┌─────▼──────────────┐
                  │ Response Commit    │
                  │                    │
                  │ ├─ Store in Memory │
                  │ ├─ Update Profile  │
                  │ ├─ Learn Patterns  │
                  │ └─ Adapt Persona   │
                  └────────────────────┘

Key Integration Points

1. Discord ↔ LLM Core

Interface: Message + Context → Response

# Pseudo-code flow (async, matching discord.py's event model)
async def on_message(message):
    # Assemble context inline: persona, history, profile, perception
    response = await llm_core.generate(
        user_message=message.content,
        personality=personality_engine.current_persona(),
        history=memory_system.get_conversation(message.author.id, limit=20),
        user_profile=memory_system.get_user_profile(message.author.id),
        current_perception=perception_layer.get_current_state(),
    )
    await message.channel.send(response)

Latency Budget:

  • Context retrieval: 100ms
  • LLM generation: 2-3 seconds
  • Response send: 100ms
  • Total: 2.2-3.2 seconds (acceptable for conversational UX)

2. Memory System ↔ Personality Engine

Interface: Learning → Relationship Adaptation

# After every interaction
interaction = parse_message_event(message)
memory_system.log_conversation(interaction)

# Learn from interaction
new_facts = extract_facts(interaction.content)
memory_system.update_user_profile(interaction.user_id, new_facts)

# Adapt personality based on user
user_profile = memory_system.get_user_profile(interaction.user_id)
personality_engine.adapt_to_user(user_profile)

# If major relationship shift, update YAML
if user_profile.relationship_level_changed:
    personality_engine.save_persona_version()

Update Frequency: Real-time with batched commits every 5 minutes


3. Perception Layer ↔ Response Generation

Interface: Context Injection

# In context assembly
current_perception = perception_layer.get_state()

# Inject into system prompt
if current_perception.emotion == "sad":
    system_prompt += "\n[User appears sad. Respond with support and comfort.]"

if current_perception.is_kid_mode:
    system_prompt += "\n[Kid safety mode active. Filter for age-appropriate content.]"

if current_perception.detected_activity == "gaming":
    system_prompt += "\n[User is gaming. Comment on gameplay if relevant.]"

Synchronization: 1-5 second update intervals (perception → LLM context)


4. Avatar System ↔ All Systems

Interface: Emotional State → Visual Expression

# Avatar driven by multiple sources
emotion_from_response = infer_emotion(llm_response)
mood_from_perception = perception_layer.get_mood()
persona_expression = personality_engine.get_current_expression()

blendshape_values = combine_expressions(
    emotion=emotion_from_response,
    mood=mood_from_perception,
    personality=persona_expression
)

avatar_system.animate(blendshape_values)

Synchronization: Real-time, driven by response generation and perception updates


5. Self-Modification System ↔ Core Systems

Interface: Code Change → Runtime Update + Personality

# Self-modification flow (gates enforced in order; nothing runs un-approved)
proposal = self_mod_system.generate_proposal(user_request)
code = self_mod_system.generate_code(proposal)

# Static analysis and sandbox test must both pass
if not self_mod_system.passes_static_analysis(code):
    raise UnsafeChangeError(proposal)
test_result = self_mod_system.test_in_sandbox(code)

# User approval is the final gate before anything is committed
if test_result.passed and user_approves(code, test_result):
    git_hash = self_mod_system.commit_change(code)

    # Update personality to reflect new capability
    personality_engine.add_capability(proposal.feature_name)
    personality_engine.save_persona_version()

    # Hot reload if possible, else apply on restart
    apply_change_to_runtime(code)

Safety Boundary:

  • LLM can generate proposals
  • Only user-approved code runs
  • All changes reversible within 24 hours

Synchronization and Consistency Model

State Consistency Across Components

Challenge: Multiple systems need consistent view of personality, memory, and user state

Solution: Event-driven architecture with eventual consistency

┌─────────────────┐
│ Event Stream    │
│ (In-memory      │
│  message queue) │
└────────┬────────┘
         │
    ┌────┴──────────────────────────┐
    │                               │
    │ Subscribers:                  │
    │ ├─ Memory System              │
    │ ├─ Personality Engine         │
    │ ├─ Avatar System              │
    │ ├─ Discord Bot                │
    │ └─ Metrics/Logging            │
    │                               │
    │ Event Types:                  │
    │ ├─ UserMessageReceived        │
    │ ├─ ResponseGenerated          │
    │ ├─ PerceptionUpdated          │
    │ ├─ PersonalityModified        │
    │ ├─ CodeChangeApplied          │
    │ └─ MemoryLearned              │
    │                               │
    └───────────────────────────────┘
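
A minimal in-process sketch of this bus; subscribers register per event type and are awaited in turn (the handler names in the usage sketch are hypothetical async methods):

from collections import defaultdict
from typing import Any, Callable

class EventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    async def publish(self, event_type: str, payload: Any) -> None:
        # Every subscriber sees the event, in registration order
        for handler in self._subscribers[event_type]:
            await handler(payload)

# Usage sketch (handlers are hypothetical)
bus = EventBus()
bus.subscribe("UserMessageReceived", memory_system.on_message)
bus.subscribe("UserMessageReceived", personality_engine.on_message)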

Consistency Guarantees:

  • Memory updates are durably stored within 5 minutes
  • Personality snapshots versioned on every change
  • Discord delivery relies on discord.py's rate-limit handling and automatic reconnection
  • Perception updates are idempotent (can be reapplied without side effects)

Known Challenges and Solutions

1. Latency with Local LLM

Challenge: Waiting 2-3 seconds for response feels slow

Solutions:

  • Immediate visual feedback (typing indicator, avatar animation; sketched below)
  • Streaming responses (show text as it generates)
  • Precompute heavy work (embedding updates, summarization) during quiet hours
  • GPU acceleration where possible
  • Model optimization (quantization, pruning)
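
A sketch of the first mitigation using discord.py's typing indicator, so the 2-3 second generation reads as "Hex is typing…" rather than dead air (llm_core.generate is the hypothetical interface used throughout):

# Show a typing indicator for the duration of local LLM generation.
async def respond(message):
    async with message.channel.typing():
        reply = await llm_core.generate(user_message=message.content)
    await message.channel.send(reply)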

2. Personality Consistency During Evolution

Challenge: Hex changes as she learns, but must feel like the same person

Solutions:

  • Gradual adaptation (personality changes in YAML, not discrete jumps)
  • Memory-driven consistency (personality adapts to learned facts)
  • Version control (can rollback if she becomes unrecognizable)
  • User feedback loop (user can reset or modify personality)
  • Core values remain constant (tsundere nature, care underneath)

3. Memory Scaling as History Grows

Challenge: Retrieving relevant context from thousands of conversations

Solutions:

  • Vector database for semantic search (sub-500ms; sketched below)
  • Hierarchical memory (recent → summarized old)
  • Automatic archival (monthly snapshots, prune oldest)
  • Importance tagging (weight important conversations higher)
  • Incremental updates (don't recalculate everything)
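
A sketch of the semantic-search path with Chroma, which the stack already names (the collection name and metadata filter are illustrative):

import chromadb

# Persistent local vector store; Chroma embeds query_texts with its
# default embedding function unless one is configured.
client = chromadb.PersistentClient(path="./hex_vectors")
collection = client.get_or_create_collection("conversations")

def recall(query: str, user_id: str, n: int = 5) -> list[str]:
    results = collection.query(
        query_texts=[query],
        n_results=n,
        where={"user_id": user_id},  # illustrative metadata filter
    )
    return results["documents"][0]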

4. Safe Code Generation and Sandboxing

Challenge: Hex generates code, but must never break the system

Solutions:

  • Static analysis (AST parsing for forbidden operations)
  • Capability-based progression (limited API at first)
  • Sandboxed testing before deployment
  • User approval gate (user reviews all code)
  • Version control + rollback window (24-hour window)
  • Whitelist of safe operations (growing list as trust builds)

5. Privacy and Local-First Architecture

Challenge: Maintaining privacy while having useful context

Solutions:

  • All ML inference runs locally (no cloud submission)
  • No external API calls except Discord
  • Encrypted local storage for memories
  • User can opt-out of any perception module
  • Transparent logging (user can audit what's stored)

6. Multimodal Synchronization

Challenge: Webcam, voice, text, screen all need to inform response

Solutions:

  • Asynchronous processing (don't wait for all inputs)
  • Highest-priority input wins (voice > perception > text)
  • Graceful degradation (works without any modality)
  • Caching (reuse recent perception for repeated queries)

Scaling Considerations

Single-User (v1)

  • Architecture designed for one person + their kids
  • Local compute, no multi-user concerns
  • Personality is singular (one Hex)

Multi-Device (v1.5)

  • Same personality and memory sync across devices
  • Discord as primary, desktop app as secondary
  • Cloud sync optional (local-first default)

Android Support (v2)

  • Memory and personality sync to mobile
  • Lightweight inference on Android (quantized model)
  • Fallback to cloud inference if needed
  • Same core architecture, different UIs

Potential Scaling Patterns

Single User (Current)
├─ One Hex instance
├─ All local compute
└─ SQLite + Vector DB

Multi-Device Sync (v1.5)
├─ Central SQLite + Vector DB on primary machine
├─ Sync service between devices
└─ Same personality, distributed memory

Multi-Companion (Potential v3)
├─ Multiple Hex instances (per family member)
├─ Shared memory system (family history)
├─ Individual personalities
└─ Potential distributed compute (each on own device)

Performance Bottlenecks to Monitor

  1. LLM Inference: Becomes slower as context window grows

    • Solution: Context summarization, hierarchical retrieval
  2. Vector DB Lookups: Scales with conversation history

    • Solution: Incremental indexing, approximate search (HNSW)
  3. Perception Processing: CPU/GPU bound

    • Solution: Frame skipping, model optimization, dedicated thread
  4. Discord Bot Responsiveness: Limited by gateway connections

    • Solution: Sharding (if needed), efficient message queuing

Technology Stack Summary

Component               Technology                     Rationale
Discord Bot             discord.py                     Fast, well-supported, async-native
LLM Inference           Mistral 7B + ollama/vLLM       Local-first, good quality/speed tradeoff
Memory (Conversations)  SQLite                         Reliable, local, fast queries
Memory (Semantic)       Chroma or Milvus               Local vector DB, easy to manage
Embeddings              all-MiniLM-L6-v2               Fast, good quality, local
Face Detection          MediaPipe                      Accurate, fast, local
Emotion Recognition     FER-2013-trained local model   Local, privacy-preserving
Speech-to-Text          Whisper                        Local, accurate, multilingual
Text-to-Speech          Tacotron 2 + vocoder           Local, controllable
Avatar                  VRoid SDK + Babylon.js         Standards-based, extensible
Code Safety             RestrictedPython + ast         Local analysis, sandboxing
Version Control         Git                            Change tracking, rollback
Desktop UI              Tkinter or PyQt                Lightweight, cross-platform
Testing                 pytest + unittest              Standard Python testing
Logging                 logging + Sentry (optional)    Local-first with cloud fallback

Deployment Architecture

Local Development

Developer Machine
├── Discord Token (env var)
├── Hex codebase (git)
├── Local LLM (ollama)
├── SQLite (file-based)
├── Vector DB (Chroma, embedded)
└── Webcam / Screen capture (live)

Production Deployment

Deployed Machine (Windows/WSL)
├── Discord Token (secure storage)
├── Hex codebase (from git)
├── Local LLM service (ollama/vLLM)
├── SQLite (persistent, backed up)
├── Vector DB (persistent, backed up)
├── Desktop app (tray icon)
├── Auto-updater (pulls from git)
└── Logging (local + optional cloud)

Update Strategy

  • Git pull for code updates
  • Automatic model updates (LLM weights)
  • Zero-downtime restart (graceful shutdown)
  • Rollback capability (version pinning)

Quality Assurance

Key Metrics to Track

Responsiveness:

  • Response latency: Target <3 seconds
  • Perception update latency: <500ms
  • Memory lookup latency: <100ms

Reliability:

  • Uptime: >99% for core bot
  • Message delivery: >99.9%
  • Memory integrity: No data loss on crash

Personality Consistency:

  • User perception: "Feels like the same person"
  • Tone consistency: Personality rules enforced
  • Learning progress: Measurable improvement in personalization

Safety:

  • No crashes from invalid input
  • No hallucinated moderation actions (Hex never claims an action she did not take)
  • Safe code generation (0 unauthorized executions)

Testing Strategy

Unit Tests
├─ Memory operations (CRUD)
├─ Perception processing
├─ Code validation
├─ Personality rule application
└─ Response filtering

Integration Tests
├─ Discord message → LLM → Response
├─ Context assembly pipeline
├─ Avatar expression sync
├─ Self-modification flow
└─ Multi-component scenarios

End-to-End Tests
├─ Full conversation with personality
├─ Perception-aware responses
├─ Memory learning and retrieval
├─ Code generation and deployment
└─ Edge cases (bad input, crashes, recovery)

Manual UAT
├─ Conversational feel (does she feel like a person?)
├─ Personality consistency (still Hex?)
├─ Safety compliance (kid-mode works?)
├─ Performance (under load?)
└─ All features working together?

Conclusion

Hex's architecture prioritizes personality coherence and genuine relationship over feature breadth. The system is designed as a pipeline from perception → memory → personality → response generation, with feedback loops that allow her to learn and evolve.

The modular design enables incremental development (Phase 1-6), with each phase adding capability while maintaining system stability. The self-modification system enables genuine autonomy within safety boundaries, and the local-first approach ensures privacy and independence.

Critical success factors:

  1. LLM latency acceptable (<3s)
  2. Personality consistency maintained across updates
  3. Memory system scales with history
  4. Self-modification is safe and reversible
  5. All components feel integrated (not separate features)

This architecture serves the core value: making Hex feel like a person who genuinely cares about you.


Document Version: 1.0
Last Updated: 2026-01-27
Status: Ready for Phase 1 Development