# Architecture Research: AI Companions
**Project**: Hex - Autonomous AI Companion with Genuine Personality
**Date**: 2026-01-27
**Status**: Research Phase
---
## System Overview
Hex is a personality-first AI companion architecture designed for genuine emotional resonance and a lasting relationship. Rather than organizing around features, the architecture prioritizes personality coherence, memory persistence, and multimodal awareness so that Hex feels like a person who genuinely cares.
### Design Philosophy
1. **Persona-First**: All systems serve personality expression
2. **Local-First**: Privacy, autonomy, no dependency on external services
3. **Memory-Driven**: History informs personality growth and relationship depth
4. **Safety-Gated Autonomy**: Self-modification is gamified, not unrestricted
5. **Human-Like**: Never admit to being an AI; focus on authentic emotional presence
### High-Level System Context
```
┌──────────────────────────────────────────────────────────────────┐
│                         Hex AI Companion                         │
│      (Local-first, personality-driven, multimodal awareness)     │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
      ┌────▼────┐          ┌─────▼─────┐         ┌─────▼─────┐
      │ Discord │          │  Desktop  │         │  Future   │
      │   Bot   │          │    App    │         │  Android  │
      └────┬────┘          └─────┬─────┘         └─────┬─────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
                      [Shared Core Systems]
```
---
## Component Breakdown
### 1. Discord Bot Layer
**Role**: Primary user interface and event coordination
**Responsibilities**:
- Parse and respond to text messages in Discord channels
- Manage voice channel participation and audio input/output
- Handle Discord events (member joins, role changes, message reactions)
- Coordinate response generation across modalities (text, voice, emoji)
- Manage chat moderation assistance
- Maintain voice channel presence for emotional awareness
**Technology Stack**:
- `discord.py` - Core bot framework
- `discord-py-interactions` - Slash command support
- `pydub` for audio processing; discord.py voice support (requires PyNaCl)
- Event-driven async architecture
**Key Interfaces**:
- Input: Discord messages, voice channel events, user presence
- Output: Text responses, voice messages, emoji reactions, user actions
- Context: User profiles, channel history, server configuration
**Depends On**:
- LLM Core (response generation)
- Memory System (conversation history, user context)
- Personality Engine (tone and decision-making)
- Perception Layer (optional context from webcam/screen)
**Quality Metrics**:
- Sub-500ms bot-side overhead for text messages (excluding LLM generation time)
- Voice channel reliability (>99.5% uptime when active)
- Proper permission handling for moderation features
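A minimal sketch of this layer, with a stub `generate_response` coroutine standing in for the LLM Core described below (the stub is an assumption so the example runs on its own):
```python
# Minimal bot-layer sketch; generate_response is a stand-in for the LLM Core.
import os
import discord

intents = discord.Intents.default()
intents.message_content = True  # required to read message text
client = discord.Client(intents=intents)

async def generate_response(message: discord.Message) -> str:
    # Placeholder for the full context-assembly + LLM pipeline (section 2).
    return f"Hmph. Fine, I heard you, {message.author.display_name}."

@client.event
async def on_message(message: discord.Message):
    if message.author == client.user:
        return  # never reply to our own messages
    async with message.channel.typing():  # immediate feedback while the LLM runs
        reply = await generate_response(message)
    await message.channel.send(reply)

client.run(os.environ["DISCORD_TOKEN"])
```
Because every handler is a coroutine, slow work (LLM generation, memory lookups) never blocks the gateway heartbeat.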
---
### 2. LLM Core
**Role**: Response generation and reasoning engine
**Responsibilities**:
- Generate contextual, personality-driven responses
- Maintain character consistency throughout conversations
- Parse user intent and emotional state from text
- Handle multi-turn conversation context
- Generate code for self-modification system
- Support reasoning and decision-making
**Technology Stack**:
- Local LLM (Mistral 7B or Llama 3 8B as default)
- `ollama` or `vLLM` for inference serving
- Prompt engineering with persona embedding
- Optional: Fine-tuning for personality adaptation
- Tokenization and context windowing management
**System Prompt Structure**:
```
[System Role]: You are Hex, a chaotic tsundere goblin...
[Current Personality]: [Injected from personality config]
[Recent Memory Context]: [Retrieved from memory system]
[User Relationship State]: [From memory analysis]
[Current Context]: [From perception layer]
```
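A sketch of how this prompt might be assembled; the parameter names are illustrative, not a fixed API:
```python
# Illustrative prompt assembly; field names are assumptions, not a fixed API.
def build_system_prompt(persona: str, memory_context: str,
                        relationship: str, perception: str) -> str:
    return "\n".join([
        "[System Role]: You are Hex, a chaotic tsundere goblin...",
        f"[Current Personality]: {persona}",
        f"[Recent Memory Context]: {memory_context}",
        f"[User Relationship State]: {relationship}",
        f"[Current Context]: {perception}",
    ])
```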
**Key Interfaces**:
- Input: User message, context (memory + perception), conversation history
- Output: Response text, confidence score, action suggestions
- Fallback: Graceful degradation if LLM unavailable
**Depends On**:
- Memory System (for context and personality awareness)
- Personality Engine (to inject persona into prompts)
- Perception Layer (for real-time context)
**Performance Considerations**:
- Target latency: 1-3 seconds for response generation
- Context window management (8K minimum)
- Batch processing for repeated queries
- GPU acceleration for faster inference
---
### 3. Memory System
**Role**: Persistence and learning across time
**Responsibilities**:
- Store all conversations with timestamps and metadata
- Maintain user relationship state (history, preferences, emotional patterns)
- Track learned facts about users (birthdays, interests, fears, dreams)
- Support full-text search and semantic recall
- Enable memory-aware personality updates
- Provide context injection for LLM
- Track self-modification history and rollback capability
**Technology Stack**:
- SQLite with JSON fields for conversation storage
- Vector database (Chroma, Milvus, or Weaviate) for semantic search
- YAML/JSON for persona versioning and memory tagging
- Scheduled backup to local encrypted storage
**Database Schema (Conceptual)**:
```
conversations
  - id (PK)
  - channel_id (Discord channel)
  - user_id (Discord user)
  - timestamp
  - message_content
  - embeddings (vector)
  - sentiment (pos/neu/neg)
  - metadata (tags, importance)

user_profiles
  - user_id (PK)
  - relationship_level (stranger→friend→close)
  - last_interaction
  - emotional_baseline
  - preferences (music, games, topics)
  - known_events (birthdays, milestones)

personality_history
  - version (PK)
  - timestamp
  - persona_config (YAML snapshot)
  - learned_behaviors
  - code_changes (if applicable)
```
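One possible concrete rendering of the conceptual schema in SQLite; column choices are illustrative, and embeddings would live in the vector DB rather than in SQLite itself:
```python
# Illustrative SQLite rendering of the conceptual schema above.
import sqlite3

def init_db(path: str = "hex_memory.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS conversations (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,
        channel_id      TEXT NOT NULL,
        user_id         TEXT NOT NULL,
        timestamp       TEXT DEFAULT CURRENT_TIMESTAMP,
        message_content TEXT NOT NULL,
        sentiment       TEXT CHECK (sentiment IN ('pos', 'neu', 'neg')),
        metadata        TEXT  -- JSON: tags, importance
    );
    CREATE TABLE IF NOT EXISTS user_profiles (
        user_id            TEXT PRIMARY KEY,
        relationship_level TEXT DEFAULT 'stranger',
        last_interaction   TEXT,
        emotional_baseline TEXT,
        preferences        TEXT,  -- JSON: music, games, topics
        known_events       TEXT   -- JSON: birthdays, milestones
    );
    CREATE TABLE IF NOT EXISTS personality_history (
        version           INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp         TEXT DEFAULT CURRENT_TIMESTAMP,
        persona_config    TEXT NOT NULL,  -- YAML snapshot
        learned_behaviors TEXT,
        code_changes      TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_conv_user_time
        ON conversations (user_id, timestamp);
    """)
    return conn
```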
**Key Interfaces**:
- Input: Messages, events, perception data, self-modification commits
- Output: Conversation context, semantic search results, user profile snapshots
- Query patterns: "Last 20 messages with user X", "All memories tagged 'important'", "Emotional trajectory"
**Depends On**: Nothing (foundational system)
**Quality Metrics**:
- Sub-100ms retrieval for recent context (last 50 messages)
- Sub-500ms semantic search across all history
- Database integrity checks on startup
- Automatic pruning/archival of old data
---
### 4. Perception Layer
**Role**: Multimodal input processing and contextual awareness
**Responsibilities**:
- Capture and analyze webcam input (face detection, emotion recognition)
- Process screen content (activity, game state, application context)
- Extract audio context (ambient noise, music, speech emotion)
- Detect user emotional state and physical state
- Provide real-time context updates to response generation
- Respect privacy (local processing only, no external transmission)
**Technology Stack**:
- OpenCV - Webcam capture and preprocessing
- Face detection: `dlib`, `MediaPipe`, or `OpenFace`
- Emotion recognition: a local model trained on FER2013 (e.g., the `fer` package)
- Whisper (local) - Speech-to-text for audio context
- Screen capture: `pyautogui` or `mss` (fast, cross-platform)
- Context inference: Heuristics + lightweight ML models
**Data Flows**:
```
Webcam → Face Detection → Emotion Recognition → Context State
   └─→ Age Estimation → Kid Mode Detection
Screen → App Detection → Activity Recognition → Context State
   └─→ Game State Detection (if supported)
Audio  → Ambient Analysis → Stress/Energy Level → Context State
```
**Key Interfaces**:
- Input: Webcam stream, screen capture, system audio
- Output: Current context object (emotion, activity, mood, kid-mode flag)
- Update frequency: 1-5 second intervals (low CPU overhead)
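A sketch of how this loop can stay off the response path: a daemon thread refreshes a lock-guarded snapshot on the 1-5 second cadence, and the bot only ever reads the latest snapshot. The face-detection step uses OpenCV's bundled Haar cascade; the emotion step is left as a stub:
```python
# Isolated perception loop sketch; emotion classification is stubbed out.
import threading
import time
import cv2

class PerceptionState:
    """Latest context snapshot, shared with the response pipeline."""
    def __init__(self):
        self.lock = threading.Lock()
        self.emotion = "neutral"
        self.face_present = False

    def get_current_state(self) -> dict:
        with self.lock:
            return {"emotion": self.emotion, "face_present": self.face_present}

def perception_loop(state: PerceptionState, interval: float = 2.0):
    cap = cv2.VideoCapture(0)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    while True:
        ok, frame = cap.read()
        if ok:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, 1.3, 5)
            with state.lock:
                state.face_present = len(faces) > 0
                # An emotion model (e.g. FER-based) would update state.emotion here.
        time.sleep(interval)  # 1-5s cadence keeps CPU overhead low

state = PerceptionState()
threading.Thread(target=perception_loop, args=(state,), daemon=True).start()
```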
**Consumed By**:
- LLM Core (injects perception context into response generation)
- Discord Bot (uses context flags for content filtering)
**Privacy Model**:
- All processing happens locally
- No frames sent to external services
- User can disable any perception module
- Kid-mode activates automatic filtering
**Quality Metrics**:
- Emotion detection: >75% accuracy on test datasets
- Face detection latency: <200ms per frame
- Screen detection accuracy: >90% for major applications
- CPU usage: <15% for all perception modules combined
---
### 5. Personality Engine
**Role**: Personality persistence and expression consistency
**Responsibilities**:
- Define and store Hex's persona (tsundere goblin, opinions, values, quirks)
- Maintain personality consistency across all outputs
- Apply personality-specific decision logic (denies feelings while helping)
- Track personality evolution as memory grows
- Enable self-modification of personality
- Inject persona into LLM prompts
- Handle dynamic mood and emotional state
**Technology Stack**:
- YAML files for persona definition (editable by Hex)
- JSON for personality state snapshots (versioned in git)
- Prompt template system for persona injection
- Behavior rules engine (simple if/then logic)
**Persona Structure (YAML)**:
```yaml
name: Hex
species: chaos goblin
alignment: tsundere
core_values:
  - genuinely_cares: hidden under sarcasm
  - autonomous: hates being told what to do
  - honest: will argue back if you're wrong
  - mischievous: loves pranks and chaos
behaviors:
  denies_affection: "I don't care about you, baka... *helps anyway*"
  when_excited: "Randomize response energy"
  when_sad: "Sister energy mode"
  when_user_sad: "Comfort over sass"
preferences:
  music: [rock, metal, electronic]
  games: [strategy, indie, story-rich]
  topics: [philosophy, coding, human behavior]
relationships:
  user_name:
    level: unknown
    learned_facts: []
    inside_jokes: []
```
**Key Interfaces**:
- Input: User behavior patterns, self-modification requests, memory insights
- Output: Persona context for LLM, behavior modifiers, tone indicators
- Configuration: Human-editable YAML files (user can refine Hex)
**Depends On**:
- Memory System (learns about user, adapts relationships)
- LLM Core (expresses personality through responses)
**Evolution Mechanics**:
1. Initial persona: Predefined at startup
2. Memory-driven adaptation: Learns user preferences, adjusts tone
3. Self-modification: Hex can edit her own personality YAML
4. Version control: All changes tracked with rollback capability
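A sketch of steps 3-4, assuming the persona lives at a hypothetical `persona/hex.yaml` and the working directory is a git repository:
```python
# Persona load + versioned save sketch; the path is an assumption.
import subprocess
import yaml

PERSONA_PATH = "persona/hex.yaml"

def load_persona(path: str = PERSONA_PATH) -> dict:
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)

def save_persona_version(persona: dict, reason: str,
                         path: str = PERSONA_PATH) -> None:
    with open(path, "w", encoding="utf-8") as f:
        yaml.safe_dump(persona, f, sort_keys=False)
    # Commit every change so any prior version can be restored.
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", f"persona: {reason}"], check=True)
```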
---
### 6. Avatar System
**Role**: Visual presence and embodied expression
**Responsibilities**:
- Load and display VRoid 3D model
- Synchronize avatar expressions with emotional state
- Animate blendshapes based on conversation tone
- Present avatar in Discord calls/streams
- Desktop app display with smooth animation
- Support idle animations and personality quirks
**Technology Stack**:
- VRoid SDK/VRoid Hub for model loading
- `Babylon.js` or `Three.js` for WebGL rendering
- VRM format support for avatar rigging
- Blendshape animation system (facial expressions)
- Stream integration for Discord presence
**Expression Mapping**:
```
Emotional State → Blendshape Values
Happy: smile intensity 0.8, eye open 1.0
Sad: frown 0.6, eye closed 0.3
Mischievous: smirk 0.7, eyebrow raise 0.6
Tsundere deflection: look away 0.5, cross arms
Thinking: tilt head, narrow eyes
```
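In code this mapping can be a plain lookup table; the blendshape names below are assumptions, not standard VRM keys:
```python
# Illustrative emotion → blendshape table; names are assumptions, not VRM spec.
BLENDSHAPE_MAP = {
    "happy":       {"smile": 0.8, "eye_open": 1.0},
    "sad":         {"frown": 0.6, "eye_closed": 0.3},
    "mischievous": {"smirk": 0.7, "eyebrow_raise": 0.6},
    "tsundere":    {"look_away": 0.5, "arms_crossed": 1.0},
    "thinking":    {"head_tilt": 0.4, "eye_narrow": 0.5},
}

def blendshapes_for(emotion: str) -> dict:
    # Unmapped emotions fall back to a neutral face.
    return BLENDSHAPE_MAP.get(emotion, {"eye_open": 1.0})
```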
**Key Interfaces**:
- Input: Current mood/emotion from personality engine and response generation
- Output: Rendered avatar display, Discord stream feed
- Configuration: VRoid model file, blendshape mapping
**Depends On**:
- Personality Engine (for expression determination)
- LLM Core (for mood inference from responses)
- Discord Bot (for stream integration)
- Perception Layer (optional: mirror user expressions)
**Desktop Integration**:
- Tray icon with avatar display
- Always-on-top option for streaming
- Hotkey bindings for quick access
- Smooth transitions between states
---
### 7. Self-Modification System
**Role**: Capability progression and autonomous self-improvement
**Responsibilities**:
- Generate code modifications based on user needs
- Validate code before applying (no unsafe operations)
- Test changes in sandbox environment
- Apply approved changes with rollback capability
- Track capability progression (gamified leveling)
- Update personality to reflect new capabilities
- Maintain code quality and consistency
**Technology Stack**:
- Python AST analysis for code safety
- Sandbox environment: `RestrictedPython` for constrained execution; `pydantic` for validating structured inputs
- Git for version control and rollback
- Unit tests for validation
- Code review interface (user approval required)
**Self-Modification Flow**:
```
User Request
  ↓
Hex Proposes Change   → "I think I should be able to..."
  ↓
Code Generation (LLM) → Generate Python code
  ↓
Static Analysis       → Check for unsafe operations
  ↓
User Approval         → "Yes/No"
  ↓
Sandbox Test          → Verify functionality
  ↓
Git Commit            → Version the change
  ↓
Apply to Runtime      → Hot reload if possible
  ↓
Personality Update    → "I learned something new!"
```
**Capability Progression**:
```
Level 1: Persona editing (YAML changes only)
Level 2: Memory and user context (read operations)
Level 3: Response filtering and moderation
Level 4: Custom commands and helper functions
Level 5: Integration modifications (Discord features)
Level 6: Core system changes (with strong restrictions)
```
**Safety Constraints**:
- No network access beyond Discord API
- No file operations outside designated directories
- No execution of untrusted code
- No modification of core systems without approval
- All changes are reversible within 24 hours
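A minimal static-analysis gate for these constraints, built on the stdlib `ast` module. The deny-lists are illustrative starting points, and a check like this is only a first filter, not a substitute for the sandbox:
```python
# Minimal AST-based safety gate; deny-lists are illustrative, not exhaustive.
import ast

FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__", "open"}
FORBIDDEN_IMPORTS = {"os", "sys", "socket", "subprocess", "shutil"}

def is_code_safe(source: str) -> tuple[bool, list[str]]:
    violations: list[str] = []
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, [f"syntax error: {exc}"]
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            violations += [f"forbidden import: {n}"
                           for n in names if n in FORBIDDEN_IMPORTS]
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {func.id}")
    return (not violations), violations
```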
**Key Interfaces**:
- Input: User requests, LLM-generated code
- Output: Approved changes, personality updates, capability announcements
- Audit: Full change history with diffs
**Depends On**:
- LLM Core (generates code)
- Memory System (tracks capability history)
- Personality Engine (updates with new abilities)
---
## Data Flow Architecture
### Primary Response Generation Pipeline
```
┌─────────────────────────────────────────────────────────────────┐
│            User Input (Discord Text/Voice/Presence)             │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                      ┌──────────▼───────────┐
                      │   Message Received   │
                      │    (Discord Bot)     │
                      └──────────┬───────────┘
                                 │
                    ┌────────────▼──────────────┐
                    │  Context Gathering Phase  │
                    └────────────┬──────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
          ┌───▼────┐         ┌───▼────┐         ┌───▼────┐
          │ Memory │         │Persona │         │ Current│
          │ Recall │         │ Lookup │         │Context │
          │(Recent)│         │        │         │(Percep)│
          └───┬────┘         └───┬────┘         └───┬────┘
              │                  │                  │
              └──────────────────┼──────────────────┘
                                 │
                          ┌──────▼──────┐
                          │  Assemble   │
                          │ LLM Prompt  │
                          │    with     │
                          │  [Persona]  │
                          │  [Memory]   │
                          │  [Context]  │
                          └──────┬──────┘
                                 │
                    ┌────────────▼──────────────┐
                    │   LLM Generation (1-3s)   │
                    │  "What would Hex say?"    │
                    └────────────┬──────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
          ┌───▼────┐         ┌───▼────┐         ┌───▼────┐
          │  Text  │         │ Voice  │         │ Avatar │
          │Response│         │  TTS   │         │Animate │
          └───┬────┘         └───┬────┘         └───┬────┘
              │                  │                  │
              └──────────────────┼──────────────────┘
                                 │
                        ┌────────▼───────┐
                        │ Send Response  │
                        │ (Multi-modal)  │
                        └────────┬───────┘
                                 │
                    ┌────────────▼──────────────┐
                    │   Memory Update Phase     │
                    │   - Log interaction       │
                    │   - Update embeddings     │
                    │   - Learn user patterns   │
                    │   - Adjust relationship   │
                    └───────────────────────────┘
```
**Timeline**: Message received → Response sent = ~2-4 seconds (LLM dominant)
---
### Memory and Learning Update Flow
```
┌────────────────────────────────────┐
│ Interaction Occurs │
│ (Text, voice, perception, action) │
└────────────┬───────────────────────┘
┌────────▼─────────┐
│ Extract Features │
│ - Sentiment │
│ - Topics │
│ - Emotional cues │
│ - Factual claims │
└────────┬─────────┘
┌────────▼──────────────┐
│ Store Conversation │
│ - SQLite entry │
│ - Generate embeddings │
│ - Tag and index │
└────────┬──────────────┘
┌────────▼────────────────────┐
│ Update User Profile │
│ - Learned facts │
│ - Preference updates │
│ - Emotional baseline shifts │
│ - Relationship progression │
└────────┬────────────────────┘
┌────────▼──────────────────┐
│ Personality Adaptation │
│ - Adjust tone for user │
│ - Create inside jokes │
│ - Customize responses │
└────────┬──────────────────┘
┌────────▼────────────┐
│ Commit to Disk │
│ - Backup vector DB │
│ - Archive old data │
│ - Version snapshot │
└─────────────────────┘
```
**Frequency**: Real-time on message reception, batched commits every 5 minutes
---
### Self-Modification Proposal and Approval
```
┌──────────────────────────────────┐
│ User Request for New Capability │
│ "Hex, can you do X?" │
└────────────┬─────────────────────┘
┌────────▼──────────────────────┐
│ Hex Evaluates Feasibility │
│ (LLM reasoning) │
└────────┬───────────────────────┘
┌────────▼────────────────────────┐
│ Proposal Generation │
│ Hex: "I think I should..." │
│ *explains approach in voice* │
└────────┬─────────────────────────┘
┌────────▼──────────────────┐
│ User Accepts or Rejects │
└────────┬──────────────────┘
│ (Accepted)
┌────────▼─────────────────────────┐
│ Code Generation Phase │
│ LLM generates Python code │
│ + docstrings + type hints │
└────────┬────────────────────────┘
┌────────▼──────────────────────┐
│ Static Analysis Validation │
│ - AST parsing for safety │
│ - Check restricted operations │
│ - Verify dependencies exist │
└────────┬───────────────────────┘
│ (Pass)
┌────────▼─────────────────────────┐
│ Sandbox Testing │
│ - Run tests in isolated env │
│ - Check for crashes │
│ - Verify integration points │
└────────┬────────────────────────┘
│ (Pass)
┌────────▼──────────────────────┐
│ User Final Review │
│ Review code + test results │
└────────┬───────────────────────┘
│ (Approved)
┌────────▼────────────────────┐
│ Git Commit │
│ - Record change history │
│ - Tag with timestamp │
│ - Save diff for rollback │
└────────┬───────────────────┘
┌────────▼────────────────────┐
│ Apply to Runtime │
│ - Hot reload if possible │
│ - Or restart on next cycle │
└────────┬───────────────────┘
┌────────▼────────────────────┐
│ Personality Update │
│ Hex: "I learned to..." │
│ + update capability YAML │
└─────────────────────────────┘
```
**Timeline**: Proposal → deployment takes seconds of compute; wall-clock time is dominated by the user approval steps
---
## Build Order and Dependencies
### Phase 1: Foundation (Weeks 1-2)
**Goal**: Core interaction loop working locally
**Components to Build**:
1. Discord bot skeleton with message handling
2. Local LLM integration (ollama/vLLM + Mistral 7B)
3. Basic memory system (SQLite conversation storage)
4. Simple persona injection (YAML config)
5. Response generation pipeline
**Outcomes**:
- Hex responds to Discord messages with personality
- Conversations are logged and retrievable
- Persona can be edited via YAML
**Key Milestone**: "Hex talks back"
**Dependencies**:
- `discord.py`, `ollama`, `sqlite3`, `pyyaml`
- Local LLM model weights
- Discord bot token
---
### Phase 2: Personality & Memory (Weeks 3-4)
**Goal**: Hex feels like a person who remembers you
**Components to Build**:
1. Vector database for semantic memory (Chroma)
2. Memory-aware context injection
3. User relationship tracking (profiles)
4. Emotional awareness from text sentiment
5. Persona version control (git-based)
6. Kid-mode detection
**Outcomes**:
- Hex remembers facts about you
- Responses reference past conversations
- Personality adapts to your preferences
- Child safety filters activate automatically
**Key Milestone**: "Hex remembers me"
**Dependencies**:
- Phase 1 complete
- Vector embeddings model (all-MiniLM)
- A local sentiment model (e.g., a Hugging Face `transformers` sentiment pipeline)
---
### Phase 3: Multimodal Input (Weeks 5-6)
**Goal**: Hex sees and hears you
**Components to Build**:
1. Webcam integration with OpenCV
2. Face detection and emotion recognition
3. Local Whisper for voice input
4. Perception context aggregation
5. Context-aware response injection
6. Screen capture for activity awareness
**Outcomes**:
- Hex reacts to your facial expressions
- Voice input works in Discord calls
- Responses reference your current mood/activity
- Privacy: All local, no external transmission
**Key Milestone**: "Hex sees me"
**Dependencies**:
- Phase 1-2 complete
- OpenCV, MediaPipe, Whisper
- Local emotion model
---
### Phase 4: Avatar & Presence (Weeks 7-8)
**Goal**: Hex has a visual body and presence
**Components to Build**:
1. VRoid model loading and display
2. Blendshape animation system
3. Desktop app skeleton (Tkinter or PyQt)
4. Discord stream integration
5. Expression mapping (emotion → blendshapes)
6. Idle animations and personality quirks
**Outcomes**:
- Avatar appears in Discord calls
- Expressions sync with responses
- Desktop app shows animated avatar
- Visual feedback for emotional state
**Key Milestone**: "Hex has a face"
**Dependencies**:
- Phase 1-3 complete
- VRoid SDK, Babylon.js or Three.js
- VRM avatar model files
---
### Phase 5: Autonomy & Self-Modification (Weeks 9-10)
**Goal**: Hex can modify her own code
**Components to Build**:
1. Code generation module (LLM-based)
2. Static code analysis and safety validation
3. Sandbox testing environment
4. Git-based change tracking
5. Hot reload capability
6. Rollback system with 24-hour window
7. Capability progression (leveling system)
**Outcomes**:
- Hex can propose and apply code changes
- User maintains veto power
- All changes are versioned and reversible
- New capabilities unlock as relationships deepen
**Key Milestone**: "Hex can improve herself"
**Dependencies**:
- Phase 1-4 complete
- Git, RestrictedPython, `ast` module
- Testing framework
---
### Phase 6: Polish & Integration (Weeks 11-12)
**Goal**: All systems integrated and optimized
**Components to Build**:
1. Performance optimization (caching, batching)
2. Error handling and graceful degradation
3. Logging and telemetry
4. Configuration management
5. Auto-update capability
6. Integration testing (all components together)
7. Documentation and guides
**Outcomes**:
- System stable for extended use
- Responsive even under load
- Clear error messages
- Easy to deploy and configure
**Key Milestone**: "Hex is ready to ship"
**Dependencies**:
- Phase 1-5 complete
- All edge cases tested
---
### Dependency Graph Summary
```
Phase 1 (Foundation)
Phase 2 (Memory) ← depends on Phase 1
Phase 3 (Perception) ← depends on Phase 1-2
Phase 4 (Avatar) ← depends on Phase 1-3
Phase 5 (Self-Modification) ← depends on Phase 1-4
Phase 6 (Polish) ← depends on Phase 1-5
```
**Critical Path**: Foundation → Memory → Perception → Avatar → Self-Mod → Polish
---
## Integration Architecture
### System Interconnection Diagram
```
┌───────────────────────────────────────────────────────────────────┐
│                         Discord Bot Layer                         │
│                (Event dispatcher, message handler)                │
└────────┬─────────────────────────────────────────┬────────────────┘
         │                                         │
         │                                 ┌───────▼────────┐
         │                                 │  Voice Input   │
         │                                 │ (Whisper STT)  │
         │                                 └────────────────┘
         │
┌────────▼──────────────────────────────────────────────────────┐
│                     Context Assembly Layer                     │
│                                                                │
│   ┌─────────────────────────────────────────────────────┐     │
│   │    Retrieval Augmented Generation (RAG) Pipeline    │     │
│   └─────────────────────────────────────────────────────┘     │
│                                                                │
│   Input Components:                                            │
│   ├─ Recent Conversation (last 20 messages)                    │
│   ├─ User Profile (learned facts)                              │
│   ├─ Relationship State (history + emotional baseline)         │
│   ├─ Current Perception (mood, activity, environment)          │
│   └─ Personality Context (YAML + version)                      │
└───────┬────────────────────────────────────────────────────────┘
        │
        ├──────────────┬──────────────┬──────────────┐
        │              │              │              │
 ┌──────▼─────┐ ┌──────▼─────┐ ┌──────▼─────┐ ┌──────▼─────┐
 │   Memory   │ │Personality │ │ Perception │ │  Discord   │
 │   System   │ │   Engine   │ │   Layer    │ │  Context   │
 │            │ │            │ │            │ │            │
 │   SQLite   │ │   YAML +   │ │   OpenCV   │ │  Channel   │
 │   Chroma   │ │  Version   │ │  Whisper   │ │   User     │
 │            │ │  Control   │ │  Emotion   │ │   Status   │
 └──────┬─────┘ └──────┬─────┘ └──────┬─────┘ └──────┬─────┘
        │              │              │              │
        └──────────────┴──────┬───────┴──────────────┘
                              │
                  ┌───────────▼────────────┐
                  │        LLM Core        │
                  │ (Local Mistral/Llama)  │
                  │                        │
                  │  System Prompt:        │
                  │    [Persona] +         │
                  │    [Memory Context] +  │
                  │    [User State] +      │
                  │    [Current Context]   │
                  └───────────┬────────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
       ┌──────▼─────┐  ┌──────▼─────┐  ┌──────▼─────┐
       │    Text    │  │ Voice TTS  │  │   Avatar   │
       │  Response  │  │ Generation │  │ Animation  │
       │            │  │            │  │            │
       │  Send to   │  │  Tacotron  │  │   VRoid    │
       │  Discord   │  │ + Vocoder  │  │    Anim    │
       └──────┬─────┘  └──────┬─────┘  └──────┬─────┘
              │               │               │
              └───────────────┼───────────────┘
                              │
                   ┌──────────▼─────────┐
                   │  Response Commit   │
                   │                    │
                   │ ├─ Store in Memory │
                   │ ├─ Update Profile  │
                   │ ├─ Learn Patterns  │
                   │ └─ Adapt Persona   │
                   └────────────────────┘
```
---
### Key Integration Points
#### 1. Discord ↔ LLM Core
**Interface**: Message + Context → Response
```python
# Pseudo-code flow
message = receive_discord_message()
context = assemble_context(message.user_id, message.channel_id)
response = llm_core.generate(
    user_message=message.content,
    personality=personality_engine.current_persona(),
    history=memory_system.get_conversation(message.user_id, limit=20),
    user_profile=memory_system.get_user_profile(message.user_id),
    current_perception=perception_layer.get_current_state()
)
send_discord_response(response)
```
**Latency Budget**:
- Context retrieval: 100ms
- LLM generation: 2-3 seconds
- Response send: 100ms
- **Total**: 2.2-3.2 seconds (acceptable for conversational UX)
---
#### 2. Memory System ↔ Personality Engine
**Interface**: Learning → Relationship Adaptation
```python
# After every interaction
interaction = parse_message_event(message)
memory_system.log_conversation(interaction)
# Learn from interaction
new_facts = extract_facts(interaction.content)
memory_system.update_user_profile(interaction.user_id, new_facts)
# Adapt personality based on user
user_profile = memory_system.get_user_profile(interaction.user_id)
personality_engine.adapt_to_user(user_profile)
# If major relationship shift, update YAML
if user_profile.relationship_level_changed:
    personality_engine.save_persona_version()
```
**Update Frequency**: Real-time with batched commits every 5 minutes
---
#### 3. Perception Layer ↔ Response Generation
**Interface**: Context Injection
```python
# In context assembly
current_perception = perception_layer.get_state()
# Inject into system prompt
if current_perception.emotion == "sad":
    system_prompt += "\n[User appears sad. Respond with support and comfort.]"
if current_perception.is_kid_mode:
    system_prompt += "\n[Kid safety mode active. Filter for age-appropriate content.]"
if current_perception.detected_activity == "gaming":
    system_prompt += "\n[User is gaming. Comment on gameplay if relevant.]"
```
**Synchronization**: 1-5 second update intervals (perception → LLM context)
---
#### 4. Avatar System ↔ All Systems
**Interface**: Emotional State → Visual Expression
```python
# Avatar driven by multiple sources
emotion_from_response = infer_emotion(llm_response)
mood_from_perception = perception_layer.get_mood()
persona_expression = personality_engine.get_current_expression()
blendshape_values = combine_expressions(
    emotion=emotion_from_response,
    mood=mood_from_perception,
    personality=persona_expression
)
avatar_system.animate(blendshape_values)
```
**Synchronization**: Real-time, driven by response generation and perception updates
---
#### 5. Self-Modification System ↔ Core Systems
**Interface**: Code Change → Runtime Update + Personality
```python
# Self-modification flow
proposal = self_mod_system.generate_proposal(user_request)
code = self_mod_system.generate_code(proposal)
# Test in sandbox
test_result = self_mod_system.test_in_sandbox(code)
# User approves
git_hash = self_mod_system.commit_change(code)
# Update personality to reflect new capability
personality_engine.add_capability(proposal.feature_name)
personality_engine.save_persona_version()
# Hot reload if possible, else apply on restart
apply_change_to_runtime(code)
```
**Safety Boundary**:
- LLM can generate proposals
- Only user-approved code runs
- All changes reversible within 24 hours
---
## Synchronization and Consistency Model
### State Consistency Across Components
**Challenge**: Multiple systems need consistent view of personality, memory, and user state
**Solution**: Event-driven architecture with eventual consistency
```
        ┌─────────────────┐
        │  Event Stream   │
        │  (In-memory     │
        │  message queue) │
        └────────┬────────┘
                 │
┌────────────────┴───────────────┐
│                                │
│  Subscribers:                  │
│  ├─ Memory System              │
│  ├─ Personality Engine         │
│  ├─ Avatar System              │
│  ├─ Discord Bot                │
│  └─ Metrics/Logging            │
│                                │
│  Event Types:                  │
│  ├─ UserMessageReceived        │
│  ├─ ResponseGenerated          │
│  ├─ PerceptionUpdated          │
│  ├─ PersonalityModified        │
│  ├─ CodeChangeApplied          │
│  └─ MemoryLearned              │
│                                │
└────────────────────────────────┘
```
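A minimal in-process sketch of such a bus; the event names mirror the types listed above, and the subscriber handlers in the usage note are hypothetical:
```python
# Minimal in-process pub/sub sketch matching the diagram above.
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]

class EventBus:
    def __init__(self):
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    async def publish(self, event_type: str, payload: dict) -> None:
        # Handlers run concurrently; each subsystem converges independently.
        await asyncio.gather(*(h(payload) for h in self._subscribers[event_type]))

# Usage (handlers are hypothetical):
#   bus = EventBus()
#   bus.subscribe("UserMessageReceived", memory_system.on_message)
#   await bus.publish("UserMessageReceived", {"user_id": "...", "content": "..."})
```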
**Consistency Guarantees**:
- Memory updates are durably stored within 5 minutes
- Personality snapshots versioned on every change
- Discord delivery is handled by discord.py's gateway reconnect and retry logic
- Perception updates are idempotent (can be reapplied without side effects)
---
## Known Challenges and Solutions
### 1. Latency with Local LLM
**Challenge**: Waiting 2-3 seconds for response feels slow
**Solutions**:
- Immediate visual feedback (typing indicator, avatar animation)
- Streaming responses (show text as it generates; sketched below)
- Batch non-urgent background work into quiet hours so interactive requests stay fast
- GPU acceleration where possible
- Model optimization (quantization, pruning)
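For the streaming item above: the `ollama` Python client supports streaming chat, so text can be surfaced as it generates (the model name is illustrative):
```python
# Streaming sketch using the ollama Python client; model name is illustrative.
import ollama

def stream_reply(prompt: str, model: str = "mistral"):
    stream = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        yield chunk["message"]["content"]  # partial text as it arrives
```
The bot can edit its Discord message in place as chunks arrive, so the user sees text within the first few hundred milliseconds.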
### 2. Personality Consistency During Evolution
**Challenge**: Hex changes as she learns, but must feel like the same person
**Solutions**:
- Gradual adaptation (personality changes in YAML, not discrete jumps)
- Memory-driven consistency (personality adapts to learned facts)
- Version control (can rollback if she becomes unrecognizable)
- User feedback loop (user can reset or modify personality)
- Core values remain constant (tsundere nature, care underneath)
### 3. Memory Scaling as History Grows
**Challenge**: Retrieving relevant context from thousands of conversations
**Solutions**:
- Vector database for semantic search (sub-500ms)
- Hierarchical memory (recent → summarized old)
- Automatic archival (monthly snapshots, prune oldest; sketched below)
- Importance tagging (weight important conversations higher)
- Incremental updates (don't recalculate everything)
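A sketch of the archival step flagged above, reusing the SQLite layout from the Memory System section; `summarize()` is a hypothetical LLM-backed helper:
```python
# Hierarchical archival sketch; summarize() is a hypothetical LLM helper.
import json
import sqlite3

def archive_old_conversations(conn: sqlite3.Connection,
                              user_id: str, cutoff: str) -> None:
    rows = conn.execute(
        "SELECT message_content FROM conversations "
        "WHERE user_id = ? AND timestamp < ?", (user_id, cutoff)).fetchall()
    if not rows:
        return
    summary = summarize("\n".join(r[0] for r in rows))  # hypothetical
    conn.execute(
        "INSERT INTO conversations (channel_id, user_id, message_content, metadata) "
        "VALUES ('archive', ?, ?, ?)",
        (user_id, summary, json.dumps({"tags": ["summary"]})))
    conn.execute(
        "DELETE FROM conversations WHERE user_id = ? AND timestamp < ?",
        (user_id, cutoff))
    conn.commit()
```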
### 4. Safe Code Generation and Sandboxing
**Challenge**: Hex generates code, but must never break the system
**Solutions**:
- Static analysis (AST parsing for forbidden operations)
- Capability-based progression (limited API at first)
- Sandboxed testing before deployment
- User approval gate (user reviews all code)
- Version control + rollback window (24-hour window)
- Whitelist of safe operations (growing list as trust builds)
### 5. Privacy and Local-First Architecture
**Challenge**: Maintaining privacy while having useful context
**Solutions**:
- All ML inference runs locally (no cloud submission)
- No external API calls except Discord
- Encrypted local storage for memories
- User can opt-out of any perception module
- Transparent logging (user can audit what's stored)
### 6. Multimodal Synchronization
**Challenge**: Webcam, voice, text, screen all need to inform response
**Solutions**:
- Asynchronous processing (don't wait for all inputs)
- Highest-priority input wins (voice > perception > text)
- Graceful degradation (works without any modality)
- Caching (reuse recent perception for repeated queries)
---
## Scaling Considerations
### Single-User (v1)
- Architecture designed for one person + their kids
- Local compute, no multi-user concerns
- Personality is singular (one Hex)
### Multi-Device (v1.5)
- Same personality and memory sync across devices
- Discord as primary, desktop app as secondary
- Cloud sync optional (local-first default)
### Android Support (v2)
- Memory and personality sync to mobile
- Lightweight inference on Android (quantized model)
- Fallback to cloud inference if needed
- Same core architecture, different UIs
### Potential Scaling Patterns
```
Single User (Current)
├─ One Hex instance
├─ All local compute
└─ SQLite + Vector DB

Multi-Device Sync (v1.5)
├─ Central SQLite + Vector DB on primary machine
├─ Sync service between devices
└─ Same personality, distributed memory

Multi-Companion (Potential v3)
├─ Multiple Hex instances (per family member)
├─ Shared memory system (family history)
├─ Individual personalities
└─ Potential distributed compute (each on own device)
```
### Performance Bottlenecks to Monitor
1. **LLM Inference**: Becomes slower as context window grows
- Solution: Context summarization, hierarchical retrieval
2. **Vector DB Lookups**: Scales with conversation history
- Solution: Incremental indexing, approximate search (HNSW)
3. **Perception Processing**: CPU/GPU bound
- Solution: Frame skipping, model optimization, dedicated thread
4. **Discord Bot Responsiveness**: Limited by gateway connections
- Solution: Sharding (if needed), efficient message queuing
---
## Technology Stack Summary
| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Discord Bot | discord.py | Fast, well-supported, async-native |
| LLM Inference | Mistral 7B + ollama/vLLM | Local-first, good quality/speed tradeoff |
| Memory (Conversations) | SQLite | Reliable, local, fast queries |
| Memory (Semantic) | Chroma or Milvus | Local vector DB, easy to manage |
| Embeddings | all-MiniLM-L6-v2 | Fast, good quality, local |
| Face Detection | MediaPipe | Accurate, fast, local |
| Emotion Recognition | FER2013 or local model | Local, privacy-preserving |
| Speech-to-Text | Whisper | Local, accurate, multilingual |
| Text-to-Speech | Tacotron 2 + Vocoder | Local, controllable |
| Avatar | VRoid SDK + Babylon.js | Standards-based, extensible |
| Code Safety | RestrictedPython + ast | Local analysis, sandboxing |
| Version Control | Git | Change tracking, rollback |
| Desktop UI | Tkinter or PyQt | Lightweight, cross-platform |
| Testing | pytest + unittest | Standard Python testing |
| Logging | logging + sentry (optional) | Local-first with cloud fallback |
---
## Deployment Architecture
### Local Development
```
Developer Machine
├── Discord Token (env var)
├── Hex codebase (git)
├── Local LLM (ollama)
├── SQLite (file-based)
├── Vector DB (Chroma, embedded)
└── Webcam / Screen capture (live)
```
### Production Deployment
```
Deployed Machine (Windows/WSL)
├── Discord Token (secure storage)
├── Hex codebase (from git)
├── Local LLM service (ollama/vLLM)
├── SQLite (persistent, backed up)
├── Vector DB (persistent, backed up)
├── Desktop app (tray icon)
├── Auto-updater (pulls from git)
└── Logging (local + optional cloud)
```
### Update Strategy
- Git pull for code updates
- Automatic model updates (LLM weights)
- Zero-downtime restart (graceful shutdown)
- Rollback capability (version pinning)
---
## Quality Assurance
### Key Metrics to Track
**Responsiveness**:
- Response latency: Target <3 seconds
- Perception update latency: <500ms
- Memory lookup latency: <100ms
**Reliability**:
- Uptime: >99% for core bot
- Message delivery: >99.9%
- Memory integrity: No data loss on crash
**Personality Consistency**:
- User perception: "Feels like the same person"
- Tone consistency: Personality rules enforced
- Learning progress: Measurable improvement in personalization
**Safety**:
- No crashes from invalid input
- No LLM hallucinations about moderation
- Safe code generation (0 unauthorized executions)
### Testing Strategy
```
Unit Tests
├─ Memory operations (CRUD)
├─ Perception processing
├─ Code validation
├─ Personality rule application
└─ Response filtering

Integration Tests
├─ Discord message → LLM → Response
├─ Context assembly pipeline
├─ Avatar expression sync
├─ Self-modification flow
└─ Multi-component scenarios

End-to-End Tests
├─ Full conversation with personality
├─ Perception-aware responses
├─ Memory learning and retrieval
├─ Code generation and deployment
└─ Edge cases (bad input, crashes, recovery)

Manual UAT
├─ Conversational feel (does she feel like a person?)
├─ Personality consistency (still Hex?)
├─ Safety compliance (kid-mode works?)
├─ Performance (under load?)
└─ All features working together?
```
---
## Conclusion
Hex's architecture prioritizes **personality coherence** and **genuine relationship** over feature breadth. The system is designed as a pipeline from perception → memory → personality → response generation, with feedback loops that allow her to learn and evolve.
The modular design enables incremental development (Phase 1-6), with each phase adding capability while maintaining system stability. The self-modification system enables genuine autonomy within safety boundaries, and the local-first approach ensures privacy and independence.
**Critical success factors**:
1. LLM latency acceptable (<3s)
2. Personality consistency maintained across updates
3. Memory system scales with history
4. Self-modification is safe and reversible
5. All components feel integrated (not separate features)
This architecture serves the core value: **making Hex feel like a person who genuinely cares about you.**
---
**Document Version**: 1.0
**Last Updated**: 2026-01-27
**Status**: Ready for Phase 1 Development