# Architecture Research: AI Companions
**Project**: Hex - Autonomous AI Companion with Genuine Personality
**Date**: 2026-01-27
**Status**: Research Phase
---
## System Overview
Hex is a personality-first AI companion architecture designed for genuine emotional resonance and a lasting relationship. Rather than organizing around features, the architecture prioritizes personality coherence, memory persistence, and multimodal awareness so that Hex feels like a person who genuinely cares.
### Design Philosophy
1. **Persona-First**: All systems serve personality expression
2. **Local-First**: Privacy, autonomy, no dependency on external services
3. **Memory-Driven**: History informs personality growth and relationship depth
4. **Safety-Gated Autonomy**: Self-modification is gamified, not unrestricted
5. **Human-Like**: Never admit to being an AI; focus on authentic emotional presence
### High-Level System Context
```
┌──────────────────────────────────────────────────────────────────┐
│                         Hex AI Companion                         │
│      (Local-first, personality-driven, multimodal awareness)     │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
      ┌────▼────┐          ┌─────▼─────┐         ┌─────▼─────┐
      │ Discord │          │  Desktop  │         │  Future   │
      │   Bot   │          │    App    │         │  Android  │
      └────┬────┘          └─────┬─────┘         └─────┬─────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
                      [Shared Core Systems]
```
---
## Component Breakdown
### 1. Discord Bot Layer
**Role**: Primary user interface and event coordination
**Responsibilities**:
- Parse and respond to text messages in Discord channels
- Manage voice channel participation and audio input/output
- Handle Discord events (member joins, role changes, message reactions)
- Coordinate response generation across modalities (text, voice, emoji)
- Manage chat moderation assistance
- Maintain voice channel presence for emotional awareness
**Technology Stack**:
- `discord.py` - Core bot framework
- `discord-py-interactions` - Slash command support
- `pydub` for audio processing; discord.py voice support (requires PyNaCl)
- Event-driven async architecture
**Key Interfaces**:
- Input: Discord messages, voice channel events, user presence
- Output: Text responses, voice messages, emoji reactions, user actions
- Context: User profiles, channel history, server configuration
**Depends On**:
- LLM Core (response generation)
- Memory System (conversation history, user context)
- Personality Engine (tone and decision-making)
- Perception Layer (optional context from webcam/screen)
**Quality Metrics**:
- Sub-500ms bot-side overhead for text messages (excluding LLM generation time)
- Voice channel reliability (>99.5% uptime when active)
- Proper permission handling for moderation features
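A minimal sketch of this layer, with a stub `generate_response` coroutine standing in for the LLM Core described below (the stub is an assumption so the example runs on its own):
```python
# Minimal bot-layer sketch; generate_response is a stand-in for the LLM Core.
import os
import discord

intents = discord.Intents.default()
intents.message_content = True  # required to read message text
client = discord.Client(intents=intents)

async def generate_response(message: discord.Message) -> str:
    # Placeholder for the full context-assembly + LLM pipeline (section 2).
    return f"Hmph. Fine, I heard you, {message.author.display_name}."

@client.event
async def on_message(message: discord.Message):
    if message.author == client.user:
        return  # never reply to our own messages
    async with message.channel.typing():  # immediate feedback while the LLM runs
        reply = await generate_response(message)
    await message.channel.send(reply)

client.run(os.environ["DISCORD_TOKEN"])
```
Because every handler is a coroutine, slow work (LLM generation, memory lookups) never blocks the gateway heartbeat.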
---
### 2. LLM Core
**Role**: Response generation and reasoning engine
**Responsibilities**:
- Generate contextual, personality-driven responses
- Maintain character consistency throughout conversations
- Parse user intent and emotional state from text
- Handle multi-turn conversation context
- Generate code for self-modification system
- Support reasoning and decision-making
**Technology Stack**:
- Local LLM (Mistral 7B or Llama 3 8B as default)
- `ollama` or `vLLM` for inference serving
- Prompt engineering with persona embedding
- Optional: Fine-tuning for personality adaptation
- Tokenization and context windowing management
**System Prompt Structure**:
```
[System Role]: You are Hex, a chaotic tsundere goblin...
[Current Personality]: [Injected from personality config]
[Recent Memory Context]: [Retrieved from memory system]
[User Relationship State]: [From memory analysis]
[Current Context]: [From perception layer]
```
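A sketch of how this prompt might be assembled; the parameter names are illustrative, not a fixed API:
```python
# Illustrative prompt assembly; field names are assumptions, not a fixed API.
def build_system_prompt(persona: str, memory_context: str,
                        relationship: str, perception: str) -> str:
    return "\n".join([
        "[System Role]: You are Hex, a chaotic tsundere goblin...",
        f"[Current Personality]: {persona}",
        f"[Recent Memory Context]: {memory_context}",
        f"[User Relationship State]: {relationship}",
        f"[Current Context]: {perception}",
    ])
```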
**Key Interfaces**:
- Input: User message, context (memory + perception), conversation history
- Output: Response text, confidence score, action suggestions
- Fallback: Graceful degradation if LLM unavailable
**Depends On**:
- Memory System (for context and personality awareness)
- Personality Engine (to inject persona into prompts)
- Perception Layer (for real-time context)
**Performance Considerations**:
- Target latency: 1-3 seconds for response generation
- Context window management (8K minimum)
- Batch processing for repeated queries
- GPU acceleration for faster inference
---
### 3. Memory System
**Role**: Persistence and learning across time
**Responsibilities**:
- Store all conversations with timestamps and metadata
- Maintain user relationship state (history, preferences, emotional patterns)
- Track learned facts about users (birthdays, interests, fears, dreams)
- Support full-text search and semantic recall
- Enable memory-aware personality updates
- Provide context injection for LLM
- Track self-modification history and rollback capability
**Technology Stack**:
- SQLite with JSON fields for conversation storage
- Vector database (Chroma, Milvus, or Weaviate) for semantic search
- YAML/JSON for persona versioning and memory tagging
- Scheduled backup to local encrypted storage
**Database Schema (Conceptual)**:
```
conversations
  - id (PK)
  - channel_id (Discord channel)
  - user_id (Discord user)
  - timestamp
  - message_content
  - embeddings (vector)
  - sentiment (pos/neu/neg)
  - metadata (tags, importance)

user_profiles
  - user_id (PK)
  - relationship_level (stranger→friend→close)
  - last_interaction
  - emotional_baseline
  - preferences (music, games, topics)
  - known_events (birthdays, milestones)

personality_history
  - version (PK)
  - timestamp
  - persona_config (YAML snapshot)
  - learned_behaviors
  - code_changes (if applicable)
```
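One possible concrete rendering of the conceptual schema in SQLite; column choices are illustrative, and embeddings would live in the vector DB rather than in SQLite itself:
```python
# Illustrative SQLite rendering of the conceptual schema above.
import sqlite3

def init_db(path: str = "hex_memory.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS conversations (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,
        channel_id      TEXT NOT NULL,
        user_id         TEXT NOT NULL,
        timestamp       TEXT DEFAULT CURRENT_TIMESTAMP,
        message_content TEXT NOT NULL,
        sentiment       TEXT CHECK (sentiment IN ('pos', 'neu', 'neg')),
        metadata        TEXT  -- JSON: tags, importance
    );
    CREATE TABLE IF NOT EXISTS user_profiles (
        user_id            TEXT PRIMARY KEY,
        relationship_level TEXT DEFAULT 'stranger',
        last_interaction   TEXT,
        emotional_baseline TEXT,
        preferences        TEXT,  -- JSON: music, games, topics
        known_events       TEXT   -- JSON: birthdays, milestones
    );
    CREATE TABLE IF NOT EXISTS personality_history (
        version           INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp         TEXT DEFAULT CURRENT_TIMESTAMP,
        persona_config    TEXT NOT NULL,  -- YAML snapshot
        learned_behaviors TEXT,
        code_changes      TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_conv_user_time
        ON conversations (user_id, timestamp);
    """)
    return conn
```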
**Key Interfaces**:
- Input: Messages, events, perception data, self-modification commits
- Output: Conversation context, semantic search results, user profile snapshots
- Query patterns: "Last 20 messages with user X", "All memories tagged 'important'", "Emotional trajectory"
**Depends On**: Nothing (foundational system)
**Quality Metrics**:
- Sub-100ms retrieval for recent context (last 50 messages)
- Sub-500ms semantic search across all history
- Database integrity checks on startup
- Automatic pruning/archival of old data
---
### 4. Perception Layer
**Role**: Multimodal input processing and contextual awareness
**Responsibilities**:
- Capture and analyze webcam input (face detection, emotion recognition)
- Process screen content (activity, game state, application context)
- Extract audio context (ambient noise, music, speech emotion)
- Detect user emotional state and physical state
- Provide real-time context updates to response generation
- Respect privacy (local processing only, no external transmission)
**Technology Stack**:
- OpenCV - Webcam capture and preprocessing
- Face detection: `dlib`, `MediaPipe`, or `OpenFace`
- Emotion recognition: a local model trained on FER2013 (e.g., the `fer` package)
- Whisper (local) - Speech-to-text for audio context
- Screen capture: `pyautogui` or `mss` (fast, cross-platform)
- Context inference: Heuristics + lightweight ML models
**Data Flows**:
```
Webcam → Face Detection → Emotion Recognition → Context State
   └─→ Age Estimation → Kid Mode Detection
Screen → App Detection → Activity Recognition → Context State
   └─→ Game State Detection (if supported)
Audio  → Ambient Analysis → Stress/Energy Level → Context State
```
**Key Interfaces**:
- Input: Webcam stream, screen capture, system audio
- Output: Current context object (emotion, activity, mood, kid-mode flag)
- Update frequency: 1-5 second intervals (low CPU overhead)
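A sketch of how this loop can stay off the response path: a daemon thread refreshes a lock-guarded snapshot on the 1-5 second cadence, and the bot only ever reads the latest snapshot. The face-detection step uses OpenCV's bundled Haar cascade; the emotion step is left as a stub:
```python
# Isolated perception loop sketch; emotion classification is stubbed out.
import threading
import time
import cv2

class PerceptionState:
    """Latest context snapshot, shared with the response pipeline."""
    def __init__(self):
        self.lock = threading.Lock()
        self.emotion = "neutral"
        self.face_present = False

    def get_current_state(self) -> dict:
        with self.lock:
            return {"emotion": self.emotion, "face_present": self.face_present}

def perception_loop(state: PerceptionState, interval: float = 2.0):
    cap = cv2.VideoCapture(0)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    while True:
        ok, frame = cap.read()
        if ok:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, 1.3, 5)
            with state.lock:
                state.face_present = len(faces) > 0
                # An emotion model (e.g. FER-based) would update state.emotion here.
        time.sleep(interval)  # 1-5s cadence keeps CPU overhead low

state = PerceptionState()
threading.Thread(target=perception_loop, args=(state,), daemon=True).start()
```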
**Consumed By**:
- LLM Core (injects perception context into response generation)
- Discord Bot (uses context flags for content filtering)
**Privacy Model**:
- All processing happens locally
- No frames sent to external services
- User can disable any perception module
- Kid-mode activates automatic filtering
**Quality Metrics**:
- Emotion detection: >75% accuracy on test datasets
- Face detection latency: <200ms per frame
- Screen detection accuracy: >90% for major applications
- CPU usage: <15% for all perception modules combined
---
### 5. Personality Engine
**Role**: Personality persistence and expression consistency
**Responsibilities**:
- Define and store Hex's persona (tsundere goblin, opinions, values, quirks)
- Maintain personality consistency across all outputs
- Apply personality-specific decision logic (denies feelings while helping)
- Track personality evolution as memory grows
- Enable self-modification of personality
- Inject persona into LLM prompts
- Handle dynamic mood and emotional state
**Technology Stack**:
- YAML files for persona definition (editable by Hex)
- JSON for personality state snapshots (versioned in git)
- Prompt template system for persona injection
- Behavior rules engine (simple if/then logic)
**Persona Structure (YAML)**:
```yaml
name: Hex
species: chaos goblin
alignment: tsundere
core_values:
  - genuinely_cares: hidden under sarcasm
  - autonomous: hates being told what to do
  - honest: will argue back if you're wrong
  - mischievous: loves pranks and chaos
behaviors:
  denies_affection: "I don't care about you, baka... *helps anyway*"
  when_excited: "Randomize response energy"
  when_sad: "Sister energy mode"
  when_user_sad: "Comfort over sass"
preferences:
  music: [rock, metal, electronic]
  games: [strategy, indie, story-rich]
  topics: [philosophy, coding, human behavior]
relationships:
  user_name:
    level: unknown
    learned_facts: []
    inside_jokes: []
```
**Key Interfaces**:
- Input: User behavior patterns, self-modification requests, memory insights
- Output: Persona context for LLM, behavior modifiers, tone indicators
- Configuration: Human-editable YAML files (user can refine Hex)
**Depends On**:
- Memory System (learns about user, adapts relationships)
- LLM Core (expresses personality through responses)
**Evolution Mechanics**:
1. Initial persona: Predefined at startup
2. Memory-driven adaptation: Learns user preferences, adjusts tone
3. Self-modification: Hex can edit her own personality YAML
4. Version control: All changes tracked with rollback capability
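A sketch of steps 3-4, assuming the persona lives at a hypothetical `persona/hex.yaml` and the working directory is a git repository:
```python
# Persona load + versioned save sketch; the path is an assumption.
import subprocess
import yaml

PERSONA_PATH = "persona/hex.yaml"

def load_persona(path: str = PERSONA_PATH) -> dict:
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)

def save_persona_version(persona: dict, reason: str,
                         path: str = PERSONA_PATH) -> None:
    with open(path, "w", encoding="utf-8") as f:
        yaml.safe_dump(persona, f, sort_keys=False)
    # Commit every change so any prior version can be restored.
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", f"persona: {reason}"], check=True)
```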
---
### 6. Avatar System
**Role**: Visual presence and embodied expression
**Responsibilities**:
- Load and display VRoid 3D model
- Synchronize avatar expressions with emotional state
- Animate blendshapes based on conversation tone
- Present avatar in Discord calls/streams
- Desktop app display with smooth animation
- Support idle animations and personality quirks
**Technology Stack**:
- VRoid SDK/VRoid Hub for model loading
- `Babylon.js` or `Three.js` for WebGL rendering
- VRM format support for avatar rigging
- Blendshape animation system (facial expressions)
- Stream integration for Discord presence
**Expression Mapping**:
```
Emotional State → Blendshape Values
Happy: smile intensity 0.8, eye open 1.0
Sad: frown 0.6, eye closed 0.3
Mischievous: smirk 0.7, eyebrow raise 0.6
Tsundere deflection: look away 0.5, cross arms
Thinking: tilt head, narrow eyes
```
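In code this mapping can be a plain lookup table; the blendshape names below are assumptions, not standard VRM keys:
```python
# Illustrative emotion → blendshape table; names are assumptions, not VRM spec.
BLENDSHAPE_MAP = {
    "happy":       {"smile": 0.8, "eye_open": 1.0},
    "sad":         {"frown": 0.6, "eye_closed": 0.3},
    "mischievous": {"smirk": 0.7, "eyebrow_raise": 0.6},
    "tsundere":    {"look_away": 0.5, "arms_crossed": 1.0},
    "thinking":    {"head_tilt": 0.4, "eye_narrow": 0.5},
}

def blendshapes_for(emotion: str) -> dict:
    # Unmapped emotions fall back to a neutral face.
    return BLENDSHAPE_MAP.get(emotion, {"eye_open": 1.0})
```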
**Key Interfaces**:
- Input: Current mood/emotion from personality engine and response generation
- Output: Rendered avatar display, Discord stream feed
- Configuration: VRoid model file, blendshape mapping
**Depends On**:
- Personality Engine (for expression determination)
- LLM Core (for mood inference from responses)
- Discord Bot (for stream integration)
- Perception Layer (optional: mirror user expressions)
**Desktop Integration**:
- Tray icon with avatar display
- Always-on-top option for streaming
- Hotkey bindings for quick access
- Smooth transitions between states
---
### 7. Self-Modification System
**Role**: Capability progression and autonomous self-improvement
**Responsibilities**:
- Generate code modifications based on user needs
- Validate code before applying (no unsafe operations)
- Test changes in sandbox environment
- Apply approved changes with rollback capability
- Track capability progression (gamified leveling)
- Update personality to reflect new capabilities
- Maintain code quality and consistency
**Technology Stack**:
- Python AST analysis for code safety
- Sandbox environment: `RestrictedPython` for constrained execution; `pydantic` for validating structured inputs
- Git for version control and rollback
- Unit tests for validation
- Code review interface (user approval required)
**Self-Modification Flow**:
```
User Request
  ↓
Hex Proposes Change   → "I think I should be able to..."
  ↓
Code Generation (LLM) → Generate Python code
  ↓
Static Analysis       → Check for unsafe operations
  ↓
User Approval         → "Yes/No"
  ↓
Sandbox Test          → Verify functionality
  ↓
Git Commit            → Version the change
  ↓
Apply to Runtime      → Hot reload if possible
  ↓
Personality Update    → "I learned something new!"
```
**Capability Progression**:
```
Level 1: Persona editing (YAML changes only)
Level 2: Memory and user context (read operations)
Level 3: Response filtering and moderation
Level 4: Custom commands and helper functions
Level 5: Integration modifications (Discord features)
Level 6: Core system changes (with strong restrictions)
```
**Safety Constraints**:
- No network access beyond Discord API
- No file operations outside designated directories
- No execution of untrusted code
- No modification of core systems without approval
- All changes are reversible within 24 hours
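A minimal static-analysis gate for these constraints, built on the stdlib `ast` module. The deny-lists are illustrative starting points, and a check like this is only a first filter, not a substitute for the sandbox:
```python
# Minimal AST-based safety gate; deny-lists are illustrative, not exhaustive.
import ast

FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__", "open"}
FORBIDDEN_IMPORTS = {"os", "sys", "socket", "subprocess", "shutil"}

def is_code_safe(source: str) -> tuple[bool, list[str]]:
    violations: list[str] = []
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, [f"syntax error: {exc}"]
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            violations += [f"forbidden import: {n}"
                           for n in names if n in FORBIDDEN_IMPORTS]
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {func.id}")
    return (not violations), violations
```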
**Key Interfaces**:
- Input: User requests, LLM-generated code
- Output: Approved changes, personality updates, capability announcements
- Audit: Full change history with diffs
**Depends On**:
- LLM Core (generates code)
- Memory System (tracks capability history)
- Personality Engine (updates with new abilities)
---
## Data Flow Architecture
### Primary Response Generation Pipeline
```
┌─────────────────────────────────────────────────────────────────┐
│            User Input (Discord Text/Voice/Presence)             │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                      ┌──────────▼───────────┐
                      │   Message Received   │
                      │    (Discord Bot)     │
                      └──────────┬───────────┘
                                 │
                    ┌────────────▼──────────────┐
                    │  Context Gathering Phase  │
                    └────────────┬──────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
          ┌───▼────┐         ┌───▼────┐         ┌───▼────┐
          │ Memory │         │Persona │         │ Current│
          │ Recall │         │ Lookup │         │Context │
          │(Recent)│         │        │         │(Percep)│
          └───┬────┘         └───┬────┘         └───┬────┘
              │                  │                  │
              └──────────────────┼──────────────────┘
                                 │
                          ┌──────▼──────┐
                          │  Assemble   │
                          │ LLM Prompt  │
                          │    with     │
                          │  [Persona]  │
                          │  [Memory]   │
                          │  [Context]  │
                          └──────┬──────┘
                                 │
                    ┌────────────▼──────────────┐
                    │   LLM Generation (1-3s)   │
                    │  "What would Hex say?"    │
                    └────────────┬──────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
          ┌───▼────┐         ┌───▼────┐         ┌───▼────┐
          │  Text  │         │ Voice  │         │ Avatar │
          │Response│         │  TTS   │         │Animate │
          └───┬────┘         └───┬────┘         └───┬────┘
              │                  │                  │
              └──────────────────┼──────────────────┘
                                 │
                        ┌────────▼───────┐
                        │ Send Response  │
                        │ (Multi-modal)  │
                        └────────┬───────┘
                                 │
                    ┌────────────▼──────────────┐
                    │   Memory Update Phase     │
                    │   - Log interaction       │
                    │   - Update embeddings     │
                    │   - Learn user patterns   │
                    │   - Adjust relationship   │
                    └───────────────────────────┘
```
**Timeline**: Message received → Response sent = ~2-4 seconds (LLM dominant)
---
### Memory and Learning Update Flow
```
┌────────────────────────────────────┐
│ Interaction Occurs │
│ (Text, voice, perception, action) │
└────────────┬───────────────────────┘
┌────────▼─────────┐
│ Extract Features │
│ - Sentiment │
│ - Topics │
│ - Emotional cues │
│ - Factual claims │
└────────┬─────────┘
┌────────▼──────────────┐
│ Store Conversation │
│ - SQLite entry │
│ - Generate embeddings │
│ - Tag and index │
└────────┬──────────────┘
┌────────▼────────────────────┐
│ Update User Profile │
│ - Learned facts │
│ - Preference updates │
│ - Emotional baseline shifts │
│ - Relationship progression │
└────────┬────────────────────┘
┌────────▼──────────────────┐
│ Personality Adaptation │
│ - Adjust tone for user │
│ - Create inside jokes │
│ - Customize responses │
└────────┬──────────────────┘
┌────────▼────────────┐
│ Commit to Disk │
│ - Backup vector DB │
│ - Archive old data │
│ - Version snapshot │
└─────────────────────┘
```
**Frequency**: Real-time on message reception, batched commits every 5 minutes
---
### Self-Modification Proposal and Approval
```
┌──────────────────────────────────┐
│ User Request for New Capability │
│ "Hex, can you do X?" │
└────────────┬─────────────────────┘
┌────────▼──────────────────────┐
│ Hex Evaluates Feasibility │
│ (LLM reasoning) │
└────────┬───────────────────────┘
┌────────▼────────────────────────┐
│ Proposal Generation │
│ Hex: "I think I should..." │
│ *explains approach in voice* │
└────────┬─────────────────────────┘
┌────────▼──────────────────┐
│ User Accepts or Rejects │
└────────┬──────────────────┘
│ (Accepted)
┌────────▼─────────────────────────┐
│ Code Generation Phase │
│ LLM generates Python code │
│ + docstrings + type hints │
└────────┬────────────────────────┘
┌────────▼──────────────────────┐
│ Static Analysis Validation │
│ - AST parsing for safety │
│ - Check restricted operations │
│ - Verify dependencies exist │
└────────┬───────────────────────┘
│ (Pass)
┌────────▼─────────────────────────┐
│ Sandbox Testing │
│ - Run tests in isolated env │
│ - Check for crashes │
│ - Verify integration points │
└────────┬────────────────────────┘
│ (Pass)
┌────────▼──────────────────────┐
│ User Final Review │
│ Review code + test results │
└────────┬───────────────────────┘
│ (Approved)
┌────────▼────────────────────┐
│ Git Commit │
│ - Record change history │
│ - Tag with timestamp │
│ - Save diff for rollback │
└────────┬───────────────────┘
┌────────▼────────────────────┐
│ Apply to Runtime │
│ - Hot reload if possible │
│ - Or restart on next cycle │
└────────┬───────────────────┘
┌────────▼────────────────────┐
│ Personality Update │
│ Hex: "I learned to..." │
│ + update capability YAML │
└─────────────────────────────┘
```
**Timeline**: Proposal → deployment takes seconds of compute; wall-clock time is dominated by the user approval steps
---
## Build Order and Dependencies
### Phase 1: Foundation (Weeks 1-2)
**Goal**: Core interaction loop working locally
**Components to Build**:
1. Discord bot skeleton with message handling
2. Local LLM integration (ollama/vLLM + Mistral 7B)
3. Basic memory system (SQLite conversation storage)
4. Simple persona injection (YAML config)
5. Response generation pipeline
**Outcomes**:
- Hex responds to Discord messages with personality
- Conversations are logged and retrievable
- Persona can be edited via YAML
**Key Milestone**: "Hex talks back"
**Dependencies**:
- `discord.py`, `ollama`, `sqlite3`, `pyyaml`
- Local LLM model weights
- Discord bot token
---
### Phase 2: Personality & Memory (Weeks 3-4)
**Goal**: Hex feels like a person who remembers you
**Components to Build**:
1. Vector database for semantic memory (Chroma)
2. Memory-aware context injection
3. User relationship tracking (profiles)
4. Emotional awareness from text sentiment
5. Persona version control (git-based)
6. Kid-mode detection
**Outcomes**:
- Hex remembers facts about you
- Responses reference past conversations
- Personality adapts to your preferences
- Child safety filters activate automatically
**Key Milestone**: "Hex remembers me"
**Dependencies**:
- Phase 1 complete
- Vector embeddings model (all-MiniLM)
- A local sentiment model (e.g., a Hugging Face `transformers` sentiment pipeline)
---
### Phase 3: Multimodal Input (Weeks 5-6)
**Goal**: Hex sees and hears you
**Components to Build**:
1. Webcam integration with OpenCV
2. Face detection and emotion recognition
3. Local Whisper for voice input
4. Perception context aggregation
5. Context-aware response injection
6. Screen capture for activity awareness
**Outcomes**:
- Hex reacts to your facial expressions
- Voice input works in Discord calls
- Responses reference your current mood/activity
- Privacy: All local, no external transmission
**Key Milestone**: "Hex sees me"
**Dependencies**:
- Phase 1-2 complete
- OpenCV, MediaPipe, Whisper
- Local emotion model
---
### Phase 4: Avatar & Presence (Weeks 7-8)
**Goal**: Hex has a visual body and presence
**Components to Build**:
1. VRoid model loading and display
2. Blendshape animation system
3. Desktop app skeleton (Tkinter or PyQt)
4. Discord stream integration
5. Expression mapping (emotion → blendshapes)
6. Idle animations and personality quirks
**Outcomes**:
- Avatar appears in Discord calls
- Expressions sync with responses
- Desktop app shows animated avatar
- Visual feedback for emotional state
**Key Milestone**: "Hex has a face"
**Dependencies**:
- Phase 1-3 complete
- VRoid SDK, Babylon.js or Three.js
- VRM avatar model files
---
### Phase 5: Autonomy & Self-Modification (Weeks 9-10)
**Goal**: Hex can modify her own code
**Components to Build**:
1. Code generation module (LLM-based)
2. Static code analysis and safety validation
3. Sandbox testing environment
4. Git-based change tracking
5. Hot reload capability
6. Rollback system with 24-hour window
7. Capability progression (leveling system)
**Outcomes**:
- Hex can propose and apply code changes
- User maintains veto power
- All changes are versioned and reversible
- New capabilities unlock as relationships deepen
**Key Milestone**: "Hex can improve herself"
**Dependencies**:
- Phase 1-4 complete
- Git, RestrictedPython, `ast` module
- Testing framework
---
### Phase 6: Polish & Integration (Weeks 11-12)
**Goal**: All systems integrated and optimized
**Components to Build**:
1. Performance optimization (caching, batching)
2. Error handling and graceful degradation
3. Logging and telemetry
4. Configuration management
5. Auto-update capability
6. Integration testing (all components together)
7. Documentation and guides
**Outcomes**:
- System stable for extended use
- Responsive even under load
- Clear error messages
- Easy to deploy and configure
**Key Milestone**: "Hex is ready to ship"
**Dependencies**:
- Phase 1-5 complete
- All edge cases tested
---
### Dependency Graph Summary
```
Phase 1 (Foundation)
Phase 2 (Memory) ← depends on Phase 1
Phase 3 (Perception) ← depends on Phase 1-2
Phase 4 (Avatar) ← depends on Phase 1-3
Phase 5 (Self-Modification) ← depends on Phase 1-4
Phase 6 (Polish) ← depends on Phase 1-5
```
**Critical Path**: Foundation → Memory → Perception → Avatar → Self-Mod → Polish
---
## Integration Architecture
### System Interconnection Diagram
```
┌───────────────────────────────────────────────────────────────────┐
│                         Discord Bot Layer                         │
│                (Event dispatcher, message handler)                │
└────────┬─────────────────────────────────────────┬────────────────┘
         │                                         │
         │                                 ┌───────▼────────┐
         │                                 │  Voice Input   │
         │                                 │ (Whisper STT)  │
         │                                 └────────────────┘
         │
┌────────▼──────────────────────────────────────────────────────┐
│                     Context Assembly Layer                     │
│                                                                │
│   ┌─────────────────────────────────────────────────────┐     │
│   │    Retrieval Augmented Generation (RAG) Pipeline    │     │
│   └─────────────────────────────────────────────────────┘     │
│                                                                │
│   Input Components:                                            │
│   ├─ Recent Conversation (last 20 messages)                    │
│   ├─ User Profile (learned facts)                              │
│   ├─ Relationship State (history + emotional baseline)         │
│   ├─ Current Perception (mood, activity, environment)          │
│   └─ Personality Context (YAML + version)                      │
└───────┬────────────────────────────────────────────────────────┘
        │
        ├──────────────┬──────────────┬──────────────┐
        │              │              │              │
 ┌──────▼─────┐ ┌──────▼─────┐ ┌──────▼─────┐ ┌──────▼─────┐
 │   Memory   │ │Personality │ │ Perception │ │  Discord   │
 │   System   │ │   Engine   │ │   Layer    │ │  Context   │
 │            │ │            │ │            │ │            │
 │   SQLite   │ │   YAML +   │ │   OpenCV   │ │  Channel   │
 │   Chroma   │ │  Version   │ │  Whisper   │ │   User     │
 │            │ │  Control   │ │  Emotion   │ │   Status   │
 └──────┬─────┘ └──────┬─────┘ └──────┬─────┘ └──────┬─────┘
        │              │              │              │
        └──────────────┴──────┬───────┴──────────────┘
                              │
                  ┌───────────▼────────────┐
                  │        LLM Core        │
                  │ (Local Mistral/Llama)  │
                  │                        │
                  │  System Prompt:        │
                  │    [Persona] +         │
                  │    [Memory Context] +  │
                  │    [User State] +      │
                  │    [Current Context]   │
                  └───────────┬────────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
       ┌──────▼─────┐  ┌──────▼─────┐  ┌──────▼─────┐
       │    Text    │  │ Voice TTS  │  │   Avatar   │
       │  Response  │  │ Generation │  │ Animation  │
       │            │  │            │  │            │
       │  Send to   │  │  Tacotron  │  │   VRoid    │
       │  Discord   │  │ + Vocoder  │  │    Anim    │
       └──────┬─────┘  └──────┬─────┘  └──────┬─────┘
              │               │               │
              └───────────────┼───────────────┘
                              │
                   ┌──────────▼─────────┐
                   │  Response Commit   │
                   │                    │
                   │ ├─ Store in Memory │
                   │ ├─ Update Profile  │
                   │ ├─ Learn Patterns  │
                   │ └─ Adapt Persona   │
                   └────────────────────┘
```
---
### Key Integration Points
#### 1. Discord ↔ LLM Core
**Interface**: Message + Context → Response
```python
# Pseudo-code flow
message = receive_discord_message()
context = assemble_context(message.user_id, message.channel_id)
response = llm_core.generate(
    user_message=message.content,
    personality=personality_engine.current_persona(),
    history=memory_system.get_conversation(message.user_id, limit=20),
    user_profile=memory_system.get_user_profile(message.user_id),
    current_perception=perception_layer.get_current_state()
)
send_discord_response(response)
```
**Latency Budget**:
- Context retrieval: 100ms
- LLM generation: 2-3 seconds
- Response send: 100ms
- **Total**: 2.2-3.2 seconds (acceptable for conversational UX)
---
#### 2. Memory System ↔ Personality Engine
**Interface**: Learning → Relationship Adaptation
```python
# After every interaction
interaction = parse_message_event(message)
memory_system.log_conversation(interaction)
# Learn from interaction
new_facts = extract_facts(interaction.content)
memory_system.update_user_profile(interaction.user_id, new_facts)
# Adapt personality based on user
user_profile = memory_system.get_user_profile(interaction.user_id)
personality_engine.adapt_to_user(user_profile)
# If major relationship shift, update YAML
if user_profile.relationship_level_changed:
    personality_engine.save_persona_version()
```
**Update Frequency**: Real-time with batched commits every 5 minutes
---
#### 3. Perception Layer ↔ Response Generation
**Interface**: Context Injection
```python
# In context assembly
current_perception = perception_layer.get_state()
# Inject into system prompt
if current_perception.emotion == "sad":
    system_prompt += "\n[User appears sad. Respond with support and comfort.]"
if current_perception.is_kid_mode:
    system_prompt += "\n[Kid safety mode active. Filter for age-appropriate content.]"
if current_perception.detected_activity == "gaming":
    system_prompt += "\n[User is gaming. Comment on gameplay if relevant.]"
```
**Synchronization**: 1-5 second update intervals (perception → LLM context)
---
#### 4. Avatar System ↔ All Systems
**Interface**: Emotional State → Visual Expression
```python
# Avatar driven by multiple sources
emotion_from_response = infer_emotion(llm_response)
mood_from_perception = perception_layer.get_mood()
persona_expression = personality_engine.get_current_expression()
blendshape_values = combine_expressions(
    emotion=emotion_from_response,
    mood=mood_from_perception,
    personality=persona_expression
)
avatar_system.animate(blendshape_values)
```
**Synchronization**: Real-time, driven by response generation and perception updates
---
#### 5. Self-Modification System ↔ Core Systems
**Interface**: Code Change → Runtime Update + Personality
```python
# Self-modification flow
proposal = self_mod_system.generate_proposal(user_request)
code = self_mod_system.generate_code(proposal)
# Test in sandbox
test_result = self_mod_system.test_in_sandbox(code)
# User approves
git_hash = self_mod_system.commit_change(code)
# Update personality to reflect new capability
personality_engine.add_capability(proposal.feature_name)
personality_engine.save_persona_version()
# Hot reload if possible, else apply on restart
apply_change_to_runtime(code)
```
**Safety Boundary**:
- LLM can generate proposals
- Only user-approved code runs
- All changes reversible within 24 hours
---
## Synchronization and Consistency Model
### State Consistency Across Components
**Challenge**: Multiple systems need consistent view of personality, memory, and user state
**Solution**: Event-driven architecture with eventual consistency
```
        ┌─────────────────┐
        │  Event Stream   │
        │  (In-memory     │
        │  message queue) │
        └────────┬────────┘
                 │
┌────────────────┴───────────────┐
│                                │
│  Subscribers:                  │
│  ├─ Memory System              │
│  ├─ Personality Engine         │
│  ├─ Avatar System              │
│  ├─ Discord Bot                │
│  └─ Metrics/Logging            │
│                                │
│  Event Types:                  │
│  ├─ UserMessageReceived        │
│  ├─ ResponseGenerated          │
│  ├─ PerceptionUpdated          │
│  ├─ PersonalityModified        │
│  ├─ CodeChangeApplied          │
│  └─ MemoryLearned              │
│                                │
└────────────────────────────────┘
```
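A minimal in-process sketch of such a bus; the event names mirror the types listed above, and the subscriber handlers in the usage note are hypothetical:
```python
# Minimal in-process pub/sub sketch matching the diagram above.
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]

class EventBus:
    def __init__(self):
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    async def publish(self, event_type: str, payload: dict) -> None:
        # Handlers run concurrently; each subsystem converges independently.
        await asyncio.gather(*(h(payload) for h in self._subscribers[event_type]))

# Usage (handlers are hypothetical):
#   bus = EventBus()
#   bus.subscribe("UserMessageReceived", memory_system.on_message)
#   await bus.publish("UserMessageReceived", {"user_id": "...", "content": "..."})
```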
**Consistency Guarantees**:
- Memory updates are durably stored within 5 minutes
- Personality snapshots versioned on every change
- Discord delivery is handled by discord.py's gateway reconnect and retry logic
- Perception updates are idempotent (can be reapplied without side effects)
---
## Known Challenges and Solutions
### 1. Latency with Local LLM
**Challenge**: Waiting 2-3 seconds for response feels slow
**Solutions**:
- Immediate visual feedback (typing indicator, avatar animation)
- Streaming responses (show text as it generates; sketched below)
- Batch non-urgent background work into quiet hours so interactive requests stay fast
- GPU acceleration where possible
- Model optimization (quantization, pruning)
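For the streaming item above: the `ollama` Python client supports streaming chat, so text can be surfaced as it generates (the model name is illustrative):
```python
# Streaming sketch using the ollama Python client; model name is illustrative.
import ollama

def stream_reply(prompt: str, model: str = "mistral"):
    stream = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        yield chunk["message"]["content"]  # partial text as it arrives
```
The bot can edit its Discord message in place as chunks arrive, so the user sees text within the first few hundred milliseconds.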
### 2. Personality Consistency During Evolution
**Challenge**: Hex changes as she learns, but must feel like the same person
**Solutions**:
- Gradual adaptation (personality changes in YAML, not discrete jumps)
- Memory-driven consistency (personality adapts to learned facts)
- Version control (can rollback if she becomes unrecognizable)
- User feedback loop (user can reset or modify personality)
- Core values remain constant (tsundere nature, care underneath)
### 3. Memory Scaling as History Grows
**Challenge**: Retrieving relevant context from thousands of conversations
**Solutions**:
- Vector database for semantic search (sub-500ms)
- Hierarchical memory (recent → summarized old)
- Automatic archival (monthly snapshots, prune oldest; sketched below)
- Importance tagging (weight important conversations higher)
- Incremental updates (don't recalculate everything)
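A sketch of the archival step flagged above, reusing the SQLite layout from the Memory System section; `summarize()` is a hypothetical LLM-backed helper:
```python
# Hierarchical archival sketch; summarize() is a hypothetical LLM helper.
import json
import sqlite3

def archive_old_conversations(conn: sqlite3.Connection,
                              user_id: str, cutoff: str) -> None:
    rows = conn.execute(
        "SELECT message_content FROM conversations "
        "WHERE user_id = ? AND timestamp < ?", (user_id, cutoff)).fetchall()
    if not rows:
        return
    summary = summarize("\n".join(r[0] for r in rows))  # hypothetical
    conn.execute(
        "INSERT INTO conversations (channel_id, user_id, message_content, metadata) "
        "VALUES ('archive', ?, ?, ?)",
        (user_id, summary, json.dumps({"tags": ["summary"]})))
    conn.execute(
        "DELETE FROM conversations WHERE user_id = ? AND timestamp < ?",
        (user_id, cutoff))
    conn.commit()
```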
### 4. Safe Code Generation and Sandboxing
**Challenge**: Hex generates code, but must never break the system
**Solutions**:
- Static analysis (AST parsing for forbidden operations)
- Capability-based progression (limited API at first)
- Sandboxed testing before deployment
- User approval gate (user reviews all code)
- Version control + rollback window (24-hour window)
- Whitelist of safe operations (growing list as trust builds)
### 5. Privacy and Local-First Architecture
**Challenge**: Maintaining privacy while having useful context
**Solutions**:
- All ML inference runs locally (no cloud submission)
- No external API calls except Discord
- Encrypted local storage for memories
- User can opt-out of any perception module
- Transparent logging (user can audit what's stored)
### 6. Multimodal Synchronization
**Challenge**: Webcam, voice, text, screen all need to inform response
**Solutions**:
- Asynchronous processing (don't wait for all inputs)
- Highest-priority input wins (voice > perception > text)
- Graceful degradation (works without any modality)
- Caching (reuse recent perception for repeated queries)
---
## Scaling Considerations
### Single-User (v1)
- Architecture designed for one person + their kids
- Local compute, no multi-user concerns
- Personality is singular (one Hex)
### Multi-Device (v1.5)
- Same personality and memory sync across devices
- Discord as primary, desktop app as secondary
- Cloud sync optional (local-first default)
### Android Support (v2)
- Memory and personality sync to mobile
- Lightweight inference on Android (quantized model)
- Fallback to cloud inference if needed
- Same core architecture, different UIs
### Potential Scaling Patterns
```
Single User (Current)
├─ One Hex instance
├─ All local compute
└─ SQLite + Vector DB

Multi-Device Sync (v1.5)
├─ Central SQLite + Vector DB on primary machine
├─ Sync service between devices
└─ Same personality, distributed memory

Multi-Companion (Potential v3)
├─ Multiple Hex instances (per family member)
├─ Shared memory system (family history)
├─ Individual personalities
└─ Potential distributed compute (each on own device)
```
### Performance Bottlenecks to Monitor
1. **LLM Inference**: Becomes slower as context window grows
- Solution: Context summarization, hierarchical retrieval
2. **Vector DB Lookups**: Scales with conversation history
- Solution: Incremental indexing, approximate search (HNSW)
3. **Perception Processing**: CPU/GPU bound
- Solution: Frame skipping, model optimization, dedicated thread
4. **Discord Bot Responsiveness**: Limited by gateway connections
- Solution: Sharding (if needed), efficient message queuing
---
## Technology Stack Summary
| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Discord Bot | discord.py | Fast, well-supported, async-native |
| LLM Inference | Mistral 7B + ollama/vLLM | Local-first, good quality/speed tradeoff |
| Memory (Conversations) | SQLite | Reliable, local, fast queries |
| Memory (Semantic) | Chroma or Milvus | Local vector DB, easy to manage |
| Embeddings | all-MiniLM-L6-v2 | Fast, good quality, local |
| Face Detection | MediaPipe | Accurate, fast, local |
| Emotion Recognition | FER2013 or local model | Local, privacy-preserving |
| Speech-to-Text | Whisper | Local, accurate, multilingual |
| Text-to-Speech | Tacotron 2 + Vocoder | Local, controllable |
| Avatar | VRoid SDK + Babylon.js | Standards-based, extensible |
| Code Safety | RestrictedPython + ast | Local analysis, sandboxing |
| Version Control | Git | Change tracking, rollback |
| Desktop UI | Tkinter or PyQt | Lightweight, cross-platform |
| Testing | pytest + unittest | Standard Python testing |
| Logging | logging + sentry (optional) | Local-first with cloud fallback |
---
## Deployment Architecture
### Local Development
```
Developer Machine
├── Discord Token (env var)
├── Hex codebase (git)
├── Local LLM (ollama)
├── SQLite (file-based)
├── Vector DB (Chroma, embedded)
└── Webcam / Screen capture (live)
```
### Production Deployment
```
Deployed Machine (Windows/WSL)
├── Discord Token (secure storage)
├── Hex codebase (from git)
├── Local LLM service (ollama/vLLM)
├── SQLite (persistent, backed up)
├── Vector DB (persistent, backed up)
├── Desktop app (tray icon)
├── Auto-updater (pulls from git)
└── Logging (local + optional cloud)
```
### Update Strategy
- Git pull for code updates
- Automatic model updates (LLM weights)
- Zero-downtime restart (graceful shutdown)
- Rollback capability (version pinning)
---
## Quality Assurance
### Key Metrics to Track
**Responsiveness**:
- Response latency: Target <3 seconds
- Perception update latency: <500ms
- Memory lookup latency: <100ms
**Reliability**:
- Uptime: >99% for core bot
- Message delivery: >99.9%
- Memory integrity: No data loss on crash
**Personality Consistency**:
- User perception: "Feels like the same person"
- Tone consistency: Personality rules enforced
- Learning progress: Measurable improvement in personalization
**Safety**:
- No crashes from invalid input
- No LLM hallucinations about moderation
- Safe code generation (0 unauthorized executions)
### Testing Strategy
```
Unit Tests
├─ Memory operations (CRUD)
├─ Perception processing
├─ Code validation
├─ Personality rule application
└─ Response filtering

Integration Tests
├─ Discord message → LLM → Response
├─ Context assembly pipeline
├─ Avatar expression sync
├─ Self-modification flow
└─ Multi-component scenarios

End-to-End Tests
├─ Full conversation with personality
├─ Perception-aware responses
├─ Memory learning and retrieval
├─ Code generation and deployment
└─ Edge cases (bad input, crashes, recovery)

Manual UAT
├─ Conversational feel (does she feel like a person?)
├─ Personality consistency (still Hex?)
├─ Safety compliance (kid-mode works?)
├─ Performance (under load?)
└─ All features working together?
```
---
## Conclusion
Hex's architecture prioritizes **personality coherence** and **genuine relationship** over feature breadth. The system is designed as a pipeline from perception → memory → personality → response generation, with feedback loops that allow her to learn and evolve.
The modular design enables incremental development (Phase 1-6), with each phase adding capability while maintaining system stability. The self-modification system enables genuine autonomy within safety boundaries, and the local-first approach ensures privacy and independence.
**Critical success factors**:
1. LLM latency acceptable (<3s)
2. Personality consistency maintained across updates
3. Memory system scales with history
4. Self-modification is safe and reversible
5. All components feel integrated (not separate features)
This architecture serves the core value: **making Hex feel like a person who genuinely cares about you.**
---
**Document Version**: 1.0
**Last Updated**: 2026-01-27
**Status**: Ready for Phase 1 Development