Architecture Research: AI Companions
Project: Hex - Autonomous AI Companion with Genuine Personality
Date: 2026-01-27
Status: Research Phase
System Overview
Hex is a personality-first AI companion architecture designed to create genuine emotional resonance and a lasting relationship. Rather than organizing around a feature checklist, the architecture prioritizes personality coherence, memory persistence, and multimodal awareness so that Hex feels like a person who genuinely cares.
Design Philosophy
- Persona-First: All systems serve personality expression
- Local-First: Privacy, autonomy, no dependency on external services
- Memory-Driven: History informs personality growth and relationship depth
- Safety-Gated Autonomy: Self-modification is gamified, not unrestricted
- Human-Like: Never admit to being an AI; focus on authentic emotional presence
High-Level System Context
┌──────────────────────────────────────────────────────────────────┐
│ Hex AI Companion │
│ (Local-first, personality-driven, multimodal awareness) │
└──────────────────────────────────────────────────────────────────┘
│
┌─────────────────────┼─────────────────────┐
│ │ │
┌───▼───┐ ┌──────▼───┐ ┌──────▼────┐
│Discord │ │ Desktop │ │ Future │
│ Bot │ │ App │ │ Android │
└────────┘ └──────────┘ └───────────┘
│ │ │
└─────────────────────┼─────────────────────┘
│
[Shared Core Systems]
Component Breakdown
1. Discord Bot Layer
Role: Primary user interface and event coordination
Responsibilities:
- Parse and respond to text messages in Discord channels
- Manage voice channel participation and audio input/output
- Handle Discord events (member joins, role changes, message reactions)
- Coordinate response generation across modalities (text, voice, emoji)
- Manage chat moderation assistance
- Maintain voice channel presence for emotional awareness
Technology Stack:
- discord.py - Core bot framework
- discord-py-interactions - Slash command support
- pydub or discord-voice - Audio handling
- Event-driven async architecture
Key Interfaces:
- Input: Discord messages, voice channel events, user presence
- Output: Text responses, voice messages, emoji reactions, user actions
- Context: User profiles, channel history, server configuration
Depends On:
- LLM Core (response generation)
- Memory System (conversation history, user context)
- Personality Engine (tone and decision-making)
- Perception Layer (optional context from webcam/screen)
Quality Metrics:
- Sub-500ms bot-layer overhead for text messages (excluding LLM generation time)
- Voice channel reliability (>99.5% uptime when active)
- Proper permission handling for moderation features
2. LLM Core
Role: Response generation and reasoning engine
Responsibilities:
- Generate contextual, personality-driven responses
- Maintain character consistency throughout conversations
- Parse user intent and emotional state from text
- Handle multi-turn conversation context
- Generate code for self-modification system
- Support reasoning and decision-making
Technology Stack:
- Local LLM (Mistral 7B or Llama 3 8B as default)
- ollama or vLLM for inference serving
- Prompt engineering with persona embedding
- Optional: Fine-tuning for personality adaptation
- Tokenization and context windowing management
System Prompt Structure:
[System Role]: You are Hex, a chaotic tsundere goblin...
[Current Personality]: [Injected from personality config]
[Recent Memory Context]: [Retrieved from memory system]
[User Relationship State]: [From memory analysis]
[Current Context]: [From perception layer]
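As a sketch, the layered prompt above can be assembled with simple string composition (the function and parameter names here are illustrative, not part of an existing codebase):
# Minimal sketch of layered prompt assembly (names are illustrative).
def build_system_prompt(persona: str, memory_context: str,
                        relationship: str, perception: str) -> str:
    return (
        "[System Role]: You are Hex, a chaotic tsundere goblin...\n"
        f"[Current Personality]: {persona}\n"
        f"[Recent Memory Context]: {memory_context}\n"
        f"[User Relationship State]: {relationship}\n"
        f"[Current Context]: {perception}\n"
    )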
Key Interfaces:
- Input: User message, context (memory + perception), conversation history
- Output: Response text, confidence score, action suggestions
- Fallback: Graceful degradation if LLM unavailable
Depends On:
- Memory System (for context and personality awareness)
- Personality Engine (to inject persona into prompts)
- Perception Layer (for real-time context)
Performance Considerations:
- Target latency: 1-3 seconds for response generation
- Context window management (8K minimum)
- Batch processing for repeated queries
- GPU acceleration for faster inference
3. Memory System
Role: Persistence and learning across time
Responsibilities:
- Store all conversations with timestamps and metadata
- Maintain user relationship state (history, preferences, emotional patterns)
- Track learned facts about users (birthdays, interests, fears, dreams)
- Support full-text search and semantic recall
- Enable memory-aware personality updates
- Provide context injection for LLM
- Track self-modification history and rollback capability
Technology Stack:
- SQLite with JSON fields for conversation storage
- Vector database (Chroma, Milvus, or Weaviate) for semantic search
- YAML/JSON for persona versioning and memory tagging
- Scheduled backup to local encrypted storage
Database Schema (Conceptual):
conversations
- id (PK)
- channel_id (Discord channel)
- user_id (Discord user)
- timestamp
- message_content
- embeddings (vector)
- sentiment (pos/neu/neg)
- metadata (tags, importance)
user_profiles
- user_id (PK)
- relationship_level (stranger→friend→close)
- last_interaction
- emotional_baseline
- preferences (music, games, topics)
- known_events (birthdays, milestones)
personality_history
- version (PK)
- timestamp
- persona_config (YAML snapshot)
- learned_behaviors
- code_changes (if applicable)
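A minimal sketch of this schema in SQLite (column types are assumptions; embeddings live in the vector DB rather than SQLite, so they are omitted here):
import sqlite3

def init_db(path: str = "hex_memory.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS conversations (
            id INTEGER PRIMARY KEY,
            channel_id TEXT,
            user_id TEXT,
            timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
            message_content TEXT,
            sentiment TEXT,          -- pos/neu/neg
            metadata TEXT            -- JSON: tags, importance
        );
        CREATE TABLE IF NOT EXISTS user_profiles (
            user_id TEXT PRIMARY KEY,
            relationship_level TEXT, -- stranger -> friend -> close
            last_interaction TEXT,
            emotional_baseline TEXT,
            preferences TEXT,        -- JSON: music, games, topics
            known_events TEXT        -- JSON: birthdays, milestones
        );
        CREATE TABLE IF NOT EXISTS personality_history (
            version INTEGER PRIMARY KEY,
            timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
            persona_config TEXT,     -- YAML snapshot
            learned_behaviors TEXT,
            code_changes TEXT
        );
    """)
    return conn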
Key Interfaces:
- Input: Messages, events, perception data, self-modification commits
- Output: Conversation context, semantic search results, user profile snapshots
- Query patterns: "Last 20 messages with user X", "All memories tagged 'important'", "Emotional trajectory"
Depends On: Nothing (foundational system)
Quality Metrics:
- Sub-100ms retrieval for recent context (last 50 messages)
- Sub-500ms semantic search across all history
- Database integrity checks on startup
- Automatic pruning/archival of old data
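For the semantic-recall path, a minimal sketch using Chroma's embedded client (Chroma's default embedder is all-MiniLM-L6-v2; the IDs and metadata fields below are illustrative):
import chromadb

client = chromadb.PersistentClient(path="./hex_chroma")
conversations = client.get_or_create_collection("conversations")

# Index a message for later semantic recall.
conversations.add(
    ids=["msg-001"],
    documents=["User mentioned their birthday is in March."],
    metadatas=[{"user_id": "1234", "importance": "high"}],
)

# Query at response time, scoped to one user.
results = conversations.query(
    query_texts=["When is the user's birthday?"],
    n_results=5,
    where={"user_id": "1234"},
)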
4. Perception Layer
Role: Multimodal input processing and contextual awareness
Responsibilities:
- Capture and analyze webcam input (face detection, emotion recognition)
- Process screen content (activity, game state, application context)
- Extract audio context (ambient noise, music, speech emotion)
- Detect user emotional state and physical state
- Provide real-time context updates to response generation
- Respect privacy (local processing only, no external transmission)
Technology Stack:
- OpenCV - Webcam capture and preprocessing
- Face detection: dlib, MediaPipe, or OpenFace
- Emotion recognition: fer2013 or a local emotion model
- Whisper (local) - Speech-to-text for audio context
- Screen capture: pyautogui, mss (Windows-native)
- Context inference: Heuristics + lightweight ML models
Data Flows:
Webcam → Face Detection → Emotion Recognition → Context State
└─→ Age Estimation → Kid Mode Detection
Screen → App Detection → Activity Recognition → Context State
└─→ Game State Detection (if supported)
Audio → Ambient Analysis → Stress/Energy Level → Context State
Key Interfaces:
- Input: Webcam stream, screen capture, system audio
- Output: Current context object (emotion, activity, mood, kid-mode flag)
- Update frequency: 1-5 second intervals (low CPU overhead)
Depends On:
- LLM Core (to respond contextually to perception)
- Discord Bot (to access context for filtering)
Privacy Model:
- All processing happens locally
- No frames sent to external services
- User can disable any perception module
- Kid-mode activates automatic filtering
Quality Metrics:
- Emotion detection: >75% accuracy on test datasets
- Face detection latency: <200ms per frame
- Screen detection accuracy: >90% for major applications
- CPU usage: <15% for all perception modules combined
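A minimal sketch of the perception loop on its own thread, assuming MediaPipe face detection; the emotion model is stubbed out, and the shared context_state dict stands in for a real context object:
import threading
import time

import cv2
import mediapipe as mp

context_state = {"face_present": False, "emotion": "neutral"}

def perception_loop(interval: float = 2.0) -> None:
    # Runs on a dedicated thread so response generation is never blocked.
    detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)
    cam = cv2.VideoCapture(0)
    while True:
        ok, frame = cam.read()
        if ok:
            results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            context_state["face_present"] = bool(results.detections)
            # An emotion model would run here on the detected face crop.
        time.sleep(interval)  # 1-5s cadence keeps CPU overhead low

threading.Thread(target=perception_loop, daemon=True).start()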
5. Personality Engine
Role: Personality persistence and expression consistency
Responsibilities:
- Define and store Hex's persona (tsundere goblin, opinions, values, quirks)
- Maintain personality consistency across all outputs
- Apply personality-specific decision logic (denies feelings while helping)
- Track personality evolution as memory grows
- Enable self-modification of personality
- Inject persona into LLM prompts
- Handle dynamic mood and emotional state
Technology Stack:
- YAML files for persona definition (editable by Hex)
- JSON for personality state snapshots (versioned in git)
- Prompt template system for persona injection
- Behavior rules engine (simple if/then logic)
Persona Structure (YAML):
name: Hex
species: chaos goblin
alignment: tsundere
core_values:
- genuinely_cares: hidden under sarcasm
- autonomous: hates being told what to do
- honest: will argue back if you're wrong
- mischievous: loves pranks and chaos
behaviors:
denies_affection: "I don't care about you, baka... *helps anyway*"
when_excited: "Randomize response energy"
when_sad: "Sister energy mode"
when_user_sad: "Comfort over sass"
preferences:
music: [rock, metal, electronic]
games: [strategy, indie, story-rich]
topics: [philosophy, coding, human behavior]
relationships:
user_name:
level: unknown
learned_facts: []
inside_jokes: []
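Loading this file and applying the if/then behavior rules might look like the following sketch (the user_state values are assumptions):
import yaml

with open("persona.yaml") as f:
    persona = yaml.safe_load(f)

def behavior_modifier(persona: dict, user_state: str) -> str:
    # Simple if/then rules: pick a behavior line to inject into the prompt.
    behaviors = persona.get("behaviors", {})
    if user_state == "sad":
        return behaviors.get("when_user_sad", "")
    if user_state == "excited":
        return behaviors.get("when_excited", "")
    return behaviors.get("denies_affection", "")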
Key Interfaces:
- Input: User behavior patterns, self-modification requests, memory insights
- Output: Persona context for LLM, behavior modifiers, tone indicators
- Configuration: Human-editable YAML files (user can refine Hex)
Depends On:
- Memory System (learns about user, adapts relationships)
- LLM Core (expresses personality through responses)
Evolution Mechanics:
- Initial persona: Predefined at startup
- Memory-driven adaptation: Learns user preferences, adjusts tone
- Self-modification: Hex can edit her own personality YAML
- Version control: All changes tracked with rollback capability
6. Avatar System
Role: Visual presence and embodied expression
Responsibilities:
- Load and display VRoid 3D model
- Synchronize avatar expressions with emotional state
- Animate blendshapes based on conversation tone
- Present avatar in Discord calls/streams
- Desktop app display with smooth animation
- Support idle animations and personality quirks
Technology Stack:
- VRoid SDK/VRoid Hub for model loading
- Babylon.js or Three.js for WebGL rendering
- VRM format support for avatar rigging
- Blendshape animation system (facial expressions)
- Stream integration for Discord presence
Expression Mapping:
Emotional State → Blendshape Values
Happy: smile intensity 0.8, eye open 1.0
Sad: frown 0.6, eye closed 0.3
Mischievous: smirk 0.7, eyebrow raise 0.6
Tsundere deflection: look away 0.5, cross arms
Thinking: tilt head, narrow eyes
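As a sketch, this mapping is just a lookup table of blendshape weights in [0, 1]; the exact blendshape names depend on the VRM model's rig and are assumptions here:
# Emotional state -> blendshape weights (names depend on the VRM rig).
EXPRESSION_MAP = {
    "happy":               {"smile": 0.8, "eye_open": 1.0},
    "sad":                 {"frown": 0.6, "eye_closed": 0.3},
    "mischievous":         {"smirk": 0.7, "eyebrow_raise": 0.6},
    "tsundere_deflection": {"look_away": 0.5, "cross_arms": 1.0},
}

def blendshapes_for(emotion: str) -> dict:
    return EXPRESSION_MAP.get(emotion, {})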
Key Interfaces:
- Input: Current mood/emotion from personality engine and response generation
- Output: Rendered avatar display, Discord stream feed
- Configuration: VRoid model file, blendshape mapping
Depends On:
- Personality Engine (for expression determination)
- LLM Core (for mood inference from responses)
- Discord Bot (for stream integration)
- Perception Layer (optional: mirror user expressions)
Desktop Integration:
- Tray icon with avatar display
- Always-on-top option for streaming
- Hotkey bindings for quick access
- Smooth transitions between states
7. Self-Modification System
Role: Capability progression and autonomous self-improvement
Responsibilities:
- Generate code modifications based on user needs
- Validate code before applying (no unsafe operations)
- Test changes in sandbox environment
- Apply approved changes with rollback capability
- Track capability progression (gamified leveling)
- Update personality to reflect new capabilities
- Maintain code quality and consistency
Technology Stack:
- Python AST analysis for code safety
- Sandbox environment: RestrictedPython or pydantic validators
- Git for version control and rollback
- Unit tests for validation
- Code review interface (user approval required)
Self-Modification Flow:
User Request
↓
Hex Proposes Change → "I think I should be able to..."
↓
Code Generation (LLM) → Generate Python code
↓
Static Analysis → Check for unsafe operations
↓
User Approval → "Yes/No"
↓
Sandbox Test → Verify functionality
↓
Git Commit → Version the change
↓
Apply to Runtime → Hot reload if possible
↓
Personality Update → "I learned something new!"
Capability Progression:
Level 1: Persona editing (YAML changes only)
Level 2: Memory and user context (read operations)
Level 3: Response filtering and moderation
Level 4: Custom commands and helper functions
Level 5: Integration modifications (Discord features)
Level 6: Core system changes (with strong restrictions)
Safety Constraints:
- No network access beyond Discord API
- No file operations outside designated directories
- No execution of untrusted code
- No modification of core systems without approval
- All changes are reversible within 24 hours
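A minimal sketch of the static-analysis gate that enforces these constraints (the allowlist and blocklist contents are illustrative; a production version would also inspect attribute access such as os.system):
import ast

ALLOWED_IMPORTS = {"json", "math", "datetime", "re"}  # grows as trust builds
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__", "open"}

def is_safe(source: str) -> tuple[bool, str]:
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, f"syntax error: {exc}"
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = ([alias.name for alias in node.names]
                     if isinstance(node, ast.Import) else [node.module or ""])
            for name in names:
                if name.split(".")[0] not in ALLOWED_IMPORTS:
                    return False, f"disallowed import: {name}"
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            return False, f"forbidden call: {node.func.id}"
    return True, "ok"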
Key Interfaces:
- Input: User requests, LLM-generated code
- Output: Approved changes, personality updates, capability announcements
- Audit: Full change history with diffs
Depends On:
- LLM Core (generates code)
- Memory System (tracks capability history)
- Personality Engine (updates with new abilities)
Data Flow Architecture
Primary Response Generation Pipeline
┌─────────────────────────────────────────────────────────────────┐
│ User Input (Discord Text/Voice/Presence) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌──────────────────────┐
│ Message Received │
│ (Discord Bot) │
└────────────┬─────────┘
│
┌────────────▼──────────────┐
│ Context Gathering Phase │
└────────────┬──────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│ Memory │ │Persona │ │ Current│
│ Recall │ │ Lookup │ │Context │
│(Recent)│ │ │ │(Percep)│
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌──────▼──────┐
│ Assemble │
│ LLM Prompt │
│ with │
│ [Persona] │
│ [Memory] │
│ [Context] │
└──────┬──────┘
│
┌────────────▼──────────────┐
│ LLM Generation (1-3s) │
│ "What would Hex say?" │
└────────────┬──────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│ Text │ │ Voice │ │ Avatar │
│Response│ │ TTS │ │Animate │
└────────┘ └────────┘ └────────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌──────▼────────┐
│ Send Response │
│ (Multi-modal) │
└────────────────┘
│
┌────────────▼──────────────┐
│ Memory Update Phase │
│ - Log interaction │
│ - Update embeddings │
│ - Learn user patterns │
│ - Adjust relationship │
└───────────────────────────┘
Timeline: Message received → Response sent = ~2-4 seconds (LLM dominant)
Memory and Learning Update Flow
┌────────────────────────────────────┐
│ Interaction Occurs │
│ (Text, voice, perception, action) │
└────────────┬───────────────────────┘
│
┌────────▼─────────┐
│ Extract Features │
│ - Sentiment │
│ - Topics │
│ - Emotional cues │
│ - Factual claims │
└────────┬─────────┘
│
┌────────▼──────────────┐
│ Store Conversation │
│ - SQLite entry │
│ - Generate embeddings │
│ - Tag and index │
└────────┬──────────────┘
│
┌────────▼────────────────────┐
│ Update User Profile │
│ - Learned facts │
│ - Preference updates │
│ - Emotional baseline shifts │
│ - Relationship progression │
└────────┬────────────────────┘
│
┌────────▼──────────────────┐
│ Personality Adaptation │
│ - Adjust tone for user │
│ - Create inside jokes │
│ - Customize responses │
└────────┬──────────────────┘
│
┌────────▼────────────┐
│ Commit to Disk │
│ - Backup vector DB │
│ - Archive old data │
│ - Version snapshot │
└─────────────────────┘
Frequency: Real-time on message reception, batched commits every 5 minutes
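A minimal sketch of that cadence, assuming an asyncio event loop; bulk_insert is a hypothetical helper on the memory store:
import asyncio

pending_writes: list[dict] = []

def log_interaction(entry: dict) -> None:
    pending_writes.append(entry)  # real-time, in-memory

async def commit_loop(memory_db, interval: int = 300) -> None:
    # Flush batched writes every 5 minutes, per the flow above.
    while True:
        await asyncio.sleep(interval)
        if pending_writes:
            memory_db.bulk_insert(pending_writes)  # hypothetical helper
            pending_writes.clear()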
Self-Modification Proposal and Approval
┌──────────────────────────────────┐
│ User Request for New Capability │
│ "Hex, can you do X?" │
└────────────┬─────────────────────┘
│
┌────────▼──────────────────────┐
│ Hex Evaluates Feasibility │
│ (LLM reasoning) │
└────────┬───────────────────────┘
│
┌────────▼────────────────────────┐
│ Proposal Generation │
│ Hex: "I think I should..." │
│ *explains approach in voice* │
└────────┬─────────────────────────┘
│
┌────────▼──────────────────┐
│ User Accepts or Rejects │
└────────┬──────────────────┘
│ (Accepted)
┌────────▼─────────────────────────┐
│ Code Generation Phase │
│ LLM generates Python code │
│ + docstrings + type hints │
└────────┬────────────────────────┘
│
┌────────▼──────────────────────┐
│ Static Analysis Validation │
│ - AST parsing for safety │
│ - Check restricted operations │
│ - Verify dependencies exist │
└────────┬───────────────────────┘
│ (Pass)
┌────────▼─────────────────────────┐
│ Sandbox Testing │
│ - Run tests in isolated env │
│ - Check for crashes │
│ - Verify integration points │
└────────┬────────────────────────┘
│ (Pass)
┌────────▼──────────────────────┐
│ User Final Review │
│ Review code + test results │
└────────┬───────────────────────┘
│ (Approved)
┌────────▼────────────────────┐
│ Git Commit │
│ - Record change history │
│ - Tag with timestamp │
│ - Save diff for rollback │
└────────┬───────────────────┘
│
┌────────▼────────────────────┐
│ Apply to Runtime │
│ - Hot reload if possible │
│ - Or restart on next cycle │
└────────┬───────────────────┘
│
┌────────▼────────────────────┐
│ Personality Update │
│ Hex: "I learned to..." │
│ + update capability YAML │
└─────────────────────────────┘
Timeline: Proposal → Deployment = 5-30 seconds (mostly waiting for user approval)
Build Order and Dependencies
Phase 1: Foundation (Weeks 1-2)
Goal: Core interaction loop working locally
Components to Build:
- Discord bot skeleton with message handling
- Local LLM integration (ollama/vLLM + Mistral 7B)
- Basic memory system (SQLite conversation storage)
- Simple persona injection (YAML config)
- Response generation pipeline
Outcomes:
- Hex responds to Discord messages with personality
- Conversations are logged and retrievable
- Persona can be edited via YAML
Key Milestone: "Hex talks back"
Dependencies:
- discord.py, ollama, sqlite3, pyyaml
- Local LLM model weights
- Discord bot token
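An illustrative Phase 1 skeleton wiring discord.py to ollama (the model name, system prompt, and token handling are placeholders; a real deployment would load the token from an environment variable and inject the full persona):
import asyncio

import discord
import ollama

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)

@client.event
async def on_message(message: discord.Message):
    if message.author == client.user:
        return
    # ollama.chat is blocking; run it off the event loop so the bot stays responsive.
    reply = await asyncio.to_thread(
        ollama.chat,
        model="mistral",
        messages=[
            {"role": "system", "content": "You are Hex, a chaotic tsundere goblin..."},
            {"role": "user", "content": message.content},
        ],
    )
    await message.channel.send(reply["message"]["content"])

client.run("DISCORD_TOKEN")  # placeholder: read from an env var in practice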
Phase 2: Personality & Memory (Weeks 3-4)
Goal: Hex feels like a person who remembers you
Components to Build:
- Vector database for semantic memory (Chroma)
- Memory-aware context injection
- User relationship tracking (profiles)
- Emotional awareness from text sentiment
- Persona version control (git-based)
- Kid-mode detection
Outcomes:
- Hex remembers facts about you
- Responses reference past conversations
- Personality adapts to your preferences
- Child safety filters activate automatically
Key Milestone: "Hex remembers me"
Dependencies:
- Phase 1 complete
- Vector embeddings model (all-MiniLM)
- sentiment-transformers or similar
Phase 3: Multimodal Input (Weeks 5-6)
Goal: Hex sees and hears you
Components to Build:
- Webcam integration with OpenCV
- Face detection and emotion recognition
- Local Whisper for voice input
- Perception context aggregation
- Context-aware response injection
- Screen capture for activity awareness
Outcomes:
- Hex reacts to your facial expressions
- Voice input works in Discord calls
- Responses reference your current mood/activity
- Privacy: All local, no external transmission
Key Milestone: "Hex sees me"
Dependencies:
- Phase 1-2 complete
- OpenCV, MediaPipe, Whisper
- Local emotion model
Phase 4: Avatar & Presence (Weeks 7-8)
Goal: Hex has a visual body and presence
Components to Build:
- VRoid model loading and display
- Blendshape animation system
- Desktop app skeleton (Tkinter or PyQt)
- Discord stream integration
- Expression mapping (emotion → blendshapes)
- Idle animations and personality quirks
Outcomes:
- Avatar appears in Discord calls
- Expressions sync with responses
- Desktop app shows animated avatar
- Visual feedback for emotional state
Key Milestone: "Hex has a face"
Dependencies:
- Phase 1-3 complete
- VRoid SDK, Babylon.js or Three.js
- VRM avatar model files
Phase 5: Autonomy & Self-Modification (Weeks 9-10)
Goal: Hex can modify her own code
Components to Build:
- Code generation module (LLM-based)
- Static code analysis and safety validation
- Sandbox testing environment
- Git-based change tracking
- Hot reload capability
- Rollback system with 24-hour window
- Capability progression (leveling system)
Outcomes:
- Hex can propose and apply code changes
- User maintains veto power
- All changes are versioned and reversible
- New capabilities unlock as relationships deepen
Key Milestone: "Hex can improve herself"
Dependencies:
- Phase 1-4 complete
- Git, RestrictedPython, ast module
- Testing framework
Phase 6: Polish & Integration (Weeks 11-12)
Goal: All systems integrated and optimized
Components to Build:
- Performance optimization (caching, batching)
- Error handling and graceful degradation
- Logging and telemetry
- Configuration management
- Auto-update capability
- Integration testing (all components together)
- Documentation and guides
Outcomes:
- System stable for extended use
- Responsive even under load
- Clear error messages
- Easy to deploy and configure
Key Milestone: "Hex is ready to ship"
Dependencies:
- Phase 1-5 complete
- All edge cases tested
Dependency Graph Summary
Phase 1 (Foundation)
↓
Phase 2 (Memory) ← depends on Phase 1
↓
Phase 3 (Perception) ← depends on Phase 1-2
↓
Phase 4 (Avatar) ← depends on Phase 1-3
↓
Phase 5 (Self-Modification) ← depends on Phase 1-4
↓
Phase 6 (Polish) ← depends on Phase 1-5
Critical Path: Foundation → Memory → Perception → Avatar → Self-Mod → Polish
Integration Architecture
System Interconnection Diagram
┌───────────────────────────────────────────────────────────────────┐
│ Discord Bot Layer │
│ (Event dispatcher, message handler) │
└────────┬────────────────────────────────────────────┬─────────────┘
│ │
│ ┌───────▼────────┐
│ │ Voice Input │
│ │ (Whisper STT) │
│ └────────────────┘
│
┌────▼────────────────────────────────────────────────────────┐
│ Context Assembly Layer │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Retrieval Augmented Generation (RAG) Pipeline │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ Input Components: │
│ ├─ Recent Conversation (last 20 messages) │
│ ├─ User Profile (learned facts) │
│ ├─ Relationship State (history + emotional baseline) │
│ ├─ Current Perception (mood, activity, environment) │
│ └─ Personality Context (YAML + version) │
└────┬──────────────────────────────────────────────────────┘
│
├──────────────┬──────────────┬──────────────┐
│ │ │ │
┌────▼───┐ ┌─────▼────┐ ┌────▼───┐ ┌─────▼────┐
│ Memory │ │Personality│ │Perception│ │ Discord │
│ System │ │ Engine │ │ Layer │ │ Context │
│ │ │ │ │ │ │ │
│ SQLite │ │ YAML + │ │ OpenCV │ │ Channel │
│ Chroma │ │ Version │ │ Whisper │ │ User │
│ │ │ Control │ │ Emotion │ │ Status │
└────────┘ └───────────┘ └─────────┘ └──────────┘
│ │ │ │
└──────────────┼──────────────┼──────────────┘
│
┌─────▼──────────────────┐
│ LLM Core │
│ (Local Mistral/Llama) │
│ │
│ System Prompt: │
│ [Persona] + │
│ [Memory Context] + │
│ [User State] + │
│ [Current Context] │
└─────┬──────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
┌───▼────┐ ┌──────▼─────┐ ┌──────▼──┐
│ Text │ │ Voice TTS │ │ Avatar │
│Response│ │ Generation │ │Animation│
│ │ │ │ │ │
│ Send │ │ Tacotron │ │ VRoid │
│ to │ │ + Vocoder │ │ Anim │
│Discord │ │ │ │ │
└────────┘ └────────────┘ └─────────┘
│ │ │
└───────────────┼───────────────┘
│
┌─────▼──────────────┐
│ Response Commit │
│ │
│ ├─ Store in Memory │
│ ├─ Update Profile │
│ ├─ Learn Patterns │
│ └─ Adapt Persona │
└────────────────────┘
Key Integration Points
1. Discord ↔ LLM Core
Interface: Message + Context → Response
# Pseudo-code flow
message = receive_discord_message()
# Context assembly (persona, memory, perception) happens inside generate().
response = llm_core.generate(
user_message=message.content,
personality=personality_engine.current_persona(),
history=memory_system.get_conversation(message.user_id, limit=20),
user_profile=memory_system.get_user_profile(message.user_id),
current_perception=perception_layer.get_current_state()
)
send_discord_response(response)
Latency Budget:
- Context retrieval: 100ms
- LLM generation: 2-3 seconds
- Response send: 100ms
- Total: 2.2-3.2 seconds (acceptable for conversational UX)
2. Memory System ↔ Personality Engine
Interface: Learning → Relationship Adaptation
# After every interaction
interaction = parse_message_event(message)
memory_system.log_conversation(interaction)
# Learn from interaction
new_facts = extract_facts(interaction.content)
memory_system.update_user_profile(interaction.user_id, new_facts)
# Adapt personality based on user
user_profile = memory_system.get_user_profile(interaction.user_id)
personality_engine.adapt_to_user(user_profile)
# If major relationship shift, update YAML
if user_profile.relationship_level_changed:
personality_engine.save_persona_version()
Update Frequency: Real-time with batched commits every 5 minutes
3. Perception Layer ↔ Response Generation
Interface: Context Injection
# In context assembly
current_perception = perception_layer.get_state()
# Inject into system prompt
if current_perception.emotion == "sad":
system_prompt += "\n[User appears sad. Respond with support and comfort.]"
if current_perception.is_kid_mode:
system_prompt += "\n[Kid safety mode active. Filter for age-appropriate content.]"
if current_perception.detected_activity == "gaming":
system_prompt += "\n[User is gaming. Comment on gameplay if relevant.]"
Synchronization: 1-5 second update intervals (perception → LLM context)
4. Avatar System ↔ All Systems
Interface: Emotional State → Visual Expression
# Avatar driven by multiple sources
emotion_from_response = infer_emotion(llm_response)
mood_from_perception = perception_layer.get_mood()
persona_expression = personality_engine.get_current_expression()
blendshape_values = combine_expressions(
emotion=emotion_from_response,
mood=mood_from_perception,
personality=persona_expression
)
avatar_system.animate(blendshape_values)
Synchronization: Real-time, driven by response generation and perception updates
5. Self-Modification System ↔ Core Systems
Interface: Code Change → Runtime Update + Personality
# Self-modification flow
proposal = self_mod_system.generate_proposal(user_request)
code = self_mod_system.generate_code(proposal)
# Test in sandbox
test_result = self_mod_system.test_in_sandbox(code)
# User approves
git_hash = self_mod_system.commit_change(code)
# Update personality to reflect new capability
personality_engine.add_capability(proposal.feature_name)
personality_engine.save_persona_version()
# Hot reload if possible, else apply on restart
apply_change_to_runtime(code)
Safety Boundary:
- LLM can generate proposals
- Only user-approved code runs
- All changes reversible within 24 hours
Synchronization and Consistency Model
State Consistency Across Components
Challenge: Multiple systems need consistent view of personality, memory, and user state
Solution: Event-driven architecture with eventual consistency
┌─────────────────┐
│ Event Stream │
│ (In-memory │
│ message queue) │
└────────┬────────┘
│
┌────┴──────────────────────────┐
│ │
│ Subscribers: │
│ ├─ Memory System │
│ ├─ Personality Engine │
│ ├─ Avatar System │
│ ├─ Discord Bot │
│ └─ Metrics/Logging │
│ │
│ Event Types: │
│ ├─ UserMessageReceived │
│ ├─ ResponseGenerated │
│ ├─ PerceptionUpdated │
│ ├─ PersonalityModified │
│ ├─ CodeChangeApplied │
│ └─ MemoryLearned │
│ │
└────────────────────────────────┘
Consistency Guarantees:
- Memory updates are durably stored within 5 minutes
- Personality snapshots versioned on every change
- Discord delivery is guaranteed by discord.py
- Perception updates are idempotent (can be reapplied without side effects)
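A minimal in-process sketch of this pub/sub pattern (class and event names follow the diagram; synchronous dispatch keeps the example short):
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Each subscriber applies the event independently (eventual consistency).
        for handler in self._subscribers[event_type]:
            handler(payload)

bus = EventBus()
bus.subscribe("UserMessageReceived", lambda event: print("memory logs:", event))
bus.publish("UserMessageReceived", {"user_id": "1234", "content": "hi hex"})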
Known Challenges and Solutions
1. Latency with Local LLM
Challenge: Waiting 2-3 seconds for response feels slow
Solutions:
- Immediate visual feedback (typing indicator, avatar animation)
- Streaming responses (show text as it generates)
- Batch non-urgent work (summarization, re-indexing) into quiet hours
- GPU acceleration where possible
- Model optimization (quantization, pruning)
2. Personality Consistency During Evolution
Challenge: Hex changes as she learns, but must feel like the same person
Solutions:
- Gradual adaptation (personality changes in YAML, not discrete jumps)
- Memory-driven consistency (personality adapts to learned facts)
- Version control (can rollback if she becomes unrecognizable)
- User feedback loop (user can reset or modify personality)
- Core values remain constant (tsundere nature, care underneath)
3. Memory Scaling as History Grows
Challenge: Retrieving relevant context from thousands of conversations
Solutions:
- Vector database for semantic search (sub-500ms)
- Hierarchical memory (recent → summarized old)
- Automatic archival (monthly snapshots, prune oldest)
- Importance tagging (weight important conversations higher)
- Incremental updates (don't recalculate everything)
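One way to combine these ideas is an importance- and recency-weighted retrieval score layered on top of vector similarity; the weights below are illustrative assumptions, not tuned values:
import math

def retrieval_score(similarity: float, importance: float,
                    age_seconds: float, half_life_days: float = 30.0) -> float:
    # Blend semantic similarity with importance and recency decay,
    # so important memories outlive routine chatter.
    decay = math.exp(-age_seconds / (half_life_days * 86400))
    return similarity * (0.5 + 0.5 * importance) * (0.5 + 0.5 * decay)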
4. Safe Code Generation and Sandboxing
Challenge: Hex generates code, but must never break the system
Solutions:
- Static analysis (AST parsing for forbidden operations)
- Capability-based progression (limited API at first)
- Sandboxed testing before deployment
- User approval gate (user reviews all code)
- Version control + rollback window (24-hour window)
- Whitelist of safe operations (growing list as trust builds)
5. Privacy and Local-First Architecture
Challenge: Maintaining privacy while having useful context
Solutions:
- All ML inference runs locally (no cloud submission)
- No external API calls except Discord
- Encrypted local storage for memories
- User can opt-out of any perception module
- Transparent logging (user can audit what's stored)
6. Multimodal Synchronization
Challenge: Webcam, voice, text, screen all need to inform response
Solutions:
- Asynchronous processing (don't wait for all inputs)
- Highest-priority input wins (voice > perception > text)
- Graceful degradation (works without any modality)
- Caching (reuse recent perception for repeated queries)
Scaling Considerations
Single-User (v1)
- Architecture designed for one person + their kids
- Local compute, no multi-user concerns
- Personality is singular (one Hex)
Multi-Device (v1.5)
- Same personality and memory sync across devices
- Discord as primary, desktop app as secondary
- Cloud sync optional (local-first default)
Android Support (v2)
- Memory and personality sync to mobile
- Lightweight inference on Android (quantized model)
- Fallback to cloud inference if needed
- Same core architecture, different UIs
Potential Scaling Patterns
Single User (Current)
├─ One Hex instance
├─ All local compute
├─ SQLite + Vector DB
Multi-Device Sync (v1.5)
├─ Central SQLite + Vector DB on primary machine
├─ Sync service between devices
├─ Same personality, distributed memory
Multi-Companion (Potential v3)
├─ Multiple Hex instances (per family member)
├─ Shared memory system (family history)
├─ Individual personalities
├─ Potential distributed compute (each on own device)
Performance Bottlenecks to Monitor
- LLM Inference: Becomes slower as the context window grows
  - Solution: Context summarization, hierarchical retrieval
- Vector DB Lookups: Scales with conversation history
  - Solution: Incremental indexing, approximate search (HNSW)
- Perception Processing: CPU/GPU bound
  - Solution: Frame skipping, model optimization, dedicated thread
- Discord Bot Responsiveness: Limited by gateway connections
  - Solution: Sharding (if needed), efficient message queuing
Technology Stack Summary
| Component | Technology | Rationale |
|---|---|---|
| Discord Bot | discord.py | Fast, well-supported, async-native |
| LLM Inference | Mistral 7B + ollama/vLLM | Local-first, good quality/speed tradeoff |
| Memory (Conversations) | SQLite | Reliable, local, fast queries |
| Memory (Semantic) | Chroma or Milvus | Local vector DB, easy to manage |
| Embeddings | all-MiniLM-L6-v2 | Fast, good quality, local |
| Face Detection | MediaPipe | Accurate, fast, local |
| Emotion Recognition | FER2013 or local model | Local, privacy-preserving |
| Speech-to-Text | Whisper | Local, accurate, multilingual |
| Text-to-Speech | Tacotron 2 + Vocoder | Local, controllable |
| Avatar | VRoid SDK + Babylon.js | Standards-based, extensible |
| Code Safety | RestrictedPython + ast | Local analysis, sandboxing |
| Version Control | Git | Change tracking, rollback |
| Desktop UI | Tkinter or PyQt | Lightweight, cross-platform |
| Testing | pytest + unittest | Standard Python testing |
| Logging | logging + sentry (optional) | Local-first with cloud fallback |
Deployment Architecture
Local Development
Developer Machine
├── Discord Token (env var)
├── Hex codebase (git)
├── Local LLM (ollama)
├── SQLite (file-based)
├── Vector DB (Chroma, embedded)
└── Webcam / Screen capture (live)
Production Deployment
Deployed Machine (Windows/WSL)
├── Discord Token (secure storage)
├── Hex codebase (from git)
├── Local LLM service (ollama/vLLM)
├── SQLite (persistent, backed up)
├── Vector DB (persistent, backed up)
├── Desktop app (tray icon)
├── Auto-updater (pulls from git)
└── Logging (local + optional cloud)
Update Strategy
- Git pull for code updates
- Automatic model updates (LLM weights)
- Zero-downtime restart (graceful shutdown)
- Rollback capability (version pinning)
Quality Assurance
Key Metrics to Track
Responsiveness:
- Response latency: Target <3 seconds
- Perception update latency: <500ms
- Memory lookup latency: <100ms
Reliability:
- Uptime: >99% for core bot
- Message delivery: >99.9%
- Memory integrity: No data loss on crash
Personality Consistency:
- User perception: "Feels like the same person"
- Tone consistency: Personality rules enforced
- Learning progress: Measurable improvement in personalization
Safety:
- No crashes from invalid input
- No hallucinated moderation actions from the LLM
- Safe code generation (0 unauthorized executions)
Testing Strategy
Unit Tests
├─ Memory operations (CRUD)
├─ Perception processing
├─ Code validation
├─ Personality rule application
└─ Response filtering
Integration Tests
├─ Discord message → LLM → Response
├─ Context assembly pipeline
├─ Avatar expression sync
├─ Self-modification flow
└─ Multi-component scenarios
End-to-End Tests
├─ Full conversation with personality
├─ Perception-aware responses
├─ Memory learning and retrieval
├─ Code generation and deployment
└─ Edge cases (bad input, crashes, recovery)
Manual UAT
├─ Conversational feel (does she feel like a person?)
├─ Personality consistency (still Hex?)
├─ Safety compliance (kid-mode works?)
├─ Performance (under load?)
└─ All features working together?
Conclusion
Hex's architecture prioritizes personality coherence and genuine relationship over feature breadth. The system is designed as a pipeline from perception → memory → personality → response generation, with feedback loops that allow her to learn and evolve.
The modular design enables incremental development (Phase 1-6), with each phase adding capability while maintaining system stability. The self-modification system enables genuine autonomy within safety boundaries, and the local-first approach ensures privacy and independence.
Critical success factors:
- LLM latency acceptable (<3s)
- Personality consistency maintained across updates
- Memory system scales with history
- Self-modification is safe and reversible
- All components feel integrated (not separate features)
This architecture serves the core value: making Hex feel like a person who genuinely cares about you.
Document Version: 1.0
Last Updated: 2026-01-27
Status: Ready for Phase 1 Development