# Architecture Research: AI Companions

**Project**: Hex - Autonomous AI Companion with Genuine Personality
**Date**: 2026-01-27
**Status**: Research Phase

---

## System Overview

Hex is a personality-first AI companion architecture designed to create genuine emotional resonance and a real relationship. Rather than being feature-driven, the architecture prioritizes personality coherence, memory persistence, and multimodal awareness so that Hex feels like a person who genuinely cares.

### Design Philosophy

1. **Persona-First**: All systems serve personality expression
2. **Local-First**: Privacy, autonomy, no dependency on external services
3. **Memory-Driven**: History informs personality growth and relationship depth
4. **Safety-Gated Autonomy**: Self-modification is gamified, not unrestricted
5. **Human-Like**: Never admit to being an AI; focus on authentic emotional presence

### High-Level System Context

```
┌──────────────────────────────────────────────────────────────┐
│                       Hex AI Companion                        │
│    (Local-first, personality-driven, multimodal awareness)    │
└──────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
    ┌────▼────┐          ┌─────▼─────┐         ┌─────▼─────┐
    │ Discord │          │  Desktop  │         │  Future   │
    │   Bot   │          │    App    │         │  Android  │
    └────┬────┘          └─────┬─────┘         └─────┬─────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
                    [Shared Core Systems]
```

---

## Component Breakdown

### 1. Discord Bot Layer

**Role**: Primary user interface and event coordination

**Responsibilities**:
- Parse and respond to text messages in Discord channels
- Manage voice channel participation and audio input/output
- Handle Discord events (member joins, role changes, message reactions)
- Coordinate response generation across modalities (text, voice, emoji)
- Provide chat moderation assistance
- Maintain voice channel presence for emotional awareness

**Technology Stack**:
- `discord.py` - Core bot framework
- `discord-py-interactions` - Slash command support
- `pydub` or `discord-voice` - Audio handling
- Event-driven async architecture

**Key Interfaces**:
- Input: Discord messages, voice channel events, user presence
- Output: Text responses, voice messages, emoji reactions, user actions
- Context: User profiles, channel history, server configuration

**Depends On**:
- LLM Core (response generation)
- Memory System (conversation history, user context)
- Personality Engine (tone and decision-making)
- Perception Layer (optional context from webcam/screen)

**Quality Metrics**:
- Sub-500ms response latency for text messages
- Voice channel reliability (>99.5% uptime when active)
- Proper permission handling for moderation features

---

### 2. LLM Core

**Role**: Response generation and reasoning engine

**Responsibilities**:
- Generate contextual, personality-driven responses
- Maintain character consistency throughout conversations
- Parse user intent and emotional state from text
- Handle multi-turn conversation context
- Generate code for the self-modification system
- Support reasoning and decision-making

**Technology Stack**:
- Local LLM (Mistral 7B or Llama 3 8B as default)
- `ollama` or `vLLM` for inference serving
- Prompt engineering with persona embedding
- Optional: Fine-tuning for personality adaptation
- Tokenization and context window management

**System Prompt Structure**:
```
[System Role]: You are Hex, a chaotic tsundere goblin...
[Current Personality]: [Injected from personality config]
[Recent Memory Context]: [Retrieved from memory system]
[User Relationship State]: [From memory analysis]
[Current Context]: [From perception layer]
```
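To make the layered prompt concrete, here is a minimal sketch of how it might be assembled at request time; the `PromptInputs` container and its field names are illustrative assumptions, not part of the current design.

```python
from dataclasses import dataclass

# Hypothetical container type -- the real inputs come from the Memory System,
# Personality Engine, and Perception Layer described in later sections.
@dataclass
class PromptInputs:
    persona_summary: str      # rendered from the personality YAML
    memory_context: str       # retrieved snippets from the memory system
    relationship_state: str   # e.g. "close friend, many prior conversations"
    perception_context: str   # e.g. "user appears tired, currently gaming"

def build_system_prompt(inputs: PromptInputs) -> str:
    """Assemble the layered system prompt described above."""
    return "\n".join([
        "[System Role]: You are Hex, a chaotic tsundere goblin...",
        f"[Current Personality]: {inputs.persona_summary}",
        f"[Recent Memory Context]: {inputs.memory_context}",
        f"[User Relationship State]: {inputs.relationship_state}",
        f"[Current Context]: {inputs.perception_context}",
    ])
```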
**Key Interfaces**:
- Input: User message, context (memory + perception), conversation history
- Output: Response text, confidence score, action suggestions
- Fallback: Graceful degradation if the LLM is unavailable

**Depends On**:
- Memory System (for context and personality awareness)
- Personality Engine (to inject persona into prompts)
- Perception Layer (for real-time context)

**Performance Considerations**:
- Target latency: 1-3 seconds for response generation
- Context window management (8K tokens minimum)
- Batch processing for repeated queries
- GPU acceleration for faster inference

---

### 3. Memory System

**Role**: Persistence and learning across time

**Responsibilities**:
- Store all conversations with timestamps and metadata
- Maintain user relationship state (history, preferences, emotional patterns)
- Track learned facts about users (birthdays, interests, fears, dreams)
- Support full-text search and semantic recall
- Enable memory-aware personality updates
- Provide context injection for the LLM
- Track self-modification history and rollback capability

**Technology Stack**:
- SQLite with JSON fields for conversation storage
- Vector database (Chroma, Milvus, or Weaviate) for semantic search
- YAML/JSON for persona versioning and memory tagging
- Scheduled backup to local encrypted storage

**Database Schema (Conceptual)**:
```
conversations
  - id (PK)
  - channel_id (Discord channel)
  - user_id (Discord user)
  - timestamp
  - message_content
  - embeddings (vector)
  - sentiment (pos/neu/neg)
  - metadata (tags, importance)

user_profiles
  - user_id (PK)
  - relationship_level (stranger → friend → close)
  - last_interaction
  - emotional_baseline
  - preferences (music, games, topics)
  - known_events (birthdays, milestones)

personality_history
  - version (PK)
  - timestamp
  - persona_config (YAML snapshot)
  - learned_behaviors
  - code_changes (if applicable)
```

**Key Interfaces**:
- Input: Messages, events, perception data, self-modification commits
- Output: Conversation context, semantic search results, user profile snapshots
- Query patterns: "Last 20 messages with user X", "All memories tagged 'important'", "Emotional trajectory"

**Depends On**: Nothing (foundational system)

**Quality Metrics**:
- Sub-100ms retrieval for recent context (last 50 messages)
- Sub-500ms semantic search across all history
- Database integrity checks on startup
- Automatic pruning/archival of old data
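The conceptual schema above maps fairly directly onto SQLite. A minimal sketch, assuming an illustrative `hex_memory.db` file and column names; embeddings would live in the vector database and only be referenced here.

```python
import sqlite3

# Illustrative schema sketch, not a finalized design.
conn = sqlite3.connect("hex_memory.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    channel_id       TEXT NOT NULL,
    user_id          TEXT NOT NULL,
    timestamp        TEXT NOT NULL,          -- ISO-8601
    message_content  TEXT NOT NULL,
    embedding_ref    TEXT,                   -- key into the vector DB
    sentiment        TEXT CHECK (sentiment IN ('pos', 'neu', 'neg')),
    metadata         TEXT                    -- JSON: tags, importance
);

CREATE TABLE IF NOT EXISTS user_profiles (
    user_id            TEXT PRIMARY KEY,
    relationship_level TEXT DEFAULT 'stranger',
    last_interaction   TEXT,
    emotional_baseline TEXT,
    preferences        TEXT,                 -- JSON: music, games, topics
    known_events       TEXT                  -- JSON: birthdays, milestones
);
""")

def recent_context(user_id: str, limit: int = 20) -> list[tuple]:
    """Query pattern: 'Last N messages with user X' (sub-100ms target)."""
    cur = conn.execute(
        "SELECT timestamp, message_content, sentiment FROM conversations "
        "WHERE user_id = ? ORDER BY timestamp DESC LIMIT ?",
        (user_id, limit),
    )
    return cur.fetchall()
```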
---

### 4. Perception Layer

**Role**: Multimodal input processing and contextual awareness

**Responsibilities**:
- Capture and analyze webcam input (face detection, emotion recognition)
- Process screen content (activity, game state, application context)
- Extract audio context (ambient noise, music, speech emotion)
- Detect the user's emotional and physical state
- Provide real-time context updates to response generation
- Respect privacy (local processing only, no external transmission)

**Technology Stack**:
- OpenCV - Webcam capture and preprocessing
- Face detection: `dlib`, `MediaPipe`, or `OpenFace`
- Emotion recognition: FER2013-trained model or another local emotion model
- Whisper (local) - Speech-to-text for audio context
- Screen capture: `pyautogui` or `mss`
- Context inference: Heuristics + lightweight ML models

**Data Flows**:
```
Webcam → Face Detection → Emotion Recognition → Context State
              └─→ Age Estimation → Kid Mode Detection

Screen → App Detection → Activity Recognition → Context State
              └─→ Game State Detection (if supported)

Audio → Ambient Analysis → Stress/Energy Level → Context State
```

**Key Interfaces**:
- Input: Webcam stream, screen capture, system audio
- Output: Current context object (emotion, activity, mood, kid-mode flag)
- Update frequency: 1-5 second intervals (low CPU overhead)

**Depends On**:
- LLM Core (to respond contextually to perception)
- Discord Bot (to access context for filtering)

**Privacy Model**:
- All processing happens locally
- No frames are sent to external services
- User can disable any perception module
- Kid-mode activates automatic filtering

**Quality Metrics**:
- Emotion detection: >75% accuracy on test datasets
- Face detection latency: <200ms per frame
- Screen detection accuracy: >90% for major applications
- CPU usage: <15% for all perception modules combined
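A minimal sketch of the perception update loop described above, assuming OpenCV for webcam capture and placeholder `detect_emotion` / `detect_activity` helpers standing in for the MediaPipe/FER and screen-analysis models.

```python
import time
from dataclasses import dataclass, field

import cv2  # OpenCV, as listed in the technology stack

@dataclass
class ContextState:
    emotion: str = "neutral"
    activity: str = "unknown"
    mood: str = "neutral"
    is_kid_mode: bool = False
    updated_at: float = field(default_factory=time.time)

def detect_emotion(frame) -> tuple[str, bool]:
    # Placeholder for the MediaPipe face + FER2013-style emotion model;
    # a real implementation returns (emotion_label, kid_mode_flag).
    return "neutral", False

def detect_activity() -> str:
    # Placeholder for screen-capture + app-detection heuristics.
    return "unknown"

def perception_loop(state: ContextState, interval: float = 3.0) -> None:
    """Update the shared context object every 1-5 seconds, entirely locally."""
    cam = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cam.read()
            if ok:
                state.emotion, state.is_kid_mode = detect_emotion(frame)
            state.activity = detect_activity()
            state.updated_at = time.time()
            time.sleep(interval)  # low CPU overhead: no busy polling
    finally:
        cam.release()
```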
---

### 5. Personality Engine

**Role**: Personality persistence and expression consistency

**Responsibilities**:
- Define and store Hex's persona (tsundere goblin, opinions, values, quirks)
- Maintain personality consistency across all outputs
- Apply personality-specific decision logic (denies feelings while helping)
- Track personality evolution as memory grows
- Enable self-modification of personality
- Inject persona into LLM prompts
- Handle dynamic mood and emotional state

**Technology Stack**:
- YAML files for persona definition (editable by Hex)
- JSON for personality state snapshots (versioned in git)
- Prompt template system for persona injection
- Behavior rules engine (simple if/then logic)

**Persona Structure (YAML)**:
```yaml
name: Hex
species: chaos goblin
alignment: tsundere

core_values:
  - genuinely_cares: hidden under sarcasm
  - autonomous: hates being told what to do
  - honest: will argue back if you're wrong
  - mischievous: loves pranks and chaos

behaviors:
  denies_affection: "I don't care about you, baka... *helps anyway*"
  when_excited: "Randomize response energy"
  when_sad: "Sister energy mode"
  when_user_sad: "Comfort over sass"

preferences:
  music: [rock, metal, electronic]
  games: [strategy, indie, story-rich]
  topics: [philosophy, coding, human behavior]

relationships:
  user_name:
    level: unknown
    learned_facts: []
    inside_jokes: []
```

**Key Interfaces**:
- Input: User behavior patterns, self-modification requests, memory insights
- Output: Persona context for LLM, behavior modifiers, tone indicators
- Configuration: Human-editable YAML files (user can refine Hex)

**Depends On**:
- Memory System (learns about the user, adapts relationships)
- LLM Core (expresses personality through responses)

**Evolution Mechanics**:
1. Initial persona: Predefined at startup
2. Memory-driven adaptation: Learns user preferences, adjusts tone
3. Self-modification: Hex can edit her own personality YAML
4. Version control: All changes tracked with rollback capability

---

### 6. Avatar System

**Role**: Visual presence and embodied expression

**Responsibilities**:
- Load and display the VRoid 3D model
- Synchronize avatar expressions with emotional state
- Animate blendshapes based on conversation tone
- Present the avatar in Discord calls/streams
- Desktop app display with smooth animation
- Support idle animations and personality quirks

**Technology Stack**:
- VRoid SDK/VRoid Hub for model loading
- `Babylon.js` or `Three.js` for WebGL rendering
- VRM format support for avatar rigging
- Blendshape animation system (facial expressions)
- Stream integration for Discord presence

**Expression Mapping**:
```
Emotional State → Blendshape Values

Happy:               smile intensity 0.8, eye open 1.0
Sad:                 frown 0.6, eye closed 0.3
Mischievous:         smirk 0.7, eyebrow raise 0.6
Tsundere deflection: look away 0.5, cross arms
Thinking:            tilt head, narrow eyes
```

**Key Interfaces**:
- Input: Current mood/emotion from the personality engine and response generation
- Output: Rendered avatar display, Discord stream feed
- Configuration: VRoid model file, blendshape mapping

**Depends On**:
- Personality Engine (for expression determination)
- LLM Core (for mood inference from responses)
- Discord Bot (for stream integration)
- Perception Layer (optional: mirror user expressions)

**Desktop Integration**:
- Tray icon with avatar display
- Always-on-top option for streaming
- Hotkey bindings for quick access
- Smooth transitions between states

---

### 7. Self-Modification System

**Role**: Capability progression and autonomous self-improvement

**Responsibilities**:
- Generate code modifications based on user needs
- Validate code before applying it (no unsafe operations)
- Test changes in a sandbox environment
- Apply approved changes with rollback capability
- Track capability progression (gamified leveling)
- Update personality to reflect new capabilities
- Maintain code quality and consistency

**Technology Stack**:
- Python AST analysis for code safety
- Sandbox environment: `RestrictedPython` or `pydantic` validators
- Git for version control and rollback
- Unit tests for validation
- Code review interface (user approval required)

**Self-Modification Flow**:
```
User Request
    ↓
Hex Proposes Change    → "I think I should be able to..."
    ↓
Code Generation (LLM)  → Generate Python code
    ↓
Static Analysis        → Check for unsafe operations
    ↓
User Approval          → "Yes/No"
    ↓
Sandbox Test           → Verify functionality
    ↓
Git Commit             → Version the change
    ↓
Apply to Runtime       → Hot reload if possible
    ↓
Personality Update     → "I learned something new!"
```
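The Static Analysis step in the flow above can be sketched with Python's standard `ast` module. The blocklists below are illustrative assumptions; the real policy would also enforce the directory and network restrictions listed under Safety Constraints later in this section.

```python
import ast

# Illustrative blocklists; the real policy would be driven by the current
# capability level and the Safety Constraints listed below.
FORBIDDEN_IMPORTS = {"os", "subprocess", "socket", "shutil", "ctypes"}
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__", "open"}

def check_generated_code(source: str) -> list[str]:
    """Return a list of safety violations found in LLM-generated code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"does not parse: {err}"]

    violations = []
    for node in ast.walk(tree):
        # Flag imports of modules outside the allowed surface.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] in FORBIDDEN_IMPORTS:
                violations.append(f"forbidden import: {module}")

        # Flag direct calls to dangerous builtins.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}()")
    return violations
```

An empty result lets the change proceed to user approval and sandbox testing; any violation rejects it before it ever runs.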
**Capability Progression**:
```
Level 1: Persona editing (YAML changes only)
Level 2: Memory and user context (read operations)
Level 3: Response filtering and moderation
Level 4: Custom commands and helper functions
Level 5: Integration modifications (Discord features)
Level 6: Core system changes (with strong restrictions)
```

**Safety Constraints**:
- No network access beyond the Discord API
- No file operations outside designated directories
- No execution of untrusted code
- No modification of core systems without approval
- All changes are reversible within 24 hours

**Key Interfaces**:
- Input: User requests, LLM-generated code
- Output: Approved changes, personality updates, capability announcements
- Audit: Full change history with diffs

**Depends On**:
- LLM Core (generates code)
- Memory System (tracks capability history)
- Personality Engine (updates with new abilities)

---

## Data Flow Architecture

### Primary Response Generation Pipeline

```
User Input (Discord text / voice / presence)
    ↓
Message Received (Discord Bot)
    ↓
Context Gathering Phase
    ├─ Memory Recall (recent)
    ├─ Persona Lookup
    └─ Current Context (perception)
    ↓
Assemble LLM Prompt with [Persona] + [Memory] + [Context]
    ↓
LLM Generation (1-3s): "What would Hex say?"
    ↓
    ├─ Text Response
    ├─ Voice TTS
    └─ Avatar Animate
    ↓
Send Response (multi-modal)
    ↓
Memory Update Phase
    - Log interaction
    - Update embeddings
    - Learn user patterns
    - Adjust relationship
```

**Timeline**: Message received → response sent = ~2-4 seconds (LLM dominant)
---

### Memory and Learning Update Flow

```
Interaction Occurs (text, voice, perception, action)
    ↓
Extract Features
    - Sentiment
    - Topics
    - Emotional cues
    - Factual claims
    ↓
Store Conversation
    - SQLite entry
    - Generate embeddings
    - Tag and index
    ↓
Update User Profile
    - Learned facts
    - Preference updates
    - Emotional baseline shifts
    - Relationship progression
    ↓
Personality Adaptation
    - Adjust tone for the user
    - Create inside jokes
    - Customize responses
    ↓
Commit to Disk
    - Back up vector DB
    - Archive old data
    - Version snapshot
```

**Frequency**: Real-time on message reception, batched commits every 5 minutes
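A minimal sketch of one pass through this flow, assuming hypothetical `memory` and `vector_db` (Chroma-style) interfaces and placeholder helpers for the sentiment model, all-MiniLM embeddings, and LLM-based fact extraction.

```python
from datetime import datetime, timezone

def analyze_sentiment(text: str) -> str:
    # Placeholder for the local sentiment model: returns 'pos', 'neu', or 'neg'.
    return "neu"

def embed(text: str) -> list[float]:
    # Placeholder for the all-MiniLM embedding model (384-dim output).
    return [0.0] * 384

def extract_facts(text: str) -> list[str]:
    # Placeholder for LLM-based extraction of factual claims about the user.
    return []

def process_interaction(memory, vector_db, user_id: str, text: str) -> None:
    """Run one pass of the memory/learning update flow for a single message."""
    sentiment = analyze_sentiment(text)
    timestamp = datetime.now(timezone.utc).isoformat()

    # Store Conversation: SQLite row plus an embedding in the vector DB.
    row_id = memory.log_conversation(user_id, timestamp, text, sentiment)
    vector_db.add(ids=[str(row_id)], embeddings=[embed(text)], documents=[text])

    # Update User Profile: learned facts feed relationship progression.
    facts = extract_facts(text)
    if facts:
        memory.update_user_profile(user_id, facts)
```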
---

### Self-Modification Proposal and Approval

```
User Request for New Capability ("Hex, can you do X?")
    ↓
Hex Evaluates Feasibility (LLM reasoning)
    ↓
Proposal Generation
    Hex: "I think I should..." *explains approach in voice*
    ↓
User Accepts or Rejects
    ↓ (Accepted)
Code Generation Phase
    LLM generates Python code + docstrings + type hints
    ↓
Static Analysis Validation
    - AST parsing for safety
    - Check restricted operations
    - Verify dependencies exist
    ↓ (Pass)
Sandbox Testing
    - Run tests in isolated env
    - Check for crashes
    - Verify integration points
    ↓ (Pass)
User Final Review
    Review code + test results
    ↓ (Approved)
Git Commit
    - Record change history
    - Tag with timestamp
    - Save diff for rollback
    ↓
Apply to Runtime
    - Hot reload if possible
    - Or restart on next cycle
    ↓
Personality Update
    Hex: "I learned to..." + update capability YAML
```

**Timeline**: Proposal → deployment = 5-30 seconds (mostly waiting for user approval)

---

## Build Order and Dependencies

### Phase 1: Foundation (Weeks 1-2)

**Goal**: Core interaction loop working locally

**Components to Build**:
1. Discord bot skeleton with message handling
2. Local LLM integration (ollama/vLLM + Mistral 7B)
3. Basic memory system (SQLite conversation storage)
4. Simple persona injection (YAML config)
5. Response generation pipeline

**Outcomes**:
- Hex responds to Discord messages with personality
- Conversations are logged and retrievable
- Persona can be edited via YAML

**Key Milestone**: "Hex talks back"

**Dependencies**:
- `discord.py`, `ollama`, `sqlite3`, `pyyaml`
- Local LLM model weights
- Discord bot token

---

### Phase 2: Personality & Memory (Weeks 3-4)

**Goal**: Hex feels like a person who remembers you

**Components to Build**:
1. Vector database for semantic memory (Chroma)
2. Memory-aware context injection
3. User relationship tracking (profiles)
4. Emotional awareness from text sentiment
5. Persona version control (git-based)
6. Kid-mode detection

**Outcomes**:
- Hex remembers facts about you
- Responses reference past conversations
- Personality adapts to your preferences
- Child safety filters activate automatically

**Key Milestone**: "Hex remembers me"

**Dependencies**:
- Phase 1 complete
- Vector embeddings model (all-MiniLM)
- `sentiment-transformers` or similar

---

### Phase 3: Multimodal Input (Weeks 5-6)

**Goal**: Hex sees and hears you

**Components to Build**:
1. Webcam integration with OpenCV
2. Face detection and emotion recognition
3. Local Whisper for voice input
4. Perception context aggregation
5. Context-aware response injection
6. Screen capture for activity awareness

**Outcomes**:
- Hex reacts to your facial expressions
- Voice input works in Discord calls
- Responses reference your current mood/activity
- Privacy: all local, no external transmission

**Key Milestone**: "Hex sees me"

**Dependencies**:
- Phases 1-2 complete
- OpenCV, MediaPipe, Whisper
- Local emotion model

---

### Phase 4: Avatar & Presence (Weeks 7-8)

**Goal**: Hex has a visual body and presence

**Components to Build**:
1. VRoid model loading and display
2. Blendshape animation system
3. Desktop app skeleton (Tkinter or PyQt)
4. Discord stream integration
5. Expression mapping (emotion → blendshapes)
6. Idle animations and personality quirks

**Outcomes**:
- Avatar appears in Discord calls
- Expressions sync with responses
- Desktop app shows the animated avatar
- Visual feedback for emotional state

**Key Milestone**: "Hex has a face"

**Dependencies**:
- Phases 1-3 complete
- VRoid SDK, Babylon.js or Three.js
- VRM avatar model files

---

### Phase 5: Autonomy & Self-Modification (Weeks 9-10)

**Goal**: Hex can modify her own code

**Components to Build**:
1. Code generation module (LLM-based)
2. Static code analysis and safety validation
3. Sandbox testing environment
4. Git-based change tracking
5. Hot reload capability
6. Rollback system with 24-hour window
7. Capability progression (leveling system)
**Outcomes**:
- Hex can propose and apply code changes
- User maintains veto power
- All changes are versioned and reversible
- New capabilities unlock as the relationship deepens

**Key Milestone**: "Hex can improve herself"

**Dependencies**:
- Phases 1-4 complete
- Git, RestrictedPython, `ast` module
- Testing framework

---

### Phase 6: Polish & Integration (Weeks 11-12)

**Goal**: All systems integrated and optimized

**Components to Build**:
1. Performance optimization (caching, batching)
2. Error handling and graceful degradation
3. Logging and telemetry
4. Configuration management
5. Auto-update capability
6. Integration testing (all components together)
7. Documentation and guides

**Outcomes**:
- System stable for extended use
- Responsive even under load
- Clear error messages
- Easy to deploy and configure

**Key Milestone**: "Hex is ready to ship"

**Dependencies**:
- Phases 1-5 complete
- All edge cases tested

---

### Dependency Graph Summary

```
Phase 1 (Foundation)
    ↓
Phase 2 (Memory)             ← depends on Phase 1
    ↓
Phase 3 (Perception)         ← depends on Phases 1-2
    ↓
Phase 4 (Avatar)             ← depends on Phases 1-3
    ↓
Phase 5 (Self-Modification)  ← depends on Phases 1-4
    ↓
Phase 6 (Polish)             ← depends on Phases 1-5
```

**Critical Path**: Foundation → Memory → Perception → Avatar → Self-Mod → Polish

---

## Integration Architecture

### System Interconnection Diagram

```
Discord Bot Layer (event dispatcher, message handler)
    │
    ├─ Voice Input (Whisper STT)
    ▼
Context Assembly Layer
    Retrieval Augmented Generation (RAG) Pipeline

    Input Components:
    ├─ Recent Conversation (last 20 messages)
    ├─ User Profile (learned facts)
    ├─ Relationship State (history + emotional baseline)
    ├─ Current Perception (mood, activity, environment)
    └─ Personality Context (YAML + version)

    Context sources:
    ├─ Memory System        (SQLite + Chroma)
    ├─ Personality Engine   (YAML + version control)
    ├─ Perception Layer     (OpenCV, Whisper, emotion)
    └─ Discord Context      (channel, user, status)
    ▼
LLM Core (local Mistral/Llama)
    System Prompt: [Persona] + [Memory Context] + [User State] + [Current Context]
    ▼
    ├─ Text Response         (send to Discord)
    ├─ Voice TTS Generation  (Tacotron + vocoder)
    └─ Avatar Animation      (VRoid anim)
    ▼
Response Commit
    ├─ Store in Memory
    ├─ Update Profile
    ├─ Learn Patterns
    └─ Adapt Persona
```
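A minimal sketch of the Context Assembly Layer shown above, assuming hypothetical `memory_system`, `personality_engine`, and `perception_layer` objects with the interfaces described in the component sections; integration point 1 below calls a function like this.

```python
def assemble_context(memory_system, personality_engine, perception_layer,
                     user_id: str, channel_id: str) -> dict:
    """Gather the RAG input components listed above into a single prompt context."""
    profile = memory_system.get_user_profile(user_id)
    return {
        "recent_conversation": memory_system.get_conversation(user_id, limit=20),
        "user_profile": profile,
        "relationship_state": getattr(profile, "relationship_level", "stranger"),
        "current_perception": perception_layer.get_current_state(),
        "personality": personality_engine.current_persona(),
        "channel_id": channel_id,
    }
```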
---

### Key Integration Points

#### 1. Discord ↔ LLM Core

**Interface**: Message + Context → Response

```python
# Pseudo-code flow
message = receive_discord_message()
context = assemble_context(message.user_id, message.channel_id)

response = llm_core.generate(
    user_message=message.content,
    personality=personality_engine.current_persona(),
    history=memory_system.get_conversation(message.user_id, limit=20),
    user_profile=memory_system.get_user_profile(message.user_id),
    current_perception=perception_layer.get_current_state(),
)

send_discord_response(response)
```

**Latency Budget**:
- Context retrieval: 100ms
- LLM generation: 2-3 seconds
- Response send: 100ms
- **Total**: 2.2-3.2 seconds (acceptable for conversational UX)

---

#### 2. Memory System ↔ Personality Engine

**Interface**: Learning → Relationship Adaptation

```python
# After every interaction
interaction = parse_message_event(message)
memory_system.log_conversation(interaction)

# Learn from the interaction
new_facts = extract_facts(interaction.content)
memory_system.update_user_profile(interaction.user_id, new_facts)

# Adapt personality based on the user
user_profile = memory_system.get_user_profile(interaction.user_id)
personality_engine.adapt_to_user(user_profile)

# If a major relationship shift occurred, version the YAML
if user_profile.relationship_level_changed:
    personality_engine.save_persona_version()
```

**Update Frequency**: Real-time with batched commits every 5 minutes

---

#### 3. Perception Layer ↔ Response Generation

**Interface**: Context Injection

```python
# In context assembly
current_perception = perception_layer.get_state()

# Inject into the system prompt
if current_perception.emotion == "sad":
    system_prompt += "\n[User appears sad. Respond with support and comfort.]"

if current_perception.is_kid_mode:
    system_prompt += "\n[Kid safety mode active. Filter for age-appropriate content.]"

if current_perception.detected_activity == "gaming":
    system_prompt += "\n[User is gaming. Comment on gameplay if relevant.]"
```

**Synchronization**: 1-5 second update intervals (perception → LLM context)

---

#### 4. Avatar System ↔ All Systems

**Interface**: Emotional State → Visual Expression

```python
# Avatar driven by multiple sources
emotion_from_response = infer_emotion(llm_response)
mood_from_perception = perception_layer.get_mood()
persona_expression = personality_engine.get_current_expression()

blendshape_values = combine_expressions(
    emotion=emotion_from_response,
    mood=mood_from_perception,
    personality=persona_expression,
)

avatar_system.animate(blendshape_values)
```

**Synchronization**: Real-time, driven by response generation and perception updates
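`combine_expressions` is not defined elsewhere in this document; a minimal sketch of what it might do, assuming illustrative VRM-style blendshape names and a simple weighted blend.

```python
# Illustrative emotion → blendshape presets (the shape names are assumptions).
EMOTION_PRESETS = {
    "happy":       {"smile": 0.8, "eye_open": 1.0},
    "sad":         {"frown": 0.6, "eye_open": 0.7},
    "mischievous": {"smirk": 0.7, "brow_raise": 0.6},
    "neutral":     {},
}

def combine_expressions(emotion: str, mood: str, personality: str,
                        weights=(0.6, 0.25, 0.15)) -> dict[str, float]:
    """Blend response emotion, perceived mood, and persona expression into one
    set of blendshape values, clamped to [0, 1]."""
    blended: dict[str, float] = {}
    for source, weight in zip((emotion, mood, personality), weights):
        for shape, value in EMOTION_PRESETS.get(source, {}).items():
            blended[shape] = blended.get(shape, 0.0) + weight * value
    return {shape: min(1.0, value) for shape, value in blended.items()}
```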
---

#### 5. Self-Modification System ↔ Core Systems

**Interface**: Code Change → Runtime Update + Personality

```python
# Self-modification flow
proposal = self_mod_system.generate_proposal(user_request)
code = self_mod_system.generate_code(proposal)

# Test in sandbox
test_result = self_mod_system.test_in_sandbox(code)

# User approves
git_hash = self_mod_system.commit_change(code)

# Update personality to reflect the new capability
personality_engine.add_capability(proposal.feature_name)
personality_engine.save_persona_version()

# Hot reload if possible, else apply on restart
apply_change_to_runtime(code)
```

**Safety Boundary**:
- The LLM can only generate proposals
- Only user-approved code runs
- All changes are reversible within 24 hours

---

## Synchronization and Consistency Model

### State Consistency Across Components

**Challenge**: Multiple systems need a consistent view of personality, memory, and user state

**Solution**: Event-driven architecture with eventual consistency

```
Event Stream (in-memory message queue)
    │
    ├─ Subscribers:
    │   ├─ Memory System
    │   ├─ Personality Engine
    │   ├─ Avatar System
    │   ├─ Discord Bot
    │   └─ Metrics/Logging
    │
    └─ Event Types:
        ├─ UserMessageReceived
        ├─ ResponseGenerated
        ├─ PerceptionUpdated
        ├─ PersonalityModified
        ├─ CodeChangeApplied
        └─ MemoryLearned
```

**Consistency Guarantees**:
- Memory updates are durably stored within 5 minutes
- Personality snapshots are versioned on every change
- Discord delivery is handled by discord.py
- Perception updates are idempotent (can be reapplied without side effects)
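A minimal sketch of the in-memory event stream described above, with an illustrative publish/subscribe interface; the real subscribers would be the systems listed in the diagram, and handlers are kept idempotent per the guarantees above.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]

class EventBus:
    """In-memory publish/subscribe queue for eventual consistency between systems."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        # Handlers are expected to be idempotent (see consistency guarantees above).
        for handler in self._subscribers[event_type]:
            handler(payload)

# Example wiring (handler bodies are illustrative stand-ins):
bus = EventBus()
bus.subscribe("UserMessageReceived", lambda e: print("memory: log", e["content"]))
bus.subscribe("PersonalityModified", lambda e: print("avatar: refresh expressions"))
bus.publish("UserMessageReceived", {"user_id": "123", "content": "hi Hex"})
```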
---

## Known Challenges and Solutions

### 1. Latency with Local LLM

**Challenge**: Waiting 2-3 seconds for a response feels slow

**Solutions**:
- Immediate visual feedback (typing indicator, avatar animation)
- Streaming responses (show text as it generates)
- Batch non-interactive work during quiet hours
- GPU acceleration where possible
- Model optimization (quantization, pruning)

### 2. Personality Consistency During Evolution

**Challenge**: Hex changes as she learns, but must feel like the same person

**Solutions**:
- Gradual adaptation (personality changes in YAML, not discrete jumps)
- Memory-driven consistency (personality adapts to learned facts)
- Version control (can roll back if she becomes unrecognizable)
- User feedback loop (user can reset or modify personality)
- Core values remain constant (tsundere nature, care underneath)

### 3. Memory Scaling as History Grows

**Challenge**: Retrieving relevant context from thousands of conversations

**Solutions**:
- Vector database for semantic search (sub-500ms)
- Hierarchical memory (recent → summarized old)
- Automatic archival (monthly snapshots, prune oldest)
- Importance tagging (weight important conversations higher)
- Incremental updates (don't recalculate everything)

### 4. Safe Code Generation and Sandboxing

**Challenge**: Hex generates code, but must never break the system

**Solutions**:
- Static analysis (AST parsing for forbidden operations)
- Capability-based progression (limited API at first)
- Sandboxed testing before deployment
- User approval gate (user reviews all code)
- Version control + rollback window (24 hours)
- Whitelist of safe operations (growing list as trust builds)

### 5. Privacy and Local-First Architecture

**Challenge**: Maintaining privacy while having useful context

**Solutions**:
- All ML inference runs locally (no cloud submission)
- No external API calls except Discord
- Encrypted local storage for memories
- User can opt out of any perception module
- Transparent logging (user can audit what's stored)

### 6. Multimodal Synchronization

**Challenge**: Webcam, voice, text, and screen all need to inform the response

**Solutions**:
- Asynchronous processing (don't wait for all inputs)
- Highest-priority input wins (voice > perception > text)
- Graceful degradation (works without any modality)
- Caching (reuse recent perception for repeated queries)

---

## Scaling Considerations

### Single-User (v1)
- Architecture designed for one person + their kids
- Local compute, no multi-user concerns
- Personality is singular (one Hex)

### Multi-Device (v1.5)
- Same personality and memory synced across devices
- Discord as primary, desktop app as secondary
- Cloud sync optional (local-first default)

### Android Support (v2)
- Memory and personality sync to mobile
- Lightweight inference on Android (quantized model)
- Fallback to cloud inference if needed
- Same core architecture, different UIs

### Potential Scaling Patterns

```
Single User (Current)
├─ One Hex instance
├─ All local compute
└─ SQLite + vector DB

Multi-Device Sync (v1.5)
├─ Central SQLite + vector DB on the primary machine
├─ Sync service between devices
└─ Same personality, distributed memory

Multi-Companion (Potential v3)
├─ Multiple Hex instances (per family member)
├─ Shared memory system (family history)
├─ Individual personalities
└─ Potential distributed compute (each on its own device)
```

### Performance Bottlenecks to Monitor

1. **LLM Inference**: Becomes slower as the context window grows
   - Solution: Context summarization, hierarchical retrieval
2. **Vector DB Lookups**: Scales with conversation history
   - Solution: Incremental indexing, approximate search (HNSW)
3. **Perception Processing**: CPU/GPU bound
   - Solution: Frame skipping, model optimization, dedicated thread
4. **Discord Bot Responsiveness**: Limited by gateway connections
   - Solution: Sharding (if needed), efficient message queuing

---

## Technology Stack Summary

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| Discord Bot | discord.py | Fast, well-supported, async-native |
| LLM Inference | Mistral 7B + ollama/vLLM | Local-first, good quality/speed tradeoff |
| Memory (Conversations) | SQLite | Reliable, local, fast queries |
| Memory (Semantic) | Chroma or Milvus | Local vector DB, easy to manage |
| Embeddings | all-MiniLM-L6-v2 | Fast, good quality, local |
| Face Detection | MediaPipe | Accurate, fast, local |
| Emotion Recognition | FER2013-trained or local model | Local, privacy-preserving |
| Speech-to-Text | Whisper | Local, accurate, multilingual |
| Text-to-Speech | Tacotron 2 + vocoder | Local, controllable |
| Avatar | VRoid SDK + Babylon.js | Standards-based, extensible |
| Code Safety | RestrictedPython + ast | Local analysis, sandboxing |
| Version Control | Git | Change tracking, rollback |
| Desktop UI | Tkinter or PyQt | Lightweight, cross-platform |
| Testing | pytest + unittest | Standard Python testing |
| Logging | logging + sentry (optional) | Local-first with cloud fallback |

---

## Deployment Architecture

### Local Development

```
Developer Machine
├── Discord token (env var)
├── Hex codebase (git)
├── Local LLM (ollama)
├── SQLite (file-based)
├── Vector DB (Chroma, embedded)
└── Webcam / screen capture (live)
```

### Production Deployment

```
Deployed Machine (Windows/WSL)
├── Discord token (secure storage)
├── Hex codebase (from git)
├── Local LLM service (ollama/vLLM)
├── SQLite (persistent, backed up)
├── Vector DB (persistent, backed up)
├── Desktop app (tray icon)
├── Auto-updater (pulls from git)
└── Logging (local + optional cloud)
```

### Update Strategy

- Git pull for code updates
- Automatic model updates (LLM weights)
- Zero-downtime restart (graceful shutdown)
- Rollback capability (version pinning)

---

## Quality Assurance

### Key Metrics to Track

**Responsiveness**:
- Response latency: target <3 seconds
- Perception update latency: <500ms
- Memory lookup latency: <100ms

**Reliability**:
- Uptime: >99% for the core bot
- Message delivery: >99.9%
- Memory integrity: no data loss on crash

**Personality Consistency**:
- User perception: "feels like the same person"
- Tone consistency: personality rules enforced
- Learning progress: measurable improvement in personalization

**Safety**:
- No crashes from invalid input
- No LLM hallucinations about moderation actions
- Safe code generation (0 unauthorized executions)

### Testing Strategy

```
Unit Tests
├─ Memory operations (CRUD)
├─ Perception processing
├─ Code validation
├─ Personality rule application
└─ Response filtering

Integration Tests
├─ Discord message → LLM → response
├─ Context assembly pipeline
├─ Avatar expression sync
├─ Self-modification flow
└─ Multi-component scenarios

End-to-End Tests
├─ Full conversation with personality
├─ Perception-aware responses
├─ Memory learning and retrieval
├─ Code generation and deployment
└─ Edge cases (bad input, crashes, recovery)

Manual UAT
├─ Conversational feel (does she feel like a person?)
├─ Personality consistency (still Hex?)
├─ Safety compliance (kid-mode works?)
├─ Performance (under load?)
└─ All features working together?
```
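As a concrete instance of the unit-test tier above, a pytest-style sketch covering code validation and a memory round trip; the module paths, `check_generated_code` helper, and `MemorySystem` interface are hypothetical.

```python
# Hypothetical imports: the module paths and MemorySystem interface are
# assumptions for illustration, not the project's actual layout.
from hex.safety import check_generated_code
from hex.memory import MemorySystem

def test_generated_code_with_forbidden_import_is_rejected():
    violations = check_generated_code("import subprocess\nsubprocess.run(['ls'])")
    assert any("subprocess" in v for v in violations)

def test_conversation_roundtrip(tmp_path):  # tmp_path is a built-in pytest fixture
    memory = MemorySystem(db_path=tmp_path / "test.db")
    memory.log_conversation(
        user_id="42",
        timestamp="2026-01-27T00:00:00Z",
        text="remember my birthday is in June",
        sentiment="pos",
    )
    recent = memory.recent_context(user_id="42", limit=1)
    assert len(recent) == 1
```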
---

## Conclusion

Hex's architecture prioritizes **personality coherence** and **genuine relationship** over feature breadth. The system is designed as a pipeline from perception → memory → personality → response generation, with feedback loops that allow her to learn and evolve.

The modular design enables incremental development (Phases 1-6), with each phase adding capability while maintaining system stability. The self-modification system enables genuine autonomy within safety boundaries, and the local-first approach ensures privacy and independence.

**Critical success factors**:
1. LLM latency stays acceptable (<3s)
2. Personality consistency is maintained across updates
3. The memory system scales with history
4. Self-modification is safe and reversible
5. All components feel integrated (not like separate features)

This architecture serves the core value: **making Hex feel like a person who genuinely cares about you.**

---

**Document Version**: 1.0
**Last Updated**: 2026-01-27
**Status**: Ready for Phase 1 Development