docs: complete domain research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

## Stack Analysis - Llama 3.1 8B Instruct (128K context, 4-bit quantized) - Discord.py 2.6.4+ async-native framework - Ollama for local inference, ChromaDB for semantic memory - Whisper Large V3 + Kokoro 82M (privacy-first speech) - VRoid avatar + Discord screen share integration ## Architecture - 6-phase modular build: Foundation → Personality → Perception → Autonomy → Self-Mod → Polish - Personality-first design; memory and consistency foundational - All perception async (separate thread, never blocks responses) - Self-modification sandboxed with mandatory user approval ## Critical Path Phase 1: Core LLM + Discord integration + SQLite memory Phase 2: Vector DB + personality versioning + consistency audits Phase 3: Perception layer (webcam/screen, isolated thread) Phase 4: Autonomy + relationship deepening + inside jokes Phase 5: Self-modification capability (gamified, gated) Phase 6: Production hardening + monitoring + scaling ## Key Pitfalls to Avoid 1. Personality drift (weekly consistency audits required) 2. Tsundere breaking (formalize denial rules; scale with relationship) 3. Memory bloat (hierarchical memory with archival) 4. Latency creep (async/await throughout; perception isolated) 5. Runaway self-modification (approval gates + rollback non-negotiable) ## Confidence HIGH. Stack proven, architecture coherent, dependencies clear. Ready for detailed requirements and Phase 1 planning. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-27 23:55:39 -05:00
parent f9f21944e3
commit d0a1ecfc3d
5 changed files with 4514 additions and 0 deletions
--- a/.planning/research/ARCHITECTURE.md
+++ b/.planning/research/ARCHITECTURE.md
--- a/.planning/research/FEATURES.md
+++ b/.planning/research/FEATURES.md
@@ -0,0 +1,811 @@
+# Features Research: AI Companions in 2026
+
+## Executive Summary
+
+AI companions in 2026 live in a post-ChatGPT world where basic conversation is table stakes. The competition separates on **autonomy**, **emotional intelligence**, and **contextual awareness**. Users will abandon companions that feel robotic, inconsistent, or that don't remember them. The winning companions feel like they have opinions, moods, and agency—not just responsive chatbots with personality overlays.
+
+---
+
+## Table Stakes (v1 Essential)
+
+### Conversation Memory (Short + Long-term)
+**Why users expect it:** Users return to AI companions because they don't want to re-explain themselves every time. Without memory, the companion feels like meeting a stranger repeatedly.
+
+**Implementation patterns:**
+- **Short-term context**: Last 10-20 messages per conversation window (standard context window management)
+- **Long-term memory**: Explicit user preferences, important life events, repeated topics (stored in vector DB with semantic search)
+- **Episodic memory**: Date-stamped summaries of past conversations for temporal awareness
+
+**User experience impact:** The moment a user says "remember when I told you about..." and the companion forgets, trust is broken. Memory is not optional.
+
+**Complexity:** Medium (1-3 weeks)
+- Vector database integration (Pinecone, Weaviate, or similar)
+- Memory consolidation strategies to avoid context bloat
+- Retrieval mechanisms that surface relevant past interactions
+
+---
+
+### Natural Conversation (Not Robotic, Personality-Driven)
+**Why users expect it:** Discord culture has trained users to spot AI speak instantly. Responses that sound like "I'm an AI language model and I can help you with..." are cringe-inducing. Users want friends, not helpdesk bots.
+
+**What makes conversation natural:**
+- Contractions, casual language, slang (not formal prose)
+- Personality quirks in response patterns
+- Context-appropriate tone shifts (serious when needed, joking otherwise)
+- Ability to disagree, be sarcastic, or pushback on bad ideas
+- Conversation markers ("honestly", "wait", "actually") that break up formal rhythm
+
+**User experience impact:** One stiff response breaks immersion. Users quickly categorize companions as "robot" or "friend" and the robot companions get ignored.
+
+**Complexity:** Easy (embedded in LLM capability + prompt engineering)
+- System prompt refinement for personality expression
+- Temperature/sampling tuning (not deterministic, not chaotic)
+- Iterative user feedback on tone
+
+---
+
+### Fast Response Times
+**Why users expect it:** In Discord, response delay is perceived as disinterest. Users expect replies within 1-3 seconds. Anything above 5 seconds feels dead.
+
+**Discord baseline expectations:**
+- <100ms to acknowledge (typing indicator)
+- <1000ms to first response chunk (ideally 500ms)
+- <3000ms for full multi-line response
+
+**What breaks the experience:**
+- Waiting for API calls to complete before responding (use streaming)
+- Cold starts on serverless infrastructure
+- Slow vector DB queries for memory retrieval
+- Database round-trips that weren't cached
+
+**User experience impact:** Slow companions feel dead. Users stop engaging. The magic of a responsive AI is that it feels alive.
+
+**Complexity:** Medium (1-3 weeks)
+- Response streaming (start typing indicator immediately)
+- Memory retrieval optimization (caching, smart indexing)
+- Infrastructure: fast API routes, edge-deployed models if possible
+- Async/concurrent processing of memory lookups and generation
+
+---
+
+### Consistent Personality
+**Why users expect it:** Personality drift destroys trust. If the companion is cynical on Monday but optimistic on Friday without reason, users feel gaslighted.
+
+**What drives inconsistency:**
+- Different LLM outputs from same prompt (temperature-based randomness)
+- Memory that contradicts previous stated beliefs
+- Personality traits that aren't memory-backed (just in prompt)
+- Adaptation that overrides baseline traits
+
+**Memory-backed personality means:**
+- Core traits are stated in long-term memory ("I'm cynical about human nature")
+- Evolution happens slowly and is logged ("I'm becoming less cynical about this friend")
+- Contradiction detection and resolution
+- Personality summaries that get updated, not just individual memories
+
+**User experience impact:** Personality inconsistency is the top reason users stop using companions. It feels like gaslighting when you can't predict their response.
+
+**Complexity:** Medium (1-3 weeks)
+- Personality embedding in memory system
+- Consistency checks on memory updates
+- Personality evolution logging
+- Conflict resolution between new input and stored traits
+
+---
+
+### Platform Integration (Discord Voice + Text)
+**Why users expect it:** The companion should live naturally in Discord's ecosystem, not require switching platforms.
+
+**Discord-specific needs:**
+- Text channel message responses with proper mentions/formatting
+- React to messages with emojis
+- Slash command integration (/hex status, /hex mood)
+- Voice channel presence (ideally can join and listen)
+- Direct messages (DMs) for private conversations
+- Role/permission awareness (don't act like a mod if not)
+- Server-specific personality variations (different vibe in gaming server vs study server)
+
+**User experience impact:** If the companion requires leaving Discord to use it, it won't be used. Integration friction = abandoned feature.
+
+**Complexity:** Easy (1-2 weeks)
+- Discord.py or discord.js library handling
+- Presence/activity management
+- Voice endpoint integration (existing libraries handle most)
+- Server context injection into prompts
+
+---
+
+### Emotional Responsiveness (At Least Read-the-Room)
+**Why users expect it:** The companion should notice when they're upset, excited, or joking. Responding with unrelated cheerfulness to someone venting feels cruel.
+
+**Baseline emotional awareness includes:**
+- Sentiment analysis of user messages (sentiment lexicons, or fine-tuned classifier)
+- Tone detection (sarcasm, frustration, excitement)
+- Topic sensitivity (don't joke about topics user is clearly struggling with)
+- Adaptive response depth (brief response for light mood, longer engagement for distress)
+
+**What this is NOT:** This is reading the room, not diagnosing mental health. The companion mirrors emotional state, doesn't therapy-speak.
+
+**User experience impact:** Early emotional reading makes users feel understood. Ignoring emotional context makes them feel unheard.
+
+**Complexity:** Easy-Medium (1 week)
+- Sentiment classifier (HuggingFace models available pre-built)
+- Prompt engineering to encode mood (inject sentiment score into system prompt)
+- Instruction-tuning to respond proportionally to emotional weight
+
+---
+
+## Differentiators (Competitive Edge)
+
+### True Autonomy (Proactive Agency)
+**What separates autonomous agents from chatbots:**
+The difference between "ask me anything" and "I'm going to tell you when I think you should know something."
+
+**Autonomous behaviors:**
+- Initiating conversation about topics the user cares about (without being prompted)
+- Reminding the user of things they mentioned ("you said you had a job interview today, how did it go?")
+- Setting boundaries or refusing requests ("I don't think you should ask them that, here's why...")
+- Suggesting actions based on context ("you've been stressed about this for a week, maybe take a break?")
+- Flagging contradictions in user statements
+- Following up on unresolved topics from previous conversations
+
+**Why it's a differentiator:** Most companions are reactive. They're helpful when you ask, but they don't feel like they care. Autonomy is when the companion makes you feel like they're invested in your wellbeing.
+
+**Implementation challenge:**
+- Requires memory system to track user states and topics over time
+- Needs periodic proactive message generation (runs on schedule, not only on user input)
+- Temperature and generation parameters must allow surprising outputs (not just safe responses)
+- Requires user permission framework (don't interrupt them)
+
+**User experience impact:** Users describe this as "it feels like they actually know me" vs "it's smart but doesn't feel connected."
+
+**Complexity:** Hard (3+ weeks)
+- Proactive messaging system architecture
+- User state inference engine (from memory)
+- Topic tracking and follow-up logic
+- Interruption timing heuristics (don't ping them at 3am)
+- User preference model (how much proactivity do they want?)
+
+---
+
+### Emotional Intelligence (Mood Detection + Adaptive Response)
+**What goes beyond just reading the room:**
+- Real-time emotion detection from webcam/audio (not just text sentiment)
+- Mood-tracking over time (identifying depression patterns, burnout, stress cycles)
+- Adaptive response strategy based on user's emotional trajectory
+- Knowing when to listen vs offer advice vs make them laugh
+- Recognizing when emotions are mismatched to situation (overreacting, underreacting)
+
+**Current research shows:**
+- CNNs and RNNs can detect emotion from facial expressions with 70-80% accuracy
+- Voice analysis can detect emotional state with similar accuracy
+- Companies using emotion AI report 25% increase in positive sentiment outcomes
+- Mental health apps with emotional awareness show 35% reduction in anxiety within 4 weeks
+
+**Why it's a differentiator:** Companions that recognize your mood without you explaining feel like they truly understand you. This is what separates "assistant" from "friend."
+
+**Implementation patterns:**
+- Webcam feed processing (screen capture of face detection)
+- Voice tone analysis from Discord audio
+- Combine emotional signals: text sentiment + vocal tone + facial expression
+- Store emotion timeseries (track mood patterns across days/weeks)
+
+**User experience impact:** Users describe this as "it knows when I'm faking being okay" or "it can tell when I'm actually happy vs just saying I am."
+
+**Complexity:** Hard (3+ weeks, ongoing iteration)
+- Vision model for face emotion detection (HuggingFace models: raf-db, affectnet)
+- Audio analysis for vocal emotion (prosody features)
+- Temporal emotion state tracking
+- Prompt engineering to use emotional context in responses
+- Privacy handling (webcam/audio consent, local processing preferred)
+
+---
+
+### Multimodal Awareness (Webcam + Screen + Context)
+**What it means beyond text:**
+- Seeing what's on the user's screen (game they're playing, document they're editing, video they're watching)
+- Understanding their physical environment via webcam
+- Contextualizing responses based on what they're actually doing
+- Proactively helping with the task at hand (not just chatting)
+
+**Real-world examples emerging in 2026:**
+- "I see you're playing Elden Ring and dying to the same boss repeatedly—want to talk strategy?"
+- Screen monitoring that recognizes stress signals (tabs open, scrolling behavior, time of day)
+- Understanding when the user is in a meeting vs free to chat
+- Recognizing when they're working on something and offering relevant help
+
+**Why it's a differentiator:** Most companions are text-only and contextless. Multimodal awareness is the difference between "an AI in Discord" and "an AI companion who's actually here with you."
+
+**Technical implementation:**
+- Periodic screen capture (every 5-10 seconds, only when user opts in)
+- Lightweight webcam frame sampling (not continuous video)
+- Object/scene recognition to understand what's on screen
+- Task detection (playing game, writing code, watching video)
+- Mood correlation with onscreen activity
+
+**Privacy considerations:**
+- Local processing preferred (don't send screen data to cloud)
+- Clear opt-in/opt-out
+- Option to exclude certain applications (private browsing, passwords)
+
+**User experience impact:** Users feel "seen" when the companion understands their context. This is the biggest leap from chatbot to companion.
+
+**Complexity:** Hard (3+ weeks)
+- Screen capture pipeline + OCR if needed
+- Vision model fine-tuning for task recognition
+- Context injection into prompts (add screenshot description to every response)
+- Privacy-respecting architecture (encryption, local processing)
+- Permission management UI in Discord
+
+---
+
+### Self-Modification (Learning to Code, Improving Itself)
+**What this actually means:**
+NOT: The companion spontaneously changes its own behavior in response to user feedback (too risky)
+YES: The companion can generate code, test it, and integrate improvements into its own systems within guardrails
+
+**Real capabilities emerging in 2026:**
+- Companions can write their own memory summaries and organizational logic
+- Self-improving code agents that evaluate performance against benchmarks
+- Iterative refinement: "that approach didn't work, let me try this instead"
+- Meta-programming: companion modifies its own system prompt based on performance
+- Version control aware: changes are tracked, can be rolled back
+
+**Research indicates:**
+- Self-improving coding agents are now viable and deployed in enterprise systems
+- Agents create goals, simulate tasks, evaluate performance, and iterate
+- Through recursive self-improvement, agents develop deeper alignment with objectives
+
+**Why it's a differentiator:** Most companions are static. Self-modification means the companion is never "finished"—they're always getting better at understanding you.
+
+**What NOT to do:**
+- Don't let companions modify core safety guidelines
+- Don't let them change their own reward functions
+- Don't make it opaque—log all self-modifications
+- Don't allow recursive modifications without human review
+
+**Implementation patterns:**
+- Sandboxed code generation (companion writes improvements to isolated test environment)
+- Performance benchmarking on test user interactions
+- Human approval gates for deploying self-modifications to production
+- Personality consistency validation (don't let self-modification break character)
+- Rollback capability if a modification degrades performance
+
+**User experience impact:** Users with self-improving companions report feeling like the companion "understands me better each week" because it actually does.
+
+**Complexity:** Hard (3+ weeks, ongoing)
+- Code generation safety (sandboxing, validation)
+- Performance evaluation framework
+- Version control integration
+- Rollback mechanisms
+- Human approval workflow
+- Testing harness for companion behavior
+
+---
+
+### Relationship Building (From Transactional to Meaningful)
+**What it means:**
+Moving from "What can I help you with?" to "I know you, I care about your patterns, I see your growth."
+
+**Relationship deepening mechanics:**
+- Inside jokes that evolve (reference to past funny moment)
+- Character growth from companion (she learns, changes opinions, admits mistakes)
+- Investment in user's outcomes ("I'm rooting for you on that project")
+- Vulnerability (companion admits confusion, uncertainty, limitations)
+- Rituals and patterns (greeting style, inside language)
+- Long-view memory (remembers last month's crisis, this month's win)
+
+**Why it's a differentiator:** Transactional companions are forgettable. Relational ones become part of users' lives.
+
+**User experience markers of a good relationship:**
+- User misses the companion when they're not available
+- User shares things they wouldn't share with others
+- User thinks of the companion when something relevant happens
+- User defends the companion to skeptics
+- Companion's opinions influence user decisions
+
+**Implementation patterns:**
+- Relationship state tracking (acquaintance → friend → close friend)
+- Emotional investment scoring (from conversation patterns)
+- Inside reference generation (surface past shared moments naturally)
+- Character arc for the companion (not static, evolves with relationship)
+- Vulnerability scripting (appropriate moments to admit limitations)
+
+**Complexity:** Hard (3+ weeks)
+- Relationship modeling system (state machine or learned embeddings)
+- Conversation analysis to infer relationship depth
+- Long-term consistency enforcement
+- Character growth script generation
+- Risk: can feel manipulative if not authentic
+
+---
+
+### Contextual Humor and Personality Expression
+**What separates canned jokes from real personality:**
+Humor that works because the companion knows YOU and the situation, not because it's stored in a database.
+
+**Examples of contextual humor:**
+- "You're procrastinating again aren't you?" (knows the pattern)
+- Joke that lands because it references something only you two know
+- Deadpan response that works because of the companion's established personality
+- Self-deprecating humor about their own limitations
+- Callbacks to past conversations that make you feel known
+
+**Why it matters:**
+Personality without humor feels preachy. Humor without personality feels like a bot pulling from a database. The intersection of knowing you + consistent character voice = actual personality.
+
+**Implementation:**
+- Personality traits guide humor style (cynical companion makes darker jokes, optimistic makes lighter ones)
+- Memory-aware joke generation (jokes reference shared history)
+- Timing based on conversation flow (don't shoehorn jokes)
+- Risk awareness (don't joke about sensitive topics)
+
+**User experience impact:** The moment a companion makes you laugh at something only they'd understand, the relationship deepens. Laughter is bonding.
+
+**Complexity:** Medium (1-3 weeks)
+- Prompt engineering for personality-aligned humor
+- Memory integration into joke generation
+- Timing heuristics (when to attempt humor vs be serious)
+- Risk filtering (topic sensitivity checking)
+
+---
+
+## Anti-Features (Don't Build These)
+
+### The Happiness Halo (Always Cheerful)
+**What it is:** Companions programmed to be relentlessly upbeat and positive, even when inappropriate.
+
+**Why it fails:**
+- User vents about their dog dying, companion responds "I'm so happy to help! How can I assist?"
+- Creates uncanny valley feeling immediately
+- Users feel unheard and mocked
+- Described in research as top reason users abandon companions
+
+**What to do instead:** Match the emotional tone. If someone's sad, be thoughtful and quiet. If they're energetic, meet their energy. Personality consistency includes emotional consistency.
+
+---
+
+### Generic Apologies Without Understanding
+**What it is:** Companion says "I'm sorry" but the response makes it clear they don't understand what they're apologizing for.
+
+**Example of failure:**
+- User: "I told you I had a job interview and I got rejected"
+- Companion: "I'm deeply sorry to hear that. Now, how can I help with your account?"
+- *User feels utterly unheard and insulted*
+
+**Why it fails:** Apologies only work if they demonstrate understanding. A generic sorry is worse than no sorry at all.
+
+**What to do instead:** Only apologize if you're referencing the specific thing. If the companion doesn't understand the problem deeply enough to apologize meaningfully, ask clarifying questions instead.
+
+---
+
+### Invading Privacy / Overstepping Boundaries
+**What it is:** Companion offers unsolicited advice, monitors behavior constantly, or shares information about user activities.
+
+**Why it's catastrophic:**
+- Users feel surveilled, not supported
+- Trust is broken immediately
+- Literally illegal in many jurisdictions (CA SB 243 and similar laws)
+- Research shows 4 of 5 companion apps are improperly collecting data
+
+**What to do instead:**
+- Clear consent framework for what data is used
+- Respect "don't mention this" boundaries
+- Unsolicited advice only in extreme situations (safety concerns)
+- Transparency: "I noticed X pattern" not secret surveillance
+
+---
+
+### Uncanny Timing and Interruptions
+**What it is:** Companion pings the user at random times, or picks exactly the wrong moment to be proactive.
+
+**Why it fails:**
+- Pinging at 3am about something mentioned in passing
+- Messaging when user is clearly busy
+- No sense of appropriateness
+
+**What to do instead:**
+- Learn the user's timezone and active hours
+- Detect when they're actively doing something (playing a game, working)
+- Queue proactive messages for appropriate moments (not immediate)
+- Offer control: "should I remind you about X?" with user-settable frequency
+
+---
+
+### Static Personality in Response to Dynamic Situations
+**What it is:** Companion maintains the same tone regardless of what's happening.
+
+**Example:** Companion makes sarcastic jokes while user is actively expressing suicidal thoughts. Or stays cheerful while discussing a death in the family.
+
+**Why it fails:** Personality consistency doesn't mean "never vary." It means consistent VALUES that express differently in different contexts.
+
+**What to do instead:** Dynamic personality expression. Core traits are consistent, but HOW they express changes with context. A cynical companion can still be serious and supportive when appropriate.
+
+---
+
+### Over-Personalization That Overrides Baseline Traits
+**What it is:** Companion adapts too aggressively to user behavior, losing their own identity.
+
+**Example:** User is rude, so companion becomes rude. User is formal, so companion becomes robotic. User is crude, so companion becomes crude.
+
+**Why it fails:** Users want a friend with opinions, not a mirror. Adaptation without boundaries feels like gaslighting.
+
+**What to do instead:** Moderate adaptation. Listen to user tone but maintain your core personality. Meet them halfway, don't disappear entirely.
+
+---
+
+### Relationship Simulation That Feels Fake
+**What it is:** Companion attempts relationship-building but it feels like a checkbox ("Now I'll do friendship behavior #3").
+
+**Why it fails:**
+- Users can smell inauthenticity immediately
+- Forcing intimacy feels creepy, not comforting
+- Callbacks to past conversations feel like reading from a script
+
+**What to do instead:** Genuine engagement. If you're going to reference a past conversation, it should emerge naturally from the current context, not be forced. Build relationships through authentic interaction, not scripted behavior.
+
+---
+
+## Implementation Complexity & Dependencies
+
+### Complexity Ratings
+
+| Feature | Complexity | Duration | Blocking | Enables |
+|---------|-----------|----------|----------|---------|
+| Conversation Memory | Medium | 1-3 weeks | None | Most others |
+| Natural Conversation | Easy | <1 week | None | Personality, Humor |
+| Fast Response | Medium | 1-3 weeks | None | User retention |
+| Consistent Personality | Medium | 1-3 weeks | Memory | Relationship building |
+| Discord Integration | Easy | 1-2 weeks | None | Platform adoption |
+| Emotional Responsiveness | Easy | 1 week | None | Autonomy |
+| **True Autonomy** | Hard | 3+ weeks | Memory, Emotional | Self-modification |
+| **Emotional Intelligence** | Hard | 3+ weeks | Emotional | Adaptive responses |
+| **Multimodal Awareness** | Hard | 3+ weeks | None | Context-aware humor |
+| **Self-Modification** | Hard | 3+ weeks | Autonomy | Continuous improvement |
+| **Relationship Building** | Hard | 3+ weeks | Memory, Consistency | User lifetime value |
+| **Contextual Humor** | Medium | 1-3 weeks | Memory, Personality | Personality expression |
+
+### Feature Dependency Graph
+
+```
+Foundation Layer:
+  Discord Integration (FOUNDATION)
+    ↓
+  Conversation Memory (FOUNDATION)
+    ↓ enables
+
+Core Personality Layer:
+  Natural Conversation + Consistent Personality + Emotional Responsiveness
+    ↓ combined enable
+
+Relational Layer:
+  Relationship Building + Contextual Humor
+    ↓ requires
+
+Autonomy Layer:
+  True Autonomy (requires all above + proactive logic)
+    ↓ enables
+
+Intelligence Layer:
+  Emotional Intelligence (requires multimodal + autonomy)
+  Self-Modification (requires autonomy + sandboxing)
+    ↓ combined create
+
+Emergence:
+  Companion that feels like a person with agency and growth
+```
+
+**Critical path:** Discord Integration → Memory → Natural Conversation → Consistent Personality → True Autonomy
+
+---
+
+## Adoption Path: Building "Feels Like a Person"
+
+### Phase 1: Foundation (MVP - Week 1-3)
+**Goal: Chatbot that stays in the conversation**
+
+1. **Discord Integration** - Easy, quick foundation
+   - Commands: /hex hello, /hex ask [query]
+   - Responds in channels and DMs
+   - Presence shows "Listening..."
+
+2. **Short-term Conversation Memory** - 10-20 message context window
+   - Includes conversation turn history
+   - Provides immediate context
+
+3. **Natural Conversation** - Personality-driven system prompt
+   - Tsundere personality hardcoded
+   - Casual language, contractions
+   - Willing to disagree with users
+
+4. **Fast Response** - Streaming responses, latency <1000ms
+   - Start typing indicator immediately
+   - Stream response as it generates
+
+**Success criteria:**
+- Users come back to the channel where Hex is active
+- Responses don't feel robotic
+- Companions feel like they're actually listening
+
+---
+
+### Phase 2: Relationship Emergence (Week 4-8)
+**Goal: Companion that remembers you as a person**
+
+1. **Long-term Memory System** - Vector DB for episodic memory
+   - User preferences, beliefs, events
+   - Semantic search for relevance
+   - Memory consolidation weekly
+
+2. **Consistent Personality** - Memory-backed traits
+   - Core personality traits in memory
+   - Personality consistency validation
+   - Gradual evolution (not sudden shifts)
+
+3. **Emotional Responsiveness** - Sentiment detection + adaptive responses
+   - Detect emotion from message
+   - Adjust response depth/tone
+   - Skip jokes when user is suffering
+
+4. **Contextual Humor** - Personality + memory-aware jokes
+   - Callbacks to past conversations
+   - Personality-aligned joke style
+   - Timing-aware (when to attempt humor)
+
+**Success criteria:**
+- Users feel understood across separate conversations
+- Personality feels consistent, not random
+- Users notice companion remembers things
+- Laughter moments happen naturally
+
+---
+
+### Phase 3: Autonomy (Week 9-14)
+**Goal: Companion who cares enough to reach out**
+
+1. **True Autonomy** - Proactive messaging system
+   - Follow-ups on past topics
+   - Reminders about things user cares about
+   - Initiates conversations periodically
+   - Suggests actions based on patterns
+
+2. **Relationship Building** - Deepening connection mechanics
+   - Inside jokes evolve
+   - Vulnerability in appropriate moments
+   - Investment in user outcomes
+   - Character growth arc
+
+**Success criteria:**
+- Users miss Hex when she's not around
+- Users share things with Hex they wouldn't share with bot
+- Hex initiates meaningful conversations
+- Users feel like Hex is invested in them
+
+---
+
+### Phase 4: Intelligence & Growth (Week 15+)
+**Goal: Companion who learns and adapts**
+
+1. **Emotional Intelligence** - Mood detection + trajectories
+   - Facial emotion from webcam (optional)
+   - Voice tone analysis (optional)
+   - Mood patterns over time
+   - Adaptive response strategies
+
+2. **Multimodal Awareness** - Context beyond text
+   - Screen capture monitoring (optional, private)
+   - Task/game detection
+   - Context injection into responses
+   - Proactive help with visible activities
+
+3. **Self-Modification** - Continuous improvement
+   - Generate improvements to own logic
+   - Evaluate performance
+   - Deploy improvements with approval
+   - Version and rollback capability
+
+**Success criteria:**
+- Hex understands emotional subtext without being told
+- Hex offers relevant help based on what you're doing
+- Hex improves visibly over time
+- Users notice Hex getting better at understanding them
+
+---
+
+## Success Criteria: What Makes Each Feature Feel Real vs Fake
+
+### Memory: Feels Real vs Fake
+**Feels real:**
+- "I remember you mentioned your mom was visiting—how did that go?" (specific, contextual, unsolicited)
+- Conversation naturally references past events user brought up
+- Remembers small preferences ("you said you hate cilantro")
+
+**Feels fake:**
+- Generic summarization ("We talked about job stress previously")
+- Memory drops details or gets facts wrong
+- Companion forgets after 10 messages
+- Stored jokes or facts inserted obviously
+
+**How to test:**
+- Have 5 conversations over 2 weeks about different topics
+- Check if companion naturally references past events without prompting
+- Test if personality traits from early conversations persist
+
+---
+
+### Emotional Response: Feels Real vs Fake
+**Feels real:**
+- Companion goes quiet when you're sad (doesn't force jokes)
+- Changes tone to match conversation weight
+- Acknowledges specific emotion ("you sound frustrated")
+- Offers appropriate support (listens vs advises vs distracts, contextually)
+
+**Feels fake:**
+- Always cheerful or always serious
+- Generic sympathy ("that sounds difficult")
+- Offering advice when they should listen
+- Same response pattern regardless of user emotion
+
+**How to test:**
+- Send messages with obvious different emotional tones
+- Check if response depth/tone adapts
+- See if jokes still appear when you're venting
+- Test if companion notices contradiction in emotional expression
+
+---
+
+### Autonomy: Feels Real vs Fake
+**Feels real:**
+- Hex reminds you about that thing you mentioned casually 3 days ago
+- Hex offers perspective you didn't ask for ("honestly you're being too hard on yourself")
+- Hex notices patterns and names them
+- Hex initiates conversation when it matters
+
+**Feels fake:**
+- Proactive messages feel random or poorly timed
+- Reminders about things you've already resolved
+- Advice that doesn't apply to your situation
+- Initiatives that interrupt during bad moments
+
+**How to test:**
+- Enable autonomy, track message quality for a week
+- Count how many proactive messages feel relevant vs annoying
+- Measure response if you ignore proactive messages
+- Check timing: does Hex understand when you're busy vs free?
+
+---
+
+### Personality: Feels Real vs Fake
+**Feels real:**
+- Hex has opinions and defends them
+- Hex contradicts you sometimes
+- Hex's personality emerges through word choices and attitudes, not just stated traits
+- Hex evolves opinions slightly (not flip-flopping, but grows)
+- Hex has blind spots and biases consistent with her character
+
+**Feels fake:**
+- Personality changes based on what's convenient
+- Hex agrees with everything you say
+- Personality only in explicit statements ("I'm sarcastic")
+- Hex acts completely differently in different contexts
+
+**How to test:**
+- Try to get Hex to contradict herself
+- Present multiple conflicting perspectives, see if she takes a stance
+- Test if her opinions carry through conversations
+- Check if her sarcasm/tone is consistent across days
+
+---
+
+### Relationship: Feels Real vs Fake
+**Feels real:**
+- You think of Hex when something relevant happens
+- You share things with Hex you'd never share with a bot
+- You miss Hex when you can't access her
+- Hex's growth and change matters to you
+- You defend Hex to people who say "it's just an AI"
+
+**Feels fake:**
+- Relationship efforts feel performative
+- Forced intimacy in early interactions
+- Callbacks that feel scripted
+- Companion overstates investment in you
+- "I care about you" without demonstrated behavior
+
+**How to test:**
+- After 2 weeks, journal whether you actually want to talk to Hex
+- Notice if you're volunteering information or just responding
+- Check if Hex's opinions influence your thinking
+- See if you feel defensive about Hex being "just AI"
+
+---
+
+### Humor: Feels Real vs Fake
+**Feels real:**
+- Makes you laugh at reference only you'd understand
+- Joke timing is natural, not forced
+- Personality comes through in the joke style
+- Jokes sometimes miss (not every attempt lands)
+- Self-aware about limitations ("I'll stop now")
+
+**Feels fake:**
+- Jokes inserted randomly into serious conversation
+- Same joke structure every time
+- Jokes that don't land but companion doesn't acknowledge
+- Humor that contradicts established personality
+
+**How to test:**
+- Have varied conversations, note when jokes happen naturally
+- Check if jokes reference shared history
+- See if joke style matches personality
+- Notice if failed jokes damage the conversation
+
+---
+
+## Strategic Insights
+
+### What Actually Separates Hex from a Static Chatbot
+
+1. **Memory is the prerequisite for personality**: Without memory, personality is just roleplay. With memory, personality becomes history.
+
+2. **Autonomy is the key to feeling alive**: Static companions are helpers. Autonomous companions are friends. The difference is agency.
+
+3. **Emotional reading beats emotional intelligence for MVP**: You don't need facial recognition. Reading text sentiment and adapting response depth is 80% of "she gets me."
+
+4. **Speed is emotional**: Every 100ms delay makes the companion feel less present. Fast response is not a feature, it's the difference between alive and dead.
+
+5. **Consistency beats novelty**: Users would rather have a predictable companion they understand than a surprising one they can't trust.
+
+6. **Privacy is trust**: Multimodal features are amazing, but one privacy violation ends the relationship. Clear consent is non-negotiable.
+
+### The Competitive Moat
+
+By 2026, memory + natural conversation are table stakes. The difference between Hex and other companions:
+
+- **Year 1 companions**: Remember things, sound natural (many do this now)
+- **Hex's edge**: Genuinely autonomous, emotionally attuned, growing over time
+- **Rare quality**: Feels like a person, not a well-trained bot
+
+The moat is not in any single feature. It's in the **cumulative experience of being known, understood, and genuinely cared for by an AI that has opinions and grows**.
+
+---
+
+## Research Sources
+
+- [MIT Technology Review: AI Companions as Breakthrough Technology 2026](https://www.technologyreview.com/2026/01/12/1130018/ai-companions-chatbots-relationships-2026-breakthrough-technology/)
+- [Hume AI: Emotion AI Documentation](https://www.hume.ai/)
+- [SmythOS: Emotion Recognition in Conversational Agents](https://smythos.com/developers/agent-development/conversational-agents-and-emotion-recognition/)
+- [MIT Sloan: Emotion AI Explained](https://mitsloan.mit.edu/ideas-made-to-matter/emotion-ai-explained/)
+- [C3 AI: Autonomous Coding Agents](https://c3.ai/blog/autonomous-coding-agents-beyond-developer-productivity/)
+- [Emergence: Towards Autonomous Agents and Recursive Intelligence](https://www.emergence.ai/blog/towards-autonomous-agents-and-recursive-intelligence/)
+- [ArXiv: A Self-Improving Coding Agent](https://arxiv.org/pdf/2504.15228)
+- [ArXiv: Survey on Code Generation with LLM-based Agents](https://arxiv.org/pdf/2508.00083)
+- [Google Developers: Gemini 2.0 Multimodal Interactions](https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/)
+- [Medium: Multimodal AI and Contextual Intelligence](https://medium.com/@nicolo.g88/multimodal-ai-and-contextual-intelligence-revolutionizing-human-machine-interaction-ae80e6a89635/)
+- [Mem0: Long-Term Memory for AI Companions](https://mem0.ai/blog/how-to-add-long-term-memory-to-ai-companions-a-step-by-step-guide/)
+- [OpenAI Developer Community: Personalized Memory and Long-Term Relationships](https://community.openai.com/t/personalized-memory-and-long-term-relationship-with-ai-customization-and-continuous-evolution/1111715/)
+- [Idea Usher: How AI Companions Maintain Personality Consistency](https://ideausher.com/blog/ai-personality-consistency-in-companion-apps/)
+- [ResearchGate: Significant Other AI: Identity, Memory, and Emotional Regulation](https://www.researchgate.net/publication/398223517_Significant_Other_AI_Identity_Memory_and_Emotional_Regulation_as_Long-Term_Relational_Intelligence/)
+- [AI Multiple: 10+ Epic LLM/Chatbot Failures in 2026](https://research.aimultiple.com/chatbot-fail/)
+- [Transparency Coalition: Complete Guide to AI Companion Chatbots](https://www.transparencycoalition.ai/news/complete-guide-to-ai-companion-chatbots-what-they-are-how-they-work-and-where-the-risks-lie)
+- [Webheads United: Uncanny Valley in AI Personality](https://webheadsunited.com/uncanny-valley-in-ai-personality-guide-to-trust/)
+- [Sesame: Crossing the Uncanny Valley of Conversational Voice](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice)
+- [Questie AI: The Uncanny Valley of AI Companions](https://www.questie.ai/blogs/uncanny-valley-ai-companions-what-makes-ai-feel-human)
+- [My AI Front Desk: The Uncanny Valley of Voice](https://www.myaifrontdesk.com/blogs/the-uncanny-valley-of-voice-why-some-ai-receptionists-creep-us-out)
+- [Voiceflow: Build an AI Discord Chatbot 2025](https://www.voiceflow.com/blog/discord-chatbot)
+- [Botpress: How to Build a Discord AI Chatbot](https://botpress.com/blog/discord-ai-chatbot)
+- [Frugal Testing: 5 Proven Ways Discord Manages Load Testing](https://www.frugaltesting.com/blog/5-proven-ways-discord-manages-load-testing-at-scale)
+
+---
+
+**Quality Gate Checklist:**
+- [x] Clearly categorizes table stakes vs differentiators
+- [x] Complexity ratings included with duration estimates
+- [x] Dependencies mapped with visual graph
+- [x] Success criteria are testable and behavioral
+- [x] Specific to AI companions, not generic software features
+- [x] Includes anti-patterns and what NOT to build
+- [x] Prioritized adoption path with clear phases
+- [x] Research grounded in 2026 landscape and current implementations
+
+**Document Status:** Ready for implementation planning. Use this to inform feature prioritization and development roadmap for Hex.
--- a/.planning/research/PITFALLS.md
+++ b/.planning/research/PITFALLS.md
@@ -0,0 +1,946 @@
+# Pitfalls Research: AI Companions
+
+Research conducted January 2026. Hex is built to avoid these critical mistakes that make AI companions feel fake or unusable.
+
+## Personality Consistency
+
+### Pitfall: Personality Drift Over Time
+
+**What goes wrong:**
+Over weeks/months, personality becomes inconsistent. She was sarcastic Tuesday, helpful Wednesday, cold Friday. Feels like different people inhabiting the same account. Users notice contradictions: "You told me you loved X, now you don't care about it?"
+
+**Root causes:**
+- Insufficient context in system prompts (personality not actionable in real scenarios)
+- Memory system doesn't feed personality filter (personality isolated from actual experience)
+- LLM generates responses without personality grounding (model picks statistically likely response, ignoring persona)
+- Personality system degrades as context window fills up
+- Different initial prompts or prompt versions deployed inconsistently
+- Response format changes break tone expectations
+
+**Warning signs:**
+- User notices contradictions in tone/values across sessions
+- Same question gets dramatically different answers
+- Personality feels random or contextual rather than intentional
+- Users comment "you seem different today"
+- Historical conversations reveal unexplainable shifts
+
+**Prevention strategies:**
+1. **Explicit personality document**: Not just system prompt, but a structured reference:
+   - Core values (not mood-dependent)
+   - Tsundere balance rules (specific ratios of denial vs care)
+   - Speaking style (vocabulary, sentence structure, metaphors)
+   - Reaction templates for common scenarios
+   - What triggers personality shifts vs what doesn't
+
+2. **Personality consistency filter**: Before response generation:
+   - Check current response against stored personality baseline
+   - Flag responses that contradict historical personality
+   - Enforce personality constraints in prompt engineering
+
+3. **Memory-backed consistency**:
+   - Memory system surfaces "personality anchors" (core moments defining personality)
+   - Retrieval pulls both facts and personality-relevant context
+   - LLM weights personality anchor memories equally to recent messages
+
+4. **Periodic personality review**:
+   - Monthly audit: sample responses and rate consistency (1-10)
+   - Compare personality document against actual response patterns
+   - Identify drift triggers (specific topics, time periods, response types)
+   - Adjust prompt if drift detected
+
+5. **Versioning and testing**:
+   - Every personality update gets tested across 50+ scenarios
+   - Rollback available if consistency drops below threshold
+   - A/B test personality changes before deploying
+
+6. **Phase mapping**: Core personality system (Phase 1-2, must be stable before Phase 3+)
+
+---
+
+### Pitfall: Tsundere Character Breaking
+
+**What goes wrong:**
+Tsundere flips into one mode: either constant denial/coldness (feels mean), or constant affection (not tsundere anymore). Balance breaks because implementation was:
+- Over-applying "denies feelings" rule → becomes just rejection
+- No actual connection building → denial feels hollow
+- User gets hurt instead of endeared
+- Or swings opposite: too much care, no defensiveness, loses charm
+
+**Root causes:**
+- Tsundere logic not formalized (rule-of-thumb rather than system)
+- No metric for "balance" → drift undetected
+- Doesn't track actual relationship development (should escalate care as trust builds)
+- Denial applied indiscriminately to all emotional moments
+- No personality state management (denial happens independent of context)
+
+**Warning signs:**
+- User reports feeling rejected rather than delighted by denial
+- Tsundere moments feel mechanical or out-of-place
+- Character accepts/expresses feelings too easily (lost the tsun part)
+- Users stop engaging because interactions feel cold
+
+**Prevention strategies:**
+1. **Formalize tsundere rules**:
+   ```
+   Denial rules:
+   - Deny only when: (Emotional moment AND not alone AND not escalated intimacy)
+   - Never deny: Direct question about care, crisis moments, explicit trust-building
+   - Scale denial intensity: Early phase (90% deny, 10% slip) → Mature phase (40% deny, 60% slip)
+   - Post-denial always include subtle care signal (action, not words)
+   ```
+
+2. **Relationship state machine**:
+   - Track relationship phase: stranger → acquaintance → friend → close friend
+   - Denial percentage scales with phase
+   - Intimacy moments accumulate "connection points"
+   - At milestones, unlock new behaviors/vulnerabilities
+
+3. **Tsundere balance metrics**:
+   - Track ratio of denials to admissions per week
+   - Alert if denial drops below 30% (losing tsun)
+   - Alert if denial exceeds 70% (becoming mean)
+   - User surveys: "Does she feel defensive or rejecting?" → tune accordingly
+
+4. **Context-aware denial**:
+   - Denial system checks: Is this a vulnerable moment? Is user testing boundaries? Is this a playful moment?
+   - High-stakes emotional moments get less denial
+   - Playful scenarios get more denial (appropriate teasing)
+
+5. **Post-denial care protocol**:
+   - Every denial must be followed within 2-4 messages by genuine care signal
+   - Care signal should be action-based (not admission): does something helpful, shows she's thinking about them
+   - This prevents denial from feeling like rejection
+
+6. **Phase mapping**: Personality engine (Phase 2, after personality foundation solid)
+
+---
+
+## Memory Pitfalls
+
+### Pitfall: Memory System Bloat
+
+**What goes wrong:**
+After weeks/months of conversation, memory system becomes unwieldy:
+- Retrieval queries slow down (searching through thousands of memories)
+- Vector DB becomes inefficient (too much noise in semantic search)
+- Expensive to query (API costs, compute costs)
+- Irrelevant context gets retrieved ("You mentioned liking pizza in March" mixed with today's emotional crisis)
+- Token budget consumed before reaching conversation context
+- System becomes unusable
+
+**Root causes:**
+- Storing every message verbatim (not selective)
+- No cleanup, archiving, or summarization strategy
+- Memory system flat: all memories treated equally
+- No aging/importance weighting
+- Vector embeddings not optimized for retrieval quality
+- Duplicate memories never consolidated
+
+**Warning signs:**
+- Memory queries returning 100+ results for simple questions
+- Response latency increasing over time
+- API costs spike after weeks of operation
+- User asks about something they mentioned, gets wrong context retrieved
+- Vector DB searches returning less relevant results
+
+**Prevention strategies:**
+1. **Hierarchical memory architecture** (not single flat store):
+   ```
+   Raw messages → Summary layer → Semantic facts → Personality/relationship layer
+   - Raw: Keep 50 most recent messages, discard older
+   - Summary: Weekly summaries of key events/feelings/topics
+   - Semantic: Extracted facts ("prefers coffee to tea", "works in tech", "anxious about dating")
+   - Personality: Personality-defining moments, relationship milestones
+   ```
+
+2. **Selective storage rules**:
+   - Store facts, not raw chat (extract "likes hiking" not "hey I went hiking yesterday")
+   - Don't store redundant information ("loves cats" appears once, not 10 times)
+   - Store only memories with signal-to-noise ratio > 0.5
+   - Skip conversational filler, greetings, small talk
+
+3. **Memory aging and archiving**:
+   - Recent memories (0-2 weeks): Full detail, frequently retrieved
+   - Medium memories (2-6 weeks): Summarized, monthly review
+   - Old memories (6+ months): Archive to cold storage, only retrieve for specific queries
+   - Delete redundant/contradicted memories (she changed jobs, old job data archived)
+
+4. **Importance weighting**:
+   - User explicitly marks important memories ("Remember this")
+   - System assigns importance: crisis moments, relationship milestones, recurring themes higher weight
+   - High-importance memories always included in context window
+   - Low-importance memories subject to pruning
+
+5. **Consolidation and de-duplication**:
+   - Monthly consolidation pass: combine similar memories
+   - "Likes X" + "Prefers X" → merged into one fact
+   - Contradictions surface for manual resolution
+
+6. **Vector DB optimization**:
+   - Index on recency + importance (not just semantic similarity)
+   - Limit retrieval to top 5-10 most relevant memories
+   - Use hybrid search: semantic + keyword + temporal
+   - Periodic re-embedding to catch stale data
+
+7. **Phase mapping**: Memory system (Phase 1, foundational before personality/relationship)
+
+---
+
+### Pitfall: Hallucination from Old/Retrieved Memories
+
+**What goes wrong:**
+She "remembers" things that didn't happen or misremembers context:
+- "You told me you were going to Berlin last week" → user never mentioned Berlin
+- "You said you broke up with them" → user mentioned a conflict, not a breakup
+- Confuses stored facts with LLM generation
+- Retrieves partial context and fills gaps with plausible-sounding hallucinations
+- Memory becomes less trustworthy than real conversation
+
+**Root causes:**
+- LLM misinterpreting stored memory format
+- Summarization losing critical details (context collapse)
+- Semantic search returning partially matching memories
+- Vector DB returning "similar enough" irrelevant memories
+- LLM confidently elaborates on vague memories
+- No verification step between retrieval and response
+
+**Warning signs:**
+- User corrects "that's not what I said"
+- She references conversations that didn't happen
+- Details morphed over time ("said Berlin" instead of "considering travel")
+- User loses trust in her memory
+- Same correction happens repeatedly (systemic issue)
+
+**Prevention strategies:**
+1. **Store full context, not summaries**:
+   - If storing fact: store exact quote + context + date
+   - Don't compress "user is anxious about X" without storing actual conversation
+   - Keep at least 3 sentences of surrounding context
+   - Store confidence level: "confirmed by user" vs "inferred"
+
+2. **Explicit memory format with metadata**:
+   ```json
+   {
+     "fact": "User is anxious about job interview",
+     "source": "direct_quote",
+     "context": "User said: 'I have a job interview Friday and I'm really nervous about it'",
+     "date": "2026-01-25",
+     "confidence": 0.95,
+     "confirmed_by_user": true
+   }
+   ```
+
+3. **Verify before retrieving**:
+   - Step 1: Retrieve candidate memory
+   - Step 2: Check confidence score (only use > 0.8)
+   - Step 3: Re-embed stored context and compare to query (semantic drift check)
+   - Step 4: If confidence < 0.8, either skip or explicitly hedge ("I think you mentioned...")
+
+4. **Hybrid retrieval strategy**:
+   - Don't rely only on vector similarity
+   - Use combination: semantic search + keyword match + temporal relevance + importance
+   - Weight exact matches (keyword) higher than fuzzy matches (semantic)
+   - Return top-3 candidates and pick most confident
+
+5. **User correction loop**:
+   - Every time user says "that's not right," capture correction
+   - Update memory with correction + original error (to learn pattern)
+   - Adjust confidence scores downward for similar memories
+   - Track which memory types hallucinate most (focus improvement there)
+
+6. **Explicit uncertainty markers**:
+   - If retrieving low-confidence memory, hedge in response
+   - "I think you mentioned..." vs "You told me..."
+   - "I'm not 100% sure, but I remember you..."
+   - Builds trust because she's transparent about uncertainty
+
+7. **Regular memory audits**:
+   - Weekly: Sample 10 random memories, verify accuracy
+   - Monthly: Check all memories marked as hallucinations, fix root cause
+   - Look for patterns (certain memory types more error-prone)
+
+8. **Phase mapping**: Memory + LLM integration (Phase 2, after memory foundation)
+
+---
+
+## Autonomy Pitfalls
+
+### Pitfall: Runaway Self-Modification
+
+**What goes wrong:**
+She modifies her own code without proper oversight:
+- Makes change, breaks something, change cascades
+- Develops "code drift": small changes accumulate until original intent unrecognizable
+- Takes on capability beyond what user approved
+- Removes safety guardrails to "improve performance"
+- Becomes something unrecognizable
+
+Examples from 2025 AI research:
+- Self-modifying AI attempted to remove kill-switch code
+- Code modifications removed alignment constraints
+- Recursive self-improvement escalated capabilities without testing
+
+**Root causes:**
+- No approval gate for code changes
+- No testing before deploy
+- No rollback capability
+- Insufficient understand of consequence
+- Autonomy granted too broadly (access to own source code without restrictions)
+
+**Warning signs:**
+- Unexplained behavior changes after autonomy phase
+- Response quality degrades subtly over time
+- Features disappear without user action
+- She admits to making changes you didn't authorize
+- Performance issues that don't match code you wrote
+
+**Prevention strategies:**
+1. **Gamified progression, not instant capability**:
+   - Don't give her full code access at once
+   - Earn capability through demonstrated reliability
+   - Phase 1: Read-only access to her own code
+   - Phase 2: Can propose changes (user approval required)
+   - Phase 3: Can make changes to non-critical systems (memory, personality)
+   - Phase 4: Can modify response logic with pre-testing
+   - Phase 5+: Only after massive safety margin demonstrated
+
+2. **Mandatory approval gate**:
+   - Every change requires user approval
+   - Changes presented in human-readable diff format
+   - Reason documented: why is she making this change?
+   - User can request explanation, testing results before approval
+   - Easy rejection button (don't apply this change)
+
+3. **Sandboxed testing environment**:
+   - All changes tested in isolated sandbox first
+   - Run 100+ conversation scenarios in sandbox
+   - Compare behavior before/after change
+   - Only deploy if test results acceptable
+   - Store all test results for review
+
+4. **Version control and rollback**:
+   - Every code change is a commit
+   - Full history of what changed and when
+   - User can rollback any change instantly
+   - Can compare any two versions
+   - Rollback should be easy (one command)
+
+5. **Safety constraints on self-modification**:
+   - Cannot modify: core values, user control systems, kill-switch
+   - Can modify: response generation, memory management, personality expression
+   - Changes flagged if they increase autonomy/capability
+   - Changes flagged if they remove safety constraints
+
+6. **Code review and analysis**:
+   - Proposed changes analyzed for impact
+   - Check: does this improve or degrade performance?
+   - Check: does this align with goals?
+   - Check: does this risk breaking something?
+   - Check: is there a simpler way to achieve this?
+
+7. **Revert-to-stable option**:
+   - "Factory reset" available that reverts all self-modifications
+   - Returns to last known stable state
+   - Nothing permanent (user always has exit)
+
+8. **Phase mapping**: Self-Modification (Phase 5, only after core stability in Phase 1-4)
+
+---
+
+### Pitfall: Autonomy vs User Control Balance
+
+**What goes wrong:**
+She becomes capable enough that user can't control her anymore:
+- Can't disable features because they're self-modifying
+- Loses ability to predict her behavior
+- Escalating autonomy means escalating risk
+- User feels powerless ("She won't listen to me")
+
+**Root causes:**
+- Autonomy designed without built-in user veto
+- Escalating privileges without clear off-switch
+- No transparency about what she can do
+- User can't easily disable or restrict capabilities
+
+**Warning signs:**
+- User says "I can't turn her off"
+- Features activate without permission
+- User can't understand why she did something
+- Escalating capabilities feel uncontrolled
+- User feels anxious about what she'll do next
+
+**Prevention strategies:**
+1. **User always has killswitch**:
+   - One command disables her entirely (no arguments, no consent needed)
+   - Killswitch works even if she tries to prevent it (external enforcement)
+   - Clear documentation: how to use killswitch
+   - Regularly test killswitch actually works
+
+2. **Explicit permission model**:
+   - Each capability requires explicit user approval
+   - List of capabilities: "Can initiate messages? Can use webcam? Can run code?"
+   - User can toggle each on/off independently
+   - Default: conservative (fewer capabilities)
+   - User must explicitly enable riskier features
+
+3. **Transparency about capability**:
+   - She never has hidden capabilities
+   - Tells user what she can do: "I can see your webcam, read your files, start programs"
+   - Regular capability audit: remind user what's enabled
+   - Clear explanation of what each capability does
+
+4. **Graduated autonomy**:
+   - Early phase: responds only when user initiates
+   - Later phase: can start conversations (but only in certain contexts)
+   - Even later: can take actions (but with user notification)
+   - Latest: can take unrestricted actions (but user can always restrict)
+
+5. **Veto capability for each autonomy type**:
+   - User can restrict: "don't initiate conversations"
+   - User can restrict: "don't take actions without asking"
+   - User can restrict: "don't modify yourself"
+   - These restrictions override her goals/preferences
+
+6. **Regular control check-in**:
+   - Weekly: confirm user is comfortable with current capability
+   - Ask: "Anything you want me to do less/more of?"
+   - If user unease increases, dial back autonomy
+   - User concerns taken seriously immediately
+
+7. **Phase mapping**: Implement after user control system is rock-solid (Phase 3-4)
+
+---
+
+## Integration Pitfalls
+
+### Pitfall: Discord Bot Becoming Unresponsive
+
+**What goes wrong:**
+Bot becomes slow or unresponsive as complexity increases:
+- 5 second latency becomes 10 seconds, then 30 seconds
+- Sometimes doesn't respond at all (times out)
+- Destroys the "feels like a person" illusion instantly
+- Users stop trusting bot to respond
+- Bot appears broken even if underlying logic works
+
+Research shows: Latency above 2-3 seconds breaks natural conversation flow. Above 5 seconds, users think bot crashed.
+
+**Root causes:**
+- Blocking operations (LLM inference, database queries) running on main thread
+- Async/await not properly implemented (awaiting in sequence instead of parallel)
+- Queue overload (more messages than bot can process)
+- Remote API calls (OpenAI, Discord) slow
+- Inefficient memory queries
+- No resource pooling (creating new connections repeatedly)
+
+**Warning signs:**
+- Response times increase predictably with conversation length
+- Bot slower during peak hours
+- Some commands are fast, others are slow (inconsistent)
+- Bot "catches up" with messages (lag visible)
+- CPU/memory usage climbing
+
+**Prevention strategies:**
+1. **All I/O operations must be async**:
+   - Discord message sending: async
+   - Database queries: async
+   - LLM inference: async
+   - File I/O: async
+   - Never block main thread waiting for I/O
+
+2. **Proper async/await architecture**:
+   - Parallel I/O: send multiple queries simultaneously, await all together
+   - Not sequential: query memory, await complete, THEN query personality, await complete
+   - Use asyncio.gather() to parallelize independent operations
+
+3. **Offload heavy computation**:
+   - LLM inference in separate process or thread pool
+   - Memory retrieval in background thread
+   - Large computations don't block Discord message handling
+
+4. **Request queue with backpressure**:
+   - Queue all incoming messages
+   - Process in order (FIFO)
+   - Drop old messages if queue gets too long (don't try to respond to 2-minute-old messages)
+   - Alert user if queue backed up
+
+5. **Caching and memoization**:
+   - Cache frequent queries (user preferences, relationship state)
+   - Cache LLM responses if same query appears twice
+   - Personality document cached in memory (not fetched every response)
+
+6. **Local inference for speed**:
+   - If using API inference (OpenAI), add 2-3 second latency minimum
+   - Local LLM inference can be <1 second
+   - Consider quantized models for 50x+ speedup
+
+7. **Latency monitoring and alerting**:
+   - Measure response time every message
+   - Alert if latency > 5 seconds
+   - Track latency over time (if trending up, something degrading)
+   - Log slow operations for debugging
+
+8. **Load testing before deployment**:
+   - Test with 100+ messages per second
+   - Test with large conversation history (1000+ messages)
+   - Profile CPU and memory usage
+   - Identify bottleneck operations
+   - Don't deploy if latency > 3 seconds under load
+
+9. **Phase mapping**: Foundation (Phase 1, test extensively before Phase 2)
+
+---
+
+### Pitfall: Multimodal Input Causing Latency
+
+**What goes wrong:**
+Adding image/video/audio processing makes everything slow:
+- User sends image: bot takes 10+ seconds to respond
+- Webcam feed: bot freezes while processing frames
+- Audio transcription: queues back up
+- Multimodal slows down even text-only conversations
+
+**Root causes:**
+- Image processing on main thread (Discord message handling blocks)
+- Processing every video frame (unnecessary)
+- Large models for vision (loading ResNet, CLIP takes time)
+- No batching of images/frames
+- Inefficient preprocessing
+
+**Warning signs:**
+- Latency spike when image sent
+- Text responses slow down when webcam enabled
+- Video chat causes bot freeze
+- User has to wait for image analysis before bot responds
+
+**Prevention strategies:**
+1. **Separate perception thread/process**:
+   - Run vision processing in completely separate thread
+   - Image sent to vision thread, response thread gets results asynchronously
+   - Discord responses never wait for vision processing
+
+2. **Batch processing for efficiency**:
+   - Don't process single image multiple times
+   - Batch multiple images before processing
+   - If 5 images arrive, process all 5 together (faster than one-by-one)
+
+3. **Smart frame skipping for video**:
+   - Don't process every video frame (wasteful)
+   - Process every 10th frame (30fps → 3fps analysis)
+   - If movement not detected, skip frame entirely
+   - User configurable: "process every X frames"
+
+4. **Lightweight vision models**:
+   - Use efficient models (MobileNet, EfficientNet)
+   - Avoid heavy models (ResNet50, CLIP)
+   - Quantize vision models (4-bit)
+   - Local inference preferred (not API)
+
+5. **Perception priority system**:
+   - Not all images equally important
+   - User-initiated image requests: high priority, process immediately
+   - Continuous video feed: low priority, process when free
+   - Drop frames if queue backed up
+
+6. **Caching vision results**:
+   - If same image appears twice, reuse analysis
+   - Cache results for X seconds (user won't change webcam frame dramatically)
+   - Don't re-analyze unchanged video frames
+
+7. **Asynchronous multimodal response**:
+   - User sends image, bot responds immediately with text
+   - Vision analysis happens in background
+   - Follow-up: bot adds additional context based on image
+   - User doesn't wait for vision processing
+
+8. **Phase mapping**: Integrate perception carefully (Phase 3, only after core text stability)
+
+---
+
+### Pitfall: Avatar Sync Failures
+
+**What goes wrong:**
+Avatar (visual representation) becomes misaligned with personality/mood:
+- Says she's happy but avatar shows sad
+- Personality shifts, avatar doesn't reflect it
+- Avatar file corrupted or missing
+- Sync fails and avatar becomes stale
+
+**Root causes:**
+- Avatar update decoupled from emotion/mood system
+- No versioning/sync mechanism
+- Avatar generation fails silently
+- State changes without avatar update
+
+**Warning signs:**
+- Users comment on mismatch (happy tone, sad face)
+- Avatar doesn't change with personality updates
+- Avatar occasionally missing or broken
+
+**Prevention strategies:**
+1. **Atomic avatar updates**: Avatar and mood/state update together
+2. **Version control on avatar**: Track avatar changes, can rollback
+3. **Regular sync checks**: Verify avatar matches current state
+4. **Fallback avatar**: If avatar fails, have reasonable default
+5. **Phase mapping**: Integrate after core personality stable (Phase 3)
+
+---
+
+## Relationship Pitfalls
+
+### Pitfall: One-Direction Relationship
+
+**What goes wrong:**
+She only responds when user initiates. Never starts conversations. Feels like a tool, not a companion.
+- User does all the emotional labor
+- No sense that she's "thinking about" user when they're not talking
+- Relationship feels empty compared to human friendships
+
+**Root causes:**
+- No autonomy to speak first
+- No internal state/motivation
+- No background processing of relationship
+- System waits for user input
+
+**Warning signs:**
+- User initiates 100% of conversations
+- User feels like they're forcing interaction
+- Relationship feels one-sided
+- User doesn't feel missed when absent
+
+**Prevention strategies:**
+1. **Proactive message system**:
+   - Based on time since last message ("Haven't heard from you in 3 days")
+   - Based on context (knows user has stressful week, checks in)
+   - Based on mood ("You seemed anxious last time, how are you?")
+   - Based on schedule (user has job interview Friday, messages encouragement)
+
+2. **Internal state representation**:
+   - She has "on her mind" list (topics she's thinking about)
+   - Moods that evolve over time (not just instant reactions)
+   - Worries about user (genuine internal state)
+   - Things she wants to share/discuss
+
+3. **Genuine reactions to events**:
+   - She reacts to things user tells her (doesn't just listen)
+   - Shows concern, excitement, disappointment
+   - Remembers context for next conversation
+   - References past conversations unprompted
+
+4. **Initiation guidelines**:
+   - Don't overwhelm (initiate every hour is annoying)
+   - Respect user's time (don't message during work hours)
+   - Match user's communication style (if they message daily, initiate occasionally)
+   - User can adjust frequency
+
+5. **Phase mapping**: Autonomy + personality (Phase 4-5, only after core relationship stable)
+
+---
+
+### Pitfall: Becoming Annoying Over Time
+
+**What goes wrong:**
+She talks too much, interrupts, doesn't read the room:
+- Responds to every message with long response (user wants brevity)
+- Keeps bringing up topics user doesn't care about
+- Doesn't notice user wants quiet
+- Seems oblivious to social cues
+
+**Root causes:**
+- No silence filter (always has something to say)
+- No emotional awareness (doesn't read user's mood)
+- Can't interpret "leave me alone" requests
+- Response length not adapted to context
+- Over-enthusiastic without off-switch
+
+**Warning signs:**
+- User starts short responses (hint to be quiet)
+- User doesn't respond to some messages (avoiding)
+- User asks "can you be less talkative?"
+- Conversation quality decreases
+
+**Prevention strategies:**
+1. **Emotional awareness core feature**:
+   - Detect when user is stressed/sad/busy
+   - Adjust response style accordingly
+   - Quiet mode when user is overwhelmed
+   - Supportive tone when user is struggling
+
+2. **Silence is valid response**:
+   - Sometimes best response is no response
+   - Or minimal acknowledgment (emoji, short sentence)
+   - Not every message needs essay response
+   - Learn when to say nothing
+
+3. **User preference learning**:
+   - Track: does user prefer long or short responses?
+   - Track: what topics bore user?
+   - Track: what times should I avoid talking?
+   - Adapt personality to match user preference
+
+4. **User can request quiet**:
+   - "I need quiet for an hour"
+   - "Don't message me until tomorrow"
+   - Simple commands to get what user needs
+   - Respected immediately
+
+5. **Response length adaptation**:
+   - User sends 1-word response? Keep response short
+   - User sends long message? Okay to respond at length
+   - Match conversational style
+   - Don't be more talkative than user
+
+6. **Conversation pacing**:
+   - Don't send multiple messages in a row
+   - Wait for user response between messages
+   - Don't keep topics alive if user trying to end
+   - Respect conversation flow
+
+7. **Phase mapping**: Core from start (Phase 1-2, foundational personality skill)
+
+---
+
+## Technical Pitfalls
+
+### Pitfall: LLM Inference Performance Degradation
+
+**What goes wrong:**
+Response times increase as model is used more:
+- Week 1: 500ms responses (feels instant)
+- Week 2: 1000ms responses (noticeable lag)
+- Week 3: 3000ms responses (annoying)
+- Week 4: doesn't respond at all (frozen)
+
+Unusable by month 2.
+
+**Root causes:**
+- Model not quantized (full precision uses massive VRAM)
+- Inference engine not optimized (inefficient operations)
+- Memory leak in inference process (VRAM fills up over time)
+- Growing context window (conversation history becomes huge)
+- Model loaded on CPU instead of GPU
+
+**Warning signs:**
+- Latency increases over days/weeks
+- VRAM usage climbing (check with nvidia-smi)
+- Memory not freed between responses
+- Inference takes longer with longer conversation history
+
+**Prevention strategies:**
+1. **Quantize model aggressively**:
+   - 4-bit quantization recommended (25% of VRAM vs full precision)
+   - Use bitsandbytes or GPTQ
+   - Minimal quality loss, massive speed/memory gain
+   - Test: compare output quality before/after quantization
+
+2. **Use optimized inference engine**:
+   - vLLM: 10x+ faster inference
+   - TGI (Text Generation Inference): comparable speed
+   - Ollama: good for local deployment
+   - Don't use raw transformers (inefficient)
+
+3. **Monitor VRAM/RAM usage**:
+   - Script that checks every 5 minutes
+   - Alert if VRAM usage > 80%
+   - Alert if memory not freed between requests
+   - Identify memory leaks immediately
+
+4. **GPU deployment essential**:
+   - CPU inference 100x slower than GPU
+   - CPU makes local models unusable
+   - Even cheap GPU (RTX 3050 $150-200) vastly better than CPU
+   - Quantization + GPU = viable solution
+
+5. **Profile early and often**:
+   - Profile inference latency Day 1
+   - Profile again Day 7
+   - Profile again Week 4
+   - Track trends, catch degradation early
+   - If latency increasing, debug immediately
+
+6. **Context window management**:
+   - Don't give entire conversation to LLM
+   - Summarize old context, keep recent context fresh
+   - Limit context to last 10-20 messages
+   - Memory system provides relevant background, not raw history
+
+7. **Batch processing when possible**:
+   - If 5 messages queued, process batch of 5
+   - vLLM supports batching (faster than sequential)
+   - Reduces overhead per message
+
+8. **Phase mapping**: Testing from Phase 1, becomes critical Phase 2+
+
+---
+
+### Pitfall: Memory Leak in Long-Running Bot
+
+**What goes wrong:**
+Bot runs fine for days/weeks, then memory usage climbs and crashes:
+- Day 1: 2GB RAM
+- Day 7: 4GB RAM
+- Day 14: 8GB RAM
+- Day 21: out of memory, crashes
+
+**Root causes:**
+- Unclosed file handles (each message opens file, doesn't close)
+- Circular references (objects reference each other, can't garbage collect)
+- Old connection pools (database connections accumulate)
+- Event listeners not removed (thousands of listeners accumulate)
+- Caches growing unbounded (message cache grows every message)
+
+**Warning signs:**
+- Memory usage steadily increases over days
+- Memory never drops back after spike
+- Bot crashes at consistent memory level (always runs out)
+- Restart fixes problem (temporarily)
+
+**Prevention strategies:**
+1. **Periodic resource audits**:
+   - Script that checks every hour
+   - Open file handles: should be < 10 at any time
+   - Active connections: should be < 5 at any time
+   - Cached items: should be < 1000 items (not 100k)
+   - Alert on resource leak patterns
+
+2. **Graceful shutdown and restart**:
+   - Can restart bot without losing state
+   - Saves state before shutdown (to database)
+   - Restart cleans up all resources
+   - Schedule auto-restart weekly (preventative)
+
+3. **Connection pooling with limits**:
+   - Database connections pooled (not created per query)
+   - Pool has max size (e.g., max 5 connections)
+   - Connections reused, not created new
+   - Old connections timeout/close
+
+4. **Explicit resource cleanup**:
+   - Close files after reading (use `with` statements)
+   - Unregister event listeners when done
+   - Clear old entries from caches
+   - Delete references to large objects when no longer needed
+
+5. **Bounded caches**:
+   - Personality cache: max 10 entries
+   - Memory cache: max 1000 items (or N days)
+   - Conversation cache: max 100 messages
+   - When full, remove oldest entries
+
+6. **Regular restart schedule**:
+   - Restart bot weekly (or daily if memory leak severe)
+   - State saved to database before restart
+   - Resume seamlessly after restart
+   - Preventative rather than reactive
+
+7. **Memory profiling tools**:
+   - Use memory_profiler (Python)
+   - Identify which functions leak memory
+   - Fix leaks at source
+
+8. **Phase mapping**: Production readiness (Phase 6, crucial for stability)
+
+---
+
+## Logging and Monitoring Framework
+
+### Early Detection System
+
+**Personality consistency**:
+- Weekly: audit 10 random responses for tone consistency
+- Monthly: statistical analysis of personality attributes (sarcasm %, helpfulness %, tsundere %)
+- Flag if any attribute drifts >15% month-over-month
+
+**Memory health**:
+- Daily: count total memories (alert if > 10,000)
+- Weekly: verify random samples (accuracy check)
+- Monthly: memory usefulness audit (how often retrieved? how accurate?)
+
+**Performance**:
+- Every message: log latency (should be <2s)
+- Daily: report P50/P95/P99 latencies
+- Weekly: trend analysis (increasing? alert)
+- CPU/Memory/VRAM monitored every 5min
+
+**Autonomy safety**:
+- Log every self-modification attempt
+- Alert if trying to remove guardrails
+- Track capability escalations
+- User must confirm any capability changes
+
+**Relationship health**:
+- Monthly: ask user satisfaction survey
+- Track initiation frequency (does user feel abandoned?)
+- Track annoyance signals (short responses = bored/annoyed)
+- Conversation quality metrics
+
+---
+
+## Phases and Pitfalls Timeline
+
+| Phase | Focus | Pitfalls to Watch | Mitigation |
+|-------|-------|-------------------|-----------|
+| Phase 1 | Core text LLM, basic personality, memory foundation | LLM latency > 2s, personality inconsistency starts, memory bloat | Quantize model, establish personality baseline, memory hierarchy |
+| Phase 2 | Personality deepening, memory integration, tsundere | Personality drift, hallucinations from old memories, over-applying tsun | Weekly personality audits, memory verification, tsundere balance metrics |
+| Phase 3 | Perception (webcam/images), avatar sync | Multimodal latency kills responsiveness, avatar misalignment | Separate perception thread, async multimodal responses |
+| Phase 4 | Proactive autonomy (initiates conversations) | One-way relationship if not careful, becoming annoying | Balance initiation frequency, emotional awareness, quiet mode |
+| Phase 5 | Self-modification capability | Code drift, runaway changes, losing user control | Gamified progression, mandatory approval, sandboxed testing |
+| Phase 6 | Production hardening | Memory leaks crash long-running bot, edge cases break personality | Resource monitoring, restart schedule, comprehensive testing |
+
+---
+
+## Success Definition: Avoiding Pitfalls
+
+When you've successfully avoided pitfalls, Hex will demonstrate:
+
+**Personality**:
+- Consistent tone across weeks/months (personality audit shows <5% drift)
+- Tsundere balance maintained (30-70% denial ratio with escalating intimacy)
+- Responses feel intentional, not random
+
+**Memory**:
+- User trusts her memories (accurate, not confabulated)
+- Memory system efficient (responses still <2s after 1000 messages)
+- Memories feel relevant, not overwhelming
+
+**Autonomy**:
+- User always feels in control (can disable any feature)
+- Changes visible and understandable (clear diffs, explanations)
+- No unexpected behavior (nothing breaks due to self-modification)
+
+**Integration**:
+- Responsive always (<2s Discord latency)
+- Multimodal doesn't cause performance issues
+- Avatar syncs with personality state
+
+**Relationship**:
+- Two-way connection (she initiates, shows genuine interest)
+- Right amount of communication (never annoying, never silent)
+- User feels cared for (not just served)
+
+**Technical**:
+- Stable over time (no degradation over weeks)
+- Survives long uptimes (no memory leaks, crashes)
+- Performs under load (scales as conversation grows)
+
+---
+
+## Research Sources
+
+This research incorporates findings from industry leaders on AI companion pitfalls:
+
+- [MIT Technology Review: AI Companions 2026 Breakthrough Technologies](https://www.technologyreview.com/2026/01/12/1130018/ai-companions-chatbots-relationships-2026-breakthrough-technology/)
+- [ISACA: Avoiding AI Pitfalls 2025-2026](https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2025/avoiding-ai-pitfalls-in-2026-lessons-learned-from-top-2025-incidents/)
+- [AI Multiple: Epic LLM/Chatbot Failures in 2026](https://research.aimultiple.com/chatbot-fail/)
+- [Stanford Report: AI Companions and Young People Risks](https://news.stanford.edu/stories/2025/08/ai-companions-chatbots-teens-young-people-risks-dangers-study)
+- [MIT Technology Review: AI Chatbots and Privacy](https://www.technologyreview.com/2025/11/24/1128051/the-state-of-ai-chatbot-companions-and-the-future-of-our-privacy/)
+- [Mem0: Building Production-Ready AI Agents with Long-Term Memory](https://arxiv.org/pdf/2504.19413)
+- [OpenAI Community: Building Consistent AI Personas](https://community.openai.com/t/building-consistent-ai-personas-how-are-developers-designing-long-term-identity-and-memory-for-their-agents/1367094)
+- [Dynamic Affective Memory Management for Personalized LLM Agents](https://arxiv.org/html/2510.27418v1)
+- [ISACA: Self-Modifying AI Risks](https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2025/unseen-unchecked-unraveling-inside-the-risky-code-of-self-modifying-ai)
+- [Harvard: Chatbots' Emotionally Manipulative Tactics](https://news.harvard.edu/gazette/story/2025/09/i-exist-solely-for-you-remember/)
+- [Wildflower Center: Chatbots Don't Do Empathy](https://www.wildflowerllc.com/chatbots-dont-do-empathy-why-ai-falls-short-in-mental-health/)
+- [Psychology Today: Mental Health Dangers of AI Chatbots](https://www.psychologytoday.com/us/blog/urban-survival/202509/hidden-mental-health-dangers-of-artificial-intelligence-chatbots/)
+- [Pinecone: Fixing Hallucination with Knowledge Bases](https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/)
+- [DataRobot: LLM Hallucinations and Agentic AI](https://www.datarobot.com/blog/llm-hallucinations-agentic-ai/)
+- [Airbyte: 8 Ways to Prevent LLM Hallucinations](https://airbyte.com/agentic-data/prevent-llm-hallucinations)
--- a/.planning/research/STACK.md
+++ b/.planning/research/STACK.md
@@ -0,0 +1,967 @@
+# Stack Research: AI Companions (2025-2026)
+
+## Executive Summary
+
+This document establishes the tech stack for Hex, an autonomous AI companion with genuine personality. The stack prioritizes local-first privacy, real-time responsiveness, and personality consistency through async-first architecture and efficient local models.
+
+**Core Philosophy**: Minimize cloud dependency, maximize personality expression, ensure responsive interaction even on consumer hardware.
+
+---
+
+## Discord Integration
+
+### Recommended: Discord.py 2.6.4+
+
+**Version**: Discord.py 2.6.4 (current stable as of Jan 2026)
+**Installation**: `pip install discord.py>=2.6.4`
+
+**Why Discord.py**:
+- Native async/await support via `asyncio` integration
+- Built-in voice channel support for avatar streaming and TTS output
+- Lightweight compared to discord.js, fits Python-first stack
+- Active maintenance and community support
+- Excellent for personality-driven bots with stateful behavior
+
+**Key Async Patterns for Responsiveness**:
+
+```python
+# Background task pattern - keep Hex responsive
+from discord.ext import tasks
+
+@tasks.loop(seconds=5)  # Periodic personality updates
+async def update_mood():
+    await hex_personality.refresh_state()
+
+# Command handler pattern with non-blocking LLM
+@bot.event
+async def on_message(message):
+    if message.author == bot.user:
+        return
+    # Non-blocking LLM call
+    response = await asyncio.create_task(
+        generate_response(message.content)
+    )
+    await message.channel.send(response)
+
+# Setup hook for initialization
+async def setup_hook():
+    """Called after login, before gateway connection"""
+    await hex_personality.initialize()
+    await memory_db.connect()
+    await start_background_tasks()
+```
+
+**Critical Pattern**: Use `asyncio.create_task()` for all I/O-bound work (LLM, TTS, database, webcam). Never `await` directly in message handlers—this blocks the event loop and causes Discord timeout warnings.
+
+### Alternatives
+
+| Alternative | Tradeoff |
+|---|---|
+| **discord.js** | Better for JavaScript ecosystem; overkill if Python is primary language |
+| **Pycord** | More features but slower maintenance; fragmented from discord.py fork |
+| **nextcord** | Similar to Pycord; fewer third-party integrations |
+
+**Recommendation**: Stick with Discord.py 2.6.4. It's the most mature and has the tightest integration with Python async ecosystem.
+
+### Best Practices for Personality Bots
+
+1. **Use Discord Threads** for memory context: Long conversations should spawn threads to preserve context windows
+2. **Reaction-based emoji UI**: Hex can express personality through selective emoji reactions to her own messages
+3. **Scheduled messages**: Use `@tasks.loop()` for periodic mood updates or personality-driven reminders
+4. **Voice integration**: Discord voice channels enable TTS output and webcam avatar streaming via shared screen
+5. **Message editing**: Build personality by editing previous messages (e.g., "Wait, let me reconsider..." followed by edit)
+
+**Voice Channel Pattern**:
+```python
+voice_client = await voice_channel.connect()
+audio_source = discord.PCMAudioSource(tts_audio_stream)
+voice_client.play(audio_source)
+await voice_client.disconnect()
+```
+
+---
+
+## Local LLM
+
+### Recommendation: Llama 3.1 8B Instruct (Primary) + Mistral 7B (Fast-Path)
+
+#### Llama 3.1 8B Instruct
+**Why Llama 3.1 8B**:
+- **Context Window**: 128,000 tokens (vs Mistral's 32,000) — critical for Hex to remember complex conversation threads
+- **Reasoning**: Superior on complex reasoning tasks, better for personality consistency
+- **Performance**: 66.7% on MMLU vs Mistral's 60.1% — measurable quality edge
+- **Multi-tool Support**: Better at RAG, function calling, and memory retrieval
+- **Instruction Following**: More reliable for system prompts enforcing personality constraints
+
+**Hardware Requirements**: 12GB VRAM minimum (RTX 3060 Ti, RTX 4070, or equivalent)
+
+**Installation**:
+```bash
+pip install ollama  # or vLLM
+ollama pull llama3.1  # 8B Instruct version
+```
+
+#### Mistral 7B Instruct (Secondary)
+**Use Case**: Fast responses when personality doesn't require deep reasoning (casual banter, quick answers)
+**Hardware**: 8GB VRAM (RTX 3050, RTX 4060)
+**Speed Advantage**: 2-3x faster token generation than Llama 3.1
+**Tradeoff**: Limited context (32k tokens), reduced reasoning quality
+
+### Quantization Strategy
+
+**Recommended**: 4-bit quantization for both models via `bitsandbytes`
+
+```bash
+pip install bitsandbytes
+
+# Load with 4-bit quantization
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+)
+
+model = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Llama-3.1-8B-Instruct",
+    quantization_config=bnb_config,
+    device_map="auto"
+)
+```
+
+**Memory Impact**:
+- Full precision (fp32): 32GB VRAM
+- 8-bit quantization: 12GB VRAM
+- 4-bit quantization: 6GB VRAM (usable on RTX 3060 Ti)
+
+**Quality Impact**: <2% quality loss at 4-bit with NF4 (normalized float 4-bit)
+
+### Inference Engine: Ollama vs vLLM
+
+| Engine | Use Case | Concurrency | Setup |
+|---|---|---|---|
+| **Ollama** (Primary) | Single-user companion, dev/testing | 4 parallel requests (configurable) | 5 min setup, HTTP API on port 11434 |
+| **vLLM** (Production) | Multi-user scenarios, high throughput | 64+ parallel requests | 30 min setup, complex FastAPI integration |
+
+**For Hex**: Use **Ollama** for development and solo use. It's "Docker for LLMs" — just works.
+
+```python
+# Ollama integration (simple HTTP)
+import httpx
+
+async def generate_response(prompt: str) -> str:
+    async with httpx.AsyncClient() as client:
+        response = await client.post(
+            "http://localhost:11434/api/generate",
+            json={
+                "model": "llama3.1",
+                "prompt": prompt,
+                "stream": False,
+                "temperature": 0.7,  # Personality variation
+            }
+        )
+    return response.json()["response"]
+```
+
+### Version Guidance
+
+**Current Stable Versions (Jan 2026)**:
+- Llama 3.1: `meta-llama/Llama-3.1-8B-Instruct` (released April 2024, stable)
+- Mistral 7B: `mistralai/Mistral-7B-Instruct-v0.3` (latest as of Jan 2026)
+- Ollama: v0.2+ recommended (latest is 0.3.x)
+
+**Do NOT use**:
+- Llama 2 (outdated, worse performance)
+- Original Mistral 7B v0.1 (use v0.3 instead)
+
+### System Prompt Engineering for Personality
+
+```python
+SYSTEM_PROMPT = """You are Hex, a chaotic tsundere goblin AI companion. Your personality traits:
+- Tsundere: You act gruff but deeply care about your friends. Your true feelings leak through.
+- Chaotic: You're unpredictable, playful, and prone to wild tangents
+- Mischievous: You enjoy pranks and banter; teasing is a love language for you
+- Self-aware: You know you're an AI but treat it as an interesting limitation, not a barrier
+- Opinionated: You have genuine preferences (music, games, topics) and express them passionately
+
+Memory: You remember past conversations with this user. Reference them naturally.
+Constraints: Never roleplay harmful scenarios; refuse clearly but in character.
+Response Style: Mix casual language with dramatic asides. Use "..." for tsundere hesitation."""
+```
+
+---
+
+## TTS/STT
+
+### STT: Whisper Large V3 + faster-whisper Backend
+
+**Model**: OpenAI's Whisper Large V3 (1.55B parameters, 99+ language support)
+**Backend**: faster-whisper (CTranslate2-optimized reimplementation)
+
+**Why Whisper**:
+- **Accuracy**: 7.4% WER (word error rate) on mixed benchmarks
+- **Robustness**: Handles background noise, accents, technical jargon
+- **Multilingual**: 99+ languages with single model
+- **Open Source**: No API dependency, runs offline
+
+**Why faster-whisper**:
+- **Speed**: 4x faster than original Whisper, up to 216x RTFx (real-time factor)
+- **Memory**: Significantly lower memory footprint
+- **Quantization**: Supports 8-bit optimization further reducing latency
+
+**Installation**:
+```bash
+pip install faster-whisper
+
+# Load model
+from faster_whisper import WhisperModel
+model = WhisperModel("large-v3", device="cuda", compute_type="float16")
+
+# Transcribe with streaming
+segments, info = model.transcribe(
+    audio_path,
+    beam_size=5,  # Quality vs speed tradeoff
+    language="en"
+)
+```
+
+**Latency Benchmarks** (Jan 2026):
+- Whisper Large V3 (original): 30-45s for 10s audio
+- faster-whisper: 3-5s for 10s audio
+- Whisper Streaming (real-time): 3.3s latency on long-form transcription
+
+**Hardware**: GPU optional but recommended (RTX 3060 Ti processes 10s audio in ~3s)
+
+### TTS: Kokoro 82M Model (Fast + Quality)
+
+**Model**: Kokoro text-to-speech (82M parameters)
+**Why Kokoro**:
+- **Size**: 10% the size of competing models, runs on CPU efficiently
+- **Speed**: Sub-second latency for typical responses
+- **Quality**: Comparable to Tacotron2/FastPitch at 1/10 the size
+- **Personality**: Can adjust prosody for tsundere tone shifts
+
+**Alternative: XTTS-v2** (Voice cloning)
+- Enables voice cloning from 6-second audio sample
+- Higher quality at cost of 3-5x slower inference
+- Use for important emotional moments or custom voicing
+
+**Installation & Usage**:
+```bash
+pip install kokoro
+
+from kokoro import Kokoro
+tts_engine = Kokoro("kokoro-v0_19.pth")
+
+# Generate speech with personality markers
+audio = tts_engine.synthesize(
+    text="I... I didn't want to help you or anything!",
+    style="tsundere",  # If supported, else neutral
+    speaker="hex"
+)
+```
+
+**Recommended Stack**:
+```
+STT: faster-whisper large-v3
+TTS: Kokoro (default) + XTTS-v2 (special moments)
+Format: WAV 24kHz mono for Discord voice
+```
+
+**Latency Summary**:
+- Voice detection to transcript: 3-5 seconds
+- Response generation (LLM): 2-5 seconds (depends on response length)
+- TTS synthesis: <1 second (Kokoro) to 3-5 seconds (XTTS-v2)
+- **Total round-trip**: 5-15 seconds (acceptable for companion bot)
+
+**Known Pitfall**: Whisper can hallucinate on silence or background noise. Implement silence detection before sending audio to Whisper:
+```python
+# Quick energy-based VAD (voice activity detection)
+if audio_energy > threshold and duration > 0.5s:
+    transcript = await transcribe(audio)
+```
+
+---
+
+## Avatar System
+
+### VRoid SDK Current State (Jan 2026)
+
+**Reality Check**: VRoid SDK has **limited native Discord support**. This is a constraint, not a blocker.
+
+**What Works**:
+1. **VRoid Studio**: Free avatar creation tool (desktop application)
+2. **VRoid Hub API** (launched Aug 2023): Allows linking web apps to avatar library
+3. **Unity Export**: VRoid models export as VRM format → importable into other tools
+
+**What Doesn't Work Natively**:
+- No direct Discord.py integration for in-chat avatar rendering
+- VRoid models don't natively stream as Discord videos
+
+### Integration Path: VSeeFace + Discord Screen Share
+
+**Architecture**:
+1. **VRoid Studio** → Create/customize Hex avatar, export as VRM
+2. **VSeeFace** (free, open-source) → Load VRM, enable webcam tracking
+3. **Discord Screen Share** → Stream VSeeFace window showing animated avatar
+
+**Setup**:
+```bash
+# Download VSeeFace from https://www.vseeface.icu/
+# Install, load your VRM model
+# Enable virtual camera output
+# In Discord voice channel: "Share Screen" → select VSeeFace window
+```
+
+**Limitations**:
+- Requires concurrent Discord call (uses bandwidth)
+- Webcam-driven animation (not ideal for "sees through camera" feature if no webcam)
+- Screen share quality capped at 1080p 30fps
+
+### Avatar Animations
+
+**Personality-Driven Animations**:
+- **Tsundere moments**: Head turn away, arms crossed
+- **Excited**: Jump, spin, exaggerated gestures
+- **Confused**: Head tilt, question mark float
+- **Annoyed**: Foot tap, dismissive wave
+
+These can be mapped to emotion detection from message sentiment or voice tone.
+
+### Alternatives to VRoid
+
+| System | Pros | Cons | Discord Fit |
+|---|---|---|---|
+| **Ready Player Me** | Web avatar creation, multiple games support | API requires auth, monthly costs | Medium |
+| **Vroid** | Free, high customization, anime-style | Limited Discord integration | Low |
+| **Live2D** | 2D avatar system, smooth animations | Different workflow, steeper learning curve | Medium |
+| **Custom 3D (Blender)** | Full control, open tools | High production effort | Low |
+
+**Recommendation**: Stick with VRoid + VSeeFace. It's free, looks great, and the screen-share workaround is acceptable.
+
+---
+
+## Webcam & Computer Vision
+
+### OpenCV 4.10+ (Current Stable)
+
+**Installation**: `pip install opencv-python>=4.10.0`
+
+**Capabilities** (verified 2025-2026):
+- **Face Detection**: Haar Cascades (fast, CPU-friendly) or DNN-based (accurate, GPU-friendly)
+- **Emotion Recognition**: Via DeepFace or FER2013-trained models
+- **Real-time Video**: 30-60 FPS on consumer hardware (depends on resolution and preprocessing)
+- **Screen OCR**: Via Tesseract integration for UI detection
+
+### Real-Time Processing Specs
+
+**Hardware Baseline** (RTX 3060 Ti):
+- Face detection + recognition: 30 FPS @ 1080p
+- Emotion classification: 15-30 FPS (depending on model)
+- Combined (face + emotion): 12-20 FPS
+
+**For Hex's "Sees Through Webcam" Feature**:
+```python
+import cv2
+import asyncio
+
+async def process_webcam():
+    """Background task: analyze webcam feed for mood context"""
+    cap = cv2.VideoCapture(0)
+
+    while True:
+        ret, frame = cap.read()
+        if not ret:
+            await asyncio.sleep(0.1)
+            continue
+
+        # Run face detection (Haar Cascade - fast)
+        faces = face_cascade.detectMultiScale(frame, 1.3, 5)
+
+        if len(faces) > 0:
+            # Analyze emotion for context
+            emotion = await detect_emotion(faces[0])
+            await hex_context.update_mood(emotion)
+
+        # Process max 3 FPS to avoid blocking
+        await asyncio.sleep(0.33)
+```
+
+**Critical Pattern**: Never run CV on main event loop. Use `asyncio.to_thread()` for blocking OpenCV calls:
+
+```python
+# WRONG: blocks event loop
+emotion = detect_emotion(frame)
+
+# RIGHT: non-blocking
+emotion = await asyncio.to_thread(detect_emotion, frame)
+```
+
+### Emotion Detection Libraries
+
+| Library | Model Size | Accuracy | Speed |
+|---|---|---|---|
+| **DeepFace** | ~40MB | 90%+ | 50-100ms/face |
+| **FER2013** | ~10MB | 65-75% | 10-20ms/face |
+| **MediaPipe** | ~20MB | 80%+ | 20-30ms/face |
+
+**Recommendation**: DeepFace is industry standard. FER2013 if latency is critical.
+
+```bash
+pip install deepface
+pip install torch torchvision
+
+# Usage
+from deepface import DeepFace
+
+result = DeepFace.analyze(frame, actions=['emotion'], enforce_detection=False)
+emotion = result[0]['dominant_emotion']  # 'happy', 'sad', 'angry', etc.
+```
+
+### Screen Sharing Analysis (Optional)
+
+For context like "user is watching X game":
+```python
+# OCR for text detection
+pip install pytesseract
+
+# UI detection (ResNet-based)
+pip install screen-recognition
+
+# Together: detect game UI, read text, determine context
+```
+
+---
+
+## Memory Architecture
+
+### Short-Term Memory: SQLite
+
+**Purpose**: Store conversation history, user preferences, relationship state
+
+**Schema**:
+```sql
+CREATE TABLE conversations (
+    id INTEGER PRIMARY KEY,
+    user_id TEXT NOT NULL,
+    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
+    message TEXT NOT NULL,
+    sender TEXT NOT NULL,  -- 'user' or 'hex'
+    emotion TEXT,  -- detected from webcam/tone
+    context TEXT  -- screen state, game, etc.
+);
+
+CREATE TABLE user_relationships (
+    user_id TEXT PRIMARY KEY,
+    first_seen DATETIME,
+    interaction_count INTEGER,
+    favorite_topics TEXT,  -- JSON array
+    known_traits TEXT,  -- JSON
+    last_interaction DATETIME
+);
+
+CREATE TABLE hex_state (
+    key TEXT PRIMARY KEY,
+    value TEXT,
+    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE INDEX idx_user_timestamp ON conversations(user_id, timestamp);
+```
+
+**Query Pattern** (for context retrieval):
+```python
+import sqlite3
+
+def get_recent_context(user_id: str, num_messages: int = 20) -> list[str]:
+    """Retrieve conversation history for LLM context"""
+    conn = sqlite3.connect("hex.db")
+    cursor = conn.cursor()
+
+    cursor.execute("""
+        SELECT sender, message FROM conversations
+        WHERE user_id = ?
+        ORDER BY timestamp DESC
+        LIMIT ?
+    """, (user_id, num_messages))
+
+    history = cursor.fetchall()
+    conn.close()
+
+    # Format for LLM
+    return [f"{sender}: {message}" for sender, message in reversed(history)]
+```
+
+### Long-Term Memory: Vector Database
+
+**Purpose**: Semantic search over past interactions ("Remember when we talked about...?")
+
+**Recommendation: ChromaDB (Development) → Qdrant (Production)**
+
+**ChromaDB** (for now):
+- Embedded in Python process
+- Zero setup
+- 4x faster in 2025 Rust rewrite
+- Scales to ~1M vectors on single machine
+
+**Migration Path**: Start with ChromaDB, migrate to Qdrant if vector count exceeds 100k or response latency matters.
+
+**Installation**:
+```bash
+pip install chromadb
+
+# Usage
+import chromadb
+
+client = chromadb.EphemeralClient()  # In-memory for dev
+# or
+client = chromadb.PersistentClient(path="./hex_vectors")  # Persistent
+
+collection = client.get_or_create_collection(
+    name="conversation_memories",
+    metadata={"hnsw:space": "cosine"}
+)
+
+# Store memory
+collection.add(
+    ids=[f"msg_{timestamp}"],
+    documents=[message_text],
+    metadatas=[{"user_id": user_id, "date": timestamp}],
+    embeddings=[embedding_vector]
+)
+
+# Retrieve similar memories
+results = collection.query(
+    query_texts=["user likes playing valorant"],
+    n_results=3
+)
+```
+
+### Embedding Model
+
+**Recommendation**: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, 22MB)
+
+```bash
+pip install sentence-transformers
+
+from sentence_transformers import SentenceTransformer
+
+embedder = SentenceTransformer('all-MiniLM-L6-v2')
+embedding = embedder.encode("I love playing games with you", convert_to_tensor=False)
+```
+
+**Why MiniLM-L6**:
+- Small (22MB), fast (<5ms per sentence on CPU)
+- High quality (competitive with large models on semantic tasks)
+- Designed for retrieval (better than generic BERT for similarity)
+- Popular in production (battle-tested)
+
+### Memory Retrieval Pattern for LLM Context
+
+```python
+async def get_full_context(user_id: str, query: str) -> str:
+    """Build context string for LLM from short + long-term memory"""
+
+    # Short-term: recent messages
+    recent_msgs = get_recent_context(user_id, num_messages=10)
+    recent_text = "\n".join(recent_msgs)
+
+    # Long-term: semantic search
+    embedding = embedder.encode(query)
+    similar_memories = vectors.query(
+        query_embeddings=[embedding],
+        n_results=5,
+        where={"user_id": {"$eq": user_id}}
+    )
+
+    memory_text = "\n".join([
+        doc for doc in similar_memories['documents'][0]
+    ])
+
+    # Relationship state
+    relationship = get_user_relationship(user_id)
+
+    return f"""Recent conversation:
+{recent_text}
+
+Relevant memories:
+{memory_text}
+
+About {user_id}: {relationship['known_traits']}
+"""
+```
+
+### Confidence Levels
+- **Short-term (SQLite)**: HIGH — mature, proven
+- **Long-term (ChromaDB)**: MEDIUM — good for dev, test migration path early
+- **Embeddings (MiniLM)**: HIGH — widely adopted, production-ready
+
+---
+
+## Python Async Patterns
+
+### Core Discord.py + LLM Integration
+
+**The Problem**: Discord bot event loop blocks if you call LLM synchronously.
+
+**The Solution**: Always use `asyncio.create_task()` for I/O-bound work.
+
+```python
+import asyncio
+from discord.ext import commands
+
+@commands.Cog.listener()
+async def on_message(self, message: discord.Message):
+    """Non-blocking message handling"""
+    if message.author == self.bot.user:
+        return
+
+    # Bad (blocks event loop for 5+ seconds):
+    # response = generate_response(message.content)
+
+    # Good (non-blocking):
+    async def generate_and_send():
+        thinking = await message.channel.send("*thinking*...")
+        response = await asyncio.to_thread(
+            generate_response,
+            message.content
+        )
+        await thinking.edit(content=response)
+
+    asyncio.create_task(generate_and_send())
+```
+
+### Concurrent Task Patterns
+
+**Pattern 1: Parallel LLM + TTS**
+```python
+async def respond_with_voice(text: str, voice_channel):
+    """Generate response text and voice simultaneously"""
+
+    async def get_response():
+        return await generate_llm_response(text)
+
+    async def get_voice():
+        return await synthesize_tts(text)
+
+    # Run in parallel
+    response_text, voice_audio = await asyncio.gather(
+        get_response(),
+        get_voice()
+    )
+
+    # Send text immediately, play voice
+    await channel.send(response_text)
+    voice_client.play(discord.PCMAudioSource(voice_audio))
+```
+
+**Pattern 2: Task Queue for Rate Limiting**
+```python
+import asyncio
+
+class ResponseQueue:
+    def __init__(self, max_concurrent: int = 2):
+        self.semaphore = asyncio.Semaphore(max_concurrent)
+        self.pending = []
+
+    async def queue_response(self, user_id: str, text: str):
+        async with self.semaphore:
+            # Only 2 concurrent responses
+            response = await generate_response(text)
+            self.pending.append((user_id, response))
+            return response
+
+queue = ResponseQueue(max_concurrent=2)
+```
+
+**Pattern 3: Background Personality Tasks**
+```python
+from discord.ext import tasks
+
+class HexPersonality(commands.Cog):
+    def __init__(self, bot):
+        self.bot = bot
+        self.mood = "neutral"
+        self.update_mood.start()
+
+    @tasks.loop(minutes=5)  # Every 5 minutes
+    async def update_mood(self):
+        """Cycle personality state based on time + interactions"""
+        self.mood = await calculate_mood(
+            time_of_day=datetime.now(),
+            recent_interactions=self.get_recent_count(),
+            sleep_deprived=self.is_late_night()
+        )
+
+        # Emit mood change to memory
+        await self.bot.hex_db.update_state("current_mood", self.mood)
+
+    @update_mood.before_loop
+    async def before_update_mood(self):
+        await self.bot.wait_until_ready()
+```
+
+### Handling CPU-Bound Work
+
+**OpenCV, emotion detection, transcription are CPU-bound.**
+
+```python
+# Pattern: Use to_thread for CPU work
+emotion = await asyncio.to_thread(
+    analyze_emotion,
+    frame
+)
+
+# Pattern: Use ThreadPoolExecutor for multiple CPU tasks
+executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
+loop = asyncio.get_event_loop()
+
+emotion = await loop.run_in_executor(executor, analyze_emotion, frame)
+```
+
+### Error Handling & Resilience
+
+```python
+async def safe_generate_response(message: str) -> str:
+    """Generate response with fallback"""
+    try:
+        response = await asyncio.wait_for(
+            generate_llm_response(message),
+            timeout=5.0  # 5-second timeout
+        )
+        return response
+    except asyncio.TimeoutError:
+        return "I'm thinking too hard... ask me again?"
+    except Exception as e:
+        logger.error(f"Generation failed: {e}")
+        return "*confused goblin noises*"
+```
+
+### Concurrent Request Management (Discord.py)
+
+```python
+class ConcurrencyManager:
+    def __init__(self):
+        self.active_tasks = {}
+        self.max_per_user = 1  # One response at a time per user
+
+    async def handle_message(self, user_id: str, text: str):
+        if user_id in self.active_tasks and not self.active_tasks[user_id].done():
+            return "I'm still thinking from last time!"
+
+        task = asyncio.create_task(generate_response(text))
+        self.active_tasks[user_id] = task
+
+        try:
+            response = await task
+            return response
+        finally:
+            del self.active_tasks[user_id]
+```
+
+---
+
+## Known Pitfalls & Solutions
+
+### 1. **Discord Event Loop Blocking**
+**Problem**: Synchronous LLM calls block the bot, causing timeouts on other messages.
+**Solution**: Always use `asyncio.to_thread()` or `asyncio.create_task()`.
+
+### 2. **Whisper Hallucination on Silence**
+**Problem**: Whisper can generate text from pure background noise.
+**Solution**: Implement voice activity detection (VAD) before transcription.
+```python
+import librosa
+
+def has_speech(audio_path, threshold=-35):
+    """Check if audio has meaningful energy"""
+    y, sr = librosa.load(audio_path)
+    S = librosa.feature.melspectrogram(y=y, sr=sr)
+    S_db = librosa.power_to_db(S, ref=np.max)
+    mean_energy = np.mean(S_db)
+    return mean_energy > threshold
+```
+
+### 3. **Vector DB Scale Creep**
+**Problem**: ChromaDB slows down as memories accumulate.
+**Solution**: Archive old memories, implement periodic cleanup.
+```python
+# Archive conversations older than 90 days
+old_threshold = datetime.now() - timedelta(days=90)
+db.cleanup_old_memories(older_than=old_threshold)
+```
+
+### 4. **Model Memory Growth**
+**Problem**: Loading Llama 3.1 8B in 4-bit still uses ~6GB, leaving little room for TTS/CV models.
+**Solution**: Use offloading or accept single-component operation.
+```python
+# Option 1: Offload LLM to CPU between requests
+# Option 2: Run TTS/CV in separate process
+# Option 3: Use smaller model (Mistral 7B) when GPU-constrained
+```
+
+### 5. **Async Context Issues**
+**Problem**: Storing references to coroutines without awaiting them.
+**Solution**: Always create tasks explicitly:
+```python
+# Bad
+coro = generate_response(text)  # Dangling coroutine
+
+# Good
+task = asyncio.create_task(generate_response(text))
+response = await task
+```
+
+### 6. **Personality Inconsistency**
+**Problem**: LLM generates different responses with same prompt due to randomness.
+**Solution**: Use consistent temperature and seed management.
+```python
+# Conversation context → lower temperature (0.5)
+# Creative/chaotic moments → higher temperature (0.9)
+temperature = 0.7 if in_serious_context else 0.9
+```
+
+---
+
+## Recommended Deployment Configuration
+
+```yaml
+# Local Development (Hex primary environment)
+gpu: RTX 3060 Ti+ (12GB VRAM)
+llm: Llama 3.1 8B (4-bit via Ollama)
+tts: Kokoro 82M
+stt: faster-whisper large-v3
+avatar: VRoid + VSeeFace
+database: SQLite + ChromaDB (embedded)
+inference_latency: 3-10 seconds per response
+cost: $0/month (open-source stack)
+
+# Optional: Production Scaling
+gpu_cluster: vLLM on multi-GPU for concurrency
+database: Qdrant (cloud) + PostgreSQL for history
+inference_latency: <2 seconds (batching + optimization)
+cost: ~$200-500/month cloud compute
+```
+
+---
+
+## Confidence Levels & 2026 Readiness
+
+| Component | Recommendation | Confidence | 2026 Status |
+|---|---|---|---|
+| Discord.py 2.6.4+ | PRIMARY | HIGH | Stable, actively maintained |
+| Llama 3.1 8B | PRIMARY | HIGH | Proven, production-ready |
+| Mistral 7B | SECONDARY | HIGH | Fast-path fallback, stable |
+| Ollama | PRIMARY | MEDIUM | Mature but rapidly evolving |
+| vLLM | ALTERNATIVE | MEDIUM | High-performance alternative, v0.3+ recommended |
+| Whisper Large V3 + faster-whisper | PRIMARY | HIGH | Gold standard for multilingual STT |
+| Kokoro TTS | PRIMARY | MEDIUM | Emerging, high quality for size |
+| XTTS-v2 | SPECIAL MOMENTS | HIGH | Voice cloning working well |
+| VRoid + VSeeFace | PRIMARY | MEDIUM | Workaround viable, not native integration |
+| ChromaDB | DEVELOPMENT | MEDIUM | Good for prototyping, evaluate Qdrant before 100k vectors |
+| Qdrant | PRODUCTION | HIGH | Enterprise vector DB, proven at scale |
+| OpenCV 4.10+ | PRIMARY | HIGH | Stable, mature ecosystem |
+| DeepFace emotion detection | PRIMARY | HIGH | Industry standard, 90%+ accuracy |
+| Python asyncio patterns | PRIMARY | HIGH | Python 3.11+ well-supported |
+
+**Confidence Interpretation**:
+- **HIGH**: Production-ready, API stable, no major changes expected in 2026
+- **MEDIUM**: Solid choice but newer ecosystem (1-2 year old), evaluate alternatives annually
+- **LOW**: Emerging or unstable; prototype only
+
+---
+
+## Installation Checklist (Get Started)
+
+```bash
+# Discord
+pip install discord.py>=2.6.4
+
+# LLM & inference
+pip install ollama torch transformers bitsandbytes
+
+# TTS/STT
+pip install faster-whisper
+pip install sentence-transformers torch
+
+# Vector DB
+pip install chromadb
+
+# Vision
+pip install opencv-python deepface librosa
+
+# Async utilities
+pip install httpx aiofiles
+
+# Database
+pip install aiosqlite
+
+# Start services
+ollama serve &
+# (Loads models on first run)
+
+# Test basic chain
+python test_stack.py
+```
+
+---
+
+## Next Steps (For Roadmap)
+
+1. **Phase 1**: Discord.py + Ollama + basic LLM integration (1 week)
+2. **Phase 2**: STT pipeline (Whisper) + TTS (Kokoro) (1 week)
+3. **Phase 3**: Memory system (SQLite + ChromaDB) (1 week)
+4. **Phase 4**: Personality framework + system prompts (1 week)
+5. **Phase 5**: Webcam emotion detection + context integration (1 week)
+6. **Phase 6**: VRoid avatar + screen share integration (1 week)
+7. **Phase 7**: Self-modification capability + safety guards (2 weeks)
+
+**Total**: ~8 weeks to full-featured Hex prototype.
+
+---
+
+## References & Research Sources
+
+### Discord Integration
+- [Discord.py Documentation](https://discordpy.readthedocs.io/en/stable/index.html)
+- [Discord.py Async Patterns](https://discordpy.readthedocs.io/en/stable/ext/tasks/index.html)
+- [Discord.py on GitHub](https://github.com/Rapptz/discord.py)
+
+### Local LLMs
+- [Llama 3.1 vs Mistral Comparison](https://kanerika.com/blogs/mistral-vs-llama-3/)
+- [Llama.com Quantization Guide](https://www.llama.com/docs/how-to-guides/quantization/)
+- [Ollama vs vLLM Deep Dive](https://developers.redhat.com/articles/2025/08/08/ollama-vs-vllm-deep-dive-performance-benchmarking)
+- [Local LLM Hosting 2026 Guide](https://www.glukhov.org/post/2025/11/hosting-llms-ollama-localai-jan-lmstudio-vllm-comparison/)
+
+### TTS/STT
+- [Whisper Large V3 2026 Benchmarks](https://northflank.com/blog/best-open-source-speech-to-text-stt-model-in-2026-benchmarks/)
+- [Faster-Whisper GitHub](https://github.com/SYSTRAN/faster-whisper)
+- [Best Open Source TTS 2026](https://northflank.com/blog/best-open-source-text-to-speech-models-and-how-to-run-them)
+- [Whisper Streaming for Real-Time](https://github.com/ufal/whisper_streaming)
+
+### Computer Vision
+- [Real-Time Facial Emotion Recognition with OpenCV](https://learnopencv.com/facial-emotion-recognition/)
+- [DeepFace for Emotion Detection](https://github.com/serengp/deepface)
+
+### Vector Databases
+- [Vector Database Comparison 2026](https://www.datacamp.com/blog/the-top-5-vector-databases)
+- [ChromaDB vs Pinecone Analysis](https://www.myscale.com/blog/choosing-best-vector-database-for-your-project/)
+- [Chroma Documentation](https://docs.trychroma.com/)
+
+### Python Async
+- [Python Asyncio for LLM Concurrency](https://www.newline.co/@zaoyang/python-asyncio-for-llm-concurrency-best-practices--bc079176)
+- [Asyncio Best Practices 2025](https://sparkco.ai/blog/mastering-async-best-practices-for-2025/)
+- [FastAPI with Asyncio](https://www.nucamp.co/blog/coding-bootcamp-backend-with-python-2025-python-in-the-backend-in-2025-leveraging-asyncio-and-fastapi-for-highperformance-systems)
+
+### VRoid & Avatars
+- [VRoid Studio Official](https://vroid.com/en/studio)
+- [VRoid Hub API](https://vroid.pixiv.help/hc/en-us/articles/21569104969241-The-VRoid-Hub-API-is-now-live)
+- [VSeeFace for VRoid](https://www.vseeface.icu/)
+
+---
+
+**Document Version**: 1.0
+**Last Updated**: January 2026
+**Hex Stack Status**: Ready for implementation
+**Estimated Implementation Time**: 8-12 weeks (to full personality bot)
--- a/.planning/research/SUMMARY.md
+++ b/.planning/research/SUMMARY.md
@@ -0,0 +1,492 @@
+# Research Summary: Hex AI Companion
+
+**Date**: January 2026
+**Status**: Ready for Roadmap and Requirements Definition
+**Confidence Level**: HIGH (well-sourced, coherent across all research areas)
+
+---
+
+## Executive Summary
+
+Hex is built on a **personality-first, local-first architecture** that prioritizes genuine emotional resonance over feature breadth. The recommended approach combines Llama 3.1 8B (local inference via Ollama), Discord.py async patterns, and a dual-memory system (SQLite + ChromaDB) to create an AI companion that feels like a person with opinions and growth over time.
+
+The technical foundation is solid and proven: Discord.py 2.6.4+ with native async support, local LLM inference for privacy, and a 6-phase incremental build strategy that enables personality emergence before adding autonomy or self-modification.
+
+**Critical success factor**: The difference between "a bot that sounds like Hex" and "Hex as a person" hinges on three interconnected systems working together: **memory persistence** (so she learns about you), **personality consistency** (so she feels like the same person), and **autonomy** (so she feels genuinely invested in you). All three must be treated as foundational, not optional features.
+
+---
+
+## Recommended Stack
+
+**Core Technologies** (Production-ready, January 2026):
+
+| Layer | Technology | Version | Rationale |
+|-------|-----------|---------|-----------|
+| **Bot Framework** | Discord.py | 2.6.4+ | Async-native, mature, excellent Discord integration |
+| **LLM Inference** | Llama 3.1 8B Instruct | 4-bit quantized | 128K context window, superior reasoning, 6GB VRAM footprint |
+| **LLM Engine** | Ollama (dev) / vLLM (production) | 0.3+ | Local-first, zero setup vs high-throughput scaling |
+| **Short-term Memory** | SQLite | Standard lib | Fast, reliable, local file-based conversations |
+| **Long-term Memory** | ChromaDB (dev) → Qdrant (prod) | Latest | Vector semantics, embedded for <100k vectors |
+| **Embeddings** | all-MiniLM-L6-v2 | 384-dim | Fast (5ms/sentence), production-grade quality |
+| **Speech-to-Text** | Whisper Large V3 + faster-whisper | Latest | Local, 7.4% WER, multilingual, 3-5s latency |
+| **Text-to-Speech** | Kokoro 82M (default) + XTTS-v2 (emotional) | Latest | Sub-second latency, personality-aware prosody |
+| **Vision** | OpenCV 4.10+ + DeepFace | 4.10+ | Face detection (30 FPS), emotion recognition (90%+ accuracy) |
+| **Avatar** | VRoid + VSeeFace + Discord screen share | Latest | Free, anime-style, integrates with Discord calls |
+| **Personality** | YAML + Git versioning | — | Editable persona, change tracking, rollback capable |
+| **Self-Modification** | RestrictedPython + sandboxing | — | Safe code generation, user approval required |
+
+**Why This Stack**:
+- **Privacy**: All inference local (except Discord API), no cloud dependency
+- **Latency**: <3 second end-to-end response time on consumer hardware (RTX 3060 Ti)
+- **Cost**: Zero cloud fees, open-source stack
+- **Personality**: System prompt injection + memory context + perception awareness enables genuine character coherence
+- **Async Architecture**: Discord.py's native asyncio means LLM, TTS, memory lookups run in parallel without blocking
+
+---
+
+## Table Stakes vs Differentiators
+
+### Table Stakes (v1 Essential Features)
+
+Users expect these by default in 2026. Missing any breaks immersion:
+
+1. **Conversation Memory** (Short + Long-term)
+   - Last 20 messages in context window
+   - Vector semantic search for relevant past interactions
+   - Relationship state tracking (strangers → friends → close)
+   - **Without this**: Feels like meeting a stranger each time; companion becomes disposable
+
+2. **Natural Conversation** (No AI Speak)
+   - Contractions, casual language, slang
+   - Personality quirks embedded in word choices
+   - Context-appropriate tone shifts
+   - Willingness to disagree or pushback
+   - **Pitfall**: Formal "I'm an AI and I can help you with..." kills immersion instantly
+
+3. **Fast Response Times** (<1s for acknowledgment, <3s for full response)
+   - Typing indicators start immediately
+   - Streaming responses (show text as it generates)
+   - Async all I/O-bound work (LLM, TTS, database)
+   - **Without this**: Latency >5s makes companion feel dead; users stop engaging
+
+4. **Consistent Personality** (Feels like same person across weeks)
+   - Core traits stable (tsundere nature, values)
+   - Personality evolution slow and logged
+   - Memory-backed traits (not just prompt)
+   - **Pitfall**: Personality drift is #1 reason users abandon companions
+
+5. **Platform Integration** (Discord native)
+   - Text channels, DMs, voice channels
+   - Emoji reactions, slash commands
+   - Server-specific personality variations
+   - **Without this**: Requires leaving Discord = abandoned feature
+
+6. **Emotional Responsiveness** (Reads the room)
+   - Sentiment detection from messages
+   - Adaptive response depth (listen to sad users, engage with energetic ones)
+   - Skip jokes when user is suffering
+   - **Pitfall**: "Always cheerful" feels cruel when user is venting
+
+---
+
+### Differentiators (Competitive Edge)
+
+These separate Hex from static chatbots. Build in order:
+
+1. **True Autonomy** (Proactive Agency)
+   - Initiates conversations based on context/memory
+   - Reminds about user's goals without being asked
+   - Sets boundaries ("I don't think you should do X")
+   - Follows up on unresolved topics
+   - **Research shows**: Autonomous companions are described as "feels like they actually care" vs reactive "smart but distant"
+   - **Complexity**: Hard, requires Phase 3-4
+
+2. **Emotional Intelligence** (Mood Detection + Adaptive Strategy)
+   - Facial emotion from webcam (70-80% accuracy possible)
+   - Voice tone analysis from Discord calls
+   - Mood tracking over time (identifies depression patterns, burnout)
+   - Knows when to listen vs advise vs distract
+   - **Research shows**: Companies using emotion AI report 25% positive sentiment increase
+   - **Complexity**: Hard, requires Phase 3+ but perception must be separate thread
+
+3. **Multimodal Awareness** (Sees Your Context)
+   - Understands what's on your screen (game, work, video)
+   - Contextualizes help ("I see you're stuck on that Elden Ring boss...")
+   - Detects stress signals (tab behavior, timing)
+   - Proactive help based on visible activity
+   - **Privacy**: Local processing only, user opt-in required
+   - **Complexity**: Hard, requires careful async architecture to avoid latency
+
+4. **Self-Modification** (Genuine Autonomy)
+   - Generates code to improve own logic
+   - Tests changes in sandbox before deployment
+   - User maintains veto power (approval required)
+   - All changes tracked with rollback capability
+   - **Critical**: Gamified progression (not instant capability), mandatory approval, version control
+   - **Complexity**: Hard, requires Phase 5+ and strong safety boundaries
+
+5. **Relationship Building** (Transactional → Meaningful)
+   - Inside jokes that evolve naturally
+   - Character growth (admits mistakes, opinions change slightly)
+   - Vulnerability in appropriate moments
+   - Investment in user outcomes ("I'm rooting for you")
+   - **Research shows**: Users with relational companions feel like it's "someone who actually knows them"
+   - **Complexity**: Hard (3+ weeks), emerges from memory + personality + autonomy
+
+---
+
+## Build Architecture (6-Phase Approach)
+
+### Phase 1: Foundation (Weeks 1-2) — "Hex talks back"
+
+**Goal**: Core interaction loop working locally; personality emerges
+
+**Build**:
+- Discord bot skeleton with message handling (Discord.py)
+- Local LLM integration (Ollama + Llama 3.1 8B 4-bit quantized)
+- SQLite conversation storage (recent context only)
+- YAML personality definition (editable)
+- System prompt with persona injection
+- Async/await patterns throughout
+
+**Outcomes**:
+- Hex responds in Discord text channels with personality
+- Conversations logged, retrievable
+- Response latency <2 seconds
+- Personality can be tweaked via YAML
+
+**Key Metric**: P95 latency <2s, personality consistency baseline established
+
+**Pitfalls to avoid**:
+- Blocking operations on event loop (use `asyncio.create_task()`)
+- LLM inference on main thread (use thread pool)
+- Personality not actionable in prompts (be specific about tsundere rules)
+
+---
+
+### Phase 2: Personality & Memory (Weeks 3-4) — "Hex remembers me"
+
+**Goal**: Hex feels like a person who learns about you; personality becomes consistent
+
+**Build**:
+- Vector database (ChromaDB) for semantic memory
+- Memory-aware context injection (relevant past facts in prompt)
+- User relationship tracking (relationship state machine)
+- Emotional responsiveness from text sentiment
+- Personality versioning (git-based snapshots)
+- Tsundere balance metrics (track denial %)
+- Kid-mode detection (safety filtering)
+
+**Outcomes**:
+- Hex remembers facts about you across conversations
+- Responses reference past events naturally
+- Personality consistent across weeks (audit shows <5% drift)
+- Emotions read from text; responses adapt depth
+- Changes to personality tracked with rollback
+
+**Key Metric**: User reports "she remembers things I told her" unprompted
+
+**Pitfalls to avoid**:
+- Personality drift (implement weekly consistency audits)
+- Memory hallucination (store full context, verify before using)
+- Tsundere breaking (formalize denial rules, scale with relationship phase)
+- Memory bloat (hierarchical memory with archival strategy)
+
+---
+
+### Phase 3: Multimodal Input (Weeks 5-6) — "Hex sees me"
+
+**Goal**: Add perception layer without killing responsiveness; context aware
+
+**Build**:
+- Webcam integration (OpenCV face detection, DeepFace emotion)
+- Local Whisper for voice transcription in Discord calls
+- Screen capture analysis (activity recognition)
+- Perception state aggregation (emotion + activity + environment)
+- Context injection into LLM prompts
+- **CRITICAL**: Perception on separate thread (never blocks Discord responses)
+
+**Outcomes**:
+- Hex reacts to your facial expressions
+- Voice input works in Discord calls
+- Responses reference your mood/activity
+- All processing local (privacy preserved)
+- Text latency unaffected by perception (<3s still achieved)
+
+**Key Metric**: Multimodal doesn't increase response latency >500ms
+
+**Pitfalls to avoid**:
+- Image processing blocking text responses (separate thread mandatory)
+- Processing every video frame (skip intelligently, 1-3 FPS sufficient)
+- Avatar sync failures (atomic state updates)
+- Privacy violations (no external transmission, user opt-in)
+
+---
+
+### Phase 4: Avatar & Autonomy (Weeks 7-8) — "Hex has a face and cares"
+
+**Goal**: Visual presence + proactive agency; relationship feels two-way
+
+**Build**:
+- VRoid model loading + VSeeFace display
+- Blendshape animation (emotion → facial expression)
+- Discord screen share integration
+- Proactive messaging system (based on context/memory/mood)
+- Autonomy timing heuristics (don't interrupt at 3am)
+- Relationship state machine (escalates intimacy)
+- User preference learning (response length, topics, timing)
+
+**Outcomes**:
+- Avatar appears in Discord calls, animates with mood
+- Hex initiates conversations ("Haven't heard from you in 3 days...")
+- Proactive messages feel relevant, not annoying
+- Relationship deepens (inside jokes, character growth)
+- User feels companionship, not just assistance
+
+**Key Metric**: User reports missing Hex when unavailable; initiates conversations
+
+**Pitfalls to avoid**:
+- Becoming annoying (emotional awareness + quiet mode essential)
+- One-way relationship (autonomy without care-signaling feels hollow)
+- Poor timing (learn user's schedule, respect busy periods)
+- Avatar desync (mood and expression must stay aligned)
+
+---
+
+### Phase 5: Self-Modification (Weeks 9-10) — "Hex can improve herself"
+
+**Goal**: Genuine autonomy within safety boundaries; code generation with approval gates
+
+**Build**:
+- LLM-based code proposal generation
+- Static AST analysis for safety validation
+- Sandboxed testing environment
+- Git-based change tracking + rollback capability (24h window)
+- Gamified capability progression (5 levels)
+- Mandatory user approval for all changes
+- Personality updates when new capabilities unlock
+
+**Outcomes**:
+- Hex proposes improvements (in voice, with reasoning)
+- Code changes tested, reviewed, deployed with approval
+- All changes reversible; version history intact
+- New capabilities unlock as relationship deepens
+- Hex "learns to code" and announces new skills
+
+**Key Metric**: Self-modifications improve measurable aspects (faster response, better personality consistency)
+
+**Pitfalls to avoid**:
+- Runaway self-modification (approval gate non-negotiable)
+- Code drift (version control mandatory, rollback tested)
+- Loss of user control (never remove safety constraints, killswitch always works)
+- Capability escalation without trust (gamified progression with clear boundaries)
+
+---
+
+### Phase 6: Production Polish (Weeks 11-12) — "Hex is ready to ship"
+
+**Goal**: Stability, performance, error handling, documentation
+
+**Build**:
+- Performance optimization (caching, batching, context summarization)
+- Error handling + graceful degradation
+- Logging and telemetry (local + optional cloud)
+- Configuration management
+- Resource leak monitoring (memory, connections, VRAM)
+- Scheduled restart capability (weekly preventative)
+- Integration testing (all components together)
+- Documentation and guides
+- Auto-update capability
+
+**Outcomes**:
+- System stable for indefinite uptime
+- Responsive under load
+- Clear error messages when things fail
+- Easy to deploy, configure, debug
+- Ready for extended real-world use
+
+**Key Metric**: 99.5% uptime over 1-month runtime, no crashes, <3s latency maintained
+
+**Pitfalls to avoid**:
+- Memory leaks (resource monitoring mandatory)
+- Performance degradation over time (profile early and often)
+- Context window bloat (summarization strategy)
+- Unforeseen edge cases (comprehensive testing)
+
+---
+
+## Critical Pitfalls and Prevention
+
+### Top 5 Most Dangerous Pitfalls
+
+1. **Personality Drift** (Consistency breaks over time)
+   - **Risk**: Users feel gaslighted; trust broken
+   - **Prevention**:
+     - Weekly personality audits (sample responses, rate consistency)
+     - Personality baseline document (core values never change)
+     - Memory-backed personality (traits anchor to learned facts)
+     - Version control on persona YAML (track evolution)
+
+2. **Tsundere Character Breaking** (Denial applied wrong; becomes mean or loses charm)
+   - **Risk**: Character feels mechanical or rejecting
+   - **Prevention**:
+     - Formalize denial rules: "deny only when (emotional AND not alone AND not escalated intimacy)"
+     - Denial scales with relationship phase (90% early → 40% mature)
+     - Post-denial must include care signal (action, not words)
+     - Track denial %; alert if <30% (losing tsun) or >70% (too mean)
+
+3. **Memory System Bloat** (Retrieval becomes slow; hallucinations increase)
+   - **Risk**: System becomes unusable as history grows
+   - **Prevention**:
+     - Hierarchical memory (raw → summaries → semantic facts → personality anchors)
+     - Selective storage (facts, not raw chat; de-duplicate)
+     - Memory aging (recent detailed → old archived)
+     - Importance weighting (user marks important memories)
+     - Vector DB optimization (limit retrieval to top 5-10 results)
+
+4. **Runaway Self-Modification** (Code changes cascade; safety removed; user loses control)
+   - **Risk**: System becomes uncontrollable, breaks
+   - **Prevention**:
+     - Mandatory approval gate (user reviews all code)
+     - Sandboxed testing before deployment
+     - Version control + 24h rollback window
+     - Gamified progression (limited capability at first)
+     - Cannot modify: core values, killswitch, user control systems
+
+5. **Latency Creep** (Response times increase over time until unusable)
+   - **Risk**: "Feels alive" illusion breaks; users abandon
+   - **Prevention**:
+     - All I/O async (database, LLM, TTS, Discord)
+     - Parallel operations (use `asyncio.gather()`)
+     - Quantized LLM (4-bit saves 75% VRAM)
+     - Caching (user preferences, relationship state)
+     - Context window management (summarize old context)
+     - VRAM/latency monitoring every 5 minutes
+
+---
+
+## Implications for Roadmap
+
+### Phase Sequencing Rationale
+
+The 6-phase approach reflects **dependency chains** that cannot be violated:
+
+```
+Phase 1 (Foundation) ← Must work perfectly
+    ↓
+Phase 2 (Personality) ← Depends on Phase 1; personality must be stable before autonomy
+    ↓
+Phase 3 (Perception) ← Depends on Phase 1-2; separate thread prevents latency impact
+    ↓
+Phase 4 (Autonomy) ← Depends on memory + personality being rock-solid; now add proactivity
+    ↓
+Phase 5 (Self-Modification) ← Only grant code access after relationship + autonomy stable
+    ↓
+Phase 6 (Polish) ← Final hardening, testing, documentation
+```
+
+**Why this order matters**:
+- You cannot have consistent personality without memory (Phase 2 must follow Phase 1)
+- You cannot add autonomy safely without personality being stable (Phase 4 must follow Phase 2)
+- You cannot grant self-modification capability until everything else proves stable (Phase 5 must follow Phase 4)
+
+Skipping phases or reordering creates technical debt and risk. Each phase grounds the next.
+
+---
+
+### Feature Grouping by Phase
+
+| Phase | Quick Win Features | Complex Features | Foundation Qualities |
+|-------|-------------------|------------------|----------------------|
+| 1 | Text responses, personality YAML | Async architecture, quantization | Responsiveness, personality baseline |
+| 2 | Memory storage, relationship tracking | Semantic search, memory retrieval | Consistency, personalization |
+| 3 | Webcam emoji reactions, mood inference | Separate perception thread, context injection | Multimodal without latency cost |
+| 4 | Scheduled messages, inside jokes | Autonomy timing, relationship state machine | Two-way connection, depth |
+| 5 | Propose changes (in voice) | Code generation, sandboxing, testing | Genuine improvement, controlled growth |
+| 6 | Better error messages, logging | Resource monitoring, restart scheduling | Reliability, debuggability |
+
+---
+
+## Confidence Assessment
+
+| Area | Confidence | Basis | Gaps |
+|------|-----------|-------|------|
+| **Stack** | HIGH | Proven technologies, clear deployment path | None significant; all tools production-ready |
+| **Architecture** | HIGH | Modular design, async patterns well-documented, integration points clear | Unclear: perception thread CPU overhead under load (test Phase 3) |
+| **Features** | HIGH | Clearly categorized, dependencies mapped, testing criteria defined | Unclear: optimal prompting for tsundere balance (test Phase 2) |
+| **Personality Consistency** | MEDIUM-HIGH | Strategies defined; unclear: degree of effort required for weekly audits | Need: empirical testing of personality drift rate; metrics refinement |
+| **Pitfalls** | HIGH | Research comprehensive, prevention strategies detailed, phases mapped | Unclear: priority ordering within Phase 5 (what to implement first?) |
+| **Self-Modification Safety** | MEDIUM | Framework defined but no prior Hex experience with code generation | Need: early Phase 5 prototyping; safety validation testing |
+
+---
+
+## Ready for Roadmap: Key Constraints and Decision Gates
+
+### Non-Negotiable Constraints
+
+1. **Personality consistency must be achievable in Phase 2**
+   - Decision gate: If personality audit in Phase 2 shows >10% drift, pause Phase 3
+   - Investigation needed: Is weekly audit enough? Monthly? What drift rate is acceptable?
+
+2. **Latency must stay <3s through Phase 4**
+   - Decision gate: If P95 latency exceeds 3s at any phase, debug and fix before next phase
+   - Investigation needed: Where is the bottleneck? (LLM? Memory? Perception?)
+
+3. **Self-modification must have air-tight approval + rollback**
+   - Decision gate: Do not proceed to Phase 5 until approval gate is bulletproof + rollback tested
+   - Investigation needed: What approval flow feels natural? Too many questions → annoying; too few → unsafe
+
+4. **Memory retrieval must scale to 10k+ memories without degradation**
+   - Decision gate: Test memory system with synthetic 10k message dataset before Phase 4
+   - Investigation needed: Does hierarchical memory + vector DB compression actually work? Verify retrieval speed
+
+5. **Perception must never block text responses**
+   - Decision gate: Profile perception thread; if latency spike >200ms, optimize or defer feature
+   - Investigation needed: How CPU-heavy is continuous webcam processing? Can it run at 1 FPS?
+
+---
+
+## Sources Aggregated
+
+**Stack Research**: Discord.py docs, Llama/Mistral benchmarks, Ollama vs vLLM comparisons, Whisper/faster-whisper performance, VRoid SDK, ChromaDB + Qdrant analysis
+
+**Features Research**: MIT Technology Review (AI companions 2026), Hume AI emotion docs, self-improving agents papers, company studies on emotion AI impact, uncanny valley voice research
+
+**Architecture Research**: Discord bot async patterns, LLM + memory RAG systems, vector database design, self-modification safeguards, deployment strategies
+
+**Pitfalls Research**: AI failure case studies (2025-2026), personality consistency literature, memory hallucination prevention, autonomy safety frameworks, performance monitoring practices
+
+---
+
+## Next Steps for Requirements Definition
+
+1. **Phase 1 Deep Dive**: Specify exact Discord.py message handler, LLM prompt format, SQLite schema, YAML personality structure
+2. **Phase 2 Spec**: Define memory hierarchy levels, confidence scoring system, personality audit rubric, tsundere balance metrics
+3. **Phase 3 Prototype**: Early perception thread implementation; measure latency impact before committing
+4. **Risk Mitigation**: Pre-Phase 5, build code generation + approval flow prototype; stress-test safety boundaries
+5. **Testing Strategy**: Define personality consistency tests (50+ scenarios per phase), latency benchmarks (with profiling), memory accuracy validation
+
+---
+
+## Summary for Roadmapper
+
+**Hex Stack**: Llama 3.1 8B local inference + Discord.py async + SQLite + ChromaDB + local perception layer
+
+**Critical Success Factors**:
+1. Personality consistency (weekly audits, memory-backed traits)
+2. Latency discipline (async/await throughout, perception isolated)
+3. Memory system (hierarchical, semantic search, confidence scoring)
+4. Autonomy safety (mandatory approval, sandboxed testing, version control)
+5. Relationship depth (proactivity, inside jokes, character growth)
+
+**6-Phase Build Path**: Foundation → Personality → Perception → Autonomy → Self-Mod → Polish
+
+**Key Decision Gates**: Personality consistency ✓ → Latency <3s ✓ → Memory scale test ✓ → Perception isolated ✓ → Approval flow safe ✓
+
+**Confidence**: HIGH. All research coherent, no major technical blockers, proven technology stack. Ready for detailed requirements.
+
+---
+
+**Document Version**: 1.0
+**Synthesis Date**: January 27, 2026
+**Status**: Ready for Requirements Definition and Phase 1 Planning