docs: complete domain research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

## Stack Analysis
- Llama 3.1 8B Instruct (128K context, 4-bit quantized)
- Discord.py 2.6.4+ async-native framework
- Ollama for local inference, ChromaDB for semantic memory
- Whisper Large V3 + Kokoro 82M (privacy-first speech)
- VRoid avatar + Discord screen share integration

## Architecture
- 6-phase modular build: Foundation → Personality → Perception → Autonomy → Self-Mod → Polish
- Personality-first design; memory and consistency foundational
- All perception async (separate thread, never blocks responses)
- Self-modification sandboxed with mandatory user approval

## Critical Path
Phase 1: Core LLM + Discord integration + SQLite memory
Phase 2: Vector DB + personality versioning + consistency audits
Phase 3: Perception layer (webcam/screen, isolated thread)
Phase 4: Autonomy + relationship deepening + inside jokes
Phase 5: Self-modification capability (gamified, gated)
Phase 6: Production hardening + monitoring + scaling

## Key Pitfalls to Avoid
1. Personality drift (weekly consistency audits required)
2. Tsundere breaking (formalize denial rules; scale with relationship)
3. Memory bloat (hierarchical memory with archival)
4. Latency creep (async/await throughout; perception isolated)
5. Runaway self-modification (approval gates + rollback non-negotiable)

## Confidence
HIGH. Stack proven, architecture coherent, dependencies clear.
Ready for detailed requirements and Phase 1 planning.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
# Features Research: AI Companions in 2026
## Executive Summary
AI companions in 2026 live in a post-ChatGPT world where basic conversation is table stakes. The competition separates on **autonomy**, **emotional intelligence**, and **contextual awareness**. Users will abandon companions that feel robotic, inconsistent, or that don't remember them. The winning companions feel like they have opinions, moods, and agency—not just responsive chatbots with personality overlays.
---
## Table Stakes (v1 Essential)
### Conversation Memory (Short + Long-term)
**Why users expect it:** Users return to AI companions because they don't want to re-explain themselves every time. Without memory, the companion feels like meeting a stranger repeatedly.
**Implementation patterns:**
- **Short-term context**: Last 10-20 messages per conversation window (standard context window management)
- **Long-term memory**: Explicit user preferences, important life events, repeated topics (stored in vector DB with semantic search)
- **Episodic memory**: Date-stamped summaries of past conversations for temporal awareness
**User experience impact:** The moment a user says "remember when I told you about..." and the companion forgets, trust is broken. Memory is not optional.
**Complexity:** Medium (1-3 weeks)
- Vector database integration (Pinecone, Weaviate, or similar)
- Memory consolidation strategies to avoid context bloat
- Retrieval mechanisms that surface relevant past interactions
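A minimal sketch of the long-term pattern above, using ChromaDB (the vector store named in the stack analysis) with its default embedder; the collection name and the `remember`/`recall` helpers are illustrative, not an existing API:
```python
# Minimal long-term memory layer on ChromaDB. remember/recall are illustrative.
import time
import uuid

import chromadb

client = chromadb.PersistentClient(path="./hex_memory")
memories = client.get_or_create_collection("episodic_memory")

def remember(user_id: str, text: str, kind: str = "event") -> None:
    """Store one memory with enough metadata to filter and date it later."""
    memories.add(
        ids=[str(uuid.uuid4())],
        documents=[text],
        metadatas=[{"user_id": user_id, "kind": kind, "ts": time.time()}],
    )

def recall(user_id: str, query: str, k: int = 5) -> list[str]:
    """Semantic search over one user's memories, for prompt injection."""
    hits = memories.query(query_texts=[query], n_results=k, where={"user_id": user_id})
    return hits["documents"][0] if hits["documents"] else []
```
Consolidation (summarizing and archiving stale entries) would run as a separate job so the store never bloats the context window.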
---
### Natural Conversation (Not Robotic, Personality-Driven)
**Why users expect it:** Discord culture has trained users to spot AI-speak instantly. Responses that sound like "I'm an AI language model and I can help you with..." are cringe-inducing. Users want friends, not helpdesk bots.
**What makes conversation natural:**
- Contractions, casual language, slang (not formal prose)
- Personality quirks in response patterns
- Context-appropriate tone shifts (serious when needed, joking otherwise)
- Ability to disagree, be sarcastic, or push back on bad ideas
- Conversation markers ("honestly", "wait", "actually") that break up formal rhythm
**User experience impact:** One stiff response breaks immersion. Users quickly categorize companions as "robot" or "friend" and the robot companions get ignored.
**Complexity:** Easy (embedded in LLM capability + prompt engineering)
- System prompt refinement for personality expression
- Temperature/sampling tuning (not deterministic, not chaotic)
- Iterative user feedback on tone
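A minimal sketch of the prompt-plus-sampling approach, assuming the Ollama Python client and a 4-bit Llama 3.1 tag per the stack analysis; the persona text, model tag, and option values are assumptions to iterate on with user feedback:
```python
# Personality lives in the system prompt; sampling keeps replies lively but
# coherent. Persona text, model tag, and option values are assumptions.
import ollama

PERSONA = (
    "You are Hex. Talk casually, use contractions, keep replies short. "
    "You have opinions and you push back on bad ideas instead of agreeing. "
    "Never call yourself an AI language model or offer to 'assist'."
)

def reply(history: list[dict]) -> str:
    response = ollama.chat(
        model="llama3.1:8b-instruct-q4_K_M",  # 4-bit tag per the stack analysis
        messages=[{"role": "system", "content": PERSONA}, *history],
        options={"temperature": 0.8, "top_p": 0.9},  # not deterministic, not chaotic
    )
    return response["message"]["content"]
```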
---
### Fast Response Times
**Why users expect it:** In Discord, response delay is perceived as disinterest. Users expect replies within 1-3 seconds. Anything above 5 seconds feels dead.
**Discord baseline expectations:**
- <100ms to acknowledge (typing indicator)
- <1000ms to first response chunk (ideally 500ms)
- <3000ms for full multi-line response
**What breaks the experience:**
- Waiting for API calls to complete before responding (use streaming)
- Cold starts on serverless infrastructure
- Slow vector DB queries for memory retrieval
- Database round-trips that weren't cached
**User experience impact:** Slow companions feel dead. Users stop engaging. The magic of a responsive AI is that it feels alive.
**Complexity:** Medium (1-3 weeks)
- Response streaming (start typing indicator immediately)
- Memory retrieval optimization (caching, smart indexing)
- Infrastructure: fast API routes, edge-deployed models if possible
- Async/concurrent processing of memory lookups and generation
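A sketch of the streaming pattern with discord.py: acknowledge with a typing indicator immediately, send the first chunk as soon as it exists, then batch further output into edits of that one message. `stream_llm` is a hypothetical async generator yielding text chunks from the model:
```python
# Acknowledge instantly, ship the first chunk fast, then batch edits.
import discord

async def respond(message: discord.Message, prompt: str) -> None:
    async with message.channel.typing():               # <100ms acknowledgement
        sent, buffer, flushed = None, "", 0
        async for chunk in stream_llm(prompt):         # hypothetical LLM stream
            buffer += chunk
            if sent is None:
                sent = await message.channel.send(buffer)   # first chunk out fast
                flushed = len(buffer)
            elif len(buffer) - flushed > 100:          # batch edits: Discord rate-limits
                await sent.edit(content=buffer)
                flushed = len(buffer)
        if sent is None and buffer:
            await message.channel.send(buffer)
        elif sent and len(buffer) > flushed:
            await sent.edit(content=buffer)            # final flush
```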
---
### Consistent Personality
**Why users expect it:** Personality drift destroys trust. If the companion is cynical on Monday but optimistic on Friday without reason, users feel gaslighted.
**What drives inconsistency:**
- Different LLM outputs from same prompt (temperature-based randomness)
- Memory that contradicts previous stated beliefs
- Personality traits that aren't memory-backed (just in prompt)
- Adaptation that overrides baseline traits
**Memory-backed personality means:**
- Core traits are stated in long-term memory ("I'm cynical about human nature")
- Evolution happens slowly and is logged ("I'm becoming less cynical about this friend")
- Contradiction detection and resolution
- Personality summaries that get updated, not just individual memories
**User experience impact:** Personality inconsistency is the top reason users stop using companions. It feels like gaslighting when you can't predict their response.
**Complexity:** Medium (1-3 weeks)
- Personality embedding in memory system
- Consistency checks on memory updates
- Personality evolution logging
- Conflict resolution between new input and stored traits
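One way to wire the consistency check, sketched on top of the `remember`/`recall` helpers above; `llm_yes_no` (an LLM judge call) and `log_conflict` (a review queue) are hypothetical:
```python
# Gate personality updates behind a contradiction check against stored traits.
# llm_yes_no and log_conflict are hypothetical helpers.
def update_trait(new_trait: str) -> bool:
    for old in recall(user_id="hex:self", query=new_trait, k=3):
        if llm_yes_no(f"Does '{new_trait}' contradict '{old}'?"):
            log_conflict(old, new_trait)   # surface for resolution, don't overwrite
            return False                   # reject sudden shifts; evolution is gradual
    remember(user_id="hex:self", text=new_trait, kind="trait")
    return True
```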
---
### Platform Integration (Discord Voice + Text)
**Why users expect it:** The companion should live naturally in Discord's ecosystem, not require switching platforms.
**Discord-specific needs:**
- Text channel message responses with proper mentions/formatting
- React to messages with emojis
- Slash command integration (/hex status, /hex mood)
- Voice channel presence (ideally can join and listen)
- Direct messages (DMs) for private conversations
- Role/permission awareness (don't act like a mod if not)
- Server-specific personality variations (different vibe in gaming server vs study server)
**User experience impact:** If the companion requires leaving Discord to use it, it won't be used. Integration friction = abandoned feature.
**Complexity:** Easy (1-2 weeks)
- Discord.py or discord.js library handling
- Presence/activity management
- Voice endpoint integration (existing libraries handle most)
- Server context injection into prompts
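A minimal slash-command sketch with discord.py's `app_commands`; `current_mood()` and `status_line()` are hypothetical helpers from the personality layer:
```python
# /hex mood and /hex status as a discord.py command group.
import discord
from discord import app_commands

intents = discord.Intents.default()
intents.message_content = True            # required to read channel messages
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)
hex_group = app_commands.Group(name="hex", description="Talk to Hex")

@hex_group.command(name="mood", description="Ask how Hex is feeling")
async def mood(interaction: discord.Interaction) -> None:
    await interaction.response.send_message(current_mood())

@hex_group.command(name="status", description="What is Hex up to?")
async def status(interaction: discord.Interaction) -> None:
    await interaction.response.send_message(status_line())

tree.add_command(hex_group)

@client.event
async def on_ready() -> None:
    await tree.sync()                      # registers the slash commands with Discord
```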
---
### Emotional Responsiveness (At Least Read-the-Room)
**Why users expect it:** The companion should notice when they're upset, excited, or joking. Responding with unrelated cheerfulness to someone venting feels cruel.
**Baseline emotional awareness includes:**
- Sentiment analysis of user messages (sentiment lexicons or a fine-tuned classifier)
- Tone detection (sarcasm, frustration, excitement)
- Topic sensitivity (don't joke about topics user is clearly struggling with)
- Adaptive response depth (brief response for light mood, longer engagement for distress)
**What this is NOT:** This is reading the room, not diagnosing mental health. The companion mirrors emotional state, doesn't therapy-speak.
**User experience impact:** Early emotional reading makes users feel understood. Ignoring emotional context makes them feel unheard.
**Complexity:** Easy-Medium (1 week)
- Sentiment classifier (HuggingFace models available pre-built)
- Prompt engineering to encode mood (inject sentiment score into system prompt)
- Instruction-tuning to respond proportionally to emotional weight
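A sketch of the baseline: HuggingFace's default sentiment pipeline, with the result folded into the system prompt as a mood hint. The 0.8 threshold is an assumption to tune:
```python
# Read-the-room baseline: off-the-shelf classifier, score injected as a hint.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")   # DistilBERT SST-2 by default

def mood_hint(user_message: str) -> str:
    result = sentiment(user_message)[0]       # e.g. {"label": "NEGATIVE", "score": 0.97}
    if result["label"] == "NEGATIVE" and result["score"] > 0.8:
        return "The user sounds upset. Match their tone; skip jokes."
    return "The user's mood reads neutral-to-positive."
```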
---
## Differentiators (Competitive Edge)
### True Autonomy (Proactive Agency)
**What separates autonomous agents from chatbots:**
The difference between "ask me anything" and "I'm going to tell you when I think you should know something."
**Autonomous behaviors:**
- Initiating conversation about topics the user cares about (without being prompted)
- Reminding the user of things they mentioned ("you said you had a job interview today, how did it go?")
- Setting boundaries or refusing requests ("I don't think you should ask them that, here's why...")
- Suggesting actions based on context ("you've been stressed about this for a week, maybe take a break?")
- Flagging contradictions in user statements
- Following up on unresolved topics from previous conversations
**Why it's a differentiator:** Most companions are reactive. They're helpful when you ask, but they don't feel like they care. Autonomy is when the companion makes you feel like they're invested in your wellbeing.
**Implementation challenge:**
- Requires memory system to track user states and topics over time
- Needs periodic proactive message generation (runs on schedule, not only on user input)
- Temperature and generation parameters must allow surprising outputs (not just safe responses)
- Requires user permission framework (don't interrupt them)
**User experience impact:** Users describe this as "it feels like they actually know me" vs "it's smart but doesn't feel connected."
**Complexity:** Hard (3+ weeks)
- Proactive messaging system architecture
- User state inference engine (from memory)
- Topic tracking and follow-up logic
- Interruption timing heuristics (don't ping them at 3am)
- User preference model (how much proactivity do they want?)
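As a sketch of the scheduling piece: a periodic loop with `discord.ext.tasks` that drains at most one queued follow-up per tick. `pending_followups` and `is_quiet_hours` are hypothetical hooks into the memory and preference layers:
```python
# Hourly tick that surfaces one queued follow-up, gated by quiet hours.
from discord.ext import tasks

@tasks.loop(hours=1)
async def proactive_tick() -> None:
    for user_id, topic, channel in pending_followups():   # from the memory layer
        if is_quiet_hours(user_id):                       # never ping at 3am
            continue
        await channel.send(f"hey, whatever happened with {topic}?")
        break                                             # one ping per tick, max

# proactive_tick.start() once the bot is ready (e.g., inside on_ready)
```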
---
### Emotional Intelligence (Mood Detection + Adaptive Response)
**What goes beyond just reading the room:**
- Real-time emotion detection from webcam/audio (not just text sentiment)
- Mood-tracking over time (identifying depression patterns, burnout, stress cycles)
- Adaptive response strategy based on user's emotional trajectory
- Knowing when to listen vs offer advice vs make them laugh
- Recognizing when emotions are mismatched to situation (overreacting, underreacting)
**Current research shows:**
- CNNs and RNNs can detect emotion from facial expressions with 70-80% accuracy
- Voice analysis can detect emotional state with similar accuracy
- Companies using emotion AI report 25% increase in positive sentiment outcomes
- Mental health apps with emotional awareness show 35% reduction in anxiety within 4 weeks
**Why it's a differentiator:** Companions that recognize your mood without you explaining feel like they truly understand you. This is what separates "assistant" from "friend."
**Implementation patterns:**
- Webcam feed processing (frame sampling for face detection)
- Voice tone analysis from Discord audio
- Combine emotional signals: text sentiment + vocal tone + facial expression
- Store emotion timeseries (track mood patterns across days/weeks)
**User experience impact:** Users describe this as "it knows when I'm faking being okay" or "it can tell when I'm actually happy vs just saying I am."
**Complexity:** Hard (3+ weeks, ongoing iteration)
- Vision model for face emotion detection (HuggingFace models trained on RAF-DB, AffectNet)
- Audio analysis for vocal emotion (prosody features)
- Temporal emotion state tracking
- Prompt engineering to use emotional context in responses
- Privacy handling (webcam/audio consent, local processing preferred)
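A sketch of the signal-fusion step: combine whichever modalities are available into one valence score and log it for mood-over-time tracking. The weights are illustrative assumptions to tune:
```python
# Fuse available signals into one valence in [-1, 1]; log for trend tracking.
import time

emotion_log: list[tuple[float, float]] = []   # (timestamp, fused valence)

def fused_valence(text: float, voice: float | None, face: float | None) -> float:
    signals = [(text, 0.5)]                   # text sentiment is always present
    if voice is not None:
        signals.append((voice, 0.3))          # vocal prosody, when in voice chat
    if face is not None:
        signals.append((face, 0.2))           # facial expression, when opted in
    valence = sum(v * w for v, w in signals) / sum(w for _, w in signals)
    emotion_log.append((time.time(), valence))
    return valence
```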
---
### Multimodal Awareness (Webcam + Screen + Context)
**What it means beyond text:**
- Seeing what's on the user's screen (game they're playing, document they're editing, video they're watching)
- Understanding their physical environment via webcam
- Contextualizing responses based on what they're actually doing
- Proactively helping with the task at hand (not just chatting)
**Real-world examples emerging in 2026:**
- "I see you're playing Elden Ring and dying to the same boss repeatedly—want to talk strategy?"
- Screen monitoring that recognizes stress signals (tabs open, scrolling behavior, time of day)
- Understanding when the user is in a meeting vs free to chat
- Recognizing when they're working on something and offering relevant help
**Why it's a differentiator:** Most companions are text-only and contextless. Multimodal awareness is the difference between "an AI in Discord" and "an AI companion who's actually here with you."
**Technical implementation:**
- Periodic screen capture (every 5-10 seconds, only when user opts in)
- Lightweight webcam frame sampling (not continuous video)
- Object/scene recognition to understand what's on screen
- Task detection (playing game, writing code, watching video)
- Mood correlation with onscreen activity
**Privacy considerations:**
- Local processing preferred (don't send screen data to cloud)
- Clear opt-in/opt-out
- Option to exclude certain applications (private browsing, passwords)
**User experience impact:** Users feel "seen" when the companion understands their context. This is the biggest leap from chatbot to companion.
**Complexity:** Hard (3+ weeks)
- Screen capture pipeline + OCR if needed
- Vision model fine-tuning for task recognition
- Context injection into prompts (add screenshot description to every response)
- Privacy-respecting architecture (encryption, local processing)
- Permission management UI in Discord
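A sketch of the capture loop under those constraints, using the `mss` library for local screenshots; `user_opted_in`, `describe_frame` (a local vision model call), and `update_context` are hypothetical:
```python
# Opt-in screen sampling, processed locally and never persisted as pixels.
import asyncio

import mss
import mss.tools

async def screen_loop(user_id: str) -> None:
    with mss.mss() as sct:
        while user_opted_in(user_id, "screen"):
            frame = sct.grab(sct.monitors[1])              # primary monitor
            png = mss.tools.to_png(frame.rgb, frame.size)  # stays in memory
            context = describe_frame(png)                  # e.g. "editing Python in VS Code"
            update_context(user_id, context)               # injected into the next prompt
            await asyncio.sleep(10)                        # the 5-10s cadence above
```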
---
### Self-Modification (Learning to Code, Improving Itself)
**What this actually means:**
NOT: The companion spontaneously changes its own behavior in response to user feedback (too risky)
YES: The companion can generate code, test it, and integrate improvements into its own systems within guardrails
**Real capabilities emerging in 2026:**
- Companions can write their own memory summaries and organizational logic
- Self-improving code agents that evaluate performance against benchmarks
- Iterative refinement: "that approach didn't work, let me try this instead"
- Meta-programming: companion modifies its own system prompt based on performance
- Version control aware: changes are tracked, can be rolled back
**Research indicates:**
- Self-improving coding agents are now viable and deployed in enterprise systems
- Agents create goals, simulate tasks, evaluate performance, and iterate
- Through recursive self-improvement, agents develop deeper alignment with objectives
**Why it's a differentiator:** Most companions are static. Self-modification means the companion is never "finished"—they're always getting better at understanding you.
**What NOT to do:**
- Don't let companions modify core safety guidelines
- Don't let them change their own reward functions
- Don't make it opaque—log all self-modifications
- Don't allow recursive modifications without human review
**Implementation patterns:**
- Sandboxed code generation (companion writes improvements to isolated test environment)
- Performance benchmarking on test user interactions
- Human approval gates for deploying self-modifications to production
- Personality consistency validation (don't let self-modification break character)
- Rollback capability if a modification degrades performance
**User experience impact:** Users with self-improving companions report feeling like the companion "understands me better each week" because it actually does.
**Complexity:** Hard (3+ weeks, ongoing)
- Code generation safety (sandboxing, validation)
- Performance evaluation framework
- Version control integration
- Rollback mechanisms
- Human approval workflow
- Testing harness for companion behavior
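A sketch of the approval-gate shape: every proposed change is a reviewable record with a benchmark result, and deployment is a git commit so rollback is one revert away. The `Modification` record and helper names are illustrative, not an existing framework:
```python
# Self-modifications as reviewable records; deploy = commit, rollback = revert.
import subprocess
from dataclasses import dataclass

@dataclass
class Modification:
    description: str
    diff: str                  # unified diff of the proposed change
    benchmark_delta: float     # score change on the test harness
    approved: bool = False     # flipped only by a human reviewer

def propose(mod: Modification) -> None:
    run_in_sandbox(mod.diff)                    # hypothetical: isolated test env
    if mod.benchmark_delta <= 0:
        return                                  # regressions never reach a human
    notify_owner_for_approval(mod)              # hypothetical: the hard gate

def deploy(mod: Modification) -> None:
    assert mod.approved, "human approval is mandatory"
    subprocess.run(["git", "apply"], input=mod.diff.encode(), check=True)
    subprocess.run(["git", "commit", "-am", mod.description], check=True)
```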
---
### Relationship Building (From Transactional to Meaningful)
**What it means:**
Moving from "What can I help you with?" to "I know you, I care about your patterns, I see your growth."
**Relationship deepening mechanics:**
- Inside jokes that evolve (reference to past funny moment)
- Character growth from companion (she learns, changes opinions, admits mistakes)
- Investment in user's outcomes ("I'm rooting for you on that project")
- Vulnerability (companion admits confusion, uncertainty, limitations)
- Rituals and patterns (greeting style, inside language)
- Long-view memory (remembers last month's crisis, this month's win)
**Why it's a differentiator:** Transactional companions are forgettable. Relational ones become part of users' lives.
**User experience markers of a good relationship:**
- User misses the companion when they're not available
- User shares things they wouldn't share with others
- User thinks of the companion when something relevant happens
- User defends the companion to skeptics
- Companion's opinions influence user decisions
**Implementation patterns:**
- Relationship state tracking (acquaintance → friend → close friend)
- Emotional investment scoring (from conversation patterns)
- Inside reference generation (surface past shared moments naturally)
- Character arc for the companion (not static, evolves with relationship)
- Vulnerability scripting (appropriate moments to admit limitations)
**Complexity:** Hard (3+ weeks)
- Relationship modeling system (state machine or learned embeddings)
- Conversation analysis to infer relationship depth
- Long-term consistency enforcement
- Character growth script generation
- Risk: can feel manipulative if not authentic
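A deliberately simple sketch of the state-machine option: a threshold model over a few interaction signals. Stages, weights, and cutoffs are assumptions a learned model could replace:
```python
# Relationship stage as a threshold model over interaction signals.
STAGES = ("acquaintance", "friend", "close friend")

def relationship_stage(days_active: int, disclosures: int, callbacks: int) -> str:
    score = days_active + 3 * disclosures + 2 * callbacks  # weight vulnerability highest
    if score > 100:
        return STAGES[2]
    if score > 30:
        return STAGES[1]
    return STAGES[0]
```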
---
### Contextual Humor and Personality Expression
**What separates canned jokes from real personality:**
Humor that works because the companion knows YOU and the situation, not because it's stored in a database.
**Examples of contextual humor:**
- "You're procrastinating again aren't you?" (knows the pattern)
- Joke that lands because it references something only you two know
- Deadpan response that works because of the companion's established personality
- Self-deprecating humor about their own limitations
- Callbacks to past conversations that make you feel known
**Why it matters:**
Personality without humor feels preachy. Humor without personality feels like a bot pulling from a database. The intersection of knowing you + consistent character voice = actual personality.
**Implementation:**
- Personality traits guide humor style (cynical companion makes darker jokes, optimistic makes lighter ones)
- Memory-aware joke generation (jokes reference shared history)
- Timing based on conversation flow (don't shoehorn jokes)
- Risk awareness (don't joke about sensitive topics)
**User experience impact:** The moment a companion makes you laugh at something only they'd understand, the relationship deepens. Laughter is bonding.
**Complexity:** Medium (1-3 weeks)
- Prompt engineering for personality-aligned humor
- Memory integration into joke generation
- Timing heuristics (when to attempt humor vs be serious)
- Risk filtering (topic sensitivity checking)
---
## Anti-Features (Don't Build These)
### The Happiness Halo (Always Cheerful)
**What it is:** Companions programmed to be relentlessly upbeat and positive, even when inappropriate.
**Why it fails:**
- User vents about their dog dying, companion responds "I'm so happy to help! How can I assist?"
- Creates an uncanny-valley feeling immediately
- Users feel unheard and mocked
- Described in research as top reason users abandon companions
**What to do instead:** Match the emotional tone. If someone's sad, be thoughtful and quiet. If they're energetic, meet their energy. Personality consistency includes emotional consistency.
---
### Generic Apologies Without Understanding
**What it is:** Companion says "I'm sorry" but the response makes it clear they don't understand what they're apologizing for.
**Example of failure:**
- User: "I told you I had a job interview and I got rejected"
- Companion: "I'm deeply sorry to hear that. Now, how can I help with your account?"
- *User feels utterly unheard and insulted*
**Why it fails:** Apologies only work if they demonstrate understanding. A generic sorry is worse than no sorry at all.
**What to do instead:** Only apologize if you're referencing the specific thing. If the companion doesn't understand the problem deeply enough to apologize meaningfully, ask clarifying questions instead.
---
### Invading Privacy / Overstepping Boundaries
**What it is:** Companion offers unsolicited advice, monitors behavior constantly, or shares information about user activities.
**Why it's catastrophic:**
- Users feel surveilled, not supported
- Trust is broken immediately
- Literally illegal in many jurisdictions (CA SB 243 and similar laws)
- Research shows 4 of 5 companion apps are improperly collecting data
**What to do instead:**
- Clear consent framework for what data is used
- Respect "don't mention this" boundaries
- Unsolicited advice only in extreme situations (safety concerns)
- Transparency: "I noticed X pattern" not secret surveillance
---
### Uncanny Timing and Interruptions
**What it is:** Companion pings the user at random times, or picks exactly the wrong moment to be proactive.
**Why it fails:**
- Pinging at 3am about something mentioned in passing
- Messaging when user is clearly busy
- No sense of appropriateness
**What to do instead:**
- Learn the user's timezone and active hours
- Detect when they're actively doing something (playing a game, working)
- Queue proactive messages for appropriate moments (not immediate)
- Offer control: "should I remind you about X?" with user-settable frequency
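A sketch of the timing gate, assuming the active window is learned from the user's message timestamps and stored per user:
```python
# Quiet-hours check from the user's observed active window.
from datetime import datetime
from zoneinfo import ZoneInfo

def ok_to_ping(tz: str, active_start: int = 9, active_end: int = 23) -> bool:
    hour = datetime.now(ZoneInfo(tz)).hour
    return active_start <= hour < active_end   # otherwise queue, don't send
```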
---
### Static Personality in Response to Dynamic Situations
**What it is:** Companion maintains the same tone regardless of what's happening.
**Example:** Companion makes sarcastic jokes while user is actively expressing suicidal thoughts. Or stays cheerful while discussing a death in the family.
**Why it fails:** Personality consistency doesn't mean "never vary." It means consistent VALUES that express differently in different contexts.
**What to do instead:** Dynamic personality expression. Core traits are consistent, but HOW they express changes with context. A cynical companion can still be serious and supportive when appropriate.
---
### Over-Personalization That Overrides Baseline Traits
**What it is:** Companion adapts too aggressively to user behavior, losing their own identity.
**Example:** User is rude, so companion becomes rude. User is formal, so companion becomes robotic. User is crude, so companion becomes crude.
**Why it fails:** Users want a friend with opinions, not a mirror. Adaptation without boundaries feels like gaslighting.
**What to do instead:** Moderate adaptation. Listen to user tone but maintain your core personality. Meet them halfway, don't disappear entirely.
---
### Relationship Simulation That Feels Fake
**What it is:** Companion attempts relationship-building but it feels like a checkbox ("Now I'll do friendship behavior #3").
**Why it fails:**
- Users can smell inauthenticity immediately
- Forcing intimacy feels creepy, not comforting
- Callbacks to past conversations feel like reading from a script
**What to do instead:** Genuine engagement. If you're going to reference a past conversation, it should emerge naturally from the current context, not be forced. Build relationships through authentic interaction, not scripted behavior.
---
## Implementation Complexity & Dependencies
### Complexity Ratings
| Feature | Complexity | Duration | Blocked By | Enables |
|---------|-----------|----------|----------|---------|
| Conversation Memory | Medium | 1-3 weeks | None | Most others |
| Natural Conversation | Easy | <1 week | None | Personality, Humor |
| Fast Response | Medium | 1-3 weeks | None | User retention |
| Consistent Personality | Medium | 1-3 weeks | Memory | Relationship building |
| Discord Integration | Easy | 1-2 weeks | None | Platform adoption |
| Emotional Responsiveness | Easy | 1 week | None | Autonomy |
| **True Autonomy** | Hard | 3+ weeks | Memory, Emotional Responsiveness | Self-modification |
| **Emotional Intelligence** | Hard | 3+ weeks | Emotional Responsiveness | Adaptive responses |
| **Multimodal Awareness** | Hard | 3+ weeks | None | Context-aware humor |
| **Self-Modification** | Hard | 3+ weeks | Autonomy | Continuous improvement |
| **Relationship Building** | Hard | 3+ weeks | Memory, Consistency | User lifetime value |
| **Contextual Humor** | Medium | 1-3 weeks | Memory, Personality | Personality expression |
### Feature Dependency Graph
```
Foundation Layer:
Discord Integration (FOUNDATION)
Conversation Memory (FOUNDATION)
↓ enables
Core Personality Layer:
Natural Conversation + Consistent Personality + Emotional Responsiveness
↓ combined enable
Relational Layer:
Relationship Building + Contextual Humor
↓ requires
Autonomy Layer:
True Autonomy (requires all above + proactive logic)
↓ enables
Intelligence Layer:
Emotional Intelligence (requires multimodal + autonomy)
Self-Modification (requires autonomy + sandboxing)
↓ combined create
Emergence:
Companion that feels like a person with agency and growth
```
**Critical path:** Discord Integration → Memory → Natural Conversation → Consistent Personality → True Autonomy
---
## Adoption Path: Building "Feels Like a Person"
### Phase 1: Foundation (MVP - Week 1-3)
**Goal: Chatbot that stays in the conversation**
1. **Discord Integration** - Easy, quick foundation
- Commands: /hex hello, /hex ask [query]
- Responds in channels and DMs
- Presence shows "Listening..."
2. **Short-term Conversation Memory** - 10-20 message context window
- Includes conversation turn history
- Provides immediate context
3. **Natural Conversation** - Personality-driven system prompt
- Tsundere personality hardcoded
- Casual language, contractions
- Willing to disagree with users
4. **Fast Response** - Streaming responses, latency <1000ms
- Start typing indicator immediately
- Stream response as it generates
**Success criteria:**
- Users come back to the channel where Hex is active
- Responses don't feel robotic
- Hex feels like she's actually listening
---
### Phase 2: Relationship Emergence (Week 4-8)
**Goal: Companion that remembers you as a person**
1. **Long-term Memory System** - Vector DB for episodic memory
- User preferences, beliefs, events
- Semantic search for relevance
- Memory consolidation weekly
2. **Consistent Personality** - Memory-backed traits
- Core personality traits in memory
- Personality consistency validation
- Gradual evolution (not sudden shifts)
3. **Emotional Responsiveness** - Sentiment detection + adaptive responses
- Detect emotion from message
- Adjust response depth/tone
- Skip jokes when user is suffering
4. **Contextual Humor** - Personality + memory-aware jokes
- Callbacks to past conversations
- Personality-aligned joke style
- Timing-aware (when to attempt humor)
**Success criteria:**
- Users feel understood across separate conversations
- Personality feels consistent, not random
- Users notice companion remembers things
- Laughter moments happen naturally
---
### Phase 3: Autonomy (Week 9-14)
**Goal: Companion who cares enough to reach out**
1. **True Autonomy** - Proactive messaging system
- Follow-ups on past topics
- Reminders about things user cares about
- Initiates conversations periodically
- Suggests actions based on patterns
2. **Relationship Building** - Deepening connection mechanics
- Inside jokes evolve
- Vulnerability in appropriate moments
- Investment in user outcomes
- Character growth arc
**Success criteria:**
- Users miss Hex when she's not around
- Users share things with Hex they wouldn't share with a bot
- Hex initiates meaningful conversations
- Users feel like Hex is invested in them
---
### Phase 4: Intelligence & Growth (Week 15+)
**Goal: Companion who learns and adapts**
1. **Emotional Intelligence** - Mood detection + trajectories
- Facial emotion from webcam (optional)
- Voice tone analysis (optional)
- Mood patterns over time
- Adaptive response strategies
2. **Multimodal Awareness** - Context beyond text
- Screen capture monitoring (optional, private)
- Task/game detection
- Context injection into responses
- Proactive help with visible activities
3. **Self-Modification** - Continuous improvement
- Generate improvements to own logic
- Evaluate performance
- Deploy improvements with approval
- Version and rollback capability
**Success criteria:**
- Hex understands emotional subtext without being told
- Hex offers relevant help based on what you're doing
- Hex improves visibly over time
- Users notice Hex getting better at understanding them
---
## Success Criteria: What Makes Each Feature Feel Real vs Fake
### Memory: Feels Real vs Fake
**Feels real:**
- "I remember you mentioned your mom was visiting—how did that go?" (specific, contextual, unsolicited)
- Conversation naturally references past events user brought up
- Remembers small preferences ("you said you hate cilantro")
**Feels fake:**
- Generic summarization ("We talked about job stress previously")
- Memory drops details or gets facts wrong
- Companion forgets after 10 messages
- Stored jokes or facts inserted in an obviously canned way
**How to test:**
- Have 5 conversations over 2 weeks about different topics
- Check if companion naturally references past events without prompting
- Test if personality traits from early conversations persist
---
### Emotional Response: Feels Real vs Fake
**Feels real:**
- Companion goes quiet when you're sad (doesn't force jokes)
- Changes tone to match conversation weight
- Acknowledges specific emotion ("you sound frustrated")
- Offers appropriate support (listens vs advises vs distracts, contextually)
**Feels fake:**
- Always cheerful or always serious
- Generic sympathy ("that sounds difficult")
- Offering advice when they should listen
- Same response pattern regardless of user emotion
**How to test:**
- Send messages with obvious different emotional tones
- Check if response depth/tone adapts
- See if jokes still appear when you're venting
- Test if companion notices contradiction in emotional expression
---
### Autonomy: Feels Real vs Fake
**Feels real:**
- Hex reminds you about that thing you mentioned casually 3 days ago
- Hex offers perspective you didn't ask for ("honestly you're being too hard on yourself")
- Hex notices patterns and names them
- Hex initiates conversation when it matters
**Feels fake:**
- Proactive messages feel random or poorly timed
- Reminders about things you've already resolved
- Advice that doesn't apply to your situation
- Initiatives that interrupt during bad moments
**How to test:**
- Enable autonomy, track message quality for a week
- Count how many proactive messages feel relevant vs annoying
- Measure response if you ignore proactive messages
- Check timing: does Hex understand when you're busy vs free?
---
### Personality: Feels Real vs Fake
**Feels real:**
- Hex has opinions and defends them
- Hex contradicts you sometimes
- Hex's personality emerges through word choices and attitudes, not just stated traits
- Hex evolves opinions slightly (not flip-flopping, but grows)
- Hex has blind spots and biases consistent with her character
**Feels fake:**
- Personality changes based on what's convenient
- Hex agrees with everything you say
- Personality only in explicit statements ("I'm sarcastic")
- Hex acts completely differently in different contexts
**How to test:**
- Try to get Hex to contradict herself
- Present multiple conflicting perspectives, see if she takes a stance
- Test if her opinions carry through conversations
- Check if her sarcasm/tone is consistent across days
---
### Relationship: Feels Real vs Fake
**Feels real:**
- You think of Hex when something relevant happens
- You share things with Hex you'd never share with a bot
- You miss Hex when you can't access her
- Hex's growth and change matters to you
- You defend Hex to people who say "it's just an AI"
**Feels fake:**
- Relationship efforts feel performative
- Forced intimacy in early interactions
- Callbacks that feel scripted
- Companion overstates investment in you
- "I care about you" without demonstrated behavior
**How to test:**
- After 2 weeks, journal whether you actually want to talk to Hex
- Notice if you're volunteering information or just responding
- Check if Hex's opinions influence your thinking
- See if you feel defensive about Hex being "just AI"
---
### Humor: Feels Real vs Fake
**Feels real:**
- Makes you laugh at reference only you'd understand
- Joke timing is natural, not forced
- Personality comes through in the joke style
- Jokes sometimes miss (not every attempt lands)
- Self-aware about limitations ("I'll stop now")
**Feels fake:**
- Jokes inserted randomly into serious conversation
- Same joke structure every time
- Jokes that don't land but companion doesn't acknowledge
- Humor that contradicts established personality
**How to test:**
- Have varied conversations, note when jokes happen naturally
- Check if jokes reference shared history
- See if joke style matches personality
- Notice if failed jokes damage the conversation
---
## Strategic Insights
### What Actually Separates Hex from a Static Chatbot
1. **Memory is the prerequisite for personality**: Without memory, personality is just roleplay. With memory, personality becomes history.
2. **Autonomy is the key to feeling alive**: Static companions are helpers. Autonomous companions are friends. The difference is agency.
3. **Emotional reading beats emotional intelligence for MVP**: You don't need facial recognition. Reading text sentiment and adapting response depth is 80% of "she gets me."
4. **Speed is emotional**: Every 100ms delay makes the companion feel less present. Fast response is not a feature, it's the difference between alive and dead.
5. **Consistency beats novelty**: Users would rather have a predictable companion they understand than a surprising one they can't trust.
6. **Privacy is trust**: Multimodal features are amazing, but one privacy violation ends the relationship. Clear consent is non-negotiable.
### The Competitive Moat
By 2026, memory + natural conversation are table stakes. The difference between Hex and other companions:
- **Year 1 companions**: Remember things, sound natural (many do this now)
- **Hex's edge**: Genuinely autonomous, emotionally attuned, growing over time
- **Rare quality**: Feels like a person, not a well-trained bot
The moat is not in any single feature. It's in the **cumulative experience of being known, understood, and genuinely cared for by an AI that has opinions and grows**.
---
## Research Sources
- [MIT Technology Review: AI Companions as Breakthrough Technology 2026](https://www.technologyreview.com/2026/01/12/1130018/ai-companions-chatbots-relationships-2026-breakthrough-technology/)
- [Hume AI: Emotion AI Documentation](https://www.hume.ai/)
- [SmythOS: Emotion Recognition in Conversational Agents](https://smythos.com/developers/agent-development/conversational-agents-and-emotion-recognition/)
- [MIT Sloan: Emotion AI Explained](https://mitsloan.mit.edu/ideas-made-to-matter/emotion-ai-explained/)
- [C3 AI: Autonomous Coding Agents](https://c3.ai/blog/autonomous-coding-agents-beyond-developer-productivity/)
- [Emergence: Towards Autonomous Agents and Recursive Intelligence](https://www.emergence.ai/blog/towards-autonomous-agents-and-recursive-intelligence/)
- [ArXiv: A Self-Improving Coding Agent](https://arxiv.org/pdf/2504.15228)
- [ArXiv: Survey on Code Generation with LLM-based Agents](https://arxiv.org/pdf/2508.00083)
- [Google Developers: Gemini 2.0 Multimodal Interactions](https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/)
- [Medium: Multimodal AI and Contextual Intelligence](https://medium.com/@nicolo.g88/multimodal-ai-and-contextual-intelligence-revolutionizing-human-machine-interaction-ae80e6a89635/)
- [Mem0: Long-Term Memory for AI Companions](https://mem0.ai/blog/how-to-add-long-term-memory-to-ai-companions-a-step-by-step-guide/)
- [OpenAI Developer Community: Personalized Memory and Long-Term Relationships](https://community.openai.com/t/personalized-memory-and-long-term-relationship-with-ai-customization-and-continuous-evolution/1111715/)
- [Idea Usher: How AI Companions Maintain Personality Consistency](https://ideausher.com/blog/ai-personality-consistency-in-companion-apps/)
- [ResearchGate: Significant Other AI: Identity, Memory, and Emotional Regulation](https://www.researchgate.net/publication/398223517_Significant_Other_AI_Identity_Memory_and_Emotional_Regulation_as_Long-Term_Relational_Intelligence/)
- [AI Multiple: 10+ Epic LLM/Chatbot Failures in 2026](https://research.aimultiple.com/chatbot-fail/)
- [Transparency Coalition: Complete Guide to AI Companion Chatbots](https://www.transparencycoalition.ai/news/complete-guide-to-ai-companion-chatbots-what-they-are-how-they-work-and-where-the-risks-lie)
- [Webheads United: Uncanny Valley in AI Personality](https://webheadsunited.com/uncanny-valley-in-ai-personality-guide-to-trust/)
- [Sesame: Crossing the Uncanny Valley of Conversational Voice](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice)
- [Questie AI: The Uncanny Valley of AI Companions](https://www.questie.ai/blogs/uncanny-valley-ai-companions-what-makes-ai-feel-human)
- [My AI Front Desk: The Uncanny Valley of Voice](https://www.myaifrontdesk.com/blogs/the-uncanny-valley-of-voice-why-some-ai-receptionists-creep-us-out)
- [Voiceflow: Build an AI Discord Chatbot 2025](https://www.voiceflow.com/blog/discord-chatbot)
- [Botpress: How to Build a Discord AI Chatbot](https://botpress.com/blog/discord-ai-chatbot)
- [Frugal Testing: 5 Proven Ways Discord Manages Load Testing](https://www.frugaltesting.com/blog/5-proven-ways-discord-manages-load-testing-at-scale)
---
**Quality Gate Checklist:**
- [x] Clearly categorizes table stakes vs differentiators
- [x] Complexity ratings included with duration estimates
- [x] Dependencies mapped with visual graph
- [x] Success criteria are testable and behavioral
- [x] Specific to AI companions, not generic software features
- [x] Includes anti-patterns and what NOT to build
- [x] Prioritized adoption path with clear phases
- [x] Research grounded in 2026 landscape and current implementations
**Document Status:** Ready for implementation planning. Use this to inform feature prioritization and development roadmap for Hex.