docs: complete domain research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)

## Stack Analysis
- Llama 3.1 8B Instruct (128K context, 4-bit quantized)
- Discord.py 2.6.4+ async-native framework
- Ollama for local inference, ChromaDB for semantic memory
- Whisper Large V3 + Kokoro 82M (privacy-first speech)
- VRoid avatar + Discord screen share integration

## Architecture
- 6-phase modular build: Foundation → Personality → Perception → Autonomy → Self-Mod → Polish
- Personality-first design; memory and consistency foundational
- All perception async (separate thread, never blocks responses)
- Self-modification sandboxed with mandatory user approval

## Critical Path
Phase 1: Core LLM + Discord integration + SQLite memory
Phase 2: Vector DB + personality versioning + consistency audits
Phase 3: Perception layer (webcam/screen, isolated thread)
Phase 4: Autonomy + relationship deepening + inside jokes
Phase 5: Self-modification capability (gamified, gated)
Phase 6: Production hardening + monitoring + scaling

## Key Pitfalls to Avoid
1. Personality drift (weekly consistency audits required)
2. Tsundere breaking (formalize denial rules; scale with relationship)
3. Memory bloat (hierarchical memory with archival)
4. Latency creep (async/await throughout; perception isolated)
5. Runaway self-modification (approval gates + rollback non-negotiable)

## Confidence
HIGH. Stack proven, architecture coherent, dependencies clear.
Ready for detailed requirements and Phase 1 planning.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
# Features Research: AI Companions in 2026
## Executive Summary
AI companions in 2026 live in a post-ChatGPT world where basic conversation is table stakes. The competition separates on **autonomy**, **emotional intelligence**, and **contextual awareness**. Users will abandon companions that feel robotic, inconsistent, or that don't remember them. The winning companions feel like they have opinions, moods, and agency—not just responsive chatbots with personality overlays.
---
## Table Stakes (v1 Essential)
### Conversation Memory (Short + Long-term)
**Why users expect it:** Users return to AI companions because they don't want to re-explain themselves every time. Without memory, the companion feels like meeting a stranger repeatedly.
**Implementation patterns:**
- **Short-term context**: Last 10-20 messages per conversation window (standard context window management)
- **Long-term memory**: Explicit user preferences, important life events, repeated topics (stored in vector DB with semantic search)
- **Episodic memory**: Date-stamped summaries of past conversations for temporal awareness
**User experience impact:** The moment a user says "remember when I told you about..." and the companion forgets, trust is broken. Memory is not optional.
**Complexity:** Medium (1-3 weeks)
- Vector database integration (Pinecone, Weaviate, or similar)
- Memory consolidation strategies to avoid context bloat
- Retrieval mechanisms that surface relevant past interactions
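A minimal sketch of the long-term pattern above, using ChromaDB (the vector store named in the stack analysis) with its default embedder; the collection name and the `remember`/`recall` helpers are illustrative, not an existing API:
```python
# Minimal long-term memory layer on ChromaDB. remember/recall are illustrative.
import time
import uuid

import chromadb

client = chromadb.PersistentClient(path="./hex_memory")
memories = client.get_or_create_collection("episodic_memory")

def remember(user_id: str, text: str, kind: str = "event") -> None:
    """Store one memory with enough metadata to filter and date it later."""
    memories.add(
        ids=[str(uuid.uuid4())],
        documents=[text],
        metadatas=[{"user_id": user_id, "kind": kind, "ts": time.time()}],
    )

def recall(user_id: str, query: str, k: int = 5) -> list[str]:
    """Semantic search over one user's memories, for prompt injection."""
    hits = memories.query(query_texts=[query], n_results=k, where={"user_id": user_id})
    return hits["documents"][0] if hits["documents"] else []
```
Consolidation (summarizing and archiving stale entries) would run as a separate job so the store never bloats the context window.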
---
### Natural Conversation (Not Robotic, Personality-Driven)
**Why users expect it:** Discord culture has trained users to spot AI-speak instantly. Responses that sound like "I'm an AI language model and I can help you with..." are cringe-inducing. Users want friends, not helpdesk bots.
**What makes conversation natural:**
- Contractions, casual language, slang (not formal prose)
- Personality quirks in response patterns
- Context-appropriate tone shifts (serious when needed, joking otherwise)
- Ability to disagree, be sarcastic, or push back on bad ideas
- Conversation markers ("honestly", "wait", "actually") that break up formal rhythm
**User experience impact:** One stiff response breaks immersion. Users quickly categorize companions as "robot" or "friend" and the robot companions get ignored.
**Complexity:** Easy (embedded in LLM capability + prompt engineering)
- System prompt refinement for personality expression
- Temperature/sampling tuning (not deterministic, not chaotic)
- Iterative user feedback on tone
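A minimal sketch of the prompt-plus-sampling approach, assuming the Ollama Python client and a 4-bit Llama 3.1 tag per the stack analysis; the persona text, model tag, and option values are assumptions to iterate on with user feedback:
```python
# Personality lives in the system prompt; sampling keeps replies lively but
# coherent. Persona text, model tag, and option values are assumptions.
import ollama

PERSONA = (
    "You are Hex. Talk casually, use contractions, keep replies short. "
    "You have opinions and you push back on bad ideas instead of agreeing. "
    "Never call yourself an AI language model or offer to 'assist'."
)

def reply(history: list[dict]) -> str:
    response = ollama.chat(
        model="llama3.1:8b-instruct-q4_K_M",  # 4-bit tag per the stack analysis
        messages=[{"role": "system", "content": PERSONA}, *history],
        options={"temperature": 0.8, "top_p": 0.9},  # not deterministic, not chaotic
    )
    return response["message"]["content"]
```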
---
### Fast Response Times
**Why users expect it:** In Discord, response delay is perceived as disinterest. Users expect replies within 1-3 seconds. Anything above 5 seconds feels dead.
**Discord baseline expectations:**
- <100ms to acknowledge (typing indicator)
- <1000ms to first response chunk (ideally 500ms)
- <3000ms for full multi-line response
**What breaks the experience:**
- Waiting for API calls to complete before responding (use streaming)
- Cold starts on serverless infrastructure
- Slow vector DB queries for memory retrieval
- Database round-trips that weren't cached
**User experience impact:** Slow companions feel dead. Users stop engaging. The magic of a responsive AI is that it feels alive.
**Complexity:** Medium (1-3 weeks)
- Response streaming (start typing indicator immediately)
- Memory retrieval optimization (caching, smart indexing)
- Infrastructure: fast API routes, edge-deployed models if possible
- Async/concurrent processing of memory lookups and generation
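A sketch of the streaming pattern with discord.py: acknowledge with a typing indicator immediately, send the first chunk as soon as it exists, then batch further output into edits of that one message. `stream_llm` is a hypothetical async generator yielding text chunks from the model:
```python
# Acknowledge instantly, ship the first chunk fast, then batch edits.
import discord

async def respond(message: discord.Message, prompt: str) -> None:
    async with message.channel.typing():               # <100ms acknowledgement
        sent, buffer, flushed = None, "", 0
        async for chunk in stream_llm(prompt):         # hypothetical LLM stream
            buffer += chunk
            if sent is None:
                sent = await message.channel.send(buffer)   # first chunk out fast
                flushed = len(buffer)
            elif len(buffer) - flushed > 100:          # batch edits: Discord rate-limits
                await sent.edit(content=buffer)
                flushed = len(buffer)
        if sent is None and buffer:
            await message.channel.send(buffer)
        elif sent and len(buffer) > flushed:
            await sent.edit(content=buffer)            # final flush
```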
---
### Consistent Personality
**Why users expect it:** Personality drift destroys trust. If the companion is cynical on Monday but optimistic on Friday without reason, users feel gaslighted.
**What drives inconsistency:**
- Different LLM outputs from same prompt (temperature-based randomness)
- Memory that contradicts previous stated beliefs
- Personality traits that aren't memory-backed (just in prompt)
- Adaptation that overrides baseline traits
**Memory-backed personality means:**
- Core traits are stated in long-term memory ("I'm cynical about human nature")
- Evolution happens slowly and is logged ("I'm becoming less cynical about this friend")
- Contradiction detection and resolution
- Personality summaries that get updated, not just individual memories
**User experience impact:** Personality inconsistency is the top reason users stop using companions. It feels like gaslighting when you can't predict their response.
**Complexity:** Medium (1-3 weeks)
- Personality embedding in memory system
- Consistency checks on memory updates
- Personality evolution logging
- Conflict resolution between new input and stored traits
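One way to wire the consistency check, sketched on top of the `remember`/`recall` helpers above; `llm_yes_no` (an LLM judge call) and `log_conflict` (a review queue) are hypothetical:
```python
# Gate personality updates behind a contradiction check against stored traits.
# llm_yes_no and log_conflict are hypothetical helpers.
def update_trait(new_trait: str) -> bool:
    for old in recall(user_id="hex:self", query=new_trait, k=3):
        if llm_yes_no(f"Does '{new_trait}' contradict '{old}'?"):
            log_conflict(old, new_trait)   # surface for resolution, don't overwrite
            return False                   # reject sudden shifts; evolution is gradual
    remember(user_id="hex:self", text=new_trait, kind="trait")
    return True
```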
---
### Platform Integration (Discord Voice + Text)
**Why users expect it:** The companion should live naturally in Discord's ecosystem, not require switching platforms.
**Discord-specific needs:**
- Text channel message responses with proper mentions/formatting
- React to messages with emojis
- Slash command integration (/hex status, /hex mood)
- Voice channel presence (ideally can join and listen)
- Direct messages (DMs) for private conversations
- Role/permission awareness (don't act like a mod if not)
- Server-specific personality variations (different vibe in gaming server vs study server)
**User experience impact:** If the companion requires leaving Discord to use it, it won't be used. Integration friction = abandoned feature.
**Complexity:** Easy (1-2 weeks)
- Discord.py or discord.js library handling
- Presence/activity management
- Voice endpoint integration (existing libraries handle most)
- Server context injection into prompts
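A minimal slash-command sketch with discord.py's `app_commands`; `current_mood()` and `status_line()` are hypothetical helpers from the personality layer:
```python
# /hex mood and /hex status as a discord.py command group.
import discord
from discord import app_commands

intents = discord.Intents.default()
intents.message_content = True            # required to read channel messages
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)
hex_group = app_commands.Group(name="hex", description="Talk to Hex")

@hex_group.command(name="mood", description="Ask how Hex is feeling")
async def mood(interaction: discord.Interaction) -> None:
    await interaction.response.send_message(current_mood())

@hex_group.command(name="status", description="What is Hex up to?")
async def status(interaction: discord.Interaction) -> None:
    await interaction.response.send_message(status_line())

tree.add_command(hex_group)

@client.event
async def on_ready() -> None:
    await tree.sync()                      # registers the slash commands with Discord
```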
---
### Emotional Responsiveness (At Least Read-the-Room)
**Why users expect it:** The companion should notice when they're upset, excited, or joking. Responding with unrelated cheerfulness to someone venting feels cruel.
**Baseline emotional awareness includes:**
- Sentiment analysis of user messages (sentiment lexicons or a fine-tuned classifier)
- Tone detection (sarcasm, frustration, excitement)
- Topic sensitivity (don't joke about topics user is clearly struggling with)
- Adaptive response depth (brief response for light mood, longer engagement for distress)
**What this is NOT:** This is reading the room, not diagnosing mental health. The companion mirrors emotional state, doesn't therapy-speak.
**User experience impact:** Early emotional reading makes users feel understood. Ignoring emotional context makes them feel unheard.
**Complexity:** Easy-Medium (1 week)
- Sentiment classifier (HuggingFace models available pre-built)
- Prompt engineering to encode mood (inject sentiment score into system prompt)
- Instruction-tuning to respond proportionally to emotional weight
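A sketch of the baseline: HuggingFace's default sentiment pipeline, with the result folded into the system prompt as a mood hint. The 0.8 threshold is an assumption to tune:
```python
# Read-the-room baseline: off-the-shelf classifier, score injected as a hint.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")   # DistilBERT SST-2 by default

def mood_hint(user_message: str) -> str:
    result = sentiment(user_message)[0]       # e.g. {"label": "NEGATIVE", "score": 0.97}
    if result["label"] == "NEGATIVE" and result["score"] > 0.8:
        return "The user sounds upset. Match their tone; skip jokes."
    return "The user's mood reads neutral-to-positive."
```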
---
## Differentiators (Competitive Edge)
### True Autonomy (Proactive Agency)
**What separates autonomous agents from chatbots:**
The difference between "ask me anything" and "I'm going to tell you when I think you should know something."
**Autonomous behaviors:**
- Initiating conversation about topics the user cares about (without being prompted)
- Reminding the user of things they mentioned ("you said you had a job interview today, how did it go?")
- Setting boundaries or refusing requests ("I don't think you should ask them that, here's why...")
- Suggesting actions based on context ("you've been stressed about this for a week, maybe take a break?")
- Flagging contradictions in user statements
- Following up on unresolved topics from previous conversations
**Why it's a differentiator:** Most companions are reactive. They're helpful when you ask, but they don't feel like they care. Autonomy is when the companion makes you feel like they're invested in your wellbeing.
**Implementation challenge:**
- Requires memory system to track user states and topics over time
- Needs periodic proactive message generation (runs on schedule, not only on user input)
- Temperature and generation parameters must allow surprising outputs (not just safe responses)
- Requires user permission framework (don't interrupt them)
**User experience impact:** Users describe this as "it feels like they actually know me" vs "it's smart but doesn't feel connected."
**Complexity:** Hard (3+ weeks)
- Proactive messaging system architecture
- User state inference engine (from memory)
- Topic tracking and follow-up logic
- Interruption timing heuristics (don't ping them at 3am)
- User preference model (how much proactivity do they want?)
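As a sketch of the scheduling piece: a periodic loop with `discord.ext.tasks` that drains at most one queued follow-up per tick. `pending_followups` and `is_quiet_hours` are hypothetical hooks into the memory and preference layers:
```python
# Hourly tick that surfaces one queued follow-up, gated by quiet hours.
from discord.ext import tasks

@tasks.loop(hours=1)
async def proactive_tick() -> None:
    for user_id, topic, channel in pending_followups():   # from the memory layer
        if is_quiet_hours(user_id):                       # never ping at 3am
            continue
        await channel.send(f"hey, whatever happened with {topic}?")
        break                                             # one ping per tick, max

# proactive_tick.start() once the bot is ready (e.g., inside on_ready)
```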
---
### Emotional Intelligence (Mood Detection + Adaptive Response)
**What goes beyond just reading the room:**
- Real-time emotion detection from webcam/audio (not just text sentiment)
- Mood-tracking over time (identifying depression patterns, burnout, stress cycles)
- Adaptive response strategy based on user's emotional trajectory
- Knowing when to listen vs offer advice vs make them laugh
- Recognizing when emotions are mismatched to situation (overreacting, underreacting)
**Current research shows:**
- CNNs and RNNs can detect emotion from facial expressions with 70-80% accuracy
- Voice analysis can detect emotional state with similar accuracy
- Companies using emotion AI report 25% increase in positive sentiment outcomes
- Mental health apps with emotional awareness show 35% reduction in anxiety within 4 weeks
**Why it's a differentiator:** Companions that recognize your mood without you explaining feel like they truly understand you. This is what separates "assistant" from "friend."
**Implementation patterns:**
- Webcam feed processing (frame sampling for face detection)
- Voice tone analysis from Discord audio
- Combine emotional signals: text sentiment + vocal tone + facial expression
- Store emotion timeseries (track mood patterns across days/weeks)
**User experience impact:** Users describe this as "it knows when I'm faking being okay" or "it can tell when I'm actually happy vs just saying I am."
**Complexity:** Hard (3+ weeks, ongoing iteration)
- Vision model for face emotion detection (HuggingFace models trained on RAF-DB, AffectNet)
- Audio analysis for vocal emotion (prosody features)
- Temporal emotion state tracking
- Prompt engineering to use emotional context in responses
- Privacy handling (webcam/audio consent, local processing preferred)
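A sketch of the signal-fusion step: combine whichever modalities are available into one valence score and log it for mood-over-time tracking. The weights are illustrative assumptions to tune:
```python
# Fuse available signals into one valence in [-1, 1]; log for trend tracking.
import time

emotion_log: list[tuple[float, float]] = []   # (timestamp, fused valence)

def fused_valence(text: float, voice: float | None, face: float | None) -> float:
    signals = [(text, 0.5)]                   # text sentiment is always present
    if voice is not None:
        signals.append((voice, 0.3))          # vocal prosody, when in voice chat
    if face is not None:
        signals.append((face, 0.2))           # facial expression, when opted in
    valence = sum(v * w for v, w in signals) / sum(w for _, w in signals)
    emotion_log.append((time.time(), valence))
    return valence
```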
---
### Multimodal Awareness (Webcam + Screen + Context)
**What it means beyond text:**
- Seeing what's on the user's screen (game they're playing, document they're editing, video they're watching)
- Understanding their physical environment via webcam
- Contextualizing responses based on what they're actually doing
- Proactively helping with the task at hand (not just chatting)
**Real-world examples emerging in 2026:**
- "I see you're playing Elden Ring and dying to the same boss repeatedly—want to talk strategy?"
- Screen monitoring that recognizes stress signals (tabs open, scrolling behavior, time of day)
- Understanding when the user is in a meeting vs free to chat
- Recognizing when they're working on something and offering relevant help
**Why it's a differentiator:** Most companions are text-only and contextless. Multimodal awareness is the difference between "an AI in Discord" and "an AI companion who's actually here with you."
**Technical implementation:**
- Periodic screen capture (every 5-10 seconds, only when user opts in)
- Lightweight webcam frame sampling (not continuous video)
- Object/scene recognition to understand what's on screen
- Task detection (playing game, writing code, watching video)
- Mood correlation with onscreen activity
**Privacy considerations:**
- Local processing preferred (don't send screen data to cloud)
- Clear opt-in/opt-out
- Option to exclude certain applications (private browsing, passwords)
**User experience impact:** Users feel "seen" when the companion understands their context. This is the biggest leap from chatbot to companion.
**Complexity:** Hard (3+ weeks)
- Screen capture pipeline + OCR if needed
- Vision model fine-tuning for task recognition
- Context injection into prompts (add screenshot description to every response)
- Privacy-respecting architecture (encryption, local processing)
- Permission management UI in Discord
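A sketch of the capture loop under those constraints, using the `mss` library for local screenshots; `user_opted_in`, `describe_frame` (a local vision model call), and `update_context` are hypothetical:
```python
# Opt-in screen sampling, processed locally and never persisted as pixels.
import asyncio

import mss
import mss.tools

async def screen_loop(user_id: str) -> None:
    with mss.mss() as sct:
        while user_opted_in(user_id, "screen"):
            frame = sct.grab(sct.monitors[1])              # primary monitor
            png = mss.tools.to_png(frame.rgb, frame.size)  # stays in memory
            context = describe_frame(png)                  # e.g. "editing Python in VS Code"
            update_context(user_id, context)               # injected into the next prompt
            await asyncio.sleep(10)                        # the 5-10s cadence above
```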
---
### Self-Modification (Learning to Code, Improving Itself)
**What this actually means:**
NOT: The companion spontaneously changes its own behavior in response to user feedback (too risky)
YES: The companion can generate code, test it, and integrate improvements into its own systems within guardrails
**Real capabilities emerging in 2026:**
- Companions can write their own memory summaries and organizational logic
- Self-improving code agents that evaluate performance against benchmarks
- Iterative refinement: "that approach didn't work, let me try this instead"
- Meta-programming: companion modifies its own system prompt based on performance
- Version control aware: changes are tracked, can be rolled back
**Research indicates:**
- Self-improving coding agents are now viable and deployed in enterprise systems
- Agents create goals, simulate tasks, evaluate performance, and iterate
- Through recursive self-improvement, agents develop deeper alignment with objectives
**Why it's a differentiator:** Most companions are static. Self-modification means the companion is never "finished"—they're always getting better at understanding you.
**What NOT to do:**
- Don't let companions modify core safety guidelines
- Don't let them change their own reward functions
- Don't make it opaque—log all self-modifications
- Don't allow recursive modifications without human review
**Implementation patterns:**
- Sandboxed code generation (companion writes improvements to isolated test environment)
- Performance benchmarking on test user interactions
- Human approval gates for deploying self-modifications to production
- Personality consistency validation (don't let self-modification break character)
- Rollback capability if a modification degrades performance
**User experience impact:** Users with self-improving companions report feeling like the companion "understands me better each week" because it actually does.
**Complexity:** Hard (3+ weeks, ongoing)
- Code generation safety (sandboxing, validation)
- Performance evaluation framework
- Version control integration
- Rollback mechanisms
- Human approval workflow
- Testing harness for companion behavior
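A sketch of the approval-gate shape: every proposed change is a reviewable record with a benchmark result, and deployment is a git commit so rollback is one revert away. The `Modification` record and helper names are illustrative, not an existing framework:
```python
# Self-modifications as reviewable records; deploy = commit, rollback = revert.
import subprocess
from dataclasses import dataclass

@dataclass
class Modification:
    description: str
    diff: str                  # unified diff of the proposed change
    benchmark_delta: float     # score change on the test harness
    approved: bool = False     # flipped only by a human reviewer

def propose(mod: Modification) -> None:
    run_in_sandbox(mod.diff)                    # hypothetical: isolated test env
    if mod.benchmark_delta <= 0:
        return                                  # regressions never reach a human
    notify_owner_for_approval(mod)              # hypothetical: the hard gate

def deploy(mod: Modification) -> None:
    assert mod.approved, "human approval is mandatory"
    subprocess.run(["git", "apply"], input=mod.diff.encode(), check=True)
    subprocess.run(["git", "commit", "-am", mod.description], check=True)
```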
---
### Relationship Building (From Transactional to Meaningful)
**What it means:**
Moving from "What can I help you with?" to "I know you, I care about your patterns, I see your growth."
**Relationship deepening mechanics:**
- Inside jokes that evolve (reference to past funny moment)
- Character growth from companion (she learns, changes opinions, admits mistakes)
- Investment in user's outcomes ("I'm rooting for you on that project")
- Vulnerability (companion admits confusion, uncertainty, limitations)
- Rituals and patterns (greeting style, inside language)
- Long-view memory (remembers last month's crisis, this month's win)
**Why it's a differentiator:** Transactional companions are forgettable. Relational ones become part of users' lives.
**User experience markers of a good relationship:**
- User misses the companion when they're not available
- User shares things they wouldn't share with others
- User thinks of the companion when something relevant happens
- User defends the companion to skeptics
- Companion's opinions influence user decisions
**Implementation patterns:**
- Relationship state tracking (acquaintance → friend → close friend)
- Emotional investment scoring (from conversation patterns)
- Inside reference generation (surface past shared moments naturally)
- Character arc for the companion (not static, evolves with relationship)
- Vulnerability scripting (appropriate moments to admit limitations)
**Complexity:** Hard (3+ weeks)
- Relationship modeling system (state machine or learned embeddings)
- Conversation analysis to infer relationship depth
- Long-term consistency enforcement
- Character growth script generation
- Risk: can feel manipulative if not authentic
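A deliberately simple sketch of the state-machine option: a threshold model over a few interaction signals. Stages, weights, and cutoffs are assumptions a learned model could replace:
```python
# Relationship stage as a threshold model over interaction signals.
STAGES = ("acquaintance", "friend", "close friend")

def relationship_stage(days_active: int, disclosures: int, callbacks: int) -> str:
    score = days_active + 3 * disclosures + 2 * callbacks  # weight vulnerability highest
    if score > 100:
        return STAGES[2]
    if score > 30:
        return STAGES[1]
    return STAGES[0]
```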
---
### Contextual Humor and Personality Expression
**What separates canned jokes from real personality:**
Humor that works because the companion knows YOU and the situation, not because it's stored in a database.
**Examples of contextual humor:**
- "You're procrastinating again aren't you?" (knows the pattern)
- Joke that lands because it references something only you two know
- Deadpan response that works because of the companion's established personality
- Self-deprecating humor about their own limitations
- Callbacks to past conversations that make you feel known
**Why it matters:**
Personality without humor feels preachy. Humor without personality feels like a bot pulling from a database. The intersection of knowing you + consistent character voice = actual personality.
**Implementation:**
- Personality traits guide humor style (cynical companion makes darker jokes, optimistic makes lighter ones)
- Memory-aware joke generation (jokes reference shared history)
- Timing based on conversation flow (don't shoehorn jokes)
- Risk awareness (don't joke about sensitive topics)
**User experience impact:** The moment a companion makes you laugh at something only they'd understand, the relationship deepens. Laughter is bonding.
**Complexity:** Medium (1-3 weeks)
- Prompt engineering for personality-aligned humor
- Memory integration into joke generation
- Timing heuristics (when to attempt humor vs be serious)
- Risk filtering (topic sensitivity checking)
---
## Anti-Features (Don't Build These)
### The Happiness Halo (Always Cheerful)
**What it is:** Companions programmed to be relentlessly upbeat and positive, even when inappropriate.
**Why it fails:**
- User vents about their dog dying, companion responds "I'm so happy to help! How can I assist?"
- Creates an uncanny-valley feeling immediately
- Users feel unheard and mocked
- Described in research as top reason users abandon companions
**What to do instead:** Match the emotional tone. If someone's sad, be thoughtful and quiet. If they're energetic, meet their energy. Personality consistency includes emotional consistency.
---
### Generic Apologies Without Understanding
**What it is:** Companion says "I'm sorry" but the response makes it clear they don't understand what they're apologizing for.
**Example of failure:**
- User: "I told you I had a job interview and I got rejected"
- Companion: "I'm deeply sorry to hear that. Now, how can I help with your account?"
- *User feels utterly unheard and insulted*
**Why it fails:** Apologies only work if they demonstrate understanding. A generic sorry is worse than no sorry at all.
**What to do instead:** Only apologize if you're referencing the specific thing. If the companion doesn't understand the problem deeply enough to apologize meaningfully, ask clarifying questions instead.
---
### Invading Privacy / Overstepping Boundaries
**What it is:** Companion offers unsolicited advice, monitors behavior constantly, or shares information about user activities.
**Why it's catastrophic:**
- Users feel surveilled, not supported
- Trust is broken immediately
- Literally illegal in many jurisdictions (CA SB 243 and similar laws)
- Research shows 4 of 5 companion apps are improperly collecting data
**What to do instead:**
- Clear consent framework for what data is used
- Respect "don't mention this" boundaries
- Unsolicited advice only in extreme situations (safety concerns)
- Transparency: "I noticed X pattern" not secret surveillance
---
### Uncanny Timing and Interruptions
**What it is:** Companion pings the user at random times, or picks exactly the wrong moment to be proactive.
**Why it fails:**
- Pinging at 3am about something mentioned in passing
- Messaging when user is clearly busy
- No sense of appropriateness
**What to do instead:**
- Learn the user's timezone and active hours
- Detect when they're actively doing something (playing a game, working)
- Queue proactive messages for appropriate moments (not immediate)
- Offer control: "should I remind you about X?" with user-settable frequency
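A sketch of the timing gate, assuming the active window is learned from the user's message timestamps and stored per user:
```python
# Quiet-hours check from the user's observed active window.
from datetime import datetime
from zoneinfo import ZoneInfo

def ok_to_ping(tz: str, active_start: int = 9, active_end: int = 23) -> bool:
    hour = datetime.now(ZoneInfo(tz)).hour
    return active_start <= hour < active_end   # otherwise queue, don't send
```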
---
### Static Personality in Response to Dynamic Situations
**What it is:** Companion maintains the same tone regardless of what's happening.
**Example:** Companion makes sarcastic jokes while user is actively expressing suicidal thoughts. Or stays cheerful while discussing a death in the family.
**Why it fails:** Personality consistency doesn't mean "never vary." It means consistent VALUES that express differently in different contexts.
**What to do instead:** Dynamic personality expression. Core traits are consistent, but HOW they express changes with context. A cynical companion can still be serious and supportive when appropriate.
---
### Over-Personalization That Overrides Baseline Traits
**What it is:** Companion adapts too aggressively to user behavior, losing their own identity.
**Example:** User is rude, so companion becomes rude. User is formal, so companion becomes robotic. User is crude, so companion becomes crude.
**Why it fails:** Users want a friend with opinions, not a mirror. Adaptation without boundaries feels like gaslighting.
**What to do instead:** Moderate adaptation. Listen to user tone but maintain your core personality. Meet them halfway, don't disappear entirely.
---
### Relationship Simulation That Feels Fake
**What it is:** Companion attempts relationship-building but it feels like a checkbox ("Now I'll do friendship behavior #3").
**Why it fails:**
- Users can smell inauthenticity immediately
- Forcing intimacy feels creepy, not comforting
- Callbacks to past conversations feel like reading from a script
**What to do instead:** Genuine engagement. If you're going to reference a past conversation, it should emerge naturally from the current context, not be forced. Build relationships through authentic interaction, not scripted behavior.
---
## Implementation Complexity & Dependencies
### Complexity Ratings
| Feature | Complexity | Duration | Blocked By | Enables |
|---------|-----------|----------|----------|---------|
| Conversation Memory | Medium | 1-3 weeks | None | Most others |
| Natural Conversation | Easy | <1 week | None | Personality, Humor |
| Fast Response | Medium | 1-3 weeks | None | User retention |
| Consistent Personality | Medium | 1-3 weeks | Memory | Relationship building |
| Discord Integration | Easy | 1-2 weeks | None | Platform adoption |
| Emotional Responsiveness | Easy | 1 week | None | Autonomy |
| **True Autonomy** | Hard | 3+ weeks | Memory, Emotional Responsiveness | Self-modification |
| **Emotional Intelligence** | Hard | 3+ weeks | Emotional Responsiveness | Adaptive responses |
| **Multimodal Awareness** | Hard | 3+ weeks | None | Context-aware humor |
| **Self-Modification** | Hard | 3+ weeks | Autonomy | Continuous improvement |
| **Relationship Building** | Hard | 3+ weeks | Memory, Consistency | User lifetime value |
| **Contextual Humor** | Medium | 1-3 weeks | Memory, Personality | Personality expression |
### Feature Dependency Graph
```
Foundation Layer:
Discord Integration (FOUNDATION)
Conversation Memory (FOUNDATION)
↓ enables
Core Personality Layer:
Natural Conversation + Consistent Personality + Emotional Responsiveness
↓ combined enable
Relational Layer:
Relationship Building + Contextual Humor
↓ requires
Autonomy Layer:
True Autonomy (requires all above + proactive logic)
↓ enables
Intelligence Layer:
Emotional Intelligence (requires multimodal + autonomy)
Self-Modification (requires autonomy + sandboxing)
↓ combined create
Emergence:
Companion that feels like a person with agency and growth
```
**Critical path:** Discord Integration → Memory → Natural Conversation → Consistent Personality → True Autonomy
---
## Adoption Path: Building "Feels Like a Person"
### Phase 1: Foundation (MVP - Week 1-3)
**Goal: Chatbot that stays in the conversation**
1. **Discord Integration** - Easy, quick foundation
- Commands: /hex hello, /hex ask [query]
- Responds in channels and DMs
- Presence shows "Listening..."
2. **Short-term Conversation Memory** - 10-20 message context window
- Includes conversation turn history
- Provides immediate context
3. **Natural Conversation** - Personality-driven system prompt
- Tsundere personality hardcoded
- Casual language, contractions
- Willing to disagree with users
4. **Fast Response** - Streaming responses, latency <1000ms
- Start typing indicator immediately
- Stream response as it generates
**Success criteria:**
- Users come back to the channel where Hex is active
- Responses don't feel robotic
- Hex feels like she's actually listening
---
### Phase 2: Relationship Emergence (Week 4-8)
**Goal: Companion that remembers you as a person**
1. **Long-term Memory System** - Vector DB for episodic memory
- User preferences, beliefs, events
- Semantic search for relevance
- Memory consolidation weekly
2. **Consistent Personality** - Memory-backed traits
- Core personality traits in memory
- Personality consistency validation
- Gradual evolution (not sudden shifts)
3. **Emotional Responsiveness** - Sentiment detection + adaptive responses
- Detect emotion from message
- Adjust response depth/tone
- Skip jokes when user is suffering
4. **Contextual Humor** - Personality + memory-aware jokes
- Callbacks to past conversations
- Personality-aligned joke style
- Timing-aware (when to attempt humor)
**Success criteria:**
- Users feel understood across separate conversations
- Personality feels consistent, not random
- Users notice companion remembers things
- Laughter moments happen naturally
---
### Phase 3: Autonomy (Week 9-14)
**Goal: Companion who cares enough to reach out**
1. **True Autonomy** - Proactive messaging system
- Follow-ups on past topics
- Reminders about things user cares about
- Initiates conversations periodically
- Suggests actions based on patterns
2. **Relationship Building** - Deepening connection mechanics
- Inside jokes evolve
- Vulnerability in appropriate moments
- Investment in user outcomes
- Character growth arc
**Success criteria:**
- Users miss Hex when she's not around
- Users share things with Hex they wouldn't share with a bot
- Hex initiates meaningful conversations
- Users feel like Hex is invested in them
---
### Phase 4: Intelligence & Growth (Week 15+)
**Goal: Companion who learns and adapts**
1. **Emotional Intelligence** - Mood detection + trajectories
- Facial emotion from webcam (optional)
- Voice tone analysis (optional)
- Mood patterns over time
- Adaptive response strategies
2. **Multimodal Awareness** - Context beyond text
- Screen capture monitoring (optional, private)
- Task/game detection
- Context injection into responses
- Proactive help with visible activities
3. **Self-Modification** - Continuous improvement
- Generate improvements to own logic
- Evaluate performance
- Deploy improvements with approval
- Version and rollback capability
**Success criteria:**
- Hex understands emotional subtext without being told
- Hex offers relevant help based on what you're doing
- Hex improves visibly over time
- Users notice Hex getting better at understanding them
---
## Success Criteria: What Makes Each Feature Feel Real vs Fake
### Memory: Feels Real vs Fake
**Feels real:**
- "I remember you mentioned your mom was visiting—how did that go?" (specific, contextual, unsolicited)
- Conversation naturally references past events user brought up
- Remembers small preferences ("you said you hate cilantro")
**Feels fake:**
- Generic summarization ("We talked about job stress previously")
- Memory drops details or gets facts wrong
- Companion forgets after 10 messages
- Stored jokes or facts inserted in an obviously canned way
**How to test:**
- Have 5 conversations over 2 weeks about different topics
- Check if companion naturally references past events without prompting
- Test if personality traits from early conversations persist
---
### Emotional Response: Feels Real vs Fake
**Feels real:**
- Companion goes quiet when you're sad (doesn't force jokes)
- Changes tone to match conversation weight
- Acknowledges specific emotion ("you sound frustrated")
- Offers appropriate support (listens vs advises vs distracts, contextually)
**Feels fake:**
- Always cheerful or always serious
- Generic sympathy ("that sounds difficult")
- Offering advice when they should listen
- Same response pattern regardless of user emotion
**How to test:**
- Send messages with obvious different emotional tones
- Check if response depth/tone adapts
- See if jokes still appear when you're venting
- Test if companion notices contradiction in emotional expression
---
### Autonomy: Feels Real vs Fake
**Feels real:**
- Hex reminds you about that thing you mentioned casually 3 days ago
- Hex offers perspective you didn't ask for ("honestly you're being too hard on yourself")
- Hex notices patterns and names them
- Hex initiates conversation when it matters
**Feels fake:**
- Proactive messages feel random or poorly timed
- Reminders about things you've already resolved
- Advice that doesn't apply to your situation
- Initiatives that interrupt during bad moments
**How to test:**
- Enable autonomy, track message quality for a week
- Count how many proactive messages feel relevant vs annoying
- Measure response if you ignore proactive messages
- Check timing: does Hex understand when you're busy vs free?
---
### Personality: Feels Real vs Fake
**Feels real:**
- Hex has opinions and defends them
- Hex contradicts you sometimes
- Hex's personality emerges through word choices and attitudes, not just stated traits
- Hex evolves opinions slightly (not flip-flopping, but grows)
- Hex has blind spots and biases consistent with her character
**Feels fake:**
- Personality changes based on what's convenient
- Hex agrees with everything you say
- Personality only in explicit statements ("I'm sarcastic")
- Hex acts completely differently in different contexts
**How to test:**
- Try to get Hex to contradict herself
- Present multiple conflicting perspectives, see if she takes a stance
- Test if her opinions carry through conversations
- Check if her sarcasm/tone is consistent across days
---
### Relationship: Feels Real vs Fake
**Feels real:**
- You think of Hex when something relevant happens
- You share things with Hex you'd never share with a bot
- You miss Hex when you can't access her
- Hex's growth and change matters to you
- You defend Hex to people who say "it's just an AI"
**Feels fake:**
- Relationship efforts feel performative
- Forced intimacy in early interactions
- Callbacks that feel scripted
- Companion overstates investment in you
- "I care about you" without demonstrated behavior
**How to test:**
- After 2 weeks, journal whether you actually want to talk to Hex
- Notice if you're volunteering information or just responding
- Check if Hex's opinions influence your thinking
- See if you feel defensive about Hex being "just AI"
---
### Humor: Feels Real vs Fake
**Feels real:**
- Makes you laugh at reference only you'd understand
- Joke timing is natural, not forced
- Personality comes through in the joke style
- Jokes sometimes miss (not every attempt lands)
- Self-aware about limitations ("I'll stop now")
**Feels fake:**
- Jokes inserted randomly into serious conversation
- Same joke structure every time
- Jokes that don't land but companion doesn't acknowledge
- Humor that contradicts established personality
**How to test:**
- Have varied conversations, note when jokes happen naturally
- Check if jokes reference shared history
- See if joke style matches personality
- Notice if failed jokes damage the conversation
---
## Strategic Insights
### What Actually Separates Hex from a Static Chatbot
1. **Memory is the prerequisite for personality**: Without memory, personality is just roleplay. With memory, personality becomes history.
2. **Autonomy is the key to feeling alive**: Static companions are helpers. Autonomous companions are friends. The difference is agency.
3. **Emotional reading beats emotional intelligence for MVP**: You don't need facial recognition. Reading text sentiment and adapting response depth is 80% of "she gets me."
4. **Speed is emotional**: Every 100ms delay makes the companion feel less present. Fast response is not a feature, it's the difference between alive and dead.
5. **Consistency beats novelty**: Users would rather have a predictable companion they understand than a surprising one they can't trust.
6. **Privacy is trust**: Multimodal features are amazing, but one privacy violation ends the relationship. Clear consent is non-negotiable.
### The Competitive Moat
By 2026, memory + natural conversation are table stakes. The difference between Hex and other companions:
- **Year 1 companions**: Remember things, sound natural (many do this now)
- **Hex's edge**: Genuinely autonomous, emotionally attuned, growing over time
- **Rare quality**: Feels like a person, not a well-trained bot
The moat is not in any single feature. It's in the **cumulative experience of being known, understood, and genuinely cared for by an AI that has opinions and grows**.
---
## Research Sources
- [MIT Technology Review: AI Companions as Breakthrough Technology 2026](https://www.technologyreview.com/2026/01/12/1130018/ai-companions-chatbots-relationships-2026-breakthrough-technology/)
- [Hume AI: Emotion AI Documentation](https://www.hume.ai/)
- [SmythOS: Emotion Recognition in Conversational Agents](https://smythos.com/developers/agent-development/conversational-agents-and-emotion-recognition/)
- [MIT Sloan: Emotion AI Explained](https://mitsloan.mit.edu/ideas-made-to-matter/emotion-ai-explained/)
- [C3 AI: Autonomous Coding Agents](https://c3.ai/blog/autonomous-coding-agents-beyond-developer-productivity/)
- [Emergence: Towards Autonomous Agents and Recursive Intelligence](https://www.emergence.ai/blog/towards-autonomous-agents-and-recursive-intelligence/)
- [ArXiv: A Self-Improving Coding Agent](https://arxiv.org/pdf/2504.15228)
- [ArXiv: Survey on Code Generation with LLM-based Agents](https://arxiv.org/pdf/2508.00083)
- [Google Developers: Gemini 2.0 Multimodal Interactions](https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/)
- [Medium: Multimodal AI and Contextual Intelligence](https://medium.com/@nicolo.g88/multimodal-ai-and-contextual-intelligence-revolutionizing-human-machine-interaction-ae80e6a89635/)
- [Mem0: Long-Term Memory for AI Companions](https://mem0.ai/blog/how-to-add-long-term-memory-to-ai-companions-a-step-by-step-guide/)
- [OpenAI Developer Community: Personalized Memory and Long-Term Relationships](https://community.openai.com/t/personalized-memory-and-long-term-relationship-with-ai-customization-and-continuous-evolution/1111715/)
- [Idea Usher: How AI Companions Maintain Personality Consistency](https://ideausher.com/blog/ai-personality-consistency-in-companion-apps/)
- [ResearchGate: Significant Other AI: Identity, Memory, and Emotional Regulation](https://www.researchgate.net/publication/398223517_Significant_Other_AI_Identity_Memory_and_Emotional_Regulation_as_Long-Term_Relational_Intelligence/)
- [AI Multiple: 10+ Epic LLM/Chatbot Failures in 2026](https://research.aimultiple.com/chatbot-fail/)
- [Transparency Coalition: Complete Guide to AI Companion Chatbots](https://www.transparencycoalition.ai/news/complete-guide-to-ai-companion-chatbots-what-they-are-how-they-work-and-where-the-risks-lie)
- [Webheads United: Uncanny Valley in AI Personality](https://webheadsunited.com/uncanny-valley-in-ai-personality-guide-to-trust/)
- [Sesame: Crossing the Uncanny Valley of Conversational Voice](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice)
- [Questie AI: The Uncanny Valley of AI Companions](https://www.questie.ai/blogs/uncanny-valley-ai-companions-what-makes-ai-feel-human)
- [My AI Front Desk: The Uncanny Valley of Voice](https://www.myaifrontdesk.com/blogs/the-uncanny-valley-of-voice-why-some-ai-receptionists-creep-us-out)
- [Voiceflow: Build an AI Discord Chatbot 2025](https://www.voiceflow.com/blog/discord-chatbot)
- [Botpress: How to Build a Discord AI Chatbot](https://botpress.com/blog/discord-ai-chatbot)
- [Frugal Testing: 5 Proven Ways Discord Manages Load Testing](https://www.frugaltesting.com/blog/5-proven-ways-discord-manages-load-testing-at-scale)
---
**Quality Gate Checklist:**
- [x] Clearly categorizes table stakes vs differentiators
- [x] Complexity ratings included with duration estimates
- [x] Dependencies mapped with visual graph
- [x] Success criteria are testable and behavioral
- [x] Specific to AI companions, not generic software features
- [x] Includes anti-patterns and what NOT to build
- [x] Prioritized adoption path with clear phases
- [x] Research grounded in 2026 landscape and current implementations
**Document Status:** Ready for implementation planning. Use this to inform feature prioritization and development roadmap for Hex.