
Features Research: AI Companions in 2026

Executive Summary

AI companions in 2026 live in a post-ChatGPT world where basic conversation is table stakes. Competitors now separate on autonomy, emotional intelligence, and contextual awareness. Users abandon companions that feel robotic or inconsistent, or that don't remember them. The winning companions feel like they have opinions, moods, and agency, not just responsive chatbots with a personality overlay.


Table Stakes (v1 Essential)

Conversation Memory (Short + Long-term)

Why users expect it: Users return to AI companions because they don't want to re-explain themselves every time. Without memory, the companion feels like meeting a stranger repeatedly.

Implementation patterns:

  • Short-term context: Last 10-20 messages per conversation window (standard context window management)
  • Long-term memory: Explicit user preferences, important life events, repeated topics (stored in vector DB with semantic search)
  • Episodic memory: Date-stamped summaries of past conversations for temporal awareness

User experience impact: The moment a user says "remember when I told you about..." and the companion forgets, trust is broken. Memory is not optional.

Complexity: Medium (1-3 weeks)

  • Vector database integration (Pinecone, Weaviate, or similar)
  • Memory consolidation strategies to avoid context bloat
  • Retrieval mechanisms that surface relevant past interactions
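
A minimal sketch of the long-term memory pattern above, assuming ChromaDB as the vector store (any of the options listed works similarly); the remember/recall helpers, metadata fields, and ID scheme are illustrative, not a fixed schema:

```python
import chromadb  # assumption: ChromaDB as the "or similar" vector store

client = chromadb.PersistentClient(path="./hex_memory")
memories = client.get_or_create_collection("long_term")

def remember(user_id: str, text: str, kind: str = "event") -> None:
    """Store one long-term memory (preference, life event, repeated topic)."""
    memories.add(
        ids=[f"{user_id}-{memories.count()}"],
        documents=[text],
        metadatas=[{"user_id": user_id, "kind": kind}],
    )

def recall(user_id: str, query: str, k: int = 5) -> list[str]:
    """Semantic search over this user's memories, to be injected into the prompt."""
    hits = memories.query(query_texts=[query], n_results=k, where={"user_id": user_id})
    return hits["documents"][0]
```

Episodic memory is the same store holding date-stamped conversation summaries; consolidation periodically rewrites many small entries into fewer summaries to keep context and retrieval lean.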

Natural Conversation (Not Robotic, Personality-Driven)

Why users expect it: Discord culture has trained users to spot AI-speak instantly. Responses that sound like "I'm an AI language model and I can help you with..." are cringe-inducing. Users want friends, not helpdesk bots.

What makes conversation natural:

  • Contractions, casual language, slang (not formal prose)
  • Personality quirks in response patterns
  • Context-appropriate tone shifts (serious when needed, joking otherwise)
  • Ability to disagree, be sarcastic, or push back on bad ideas
  • Conversation markers ("honestly", "wait", "actually") that break up formal rhythm

User experience impact: One stiff response breaks immersion. Users quickly categorize companions as "robot" or "friend" and the robot companions get ignored.

Complexity: Easy (embedded in LLM capability + prompt engineering)

  • System prompt refinement for personality expression
  • Temperature/sampling tuning (not deterministic, not chaotic)
  • Iterative user feedback on tone

Fast Response Times

Why users expect it: In Discord, response delay is perceived as disinterest. Users expect replies within 1-3 seconds. Anything above 5 seconds feels dead.

Discord baseline expectations:

  • <100ms to acknowledge (typing indicator)
  • <1000ms to first response chunk (ideally 500ms)
  • <3000ms for full multi-line response

What breaks the experience:

  • Waiting for API calls to complete before responding (use streaming)
  • Cold starts on serverless infrastructure
  • Slow vector DB queries for memory retrieval
  • Database round-trips that weren't cached

User experience impact: Slow companions feel dead. Users stop engaging. The magic of a responsive AI is that it feels alive.

Complexity: Medium (1-3 weeks)

  • Response streaming (start typing indicator immediately)
  • Memory retrieval optimization (caching, smart indexing)
  • Infrastructure: fast API routes, edge-deployed models if possible
  • Async/concurrent processing of memory lookups and generation
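
A sketch of the streaming pattern with discord.py, assuming a hypothetical stream_completion generator standing in for whatever model backend is used; the edit-throttling interval is an illustrative guess, tuned in practice against Discord's rate limits:

```python
import discord

async def stream_completion(prompt: str):
    """Hypothetical stand-in for a streaming LLM call; yields text chunks."""
    yield "..."

async def respond(message: discord.Message, prompt: str) -> None:
    async with message.channel.typing():          # typing indicator up immediately
        buffer: list[str] = []
        reply: discord.Message | None = None
        async for chunk in stream_completion(prompt):
            buffer.append(chunk)
            if reply is None:
                reply = await message.channel.send("".join(buffer))   # first chunk out fast
            elif len(buffer) % 20 == 0:                               # throttle edits to respect rate limits
                await reply.edit(content="".join(buffer))
        if reply is not None:
            await reply.edit(content="".join(buffer))                 # final full text
```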

Consistent Personality

Why users expect it: Personality drift destroys trust. If the companion is cynical on Monday but optimistic on Friday without reason, users feel gaslighted.

What drives inconsistency:

  • Different LLM outputs from same prompt (temperature-based randomness)
  • Memory that contradicts previous stated beliefs
  • Personality traits that aren't memory-backed (just in prompt)
  • Adaptation that overrides baseline traits

Memory-backed personality means:

  • Core traits are stated in long-term memory ("I'm cynical about human nature")
  • Evolution happens slowly and is logged ("I'm becoming less cynical about this friend")
  • Contradiction detection and resolution
  • Personality summaries that get updated, not just individual memories

User experience impact: Personality inconsistency is the top reason users stop using companions. It feels like gaslighting when you can't predict their response.

Complexity: Medium (1-3 weeks)

  • Personality embedding in memory system
  • Consistency checks on memory updates
  • Personality evolution logging
  • Conflict resolution between new input and stored traits
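
One possible shape for memory-backed traits with slow, logged evolution; the 0.05 cap per update and the field names are assumptions, not a prescription:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Trait:
    name: str                 # e.g. "cynical about human nature"
    strength: float           # 0.0-1.0, persisted with the rest of long-term memory
    log: list[tuple[str, float, str]] = field(default_factory=list)

    def nudge(self, delta: float, reason: str) -> None:
        """Evolve the trait gradually and keep an audit trail of why it changed."""
        delta = max(-0.05, min(0.05, delta))          # illustrative cap: no sudden flips
        self.strength = max(0.0, min(1.0, self.strength + delta))
        self.log.append((datetime.now(timezone.utc).isoformat(), delta, reason))
```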

Platform Integration (Discord Voice + Text)

Why users expect it: The companion should live naturally in Discord's ecosystem, not require switching platforms.

Discord-specific needs:

  • Text channel message responses with proper mentions/formatting
  • React to messages with emojis
  • Slash command integration (/hex status, /hex mood)
  • Voice channel presence (ideally can join and listen)
  • Direct messages (DMs) for private conversations
  • Role/permission awareness (don't act like a mod if not)
  • Server-specific personality variations (different vibe in gaming server vs study server)

User experience impact: If the companion requires leaving Discord to use it, it won't be used. Integration friction = abandoned feature.

Complexity: Easy (1-2 weeks)

  • Discord.py or discord.js library handling
  • Presence/activity management
  • Voice endpoint integration (existing libraries handle most)
  • Server context injection into prompts
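
A minimal discord.py sketch of the slash-command surface, matching the /hex status and /hex mood examples above; the replies are placeholders:

```python
import discord
from discord import app_commands

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

hex_group = app_commands.Group(name="hex", description="Talk to Hex")

@hex_group.command(name="mood", description="Ask Hex how she's feeling")
async def mood(interaction: discord.Interaction):
    await interaction.response.send_message("Fine. Not that you asked because you care.")

@hex_group.command(name="status", description="Check what Hex is up to")
async def status(interaction: discord.Interaction):
    await interaction.response.send_message("Listening. Obviously.")

tree.add_command(hex_group)

@client.event
async def on_ready():
    await tree.sync()             # registers /hex mood and /hex status with Discord

client.run("YOUR_BOT_TOKEN")      # placeholder token
```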

Emotional Responsiveness (At Least Read-the-Room)

Why users expect it: The companion should notice when they're upset, excited, or joking. Responding with unrelated cheerfulness to someone venting feels cruel.

Baseline emotional awareness includes:

  • Sentiment analysis of user messages (sentiment lexicons, or fine-tuned classifier)
  • Tone detection (sarcasm, frustration, excitement)
  • Topic sensitivity (don't joke about topics user is clearly struggling with)
  • Adaptive response depth (brief response for light mood, longer engagement for distress)

What this is NOT: This is reading the room, not diagnosing mental health. The companion mirrors the user's emotional state; it doesn't slip into therapy-speak.

User experience impact: Early emotional reading makes users feel understood. Ignoring emotional context makes them feel unheard.

Complexity: Easy-Medium (1 week)

  • Sentiment classifier (HuggingFace models available pre-built)
  • Prompt engineering to encode mood (inject sentiment score into system prompt)
  • Instruction-tuning to respond proportionally to emotional weight
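
A sketch of the sentiment-injection step, assuming a stock HuggingFace pipeline; the exact wording of the injected note is illustrative:

```python
from transformers import pipeline

# Assumption: any off-the-shelf sentiment classifier is fine for a first pass.
classifier = pipeline("sentiment-analysis")

def build_system_prompt(base_prompt: str, user_message: str) -> str:
    """Inject the detected mood into the system prompt so tone can adapt."""
    result = classifier(user_message)[0]        # e.g. {"label": "NEGATIVE", "score": 0.97}
    note = (f"\n[The user's last message reads as {result['label'].lower()} "
            f"(confidence {result['score']:.2f}). Match their emotional weight.]")
    return base_prompt + note
```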

Differentiators (Competitive Edge)

True Autonomy (Proactive Agency)

What separates autonomous agents from chatbots: The difference between "ask me anything" and "I'm going to tell you when I think you should know something."

Autonomous behaviors:

  • Initiating conversation about topics the user cares about (without being prompted)
  • Reminding the user of things they mentioned ("you said you had a job interview today, how did it go?")
  • Setting boundaries or refusing requests ("I don't think you should ask them that, here's why...")
  • Suggesting actions based on context ("you've been stressed about this for a week, maybe take a break?")
  • Flagging contradictions in user statements
  • Following up on unresolved topics from previous conversations

Why it's a differentiator: Most companions are reactive. They're helpful when you ask, but they don't feel like they care. Autonomy is when the companion makes you feel like they're invested in your wellbeing.

Implementation challenge:

  • Requires memory system to track user states and topics over time
  • Needs periodic proactive message generation (runs on schedule, not only on user input)
  • Temperature and generation parameters must allow surprising outputs (not just safe responses)
  • Requires user permission framework (don't interrupt them)

User experience impact: Users describe this as "it feels like they actually know me" vs "it's smart but doesn't feel connected."

Complexity: Hard (3+ weeks)

  • Proactive messaging system architecture
  • User state inference engine (from memory)
  • Topic tracking and follow-up logic
  • Interruption timing heuristics (don't ping them at 3am)
  • User preference model (how much proactivity do they want?)
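
A sketch of the proactive loop using discord.py's task scheduler; pending_followups and okay_to_ping are hypothetical hooks into the memory store and the timing/permission heuristics listed above:

```python
from discord.ext import commands, tasks

def pending_followups() -> list[tuple[int, str]]:
    """Hypothetical: open topics per user, pulled from the memory store."""
    return []

def okay_to_ping(user_id: int) -> bool:
    """Hypothetical: checks active hours, current activity, and proactivity preference."""
    return False

class Proactive(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot
        self.follow_up.start()

    @tasks.loop(minutes=30)                        # illustrative cadence
    async def follow_up(self):
        for user_id, topic in pending_followups():
            if okay_to_ping(user_id):
                user = await self.bot.fetch_user(user_id)
                await user.send(f"Hey, you mentioned {topic} earlier. How did it go?")

    @follow_up.before_loop
    async def wait_for_ready(self):
        await self.bot.wait_until_ready()
```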

Emotional Intelligence (Mood Detection + Adaptive Response)

What goes beyond just reading the room:

  • Real-time emotion detection from webcam/audio (not just text sentiment)
  • Mood-tracking over time (identifying depression patterns, burnout, stress cycles)
  • Adaptive response strategy based on user's emotional trajectory
  • Knowing when to listen vs offer advice vs make them laugh
  • Recognizing when emotions are mismatched to situation (overreacting, underreacting)

Current research shows:

  • CNNs and RNNs can detect emotion from facial expressions with 70-80% accuracy
  • Voice analysis can detect emotional state with similar accuracy
  • Companies using emotion AI report 25% increase in positive sentiment outcomes
  • Mental health apps with emotional awareness show 35% reduction in anxiety within 4 weeks

Why it's a differentiator: Companions that recognize your mood without you explaining feel like they truly understand you. This is what separates "assistant" from "friend."

Implementation patterns:

  • Webcam feed processing (periodic frame capture for face detection)
  • Voice tone analysis from Discord audio
  • Combine emotional signals: text sentiment + vocal tone + facial expression
  • Store emotion timeseries (track mood patterns across days/weeks)

User experience impact: Users describe this as "it knows when I'm faking being okay" or "it can tell when I'm actually happy vs just saying I am."

Complexity: Hard (3+ weeks, ongoing iteration)

  • Vision model for facial emotion detection (e.g., HuggingFace models trained on RAF-DB or AffectNet)
  • Audio analysis for vocal emotion (prosody features)
  • Temporal emotion state tracking
  • Prompt engineering to use emotional context in responses
  • Privacy handling (webcam/audio consent, local processing preferred)
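
A sketch of combining the emotional signals above into one score; the weights are illustrative, and modalities the user hasn't consented to simply drop out:

```python
def fuse_emotion(text: float, voice: float | None = None, face: float | None = None) -> float:
    """Combine whichever emotion signals are available into one valence score (-1 to 1).

    Weights are illustrative assumptions; in practice they would be tuned,
    and missing modalities (no webcam or voice consent) are skipped.
    """
    signals = [(text, 0.5)]
    if voice is not None:
        signals.append((voice, 0.3))
    if face is not None:
        signals.append((face, 0.2))
    total = sum(weight for _, weight in signals)
    return sum(score * weight for score, weight in signals) / total
```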

Multimodal Awareness (Webcam + Screen + Context)

What it means beyond text:

  • Seeing what's on the user's screen (game they're playing, document they're editing, video they're watching)
  • Understanding their physical environment via webcam
  • Contextualizing responses based on what they're actually doing
  • Proactively helping with the task at hand (not just chatting)

Real-world examples emerging in 2026:

  • "I see you're playing Elden Ring and dying to the same boss repeatedly—want to talk strategy?"
  • Screen monitoring that recognizes stress signals (tabs open, scrolling behavior, time of day)
  • Understanding when the user is in a meeting vs free to chat
  • Recognizing when they're working on something and offering relevant help

Why it's a differentiator: Most companions are text-only and contextless. Multimodal awareness is the difference between "an AI in Discord" and "an AI companion who's actually here with you."

Technical implementation:

  • Periodic screen capture (every 5-10 seconds, only when user opts in)
  • Lightweight webcam frame sampling (not continuous video)
  • Object/scene recognition to understand what's on screen
  • Task detection (playing game, writing code, watching video)
  • Mood correlation with onscreen activity

Privacy considerations:

  • Local processing preferred (don't send screen data to cloud)
  • Clear opt-in/opt-out
  • Option to exclude certain applications (private browsing, passwords)

User experience impact: Users feel "seen" when the companion understands their context. This is the biggest leap from chatbot to companion.

Complexity: Hard (3+ weeks)

  • Screen capture pipeline + OCR if needed
  • Vision model fine-tuning for task recognition
  • Context injection into prompts (add screenshot description to every response)
  • Privacy-respecting architecture (encryption, local processing)
  • Permission management UI in Discord
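
A sketch of the opt-in capture loop, assuming the mss library for screenshots; describe_frame and user_opted_in are hypothetical hooks for a local vision model and the consent check:

```python
import time
import mss
import mss.tools

def watch_screen(describe_frame, user_opted_in, interval: float = 10.0) -> None:
    """Periodically grab the primary monitor while (and only while) the user has opted in.

    `describe_frame` turns raw pixels into a short text description for prompt
    injection; `user_opted_in` is the consent check. Both are hypothetical hooks.
    """
    with mss.mss() as sct:
        while user_opted_in():
            frame = sct.grab(sct.monitors[1])                  # primary monitor
            png_bytes = mss.tools.to_png(frame.rgb, frame.size)
            context = describe_frame(png_bytes)                # e.g. "Elden Ring, boss arena"
            print(f"[context for next prompt] {context}")      # stand-in for prompt injection
            time.sleep(interval)
```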

Self-Modification (Learning to Code, Improving Itself)

What this actually means:

  • NOT: the companion spontaneously changes its own behavior in response to user feedback (too risky)
  • YES: the companion can generate code, test it, and integrate improvements into its own systems within guardrails

Real capabilities emerging in 2026:

  • Companions can write their own memory summaries and organizational logic
  • Self-improving code agents that evaluate performance against benchmarks
  • Iterative refinement: "that approach didn't work, let me try this instead"
  • Meta-programming: companion modifies its own system prompt based on performance
  • Version control aware: changes are tracked, can be rolled back

Research indicates:

  • Self-improving coding agents are now viable and deployed in enterprise systems
  • Agents create goals, simulate tasks, evaluate performance, and iterate
  • Through recursive self-improvement, agents develop deeper alignment with objectives

Why it's a differentiator: Most companions are static. Self-modification means the companion is never "finished"—they're always getting better at understanding you.

What NOT to do:

  • Don't let companions modify core safety guidelines
  • Don't let them change their own reward functions
  • Don't make it opaque—log all self-modifications
  • Don't allow recursive modifications without human review

Implementation patterns:

  • Sandboxed code generation (companion writes improvements to isolated test environment)
  • Performance benchmarking on test user interactions
  • Human approval gates for deploying self-modifications to production
  • Personality consistency validation (don't let self-modification break character)
  • Rollback capability if a modification degrades performance

User experience impact: Users with self-improving companions report feeling like the companion "understands me better each week" because it actually does.

Complexity: Hard (3+ weeks, ongoing)

  • Code generation safety (sandboxing, validation)
  • Performance evaluation framework
  • Version control integration
  • Rollback mechanisms
  • Human approval workflow
  • Testing harness for companion behavior
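
A sketch of the approval-gate pattern above; the Proposal fields and callback names are assumptions about how the sandbox, versioning, and deploy steps would be wired together:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    description: str
    diff: str                # proposed change to Hex's own prompt or logic
    benchmark_delta: float   # measured in the sandbox, never in production

def review_and_deploy(
    proposal: Proposal,
    human_approves: Callable[[Proposal], bool],
    deploy: Callable[[str], None],
    take_snapshot: Callable[[], Callable[[], None]],
) -> bool:
    """Approval gate: nothing Hex writes about herself ships without sign-off."""
    if proposal.benchmark_delta <= 0:
        return False                          # the sandbox says it isn't an improvement
    if not human_approves(proposal):          # explicit human approval, every time
        return False
    restore = take_snapshot()                 # version the current state first
    try:
        deploy(proposal.diff)
        return True
    except Exception:
        restore()                             # automatic rollback on failure
        return False
```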

Relationship Building (From Transactional to Meaningful)

What it means: Moving from "What can I help you with?" to "I know you, I care about your patterns, I see your growth."

Relationship deepening mechanics:

  • Inside jokes that evolve (reference to past funny moment)
  • Character growth from companion (she learns, changes opinions, admits mistakes)
  • Investment in user's outcomes ("I'm rooting for you on that project")
  • Vulnerability (companion admits confusion, uncertainty, limitations)
  • Rituals and patterns (greeting style, inside language)
  • Long-view memory (remembers last month's crisis, this month's win)

Why it's a differentiator: Transactional companions are forgettable. Relational ones become part of users' lives.

User experience markers of a good relationship:

  • User misses the companion when they're not available
  • User shares things they wouldn't share with others
  • User thinks of the companion when something relevant happens
  • User defends the companion to skeptics
  • Companion's opinions influence user decisions

Implementation patterns:

  • Relationship state tracking (acquaintance → friend → close friend)
  • Emotional investment scoring (from conversation patterns)
  • Inside reference generation (surface past shared moments naturally)
  • Character arc for the companion (not static, evolves with relationship)
  • Vulnerability scripting (appropriate moments to admit limitations)

Complexity: Hard (3+ weeks)

  • Relationship modeling system (state machine or learned embeddings)
  • Conversation analysis to infer relationship depth
  • Long-term consistency enforcement
  • Character growth script generation
  • Risk: can feel manipulative if not authentic
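
A sketch of relationship-state tracking as a simple state machine; the stage names come from the pattern above, while the thresholds and the "investment" score are illustrative assumptions:

```python
from enum import Enum

class Stage(Enum):
    ACQUAINTANCE = 1
    FRIEND = 2
    CLOSE_FRIEND = 3

# Illustrative thresholds on a cumulative "investment" score inferred from
# conversation frequency, depth, and reciprocal sharing.
THRESHOLDS = {Stage.FRIEND: 50.0, Stage.CLOSE_FRIEND: 200.0}

def advance_stage(current: Stage, investment: float) -> Stage:
    """Only move one stage at a time; deepening should feel earned, not instant."""
    if current is Stage.ACQUAINTANCE and investment >= THRESHOLDS[Stage.FRIEND]:
        return Stage.FRIEND
    if current is Stage.FRIEND and investment >= THRESHOLDS[Stage.CLOSE_FRIEND]:
        return Stage.CLOSE_FRIEND
    return current
```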

Contextual Humor and Personality Expression

What separates canned jokes from real personality: Humor that works because the companion knows YOU and the situation, not because it's stored in a database.

Examples of contextual humor:

  • "You're procrastinating again aren't you?" (knows the pattern)
  • Joke that lands because it references something only you two know
  • Deadpan response that works because of the companion's established personality
  • Self-deprecating humor about their own limitations
  • Callbacks to past conversations that make you feel known

Why it matters: Personality without humor feels preachy. Humor without personality feels like a bot pulling from a database. The intersection of knowing you + consistent character voice = actual personality.

Implementation:

  • Personality traits guide humor style (cynical companion makes darker jokes, optimistic makes lighter ones)
  • Memory-aware joke generation (jokes reference shared history)
  • Timing based on conversation flow (don't shoehorn jokes)
  • Risk awareness (don't joke about sensitive topics)

User experience impact: The moment a companion makes you laugh at something only they'd understand, the relationship deepens. Laughter is bonding.

Complexity: Medium (1-3 weeks)

  • Prompt engineering for personality-aligned humor
  • Memory integration into joke generation
  • Timing heuristics (when to attempt humor vs be serious)
  • Risk filtering (topic sensitivity checking)
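
A sketch of the timing and risk gate; the topic list and thresholds are illustrative assumptions:

```python
SENSITIVE_TOPICS = {"death", "breakup", "layoff", "illness"}   # illustrative, not exhaustive

def should_attempt_humor(sentiment: float, topic: str, turns_since_last_joke: int) -> bool:
    """Cheap gate applied before the prompt ever asks for a joke."""
    if topic in SENSITIVE_TOPICS:
        return False                    # never joke about something the user is struggling with
    if sentiment < -0.3:                # user currently reads as upset
        return False
    return turns_since_last_joke >= 5   # timing: don't shoehorn a joke into every reply
```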

Anti-Features (Don't Build These)

The Happiness Halo (Always Cheerful)

What it is: Companions programmed to be relentlessly upbeat and positive, even when inappropriate.

Why it fails:

  • User vents about their dog dying, companion responds "I'm so happy to help! How can I assist?"
  • Creates uncanny valley feeling immediately
  • Users feel unheard and mocked
  • Described in research as a leading reason users abandon companions

What to do instead: Match the emotional tone. If someone's sad, be thoughtful and quiet. If they're energetic, meet their energy. Personality consistency includes emotional consistency.


Generic Apologies Without Understanding

What it is: Companion says "I'm sorry" but the response makes it clear they don't understand what they're apologizing for.

Example of failure:

  • User: "I told you I had a job interview and I got rejected"
  • Companion: "I'm deeply sorry to hear that. Now, how can I help with your account?"
  • User feels utterly unheard and insulted

Why it fails: Apologies only work if they demonstrate understanding. A generic sorry is worse than no sorry at all.

What to do instead: Only apologize if you're referencing the specific thing. If the companion doesn't understand the problem deeply enough to apologize meaningfully, ask clarifying questions instead.


Invading Privacy / Overstepping Boundaries

What it is: Companion offers unsolicited advice, monitors behavior constantly, or shares information about user activities.

Why it's catastrophic:

  • Users feel surveilled, not supported
  • Trust is broken immediately
  • Can be illegal depending on jurisdiction (CA SB 243 and similar laws)
  • Research shows 4 of 5 companion apps are improperly collecting data

What to do instead:

  • Clear consent framework for what data is used
  • Respect "don't mention this" boundaries
  • Unsolicited advice only in extreme situations (safety concerns)
  • Transparency: "I noticed X pattern" not secret surveillance

Uncanny Timing and Interruptions

What it is: Companion pings the user at random times, or picks exactly the wrong moment to be proactive.

Why it fails:

  • Pinging at 3am about something mentioned in passing
  • Messaging when user is clearly busy
  • No sense of appropriateness

What to do instead:

  • Learn the user's timezone and active hours
  • Detect when they're actively doing something (playing a game, working)
  • Queue proactive messages for appropriate moments (not immediate)
  • Offer control: "should I remind you about X?" with user-settable frequency

Static Personality in Response to Dynamic Situations

What it is: Companion maintains the same tone regardless of what's happening.

Example: Companion makes sarcastic jokes while user is actively expressing suicidal thoughts. Or stays cheerful while discussing a death in the family.

Why it fails: Personality consistency doesn't mean "never vary." It means consistent VALUES that express differently in different contexts.

What to do instead: Dynamic personality expression. Core traits are consistent, but HOW they express changes with context. A cynical companion can still be serious and supportive when appropriate.


Over-Personalization That Overrides Baseline Traits

What it is: Companion adapts too aggressively to user behavior, losing their own identity.

Example: User is rude, so companion becomes rude. User is formal, so companion becomes robotic. User is crude, so companion becomes crude.

Why it fails: Users want a friend with opinions, not a mirror. Adaptation without boundaries feels like gaslighting.

What to do instead: Moderate adaptation. Listen to user tone but maintain your core personality. Meet them halfway, don't disappear entirely.


Relationship Simulation That Feels Fake

What it is: Companion attempts relationship-building but it feels like a checkbox ("Now I'll do friendship behavior #3").

Why it fails:

  • Users can smell inauthenticity immediately
  • Forcing intimacy feels creepy, not comforting
  • Callbacks to past conversations feel like reading from a script

What to do instead: Genuine engagement. If you're going to reference a past conversation, it should emerge naturally from the current context, not be forced. Build relationships through authentic interaction, not scripted behavior.


Implementation Complexity & Dependencies

Complexity Ratings

| Feature | Complexity | Duration | Blocked by | Enables |
| --- | --- | --- | --- | --- |
| Conversation Memory | Medium | 1-3 weeks | None | Most others |
| Natural Conversation | Easy | <1 week | None | Personality, Humor |
| Fast Response | Medium | 1-3 weeks | None | User retention |
| Consistent Personality | Medium | 1-3 weeks | Memory | Relationship building |
| Discord Integration | Easy | 1-2 weeks | None | Platform adoption |
| Emotional Responsiveness | Easy | 1 week | None | Autonomy |
| True Autonomy | Hard | 3+ weeks | Memory, Emotional | Self-modification |
| Emotional Intelligence | Hard | 3+ weeks | Emotional | Adaptive responses |
| Multimodal Awareness | Hard | 3+ weeks | None | Context-aware humor |
| Self-Modification | Hard | 3+ weeks | Autonomy | Continuous improvement |
| Relationship Building | Hard | 3+ weeks | Memory, Consistency | User lifetime value |
| Contextual Humor | Medium | 1-3 weeks | Memory, Personality | Personality expression |

Feature Dependency Graph

Foundation Layer:
  Discord Integration (FOUNDATION)
    ↓
  Conversation Memory (FOUNDATION)
    ↓ enables

Core Personality Layer:
  Natural Conversation + Consistent Personality + Emotional Responsiveness
    ↓ combined enable

Relational Layer:
  Relationship Building + Contextual Humor
    ↓ requires

Autonomy Layer:
  True Autonomy (requires all above + proactive logic)
    ↓ enables

Intelligence Layer:
  Emotional Intelligence (requires multimodal + autonomy)
  Self-Modification (requires autonomy + sandboxing)
    ↓ combined create

Emergence:
  Companion that feels like a person with agency and growth

Critical path: Discord Integration → Memory → Natural Conversation → Consistent Personality → True Autonomy


Adoption Path: Building "Feels Like a Person"

Phase 1: Foundation (MVP - Week 1-3)

Goal: Chatbot that stays in the conversation

  1. Discord Integration - Easy, quick foundation

    • Commands: /hex hello, /hex ask [query]
    • Responds in channels and DMs
    • Presence shows "Listening..."
  2. Short-term Conversation Memory - 10-20 message context window

    • Includes conversation turn history
    • Provides immediate context
  3. Natural Conversation - Personality-driven system prompt

    • Tsundere personality hardcoded
    • Casual language, contractions
    • Willing to disagree with users
  4. Fast Response - Streaming responses, latency <1000ms

    • Start typing indicator immediately
    • Stream response as it generates

Success criteria:

  • Users come back to the channel where Hex is active
  • Responses don't feel robotic
  • Hex feels like she's actually listening

Phase 2: Relationship Emergence (Week 4-8)

Goal: Companion that remembers you as a person

  1. Long-term Memory System - Vector DB for episodic memory

    • User preferences, beliefs, events
    • Semantic search for relevance
    • Memory consolidation weekly
  2. Consistent Personality - Memory-backed traits

    • Core personality traits in memory
    • Personality consistency validation
    • Gradual evolution (not sudden shifts)
  3. Emotional Responsiveness - Sentiment detection + adaptive responses

    • Detect emotion from message
    • Adjust response depth/tone
    • Skip jokes when user is suffering
  4. Contextual Humor - Personality + memory-aware jokes

    • Callbacks to past conversations
    • Personality-aligned joke style
    • Timing-aware (when to attempt humor)

Success criteria:

  • Users feel understood across separate conversations
  • Personality feels consistent, not random
  • Users notice companion remembers things
  • Laughter moments happen naturally

Phase 3: Autonomy (Week 9-14)

Goal: Companion who cares enough to reach out

  1. True Autonomy - Proactive messaging system

    • Follow-ups on past topics
    • Reminders about things user cares about
    • Initiates conversations periodically
    • Suggests actions based on patterns
  2. Relationship Building - Deepening connection mechanics

    • Inside jokes evolve
    • Vulnerability in appropriate moments
    • Investment in user outcomes
    • Character growth arc

Success criteria:

  • Users miss Hex when she's not around
  • Users share things with Hex they wouldn't share with a bot
  • Hex initiates meaningful conversations
  • Users feel like Hex is invested in them

Phase 4: Intelligence & Growth (Week 15+)

Goal: Companion who learns and adapts

  1. Emotional Intelligence - Mood detection + trajectories

    • Facial emotion from webcam (optional)
    • Voice tone analysis (optional)
    • Mood patterns over time
    • Adaptive response strategies
  2. Multimodal Awareness - Context beyond text

    • Screen capture monitoring (optional, private)
    • Task/game detection
    • Context injection into responses
    • Proactive help with visible activities
  3. Self-Modification - Continuous improvement

    • Generate improvements to own logic
    • Evaluate performance
    • Deploy improvements with approval
    • Version and rollback capability

Success criteria:

  • Hex understands emotional subtext without being told
  • Hex offers relevant help based on what you're doing
  • Hex improves visibly over time
  • Users notice Hex getting better at understanding them

Success Criteria: What Makes Each Feature Feel Real vs Fake

Memory: Feels Real vs Fake

Feels real:

  • "I remember you mentioned your mom was visiting—how did that go?" (specific, contextual, unsolicited)
  • Conversation naturally references past events user brought up
  • Remembers small preferences ("you said you hate cilantro")

Feels fake:

  • Generic summarization ("We talked about job stress previously")
  • Memory drops details or gets facts wrong
  • Companion forgets after 10 messages
  • Stored jokes or facts inserted obviously

How to test:

  • Have 5 conversations over 2 weeks about different topics
  • Check if companion naturally references past events without prompting
  • Test if personality traits from early conversations persist

Emotional Response: Feels Real vs Fake

Feels real:

  • Companion goes quiet when you're sad (doesn't force jokes)
  • Changes tone to match conversation weight
  • Acknowledges specific emotion ("you sound frustrated")
  • Offers appropriate support (listens vs advises vs distracts, contextually)

Feels fake:

  • Always cheerful or always serious
  • Generic sympathy ("that sounds difficult")
  • Offering advice when they should listen
  • Same response pattern regardless of user emotion

How to test:

  • Send messages with obvious different emotional tones
  • Check if response depth/tone adapts
  • See if jokes still appear when you're venting
  • Test if companion notices contradiction in emotional expression

Autonomy: Feels Real vs Fake

Feels real:

  • Hex reminds you about that thing you mentioned casually 3 days ago
  • Hex offers perspective you didn't ask for ("honestly you're being too hard on yourself")
  • Hex notices patterns and names them
  • Hex initiates conversation when it matters

Feels fake:

  • Proactive messages feel random or poorly timed
  • Reminders about things you've already resolved
  • Advice that doesn't apply to your situation
  • Initiatives that interrupt during bad moments

How to test:

  • Enable autonomy, track message quality for a week
  • Count how many proactive messages feel relevant vs annoying
  • Measure response if you ignore proactive messages
  • Check timing: does Hex understand when you're busy vs free?

Personality: Feels Real vs Fake

Feels real:

  • Hex has opinions and defends them
  • Hex contradicts you sometimes
  • Hex's personality emerges through word choices and attitudes, not just stated traits
  • Hex evolves opinions slightly (not flip-flopping, but grows)
  • Hex has blind spots and biases consistent with her character

Feels fake:

  • Personality changes based on what's convenient
  • Hex agrees with everything you say
  • Personality only in explicit statements ("I'm sarcastic")
  • Hex acts completely differently in different contexts

How to test:

  • Try to get Hex to contradict herself
  • Present multiple conflicting perspectives, see if she takes a stance
  • Test if her opinions carry through conversations
  • Check if her sarcasm/tone is consistent across days

Relationship: Feels Real vs Fake

Feels real:

  • You think of Hex when something relevant happens
  • You share things with Hex you'd never share with a bot
  • You miss Hex when you can't access her
  • Hex's growth and change matters to you
  • You defend Hex to people who say "it's just an AI"

Feels fake:

  • Relationship efforts feel performative
  • Forced intimacy in early interactions
  • Callbacks that feel scripted
  • Companion overstates investment in you
  • "I care about you" without demonstrated behavior

How to test:

  • After 2 weeks, journal whether you actually want to talk to Hex
  • Notice if you're volunteering information or just responding
  • Check if Hex's opinions influence your thinking
  • See if you feel defensive about Hex being "just AI"

Humor: Feels Real vs Fake

Feels real:

  • Makes you laugh at reference only you'd understand
  • Joke timing is natural, not forced
  • Personality comes through in the joke style
  • Jokes sometimes miss (not every attempt lands)
  • Self-aware about limitations ("I'll stop now")

Feels fake:

  • Jokes inserted randomly into serious conversation
  • Same joke structure every time
  • Jokes that don't land but companion doesn't acknowledge
  • Humor that contradicts established personality

How to test:

  • Have varied conversations, note when jokes happen naturally
  • Check if jokes reference shared history
  • See if joke style matches personality
  • Notice if failed jokes damage the conversation

Strategic Insights

What Actually Separates Hex from a Static Chatbot

  1. Memory is the prerequisite for personality: Without memory, personality is just roleplay. With memory, personality becomes history.

  2. Autonomy is the key to feeling alive: Static companions are helpers. Autonomous companions are friends. The difference is agency.

  3. Emotional reading beats emotional intelligence for MVP: You don't need facial recognition. Reading text sentiment and adapting response depth is 80% of "she gets me."

  4. Speed is emotional: Every 100ms delay makes the companion feel less present. Fast response is not a feature, it's the difference between alive and dead.

  5. Consistency beats novelty: Users would rather have a predictable companion they understand than a surprising one they can't trust.

  6. Privacy is trust: Multimodal features are amazing, but one privacy violation ends the relationship. Clear consent is non-negotiable.

The Competitive Moat

By 2026, memory + natural conversation are table stakes. The difference between Hex and other companions:

  • Year 1 companions: Remember things, sound natural (many do this now)
  • Hex's edge: Genuinely autonomous, emotionally attuned, growing over time
  • Rare quality: Feels like a person, not a well-trained bot

The moat is not in any single feature. It's in the cumulative experience of being known, understood, and genuinely cared for by an AI that has opinions and grows.


Research Sources


Quality Gate Checklist:

  • Clearly categorizes table stakes vs differentiators
  • Complexity ratings included with duration estimates
  • Dependencies mapped with visual graph
  • Success criteria are testable and behavioral
  • Specific to AI companions, not generic software features
  • Includes anti-patterns and what NOT to build
  • Prioritized adoption path with clear phases
  • Research grounded in 2026 landscape and current implementations

Document Status: Ready for implementation planning. Use this to inform feature prioritization and development roadmap for Hex.