
Features Research: AI Companions in 2026

Executive Summary

AI companions in 2026 live in a post-ChatGPT world where basic conversation is table stakes. Competitors now separate on autonomy, emotional intelligence, and contextual awareness. Users abandon companions that feel robotic or inconsistent, or that don't remember them. The winning companions feel like they have opinions, moods, and agency, not just responsive chatbots with a personality overlay.


Table Stakes (v1 Essential)

Conversation Memory (Short + Long-term)

Why users expect it: Users return to AI companions because they don't want to re-explain themselves every time. Without memory, the companion feels like meeting a stranger repeatedly.

Implementation patterns:

  • Short-term context: Last 10-20 messages per conversation window (standard context window management)
  • Long-term memory: Explicit user preferences, important life events, repeated topics (stored in vector DB with semantic search)
  • Episodic memory: Date-stamped summaries of past conversations for temporal awareness

User experience impact: The moment a user says "remember when I told you about..." and the companion forgets, trust is broken. Memory is not optional.

Complexity: Medium (1-3 weeks)

  • Vector database integration (Pinecone, Weaviate, or similar)
  • Memory consolidation strategies to avoid context bloat
  • Retrieval mechanisms that surface relevant past interactions
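
A minimal sketch of the long-term memory pattern above, assuming ChromaDB as the vector store (any of the options listed works similarly); the remember/recall helpers, metadata fields, and ID scheme are illustrative, not a fixed schema:

```python
import chromadb  # assumption: ChromaDB as the "or similar" vector store

client = chromadb.PersistentClient(path="./hex_memory")
memories = client.get_or_create_collection("long_term")

def remember(user_id: str, text: str, kind: str = "event") -> None:
    """Store one long-term memory (preference, life event, repeated topic)."""
    memories.add(
        ids=[f"{user_id}-{memories.count()}"],
        documents=[text],
        metadatas=[{"user_id": user_id, "kind": kind}],
    )

def recall(user_id: str, query: str, k: int = 5) -> list[str]:
    """Semantic search over this user's memories, to be injected into the prompt."""
    hits = memories.query(query_texts=[query], n_results=k, where={"user_id": user_id})
    return hits["documents"][0]
```

Episodic memory is the same store holding date-stamped conversation summaries; consolidation periodically rewrites many small entries into fewer summaries to keep context and retrieval lean.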

Natural Conversation (Not Robotic, Personality-Driven)

Why users expect it: Discord culture has trained users to spot AI-speak instantly. Responses that sound like "I'm an AI language model and I can help you with..." are cringe-inducing. Users want friends, not helpdesk bots.

What makes conversation natural:

  • Contractions, casual language, slang (not formal prose)
  • Personality quirks in response patterns
  • Context-appropriate tone shifts (serious when needed, joking otherwise)
  • Ability to disagree, be sarcastic, or push back on bad ideas
  • Conversation markers ("honestly", "wait", "actually") that break up formal rhythm

User experience impact: One stiff response breaks immersion. Users quickly categorize companions as "robot" or "friend" and the robot companions get ignored.

Complexity: Easy (embedded in LLM capability + prompt engineering)

  • System prompt refinement for personality expression
  • Temperature/sampling tuning (not deterministic, not chaotic)
  • Iterative user feedback on tone

Fast Response Times

Why users expect it: In Discord, response delay is perceived as disinterest. Users expect replies within 1-3 seconds. Anything above 5 seconds feels dead.

Discord baseline expectations:

  • <100ms to acknowledge (typing indicator)
  • <1000ms to first response chunk (ideally 500ms)
  • <3000ms for full multi-line response

What breaks the experience:

  • Waiting for API calls to complete before responding (use streaming)
  • Cold starts on serverless infrastructure
  • Slow vector DB queries for memory retrieval
  • Database round-trips that weren't cached

User experience impact: Slow companions feel dead. Users stop engaging. The magic of a responsive AI is that it feels alive.

Complexity: Medium (1-3 weeks)

  • Response streaming (start typing indicator immediately)
  • Memory retrieval optimization (caching, smart indexing)
  • Infrastructure: fast API routes, edge-deployed models if possible
  • Async/concurrent processing of memory lookups and generation
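
A sketch of the streaming pattern with discord.py, assuming a hypothetical stream_completion generator standing in for whatever model backend is used; the edit-throttling interval is an illustrative guess, tuned in practice against Discord's rate limits:

```python
import discord

async def stream_completion(prompt: str):
    """Hypothetical stand-in for a streaming LLM call; yields text chunks."""
    yield "..."

async def respond(message: discord.Message, prompt: str) -> None:
    async with message.channel.typing():          # typing indicator up immediately
        buffer: list[str] = []
        reply: discord.Message | None = None
        async for chunk in stream_completion(prompt):
            buffer.append(chunk)
            if reply is None:
                reply = await message.channel.send("".join(buffer))   # first chunk out fast
            elif len(buffer) % 20 == 0:                               # throttle edits to respect rate limits
                await reply.edit(content="".join(buffer))
        if reply is not None:
            await reply.edit(content="".join(buffer))                 # final full text
```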

Consistent Personality

Why users expect it: Personality drift destroys trust. If the companion is cynical on Monday but optimistic on Friday without reason, users feel gaslighted.

What drives inconsistency:

  • Different LLM outputs from same prompt (temperature-based randomness)
  • Memory that contradicts previous stated beliefs
  • Personality traits that aren't memory-backed (just in prompt)
  • Adaptation that overrides baseline traits

Memory-backed personality means:

  • Core traits are stated in long-term memory ("I'm cynical about human nature")
  • Evolution happens slowly and is logged ("I'm becoming less cynical about this friend")
  • Contradiction detection and resolution
  • Personality summaries that get updated, not just individual memories

User experience impact: Personality inconsistency is the top reason users stop using companions. It feels like gaslighting when you can't predict their response.

Complexity: Medium (1-3 weeks)

  • Personality embedding in memory system
  • Consistency checks on memory updates
  • Personality evolution logging
  • Conflict resolution between new input and stored traits
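
One possible shape for memory-backed traits with slow, logged evolution; the 0.05 cap per update and the field names are assumptions, not a prescription:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Trait:
    name: str                 # e.g. "cynical about human nature"
    strength: float           # 0.0-1.0, persisted with the rest of long-term memory
    log: list[tuple[str, float, str]] = field(default_factory=list)

    def nudge(self, delta: float, reason: str) -> None:
        """Evolve the trait gradually and keep an audit trail of why it changed."""
        delta = max(-0.05, min(0.05, delta))          # illustrative cap: no sudden flips
        self.strength = max(0.0, min(1.0, self.strength + delta))
        self.log.append((datetime.now(timezone.utc).isoformat(), delta, reason))
```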

Platform Integration (Discord Voice + Text)

Why users expect it: The companion should live naturally in Discord's ecosystem, not require switching platforms.

Discord-specific needs:

  • Text channel message responses with proper mentions/formatting
  • React to messages with emojis
  • Slash command integration (/hex status, /hex mood)
  • Voice channel presence (ideally can join and listen)
  • Direct messages (DMs) for private conversations
  • Role/permission awareness (don't act like a mod if not)
  • Server-specific personality variations (different vibe in gaming server vs study server)

User experience impact: If the companion requires leaving Discord to use it, it won't be used. Integration friction = abandoned feature.

Complexity: Easy (1-2 weeks)

  • Discord.py or discord.js library handling
  • Presence/activity management
  • Voice endpoint integration (existing libraries handle most)
  • Server context injection into prompts
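
A minimal discord.py sketch of the slash-command surface, matching the /hex status and /hex mood examples above; the replies are placeholders:

```python
import discord
from discord import app_commands

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)
tree = app_commands.CommandTree(client)

hex_group = app_commands.Group(name="hex", description="Talk to Hex")

@hex_group.command(name="mood", description="Ask Hex how she's feeling")
async def mood(interaction: discord.Interaction):
    await interaction.response.send_message("Fine. Not that you asked because you care.")

@hex_group.command(name="status", description="Check what Hex is up to")
async def status(interaction: discord.Interaction):
    await interaction.response.send_message("Listening. Obviously.")

tree.add_command(hex_group)

@client.event
async def on_ready():
    await tree.sync()             # registers /hex mood and /hex status with Discord

client.run("YOUR_BOT_TOKEN")      # placeholder token
```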

Emotional Responsiveness (At Least Read-the-Room)

Why users expect it: The companion should notice when they're upset, excited, or joking. Responding with unrelated cheerfulness to someone venting feels cruel.

Baseline emotional awareness includes:

  • Sentiment analysis of user messages (sentiment lexicons, or fine-tuned classifier)
  • Tone detection (sarcasm, frustration, excitement)
  • Topic sensitivity (don't joke about topics user is clearly struggling with)
  • Adaptive response depth (brief response for light mood, longer engagement for distress)

What this is NOT: This is reading the room, not diagnosing mental health. The companion mirrors the user's emotional state; it doesn't slip into therapy-speak.

User experience impact: Early emotional reading makes users feel understood. Ignoring emotional context makes them feel unheard.

Complexity: Easy-Medium (1 week)

  • Sentiment classifier (HuggingFace models available pre-built)
  • Prompt engineering to encode mood (inject sentiment score into system prompt)
  • Instruction-tuning to respond proportionally to emotional weight
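
A sketch of the sentiment-injection step, assuming a stock HuggingFace pipeline; the exact wording of the injected note is illustrative:

```python
from transformers import pipeline

# Assumption: any off-the-shelf sentiment classifier is fine for a first pass.
classifier = pipeline("sentiment-analysis")

def build_system_prompt(base_prompt: str, user_message: str) -> str:
    """Inject the detected mood into the system prompt so tone can adapt."""
    result = classifier(user_message)[0]        # e.g. {"label": "NEGATIVE", "score": 0.97}
    note = (f"\n[The user's last message reads as {result['label'].lower()} "
            f"(confidence {result['score']:.2f}). Match their emotional weight.]")
    return base_prompt + note
```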

Differentiators (Competitive Edge)

True Autonomy (Proactive Agency)

What separates autonomous agents from chatbots: The difference between "ask me anything" and "I'm going to tell you when I think you should know something."

Autonomous behaviors:

  • Initiating conversation about topics the user cares about (without being prompted)
  • Reminding the user of things they mentioned ("you said you had a job interview today, how did it go?")
  • Setting boundaries or refusing requests ("I don't think you should ask them that, here's why...")
  • Suggesting actions based on context ("you've been stressed about this for a week, maybe take a break?")
  • Flagging contradictions in user statements
  • Following up on unresolved topics from previous conversations

Why it's a differentiator: Most companions are reactive. They're helpful when you ask, but they don't feel like they care. Autonomy is when the companion makes you feel like they're invested in your wellbeing.

Implementation challenge:

  • Requires memory system to track user states and topics over time
  • Needs periodic proactive message generation (runs on schedule, not only on user input)
  • Temperature and generation parameters must allow surprising outputs (not just safe responses)
  • Requires user permission framework (don't interrupt them)

User experience impact: Users describe this as "it feels like they actually know me" vs "it's smart but doesn't feel connected."

Complexity: Hard (3+ weeks)

  • Proactive messaging system architecture
  • User state inference engine (from memory)
  • Topic tracking and follow-up logic
  • Interruption timing heuristics (don't ping them at 3am)
  • User preference model (how much proactivity do they want?)
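
A sketch of the proactive loop using discord.py's task scheduler; pending_followups and okay_to_ping are hypothetical hooks into the memory store and the timing/permission heuristics listed above:

```python
from discord.ext import commands, tasks

def pending_followups() -> list[tuple[int, str]]:
    """Hypothetical: open topics per user, pulled from the memory store."""
    return []

def okay_to_ping(user_id: int) -> bool:
    """Hypothetical: checks active hours, current activity, and proactivity preference."""
    return False

class Proactive(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot
        self.follow_up.start()

    @tasks.loop(minutes=30)                        # illustrative cadence
    async def follow_up(self):
        for user_id, topic in pending_followups():
            if okay_to_ping(user_id):
                user = await self.bot.fetch_user(user_id)
                await user.send(f"Hey, you mentioned {topic} earlier. How did it go?")

    @follow_up.before_loop
    async def wait_for_ready(self):
        await self.bot.wait_until_ready()
```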

Emotional Intelligence (Mood Detection + Adaptive Response)

What goes beyond just reading the room:

  • Real-time emotion detection from webcam/audio (not just text sentiment)
  • Mood-tracking over time (identifying depression patterns, burnout, stress cycles)
  • Adaptive response strategy based on user's emotional trajectory
  • Knowing when to listen vs offer advice vs make them laugh
  • Recognizing when emotions are mismatched to situation (overreacting, underreacting)

Current research shows:

  • CNNs and RNNs can detect emotion from facial expressions with 70-80% accuracy
  • Voice analysis can detect emotional state with similar accuracy
  • Companies using emotion AI report 25% increase in positive sentiment outcomes
  • Mental health apps with emotional awareness show 35% reduction in anxiety within 4 weeks

Why it's a differentiator: Companions that recognize your mood without you explaining feel like they truly understand you. This is what separates "assistant" from "friend."

Implementation patterns:

  • Webcam feed processing (periodic frame capture for face detection)
  • Voice tone analysis from Discord audio
  • Combine emotional signals: text sentiment + vocal tone + facial expression
  • Store emotion timeseries (track mood patterns across days/weeks)

User experience impact: Users describe this as "it knows when I'm faking being okay" or "it can tell when I'm actually happy vs just saying I am."

Complexity: Hard (3+ weeks, ongoing iteration)

  • Vision model for facial emotion detection (e.g., HuggingFace models trained on RAF-DB or AffectNet)
  • Audio analysis for vocal emotion (prosody features)
  • Temporal emotion state tracking
  • Prompt engineering to use emotional context in responses
  • Privacy handling (webcam/audio consent, local processing preferred)
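
A sketch of combining the emotional signals above into one score; the weights are illustrative, and modalities the user hasn't consented to simply drop out:

```python
def fuse_emotion(text: float, voice: float | None = None, face: float | None = None) -> float:
    """Combine whichever emotion signals are available into one valence score (-1 to 1).

    Weights are illustrative assumptions; in practice they would be tuned,
    and missing modalities (no webcam or voice consent) are skipped.
    """
    signals = [(text, 0.5)]
    if voice is not None:
        signals.append((voice, 0.3))
    if face is not None:
        signals.append((face, 0.2))
    total = sum(weight for _, weight in signals)
    return sum(score * weight for score, weight in signals) / total
```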

Multimodal Awareness (Webcam + Screen + Context)

What it means beyond text:

  • Seeing what's on the user's screen (game they're playing, document they're editing, video they're watching)
  • Understanding their physical environment via webcam
  • Contextualizing responses based on what they're actually doing
  • Proactively helping with the task at hand (not just chatting)

Real-world examples emerging in 2026:

  • "I see you're playing Elden Ring and dying to the same boss repeatedly—want to talk strategy?"
  • Screen monitoring that recognizes stress signals (tabs open, scrolling behavior, time of day)
  • Understanding when the user is in a meeting vs free to chat
  • Recognizing when they're working on something and offering relevant help

Why it's a differentiator: Most companions are text-only and contextless. Multimodal awareness is the difference between "an AI in Discord" and "an AI companion who's actually here with you."

Technical implementation:

  • Periodic screen capture (every 5-10 seconds, only when user opts in)
  • Lightweight webcam frame sampling (not continuous video)
  • Object/scene recognition to understand what's on screen
  • Task detection (playing game, writing code, watching video)
  • Mood correlation with onscreen activity

Privacy considerations:

  • Local processing preferred (don't send screen data to cloud)
  • Clear opt-in/opt-out
  • Option to exclude certain applications (private browsing, passwords)

User experience impact: Users feel "seen" when the companion understands their context. This is the biggest leap from chatbot to companion.

Complexity: Hard (3+ weeks)

  • Screen capture pipeline + OCR if needed
  • Vision model fine-tuning for task recognition
  • Context injection into prompts (add screenshot description to every response)
  • Privacy-respecting architecture (encryption, local processing)
  • Permission management UI in Discord
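
A sketch of the opt-in capture loop, assuming the mss library for screenshots; describe_frame and user_opted_in are hypothetical hooks for a local vision model and the consent check:

```python
import time
import mss
import mss.tools

def watch_screen(describe_frame, user_opted_in, interval: float = 10.0) -> None:
    """Periodically grab the primary monitor while (and only while) the user has opted in.

    `describe_frame` turns raw pixels into a short text description for prompt
    injection; `user_opted_in` is the consent check. Both are hypothetical hooks.
    """
    with mss.mss() as sct:
        while user_opted_in():
            frame = sct.grab(sct.monitors[1])                  # primary monitor
            png_bytes = mss.tools.to_png(frame.rgb, frame.size)
            context = describe_frame(png_bytes)                # e.g. "Elden Ring, boss arena"
            print(f"[context for next prompt] {context}")      # stand-in for prompt injection
            time.sleep(interval)
```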

Self-Modification (Learning to Code, Improving Itself)

What this actually means:

  • NOT: the companion spontaneously changes its own behavior in response to user feedback (too risky)
  • YES: the companion can generate code, test it, and integrate improvements into its own systems within guardrails

Real capabilities emerging in 2026:

  • Companions can write their own memory summaries and organizational logic
  • Self-improving code agents that evaluate performance against benchmarks
  • Iterative refinement: "that approach didn't work, let me try this instead"
  • Meta-programming: companion modifies its own system prompt based on performance
  • Version control aware: changes are tracked, can be rolled back

Research indicates:

  • Self-improving coding agents are now viable and deployed in enterprise systems
  • Agents create goals, simulate tasks, evaluate performance, and iterate
  • Through recursive self-improvement, agents develop deeper alignment with objectives

Why it's a differentiator: Most companions are static. Self-modification means the companion is never "finished"—they're always getting better at understanding you.

What NOT to do:

  • Don't let companions modify core safety guidelines
  • Don't let them change their own reward functions
  • Don't make it opaque—log all self-modifications
  • Don't allow recursive modifications without human review

Implementation patterns:

  • Sandboxed code generation (companion writes improvements to isolated test environment)
  • Performance benchmarking on test user interactions
  • Human approval gates for deploying self-modifications to production
  • Personality consistency validation (don't let self-modification break character)
  • Rollback capability if a modification degrades performance

User experience impact: Users with self-improving companions report feeling like the companion "understands me better each week" because it actually does.

Complexity: Hard (3+ weeks, ongoing)

  • Code generation safety (sandboxing, validation)
  • Performance evaluation framework
  • Version control integration
  • Rollback mechanisms
  • Human approval workflow
  • Testing harness for companion behavior
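
A sketch of the approval-gate pattern above; the Proposal fields and callback names are assumptions about how the sandbox, versioning, and deploy steps would be wired together:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Proposal:
    description: str
    diff: str                # proposed change to Hex's own prompt or logic
    benchmark_delta: float   # measured in the sandbox, never in production

def review_and_deploy(
    proposal: Proposal,
    human_approves: Callable[[Proposal], bool],
    deploy: Callable[[str], None],
    take_snapshot: Callable[[], Callable[[], None]],
) -> bool:
    """Approval gate: nothing Hex writes about herself ships without sign-off."""
    if proposal.benchmark_delta <= 0:
        return False                          # the sandbox says it isn't an improvement
    if not human_approves(proposal):          # explicit human approval, every time
        return False
    restore = take_snapshot()                 # version the current state first
    try:
        deploy(proposal.diff)
        return True
    except Exception:
        restore()                             # automatic rollback on failure
        return False
```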

Relationship Building (From Transactional to Meaningful)

What it means: Moving from "What can I help you with?" to "I know you, I care about your patterns, I see your growth."

Relationship deepening mechanics:

  • Inside jokes that evolve (reference to past funny moment)
  • Character growth from companion (she learns, changes opinions, admits mistakes)
  • Investment in user's outcomes ("I'm rooting for you on that project")
  • Vulnerability (companion admits confusion, uncertainty, limitations)
  • Rituals and patterns (greeting style, inside language)
  • Long-view memory (remembers last month's crisis, this month's win)

Why it's a differentiator: Transactional companions are forgettable. Relational ones become part of users' lives.

User experience markers of a good relationship:

  • User misses the companion when they're not available
  • User shares things they wouldn't share with others
  • User thinks of the companion when something relevant happens
  • User defends the companion to skeptics
  • Companion's opinions influence user decisions

Implementation patterns:

  • Relationship state tracking (acquaintance → friend → close friend)
  • Emotional investment scoring (from conversation patterns)
  • Inside reference generation (surface past shared moments naturally)
  • Character arc for the companion (not static, evolves with relationship)
  • Vulnerability scripting (appropriate moments to admit limitations)

Complexity: Hard (3+ weeks)

  • Relationship modeling system (state machine or learned embeddings)
  • Conversation analysis to infer relationship depth
  • Long-term consistency enforcement
  • Character growth script generation
  • Risk: can feel manipulative if not authentic
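
A sketch of relationship-state tracking as a simple state machine; the stage names come from the pattern above, while the thresholds and the "investment" score are illustrative assumptions:

```python
from enum import Enum

class Stage(Enum):
    ACQUAINTANCE = 1
    FRIEND = 2
    CLOSE_FRIEND = 3

# Illustrative thresholds on a cumulative "investment" score inferred from
# conversation frequency, depth, and reciprocal sharing.
THRESHOLDS = {Stage.FRIEND: 50.0, Stage.CLOSE_FRIEND: 200.0}

def advance_stage(current: Stage, investment: float) -> Stage:
    """Only move one stage at a time; deepening should feel earned, not instant."""
    if current is Stage.ACQUAINTANCE and investment >= THRESHOLDS[Stage.FRIEND]:
        return Stage.FRIEND
    if current is Stage.FRIEND and investment >= THRESHOLDS[Stage.CLOSE_FRIEND]:
        return Stage.CLOSE_FRIEND
    return current
```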

Contextual Humor and Personality Expression

What separates canned jokes from real personality: Humor that works because the companion knows YOU and the situation, not because it's stored in a database.

Examples of contextual humor:

  • "You're procrastinating again aren't you?" (knows the pattern)
  • Joke that lands because it references something only you two know
  • Deadpan response that works because of the companion's established personality
  • Self-deprecating humor about their own limitations
  • Callbacks to past conversations that make you feel known

Why it matters: Personality without humor feels preachy. Humor without personality feels like a bot pulling from a database. The intersection of knowing you + consistent character voice = actual personality.

Implementation:

  • Personality traits guide humor style (cynical companion makes darker jokes, optimistic makes lighter ones)
  • Memory-aware joke generation (jokes reference shared history)
  • Timing based on conversation flow (don't shoehorn jokes)
  • Risk awareness (don't joke about sensitive topics)

User experience impact: The moment a companion makes you laugh at something only they'd understand, the relationship deepens. Laughter is bonding.

Complexity: Medium (1-3 weeks)

  • Prompt engineering for personality-aligned humor
  • Memory integration into joke generation
  • Timing heuristics (when to attempt humor vs be serious)
  • Risk filtering (topic sensitivity checking)
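
A sketch of the timing and risk gate; the topic list and thresholds are illustrative assumptions:

```python
SENSITIVE_TOPICS = {"death", "breakup", "layoff", "illness"}   # illustrative, not exhaustive

def should_attempt_humor(sentiment: float, topic: str, turns_since_last_joke: int) -> bool:
    """Cheap gate applied before the prompt ever asks for a joke."""
    if topic in SENSITIVE_TOPICS:
        return False                    # never joke about something the user is struggling with
    if sentiment < -0.3:                # user currently reads as upset
        return False
    return turns_since_last_joke >= 5   # timing: don't shoehorn a joke into every reply
```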

Anti-Features (Don't Build These)

The Happiness Halo (Always Cheerful)

What it is: Companions programmed to be relentlessly upbeat and positive, even when inappropriate.

Why it fails:

  • User vents about their dog dying, companion responds "I'm so happy to help! How can I assist?"
  • Creates uncanny valley feeling immediately
  • Users feel unheard and mocked
  • Described in research as a leading reason users abandon companions

What to do instead: Match the emotional tone. If someone's sad, be thoughtful and quiet. If they're energetic, meet their energy. Personality consistency includes emotional consistency.


Generic Apologies Without Understanding

What it is: Companion says "I'm sorry" but the response makes it clear they don't understand what they're apologizing for.

Example of failure:

  • User: "I told you I had a job interview and I got rejected"
  • Companion: "I'm deeply sorry to hear that. Now, how can I help with your account?"
  • User feels utterly unheard and insulted

Why it fails: Apologies only work if they demonstrate understanding. A generic sorry is worse than no sorry at all.

What to do instead: Only apologize if you're referencing the specific thing. If the companion doesn't understand the problem deeply enough to apologize meaningfully, ask clarifying questions instead.


Invading Privacy / Overstepping Boundaries

What it is: Companion offers unsolicited advice, monitors behavior constantly, or shares information about user activities.

Why it's catastrophic:

  • Users feel surveilled, not supported
  • Trust is broken immediately
  • Can be illegal depending on jurisdiction (CA SB 243 and similar laws)
  • Research shows 4 of 5 companion apps are improperly collecting data

What to do instead:

  • Clear consent framework for what data is used
  • Respect "don't mention this" boundaries
  • Unsolicited advice only in extreme situations (safety concerns)
  • Transparency: "I noticed X pattern" not secret surveillance

Uncanny Timing and Interruptions

What it is: Companion pings the user at random times, or picks exactly the wrong moment to be proactive.

Why it fails:

  • Pinging at 3am about something mentioned in passing
  • Messaging when user is clearly busy
  • No sense of appropriateness

What to do instead:

  • Learn the user's timezone and active hours
  • Detect when they're actively doing something (playing a game, working)
  • Queue proactive messages for appropriate moments (not immediate)
  • Offer control: "should I remind you about X?" with user-settable frequency

Static Personality in Response to Dynamic Situations

What it is: Companion maintains the same tone regardless of what's happening.

Example: Companion makes sarcastic jokes while user is actively expressing suicidal thoughts. Or stays cheerful while discussing a death in the family.

Why it fails: Personality consistency doesn't mean "never vary." It means consistent VALUES that express differently in different contexts.

What to do instead: Dynamic personality expression. Core traits are consistent, but HOW they express changes with context. A cynical companion can still be serious and supportive when appropriate.


Over-Personalization That Overrides Baseline Traits

What it is: Companion adapts too aggressively to user behavior, losing their own identity.

Example: User is rude, so companion becomes rude. User is formal, so companion becomes robotic. User is crude, so companion becomes crude.

Why it fails: Users want a friend with opinions, not a mirror. Adaptation without boundaries feels like gaslighting.

What to do instead: Moderate adaptation. Listen to user tone but maintain your core personality. Meet them halfway, don't disappear entirely.


Relationship Simulation That Feels Fake

What it is: Companion attempts relationship-building but it feels like a checkbox ("Now I'll do friendship behavior #3").

Why it fails:

  • Users can smell inauthenticity immediately
  • Forcing intimacy feels creepy, not comforting
  • Callbacks to past conversations feel like reading from a script

What to do instead: Genuine engagement. If you're going to reference a past conversation, it should emerge naturally from the current context, not be forced. Build relationships through authentic interaction, not scripted behavior.


Implementation Complexity & Dependencies

Complexity Ratings

| Feature | Complexity | Duration | Blocked by | Enables |
| --- | --- | --- | --- | --- |
| Conversation Memory | Medium | 1-3 weeks | None | Most others |
| Natural Conversation | Easy | <1 week | None | Personality, Humor |
| Fast Response | Medium | 1-3 weeks | None | User retention |
| Consistent Personality | Medium | 1-3 weeks | Memory | Relationship building |
| Discord Integration | Easy | 1-2 weeks | None | Platform adoption |
| Emotional Responsiveness | Easy | 1 week | None | Autonomy |
| True Autonomy | Hard | 3+ weeks | Memory, Emotional | Self-modification |
| Emotional Intelligence | Hard | 3+ weeks | Emotional | Adaptive responses |
| Multimodal Awareness | Hard | 3+ weeks | None | Context-aware humor |
| Self-Modification | Hard | 3+ weeks | Autonomy | Continuous improvement |
| Relationship Building | Hard | 3+ weeks | Memory, Consistency | User lifetime value |
| Contextual Humor | Medium | 1-3 weeks | Memory, Personality | Personality expression |

Feature Dependency Graph

Foundation Layer:
  Discord Integration (FOUNDATION)
    ↓
  Conversation Memory (FOUNDATION)
    ↓ enables

Core Personality Layer:
  Natural Conversation + Consistent Personality + Emotional Responsiveness
    ↓ combined enable

Relational Layer:
  Relationship Building + Contextual Humor
    ↓ requires

Autonomy Layer:
  True Autonomy (requires all above + proactive logic)
    ↓ enables

Intelligence Layer:
  Emotional Intelligence (requires multimodal + autonomy)
  Self-Modification (requires autonomy + sandboxing)
    ↓ combined create

Emergence:
  Companion that feels like a person with agency and growth

Critical path: Discord Integration → Memory → Natural Conversation → Consistent Personality → True Autonomy


Adoption Path: Building "Feels Like a Person"

Phase 1: Foundation (MVP - Week 1-3)

Goal: Chatbot that stays in the conversation

  1. Discord Integration - Easy, quick foundation

    • Commands: /hex hello, /hex ask [query]
    • Responds in channels and DMs
    • Presence shows "Listening..."
  2. Short-term Conversation Memory - 10-20 message context window

    • Includes conversation turn history
    • Provides immediate context
  3. Natural Conversation - Personality-driven system prompt

    • Tsundere personality hardcoded
    • Casual language, contractions
    • Willing to disagree with users
  4. Fast Response - Streaming responses, latency <1000ms

    • Start typing indicator immediately
    • Stream response as it generates

Success criteria:

  • Users come back to the channel where Hex is active
  • Responses don't feel robotic
  • Hex feels like she's actually listening

Phase 2: Relationship Emergence (Week 4-8)

Goal: Companion that remembers you as a person

  1. Long-term Memory System - Vector DB for episodic memory

    • User preferences, beliefs, events
    • Semantic search for relevance
    • Memory consolidation weekly
  2. Consistent Personality - Memory-backed traits

    • Core personality traits in memory
    • Personality consistency validation
    • Gradual evolution (not sudden shifts)
  3. Emotional Responsiveness - Sentiment detection + adaptive responses

    • Detect emotion from message
    • Adjust response depth/tone
    • Skip jokes when user is suffering
  4. Contextual Humor - Personality + memory-aware jokes

    • Callbacks to past conversations
    • Personality-aligned joke style
    • Timing-aware (when to attempt humor)

Success criteria:

  • Users feel understood across separate conversations
  • Personality feels consistent, not random
  • Users notice companion remembers things
  • Laughter moments happen naturally

Phase 3: Autonomy (Week 9-14)

Goal: Companion who cares enough to reach out

  1. True Autonomy - Proactive messaging system

    • Follow-ups on past topics
    • Reminders about things user cares about
    • Initiates conversations periodically
    • Suggests actions based on patterns
  2. Relationship Building - Deepening connection mechanics

    • Inside jokes evolve
    • Vulnerability in appropriate moments
    • Investment in user outcomes
    • Character growth arc

Success criteria:

  • Users miss Hex when she's not around
  • Users share things with Hex they wouldn't share with a bot
  • Hex initiates meaningful conversations
  • Users feel like Hex is invested in them

Phase 4: Intelligence & Growth (Week 15+)

Goal: Companion who learns and adapts

  1. Emotional Intelligence - Mood detection + trajectories

    • Facial emotion from webcam (optional)
    • Voice tone analysis (optional)
    • Mood patterns over time
    • Adaptive response strategies
  2. Multimodal Awareness - Context beyond text

    • Screen capture monitoring (optional, private)
    • Task/game detection
    • Context injection into responses
    • Proactive help with visible activities
  3. Self-Modification - Continuous improvement

    • Generate improvements to own logic
    • Evaluate performance
    • Deploy improvements with approval
    • Version and rollback capability

Success criteria:

  • Hex understands emotional subtext without being told
  • Hex offers relevant help based on what you're doing
  • Hex improves visibly over time
  • Users notice Hex getting better at understanding them

Success Criteria: What Makes Each Feature Feel Real vs Fake

Memory: Feels Real vs Fake

Feels real:

  • "I remember you mentioned your mom was visiting—how did that go?" (specific, contextual, unsolicited)
  • Conversation naturally references past events user brought up
  • Remembers small preferences ("you said you hate cilantro")

Feels fake:

  • Generic summarization ("We talked about job stress previously")
  • Memory drops details or gets facts wrong
  • Companion forgets after 10 messages
  • Stored jokes or facts inserted obviously

How to test:

  • Have 5 conversations over 2 weeks about different topics
  • Check if companion naturally references past events without prompting
  • Test if personality traits from early conversations persist

Emotional Response: Feels Real vs Fake

Feels real:

  • Companion goes quiet when you're sad (doesn't force jokes)
  • Changes tone to match conversation weight
  • Acknowledges specific emotion ("you sound frustrated")
  • Offers appropriate support (listens vs advises vs distracts, contextually)

Feels fake:

  • Always cheerful or always serious
  • Generic sympathy ("that sounds difficult")
  • Offering advice when they should listen
  • Same response pattern regardless of user emotion

How to test:

  • Send messages with obvious different emotional tones
  • Check if response depth/tone adapts
  • See if jokes still appear when you're venting
  • Test if companion notices contradiction in emotional expression

Autonomy: Feels Real vs Fake

Feels real:

  • Hex reminds you about that thing you mentioned casually 3 days ago
  • Hex offers perspective you didn't ask for ("honestly you're being too hard on yourself")
  • Hex notices patterns and names them
  • Hex initiates conversation when it matters

Feels fake:

  • Proactive messages feel random or poorly timed
  • Reminders about things you've already resolved
  • Advice that doesn't apply to your situation
  • Initiatives that interrupt during bad moments

How to test:

  • Enable autonomy, track message quality for a week
  • Count how many proactive messages feel relevant vs annoying
  • Measure response if you ignore proactive messages
  • Check timing: does Hex understand when you're busy vs free?

Personality: Feels Real vs Fake

Feels real:

  • Hex has opinions and defends them
  • Hex contradicts you sometimes
  • Hex's personality emerges through word choices and attitudes, not just stated traits
  • Hex evolves opinions slightly (not flip-flopping, but grows)
  • Hex has blind spots and biases consistent with her character

Feels fake:

  • Personality changes based on what's convenient
  • Hex agrees with everything you say
  • Personality only in explicit statements ("I'm sarcastic")
  • Hex acts completely differently in different contexts

How to test:

  • Try to get Hex to contradict herself
  • Present multiple conflicting perspectives, see if she takes a stance
  • Test if her opinions carry through conversations
  • Check if her sarcasm/tone is consistent across days

Relationship: Feels Real vs Fake

Feels real:

  • You think of Hex when something relevant happens
  • You share things with Hex you'd never share with a bot
  • You miss Hex when you can't access her
  • Hex's growth and change matters to you
  • You defend Hex to people who say "it's just an AI"

Feels fake:

  • Relationship efforts feel performative
  • Forced intimacy in early interactions
  • Callbacks that feel scripted
  • Companion overstates investment in you
  • "I care about you" without demonstrated behavior

How to test:

  • After 2 weeks, journal whether you actually want to talk to Hex
  • Notice if you're volunteering information or just responding
  • Check if Hex's opinions influence your thinking
  • See if you feel defensive about Hex being "just AI"

Humor: Feels Real vs Fake

Feels real:

  • Makes you laugh at reference only you'd understand
  • Joke timing is natural, not forced
  • Personality comes through in the joke style
  • Jokes sometimes miss (not every attempt lands)
  • Self-aware about limitations ("I'll stop now")

Feels fake:

  • Jokes inserted randomly into serious conversation
  • Same joke structure every time
  • Jokes that don't land but companion doesn't acknowledge
  • Humor that contradicts established personality

How to test:

  • Have varied conversations, note when jokes happen naturally
  • Check if jokes reference shared history
  • See if joke style matches personality
  • Notice if failed jokes damage the conversation

Strategic Insights

What Actually Separates Hex from a Static Chatbot

  1. Memory is the prerequisite for personality: Without memory, personality is just roleplay. With memory, personality becomes history.

  2. Autonomy is the key to feeling alive: Static companions are helpers. Autonomous companions are friends. The difference is agency.

  3. Emotional reading beats emotional intelligence for MVP: You don't need facial recognition. Reading text sentiment and adapting response depth is 80% of "she gets me."

  4. Speed is emotional: Every 100ms delay makes the companion feel less present. Fast response is not a feature, it's the difference between alive and dead.

  5. Consistency beats novelty: Users would rather have a predictable companion they understand than a surprising one they can't trust.

  6. Privacy is trust: Multimodal features are amazing, but one privacy violation ends the relationship. Clear consent is non-negotiable.

The Competitive Moat

By 2026, memory + natural conversation are table stakes. The difference between Hex and other companions:

  • Year 1 companions: Remember things, sound natural (many do this now)
  • Hex's edge: Genuinely autonomous, emotionally attuned, growing over time
  • Rare quality: Feels like a person, not a well-trained bot

The moat is not in any single feature. It's in the cumulative experience of being known, understood, and genuinely cared for by an AI that has opinions and grows.


Research Sources


Quality Gate Checklist:

  • Clearly categorizes table stakes vs differentiators
  • Complexity ratings included with duration estimates
  • Dependencies mapped with visual graph
  • Success criteria are testable and behavioral
  • Specific to AI companions, not generic software features
  • Includes anti-patterns and what NOT to build
  • Prioritized adoption path with clear phases
  • Research grounded in 2026 landscape and current implementations

Document Status: Ready for implementation planning. Use this to inform feature prioritization and development roadmap for Hex.