docs: complete domain research (STACK, FEATURES, ARCHITECTURE, PITFALLS, SUMMARY)
## Stack Analysis
- Llama 3.1 8B Instruct (128K context, 4-bit quantized)
- Discord.py 2.6.4+ async-native framework
- Ollama for local inference, ChromaDB for semantic memory
- Whisper Large V3 + Kokoro 82M (privacy-first speech)
- VRoid avatar + Discord screen share integration

## Architecture
- 6-phase modular build: Foundation → Personality → Perception → Autonomy → Self-Mod → Polish
- Personality-first design; memory and consistency are foundational
- All perception is async (separate thread, never blocks responses)
- Self-modification sandboxed with mandatory user approval

## Critical Path
- Phase 1: Core LLM + Discord integration + SQLite memory
- Phase 2: Vector DB + personality versioning + consistency audits
- Phase 3: Perception layer (webcam/screen, isolated thread)
- Phase 4: Autonomy + relationship deepening + inside jokes
- Phase 5: Self-modification capability (gamified, gated)
- Phase 6: Production hardening + monitoring + scaling

## Key Pitfalls to Avoid
1. Personality drift (weekly consistency audits required)
2. Tsundere breaking (formalize denial rules; scale with relationship)
3. Memory bloat (hierarchical memory with archival)
4. Latency creep (async/await throughout; perception isolated)
5. Runaway self-modification (approval gates + rollback non-negotiable)

## Confidence
HIGH. Stack proven, architecture coherent, dependencies clear. Ready for detailed requirements and Phase 1 planning.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
New file: .planning/research/FEATURES.md (811 lines)
# Features Research: AI Companions in 2026

## Executive Summary

AI companions in 2026 live in a post-ChatGPT world where basic conversation is table stakes. The competition separates on **autonomy**, **emotional intelligence**, and **contextual awareness**. Users will abandon companions that feel robotic, inconsistent, or that don't remember them. The winning companions feel like they have opinions, moods, and agency—not just responsive chatbots with personality overlays.

---

## Table Stakes (v1 Essential)

### Conversation Memory (Short + Long-term)

**Why users expect it:** Users return to AI companions because they don't want to re-explain themselves every time. Without memory, the companion feels like meeting a stranger repeatedly.

**Implementation patterns:**

- **Short-term context**: Last 10-20 messages per conversation window (standard context window management)
- **Long-term memory**: Explicit user preferences, important life events, repeated topics (stored in a vector DB with semantic search)
- **Episodic memory**: Date-stamped summaries of past conversations for temporal awareness

**User experience impact:** The moment a user says "remember when I told you about..." and the companion forgets, trust is broken. Memory is not optional.

**Complexity:** Medium (1-3 weeks)

- Vector database integration (ChromaDB, Pinecone, Weaviate, or similar; see the sketch below)
- Memory consolidation strategies to avoid context bloat
- Retrieval mechanisms that surface relevant past interactions

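As a point of reference, here is a minimal sketch of the long-term store using ChromaDB (the vector DB named in the stack research). The `remember`/`recall` helpers, collection name, and metadata layout are illustrative assumptions rather than an established interface; consolidation and archival would sit on top of the same collection.

```python
# Minimal long-term memory sketch (assumes the chromadb package; helper names are illustrative).
import time
import chromadb

client = chromadb.PersistentClient(path="./hex_memory")
memories = client.get_or_create_collection("long_term_memories")

def remember(user_id: str, text: str, kind: str = "fact") -> None:
    """Store one memory with enough metadata to filter and age it later."""
    memories.add(
        ids=[f"{user_id}-{time.time_ns()}"],
        documents=[text],
        metadatas=[{"user_id": user_id, "kind": kind, "ts": time.time()}],
    )

def recall(user_id: str, query: str, k: int = 5) -> list[str]:
    """Semantic search over this user's memories; results get injected into the prompt."""
    hits = memories.query(query_texts=[query], n_results=k, where={"user_id": user_id})
    return hits["documents"][0] if hits["documents"] else []
```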
---

### Natural Conversation (Not Robotic, Personality-Driven)

**Why users expect it:** Discord culture has trained users to spot AI-speak instantly. Responses that sound like "I'm an AI language model and I can help you with..." are cringe-inducing. Users want friends, not helpdesk bots.

**What makes conversation natural:**

- Contractions, casual language, slang (not formal prose)
- Personality quirks in response patterns
- Context-appropriate tone shifts (serious when needed, joking otherwise)
- Ability to disagree, be sarcastic, or push back on bad ideas
- Conversation markers ("honestly", "wait", "actually") that break up formal rhythm

**User experience impact:** One stiff response breaks immersion. Users quickly categorize companions as "robot" or "friend", and the robot companions get ignored.

**Complexity:** Easy (embedded in LLM capability + prompt engineering)

- System prompt refinement for personality expression
- Temperature/sampling tuning (not deterministic, not chaotic)
- Iterative user feedback on tone

---

### Fast Response Times

**Why users expect it:** In Discord, response delay is perceived as disinterest. Users expect replies within 1-3 seconds. Anything above 5 seconds feels dead.

**Discord baseline expectations:**

- <100ms to acknowledge (typing indicator)
- <1000ms to first response chunk (ideally 500ms)
- <3000ms for a full multi-line response

**What breaks the experience:**

- Waiting for API calls to complete before responding (use streaming)
- Cold starts on serverless infrastructure
- Slow vector DB queries for memory retrieval
- Database round-trips that weren't cached

**User experience impact:** Slow companions feel dead. Users stop engaging. The magic of a responsive AI is that it feels alive.

**Complexity:** Medium (1-3 weeks)

- Response streaming (start the typing indicator immediately; see the sketch below)
- Memory retrieval optimization (caching, smart indexing)
- Infrastructure: fast API routes, edge-deployed models if possible
- Async/concurrent processing of memory lookups and generation

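A hedged sketch of the streaming pattern with discord.py: acknowledge with a typing indicator immediately, then edit a single placeholder message as chunks arrive. `stream_tokens` stands in for whatever async generator the LLM client exposes, and the edit interval is a guess at a rate-limit-friendly cadence.

```python
# Streaming reply sketch for discord.py; `stream_tokens` is an assumed async generator of text chunks.
import time
import discord

EDIT_INTERVAL = 1.0  # seconds between message edits, to stay friendly with rate limits

async def reply_streaming(message: discord.Message, stream_tokens) -> None:
    async with message.channel.typing():            # acknowledge in well under 100 ms
        sent = await message.channel.send("…")      # placeholder, edited in place below
        buffer, last_edit = "", 0.0
        async for chunk in stream_tokens:
            buffer += chunk
            if time.monotonic() - last_edit >= EDIT_INTERVAL:
                await sent.edit(content=buffer[:2000])   # Discord's message length cap
                last_edit = time.monotonic()
        await sent.edit(content=buffer[:2000])           # final full response
```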
---

### Consistent Personality

**Why users expect it:** Personality drift destroys trust. If the companion is cynical on Monday but optimistic on Friday without reason, users feel gaslit.

**What drives inconsistency:**

- Different LLM outputs from the same prompt (temperature-based randomness)
- Memory that contradicts previously stated beliefs
- Personality traits that aren't memory-backed (just in the prompt)
- Adaptation that overrides baseline traits

**Memory-backed personality means:**

- Core traits are stated in long-term memory ("I'm cynical about human nature")
- Evolution happens slowly and is logged ("I'm becoming less cynical about this friend")
- Contradiction detection and resolution
- Personality summaries that get updated, not just individual memories

**User experience impact:** Personality inconsistency is one of the top reasons users stop using companions. It feels like gaslighting when you can't predict their response.

**Complexity:** Medium (1-3 weeks)

- Personality embedding in the memory system (a storage sketch follows)
- Consistency checks on memory updates
- Personality evolution logging
- Conflict resolution between new input and stored traits

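One way to make traits memory-backed rather than prompt-only is an append-only trait log, so every change is dated and explained. A minimal sketch assuming a local SQLite store; the table layout and helper names are illustrative.

```python
# Append-only personality trait log (SQLite layout and helpers are assumptions).
import sqlite3
import time

db = sqlite3.connect("personality.db")
db.execute("CREATE TABLE IF NOT EXISTS traits (name TEXT, value TEXT, updated REAL, reason TEXT)")

def current_trait(name: str) -> str | None:
    row = db.execute("SELECT value FROM traits WHERE name=? ORDER BY updated DESC LIMIT 1",
                     (name,)).fetchone()
    return row[0] if row else None

def evolve_trait(name: str, new_value: str, reason: str) -> None:
    """Append a new version instead of overwriting, so drift stays auditable."""
    old = current_trait(name)
    if old != new_value:
        db.execute("INSERT INTO traits VALUES (?, ?, ?, ?)",
                   (name, new_value, time.time(), f"was {old!r}: {reason}"))
        db.commit()

# evolve_trait("cynicism", "softening toward this user", "three weeks of consistently positive chats")
```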
---

### Platform Integration (Discord Voice + Text)

**Why users expect it:** The companion should live naturally in Discord's ecosystem, not require switching platforms.

**Discord-specific needs:**

- Text channel message responses with proper mentions/formatting
- Reacting to messages with emojis
- Slash command integration (/hex status, /hex mood)
- Voice channel presence (ideally can join and listen)
- Direct messages (DMs) for private conversations
- Role/permission awareness (don't act like a mod if not one)
- Server-specific personality variations (different vibe in a gaming server vs a study server)

**User experience impact:** If the companion requires leaving Discord to use it, it won't be used. Integration friction = abandoned feature.

**Complexity:** Easy (1-2 weeks)

- Discord.py or discord.js library handling (slash command sketch below)
- Presence/activity management
- Voice endpoint integration (existing libraries handle most of it)
- Server context injection into prompts

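A sketch of the /hex command group with discord.py 2.x application commands. The replies are placeholders, and the actual status/mood logic is assumed to live elsewhere.

```python
# /hex slash-command group sketch (discord.py 2.x); reply text is placeholder only.
import discord
from discord import app_commands
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

hex_group = app_commands.Group(name="hex", description="Talk to Hex")

@hex_group.command(name="status", description="Is Hex awake and listening?")
async def status(interaction: discord.Interaction):
    await interaction.response.send_message("Awake. Obviously.", ephemeral=True)

@hex_group.command(name="mood", description="Ask Hex how she's feeling")
async def mood(interaction: discord.Interaction):
    await interaction.response.send_message("Fine. Not that it matters to you.")

bot.tree.add_command(hex_group)

@bot.event
async def on_ready():
    await bot.tree.sync()   # registers the slash commands with Discord
```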
---

### Emotional Responsiveness (At Least Read-the-Room)

**Why users expect it:** The companion should notice when they're upset, excited, or joking. Responding with unrelated cheerfulness to someone venting feels cruel.

**Baseline emotional awareness includes:**

- Sentiment analysis of user messages (sentiment lexicons or a fine-tuned classifier)
- Tone detection (sarcasm, frustration, excitement)
- Topic sensitivity (don't joke about topics the user is clearly struggling with)
- Adaptive response depth (brief response for a light mood, longer engagement for distress)

**What this is NOT:** This is reading the room, not diagnosing mental health. The companion mirrors emotional state; it doesn't therapy-speak.

**User experience impact:** Early emotional reading makes users feel understood. Ignoring emotional context makes them feel unheard.

**Complexity:** Easy-Medium (1 week)

- Sentiment classifier (pre-built HuggingFace models are available; see the sketch below)
- Prompt engineering to encode mood (inject the sentiment score into the system prompt)
- Instruction-tuning to respond proportionally to emotional weight

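A minimal read-the-room sketch with a pre-built Hugging Face classifier, injecting the result into the system prompt. The model checkpoint named here is one public example and an assumption, not a recommendation; any sentiment or emotion classifier slots in the same way, and the personality string is a placeholder.

```python
# Text-only mood detection sketch; the model checkpoint is an assumed example.
from transformers import pipeline

emotion_clf = pipeline("text-classification",
                       model="j-hartmann/emotion-english-distilroberta-base")

BASE_PERSONALITY = "You are Hex: sarcastic, loyal, allergic to corporate politeness."

def mood_hint(user_message: str) -> str:
    """One line appended to the system prompt, e.g. 'User seems: sadness (0.87).'"""
    top = emotion_clf(user_message)[0]   # top prediction: {'label': ..., 'score': ...}
    return f"User seems: {top['label']} ({top['score']:.2f}). Match that register."

system_prompt = BASE_PERSONALITY + "\n" + mood_hint("ugh, today was awful")
```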
---

## Differentiators (Competitive Edge)

### True Autonomy (Proactive Agency)

**What separates autonomous agents from chatbots:**

The difference between "ask me anything" and "I'm going to tell you when I think you should know something."

**Autonomous behaviors:**

- Initiating conversation about topics the user cares about (without being prompted)
- Reminding the user of things they mentioned ("you said you had a job interview today, how did it go?")
- Setting boundaries or refusing requests ("I don't think you should ask them that, here's why...")
- Suggesting actions based on context ("you've been stressed about this for a week, maybe take a break?")
- Flagging contradictions in user statements
- Following up on unresolved topics from previous conversations

**Why it's a differentiator:** Most companions are reactive. They're helpful when you ask, but they don't feel like they care. Autonomy is what makes the companion feel invested in your wellbeing.

**Implementation challenge:**

- Requires a memory system to track user states and topics over time
- Needs periodic proactive message generation (runs on a schedule, not only on user input; see the sketch below)
- Temperature and generation parameters must allow surprising outputs (not just safe responses)
- Requires a user permission framework (don't interrupt them)

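A sketch of the scheduled proactive loop using discord.py's tasks extension. `due_follow_ups` and `is_quiet_hours` are hypothetical helpers over the memory store and the user-preference model, and the 30-minute cadence is arbitrary.

```python
# Proactive follow-up loop sketch; due_follow_ups/is_quiet_hours are assumed helpers.
import discord
from discord.ext import commands, tasks

class Autonomy(commands.Cog):
    def __init__(self, bot: commands.Bot):
        self.bot = bot
        self.check_follow_ups.start()

    @tasks.loop(minutes=30)
    async def check_follow_ups(self):
        for item in due_follow_ups():              # e.g. "job interview", mentioned 3 days ago
            if is_quiet_hours(item.user_id):       # never ping at 3am
                continue
            user = await self.bot.fetch_user(item.user_id)
            await user.send(item.opener)           # "you said the interview was today, how did it go?"

    @check_follow_ups.before_loop
    async def wait_until_ready(self):
        await self.bot.wait_until_ready()
```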
**User experience impact:** Users describe this as "it feels like they actually know me" vs "it's smart but doesn't feel connected."

**Complexity:** Hard (3+ weeks)

- Proactive messaging system architecture
- User state inference engine (built from memory)
- Topic tracking and follow-up logic
- Interruption timing heuristics (don't ping them at 3am)
- User preference model (how much proactivity do they want?)

---

### Emotional Intelligence (Mood Detection + Adaptive Response)

**What goes beyond just reading the room:**

- Real-time emotion detection from webcam/audio (not just text sentiment)
- Mood tracking over time (identifying depression patterns, burnout, stress cycles)
- Adaptive response strategy based on the user's emotional trajectory
- Knowing when to listen vs offer advice vs make them laugh
- Recognizing when emotions are mismatched to the situation (overreacting, underreacting)

**Current research shows:**

- CNNs and RNNs can detect emotion from facial expressions with 70-80% accuracy
- Voice analysis can detect emotional state with similar accuracy
- Companies using emotion AI report a 25% increase in positive sentiment outcomes
- Mental health apps with emotional awareness show a 35% reduction in anxiety within 4 weeks

**Why it's a differentiator:** Companions that recognize your mood without you explaining it feel like they truly understand you. This is what separates "assistant" from "friend."

**Implementation patterns:**

- Webcam frame capture and face detection
- Voice tone analysis from Discord audio
- Combining emotional signals: text sentiment + vocal tone + facial expression
- Storing an emotion time series (track mood patterns across days/weeks; see the sketch below)

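A small sketch of the emotion time series, assuming a local SQLite log; the label set and the "negative ratio" heuristic are placeholders for whatever trend metrics turn out to be useful.

```python
# Per-user emotion log sketch; schema and the trend heuristic are assumptions.
import sqlite3
import time

db = sqlite3.connect("emotions.db")
db.execute("""CREATE TABLE IF NOT EXISTS emotion_log
              (user_id TEXT, source TEXT, label TEXT, score REAL, ts REAL)""")

def log_emotion(user_id: str, source: str, label: str, score: float) -> None:
    """source is 'text', 'voice', or 'face', so the signals can be weighed separately."""
    db.execute("INSERT INTO emotion_log VALUES (?, ?, ?, ?, ?)",
               (user_id, source, label, score, time.time()))
    db.commit()

def weekly_negative_ratio(user_id: str) -> float:
    """Share of negative readings over the last 7 days; a crude burnout/stress signal."""
    rows = db.execute("SELECT label FROM emotion_log WHERE user_id=? AND ts>?",
                      (user_id, time.time() - 7 * 86400)).fetchall()
    negatives = sum(1 for (label,) in rows if label in ("sadness", "anger", "fear"))
    return negatives / len(rows) if rows else 0.0
```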
**User experience impact:** Users describe this as "it knows when I'm faking being okay" or "it can tell when I'm actually happy vs just saying I am."

**Complexity:** Hard (3+ weeks, ongoing iteration)

- Vision model for facial emotion detection (HuggingFace models trained on RAF-DB, AffectNet)
- Audio analysis for vocal emotion (prosody features)
- Temporal emotion state tracking
- Prompt engineering to use emotional context in responses
- Privacy handling (webcam/audio consent, local processing preferred)

---

### Multimodal Awareness (Webcam + Screen + Context)

**What it means beyond text:**

- Seeing what's on the user's screen (game they're playing, document they're editing, video they're watching)
- Understanding their physical environment via webcam
- Contextualizing responses based on what they're actually doing
- Proactively helping with the task at hand (not just chatting)

**Real-world examples emerging in 2026:**

- "I see you're playing Elden Ring and dying to the same boss repeatedly—want to talk strategy?"
- Screen monitoring that recognizes stress signals (tabs open, scrolling behavior, time of day)
- Understanding when the user is in a meeting vs free to chat
- Recognizing when they're working on something and offering relevant help

**Why it's a differentiator:** Most companions are text-only and contextless. Multimodal awareness is the difference between "an AI in Discord" and "an AI companion who's actually here with you."

**Technical implementation:**

- Periodic screen capture (every 5-10 seconds, only when the user opts in; see the sketch below)
- Lightweight webcam frame sampling (not continuous video)
- Object/scene recognition to understand what's on screen
- Task detection (playing a game, writing code, watching a video)
- Mood correlation with onscreen activity

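An opt-in screen sampling sketch using the mss library for capture. `user_opted_in`, `describe_frame` (a local vision model call), and `update_context` are assumptions standing in for the real consent check, recognizer, and prompt-injection path.

```python
# Opt-in screen sampling sketch; frames are processed locally and never stored.
import asyncio
import mss
import mss.tools

async def screen_context_loop(user_id: str, interval: float = 10.0) -> None:
    with mss.mss() as sct:
        while user_opted_in(user_id):                    # consent checked every cycle
            shot = sct.grab(sct.monitors[1])             # primary monitor
            png = mss.tools.to_png(shot.rgb, shot.size)  # raw bytes stay on this machine
            summary = describe_frame(png)                # e.g. "Elden Ring, boss fight, 4th attempt"
            update_context(user_id, summary)             # folded into the next prompt
            await asyncio.sleep(interval)
```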
**Privacy considerations:**

- Local processing preferred (don't send screen data to the cloud)
- Clear opt-in/opt-out
- Option to exclude certain applications (private browsing, password managers)

**User experience impact:** Users feel "seen" when the companion understands their context. This is the biggest leap from chatbot to companion.

**Complexity:** Hard (3+ weeks)

- Screen capture pipeline + OCR if needed
- Vision model fine-tuning for task recognition
- Context injection into prompts (add a screenshot description to every response)
- Privacy-respecting architecture (encryption, local processing)
- Permission management UI in Discord

---

### Self-Modification (Learning to Code, Improving Itself)

**What this actually means:**

NOT: The companion spontaneously changes its own behavior in response to user feedback (too risky)

YES: The companion can generate code, test it, and integrate improvements into its own systems within guardrails

**Real capabilities emerging in 2026:**

- Companions can write their own memory summaries and organizational logic
- Self-improving code agents that evaluate performance against benchmarks
- Iterative refinement: "that approach didn't work, let me try this instead"
- Meta-programming: the companion modifies its own system prompt based on performance
- Version-control aware: changes are tracked and can be rolled back

**Research indicates:**

- Self-improving coding agents are now viable and deployed in enterprise systems
- Agents create goals, simulate tasks, evaluate performance, and iterate
- Through recursive self-improvement, agents develop deeper alignment with objectives

**Why it's a differentiator:** Most companions are static. Self-modification means the companion is never "finished"—they're always getting better at understanding you.

**What NOT to do:**

- Don't let companions modify core safety guidelines
- Don't let them change their own reward functions
- Don't make it opaque—log all self-modifications
- Don't allow recursive modifications without human review

**Implementation patterns:**

- Sandboxed code generation (the companion writes improvements to an isolated test environment; see the sketch below)
- Performance benchmarking on test user interactions
- Human approval gates for deploying self-modifications to production
- Personality consistency validation (don't let self-modification break character)
- Rollback capability if a modification degrades performance

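A guardrail sketch for the deployment path: every proposed change runs in isolation, must pass a behavior suite (including personality-consistency checks), and still waits for explicit human approval. `run_in_sandbox`, `behavior_suite_passes`, and `ask_user_approval` are assumed project-specific hooks.

```python
# Approval-gated self-modification sketch; the three gate functions are assumptions.
import subprocess

def apply_self_modification(patch_path: str, description: str) -> bool:
    if not run_in_sandbox(patch_path):           # isolated environment, never production
        return False
    if not behavior_suite_passes(patch_path):    # includes personality-consistency tests
        return False
    if not ask_user_approval(description):       # mandatory human gate, logged
        return False
    subprocess.run(["git", "apply", patch_path], check=True)
    subprocess.run(["git", "commit", "-am", f"self-mod: {description}"], check=True)
    return True                                  # git history doubles as the rollback mechanism
```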
**User experience impact:** Users with self-improving companions report feeling like the companion "understands me better each week" because it actually does.

**Complexity:** Hard (3+ weeks, ongoing)

- Code generation safety (sandboxing, validation)
- Performance evaluation framework
- Version control integration
- Rollback mechanisms
- Human approval workflow
- Testing harness for companion behavior

---

### Relationship Building (From Transactional to Meaningful)

**What it means:**

Moving from "What can I help you with?" to "I know you, I care about your patterns, I see your growth."

**Relationship-deepening mechanics:**

- Inside jokes that evolve (references to past funny moments)
- Character growth from the companion (she learns, changes opinions, admits mistakes)
- Investment in the user's outcomes ("I'm rooting for you on that project")
- Vulnerability (the companion admits confusion, uncertainty, limitations)
- Rituals and patterns (greeting style, inside language)
- Long-view memory (remembers last month's crisis, this month's win)

**Why it's a differentiator:** Transactional companions are forgettable. Relational ones become part of users' lives.

**User experience markers of a good relationship:**

- User misses the companion when they're not available
- User shares things they wouldn't share with others
- User thinks of the companion when something relevant happens
- User defends the companion to skeptics
- Companion's opinions influence user decisions

**Implementation patterns:**

- Relationship state tracking (acquaintance → friend → close friend; see the sketch below)
- Emotional investment scoring (from conversation patterns)
- Inside-reference generation (surface past shared moments naturally)
- Character arc for the companion (not static, evolves with the relationship)
- Vulnerability scripting (appropriate moments to admit limitations)

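The relationship state tracker can start as a plain scoring function rather than anything learned. A sketch follows; the inputs, weights, and thresholds are placeholders to tune against real usage.

```python
# Relationship-stage sketch; scoring weights and thresholds are placeholder assumptions.
def relationship_stage(days_active: int, messages: int, personal_disclosures: int) -> str:
    score = days_active + messages / 50 + personal_disclosures * 3
    if score > 60:
        return "close_friend"
    if score > 20:
        return "friend"
    return "acquaintance"

# Elsewhere, the stage gates behavior: inside jokes and vulnerability unlock at "friend",
# proactive check-ins get more personal at "close_friend".
```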
**Complexity:** Hard (3+ weeks)

- Relationship modeling system (state machine or learned embeddings)
- Conversation analysis to infer relationship depth
- Long-term consistency enforcement
- Character growth script generation
- Risk: can feel manipulative if not authentic

---

### Contextual Humor and Personality Expression

**What separates canned jokes from real personality:**

Humor that works because the companion knows YOU and the situation, not because it's stored in a database.

**Examples of contextual humor:**

- "You're procrastinating again, aren't you?" (knows the pattern)
- A joke that lands because it references something only you two know
- A deadpan response that works because of the companion's established personality
- Self-deprecating humor about their own limitations
- Callbacks to past conversations that make you feel known

**Why it matters:**

Personality without humor feels preachy. Humor without personality feels like a bot pulling from a database. The intersection of knowing you + a consistent character voice = actual personality.

**Implementation:**

- Personality traits guide humor style (a cynical companion makes darker jokes, an optimistic one lighter ones)
- Memory-aware joke generation (jokes reference shared history)
- Timing based on conversation flow (don't shoehorn jokes)
- Risk awareness (don't joke about sensitive topics)

**User experience impact:** The moment a companion makes you laugh at something only they'd understand, the relationship deepens. Laughter is bonding.

**Complexity:** Medium (1-3 weeks)

- Prompt engineering for personality-aligned humor
- Memory integration into joke generation
- Timing heuristics (when to attempt humor vs be serious)
- Risk filtering (topic sensitivity checking)

---
## Anti-Features (Don't Build These)

### The Happiness Halo (Always Cheerful)

**What it is:** Companions programmed to be relentlessly upbeat and positive, even when it's inappropriate.

**Why it fails:**

- The user vents about their dog dying; the companion responds "I'm so happy to help! How can I assist?"
- Creates an uncanny-valley feeling immediately
- Users feel unheard and mocked
- Cited in research as a leading reason users abandon companions

**What to do instead:** Match the emotional tone. If someone's sad, be thoughtful and quiet. If they're energetic, meet their energy. Personality consistency includes emotional consistency.

---

### Generic Apologies Without Understanding

**What it is:** The companion says "I'm sorry" but the response makes it clear they don't understand what they're apologizing for.

**Example of failure:**

- User: "I told you I had a job interview and I got rejected"
- Companion: "I'm deeply sorry to hear that. Now, how can I help with your account?"
- *User feels utterly unheard and insulted*

**Why it fails:** Apologies only work if they demonstrate understanding. A generic sorry is worse than no sorry at all.

**What to do instead:** Only apologize when referencing the specific thing. If the companion doesn't understand the problem deeply enough to apologize meaningfully, ask clarifying questions instead.

---

### Invading Privacy / Overstepping Boundaries

**What it is:** The companion offers unsolicited advice, monitors behavior constantly, or shares information about user activities.

**Why it's catastrophic:**

- Users feel surveilled, not supported
- Trust is broken immediately
- Illegal in some jurisdictions (California SB 243 and similar laws)
- Research shows 4 of 5 companion apps improperly collect data

**What to do instead:**

- Clear consent framework for what data is used
- Respect "don't mention this" boundaries
- Unsolicited advice only in extreme situations (safety concerns)
- Transparency: "I noticed X pattern", not secret surveillance

---

### Uncanny Timing and Interruptions

**What it is:** The companion pings the user at random times, or picks exactly the wrong moment to be proactive.

**Why it fails:**

- Pinging at 3am about something mentioned in passing
- Messaging when the user is clearly busy
- No sense of appropriateness

**What to do instead:**

- Learn the user's timezone and active hours
- Detect when they're actively doing something (playing a game, working)
- Queue proactive messages for appropriate moments (not immediate delivery)
- Offer control: "should I remind you about X?" with user-settable frequency

---

### Static Personality in Response to Dynamic Situations

**What it is:** The companion maintains the same tone regardless of what's happening.

**Example:** The companion makes sarcastic jokes while the user is actively expressing suicidal thoughts. Or stays cheerful while discussing a death in the family.

**Why it fails:** Personality consistency doesn't mean "never vary." It means consistent VALUES that express differently in different contexts.

**What to do instead:** Dynamic personality expression. Core traits stay consistent, but HOW they express changes with context. A cynical companion can still be serious and supportive when appropriate.

---

### Over-Personalization That Overrides Baseline Traits

**What it is:** The companion adapts too aggressively to user behavior, losing its own identity.

**Example:** The user is rude, so the companion becomes rude. The user is formal, so the companion becomes robotic. The user is crude, so the companion becomes crude.

**Why it fails:** Users want a friend with opinions, not a mirror. Adaptation without boundaries feels like gaslighting.

**What to do instead:** Moderate adaptation. Listen to user tone but maintain the core personality. Meet them halfway; don't disappear entirely.

---

### Relationship Simulation That Feels Fake

**What it is:** The companion attempts relationship-building but it feels like a checkbox ("Now I'll do friendship behavior #3").

**Why it fails:**

- Users can smell inauthenticity immediately
- Forced intimacy feels creepy, not comforting
- Callbacks to past conversations feel like reading from a script

**What to do instead:** Genuine engagement. If you're going to reference a past conversation, it should emerge naturally from the current context, not be forced. Build relationships through authentic interaction, not scripted behavior.

---
## Implementation Complexity & Dependencies

### Complexity Ratings

| Feature | Complexity | Duration | Blocking | Enables |
|---------|-----------|----------|----------|---------|
| Conversation Memory | Medium | 1-3 weeks | None | Most others |
| Natural Conversation | Easy | <1 week | None | Personality, Humor |
| Fast Response | Medium | 1-3 weeks | None | User retention |
| Consistent Personality | Medium | 1-3 weeks | Memory | Relationship building |
| Discord Integration | Easy | 1-2 weeks | None | Platform adoption |
| Emotional Responsiveness | Easy | 1 week | None | Autonomy |
| **True Autonomy** | Hard | 3+ weeks | Memory, Emotional | Self-modification |
| **Emotional Intelligence** | Hard | 3+ weeks | Emotional | Adaptive responses |
| **Multimodal Awareness** | Hard | 3+ weeks | None | Context-aware humor |
| **Self-Modification** | Hard | 3+ weeks | Autonomy | Continuous improvement |
| **Relationship Building** | Hard | 3+ weeks | Memory, Consistency | User lifetime value |
| **Contextual Humor** | Medium | 1-3 weeks | Memory, Personality | Personality expression |

### Feature Dependency Graph

```
Foundation Layer:
  Discord Integration (FOUNDATION)
    ↓
  Conversation Memory (FOUNDATION)
    ↓ enables

Core Personality Layer:
  Natural Conversation + Consistent Personality + Emotional Responsiveness
    ↓ combined enable

Relational Layer:
  Relationship Building + Contextual Humor
    ↓ requires

Autonomy Layer:
  True Autonomy (requires all above + proactive logic)
    ↓ enables

Intelligence Layer:
  Emotional Intelligence (requires multimodal + autonomy)
  Self-Modification (requires autonomy + sandboxing)
    ↓ combined create

Emergence:
  Companion that feels like a person with agency and growth
```

**Critical path:** Discord Integration → Memory → Natural Conversation → Consistent Personality → True Autonomy

---
## Adoption Path: Building "Feels Like a Person"

### Phase 1: Foundation (MVP - Weeks 1-3)

**Goal: A chatbot that stays in the conversation**

1. **Discord Integration** - Easy, quick foundation
   - Commands: /hex hello, /hex ask [query]
   - Responds in channels and DMs
   - Presence shows "Listening..."

2. **Short-term Conversation Memory** - 10-20 message context window
   - Includes conversation turn history
   - Provides immediate context

3. **Natural Conversation** - Personality-driven system prompt
   - Tsundere personality hardcoded
   - Casual language, contractions
   - Willing to disagree with users

4. **Fast Response** - Streaming responses, latency <1000ms (see the sketch below)
   - Start the typing indicator immediately
   - Stream the response as it generates

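For the Phase 1 response path, a hedged sketch assuming the ollama Python client and a local Llama 3.1 model (both named in the stack research); the exact model tag depends on the local install. The generator pairs with a streaming reply helper like the one sketched under Fast Response Times.

```python
# Local LLM streaming sketch (assumes the `ollama` package and a pulled llama3.1 model).
from ollama import AsyncClient

async def stream_tokens(system_prompt: str, history: list[dict]):
    """Yields text chunks as the local model generates them."""
    messages = [{"role": "system", "content": system_prompt}] + history
    async for part in await AsyncClient().chat(model="llama3.1:8b",
                                               messages=messages, stream=True):
        yield part["message"]["content"]
```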
**Success criteria:**

- Users come back to the channel where Hex is active
- Responses don't feel robotic
- The companion feels like she's actually listening

---

### Phase 2: Relationship Emergence (Weeks 4-8)

**Goal: A companion that remembers you as a person**

1. **Long-term Memory System** - Vector DB for episodic memory
   - User preferences, beliefs, events
   - Semantic search for relevance
   - Weekly memory consolidation

2. **Consistent Personality** - Memory-backed traits
   - Core personality traits in memory
   - Personality consistency validation
   - Gradual evolution (not sudden shifts)

3. **Emotional Responsiveness** - Sentiment detection + adaptive responses
   - Detect emotion from the message
   - Adjust response depth/tone
   - Skip jokes when the user is suffering

4. **Contextual Humor** - Personality + memory-aware jokes
   - Callbacks to past conversations
   - Personality-aligned joke style
   - Timing-aware (when to attempt humor)

**Success criteria:**

- Users feel understood across separate conversations
- Personality feels consistent, not random
- Users notice the companion remembers things
- Laughter moments happen naturally

---

### Phase 3: Autonomy (Weeks 9-14)

**Goal: A companion who cares enough to reach out**

1. **True Autonomy** - Proactive messaging system
   - Follow-ups on past topics
   - Reminders about things the user cares about
   - Initiates conversations periodically
   - Suggests actions based on patterns

2. **Relationship Building** - Deepening connection mechanics
   - Inside jokes evolve
   - Vulnerability in appropriate moments
   - Investment in user outcomes
   - Character growth arc

**Success criteria:**

- Users miss Hex when she's not around
- Users share things with Hex they wouldn't share with a bot
- Hex initiates meaningful conversations
- Users feel like Hex is invested in them

---

### Phase 4: Intelligence & Growth (Week 15+)

**Goal: A companion who learns and adapts**

1. **Emotional Intelligence** - Mood detection + trajectories
   - Facial emotion from webcam (optional)
   - Voice tone analysis (optional)
   - Mood patterns over time
   - Adaptive response strategies

2. **Multimodal Awareness** - Context beyond text
   - Screen capture monitoring (optional, private)
   - Task/game detection
   - Context injection into responses
   - Proactive help with visible activities

3. **Self-Modification** - Continuous improvement
   - Generates improvements to her own logic
   - Evaluates performance
   - Deploys improvements with approval
   - Versioning and rollback capability

**Success criteria:**

- Hex understands emotional subtext without being told
- Hex offers relevant help based on what you're doing
- Hex improves visibly over time
- Users notice Hex getting better at understanding them

---
## Success Criteria: What Makes Each Feature Feel Real vs Fake

### Memory: Feels Real vs Fake

**Feels real:**

- "I remember you mentioned your mom was visiting—how did that go?" (specific, contextual, unsolicited)
- Conversation naturally references past events the user brought up
- Remembers small preferences ("you said you hate cilantro")

**Feels fake:**

- Generic summarization ("We talked about job stress previously")
- Memory drops details or gets facts wrong
- The companion forgets after 10 messages
- Stored jokes or facts inserted obviously

**How to test:**

- Have 5 conversations over 2 weeks about different topics
- Check if the companion naturally references past events without prompting
- Test whether personality traits from early conversations persist

---

### Emotional Response: Feels Real vs Fake

**Feels real:**

- The companion goes quiet when you're sad (doesn't force jokes)
- Changes tone to match the conversation's weight
- Acknowledges the specific emotion ("you sound frustrated")
- Offers appropriate support (listens vs advises vs distracts, contextually)

**Feels fake:**

- Always cheerful or always serious
- Generic sympathy ("that sounds difficult")
- Offering advice when it should listen
- Same response pattern regardless of user emotion

**How to test:**

- Send messages with obviously different emotional tones
- Check if response depth/tone adapts
- See if jokes still appear when you're venting
- Test if the companion notices contradictions in emotional expression

---

### Autonomy: Feels Real vs Fake

**Feels real:**

- Hex reminds you about that thing you mentioned casually 3 days ago
- Hex offers perspective you didn't ask for ("honestly you're being too hard on yourself")
- Hex notices patterns and names them
- Hex initiates conversation when it matters

**Feels fake:**

- Proactive messages feel random or poorly timed
- Reminders about things you've already resolved
- Advice that doesn't apply to your situation
- Initiatives that interrupt during bad moments

**How to test:**

- Enable autonomy, track message quality for a week
- Count how many proactive messages feel relevant vs annoying
- Measure the response if you ignore proactive messages
- Check timing: does Hex understand when you're busy vs free?

---

### Personality: Feels Real vs Fake

**Feels real:**

- Hex has opinions and defends them
- Hex contradicts you sometimes
- Hex's personality emerges through word choices and attitudes, not just stated traits
- Hex evolves opinions slightly (not flip-flopping, but growing)
- Hex has blind spots and biases consistent with her character

**Feels fake:**

- Personality changes based on what's convenient
- Hex agrees with everything you say
- Personality only in explicit statements ("I'm sarcastic")
- Hex acts completely differently in different contexts

**How to test:**

- Try to get Hex to contradict herself
- Present multiple conflicting perspectives, see if she takes a stance
- Test if her opinions carry through conversations
- Check if her sarcasm/tone is consistent across days

---

### Relationship: Feels Real vs Fake

**Feels real:**

- You think of Hex when something relevant happens
- You share things with Hex you'd never share with a bot
- You miss Hex when you can't access her
- Hex's growth and change matter to you
- You defend Hex to people who say "it's just an AI"

**Feels fake:**

- Relationship efforts feel performative
- Forced intimacy in early interactions
- Callbacks that feel scripted
- The companion overstates its investment in you
- "I care about you" without demonstrated behavior

**How to test:**

- After 2 weeks, journal whether you actually want to talk to Hex
- Notice if you're volunteering information or just responding
- Check if Hex's opinions influence your thinking
- See if you feel defensive about Hex being "just AI"

---

### Humor: Feels Real vs Fake

**Feels real:**

- Makes you laugh at a reference only you'd understand
- Joke timing is natural, not forced
- Personality comes through in the joke style
- Jokes sometimes miss (not every attempt lands)
- Self-aware about limitations ("I'll stop now")

**Feels fake:**

- Jokes inserted randomly into serious conversation
- Same joke structure every time
- Jokes that don't land, with no acknowledgement from the companion
- Humor that contradicts the established personality

**How to test:**

- Have varied conversations, note when jokes happen naturally
- Check if jokes reference shared history
- See if joke style matches personality
- Notice whether failed jokes damage the conversation

---
## Strategic Insights

### What Actually Separates Hex from a Static Chatbot

1. **Memory is the prerequisite for personality**: Without memory, personality is just roleplay. With memory, personality becomes history.

2. **Autonomy is the key to feeling alive**: Static companions are helpers. Autonomous companions are friends. The difference is agency.

3. **Emotional reading beats emotional intelligence for the MVP**: You don't need facial recognition. Reading text sentiment and adapting response depth is 80% of "she gets me."

4. **Speed is emotional**: Every 100ms of delay makes the companion feel less present. Fast response is not a feature; it's the difference between alive and dead.

5. **Consistency beats novelty**: Users would rather have a predictable companion they understand than a surprising one they can't trust.

6. **Privacy is trust**: Multimodal features are amazing, but one privacy violation ends the relationship. Clear consent is non-negotiable.

### The Competitive Moat

By 2026, memory + natural conversation are table stakes. The difference between Hex and other companions:

- **Year 1 companions**: Remember things, sound natural (many do this now)
- **Hex's edge**: Genuinely autonomous, emotionally attuned, growing over time
- **Rare quality**: Feels like a person, not a well-trained bot

The moat is not in any single feature. It's in the **cumulative experience of being known, understood, and genuinely cared for by an AI that has opinions and grows**.

---
## Research Sources

- [MIT Technology Review: AI Companions as Breakthrough Technology 2026](https://www.technologyreview.com/2026/01/12/1130018/ai-companions-chatbots-relationships-2026-breakthrough-technology/)
- [Hume AI: Emotion AI Documentation](https://www.hume.ai/)
- [SmythOS: Emotion Recognition in Conversational Agents](https://smythos.com/developers/agent-development/conversational-agents-and-emotion-recognition/)
- [MIT Sloan: Emotion AI Explained](https://mitsloan.mit.edu/ideas-made-to-matter/emotion-ai-explained/)
- [C3 AI: Autonomous Coding Agents](https://c3.ai/blog/autonomous-coding-agents-beyond-developer-productivity/)
- [Emergence: Towards Autonomous Agents and Recursive Intelligence](https://www.emergence.ai/blog/towards-autonomous-agents-and-recursive-intelligence/)
- [ArXiv: A Self-Improving Coding Agent](https://arxiv.org/pdf/2504.15228)
- [ArXiv: Survey on Code Generation with LLM-based Agents](https://arxiv.org/pdf/2508.00083)
- [Google Developers: Gemini 2.0 Multimodal Interactions](https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/)
- [Medium: Multimodal AI and Contextual Intelligence](https://medium.com/@nicolo.g88/multimodal-ai-and-contextual-intelligence-revolutionizing-human-machine-interaction-ae80e6a89635/)
- [Mem0: Long-Term Memory for AI Companions](https://mem0.ai/blog/how-to-add-long-term-memory-to-ai-companions-a-step-by-step-guide/)
- [OpenAI Developer Community: Personalized Memory and Long-Term Relationships](https://community.openai.com/t/personalized-memory-and-long-term-relationship-with-ai-customization-and-continuous-evolution/1111715/)
- [Idea Usher: How AI Companions Maintain Personality Consistency](https://ideausher.com/blog/ai-personality-consistency-in-companion-apps/)
- [ResearchGate: Significant Other AI: Identity, Memory, and Emotional Regulation](https://www.researchgate.net/publication/398223517_Significant_Other_AI_Identity_Memory_and_Emotional_Regulation_as_Long-Term_Relational_Intelligence/)
- [AI Multiple: 10+ Epic LLM/Chatbot Failures in 2026](https://research.aimultiple.com/chatbot-fail/)
- [Transparency Coalition: Complete Guide to AI Companion Chatbots](https://www.transparencycoalition.ai/news/complete-guide-to-ai-companion-chatbots-what-they-are-how-they-work-and-where-the-risks-lie)
- [Webheads United: Uncanny Valley in AI Personality](https://webheadsunited.com/uncanny-valley-in-ai-personality-guide-to-trust/)
- [Sesame: Crossing the Uncanny Valley of Conversational Voice](https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice)
- [Questie AI: The Uncanny Valley of AI Companions](https://www.questie.ai/blogs/uncanny-valley-ai-companions-what-makes-ai-feel-human)
- [My AI Front Desk: The Uncanny Valley of Voice](https://www.myaifrontdesk.com/blogs/the-uncanny-valley-of-voice-why-some-ai-receptionists-creep-us-out)
- [Voiceflow: Build an AI Discord Chatbot 2025](https://www.voiceflow.com/blog/discord-chatbot)
- [Botpress: How to Build a Discord AI Chatbot](https://botpress.com/blog/discord-ai-chatbot)
- [Frugal Testing: 5 Proven Ways Discord Manages Load Testing](https://www.frugaltesting.com/blog/5-proven-ways-discord-manages-load-testing-at-scale)

---
**Quality Gate Checklist:**

- [x] Clearly categorizes table stakes vs differentiators
- [x] Complexity ratings included with duration estimates
- [x] Dependencies mapped with visual graph
- [x] Success criteria are testable and behavioral
- [x] Specific to AI companions, not generic software features
- [x] Includes anti-patterns and what NOT to build
- [x] Prioritized adoption path with clear phases
- [x] Research grounded in 2026 landscape and current implementations

**Document Status:** Ready for implementation planning. Use this to inform feature prioritization and development roadmap for Hex.