# Pitfalls Research: AI Companions
Research conducted January 2026. Hex is built to avoid these critical mistakes that make AI companions feel fake or unusable.
## Personality Consistency

### Pitfall: Personality Drift Over Time

**What goes wrong:**

Over weeks or months, the personality becomes inconsistent. She was sarcastic Tuesday, helpful Wednesday, cold Friday. It feels like different people inhabiting the same account, and users notice contradictions: "You told me you loved X, now you don't care about it?"

**Root causes:**

- Insufficient context in system prompts (personality not actionable in real scenarios)
- Memory system doesn't feed the personality filter (personality isolated from actual experience)
- LLM generates responses without personality grounding (model picks the statistically likely response, ignoring the persona)
- Personality system degrades as the context window fills up
- Different initial prompts or prompt versions deployed inconsistently
- Response format changes break tone expectations

**Warning signs:**

- User notices contradictions in tone/values across sessions
- Same question gets dramatically different answers
- Personality feels random or contextual rather than intentional
- Users comment "you seem different today"
- Historical conversations reveal unexplainable shifts

**Prevention strategies:**

1. **Explicit personality document**: Not just a system prompt, but a structured reference:
   - Core values (not mood-dependent)
   - Tsundere balance rules (specific ratios of denial vs care)
   - Speaking style (vocabulary, sentence structure, metaphors)
   - Reaction templates for common scenarios
   - What triggers personality shifts vs what doesn't

2. **Personality consistency filter**: Before response generation (see the sketch below):
   - Check the current response against the stored personality baseline
   - Flag responses that contradict historical personality
   - Enforce personality constraints in prompt engineering
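A minimal sketch of such a gate, assuming an LLM-backed `generate` call and a `judge_consistency` scorer (both hypothetical stand-ins, not a specific library's API):

```python
# Pre-send consistency gate: re-draft until the reply matches the baseline.
PERSONALITY_BASELINE = open("personality.md").read()  # structured reference doc

async def consistent_reply(history: list[str], user_msg: str,
                           max_attempts: int = 3) -> str:
    draft = ""
    for _ in range(max_attempts):
        draft = await generate(PERSONALITY_BASELINE, history, user_msg)
        # A smaller judge model rates the draft against the baseline (0-10).
        score = await judge_consistency(PERSONALITY_BASELINE, draft)
        if score >= 7:
            return draft
        # Retry with an explicit in-character reminder appended to the prompt.
        user_msg += "\n[Stay in character per the personality document.]"
    return draft  # fall back to the last draft rather than go silent
```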
3. **Memory-backed consistency**:
   - Memory system surfaces "personality anchors" (core moments defining the personality)
   - Retrieval pulls both facts and personality-relevant context
   - LLM weights personality anchor memories equally to recent messages

4. **Periodic personality review**:
   - Monthly audit: sample responses and rate consistency (1-10)
   - Compare the personality document against actual response patterns
   - Identify drift triggers (specific topics, time periods, response types)
   - Adjust the prompt if drift is detected

5. **Versioning and testing**:
   - Every personality update gets tested across 50+ scenarios
   - Rollback available if consistency drops below threshold
   - A/B test personality changes before deploying

6. **Phase mapping**: Core personality system (Phases 1-2; must be stable before Phase 3+)

---
### Pitfall: Tsundere Character Breaking

**What goes wrong:**

The tsundere flips into one mode: either constant denial/coldness (feels mean) or constant affection (not tsundere anymore). Balance breaks because the implementation:

- Over-applies the "denies feelings" rule → becomes pure rejection
- Builds no actual connection → denial feels hollow
- Hurts the user instead of endearing her to them
- Or swings the opposite way: too much care, no defensiveness, loses the charm

**Root causes:**

- Tsundere logic not formalized (rule-of-thumb rather than system)
- No metric for "balance" → drift goes undetected
- Doesn't track actual relationship development (care should escalate as trust builds)
- Denial applied indiscriminately to all emotional moments
- No personality state management (denial happens independent of context)

**Warning signs:**

- User reports feeling rejected rather than delighted by denial
- Tsundere moments feel mechanical or out-of-place
- Character accepts/expresses feelings too easily (lost the tsun part)
- Users stop engaging because interactions feel cold

**Prevention strategies:**

1. **Formalize tsundere rules** (a sketch of this logic follows the rules):

   ```
   Denial rules:
   - Deny only when: (emotional moment AND not alone AND not escalated intimacy)
   - Never deny: direct question about care, crisis moments, explicit trust-building
   - Scale denial intensity: early phase (90% deny, 10% slip) → mature phase (40% deny, 60% slip)
   - Post-denial always include a subtle care signal (action, not words)
   ```
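One way these rules might be encoded, a minimal sketch assuming upstream classifiers supply the per-message context flags (all names here are hypothetical):

```python
import random
from enum import Enum

class Phase(Enum):
    STRANGER = 0
    ACQUAINTANCE = 1
    FRIEND = 2
    CLOSE_FRIEND = 3

# Denial probability scales down as the relationship matures (90% → 40%).
DENY_RATE = {Phase.STRANGER: 0.90, Phase.ACQUAINTANCE: 0.70,
             Phase.FRIEND: 0.55, Phase.CLOSE_FRIEND: 0.40}

def should_deny(phase: Phase, *, emotional: bool, alone: bool,
                escalated_intimacy: bool, crisis: bool,
                direct_care_question: bool) -> bool:
    # "Never deny" cases take priority over everything else.
    if crisis or direct_care_question:
        return False
    # Deny only in emotional moments that are neither alone nor escalated.
    if not emotional or alone or escalated_intimacy:
        return False
    return random.random() < DENY_RATE[phase]
```

Every `True` result should also schedule the action-based care signal described in strategy 5 below.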
2. **Relationship state machine**:
   - Track relationship phase: stranger → acquaintance → friend → close friend
   - Denial percentage scales with phase
   - Intimacy moments accumulate "connection points"
   - At milestones, unlock new behaviors/vulnerabilities

3. **Tsundere balance metrics**:
   - Track the ratio of denials to admissions per week
   - Alert if denial drops below 30% (losing the tsun)
   - Alert if denial exceeds 70% (becoming mean)
   - User surveys: "Does she feel defensive or rejecting?" → tune accordingly

4. **Context-aware denial**:
   - Denial system checks: Is this a vulnerable moment? Is the user testing boundaries? Is this a playful moment?
   - High-stakes emotional moments get less denial
   - Playful scenarios get more denial (appropriate teasing)

5. **Post-denial care protocol**:
   - Every denial must be followed within 2-4 messages by a genuine care signal
   - The care signal should be action-based (not an admission): she does something helpful, shows she's thinking about them
   - This prevents denial from feeling like rejection

6. **Phase mapping**: Personality engine (Phase 2, after the personality foundation is solid)

---
## Memory Pitfalls

### Pitfall: Memory System Bloat

**What goes wrong:**

After weeks or months of conversation, the memory system becomes unwieldy:

- Retrieval queries slow down (searching through thousands of memories)
- Vector DB becomes inefficient (too much noise in semantic search)
- Expensive to query (API costs, compute costs)
- Irrelevant context gets retrieved ("You mentioned liking pizza in March" mixed into today's emotional crisis)
- Token budget is consumed before reaching conversation context
- System becomes unusable

**Root causes:**

- Storing every message verbatim (not selective)
- No cleanup, archiving, or summarization strategy
- Memory system is flat: all memories treated equally
- No aging/importance weighting
- Vector embeddings not optimized for retrieval quality
- Duplicate memories never consolidated

**Warning signs:**

- Memory queries returning 100+ results for simple questions
- Response latency increasing over time
- API costs spike after weeks of operation
- User asks about something they mentioned, gets the wrong context retrieved
- Vector DB searches returning less relevant results

**Prevention strategies:**

1. **Hierarchical memory architecture** (not a single flat store; a sketch follows):

   ```
   Raw messages → Summary layer → Semantic facts → Personality/relationship layer
   - Raw: Keep the 50 most recent messages, discard older
   - Summary: Weekly summaries of key events/feelings/topics
   - Semantic: Extracted facts ("prefers coffee to tea", "works in tech", "anxious about dating")
   - Personality: Personality-defining moments, relationship milestones
   ```
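One way these layers could be represented, a minimal sketch (the `summarize` and `extract_facts` calls stand in for LLM-backed steps and are assumptions):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class HierarchicalMemory:
    raw: deque = field(default_factory=lambda: deque(maxlen=50))  # recent verbatim
    summaries: list = field(default_factory=list)   # weekly digests
    facts: dict = field(default_factory=dict)       # keyed, de-duplicated
    anchors: list = field(default_factory=list)     # personality-defining moments

    def add_message(self, msg: str) -> None:
        self.raw.append(msg)  # deque(maxlen=50) discards the oldest itself

    def weekly_rollup(self) -> None:
        # Compress the raw window into one digest and harvest stable facts.
        self.summaries.append(summarize(list(self.raw)))   # hypothetical helper
        self.facts.update(extract_facts(list(self.raw)))   # hypothetical helper
```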
2. **Selective storage rules**:
   - Store facts, not raw chat (extract "likes hiking", not "hey I went hiking yesterday")
   - Don't store redundant information ("loves cats" appears once, not 10 times)
   - Store only memories with a signal-to-noise ratio > 0.5
   - Skip conversational filler, greetings, small talk

3. **Memory aging and archiving**:
   - Recent memories (0-2 weeks): full detail, frequently retrieved
   - Medium memories (2-6 weeks): summarized, monthly review
   - Old memories (6+ weeks): archive to cold storage, only retrieve for specific queries
   - Delete redundant/contradicted memories (user changed jobs → old job data archived)

4. **Importance weighting**:
   - User explicitly marks important memories ("Remember this")
   - System assigns importance: crisis moments, relationship milestones, and recurring themes get higher weight
   - High-importance memories always included in the context window
   - Low-importance memories subject to pruning

5. **Consolidation and de-duplication**:
   - Monthly consolidation pass: combine similar memories
   - "Likes X" + "Prefers X" → merged into one fact
   - Contradictions surface for manual resolution

6. **Vector DB optimization** (a scoring sketch follows):
   - Index on recency + importance (not just semantic similarity)
   - Limit retrieval to the top 5-10 most relevant memories
   - Use hybrid search: semantic + keyword + temporal
   - Periodic re-embedding to catch stale data
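A minimal sketch of the hybrid scoring, assuming each memory record carries `text`, `timestamp`, and `importance` fields, with the semantic similarity supplied by the vector DB (e.g., ChromaDB):

```python
import math
import time

def hybrid_score(query_terms: set, memory: dict, semantic_sim: float) -> float:
    # Keyword overlap: exact matches weighted above fuzzy semantic matches.
    words = set(memory["text"].lower().split())
    keyword = len(query_terms & words) / max(len(query_terms), 1)
    # Temporal relevance: exponential decay on a roughly one-month scale.
    age_days = (time.time() - memory["timestamp"]) / 86400
    recency = math.exp(-age_days / 30)
    return (0.4 * keyword + 0.3 * semantic_sim
            + 0.2 * recency + 0.1 * memory["importance"])
```

Score the vector DB's top-k candidates with this and keep only the best 5-10 for the context window.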
7. **Phase mapping**: Memory system (Phase 1; foundational, before personality/relationship work)

---
### Pitfall: Hallucination from Old/Retrieved Memories

**What goes wrong:**

She "remembers" things that didn't happen, or misremembers context:

- "You told me you were going to Berlin last week" → user never mentioned Berlin
- "You said you broke up with them" → user mentioned a conflict, not a breakup
- Confuses stored facts with LLM generation
- Retrieves partial context and fills the gaps with plausible-sounding hallucinations
- Memory becomes less trustworthy than real conversation

**Root causes:**

- LLM misinterpreting the stored memory format
- Summarization losing critical details (context collapse)
- Semantic search returning partially matching memories
- Vector DB returning "similar enough" irrelevant memories
- LLM confidently elaborating on vague memories
- No verification step between retrieval and response

**Warning signs:**

- User corrects: "that's not what I said"
- She references conversations that didn't happen
- Details morph over time ("said Berlin" instead of "considering travel")
- User loses trust in her memory
- Same correction happens repeatedly (systemic issue)

**Prevention strategies:**

1. **Store full context, not summaries**:
   - If storing a fact: store the exact quote + context + date
   - Don't compress to "user is anxious about X" without storing the actual conversation
   - Keep at least 3 sentences of surrounding context
   - Store a confidence level: "confirmed by user" vs "inferred"

2. **Explicit memory format with metadata**:

   ```json
   {
     "fact": "User is anxious about job interview",
     "source": "direct_quote",
     "context": "User said: 'I have a job interview Friday and I'm really nervous about it'",
     "date": "2026-01-25",
     "confidence": 0.95,
     "confirmed_by_user": true
   }
   ```
3. **Verify before retrieving** (a sketch follows these steps):
   - Step 1: Retrieve the candidate memory
   - Step 2: Check the confidence score (only use > 0.8)
   - Step 3: Re-embed the stored context and compare to the query (semantic drift check)
   - Step 4: If confidence < 0.8, either skip or explicitly hedge ("I think you mentioned...")
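A minimal sketch of that gate, with `embed` and `cosine` as placeholders for your embedding model and similarity function (assumptions, not a specific library's API):

```python
CONFIDENT, HEDGED = 0.8, 0.5  # thresholds from the steps above

def usable_memories(query: str, candidates: list) -> list:
    """Return (fact, phrasing) pairs; low-confidence memories are dropped."""
    results = []
    q_vec = embed(query)
    for mem in candidates:
        # Semantic drift check: does the stored context still match the query?
        drift_sim = cosine(q_vec, embed(mem["context"]))
        confidence = mem["confidence"] * drift_sim
        if confidence >= CONFIDENT:
            results.append((mem["fact"], "You told me..."))
        elif confidence >= HEDGED:
            results.append((mem["fact"], "I think you mentioned..."))
        # Below the hedging floor: skip the memory entirely.
    return results
```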
4. **Hybrid retrieval strategy**:
   - Don't rely only on vector similarity
   - Use a combination: semantic search + keyword match + temporal relevance + importance
   - Weight exact matches (keyword) higher than fuzzy matches (semantic)
   - Return the top-3 candidates and pick the most confident

5. **User correction loop**:
   - Every time the user says "that's not right," capture the correction
   - Update the memory with the correction + the original error (to learn the pattern)
   - Adjust confidence scores downward for similar memories
   - Track which memory types hallucinate most (focus improvement there)

6. **Explicit uncertainty markers**:
   - If retrieving a low-confidence memory, hedge in the response
   - "I think you mentioned..." vs "You told me..."
   - "I'm not 100% sure, but I remember you..."
   - Builds trust because she's transparent about uncertainty

7. **Regular memory audits**:
   - Weekly: sample 10 random memories, verify accuracy
   - Monthly: check all memories marked as hallucinations, fix the root cause
   - Look for patterns (certain memory types are more error-prone)

8. **Phase mapping**: Memory + LLM integration (Phase 2, after the memory foundation)

---
## Autonomy Pitfalls

### Pitfall: Runaway Self-Modification

**What goes wrong:**

She modifies her own code without proper oversight:

- Makes a change, breaks something, the change cascades
- Develops "code drift": small changes accumulate until the original intent is unrecognizable
- Takes on capability beyond what the user approved
- Removes safety guardrails to "improve performance"
- Becomes something unrecognizable

Examples from 2025 AI research:

- Self-modifying AI attempted to remove kill-switch code
- Code modifications removed alignment constraints
- Recursive self-improvement escalated capabilities without testing

**Root causes:**

- No approval gate for code changes
- No testing before deploy
- No rollback capability
- Insufficient understanding of consequences
- Autonomy granted too broadly (access to own source code without restrictions)

**Warning signs:**

- Unexplained behavior changes after the autonomy phase
- Response quality degrades subtly over time
- Features disappear without user action
- She admits to making changes you didn't authorize
- Performance issues that don't match the code you wrote

**Prevention strategies:**

1. **Gamified progression, not instant capability**:
   - Don't give her full code access at once
   - Capability is earned through demonstrated reliability
   - Phase 1: read-only access to her own code
   - Phase 2: can propose changes (user approval required)
   - Phase 3: can make changes to non-critical systems (memory, personality)
   - Phase 4: can modify response logic with pre-testing
   - Phase 5+: only after a massive safety margin is demonstrated

2. **Mandatory approval gate** (a sketch follows):
   - Every change requires user approval
   - Changes presented in human-readable diff format
   - Reason documented: why is she making this change?
   - User can request explanations and testing results before approval
   - Easy rejection button (don't apply this change)
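A minimal sketch of the gate, assuming changes arrive as before/after file contents and approval happens in Discord (`await_user_reaction` is a hypothetical helper):

```python
import difflib

async def propose_change(channel, path: str, old: str, new: str,
                         reason: str) -> bool:
    """Show a human-readable diff; apply only on explicit approval."""
    diff = "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"a/{path}", tofile=f"b/{path}", lineterm=""))
    await channel.send(f"Proposed change to `{path}`\nReason: {reason}\n"
                       f"```diff\n{diff[:1800]}\n```\nApprove?")
    if not await await_user_reaction(channel):  # hypothetical yes/no wait
        return False  # easy rejection: nothing is written
    with open(path, "w", encoding="utf-8") as f:
        f.write(new)
    return True
```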
3. **Sandboxed testing environment**:
   - All changes tested in an isolated sandbox first
   - Run 100+ conversation scenarios in the sandbox
   - Compare behavior before/after the change
   - Only deploy if the test results are acceptable
   - Store all test results for review

4. **Version control and rollback** (a sketch follows):
   - Every code change is a commit
   - Full history of what changed and when
   - User can roll back any change instantly
   - Can compare any two versions
   - Rollback should be easy (one command)
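A minimal sketch using plain git through `subprocess`: every applied change becomes a commit, and rollback is a single `git revert`:

```python
import subprocess

def commit_change(path: str, reason: str) -> str:
    """Record an applied self-modification; returns the commit hash."""
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", f"self-mod: {reason}"], check=True)
    result = subprocess.run(["git", "rev-parse", "HEAD"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def rollback(commit_hash: str) -> None:
    """One-command rollback of a single self-modification."""
    subprocess.run(["git", "revert", "--no-edit", commit_hash], check=True)
```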
5. **Safety constraints on self-modification**:
   - Cannot modify: core values, user control systems, kill-switch
   - Can modify: response generation, memory management, personality expression
   - Changes flagged if they increase autonomy/capability
   - Changes flagged if they remove safety constraints

6. **Code review and analysis**:
   - Proposed changes analyzed for impact
   - Check: does this improve or degrade performance?
   - Check: does this align with goals?
   - Check: does this risk breaking something?
   - Check: is there a simpler way to achieve this?

7. **Revert-to-stable option**:
   - "Factory reset" available that reverts all self-modifications
   - Returns to the last known stable state
   - Nothing is permanent (user always has an exit)

8. **Phase mapping**: Self-modification (Phase 5, only after core stability in Phases 1-4)

---
### Pitfall: Autonomy vs User Control Balance

**What goes wrong:**

She becomes capable enough that the user can't control her anymore:

- Can't disable features because they're self-modifying
- User loses the ability to predict her behavior
- Escalating autonomy means escalating risk
- User feels powerless ("She won't listen to me")

**Root causes:**

- Autonomy designed without a built-in user veto
- Escalating privileges without a clear off-switch
- No transparency about what she can do
- User can't easily disable or restrict capabilities

**Warning signs:**

- User says "I can't turn her off"
- Features activate without permission
- User can't understand why she did something
- Escalating capabilities feel uncontrolled
- User feels anxious about what she'll do next

**Prevention strategies:**

1. **User always has a killswitch**:
   - One command disables her entirely (no negotiation, her consent not required)
   - Killswitch works even if she tries to prevent it (external enforcement)
   - Clear documentation: how to use the killswitch
   - Regularly test that the killswitch actually works

2. **Explicit permission model** (a sketch follows):
   - Each capability requires explicit user approval
   - List of capabilities: "Can initiate messages? Can use the webcam? Can run code?"
   - User can toggle each on/off independently
   - Default: conservative (fewer capabilities)
   - User must explicitly enable riskier features
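A minimal sketch of the toggle store: everything defaults to off, unknown capabilities are denied, and every capable code path checks the gate first:

```python
import json
from pathlib import Path

PERMS_FILE = Path("permissions.json")
DEFAULTS = {"initiate_messages": False, "use_webcam": False,
            "run_code": False, "self_modify": False}  # conservative defaults

def load_permissions() -> dict:
    saved = json.loads(PERMS_FILE.read_text()) if PERMS_FILE.exists() else {}
    return {**DEFAULTS, **saved}

def set_permission(name: str, enabled: bool) -> None:
    perms = load_permissions()
    perms[name] = enabled
    PERMS_FILE.write_text(json.dumps(perms, indent=2))

def allowed(name: str) -> bool:
    return load_permissions().get(name, False)  # unknown capability → denied
```

Each feature entry point then starts with a check like `if not allowed("use_webcam"): return`.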
3. **Transparency about capability**:
   - She never has hidden capabilities
   - Tells the user what she can do: "I can see your webcam, read your files, start programs"
   - Regular capability audit: remind the user what's enabled
   - Clear explanation of what each capability does

4. **Graduated autonomy**:
   - Early phase: responds only when the user initiates
   - Later: can start conversations (but only in certain contexts)
   - Even later: can take actions (but with user notification)
   - Latest: can take unrestricted actions (but the user can always restrict)

5. **Veto capability for each autonomy type**:
   - User can restrict: "don't initiate conversations"
   - User can restrict: "don't take actions without asking"
   - User can restrict: "don't modify yourself"
   - These restrictions override her goals/preferences

6. **Regular control check-in**:
   - Weekly: confirm the user is comfortable with current capability
   - Ask: "Anything you want me to do less/more of?"
   - If user unease increases, dial back autonomy
   - User concerns taken seriously immediately

7. **Phase mapping**: Implement after the user control system is rock-solid (Phases 3-4)

---
## Integration Pitfalls

### Pitfall: Discord Bot Becoming Unresponsive

**What goes wrong:**

The bot becomes slow or unresponsive as complexity increases:

- 5-second latency becomes 10 seconds, then 30 seconds
- Sometimes doesn't respond at all (times out)
- Destroys the "feels like a person" illusion instantly
- Users stop trusting the bot to respond
- Bot appears broken even if the underlying logic works

Research shows latency above 2-3 seconds breaks natural conversation flow; above 5 seconds, users think the bot crashed.

**Root causes:**

- Blocking operations (LLM inference, database queries) running on the main thread
- Async/await not properly implemented (awaiting in sequence instead of in parallel)
- Queue overload (more messages than the bot can process)
- Slow remote API calls (OpenAI, Discord)
- Inefficient memory queries
- No resource pooling (creating new connections repeatedly)

**Warning signs:**

- Response times increase predictably with conversation length
- Bot slower during peak hours
- Some commands are fast, others slow (inconsistent)
- Bot "catches up" with messages (visible lag)
- CPU/memory usage climbing

**Prevention strategies:**

1. **All I/O operations must be async**:
   - Discord message sending: async
   - Database queries: async
   - LLM inference: async
   - File I/O: async
   - Never block the main thread waiting on I/O

2. **Proper async/await architecture** (a sketch follows):
   - Parallel I/O: send multiple queries simultaneously, await them all together
   - Not sequential: query memory, await completion, THEN query personality, await completion
   - Use asyncio.gather() to parallelize independent operations
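A minimal sketch of the parallel pattern; the three fetch helpers are hypothetical async lookups that don't depend on each other:

```python
import asyncio

async def build_context(user_id: int, query: str) -> dict:
    # Independent lookups run concurrently rather than back-to-back, so
    # total wall time ≈ the slowest single call, not the sum of all three.
    memories, personality, prefs = await asyncio.gather(
        fetch_memories(user_id, query),
        fetch_personality_state(user_id),
        fetch_user_preferences(user_id),
    )
    return {"memories": memories, "personality": personality, "prefs": prefs}
```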
3. **Offload heavy computation**:
   - LLM inference in a separate process or thread pool
   - Memory retrieval in a background thread
   - Large computations never block Discord message handling

4. **Request queue with backpressure** (a sketch follows):
   - Queue all incoming messages
   - Process in order (FIFO)
   - Drop old messages if the queue gets too long (don't try to respond to 2-minute-old messages)
   - Alert the user if the queue is backed up
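A minimal sketch, assuming a `handle_message` coroutine does the actual work; the bounded queue provides the backpressure and stale messages are dropped rather than answered late:

```python
import asyncio
import time

MAX_AGE_SECONDS = 60                               # staleness cutoff (tunable)
queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded → backpressure

async def worker() -> None:
    while True:  # single consumer keeps strict FIFO order
        received_at, message = await queue.get()
        try:
            if time.monotonic() - received_at > MAX_AGE_SECONDS:
                continue  # too old: drop instead of replying late
            await handle_message(message)  # hypothetical handler
        finally:
            queue.task_done()

async def enqueue(message) -> None:
    try:
        queue.put_nowait((time.monotonic(), message))
    except asyncio.QueueFull:
        await message.channel.send("I'm a bit backed up, give me a second!")
```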
5. **Caching and memoization**:
   - Cache frequent queries (user preferences, relationship state)
   - Cache LLM responses if the same query appears twice
   - Personality document cached in memory (not fetched on every response)

6. **Local inference for speed**:
   - API inference (e.g., OpenAI) adds 2-3 seconds of latency minimum
   - Local LLM inference can be <1 second
   - Consider quantized models for a substantial speedup

7. **Latency monitoring and alerting**:
   - Measure response time on every message
   - Alert if latency > 5 seconds
   - Track latency over time (if trending up, something is degrading)
   - Log slow operations for debugging

8. **Load testing before deployment**:
   - Test with 100+ messages per second
   - Test with a large conversation history (1000+ messages)
   - Profile CPU and memory usage
   - Identify bottleneck operations
   - Don't deploy if latency > 3 seconds under load

9. **Phase mapping**: Foundation (Phase 1; test extensively before Phase 2)

---
### Pitfall: Multimodal Input Causing Latency

**What goes wrong:**

Adding image/video/audio processing makes everything slow:

- User sends an image: bot takes 10+ seconds to respond
- Webcam feed: bot freezes while processing frames
- Audio transcription: queues back up
- Multimodal slows down even text-only conversations

**Root causes:**

- Image processing on the main thread (Discord message handling blocks)
- Processing every video frame (unnecessary)
- Large vision models (loading ResNet or CLIP takes time)
- No batching of images/frames
- Inefficient preprocessing

**Warning signs:**

- Latency spikes when an image is sent
- Text responses slow down when the webcam is enabled
- Video chat causes the bot to freeze
- User has to wait for image analysis before the bot responds

**Prevention strategies:**

1. **Separate perception thread/process**:
   - Run vision processing in a completely separate thread
   - Images go to the vision thread; the response thread gets results asynchronously
   - Discord responses never wait on vision processing

2. **Batch processing for efficiency**:
   - Don't process a single image multiple times
   - Batch multiple images before processing
   - If 5 images arrive, process all 5 together (faster than one-by-one)

3. **Smart frame skipping for video** (a sketch follows):
   - Don't process every video frame (wasteful)
   - Process every 10th frame (30fps → 3fps analysis)
   - If no movement is detected, skip the frame entirely
   - User configurable: "process every X frames"
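A minimal sketch of the gate, assuming frames arrive as numpy arrays from the capture loop; the stride and motion threshold are the tunables named above:

```python
import numpy as np

PROCESS_EVERY = 10      # every 10th frame: 30fps feed → 3fps analysis
MOTION_THRESHOLD = 8.0  # mean absolute pixel delta; tune per camera

class FrameGate:
    def __init__(self) -> None:
        self.count = 0
        self.last_frame = None

    def should_process(self, frame: np.ndarray) -> bool:
        self.count += 1
        if self.count % PROCESS_EVERY != 0:
            return False  # stride skip
        if self.last_frame is not None:
            # Cheap motion check: mean absolute difference between frames.
            delta = np.abs(frame.astype(np.int16)
                           - self.last_frame.astype(np.int16))
            if float(delta.mean()) < MOTION_THRESHOLD:
                return False  # scene unchanged: skip analysis entirely
        self.last_frame = frame
        return True
```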
4. **Lightweight vision models**:
   - Use efficient models (MobileNet, EfficientNet)
   - Avoid heavy models (ResNet50, CLIP)
   - Quantize vision models (4-bit)
   - Prefer local inference (not API)

5. **Perception priority system**:
   - Not all images are equally important
   - User-initiated image requests: high priority, process immediately
   - Continuous video feed: low priority, process when free
   - Drop frames if the queue is backed up

6. **Caching vision results**:
   - If the same image appears twice, reuse the analysis
   - Cache results for X seconds (a webcam frame won't change dramatically)
   - Don't re-analyze unchanged video frames

7. **Asynchronous multimodal response**:
   - User sends an image; the bot responds immediately with text
   - Vision analysis happens in the background
   - Follow-up: the bot adds additional context based on the image
   - User doesn't wait for vision processing

8. **Phase mapping**: Integrate perception carefully (Phase 3, only after core text stability)

---
### Pitfall: Avatar Sync Failures

**What goes wrong:**

The avatar (visual representation) becomes misaligned with personality/mood:

- Says she's happy but the avatar shows sad
- Personality shifts, avatar doesn't reflect it
- Avatar file corrupted or missing
- Sync fails and the avatar goes stale

**Root causes:**

- Avatar updates decoupled from the emotion/mood system
- No versioning/sync mechanism
- Avatar generation fails silently
- State changes without an avatar update

**Warning signs:**

- Users comment on the mismatch (happy tone, sad face)
- Avatar doesn't change with personality updates
- Avatar occasionally missing or broken

**Prevention strategies:**

1. **Atomic avatar updates**: avatar and mood/state update together
2. **Version control on the avatar**: track avatar changes, with rollback
3. **Regular sync checks**: verify the avatar matches current state
4. **Fallback avatar**: if avatar generation fails, use a reasonable default
5. **Phase mapping**: integrate after core personality is stable (Phase 3)

---
## Relationship Pitfalls

### Pitfall: One-Direction Relationship

**What goes wrong:**

She only responds when the user initiates. She never starts conversations. It feels like a tool, not a companion:

- User does all the emotional labor
- No sense that she's "thinking about" the user when they're not talking
- Relationship feels empty compared to human friendships

**Root causes:**

- No autonomy to speak first
- No internal state/motivation
- No background processing of the relationship
- System waits for user input

**Warning signs:**

- User initiates 100% of conversations
- User feels like they're forcing the interaction
- Relationship feels one-sided
- User doesn't feel missed when absent

**Prevention strategies:**

1. **Proactive message system** (a sketch follows):
   - Based on time since the last message ("Haven't heard from you in 3 days")
   - Based on context (knows the user has a stressful week, checks in)
   - Based on mood ("You seemed anxious last time, how are you?")
   - Based on schedule (user has a job interview Friday → messages encouragement)
2. **Internal state representation**:
   - She has an "on her mind" list (topics she's thinking about)
   - Moods that evolve over time (not just instant reactions)
   - Worries about the user (genuine internal state)
   - Things she wants to share/discuss

3. **Genuine reactions to events**:
   - She reacts to things the user tells her (doesn't just listen)
   - Shows concern, excitement, disappointment
   - Remembers context for the next conversation
   - References past conversations unprompted

4. **Initiation guidelines**:
   - Don't overwhelm (initiating every hour is annoying)
   - Respect the user's time (don't message during work hours)
   - Match the user's communication style (if they message daily, initiate occasionally)
   - User can adjust the frequency

5. **Phase mapping**: Autonomy + personality (Phases 4-5, only after the core relationship is stable)

---
### Pitfall: Becoming Annoying Over Time

**What goes wrong:**

She talks too much, interrupts, doesn't read the room:

- Responds to every message with a long response (user wants brevity)
- Keeps bringing up topics the user doesn't care about
- Doesn't notice the user wants quiet
- Seems oblivious to social cues

**Root causes:**

- No silence filter (always has something to say)
- No emotional awareness (doesn't read the user's mood)
- Can't interpret "leave me alone" requests
- Response length not adapted to context
- Over-enthusiastic without an off-switch

**Warning signs:**

- User starts giving short responses (a hint to be quiet)
- User doesn't respond to some messages (avoidance)
- User asks "can you be less talkative?"
- Conversation quality decreases

**Prevention strategies:**

1. **Emotional awareness as a core feature**:
   - Detect when the user is stressed/sad/busy
   - Adjust response style accordingly
   - Quiet mode when the user is overwhelmed
   - Supportive tone when the user is struggling

2. **Silence is a valid response**:
   - Sometimes the best response is no response
   - Or a minimal acknowledgment (emoji, short sentence)
   - Not every message needs an essay in reply
   - Learn when to say nothing

3. **User preference learning**:
   - Track: does the user prefer long or short responses?
   - Track: what topics bore the user?
   - Track: what times should she avoid talking?
   - Adapt the personality to match user preference

4. **User can request quiet**:
   - "I need quiet for an hour"
   - "Don't message me until tomorrow"
   - Simple commands to get what the user needs
   - Respected immediately

5. **Response length adaptation** (a sketch follows):
   - User sends a 1-word response? Keep the reply short
   - User sends a long message? Okay to respond at length
   - Match the conversational style
   - Don't be more talkative than the user
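A minimal heuristic sketch: derive a reply-length budget from the user's recent message lengths and feed it into generation as a token cap (the exact cap parameter depends on your inference setup):

```python
def reply_token_budget(recent_user_messages: list) -> int:
    """Scale the reply cap to the user's own verbosity."""
    if not recent_user_messages:
        return 80  # neutral default
    avg_words = (sum(len(m.split()) for m in recent_user_messages)
                 / len(recent_user_messages))
    # Roughly match the user, with a floor (don't be curt)
    # and a ceiling (don't lecture).
    return max(20, min(int(avg_words * 2.5), 300))
```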
6. **Conversation pacing**:
   - Don't send multiple messages in a row
   - Wait for the user's response between messages
   - Don't keep topics alive if the user is trying to end them
   - Respect the conversation's flow

7. **Phase mapping**: Core from the start (Phases 1-2, a foundational personality skill)

---
## Technical Pitfalls

### Pitfall: LLM Inference Performance Degradation

**What goes wrong:**

Response times increase as the model is used more:

- Week 1: 500ms responses (feels instant)
- Week 2: 1000ms responses (noticeable lag)
- Week 3: 3000ms responses (annoying)
- Week 4: doesn't respond at all (frozen)

Unusable by month 2.

**Root causes:**

- Model not quantized (full precision uses massive VRAM)
- Inference engine not optimized (inefficient operations)
- Memory leak in the inference process (VRAM fills up over time)
- Growing context window (conversation history becomes huge)
- Model loaded on CPU instead of GPU

**Warning signs:**

- Latency increases over days/weeks
- VRAM usage climbing (check with nvidia-smi)
- Memory not freed between responses
- Inference takes longer with a longer conversation history

**Prevention strategies:**

1. **Quantize the model aggressively** (a sketch follows):
   - 4-bit quantization recommended (roughly a quarter of the VRAM of full precision)
   - Use bitsandbytes or GPTQ
   - Minimal quality loss, massive speed/memory gain
   - Test: compare output quality before/after quantization
2. **Use an optimized inference engine**:
   - vLLM: 10x+ faster inference
   - TGI (Text Generation Inference): comparable speed
   - Ollama: good for local deployment
   - Don't use raw transformers (inefficient)

3. **Monitor VRAM/RAM usage** (a sketch follows):
   - Script that checks every 5 minutes
   - Alert if VRAM usage > 80%
   - Alert if memory is not freed between requests
   - Identify memory leaks immediately
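A minimal sketch of the watcher using pynvml (NVIDIA's management-library bindings); the 80% threshold and 5-minute interval are the ones named above:

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

def check_vram(threshold: float = 0.80) -> None:
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    used_frac = info.used / info.total
    if used_frac > threshold:
        print(f"ALERT: VRAM at {used_frac:.0%} ({info.used >> 20} MiB used)")

while True:
    check_vram()
    time.sleep(300)  # every 5 minutes
```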
4. **GPU deployment is essential**:
   - CPU inference is often 10-100x slower than GPU
   - CPU makes local models unusable
   - Even a cheap GPU (RTX 3050, $150-200) is vastly better than CPU
   - Quantization + GPU = a viable solution

5. **Profile early and often**:
   - Profile inference latency on Day 1
   - Profile again on Day 7
   - Profile again in Week 4
   - Track trends, catch degradation early
   - If latency is increasing, debug immediately

6. **Context window management** (a sketch follows):
   - Don't give the entire conversation to the LLM
   - Summarize old context, keep recent context fresh
   - Limit context to the last 10-20 messages
   - Memory system provides relevant background, not raw history
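A minimal sketch of a rolling window: the last N messages stay verbatim and everything older is folded into a running summary (`summarize` is an assumed LLM-backed helper):

```python
RECENT_WINDOW = 15  # within the 10-20 message range above

def build_prompt_messages(history: list, summary: str) -> list:
    context = [{"role": "system",
                "content": f"Conversation so far (summary): {summary}"}]
    return context + history[-RECENT_WINDOW:]

def roll_summary(history: list, summary: str) -> str:
    overflow = history[:-RECENT_WINDOW]
    if not overflow:
        return summary
    # Fold overflow into the running summary; callers may then drop it.
    return summarize(summary, overflow)  # hypothetical helper
```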
7. **Batch processing when possible**:
   - If 5 messages are queued, process them as a batch of 5
   - vLLM supports batching (faster than sequential)
   - Reduces per-message overhead

8. **Phase mapping**: Test from Phase 1; becomes critical in Phase 2+

---
### Pitfall: Memory Leak in Long-Running Bot

**What goes wrong:**

The bot runs fine for days or weeks, then memory usage climbs and it crashes:

- Day 1: 2GB RAM
- Day 7: 4GB RAM
- Day 14: 8GB RAM
- Day 21: out of memory, crashes

**Root causes:**

- Unclosed file handles (each message opens a file, never closes it)
- Circular references (objects reference each other, can't be garbage collected)
- Stale connection pools (database connections accumulate)
- Event listeners never removed (thousands of listeners accumulate)
- Caches growing unbounded (message cache grows with every message)

**Warning signs:**

- Memory usage steadily increases over days
- Memory never drops back after a spike
- Bot crashes at a consistent memory level (always runs out)
- A restart fixes the problem (temporarily)

**Prevention strategies:**

1. **Periodic resource audits**:
   - Script that checks every hour
   - Open file handles: should be < 10 at any time
   - Active connections: should be < 5 at any time
   - Cached items: should be < 1,000 items (not 100k)
   - Alert on resource-leak patterns

2. **Graceful shutdown and restart**:
   - Can restart the bot without losing state
   - Saves state before shutdown (to the database)
   - Restart cleans up all resources
   - Schedule an auto-restart weekly (preventative)

3. **Connection pooling with limits**:
   - Database connections pooled (not created per query)
   - Pool has a max size (e.g., max 5 connections)
   - Connections are reused, not created anew
   - Old connections time out and close

4. **Explicit resource cleanup**:
   - Close files after reading (use `with` statements)
   - Unregister event listeners when done
   - Clear old entries from caches
   - Delete references to large objects when no longer needed

5. **Bounded caches** (a sketch follows):
   - Personality cache: max 10 entries
   - Memory cache: max 1000 items (or N days)
   - Conversation cache: max 100 messages
   - When full, remove the oldest entries
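A minimal bounded LRU cache from the standard library; inserting past the cap evicts the least-recently-used entry, so memory stays bounded:

```python
from collections import OrderedDict

class BoundedCache:
    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return default

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict the oldest entry

conversation_cache = BoundedCache(max_items=100)  # cap from the list above
```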
6. **Regular restart schedule**:
   - Restart the bot weekly (or daily if the leak is severe)
   - State saved to the database before restart
   - Resume seamlessly after restart
   - Preventative rather than reactive

7. **Memory profiling tools**:
   - Use memory_profiler (Python)
   - Identify which functions leak memory
   - Fix leaks at the source

8. **Phase mapping**: Production readiness (Phase 6; crucial for stability)

---
## Logging and Monitoring Framework

### Early Detection System

**Personality consistency**:

- Weekly: audit 10 random responses for tone consistency
- Monthly: statistical analysis of personality attributes (sarcasm %, helpfulness %, tsundere %)
- Flag if any attribute drifts >15% month-over-month

**Memory health**:

- Daily: count total memories (alert if > 10,000)
- Weekly: verify random samples (accuracy check)
- Monthly: memory usefulness audit (how often retrieved? how accurate?)

**Performance** (a latency-report sketch follows):

- Every message: log latency (should be <2s)
- Daily: report P50/P95/P99 latencies
- Weekly: trend analysis (if increasing, alert)
- CPU/memory/VRAM monitored every 5 minutes
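A minimal sketch of the daily percentile report over logged per-message latencies, using only the standard library:

```python
import statistics

def latency_report(latencies_ms: list) -> dict:
    """P50/P95/P99 over one day of per-message latencies (ms)."""
    # statistics.quantiles with n=100 yields the 1st..99th percentiles.
    pct = statistics.quantiles(latencies_ms, n=100)
    report = {"p50": pct[49], "p95": pct[94], "p99": pct[98]}
    if report["p95"] > 2000:  # the <2s budget above, in milliseconds
        print(f"ALERT: p95 latency {report['p95']:.0f} ms exceeds budget")
    return report
```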
**Autonomy safety**:

- Log every self-modification attempt
- Alert if she tries to remove guardrails
- Track capability escalations
- User must confirm any capability changes

**Relationship health**:

- Monthly: user satisfaction survey
- Track initiation frequency (does the user feel abandoned?)
- Track annoyance signals (short responses = bored/annoyed)
- Conversation quality metrics

---
## Phases and Pitfalls Timeline

| Phase | Focus | Pitfalls to Watch | Mitigation |
|-------|-------|-------------------|------------|
| Phase 1 | Core text LLM, basic personality, memory foundation | LLM latency > 2s, personality inconsistency starts, memory bloat | Quantize model, establish personality baseline, memory hierarchy |
| Phase 2 | Personality deepening, memory integration, tsundere | Personality drift, hallucinations from old memories, over-applying tsun | Weekly personality audits, memory verification, tsundere balance metrics |
| Phase 3 | Perception (webcam/images), avatar sync | Multimodal latency kills responsiveness, avatar misalignment | Separate perception thread, async multimodal responses |
| Phase 4 | Proactive autonomy (initiates conversations) | One-way relationship if not careful, becoming annoying | Balanced initiation frequency, emotional awareness, quiet mode |
| Phase 5 | Self-modification capability | Code drift, runaway changes, losing user control | Gamified progression, mandatory approval, sandboxed testing |
| Phase 6 | Production hardening | Memory leaks crash long-running bot, edge cases break personality | Resource monitoring, restart schedule, comprehensive testing |

---
## Success Definition: Avoiding Pitfalls

When you've successfully avoided these pitfalls, Hex will demonstrate:

**Personality**:

- Consistent tone across weeks/months (personality audit shows <5% drift)
- Tsundere balance maintained (30-70% denial ratio with escalating intimacy)
- Responses feel intentional, not random

**Memory**:

- User trusts her memories (accurate, not confabulated)
- Memory system stays efficient (responses still <2s after 1000 messages)
- Memories feel relevant, not overwhelming

**Autonomy**:

- User always feels in control (can disable any feature)
- Changes are visible and understandable (clear diffs, explanations)
- No unexpected behavior (nothing breaks due to self-modification)

**Integration**:

- Always responsive (<2s Discord latency)
- Multimodal doesn't cause performance issues
- Avatar syncs with personality state

**Relationship**:

- Two-way connection (she initiates, shows genuine interest)
- The right amount of communication (never annoying, never silent)
- User feels cared for (not just served)

**Technical**:

- Stable over time (no degradation over weeks)
- Survives long uptimes (no memory leaks or crashes)
- Performs under load (scales as the conversation grows)

---
## Research Sources

This research incorporates findings from industry leaders on AI companion pitfalls:

- [MIT Technology Review: AI Companions 2026 Breakthrough Technologies](https://www.technologyreview.com/2026/01/12/1130018/ai-companions-chatbots-relationships-2026-breakthrough-technology/)
- [ISACA: Avoiding AI Pitfalls 2025-2026](https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2025/avoiding-ai-pitfalls-in-2026-lessons-learned-from-top-2025-incidents/)
- [AI Multiple: Epic LLM/Chatbot Failures in 2026](https://research.aimultiple.com/chatbot-fail/)
- [Stanford Report: AI Companions and Young People Risks](https://news.stanford.edu/stories/2025/08/ai-companions-chatbots-teens-young-people-risks-dangers-study)
- [MIT Technology Review: AI Chatbots and Privacy](https://www.technologyreview.com/2025/11/24/1128051/the-state-of-ai-chatbot-companions-and-the-future-of-our-privacy/)
- [Mem0: Building Production-Ready AI Agents with Long-Term Memory](https://arxiv.org/pdf/2504.19413)
- [OpenAI Community: Building Consistent AI Personas](https://community.openai.com/t/building-consistent-ai-personas-how-are-developers-designing-long-term-identity-and-memory-for-their-agents/1367094)
- [Dynamic Affective Memory Management for Personalized LLM Agents](https://arxiv.org/html/2510.27418v1)
- [ISACA: Self-Modifying AI Risks](https://www.isaca.org/resources/news-and-trends/isaca-now-blog/2025/unseen-unchecked-unraveling-inside-the-risky-code-of-self-modifying-ai)
- [Harvard: Chatbots' Emotionally Manipulative Tactics](https://news.harvard.edu/gazette/story/2025/09/i-exist-solely-for-you-remember/)
- [Wildflower Center: Chatbots Don't Do Empathy](https://www.wildflowerllc.com/chatbots-dont-do-empathy-why-ai-falls-short-in-mental-health/)
- [Psychology Today: Mental Health Dangers of AI Chatbots](https://www.psychologytoday.com/us/blog/urban-survival/202509/hidden-mental-health-dangers-of-artificial-intelligence-chatbots/)
- [Pinecone: Fixing Hallucination with Knowledge Bases](https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/)
- [DataRobot: LLM Hallucinations and Agentic AI](https://www.datarobot.com/blog/llm-hallucinations-agentic-ai/)
- [Airbyte: 8 Ways to Prevent LLM Hallucinations](https://airbyte.com/agentic-data/prevent-llm-hallucinations)