# Phase 05: Conversation Engine - Research
**Researched:** 2026-01-29
**Domain:** Conversational AI with multi-turn dialogue management
**Confidence:** HIGH
## Summary
This research focused on implementing Mai's conversational intelligence - how she handles multi-turn conversations, thinks through problems, and communicates naturally. The research revealed that **LangGraph** is the established industry standard for conversation state management, providing robust solutions for multi-turn context preservation, streaming responses, and persistence.
Key findings show that the ecosystem has matured significantly since early 2025, with LangGraph v0.5+ providing production-ready patterns for conversation state management, checkpointing, and streaming that directly align with Mai's requirements. The combination of **LangGraph's StateGraph** with **MessagesState** and **MemorySaver** provides exactly what Mai needs for multi-turn conversations, thinking transparency, and response timing.
**Primary recommendation:** Use LangGraph StateGraph with MessagesState for conversation management, MemorySaver for persistence, and async generators for streaming responses.
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| **langgraph** | 0.5+ | Conversation state management | Industry standard for stateful agents, built-in checkpointing and streaming |
| **langchain-core** | 0.3+ | Message types and abstractions | Provides MessagesState, message role definitions |
| **asyncio** | Python 3.10+ | Async response streaming | Native Python async patterns for real-time streaming |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| **pydantic** | 2.0+ | Data validation | Already in project, use for state schemas |
| **typing-extensions** | 4.0+ | TypedDict support | Required for LangGraph state definitions |
| **asyncio-mqtt** | 0.16+ | Real-time events | For future real-time features |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| LangGraph | Custom conversation manager | LangGraph provides proven patterns, custom would be complex to maintain |
| MessagesState | Custom TypedDict | MessagesState has built-in message aggregation via `add_messages` |
| MemorySaver | SQLite custom checkpointer | MemorySaver is production-tested with built-in serialization |
**Installation:**
```bash
pip install "langgraph>=0.5" "langchain-core>=0.3" typing-extensions
```
## Architecture Patterns
### Recommended Project Structure
```
src/conversation/
├── __init__.py # Main ConversationEngine class
├── state.py # State schema definitions
├── nodes.py # LangGraph node functions
├── streaming.py # Async streaming utilities
├── clarity.py # Clarification handling
└── timing.py # Response timing management
```
### Pattern 1: LangGraph StateGraph with MessagesState
**What:** Use StateGraph with MessagesState for conversation state management
**When to use:** All conversation flows requiring multi-turn context
**Example:**
```python
# Source: https://python.langchain.com/docs/langgraph
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver


class ConversationState(MessagesState):
    user_id: str
    needs_clarification: bool
    response_type: str  # "direct", "clarifying", "breakdown"


def process_message(state: ConversationState):
    # Process the latest user message and determine the response type
    messages = state["messages"]
    last_message = messages[-1]

    # Check for ambiguity (is_ambiguous is a project helper; see Pattern 3)
    if is_ambiguous(last_message.content):
        return {
            "needs_clarification": True,
            "response_type": "clarifying",
        }
    return {"response_type": "direct"}


# Build conversation graph (generate_response is defined in Pattern 2,
# route_response is sketched after this block)
builder = StateGraph(ConversationState)
builder.add_node("process", process_message)
builder.add_node("respond", generate_response)
builder.add_edge(START, "process")
builder.add_conditional_edges("process", route_response)
builder.add_edge("respond", END)

# Add memory for persistence
checkpointer = MemorySaver()
conversation_graph = builder.compile(checkpointer=checkpointer)
```
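Two names in this example are assumed rather than provided by LangGraph: `is_ambiguous` and `route_response`. A minimal sketch of how they might be wired up, leaning on the `AmbiguityDetector` from Pattern 3 below (the glue code and its behavior are assumptions, not a fixed API):
```python
# Hypothetical glue code for the graph above; names and logic are assumptions.
_detector = AmbiguityDetector()  # see Pattern 3 / clarity.py


def is_ambiguous(text: str) -> bool:
    # Any detected ambiguity type counts as "needs clarification"
    return _detector.detect_ambiguity(text, []) is not None


def route_response(state: ConversationState) -> str:
    # LangGraph routes to whichever node name this returns. With only a
    # single "respond" node both branches land there; this hook is where a
    # dedicated clarification node would plug in later.
    return "respond"
```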
### Pattern 2: Async Streaming with Response Timing
**What:** Use async generators with variable timing for natural response flow
**When to use:** All response generation to provide natural conversational pacing
**Example:**
```python
# Source: Async streaming patterns research
import asyncio
from typing import AsyncGenerator


async def stream_response_with_timing(
    content: str,
    response_type: str = "direct",
) -> AsyncGenerator[str, None]:
    """Stream response with natural timing based on context."""
    # split_into_chunks is a small project helper (sketched after this block)
    chunks = split_into_chunks(content, chunk_size=10)
    for i, chunk in enumerate(chunks):
        # Variable timing based on response type and position
        if response_type == "thinking":
            # Longer pauses for "thinking" responses
            await asyncio.sleep(0.3 + (i * 0.1))
        elif response_type == "clarifying":
            # Shorter, more frequent chunks for questions
            await asyncio.sleep(0.1)
        else:
            # Normal conversational timing
            await asyncio.sleep(0.2)
        yield chunk


async def generate_response(state: ConversationState):
    """Generate response with appropriate timing and streaming."""
    messages = state["messages"]
    response_type = state.get("response_type", "direct")

    # Generate response content (llm is the project's async chat model client)
    response = await llm.ainvoke(messages)
    response_content = response.content

    # Return streaming-ready response
    return {
        "messages": [{"role": "assistant", "content": response_content}],
        "response_stream": stream_response_with_timing(response_content, response_type),
    }
```
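The chunking helper above is not a library function; a minimal sketch of one possible `split_into_chunks`, assuming word-based chunks of roughly `chunk_size` words each:
```python
from typing import List


def split_into_chunks(content: str, chunk_size: int = 10) -> List[str]:
    """Group the response into ~chunk_size-word pieces for streaming."""
    words = content.split()
    return [
        " ".join(words[i:i + chunk_size]) + " "
        for i in range(0, len(words), chunk_size)
    ]
```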
### Pattern 3: Clarification Detection and Handling
**What:** Proactive ambiguity detection with gentle clarification requests
**When to use:** When user input is unclear or multiple interpretations exist
**Example:**
```python
# Based on context decisions for clarification handling
from typing import List, Optional
import re


class AmbiguityDetector:
    def __init__(self):
        # Word-boundary patterns so "it" doesn't match inside words like "with"
        self.ambiguous_patterns = [
            r"\bit\b", r"\bthat\b", r"\bthis\b", r"\bthing\b",  # Pronouns without context
            r"\bdo that\b", r"\bmake it\b", r"\bfix it\b",      # Vague instructions
        ]

    def detect_ambiguity(self, message: str, context: List[str]) -> Optional[str]:
        """Detect if message is ambiguous and suggest clarification."""
        # Check for ambiguous pronouns
        # (has_context_for_pronoun and is_vague_instruction are companion
        #  heuristics on this class, not shown here)
        for pattern in self.ambiguous_patterns:
            if re.search(pattern, message, re.IGNORECASE):
                if not self.has_context_for_pronoun(message, context):
                    return "gentle_pronoun"

        # Check for vague instructions
        if self.is_vague_instruction(message):
            return "gentle_specificity"
        return None

    def generate_clarification(self, ambiguity_type: str, original_message: str) -> str:
        """Generate gentle clarification question."""
        if ambiguity_type == "gentle_pronoun":
            return (
                "I want to make sure I understand correctly. When you say "
                f"'{original_message}', could you clarify what specific thing you're referring to?"
            )
        elif ambiguity_type == "gentle_specificity":
            return (
                "I'd love to help with that! Could you provide a bit more "
                "detail about what specifically you'd like me to do?"
            )
        return "Could you tell me a bit more about what you have in mind?"
```
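Wiring the detector into the `process_message` node from Pattern 1 might look like this (a sketch under the same assumptions; the exact state updates are a design choice, not a fixed API):
```python
# Hypothetical node that answers ambiguous turns with a clarification
detector = AmbiguityDetector()


def process_message(state: ConversationState):
    last_message = state["messages"][-1]
    recent_context = [m.content for m in state["messages"][:-1]]

    ambiguity_type = detector.detect_ambiguity(last_message.content, recent_context)
    if ambiguity_type is not None:
        # Ask a gentle question instead of guessing at the user's intent
        question = detector.generate_clarification(ambiguity_type, last_message.content)
        return {
            "messages": [{"role": "assistant", "content": question}],
            "needs_clarification": True,
            "response_type": "clarifying",
        }
    return {"needs_clarification": False, "response_type": "direct"}
```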
### Anti-Patterns to Avoid
- **Fixed response timing:** Don't use fixed delays between chunks - timing should vary based on context
- **Explicit "thinking..." messages:** Avoid explicit status messages in favor of natural timing
- **Assumption-based responses:** Never proceed without clarification when ambiguity is detected
- **Memory-less conversations:** Every conversation node must maintain state through checkpointing
## Don't Hand-Roll
Problems that look simple but have existing solutions:
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Conversation state management | Custom message list + manual tracking | LangGraph StateGraph + MessagesState | Built-in message aggregation, checkpointing, and serialization |
| Async streaming responses | Manual chunk generation with sleep() | LangGraph streaming + async generators | Proper async context handling, backpressure management |
| Conversation persistence | Custom SQLite schema | LangGraph checkpointer (MemorySaver/Redis) | Thread-safe state snapshots, time-travel debugging |
| Message importance scoring | Custom heuristics | LangGraph's message metadata + context | Built-in message prioritization and compression support |
**Key insight:** Building custom conversation state management is notoriously error-prone. State consistency, concurrent access, and proper serialization are solved problems in LangGraph.
## Common Pitfalls
### Pitfall 1: State Mutation Instead of Updates
**What goes wrong:** Directly modifying state objects instead of returning new state
**Why it happens:** Python's object reference model encourages mutation
**How to avoid:** Always return new state dictionaries from nodes, never modify existing state
**Warning signs:** State changes not persisting between graph invocations
```python
# WRONG - Mutates state directly
def bad_node(state):
    state["messages"].append(new_message)  # Mutates list
    return state


# CORRECT - Returns new state
def good_node(state):
    return {"messages": [new_message]}  # LangGraph handles aggregation
```
### Pitfall 2: Missing Thread Configuration
**What goes wrong:** Multiple conversations sharing the same state
**Why it happens:** Forgetting to set thread_id in graph configuration
**How to avoid:** Always pass config with thread_id for each conversation
**Warning signs:** Cross-contamination between different user conversations
```python
# REQUIRED - Configure thread for each conversation
config = {"configurable": {"thread_id": conversation_id}}
result = graph.invoke({"messages": [user_message]}, config=config)
```
### Pitfall 3: Blocking Operations in Async Context
**What goes wrong:** Synchronous LLM calls blocking the event loop
**Why it happens:** Using sync LLM clients in async graph nodes
**How to avoid:** Use async LLM clients (ainvoke, astream) throughout, as in the sketch below
**Warning signs:** Poor responsiveness, CPU blocking during LLM calls
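A minimal before/after sketch, assuming the model client is a LangChain chat model (which exposes both `invoke` and `ainvoke`):
```python
# WRONG - sync call blocks the event loop while the model responds
async def blocking_chat_node(state: MessagesState):
    response = llm.invoke(state["messages"])  # stalls every other conversation
    return {"messages": [response]}


# CORRECT - awaiting the async client yields control back to the event loop
async def async_chat_node(state: MessagesState):
    response = await llm.ainvoke(state["messages"])
    return {"messages": [response]}
```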
### Pitfall 4: Inadequate Error Handling in Streams
**What goes wrong:** Stream errors crashing the entire conversation
**Why it happens:** Not wrapping async generators in try-catch blocks
**How to avoid:** Use proper error handling with graceful degradation (see the sketch below)
**Warning signs:** Conversation termination on network issues
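One way to keep a dropped stream from ending the conversation is to wrap the generator and fall back to a short recovery message (a sketch; the fallback wording and logging hook are assumptions):
```python
from typing import AsyncGenerator
import asyncio
import logging


async def safe_stream(stream: AsyncGenerator[str, None]) -> AsyncGenerator[str, None]:
    """Pass chunks through; degrade gracefully if the underlying stream fails."""
    try:
        async for chunk in stream:
            yield chunk
    except asyncio.CancelledError:
        raise  # never swallow cancellation
    except Exception:
        logging.exception("response stream failed")
        # Keep the conversation alive instead of crashing it
        yield "Sorry, I lost my train of thought for a moment - could you say that again?"
```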
## Code Examples
### Multi-Turn Conversation with Memory
```python
# Source: https://python.langchain.com/docs/langgraph/persistence
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage


class ConversationEngine:
    def __init__(self, llm):
        self.llm = llm  # async-capable chat model client
        self.checkpointer = MemorySaver()
        self.graph = self._build_graph()

    def _build_graph(self):
        builder = StateGraph(MessagesState)

        async def chat_node(state: MessagesState):
            # Process conversation with full context
            messages = state["messages"]
            response = await self.llm.ainvoke(messages)
            return {"messages": [response]}

        builder.add_node("chat", chat_node)
        builder.add_edge(START, "chat")
        builder.add_edge("chat", END)
        return builder.compile(checkpointer=self.checkpointer)

    async def chat(self, message: str, conversation_id: str):
        config = {"configurable": {"thread_id": conversation_id}}

        # Add user message and stream the response in real time
        async for event in self.graph.astream_events(
            {"messages": [HumanMessage(content=message)]},
            config,
            version="v1",
        ):
            if event["event"] == "on_chat_model_stream":
                chunk = event["data"]["chunk"]
                if hasattr(chunk, "content") and chunk.content:
                    yield chunk.content

    def get_conversation_history(self, conversation_id: str):
        config = {"configurable": {"thread_id": conversation_id}}
        state = self.graph.get_state(config)
        return state.values.get("messages", [])
```
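Driving the engine from application code could look like the following; `ChatOpenAI` is only a stand-in here for whichever async-capable chat model Mai actually uses:
```python
# Hypothetical caller; the model class is an assumption, not a project decision
import asyncio
from langchain_openai import ChatOpenAI


async def main():
    engine = ConversationEngine(llm=ChatOpenAI(model="gpt-4o-mini"))

    # Reusing the same thread_id keeps multi-turn context across calls
    async for token in engine.chat("Hi Mai, can you help me plan my week?", "conv-123"):
        print(token, end="", flush=True)
    async for token in engine.chat("Start with Monday.", "conv-123"):
        print(token, end="", flush=True)

    history = engine.get_conversation_history("conv-123")
    print(f"\n{len(history)} messages stored for this conversation")


asyncio.run(main())
```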
### Complex Request Breakdown
```python
# Based on context decisions for breaking down complex requests
from typing import List


class RequestBreakdown:
    def analyze_complexity(self, message: str) -> tuple[bool, List[str]]:
        """Analyze if request is complex and break it down."""
        complexity_indicators = [
            "and then", "after that", "also", "in addition",
            "finally", "first", "second", "third",
        ]

        # Check for multi-step indicators
        is_complex = any(indicator in message.lower()
                         for indicator in complexity_indicators)
        if not is_complex:
            return False, [message]

        # Break down into steps (_extract_steps is this class's parsing helper, not shown)
        steps = self._extract_steps(message)
        return True, steps

    def confirm_breakdown(self, steps: List[str]) -> str:
        """Generate confirmation message for breakdown."""
        steps_text = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(steps))
        return (
            f"I understand you want me to:\n{steps_text}\n\n"
            "Should I proceed with these steps in order?"
        )
```
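`_extract_steps` is left to the implementation. A naive, purely illustrative splitter that cuts the request at the same connector phrases (a real version would likely lean on the LLM itself):
```python
import re
from typing import List


def extract_steps(message: str) -> List[str]:
    """Hypothetical step splitter backing RequestBreakdown._extract_steps."""
    connectors = r"\b(?:and then|after that|also|in addition|finally)\b"
    parts = re.split(connectors, message, flags=re.IGNORECASE)
    return [part.strip(" ,.") for part in parts if part.strip(" ,.")]
```
For example, "draft the email and then send it to Sam" becomes ["draft the email", "send it to Sam"].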
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Custom conversation state | LangGraph StateGraph + MessagesState | Late 2025 (v0.3) | Dramatic reduction in conversation management bugs |
| Manual memory management | Built-in checkpointing with MemorySaver | Early 2026 (v0.5) | Thread-safe persistence with time-travel debugging |
| Fixed response streaming | Variable timing with async generators | Throughout 2025 | More natural conversation flow |
| Separate tools for streaming | Integrated streaming in LangGraph core | Late 2025 | Unified streaming and state management |
**Deprecated/outdated:**
- **LangChain Memory classes:** Deprecated in v0.3, replaced by LangGraph state management
- **Custom message aggregation:** No longer needed with MessagesState and `add_messages`
- **Manual persistence threading:** Replaced by thread_id configuration in LangGraph
- **Sync streaming patterns:** Async generators are now standard for all streaming
## Open Questions
1. **LLM Integration Timing:** Should Mai switch to smaller/faster models for clarification requests vs. complex responses? (Context suggests model switching exists, but timing algorithms are Claude's discretion)
2. **Conversation Session Limits:** What's the optimal checkpoint retention period for balance between memory usage and conversation history? (Research didn't reveal clear best practices)
3. **Real-time Collaboration:** How should concurrent access to the same conversation be handled? (Multiple users collaborating on same conversation)
**Recommendation:** Start with conservative defaults (1 week retention, single user per conversation) and iterate based on usage patterns.
## Sources
### Primary (HIGH confidence)
- **LangGraph Documentation** - State management, checkpointing, and streaming patterns
- **LangChain Core Messages** - Message types and MessagesState implementation
- **AsyncIO Python Documentation** - Async generator patterns and event loop management
### Secondary (MEDIUM confidence)
- **"Persistence in LangGraph — Deep, Practical Guide" (Jan 2026)** - Verified patterns with official docs
- **"Streaming APIs for Beginners" (Oct 2025)** - Async streaming patterns confirmed with Python docs
- **Multiple Medium articles on LangGraph conversation patterns** - Cross-verified with official sources
### Tertiary (LOW confidence)
- **Individual GitHub repositories** - Various conversation engine implementations (marked for validation)
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH - LangGraph is clearly documented and widely adopted
- Architecture: HIGH - Official patterns are well-established and tested
- Pitfalls: HIGH - Common issues are documented in official guides with solutions
**Research date:** 2026-01-29
**Valid until:** 2026-03-01 (LangGraph ecosystem is stable, but new features may emerge)
---
*Phase: 05-conversation-engine*
*Research completed: 2026-01-29*