# Phase 05: Conversation Engine - Research

**Researched:** 2026-01-29
**Domain:** Conversational AI with multi-turn dialogue management
**Confidence:** HIGH

## Summary

This research focused on implementing Mai's conversational intelligence - how she handles multi-turn conversations, thinks through problems, and communicates naturally. The research revealed that **LangGraph** is the established industry standard for conversation state management, providing robust solutions for multi-turn context preservation, streaming responses, and persistence.

Key findings show that the ecosystem has matured significantly since early 2025, with LangGraph v0.5+ providing production-ready patterns for conversation state management, checkpointing, and streaming that directly align with Mai's requirements. The combination of **LangGraph's StateGraph** with **MessagesState** and **MemorySaver** provides exactly what Mai needs for multi-turn conversations, thinking transparency, and response timing.

**Primary recommendation:** Use LangGraph StateGraph with MessagesState for conversation management, MemorySaver for persistence, and async generators for streaming responses.

## Standard Stack

### Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| **langgraph** | 0.5+ | Conversation state management | Industry standard for stateful agents, built-in checkpointing and streaming |
| **langchain-core** | 0.3+ | Message types and abstractions | Provides MessagesState, message role definitions |
| **asyncio** | Python 3.10+ | Async response streaming | Native Python async patterns for real-time streaming |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| **pydantic** | 2.0+ | Data validation | Already in project, use for state schemas |
| **typing-extensions** | 4.0+ | TypedDict support | Required for LangGraph state definitions |
| **asyncio-mqtt** | 0.16+ | Real-time events | For future real-time features |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| LangGraph | Custom conversation manager | LangGraph provides proven patterns; a custom manager would be complex to maintain |
| MessagesState | Custom TypedDict | MessagesState has built-in message aggregation via `add_messages` (sketched below) |
| MemorySaver | SQLite custom checkpointer | MemorySaver is production-tested with built-in serialization |

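For context on that tradeoff, MessagesState is roughly the following TypedDict; the `add_messages` reducer is what appends or merges each node's returned messages into the running history. A minimal sketch based on the LangGraph docs:

```python
from typing import Annotated
from typing_extensions import TypedDict
from langchain_core.messages import AnyMessage
from langgraph.graph.message import add_messages


# Roughly what MessagesState already provides out of the box
class CustomConversationState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]  # reducer merges new messages into history
```
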
**Installation:**

```bash
pip install "langgraph>=0.5" "langchain-core>=0.3" typing-extensions
```

## Architecture Patterns

### Recommended Project Structure

```
src/conversation/
├── __init__.py      # Main ConversationEngine class
├── state.py         # State schema definitions
├── nodes.py         # LangGraph node functions
├── streaming.py     # Async streaming utilities
├── clarity.py       # Clarification handling
└── timing.py        # Response timing management
```

### Pattern 1: LangGraph StateGraph with MessagesState

**What:** Use StateGraph with MessagesState for conversation state management

**When to use:** All conversation flows requiring multi-turn context

**Example:**

```python
# Source: https://python.langchain.com/docs/langgraph
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver


class ConversationState(MessagesState):
    user_id: str
    needs_clarification: bool
    response_type: str  # "direct", "clarifying", "breakdown"


def process_message(state: ConversationState):
    """Process the latest user message and determine the response type."""
    messages = state["messages"]
    last_message = messages[-1]

    # Check for ambiguity (is_ambiguous is an application-level helper, sketched below)
    if is_ambiguous(last_message.content):
        return {
            "needs_clarification": True,
            "response_type": "clarifying",
        }

    return {"response_type": "direct"}


# Build conversation graph (route_response is sketched below;
# generate_response appears in Pattern 2)
builder = StateGraph(ConversationState)
builder.add_node("process", process_message)
builder.add_node("respond", generate_response)
builder.add_edge(START, "process")
builder.add_conditional_edges("process", route_response)
builder.add_edge("respond", END)

# Add memory for persistence
checkpointer = MemorySaver()
conversation_graph = builder.compile(checkpointer=checkpointer)
```
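
`is_ambiguous` and `route_response` are assumed application helpers, not LangGraph APIs. A minimal sketch under that assumption: the heuristic is just a placeholder for the fuller detector in Pattern 3, and `"clarify"` is a hypothetical extra node that would need its own `builder.add_node` call; in the two-node graph above both cases would simply return `"respond"`.

```python
def is_ambiguous(text: str) -> bool:
    # Placeholder heuristic; see Pattern 3 for a fuller AmbiguityDetector
    vague_phrases = ("do that", "make it", "fix it")
    return any(phrase in text.lower() for phrase in vague_phrases)


def route_response(state: ConversationState) -> str:
    # Conditional edges call this with the current state and expect the name
    # of the next node back. "clarify" is hypothetical here.
    if state.get("needs_clarification"):
        return "clarify"
    return "respond"
```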

### Pattern 2: Async Streaming with Response Timing

**What:** Use async generators with variable timing for natural response flow

**When to use:** All response generation to provide natural conversational pacing

**Example:**

```python
# Source: Async streaming patterns research
import asyncio
from typing import AsyncGenerator


async def stream_response_with_timing(
    content: str,
    response_type: str = "direct",
) -> AsyncGenerator[str, None]:
    """Stream response with natural timing based on context."""
    # split_into_chunks is an application-level helper (sketched after this example)
    chunks = split_into_chunks(content, chunk_size=10)

    for i, chunk in enumerate(chunks):
        # Variable timing based on response type and position
        if response_type == "thinking":
            # Longer pauses for "thinking" responses
            await asyncio.sleep(0.3 + (i * 0.1))
        elif response_type == "clarifying":
            # Shorter, more frequent chunks for questions
            await asyncio.sleep(0.1)
        else:
            # Normal conversational timing
            await asyncio.sleep(0.2)

        yield chunk


async def generate_response(state: ConversationState):
    """Generate response with appropriate timing and streaming."""
    messages = state["messages"]
    response_type = state.get("response_type", "direct")

    # Generate response content (llm is the application's async chat model)
    response = await llm.ainvoke(messages)
    response_content = response.content

    # Return streaming-ready response
    return {
        "messages": [{"role": "assistant", "content": response_content}],
        "response_stream": stream_response_with_timing(response_content, response_type),
    }
```
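
`split_into_chunks` is assumed above; a simple word-based chunker that would satisfy that call. A real implementation might split on sentence or token boundaries instead.

```python
from typing import List


def split_into_chunks(content: str, chunk_size: int = 10) -> List[str]:
    # Group the response into chunks of `chunk_size` words, keeping a trailing
    # space so concatenated chunks read naturally
    words = content.split()
    return [
        " ".join(words[i:i + chunk_size]) + " "
        for i in range(0, len(words), chunk_size)
    ]
```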

### Pattern 3: Clarification Detection and Handling

**What:** Proactive ambiguity detection with gentle clarification requests

**When to use:** When user input is unclear or multiple interpretations exist

**Example:**

```python
# Based on context decisions for clarification handling
from typing import List, Optional
import re


class AmbiguityDetector:
    def __init__(self):
        # Word-boundary patterns so "it" does not match inside other words
        self.ambiguous_patterns = [
            r"\bit\b", r"\bthat\b", r"\bthis\b", r"\bthing\b",  # Pronouns without context
            r"\bdo that\b", r"\bmake it\b", r"\bfix it\b",      # Vague instructions
        ]

    def detect_ambiguity(self, message: str, context: List[str]) -> Optional[str]:
        """Detect if a message is ambiguous and suggest a clarification type."""
        # Check for ambiguous pronouns (helper predicates sketched after this example)
        for pattern in self.ambiguous_patterns:
            if re.search(pattern, message, re.IGNORECASE):
                if not self.has_context_for_pronoun(message, context):
                    return "gentle_pronoun"

        # Check for vague instructions
        if self.is_vague_instruction(message):
            return "gentle_specificity"

        return None

    def generate_clarification(self, ambiguity_type: str, original_message: str) -> str:
        """Generate a gentle clarification question."""
        if ambiguity_type == "gentle_pronoun":
            return (
                f"I want to make sure I understand correctly. When you say "
                f"'{original_message}', could you clarify what specific thing "
                f"you're referring to?"
            )

        if ambiguity_type == "gentle_specificity":
            return (
                "I'd love to help with that! Could you provide a bit more detail "
                "about what specifically you'd like me to do?"
            )

        return "Could you tell me a bit more about what you have in mind?"
```
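
The two predicates the detector calls are left undefined in the example. Minimal sketches, assuming recent conversation turns arrive as plain strings; both would live inside the `AmbiguityDetector` class above, and the heuristics are illustrative only.

```python
# Methods to add to AmbiguityDetector (heuristics are placeholder assumptions)

def has_context_for_pronoun(self, message: str, context: List[str]) -> bool:
    # Assumption: a pronoun is resolvable if there is any recent turn to refer back to
    return len(context) > 0


def is_vague_instruction(self, message: str) -> bool:
    # Assumption: short imperatives starting with a bare verb tend to be vague
    words = message.lower().split()
    if not words:
        return False
    return len(words) <= 4 and words[0] in {"do", "make", "fix", "change"}
```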

### Anti-Patterns to Avoid

- **Fixed response timing:** Don't use fixed delays between chunks - timing should vary based on context
- **Explicit "thinking..." messages:** Avoid explicit status messages in favor of natural timing
- **Assumption-based responses:** Never proceed without clarification when ambiguity is detected
- **Memory-less conversations:** Every conversation node must maintain state through checkpointing

## Don't Hand-Roll

Problems that look simple but have existing solutions:

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Conversation state management | Custom message list + manual tracking | LangGraph StateGraph + MessagesState | Built-in message aggregation, checkpointing, and serialization |
| Async streaming responses | Manual chunk generation with sleep() | LangGraph streaming + async generators | Proper async context handling, backpressure management |
| Conversation persistence | Custom SQLite schema | LangGraph checkpointer (MemorySaver/Redis) | Thread-safe state snapshots, time-travel debugging |
| Message importance scoring | Custom heuristics | LangGraph's message metadata + context | Built-in message prioritization and compression support |

**Key insight:** Building custom conversation state management is notoriously error-prone. State consistency, concurrent access, and proper serialization are solved problems in LangGraph.

## Common Pitfalls

### Pitfall 1: State Mutation Instead of Updates

**What goes wrong:** Directly modifying state objects instead of returning new state

**Why it happens:** Python's object reference model encourages mutation

**How to avoid:** Always return new state dictionaries from nodes; never modify existing state

**Warning signs:** State changes not persisting between graph invocations

```python
# WRONG - Mutates state directly
def bad_node(state):
    state["messages"].append(new_message)  # Mutates the list in place
    return state


# CORRECT - Returns new state
def good_node(state):
    return {"messages": [new_message]}  # LangGraph handles aggregation
```

### Pitfall 2: Missing Thread Configuration

**What goes wrong:** Multiple conversations sharing the same state

**Why it happens:** Forgetting to set thread_id in the graph configuration

**How to avoid:** Always pass config with thread_id for each conversation

**Warning signs:** Cross-contamination between different user conversations

```python
# REQUIRED - Configure a thread for each conversation
config = {"configurable": {"thread_id": conversation_id}}
result = graph.invoke({"messages": [user_message]}, config=config)
```

### Pitfall 3: Blocking Operations in Async Context

**What goes wrong:** Synchronous LLM calls blocking the event loop

**Why it happens:** Using sync LLM clients in async graph nodes

**How to avoid:** Use async LLM clients (ainvoke, astream) throughout

**Warning signs:** Poor responsiveness, CPU blocking during LLM calls
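
A minimal sketch of the difference, assuming `llm` is a chat model that exposes both `invoke()` and `ainvoke()`:

```python
from langgraph.graph import MessagesState

# Assumes `llm` is a chat model exposing both invoke() and ainvoke()

async def blocking_chat_node(state: MessagesState):
    # WRONG - the synchronous call blocks the event loop for the whole LLM round-trip
    response = llm.invoke(state["messages"])
    return {"messages": [response]}


async def async_chat_node(state: MessagesState):
    # CORRECT - awaiting the async client yields control while the LLM call is in flight
    response = await llm.ainvoke(state["messages"])
    return {"messages": [response]}
```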

### Pitfall 4: Inadequate Error Handling in Streams

**What goes wrong:** Stream errors crashing the entire conversation

**Why it happens:** Not wrapping async generators in try/except blocks

**How to avoid:** Use proper error handling with graceful degradation

**Warning signs:** Conversation termination on network issues
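
A minimal sketch of graceful degradation around a streaming generator, assuming the `stream_response_with_timing` generator from Pattern 2; the fallback message is illustrative:

```python
import asyncio


async def safe_stream(content: str, response_type: str = "direct"):
    """Wrap the streaming generator so a mid-stream failure degrades gracefully."""
    try:
        async for chunk in stream_response_with_timing(content, response_type):
            yield chunk
    except asyncio.CancelledError:
        # Let cancellation propagate so callers can clean up normally
        raise
    except Exception:
        # Log the error in a real implementation; end the turn with a fallback
        # message instead of crashing the whole conversation
        yield "Sorry, I lost my train of thought there. Could you ask that again?"
```
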
## Code Examples

### Multi-Turn Conversation with Memory

```python
# Source: https://python.langchain.com/docs/langgraph/persistence
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_core.messages import HumanMessage


class ConversationEngine:
    def __init__(self, llm):
        # llm is an async-capable chat model (supports ainvoke/astream)
        self.llm = llm
        self.checkpointer = MemorySaver()
        self.graph = self._build_graph()

    def _build_graph(self):
        builder = StateGraph(MessagesState)

        async def chat_node(state: MessagesState):
            # Process conversation with full context
            messages = state["messages"]
            response = await self.llm.ainvoke(messages)
            return {"messages": [response]}

        builder.add_node("chat", chat_node)
        builder.add_edge(START, "chat")
        builder.add_edge("chat", END)

        return builder.compile(checkpointer=self.checkpointer)

    async def chat(self, message: str, conversation_id: str):
        config = {"configurable": {"thread_id": conversation_id}}

        # Add user message and stream the response
        async for event in self.graph.astream_events(
            {"messages": [HumanMessage(content=message)]},
            config,
            version="v1",
        ):
            # Stream response tokens in real time
            if event["event"] == "on_chat_model_stream":
                chunk = event["data"]["chunk"]
                if hasattr(chunk, "content") and chunk.content:
                    yield chunk.content

    def get_conversation_history(self, conversation_id: str):
        config = {"configurable": {"thread_id": conversation_id}}
        state = self.graph.get_state(config)
        return state.values.get("messages", [])
```
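
A minimal usage sketch; `ChatAnthropic` and the model name are illustrative stand-ins for whatever async-capable chat model the project uses:

```python
import asyncio
from langchain_anthropic import ChatAnthropic  # illustrative; any async chat model works


async def main():
    engine = ConversationEngine(llm=ChatAnthropic(model="claude-sonnet-4-20250514"))

    # Both turns use the same thread_id, so the checkpointer carries context across them
    async for token in engine.chat("My name is Sam.", conversation_id="demo-1"):
        print(token, end="", flush=True)
    print()
    async for token in engine.chat("What's my name?", conversation_id="demo-1"):
        print(token, end="", flush=True)


asyncio.run(main())
```
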

### Complex Request Breakdown

```python
# Based on context decisions for breaking down complex requests
from typing import List


class RequestBreakdown:
    def analyze_complexity(self, message: str) -> tuple[bool, List[str]]:
        """Analyze if a request is complex and break it down."""
        complexity_indicators = [
            "and then", "after that", "also", "in addition",
            "finally", "first", "second", "third",
        ]

        # Check for multi-step indicators
        is_complex = any(
            indicator in message.lower() for indicator in complexity_indicators
        )

        if not is_complex:
            return False, [message]

        # Break down into steps (_extract_steps is sketched below)
        steps = self._extract_steps(message)
        return True, steps

    def confirm_breakdown(self, steps: List[str]) -> str:
        """Generate a confirmation message for the breakdown."""
        steps_text = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(steps))
        return (
            f"I understand you want me to:\n{steps_text}\n\n"
            "Should I proceed with these steps in order?"
        )
```
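
The example leaves `_extract_steps` undefined; a naive sketch that splits on the same sequencing phrases used for detection (a real implementation would likely delegate this to the LLM). It would live inside `RequestBreakdown`:

```python
import re
from typing import List


def _extract_steps(self, message: str) -> List[str]:
    """Naive splitter on the sequencing phrases used as complexity indicators."""
    parts = re.split(
        r"\b(?:and then|after that|also|in addition|finally|first|second|third)\b",
        message,
        flags=re.IGNORECASE,
    )
    return [part.strip(" ,.") for part in parts if part.strip(" ,.")]
```
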
## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Custom conversation state | LangGraph StateGraph + MessagesState | Late 2025 (v0.3) | Dramatic reduction in conversation management bugs |
| Manual memory management | Built-in checkpointing with MemorySaver | Early 2026 (v0.5) | Thread-safe persistence with time-travel debugging |
| Fixed response streaming | Variable timing with async generators | Throughout 2025 | More natural conversation flow |
| Separate tools for streaming | Integrated streaming in LangGraph core | Late 2025 | Unified streaming and state management |

**Deprecated/outdated:**

- **LangChain Memory classes:** Deprecated in v0.3, replaced by LangGraph state management
- **Custom message aggregation:** No longer needed with MessagesState and `add_messages`
- **Manual persistence threading:** Replaced by thread_id configuration in LangGraph
- **Sync streaming patterns:** Async generators are now standard for all streaming

## Open Questions

1. **LLM Integration Timing:** Should Mai switch to smaller/faster models for clarification requests vs. complex responses? (Context suggests model switching exists, but timing algorithms are Claude's discretion.)

2. **Conversation Session Limits:** What is the optimal checkpoint retention period to balance memory usage against conversation history? (Research didn't reveal clear best practices.)

3. **Real-time Collaboration:** How should concurrent access to the same conversation be handled when multiple users collaborate on the same conversation?

**Recommendation:** Start with conservative defaults (one week of retention, a single user per conversation) and iterate based on usage patterns.

## Sources

### Primary (HIGH confidence)

- **LangGraph Documentation** - State management, checkpointing, and streaming patterns
- **LangChain Core Messages** - Message types and MessagesState implementation
- **AsyncIO Python Documentation** - Async generator patterns and event loop management

### Secondary (MEDIUM confidence)

- **"Persistence in LangGraph — Deep, Practical Guide" (Jan 2026)** - Verified patterns against official docs
- **"Streaming APIs for Beginners" (Oct 2025)** - Async streaming patterns confirmed with Python docs
- **Multiple Medium articles on LangGraph conversation patterns** - Cross-verified with official sources

### Tertiary (LOW confidence)

- **Individual GitHub repositories** - Various conversation engine implementations (marked for validation)

## Metadata

**Confidence breakdown:**

- Standard stack: HIGH - LangGraph is clearly documented and widely adopted
- Architecture: HIGH - Official patterns are well-established and tested
- Pitfalls: HIGH - Common issues are documented in official guides with solutions

**Research date:** 2026-01-29
**Valid until:** 2026-03-01 (the LangGraph ecosystem is stable, but new features may emerge)

---

*Phase: 05-conversation-engine*
*Research completed: 2026-01-29*