diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 47f5c0b..493ce70 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -15,6 +15,11 @@ Mai's development is organized into three major milestones, each delivering dist
- Intelligently switch between models based on task and availability
- Manage model context efficiently (conversation history, system prompt, token budget)
+**Plans:** 3 plans in 2 waves
+- [ ] 01-01-PLAN.md — LM Studio connectivity and resource monitoring foundation
+- [ ] 01-02-PLAN.md — Conversation context management and memory system
+- [ ] 01-03-PLAN.md — Intelligent model switching integration
+
### Phase 2: Safety & Sandboxing
- Implement sandbox execution environment for generated code
- Multi-level security assessment (LOW/MEDIUM/HIGH/BLOCKED)
diff --git a/.planning/phases/01-model-interface/01-01-PLAN.md b/.planning/phases/01-model-interface/01-01-PLAN.md
new file mode 100644
index 0000000..069cdff
--- /dev/null
+++ b/.planning/phases/01-model-interface/01-01-PLAN.md
@@ -0,0 +1,188 @@
+---
+phase: 01-model-interface
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified: ["src/models/__init__.py", "src/models/lmstudio_adapter.py", "src/models/resource_monitor.py", "config/models.yaml", "requirements.txt", "pyproject.toml"]
+autonomous: true
+
+must_haves:
+ truths:
+ - "LM Studio client can connect and list available models"
+ - "System resources (CPU/RAM/GPU) are monitored in real-time"
+ - "Configuration defines models and their resource requirements"
+ artifacts:
+ - path: "src/models/lmstudio_adapter.py"
+ provides: "LM Studio client and model discovery"
+ min_lines: 50
+ - path: "src/models/resource_monitor.py"
+ provides: "System resource monitoring"
+ min_lines: 40
+ - path: "config/models.yaml"
+ provides: "Model definitions and resource profiles"
+ contains: "models:"
+ key_links:
+ - from: "src/models/lmstudio_adapter.py"
+ to: "LM Studio server"
+ via: "lmstudio-python SDK"
+ pattern: "import lmstudio"
+ - from: "src/models/resource_monitor.py"
+ to: "system APIs"
+ via: "psutil library"
+ pattern: "import psutil"
+---
+
+
+Establish LM Studio connectivity and resource monitoring foundation.
+
+Purpose: Create the core infrastructure for model discovery and system resource tracking, enabling intelligent model selection in later plans.
+Output: Working LM Studio client, resource monitor, and model configuration system.
+
+
+
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/01-model-interface/01-RESEARCH.md
+@.planning/phases/01-model-interface/01-CONTEXT.md
+@.planning/codebase/ARCHITECTURE.md
+@.planning/codebase/STRUCTURE.md
+@.planning/codebase/STACK.md
+
+
+
+
+
+ Task 1: Create project foundation and dependencies
+ requirements.txt, pyproject.toml, src/models/__init__.py
+
+Create Python project structure with required dependencies:
+1. Create pyproject.toml with project metadata and lmstudio, psutil, pydantic dependencies
+2. Create requirements.txt as fallback for pip install
+3. Create src/models/__init__.py with proper imports and version info
+4. Create basic src/ directory structure if not exists
+5. Set up Python package structure following PEP 518
+
+Dependencies from research:
+- lmstudio >= 1.0.1 (official LM Studio SDK)
+- psutil >= 6.1.0 (system resource monitoring)
+- pydantic >= 2.10 (configuration validation)
+- gpu-tracker >= 5.0.1 (GPU monitoring, optional)
+
+Follow packaging best practices with proper metadata, authors, and optional dependencies.
+
+ pip install -e . succeeds and imports work: python -c "import lmstudio, psutil, pydantic"
+ Project structure created with all dependencies installable via pip
+
+
+
+ Task 2: Implement LM Studio adapter and model discovery
+ src/models/lmstudio_adapter.py
+
+Create LM Studio client following research patterns:
+1. Implement LMStudioAdapter class using lmstudio-python SDK
+2. Add context manager for safe client handling: get_client()
+3. Implement list_models() using lms.list_downloaded_models()
+4. Add load_model() method with error handling and fallback logic
+5. Include model validation and capability detection
+6. Follow Pattern 1 from research: Model Client Factory
+
+Key methods:
+- __init__: Initialize client configuration
+- list_models(): Return list of (model_key, display_name, size_gb)
+- load_model(model_key): Load model with timeout and error handling
+- unload_model(model_key): Clean up model resources
+- get_model_info(model_key): Get model metadata and context window
+
+Use proper error handling for LM Studio not running, model loading failures, and network issues.
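+
+A rough sketch of the adapter shape, assuming the lmstudio-python calls named in the research (lms.list_downloaded_models, lms.llm); attribute names on the returned objects are assumptions to verify against the installed SDK version:
+
+```python
+import lmstudio as lms
+
+
+class LMStudioAdapter:
+    def list_models(self) -> list[tuple[str, str, float]]:
+        """Return (model_key, display_name, size_gb) for downloaded models."""
+        results = []
+        for model in lms.list_downloaded_models():
+            info = model.info  # field names may differ across SDK versions
+            results.append((info.model_key, info.display_name, info.size_bytes / 1e9))
+        return results
+
+    def load_model(self, model_key: str):
+        """Load (or attach to) a model; fail loudly if LM Studio is unreachable."""
+        try:
+            return lms.llm(model_key)  # loads the model if not already resident
+        except Exception as exc:  # narrow to SDK exception types in real code
+            raise RuntimeError(f"Could not load '{model_key}': {exc}") from exc
+```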
+
+ Unit test passes: python -c "from src.models.lmstudio_adapter import LMStudioAdapter; adapter = LMStudioAdapter(); print(isinstance(adapter.list_models(), list))"
+ LM Studio adapter can connect and list available models, handles errors gracefully
+
+
+
+ Task 3: Implement system resource monitoring
+ src/models/resource_monitor.py
+
+Create ResourceMonitor class following research patterns:
+1. Monitor CPU usage (psutil.cpu_percent)
+2. Track available memory (psutil.virtual_memory)
+3. GPU VRAM monitoring if available (gpu-tracker library)
+4. Provide resource snapshot with current usage and availability
+5. Add resource trend tracking for load prediction
+6. Implement should_switch_model() logic based on thresholds
+
+Key methods:
+- get_current_resources(): Return dict with memory_percent, cpu_percent, available_memory_gb, gpu_vram_gb
+- get_resource_trend(window_minutes=5): Return resource usage trend
+- can_load_model(model_size_gb): Check if enough resources available
+- is_system_overloaded(): Return True if resources exceed thresholds
+
+Follow Pattern 2 from research: Resource-Aware Model Selection
+Set sensible thresholds: 80% memory/CPU usage triggers model downgrading.
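+
+A minimal sketch of the psutil-backed monitor; the 80% threshold mirrors the downgrade trigger above, and the 2 GB headroom in can_load_model() is an illustrative choice:
+
+```python
+import time
+from collections import deque
+
+import psutil
+
+
+class ResourceMonitor:
+    OVERLOAD_THRESHOLD = 80.0  # percent; triggers model downgrading
+
+    def __init__(self, history_size: int = 300):
+        self._history = deque(maxlen=history_size)  # (timestamp, snapshot) pairs
+
+    def get_current_resources(self) -> dict:
+        vm = psutil.virtual_memory()
+        snapshot = {
+            "cpu_percent": psutil.cpu_percent(interval=None),
+            "memory_percent": vm.percent,
+            "available_memory_gb": vm.available / 1024**3,
+            "gpu_vram_gb": None,  # populated when gpu-tracker is installed
+        }
+        self._history.append((time.time(), snapshot))
+        return snapshot
+
+    def can_load_model(self, model_size_gb: float) -> bool:
+        # Leave ~2 GB headroom for the OS and other processes.
+        free = self.get_current_resources()["available_memory_gb"]
+        return free >= model_size_gb + 2.0
+
+    def is_system_overloaded(self) -> bool:
+        snap = self.get_current_resources()
+        return (snap["cpu_percent"] > self.OVERLOAD_THRESHOLD
+                or snap["memory_percent"] > self.OVERLOAD_THRESHOLD)
+```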
+
+ python -c "from src.models.resource_monitor import ResourceMonitor; monitor = ResourceMonitor(); print('memory' in monitor.get_current_resources())"
+ Resource monitor provides real-time system metrics and trend analysis
+
+
+
+ Task 4: Create model configuration system
+ config/models.yaml
+
+Create model configuration following research architecture:
+1. Define model categories by capability tier (small, medium, large)
+2. Specify resource requirements for each model
+3. Set context window sizes and token limits
+4. Define model switching rules and fallback chains
+5. Include model metadata (display names, descriptions)
+
+Example structure:
+models:
+ - key: "qwen/qwen3-4b-2507"
+ display_name: "Qwen3 4B"
+ category: "medium"
+ min_memory_gb: 4
+ min_vram_gb: 2
+ context_window: 8192
+ capabilities: ["chat", "reasoning"]
+ - key: "qwen/qwen2.5-7b-instruct"
+ display_name: "Qwen2.5 7B Instruct"
+ category: "large"
+ min_memory_gb: 8
+ min_vram_gb: 4
+ context_window: 32768
+ capabilities: ["chat", "reasoning", "analysis"]
+
+Include fallback chains for graceful degradation when resources are constrained.
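+
+One possible shape for the fallback section, reusing the example model keys above:
+
+fallback_chains:
+  default:
+    - "qwen/qwen2.5-7b-instruct"
+    - "qwen/qwen3-4b-2507"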
+
+ YAML validation passes: python -c "import yaml; yaml.safe_load(open('config/models.yaml'))"
+ Model configuration defines available models with resource requirements and fallback chains
+
+
+
+
+
+Verify core connectivity and monitoring:
+1. LM Studio adapter can list available models
+2. Resource monitor returns valid system metrics
+3. Model configuration loads without errors
+4. All dependencies import correctly
+5. Error handling works when LM Studio is not running
+
+
+
+Core infrastructure ready for model management:
+- LM Studio client connects and discovers models
+- System resources are monitored in real-time
+- Model configuration defines resource requirements
+- Foundation supports intelligent model switching
+
+
+
\ No newline at end of file
diff --git a/.planning/phases/01-model-interface/01-02-PLAN.md b/.planning/phases/01-model-interface/01-02-PLAN.md
new file mode 100644
index 0000000..3f4bbfb
--- /dev/null
+++ b/.planning/phases/01-model-interface/01-02-PLAN.md
@@ -0,0 +1,126 @@
+---
+phase: 01-model-interface
+plan: 02
+type: execute
+wave: 1
+depends_on: []
+files_modified: ["src/models/context_manager.py", "src/models/conversation.py"]
+autonomous: true
+
+must_haves:
+ truths:
+ - "Conversation history is stored and retrieved correctly"
+ - "Context window is managed to prevent overflow"
+ - "Old messages are compressed when approaching limits"
+ artifacts:
+ - path: "src/models/context_manager.py"
+ provides: "Conversation context and memory management"
+ min_lines: 60
+ - path: "src/models/conversation.py"
+ provides: "Message data structures and types"
+ min_lines: 30
+ key_links:
+ - from: "src/models/context_manager.py"
+ to: "src/models/conversation.py"
+ via: "import conversation types"
+ pattern: "from.*conversation import"
+ - from: "src/models/context_manager.py"
+ to: "future model manager"
+ via: "context passing interface"
+ pattern: "def get_context_for_model"
+---
+
+
+Implement conversation context management and memory system.
+
+Purpose: Create the foundation for managing conversation history, context windows, and memory compression before model switching logic is added.
+Output: Working context manager with message storage, compression, and token budget management.
+
+
+
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/01-model-interface/01-RESEARCH.md
+@.planning/phases/01-model-interface/01-CONTEXT.md
+@.planning/codebase/ARCHITECTURE.md
+@.planning/codebase/STRUCTURE.md
+
+
+
+
+
+ Task 1: Create conversation data structures
+ src/models/conversation.py
+
+Create conversation data models following research architecture:
+1. Define Message class with role, content, timestamp, metadata
+2. Define Conversation class to manage message sequence
+3. Define ContextWindow class for token budget tracking
+4. Include message importance scoring for compression decisions
+5. Add Pydantic models for validation and serialization
+6. Support message types: user, assistant, system, tool_call
+
+Key classes:
+- Message: role, content, timestamp, token_count, importance_score
+- Conversation: messages list, metadata, total_tokens
+- ContextBudget: max_tokens, used_tokens, available_tokens
+- MessageMetadata: source, context, priority flags
+
+Use dataclasses or Pydantic BaseModel for type safety and validation. Include proper type hints throughout.
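+
+A sketch of the core data shapes using Pydantic v2; the defaults and the 0-1 importance scale are illustrative:
+
+```python
+from datetime import datetime, timezone
+from typing import Literal, Optional
+
+from pydantic import BaseModel, Field
+
+
+class Message(BaseModel):
+    role: Literal["user", "assistant", "system", "tool_call"]
+    content: str
+    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
+    token_count: Optional[int] = None   # filled in by the context manager
+    importance_score: float = 0.5       # 0..1, drives compression retention
+
+
+class Conversation(BaseModel):
+    messages: list[Message] = Field(default_factory=list)
+    metadata: dict = Field(default_factory=dict)
+
+    @property
+    def total_tokens(self) -> int:
+        return sum(m.token_count or 0 for m in self.messages)
+```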
+
+ python -c "from src.models.conversation import Message, Conversation; msg = Message(role='user', content='test'); print(msg.role)"
+ Conversation data structures support message creation and management
+
+
+
+ Task 2: Implement context manager with compression
+ src/models/context_manager.py
+
+Create ContextManager class following research patterns:
+1. Implement sliding window context management
+2. Add hybrid compression: summarize old messages, preserve recent ones
+3. Trigger compression at 70% of context window (from CONTEXT.md)
+4. Prioritize user instructions and explicit requests during compression
+5. Implement semantic importance scoring for message retention
+6. Support different model context sizes (adaptive based on model)
+
+Key methods:
+- add_message(message): Add message to conversation, check compression need
+- get_context_for_model(model_key): Return context within model's token limit
+- compress_conversation(target_ratio): Apply hybrid compression strategy
+- estimate_tokens(text): Estimate token count for text (approximate)
+- get_conversation_summary(): Generate summary of compressed messages
+
+Avoid the anti-patterns flagged in research: never ignore context window overflow, and use proven compression algorithms rather than ad-hoc truncation.
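+
+A sketch of the trigger logic; the 4-characters-per-token estimate is a rough heuristic, and compress_conversation() is a recency-only placeholder for the hybrid strategy:
+
+```python
+from src.models.conversation import Conversation, Message
+
+
+class ContextManager:
+    COMPRESSION_TRIGGER = 0.70  # compress at 70% of the window (CONTEXT.md)
+
+    def __init__(self, max_tokens: int = 8192):
+        self.max_tokens = max_tokens
+        self.conversation = Conversation()
+
+    def estimate_tokens(self, text: str) -> int:
+        return max(1, len(text) // 4)  # ~4 characters per token for English
+
+    def add_message(self, message: Message) -> None:
+        if message.token_count is None:
+            message.token_count = self.estimate_tokens(message.content)
+        self.conversation.messages.append(message)
+        if self.conversation.total_tokens > self.max_tokens * self.COMPRESSION_TRIGGER:
+            self.compress_conversation(target_ratio=0.5)
+
+    def compress_conversation(self, target_ratio: float) -> None:
+        # Placeholder: keep only the most recent messages within the target
+        # budget. The real implementation summarizes the dropped span and
+        # preserves high-importance messages instead.
+        budget = int(self.max_tokens * target_ratio)
+        kept, used = [], 0
+        for msg in reversed(self.conversation.messages):
+            if used + (msg.token_count or 0) > budget:
+                break
+            kept.append(msg)
+            used += msg.token_count or 0
+        self.conversation.messages = list(reversed(kept))
+```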
+
+ python -c "from src.models.context_manager import ContextManager; cm = ContextManager(); print(cm.add_message) and hasattr(cm, 'compress_conversation')"
+ Context manager handles conversation history with intelligent compression
+
+
+
+
+
+Verify conversation management:
+1. Messages can be added and retrieved from conversation
+2. Context compression triggers at correct thresholds
+3. Important messages are preserved during compression
+4. Token estimation works reasonably well
+5. Context adapts to different model window sizes
+
+
+
+Conversation context system operational:
+- Message storage and retrieval works correctly
+- Context window management prevents overflow
+- Intelligent compression preserves important information
+- System ready for integration with model switching
+
+
+
\ No newline at end of file
diff --git a/.planning/phases/01-model-interface/01-03-PLAN.md b/.planning/phases/01-model-interface/01-03-PLAN.md
new file mode 100644
index 0000000..49abb8a
--- /dev/null
+++ b/.planning/phases/01-model-interface/01-03-PLAN.md
@@ -0,0 +1,178 @@
+---
+phase: 01-model-interface
+plan: 03
+type: execute
+wave: 2
+depends_on: ["01-01", "01-02"]
+files_modified: ["src/models/model_manager.py", "src/mai.py", "src/__main__.py"]
+autonomous: true
+
+must_haves:
+ truths:
+ - "Model can be selected and loaded based on available resources"
+ - "System automatically switches models when resources constrained"
+ - "Conversation context is preserved during model switching"
+ - "Basic Mai class can generate responses using the model system"
+ artifacts:
+ - path: "src/models/model_manager.py"
+ provides: "Intelligent model selection and switching logic"
+ min_lines: 80
+ - path: "src/mai.py"
+ provides: "Core Mai orchestration class"
+ min_lines: 40
+ - path: "src/__main__.py"
+ provides: "CLI entry point for testing"
+ min_lines: 20
+ key_links:
+ - from: "src/models/model_manager.py"
+ to: "src/models/lmstudio_adapter.py"
+ via: "model loading operations"
+ pattern: "from.*lmstudio_adapter import"
+ - from: "src/models/model_manager.py"
+ to: "src/models/resource_monitor.py"
+ via: "resource checks"
+ pattern: "from.*resource_monitor import"
+ - from: "src/models/model_manager.py"
+ to: "src/models/context_manager.py"
+ via: "context retrieval"
+ pattern: "from.*context_manager import"
+ - from: "src/mai.py"
+ to: "src/models/model_manager.py"
+ via: "model management"
+ pattern: "from.*model_manager import"
+---
+
+
+Integrate all components into intelligent model switching system.
+
+Purpose: Combine LM Studio client, resource monitoring, and context management into a cohesive system that can intelligently select and switch models based on resources and conversation needs.
+Output: Working ModelManager with intelligent switching and basic Mai orchestration.
+
+
+
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/01-model-interface/01-RESEARCH.md
+@.planning/phases/01-model-interface/01-CONTEXT.md
+@.planning/codebase/ARCHITECTURE.md
+@.planning/codebase/STRUCTURE.md
+@.planning/phases/01-model-interface/01-01-SUMMARY.md
+@.planning/phases/01-model-interface/01-02-SUMMARY.md
+
+
+
+
+
+ Task 1: Implement ModelManager with intelligent switching
+ src/models/model_manager.py
+
+Create ModelManager class that orchestrates all model operations:
+1. Load model configuration from config/models.yaml
+2. Implement intelligent model selection based on:
+ - Available system resources (from ResourceMonitor)
+ - Task complexity and conversation context
+ - Model capability tiers
+3. Add dynamic model switching during conversation (from CONTEXT.md)
+4. Implement fallback chains when primary model fails
+5. Handle model loading/unloading with proper resource cleanup
+6. Support silent switching without user notification
+
+Key methods:
+- __init__: Load config, initialize adapters and monitors
+- select_best_model(conversation_context): Choose optimal model
+- switch_model(target_model_key): Handle model transition
+- generate_response(message, conversation): Generate response with auto-switching
+- get_current_model_status(): Return current model and resource usage
+- preload_model(model_key): Background model loading
+
+Follow CONTEXT.md decisions:
+- Silent switching with no user notifications
+- Dynamic switching mid-task if model struggles
+- Smart context transfer during switches
+- Auto-retry on model failures
+
+Use research patterns for resource-aware selection and implement graceful degradation when no model fits constraints.
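+
+A sketch of resource-aware selection wiring together the wave-1 pieces; the memory-based sort key is an illustrative proxy for capability tiers:
+
+```python
+import yaml
+
+from src.models.lmstudio_adapter import LMStudioAdapter
+from src.models.resource_monitor import ResourceMonitor
+
+
+class ModelManager:
+    def __init__(self, config_path: str = "config/models.yaml"):
+        with open(config_path) as fh:
+            self.config = yaml.safe_load(fh)
+        self.adapter = LMStudioAdapter()
+        self.monitor = ResourceMonitor()
+        self.current_model_key: str | None = None
+
+    def select_best_model(self, conversation_context=None) -> str:
+        """Pick the most capable model whose requirements fit current headroom."""
+        available = self.monitor.get_current_resources()["available_memory_gb"]
+        # Most demanding (most capable) first; walk down the list as fallback.
+        candidates = sorted(self.config["models"],
+                            key=lambda m: m["min_memory_gb"], reverse=True)
+        for model in candidates:
+            if available >= model["min_memory_gb"]:
+                return model["key"]
+        return candidates[-1]["key"]  # smallest model as a last resort
+
+    def switch_model(self, target_model_key: str) -> None:
+        """Silent switch per CONTEXT.md: no user-facing notification."""
+        if target_model_key == self.current_model_key:
+            return
+        if self.current_model_key is not None:
+            self.adapter.unload_model(self.current_model_key)
+        self.adapter.load_model(target_model_key)
+        self.current_model_key = target_model_key
+```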
+
+ python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print(hasattr(mm, 'select_best_model') and hasattr(mm, 'generate_response'))"
+ ModelManager can intelligently select and switch models based on resources
+
+
+
+ Task 2: Create core Mai orchestration class
+ src/mai.py
+
+Create core Mai class following architecture patterns:
+1. Initialize ModelManager, ContextManager, and other systems
+2. Provide main conversation interface:
+ - process_message(user_input): Process message and return response
+ - get_conversation_history(): Retrieve conversation context
+ - get_system_status(): Return current model and resource status
+3. Implement basic conversation flow using ModelManager
+4. Add error handling and graceful degradation
+5. Support both synchronous and async operation (asyncio)
+6. Include basic logging of model switches and resource events
+
+Key methods:
+- __init__: Initialize all subsystems
+- process_message(message): Main conversation entry point
+- get_status(): Return system state for monitoring
+- shutdown(): Clean up resources
+
+Follow architecture: Mai class is main coordinator, delegates to specialized subsystems. Keep logic simple - most complexity should be in ModelManager and ContextManager.
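+
+A thin-coordinator sketch; Mai delegates to the subsystems from earlier plans, and generate_response follows the signature listed in Task 1:
+
+```python
+from src.models.context_manager import ContextManager
+from src.models.conversation import Message
+from src.models.model_manager import ModelManager
+
+
+class Mai:
+    def __init__(self):
+        self.model_manager = ModelManager()
+        self.context_manager = ContextManager()
+
+    def process_message(self, user_input: str) -> str:
+        self.context_manager.add_message(Message(role="user", content=user_input))
+        reply = self.model_manager.generate_response(
+            user_input, self.context_manager.conversation)
+        self.context_manager.add_message(Message(role="assistant", content=reply))
+        return reply
+
+    def get_status(self) -> dict:
+        return self.model_manager.get_current_model_status()
+```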
+
+ python -c "from src.mai import Mai; mai = Mai(); print(hasattr(mai, 'process_message') and hasattr(mai, 'get_status'))"
+ Core Mai class orchestrates conversation processing with model switching
+
+
+
+ Task 3: Create CLI entry point for testing
+ src/__main__.py
+
+Create CLI entry point following project structure:
+1. Implement __main__.py with command-line interface
+2. Add simple interactive chat loop for testing model switching
+3. Include status commands to show current model and resources
+4. Support basic configuration and model management commands
+5. Add proper signal handling for graceful shutdown
+6. Include help text and usage examples
+
+Commands:
+- chat: Interactive conversation mode
+- status: Show current model and system resources
+- models: List available models
+- switch <model_key>: Manual model override for testing
+
+Use argparse for command-line parsing. Follow standard Python package entry point patterns.
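+
+A sketch of the argparse layout for the commands above; the model_key argument name is illustrative:
+
+```python
+import argparse
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(prog="mai", description="Mai local assistant")
+    sub = parser.add_subparsers(dest="command", required=True)
+    sub.add_parser("chat", help="Interactive conversation mode")
+    sub.add_parser("status", help="Show current model and system resources")
+    sub.add_parser("models", help="List available models")
+    switch = sub.add_parser("switch", help="Manual model override for testing")
+    switch.add_argument("model_key", help="Model key from config/models.yaml")
+    return parser
+```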
+
+ python -m mai --help shows usage information and commands
+ CLI interface provides working chat and system monitoring commands
+
+
+
+
+
+Verify integrated system:
+1. ModelManager can select appropriate models based on resources
+2. Conversation processing works with automatic model switching
+3. CLI interface allows testing chat and monitoring
+4. Context is preserved during model switches
+5. System gracefully handles model loading failures
+6. Resource monitoring triggers appropriate model changes
+
+
+
+Complete model interface system:
+- Intelligent model selection based on system resources
+- Seamless conversation processing with automatic switching
+- Working CLI interface for testing and monitoring
+- Foundation ready for integration with memory and personality systems
+
+
+
\ No newline at end of file