docs(01): create phase plan

Phase 01-model-interface: Foundation systems
- 3 plan(s) in 2 wave(s)
- 2 parallel, 1 sequential
- Ready for execution
Mai Development
2026-01-27 10:45:52 -05:00
parent 3268f6712d
commit 1d9f19b8c2
4 changed files with 497 additions and 0 deletions


@@ -0,0 +1,188 @@
---
phase: 01-model-interface
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: ["src/models/__init__.py", "src/models/lmstudio_adapter.py", "src/models/resource_monitor.py", "config/models.yaml", "requirements.txt", "pyproject.toml"]
autonomous: true
must_haves:
truths:
- "LM Studio client can connect and list available models"
- "System resources (CPU/RAM/GPU) are monitored in real-time"
- "Configuration defines models and their resource requirements"
artifacts:
- path: "src/models/lmstudio_adapter.py"
provides: "LM Studio client and model discovery"
min_lines: 50
- path: "src/models/resource_monitor.py"
provides: "System resource monitoring"
min_lines: 40
- path: "config/models.yaml"
provides: "Model definitions and resource profiles"
contains: "models:"
key_links:
- from: "src/models/lmstudio_adapter.py"
to: "LM Studio server"
via: "lmstudio-python SDK"
pattern: "import lmstudio"
- from: "src/models/resource_monitor.py"
to: "system APIs"
via: "psutil library"
pattern: "import psutil"
---
<objective>
Establish LM Studio connectivity and resource monitoring foundation.
Purpose: Create the core infrastructure for model discovery and system resource tracking, enabling intelligent model selection in later plans.
Output: Working LM Studio client, resource monitor, and model configuration system.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-model-interface/01-RESEARCH.md
@.planning/phases/01-model-interface/01-CONTEXT.md
@.planning/codebase/ARCHITECTURE.md
@.planning/codebase/STRUCTURE.md
@.planning/codebase/STACK.md
</context>
<tasks>
<task type="auto">
<name>Task 1: Create project foundation and dependencies</name>
<files>requirements.txt, pyproject.toml, src/models/__init__.py</files>
<action>
Create Python project structure with required dependencies:
1. Create pyproject.toml with project metadata and lmstudio, psutil, pydantic dependencies
2. Create requirements.txt as fallback for pip install
3. Create src/models/__init__.py with proper imports and version info
4. Create the basic src/ directory structure if it does not already exist
5. Set up Python package structure following PEP 518
Dependencies from research:
- lmstudio >= 1.0.1 (official LM Studio SDK)
- psutil >= 6.1.0 (system resource monitoring)
- pydantic >= 2.10 (configuration validation)
- gpu-tracker >= 5.0.1 (GPU monitoring, optional)
Follow packaging best practices with proper metadata, authors, and optional dependencies.
</action>
<verify>pip install -e . succeeds and imports work: python -c "import lmstudio, psutil, pydantic"</verify>
<done>Project structure created with all dependencies installable via pip</done>
</task>
<task type="auto">
<name>Task 2: Implement LM Studio adapter and model discovery</name>
<files>src/models/lmstudio_adapter.py</files>
<action>
Create LM Studio client following research patterns:
1. Implement an LMStudioAdapter class using the lmstudio-python SDK
2. Add context manager for safe client handling: get_client()
3. Implement list_available_models() using lms.list_downloaded_models()
4. Add load_model() method with error handling and fallback logic
5. Include model validation and capability detection
6. Follow Pattern 1 from research: Model Client Factory
Key methods:
- __init__: Initialize client configuration
- list_models(): Return list of (model_key, display_name, size_gb)
- load_model(model_key): Load model with timeout and error handling
- unload_model(model_key): Clean up model resources
- get_model_info(model_key): Get model metadata and context window
Use proper error handling for the cases where LM Studio is not running, model loading fails, or the network is unavailable. A hedged interface sketch follows this task.
</action>
<verify>Unit test passes: python -c "from src.models.lmstudio_adapter import LMStudioAdapter; adapter = LMStudioAdapter(); print(len(adapter.list_models()) >= 0)"</verify>
<done>LM Studio adapter can connect and list available models, handles errors gracefully</done>
</task>
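
A minimal sketch of the adapter interface described in Task 2, assuming the `lmstudio` convenience API (`lms.list_downloaded_models()` is named in the task; `lms.llm()`, handle `.unload()`, and the per-model attribute names are assumptions to confirm against the lmstudio-python docs and 01-RESEARCH.md):

```python
# Hedged sketch of LMStudioAdapter, not the final implementation.
# SDK call and attribute names marked "assumed" must be verified against the lmstudio-python docs.
import lmstudio as lms


class LMStudioAdapter:
    """Thin wrapper around the LM Studio SDK for model discovery and loading."""

    def __init__(self, timeout_seconds: float = 60.0):
        self.timeout_seconds = timeout_seconds
        self._loaded = {}  # model_key -> SDK model handle

    def list_models(self) -> list[tuple[str, str, float]]:
        """Return (model_key, display_name, size_gb) for downloaded models."""
        try:
            downloaded = lms.list_downloaded_models()
        except Exception as exc:  # e.g. LM Studio server not running
            raise RuntimeError("Cannot reach LM Studio server") from exc
        results = []
        for m in downloaded:
            # Attribute names are assumptions; check the downloaded-model type in the SDK.
            key = getattr(m, "model_key", str(m))
            name = getattr(m, "display_name", key)
            size_gb = getattr(m, "size_bytes", 0) / 1e9
            results.append((key, name, size_gb))
        return results

    def load_model(self, model_key: str):
        """Load (or reuse) a model handle, raising on failure."""
        if model_key not in self._loaded:
            self._loaded[model_key] = lms.llm(model_key)  # assumed convenience call
        return self._loaded[model_key]

    def unload_model(self, model_key: str) -> None:
        handle = self._loaded.pop(model_key, None)
        if handle is not None:
            handle.unload()  # assumed handle method; confirm in SDK docs
```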
<task type="auto">
<name>Task 3: Implement system resource monitoring</name>
<files>src/models/resource_monitor.py</files>
<action>
Create ResourceMonitor class following research patterns:
1. Monitor CPU usage (psutil.cpu_percent)
2. Track available memory (psutil.virtual_memory)
3. GPU VRAM monitoring if available (gpu-tracker library)
4. Provide resource snapshot with current usage and availability
5. Add resource trend tracking for load prediction
6. Implement should_switch_model() logic based on thresholds
Key methods:
- get_current_resources(): Return dict with memory_percent, cpu_percent, available_memory_gb, gpu_vram_gb
- get_resource_trend(window_minutes=5): Return resource usage trend
- can_load_model(model_size_gb): Check if enough resources available
- is_system_overloaded(): Return True if resources exceed thresholds
Follow Pattern 2 from research (Resource-Aware Model Selection).
Set sensible thresholds: 80% memory or CPU usage triggers model downgrading. A hedged monitoring sketch follows this task.
</action>
<verify>python -c "from src.models.resource_monitor import ResourceMonitor; monitor = ResourceMonitor(); print('memory_percent' in monitor.get_current_resources())"</verify>
<done>Resource monitor provides real-time system metrics and trend analysis</done>
</task>
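
A sketch of the monitor's snapshot and threshold checks using psutil only; GPU probing via gpu-tracker is left as a stub here, and the thresholds and headroom factor are illustrative:

```python
# Sketch of ResourceMonitor using psutil; gpu-tracker integration is stubbed out.
import psutil


class ResourceMonitor:
    def __init__(self, memory_threshold: float = 80.0, cpu_threshold: float = 80.0):
        self.memory_threshold = memory_threshold
        self.cpu_threshold = cpu_threshold

    def get_current_resources(self) -> dict:
        mem = psutil.virtual_memory()
        return {
            "memory_percent": mem.percent,
            "cpu_percent": psutil.cpu_percent(interval=None),
            "available_memory_gb": mem.available / (1024 ** 3),
            "gpu_vram_gb": None,  # fill in via gpu-tracker (or similar) when available
        }

    def can_load_model(self, model_size_gb: float) -> bool:
        """Rough check: enough free RAM for the model plus ~20% headroom."""
        return self.get_current_resources()["available_memory_gb"] >= model_size_gb * 1.2

    def is_system_overloaded(self) -> bool:
        snap = self.get_current_resources()
        return (snap["memory_percent"] > self.memory_threshold
                or snap["cpu_percent"] > self.cpu_threshold)
```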
<task type="auto">
<name>Task 4: Create model configuration system</name>
<files>config/models.yaml</files>
<action>
Create model configuration following research architecture:
1. Define model categories by capability tier (small, medium, large)
2. Specify resource requirements for each model
3. Set context window sizes and token limits
4. Define model switching rules and fallback chains
5. Include model metadata (display names, descriptions)
Example structure:
models:
  - key: "qwen/qwen3-4b-2507"
    display_name: "Qwen3 4B"
    category: "medium"
    min_memory_gb: 4
    min_vram_gb: 2
    context_window: 8192
    capabilities: ["chat", "reasoning"]
  - key: "qwen/qwen2.5-7b-instruct"
    display_name: "Qwen2.5 7B Instruct"
    category: "large"
    min_memory_gb: 8
    min_vram_gb: 4
    context_window: 32768
    capabilities: ["chat", "reasoning", "analysis"]
Include fallback chains for graceful degradation when resources are constrained.
</action>
<verify>YAML validation passes: python -c "import yaml; yaml.safe_load(open('config/models.yaml'))"</verify>
<done>Model configuration defines available models with resource requirements and fallback chains</done>
</task>
</tasks>
<verification>
Verify core connectivity and monitoring:
1. LM Studio adapter can list available models
2. Resource monitor returns valid system metrics
3. Model configuration loads without errors
4. All dependencies import correctly
5. Error handling works when LM Studio is not running
</verification>
<success_criteria>
Core infrastructure ready for model management:
- LM Studio client connects and discovers models
- System resources are monitored in real-time
- Model configuration defines resource requirements
- Foundation supports intelligent model switching
</success_criteria>
<output>
After completion, create `.planning/phases/01-model-interface/01-01-SUMMARY.md`
</output>


@@ -0,0 +1,126 @@
---
phase: 01-model-interface
plan: 02
type: execute
wave: 1
depends_on: []
files_modified: ["src/models/context_manager.py", "src/models/conversation.py"]
autonomous: true
must_haves:
truths:
- "Conversation history is stored and retrieved correctly"
- "Context window is managed to prevent overflow"
- "Old messages are compressed when approaching limits"
artifacts:
- path: "src/models/context_manager.py"
provides: "Conversation context and memory management"
min_lines: 60
- path: "src/models/conversation.py"
provides: "Message data structures and types"
min_lines: 30
key_links:
- from: "src/models/context_manager.py"
to: "src/models/conversation.py"
via: "import conversation types"
pattern: "from.*conversation import"
- from: "src/models/context_manager.py"
to: "future model manager"
via: "context passing interface"
pattern: "def get_context_for_model"
---
<objective>
Implement conversation context management and memory system.
Purpose: Create the foundation for managing conversation history, context windows, and memory compression before model switching logic is added.
Output: Working context manager with message storage, compression, and token budget management.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-model-interface/01-RESEARCH.md
@.planning/phases/01-model-interface/01-CONTEXT.md
@.planning/codebase/ARCHITECTURE.md
@.planning/codebase/STRUCTURE.md
</context>
<tasks>
<task type="auto">
<name>Task 1: Create conversation data structures</name>
<files>src/models/conversation.py</files>
<action>
Create conversation data models following research architecture:
1. Define Message class with role, content, timestamp, metadata
2. Define Conversation class to manage message sequence
3. Define ContextBudget class for context-window token budget tracking
4. Include message importance scoring for compression decisions
5. Add Pydantic models for validation and serialization
6. Support message types: user, assistant, system, tool_call
Key classes:
- Message: role, content, timestamp, token_count, importance_score
- Conversation: messages list, metadata, total_tokens
- ContextBudget: max_tokens, used_tokens, available_tokens
- MessageMetadata: source, context, priority flags
Use dataclasses or Pydantic BaseModel for type safety and validation. Include proper type hints throughout. A minimal Pydantic sketch follows this task.
</action>
<verify>python -c "from src.models.conversation import Message, Conversation; msg = Message(role='user', content='test'); print(msg.role)"</verify>
<done>Conversation data structures support message creation and management</done>
</task>
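
A minimal Pydantic sketch of the data structures listed above; the field defaults and the importance-scoring range are placeholders, not settled design:

```python
# Sketch of the conversation data structures (Pydantic v2); field set is illustrative.
from datetime import datetime, timezone
from typing import Literal, Optional

from pydantic import BaseModel, Field


class Message(BaseModel):
    role: Literal["user", "assistant", "system", "tool_call"]
    content: str
    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    token_count: Optional[int] = None   # filled in by the context manager
    importance_score: float = 0.5       # 0..1, used when deciding what to compress


class ContextBudget(BaseModel):
    max_tokens: int
    used_tokens: int = 0

    @property
    def available_tokens(self) -> int:
        return max(self.max_tokens - self.used_tokens, 0)


class Conversation(BaseModel):
    messages: list[Message] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)

    @property
    def total_tokens(self) -> int:
        return sum(m.token_count or 0 for m in self.messages)
```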
<task type="auto">
<name>Task 2: Implement context manager with compression</name>
<files>src/models/context_manager.py</files>
<action>
Create ContextManager class following research patterns:
1. Implement sliding window context management
2. Add hybrid compression: summarize old messages, preserve recent ones
3. Trigger compression at 70% of context window (from CONTEXT.md)
4. Prioritize user instructions and explicit requests during compression
5. Implement semantic importance scoring for message retention
6. Support different model context sizes (adaptive based on model)
Key methods:
- add_message(message): Add message to conversation, check compression need
- get_context_for_model(model_key): Return context within model's token limit
- compress_conversation(target_ratio): Apply hybrid compression strategy
- estimate_tokens(text): Estimate token count for text (approximate)
- get_conversation_summary(): Generate summary of compressed messages
Avoid the anti-patterns noted in research: don't ignore context window overflow, and use proven compression strategies rather than ad-hoc truncation. A hedged compression sketch follows this task.
</action>
<verify>python -c "from src.models.context_manager import ContextManager; cm = ContextManager(); print(hasattr(cm, 'add_message') and hasattr(cm, 'compress_conversation'))"</verify>
<done>Context manager handles conversation history with intelligent compression</done>
</task>
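
A sketch of the 70% compression trigger and a crude token estimate (roughly 4 characters per token); the summarisation step is stubbed as simple truncation since real summarisation depends on the model layer, and the import path assumes the structures from Task 1:

```python
# Sketch of ContextManager: 70% trigger, chars/4 token estimate, stubbed hybrid compression.
from src.models.conversation import Conversation, Message  # path assumed from plan 02 task 1


class ContextManager:
    COMPRESSION_TRIGGER = 0.70  # compress at 70% of the window, per CONTEXT.md

    def __init__(self, max_tokens: int = 8192, keep_recent: int = 6):
        self.max_tokens = max_tokens
        self.keep_recent = keep_recent
        self.conversation = Conversation()

    @staticmethod
    def estimate_tokens(text: str) -> int:
        """Very rough estimate: ~4 characters per token for English text."""
        return max(len(text) // 4, 1)

    def add_message(self, message: Message) -> None:
        message.token_count = message.token_count or self.estimate_tokens(message.content)
        self.conversation.messages.append(message)
        if self.conversation.total_tokens > self.max_tokens * self.COMPRESSION_TRIGGER:
            self.compress_conversation()

    def compress_conversation(self) -> None:
        """Hybrid compression: keep recent messages, fold older ones into a summary stub."""
        old = self.conversation.messages[:-self.keep_recent]
        recent = self.conversation.messages[-self.keep_recent:]
        if not old:
            return
        summary_text = "Summary of earlier conversation: " + " ".join(
            m.content[:80] for m in old if m.importance_score >= 0.5
        )
        summary = Message(role="system", content=summary_text,
                          token_count=self.estimate_tokens(summary_text))
        self.conversation.messages = [summary] + recent
```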
</tasks>
<verification>
Verify conversation management:
1. Messages can be added and retrieved from conversation
2. Context compression triggers at correct thresholds
3. Important messages are preserved during compression
4. Token estimation works reasonably well
5. Context adapts to different model window sizes
</verification>
<success_criteria>
Conversation context system operational:
- Message storage and retrieval works correctly
- Context window management prevents overflow
- Intelligent compression preserves important information
- System ready for integration with model switching
</success_criteria>
<output>
After completion, create `.planning/phases/01-model-interface/01-02-SUMMARY.md`
</output>


@@ -0,0 +1,178 @@
---
phase: 01-model-interface
plan: 03
type: execute
wave: 2
depends_on: ["01-01", "01-02"]
files_modified: ["src/models/model_manager.py", "src/mai.py", "src/__main__.py"]
autonomous: true
must_haves:
truths:
- "Model can be selected and loaded based on available resources"
- "System automatically switches models when resources constrained"
- "Conversation context is preserved during model switching"
- "Basic Mai class can generate responses using the model system"
artifacts:
- path: "src/models/model_manager.py"
provides: "Intelligent model selection and switching logic"
min_lines: 80
- path: "src/mai.py"
provides: "Core Mai orchestration class"
min_lines: 40
- path: "src/__main__.py"
provides: "CLI entry point for testing"
min_lines: 20
key_links:
- from: "src/models/model_manager.py"
to: "src/models/lmstudio_adapter.py"
via: "model loading operations"
pattern: "from.*lmstudio_adapter import"
- from: "src/models/model_manager.py"
to: "src/models/resource_monitor.py"
via: "resource checks"
pattern: "from.*resource_monitor import"
- from: "src/models/model_manager.py"
to: "src/models/context_manager.py"
via: "context retrieval"
pattern: "from.*context_manager import"
- from: "src/mai.py"
to: "src/models/model_manager.py"
via: "model management"
pattern: "from.*model_manager import"
---
<objective>
Integrate all components into intelligent model switching system.
Purpose: Combine LM Studio client, resource monitoring, and context management into a cohesive system that can intelligently select and switch models based on resources and conversation needs.
Output: Working ModelManager with intelligent switching and basic Mai orchestration.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-model-interface/01-RESEARCH.md
@.planning/phases/01-model-interface/01-CONTEXT.md
@.planning/codebase/ARCHITECTURE.md
@.planning/codebase/STRUCTURE.md
@.planning/phases/01-model-interface/01-01-SUMMARY.md
@.planning/phases/01-model-interface/01-02-SUMMARY.md
</context>
<tasks>
<task type="auto">
<name>Task 1: Implement ModelManager with intelligent switching</name>
<files>src/models/model_manager.py</files>
<action>
Create ModelManager class that orchestrates all model operations:
1. Load model configuration from config/models.yaml
2. Implement intelligent model selection based on:
- Available system resources (from ResourceMonitor)
- Task complexity and conversation context
- Model capability tiers
3. Add dynamic model switching during conversation (from CONTEXT.md)
4. Implement fallback chains when primary model fails
5. Handle model loading/unloading with proper resource cleanup
6. Support silent switching without user notification
Key methods:
- __init__: Load config, initialize adapters and monitors
- select_best_model(conversation_context): Choose optimal model
- switch_model(target_model_key): Handle model transition
- generate_response(message, conversation): Generate response with auto-switching
- get_current_model_status(): Return current model and resource usage
- preload_model(model_key): Background model loading
Follow CONTEXT.md decisions:
- Silent switching with no user notifications
- Dynamic switching mid-task if model struggles
- Smart context transfer during switches
- Auto-retry on model failures
Use the research patterns for resource-aware selection and implement graceful degradation when no model fits the constraints. A hedged selection sketch follows this task.
</action>
<verify>python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print(hasattr(mm, 'select_best_model') and hasattr(mm, 'generate_response'))"</verify>
<done>ModelManager can intelligently select and switch models based on resources</done>
</task>
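
A hedged sketch of the resource-aware selection at the core of this task: try the largest configured tier whose memory requirement fits the current snapshot, degrading toward smaller tiers otherwise. Config keys mirror config/models.yaml from plan 01; the adapter and monitor interfaces are assumed from the wave 1 plans, and the tier ordering and fallback are illustrative:

```python
# Sketch of resource-aware model selection and silent switching; thresholds are illustrative.
import yaml

from src.models.lmstudio_adapter import LMStudioAdapter
from src.models.resource_monitor import ResourceMonitor


class ModelManager:
    def __init__(self, config_path: str = "config/models.yaml"):
        with open(config_path) as fh:
            self.models = yaml.safe_load(fh)["models"]
        self.adapter = LMStudioAdapter()
        self.monitor = ResourceMonitor()
        self.current_model_key: str | None = None

    def select_best_model(self, conversation_context=None) -> str:
        """Pick the most capable configured model that fits current resources."""
        snapshot = self.monitor.get_current_resources()
        tier_order = {"large": 0, "medium": 1, "small": 2}  # prefer larger tiers first
        for model in sorted(self.models, key=lambda m: tier_order.get(m["category"], 3)):
            if snapshot["available_memory_gb"] >= model["min_memory_gb"]:
                return model["key"]
        return self.models[-1]["key"]  # last resort: final entry in the config

    def switch_model(self, target_model_key: str) -> None:
        """Silently swap models, unloading the previous one to free resources."""
        if target_model_key == self.current_model_key:
            return
        if self.current_model_key:
            self.adapter.unload_model(self.current_model_key)
        self.adapter.load_model(target_model_key)
        self.current_model_key = target_model_key
```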
<task type="auto">
<name>Task 2: Create core Mai orchestration class</name>
<files>src/mai.py</files>
<action>
Create core Mai class following architecture patterns:
1. Initialize ModelManager, ContextManager, and other systems
2. Provide main conversation interface:
- process_message(user_input): Process message and return response
- get_conversation_history(): Retrieve conversation context
- get_system_status(): Return current model and resource status
3. Implement basic conversation flow using ModelManager
4. Add error handling and graceful degradation
5. Support both synchronous and async operation (asyncio)
6. Include basic logging of model switches and resource events
Key methods:
- __init__: Initialize all subsystems
- process_message(message): Main conversation entry point
- get_status(): Return system state for monitoring
- shutdown(): Clean up resources
Follow the architecture: the Mai class is the main coordinator and delegates to specialized subsystems. Keep its logic simple; most complexity should live in ModelManager and ContextManager. A hedged coordination sketch follows this task.
</action>
<verify>python -c "from src.mai import Mai; mai = Mai(); print(hasattr(mai, 'process_message') and hasattr(mai, 'get_status'))"</verify>
<done>Core Mai class orchestrates conversation processing with model switching</done>
</task>
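
A sketch of how the Mai coordinator might delegate to the subsystems above; the `generate_response` and `get_current_model_status` call shapes are assumptions taken from the key-method list in Task 1:

```python
# Sketch of the Mai coordinator: thin orchestration over ModelManager and ContextManager.
from src.models.context_manager import ContextManager
from src.models.conversation import Message
from src.models.model_manager import ModelManager


class Mai:
    def __init__(self):
        self.model_manager = ModelManager()
        self.context_manager = ContextManager()

    def process_message(self, user_input: str) -> str:
        """Main entry point: record the message, let ModelManager answer, record the reply."""
        self.context_manager.add_message(Message(role="user", content=user_input))
        reply = self.model_manager.generate_response(
            user_input, self.context_manager.conversation
        )
        self.context_manager.add_message(Message(role="assistant", content=reply))
        return reply

    def get_status(self) -> dict:
        return self.model_manager.get_current_model_status()

    def shutdown(self) -> None:
        """Release the currently loaded model on exit."""
        if self.model_manager.current_model_key:
            self.model_manager.adapter.unload_model(self.model_manager.current_model_key)
```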
<task type="auto">
<name>Task 3: Create CLI entry point for testing</name>
<files>src/__main__.py</files>
<action>
Create CLI entry point following project structure:
1. Implement __main__.py with command-line interface
2. Add simple interactive chat loop for testing model switching
3. Include status commands to show current model and resources
4. Support basic configuration and model management commands
5. Add proper signal handling for graceful shutdown
6. Include help text and usage examples
Commands:
- chat: Interactive conversation mode
- status: Show current model and system resources
- models: List available models
- switch <model>: Manual model override for testing
Use argparse for command-line parsing and follow standard Python package entry point patterns. A minimal argparse sketch follows this task.
</action>
<verify>python -m mai --help shows usage information and commands</verify>
<done>CLI interface provides working chat and system monitoring commands</done>
</task>
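
A minimal argparse sketch for the subcommands listed above; the `from src.mai import Mai` path and the `python -m mai` invocation both assume the package layout described in STRUCTURE.md, and the attribute access on ModelManager mirrors the earlier sketches:

```python
# Sketch of src/__main__.py: argparse subcommands mapped onto the Mai class.
import argparse
import sys

from src.mai import Mai  # import path assumed from the planned layout


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(prog="mai", description="Mai model-interface CLI")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("chat", help="interactive conversation mode")
    sub.add_parser("status", help="show current model and system resources")
    sub.add_parser("models", help="list available models")
    switch = sub.add_parser("switch", help="manual model override for testing")
    switch.add_argument("model", help="model key to load")

    args = parser.parse_args(argv)
    mai = Mai()
    try:
        if args.command == "chat":
            while True:  # Ctrl-C / Ctrl-D exits via the except clause below
                print(mai.process_message(input("you> ")))
        elif args.command == "status":
            print(mai.get_status())
        elif args.command == "models":
            for key, name, size_gb in mai.model_manager.adapter.list_models():
                print(f"{key}\t{name}\t{size_gb:.1f} GB")
        elif args.command == "switch":
            mai.model_manager.switch_model(args.model)
    except (KeyboardInterrupt, EOFError):
        pass
    finally:
        mai.shutdown()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```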
</tasks>
<verification>
Verify integrated system:
1. ModelManager can select appropriate models based on resources
2. Conversation processing works with automatic model switching
3. CLI interface allows testing chat and monitoring
4. Context is preserved during model switches
5. System gracefully handles model loading failures
6. Resource monitoring triggers appropriate model changes
</verification>
<success_criteria>
Complete model interface system:
- Intelligent model selection based on system resources
- Seamless conversation processing with automatic switching
- Working CLI interface for testing and monitoring
- Foundation ready for integration with memory and personality systems
</success_criteria>
<output>
After completion, create `.planning/phases/01-model-interface/01-03-SUMMARY.md`
</output>