Phase 01-model-interface: Foundation systems - 3 plan(s) in 2 wave(s) - 2 parallel, 1 sequential - Ready for execution
@@ -15,6 +15,11 @@ Mai's development is organized into three major milestones, each delivering dist
- Intelligently switch between models based on task and availability
- Manage model context efficiently (conversation history, system prompt, token budget)

**Plans:** 3 plans in 2 waves

- [ ] 01-01-PLAN.md — LM Studio connectivity and resource monitoring foundation
- [ ] 01-02-PLAN.md — Conversation context management and memory system
- [ ] 01-03-PLAN.md — Intelligent model switching integration

### Phase 2: Safety & Sandboxing

- Implement sandbox execution environment for generated code
- Multi-level security assessment (LOW/MEDIUM/HIGH/BLOCKED)

.planning/phases/01-model-interface/01-01-PLAN.md (new file, 188 lines)
@@ -0,0 +1,188 @@
---
phase: 01-model-interface
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: ["src/models/__init__.py", "src/models/lmstudio_adapter.py", "src/models/resource_monitor.py", "config/models.yaml", "requirements.txt", "pyproject.toml"]
autonomous: true

must_haves:
  truths:
    - "LM Studio client can connect and list available models"
    - "System resources (CPU/RAM/GPU) are monitored in real-time"
    - "Configuration defines models and their resource requirements"
  artifacts:
    - path: "src/models/lmstudio_adapter.py"
      provides: "LM Studio client and model discovery"
      min_lines: 50
    - path: "src/models/resource_monitor.py"
      provides: "System resource monitoring"
      min_lines: 40
    - path: "config/models.yaml"
      provides: "Model definitions and resource profiles"
      contains: "models:"
  key_links:
    - from: "src/models/lmstudio_adapter.py"
      to: "LM Studio server"
      via: "lmstudio-python SDK"
      pattern: "import lmstudio"
    - from: "src/models/resource_monitor.py"
      to: "system APIs"
      via: "psutil library"
      pattern: "import psutil"
---

<objective>
Establish LM Studio connectivity and resource monitoring foundation.

Purpose: Create the core infrastructure for model discovery and system resource tracking, enabling intelligent model selection in later plans.
Output: Working LM Studio client, resource monitor, and model configuration system.
</objective>

<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-model-interface/01-RESEARCH.md
@.planning/phases/01-model-interface/01-CONTEXT.md
@.planning/codebase/ARCHITECTURE.md
@.planning/codebase/STRUCTURE.md
@.planning/codebase/STACK.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Create project foundation and dependencies</name>
<files>requirements.txt, pyproject.toml, src/models/__init__.py</files>
<action>
Create the Python project structure with the required dependencies:

1. Create pyproject.toml with project metadata and the lmstudio, psutil, and pydantic dependencies
2. Create requirements.txt as a fallback for pip install
3. Create src/models/__init__.py with proper imports and version info
4. Create the basic src/ directory structure if it does not exist
5. Set up the Python package structure following PEP 518

Dependencies from research:
- lmstudio >= 1.0.1 (official LM Studio SDK)
- psutil >= 6.1.0 (system resource monitoring)
- pydantic >= 2.10 (configuration validation)
- gpu-tracker >= 5.0.1 (GPU monitoring, optional)

Follow packaging best practices with proper metadata, authors, and optional dependencies. A sketch of the package init module follows this action block.
</action>
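
A minimal sketch of what `src/models/__init__.py` could look like once the adapter and monitor modules from Tasks 2-3 exist. The version string and re-exported names are illustrative, not final:

```python
"""Model subsystem: LM Studio connectivity, resource monitoring, configuration."""

# Illustrative version; keep in sync with pyproject.toml.
__version__ = "0.1.0"

# Re-export the public classes so callers can write `from src.models import ...`.
# These modules are created in Tasks 2 and 3 of this plan.
from .lmstudio_adapter import LMStudioAdapter
from .resource_monitor import ResourceMonitor

__all__ = ["LMStudioAdapter", "ResourceMonitor", "__version__"]
```
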
<verify>pip install -e . succeeds and imports work: python -c "import lmstudio, psutil, pydantic"</verify>
<done>Project structure created with all dependencies installable via pip</done>
</task>

<task type="auto">
<name>Task 2: Implement LM Studio adapter and model discovery</name>
<files>src/models/lmstudio_adapter.py</files>
<action>
Create the LM Studio client following the research patterns:

1. Implement an LMStudioAdapter class using the lmstudio-python SDK
2. Add a context manager for safe client handling: get_client()
3. Implement list_available_models() using lms.list_downloaded_models()
4. Add a load_model() method with error handling and fallback logic
5. Include model validation and capability detection
6. Follow Pattern 1 from research: Model Client Factory

Key methods:
- __init__: Initialize client configuration
- list_models(): Return a list of (model_key, display_name, size_gb)
- load_model(model_key): Load a model with timeout and error handling
- unload_model(model_key): Clean up model resources
- get_model_info(model_key): Get model metadata and context window

Use proper error handling for LM Studio not running, model loading failures, and network issues. A sketch of the adapter skeleton follows this action block.
</action>
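
A minimal sketch of the adapter skeleton, assuming the lmstudio-python calls named above (`lms.list_downloaded_models()`); the attribute names on the returned model objects and the convenience loader call are assumptions to verify against the SDK docs:

```python
"""LM Studio adapter sketch (Plan 01-01, Task 2)."""
import lmstudio as lms


class LMStudioAdapter:
    """Thin wrapper around the lmstudio-python SDK for discovery and loading."""

    def __init__(self, host: str = "localhost:1234") -> None:
        self.host = host  # default LM Studio server address; adjust per deployment

    def list_models(self) -> list[tuple[str, str, float]]:
        """Return (model_key, display_name, size_gb) for downloaded models."""
        try:
            downloaded = lms.list_downloaded_models()
        except Exception as exc:  # LM Studio not running, network issues, etc.
            raise RuntimeError("Could not reach LM Studio server") from exc
        results = []
        for model in downloaded:
            # Attribute names below are assumptions; check the SDK's downloaded-model object.
            results.append((model.model_key, model.display_name, model.size_bytes / 1e9))
        return results

    def load_model(self, model_key: str, timeout_s: float = 120.0):
        """Load (or attach to) a model by key; raises on failure."""
        # Assumption: the SDK's convenience loader accepts a model key and loads on demand.
        return lms.llm(model_key)
```
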
<verify>Unit test passes: python -c "from src.models.lmstudio_adapter import LMStudioAdapter; adapter = LMStudioAdapter(); print(len(adapter.list_models()) >= 0)"</verify>
<done>LM Studio adapter can connect and list available models, and handles errors gracefully</done>
</task>

<task type="auto">
<name>Task 3: Implement system resource monitoring</name>
<files>src/models/resource_monitor.py</files>
<action>
Create a ResourceMonitor class following the research patterns:

1. Monitor CPU usage (psutil.cpu_percent)
2. Track available memory (psutil.virtual_memory)
3. Monitor GPU VRAM if available (gpu-tracker library)
4. Provide a resource snapshot with current usage and availability
5. Add resource trend tracking for load prediction
6. Implement should_switch_model() logic based on thresholds

Key methods:
- get_current_resources(): Return a dict with memory_percent, cpu_percent, available_memory_gb, gpu_vram_gb
- get_resource_trend(window_minutes=5): Return the resource usage trend
- can_load_model(model_size_gb): Check whether enough resources are available
- is_system_overloaded(): Return True if resource usage exceeds thresholds

Follow Pattern 2 from research: Resource-Aware Model Selection.
Set sensible thresholds: 80% memory/CPU usage triggers model downgrading. A sketch of the monitor follows this action block.
</action>
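
A minimal sketch of the monitor using only psutil calls; GPU reporting is left as a placeholder since gpu-tracker is optional, and the 20% headroom factor in can_load_model is an illustrative choice:

```python
"""Resource monitor sketch (Plan 01-01, Task 3)."""
import time
from collections import deque

import psutil

# Thresholds from this plan: 80% memory/CPU usage triggers model downgrading.
MEMORY_THRESHOLD = 80.0
CPU_THRESHOLD = 80.0


class ResourceMonitor:
    def __init__(self, history_size: int = 60) -> None:
        # Rolling history of (timestamp, snapshot) pairs for trend analysis.
        self._history: deque = deque(maxlen=history_size)

    def get_current_resources(self) -> dict:
        vm = psutil.virtual_memory()
        snapshot = {
            "memory_percent": vm.percent,
            "cpu_percent": psutil.cpu_percent(interval=0.1),
            "available_memory_gb": vm.available / 1e9,
            "gpu_vram_gb": None,  # fill in via gpu-tracker when available
        }
        self._history.append((time.time(), snapshot))
        return snapshot

    def can_load_model(self, model_size_gb: float) -> bool:
        # Leave ~20% headroom beyond the model's nominal footprint.
        return self.get_current_resources()["available_memory_gb"] >= model_size_gb * 1.2

    def is_system_overloaded(self) -> bool:
        snap = self.get_current_resources()
        return snap["memory_percent"] > MEMORY_THRESHOLD or snap["cpu_percent"] > CPU_THRESHOLD
```
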
<verify>python -c "from src.models.resource_monitor import ResourceMonitor; monitor = ResourceMonitor(); print('memory_percent' in monitor.get_current_resources())"</verify>
<done>Resource monitor provides real-time system metrics and trend analysis</done>
</task>

<task type="auto">
<name>Task 4: Create model configuration system</name>
<files>config/models.yaml</files>
<action>
Create the model configuration following the research architecture:

1. Define model categories by capability tier (small, medium, large)
2. Specify resource requirements for each model
3. Set context window sizes and token limits
4. Define model switching rules and fallback chains
5. Include model metadata (display names, descriptions)

Example structure:

models:
  - key: "qwen/qwen3-4b-2507"
    display_name: "Qwen3 4B"
    category: "medium"
    min_memory_gb: 4
    min_vram_gb: 2
    context_window: 8192
    capabilities: ["chat", "reasoning"]
  - key: "qwen/qwen2.5-7b-instruct"
    display_name: "Qwen2.5 7B Instruct"
    category: "large"
    min_memory_gb: 8
    min_vram_gb: 4
    context_window: 32768
    capabilities: ["chat", "reasoning", "analysis"]

Include fallback chains for graceful degradation when resources are constrained. A sketch of the config loader follows this action block.
</action>
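
A minimal sketch of loading and validating this file with pydantic. The field names mirror the example structure above; the class and function names are illustrative:

```python
"""Sketch of a config/models.yaml loader with pydantic validation (Plan 01-01, Task 4)."""
from pathlib import Path

import yaml
from pydantic import BaseModel


class ModelProfile(BaseModel):
    key: str
    display_name: str
    category: str                 # "small" | "medium" | "large"
    min_memory_gb: float
    min_vram_gb: float = 0.0
    context_window: int
    capabilities: list[str] = []


class ModelsConfig(BaseModel):
    models: list[ModelProfile]


def load_models_config(path: str = "config/models.yaml") -> ModelsConfig:
    data = yaml.safe_load(Path(path).read_text())
    return ModelsConfig(**data)  # raises a ValidationError on malformed entries
```
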
<verify>YAML validation passes: python -c "import yaml; yaml.safe_load(open('config/models.yaml'))"</verify>
<done>Model configuration defines available models with resource requirements and fallback chains</done>
</task>

</tasks>

<verification>
Verify core connectivity and monitoring:

1. LM Studio adapter can list available models
2. Resource monitor returns valid system metrics
3. Model configuration loads without errors
4. All dependencies import correctly
5. Error handling works when LM Studio is not running
</verification>

<success_criteria>
Core infrastructure ready for model management:

- LM Studio client connects and discovers models
- System resources are monitored in real-time
- Model configuration defines resource requirements
- Foundation supports intelligent model switching
</success_criteria>

<output>
After completion, create `.planning/phases/01-model-interface/01-01-SUMMARY.md`
</output>
.planning/phases/01-model-interface/01-02-PLAN.md (new file, 126 lines)
@@ -0,0 +1,126 @@
---
phase: 01-model-interface
plan: 02
type: execute
wave: 1
depends_on: []
files_modified: ["src/models/context_manager.py", "src/models/conversation.py"]
autonomous: true

must_haves:
  truths:
    - "Conversation history is stored and retrieved correctly"
    - "Context window is managed to prevent overflow"
    - "Old messages are compressed when approaching limits"
  artifacts:
    - path: "src/models/context_manager.py"
      provides: "Conversation context and memory management"
      min_lines: 60
    - path: "src/models/conversation.py"
      provides: "Message data structures and types"
      min_lines: 30
  key_links:
    - from: "src/models/context_manager.py"
      to: "src/models/conversation.py"
      via: "import conversation types"
      pattern: "from.*conversation import"
    - from: "src/models/context_manager.py"
      to: "future model manager"
      via: "context passing interface"
      pattern: "def get_context_for_model"
---

<objective>
Implement conversation context management and memory system.

Purpose: Create the foundation for managing conversation history, context windows, and memory compression before model switching logic is added.
Output: Working context manager with message storage, compression, and token budget management.
</objective>

<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-model-interface/01-RESEARCH.md
@.planning/phases/01-model-interface/01-CONTEXT.md
@.planning/codebase/ARCHITECTURE.md
@.planning/codebase/STRUCTURE.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Create conversation data structures</name>
<files>src/models/conversation.py</files>
<action>
Create the conversation data models following the research architecture:

1. Define a Message class with role, content, timestamp, and metadata
2. Define a Conversation class to manage the message sequence
3. Define a ContextWindow class for token budget tracking
4. Include message importance scoring for compression decisions
5. Add Pydantic models for validation and serialization
6. Support message types: user, assistant, system, tool_call

Key classes:
- Message: role, content, timestamp, token_count, importance_score
- Conversation: messages list, metadata, total_tokens
- ContextBudget: max_tokens, used_tokens, available_tokens
- MessageMetadata: source, context, priority flags

Use dataclasses or Pydantic BaseModel for type safety and validation. Include proper type hints throughout. A sketch of the data model follows this action block.
</action>
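
A minimal Pydantic sketch of the core classes. The field set mirrors the key classes listed above; defaults and the derived-total properties are illustrative choices:

```python
"""Conversation data model sketch (Plan 01-02, Task 1)."""
from datetime import datetime, timezone
from typing import Literal

from pydantic import BaseModel, Field

Role = Literal["user", "assistant", "system", "tool_call"]


class Message(BaseModel):
    role: Role
    content: str
    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    token_count: int = 0
    importance_score: float = 0.0  # used later to decide what survives compression


class ContextBudget(BaseModel):
    max_tokens: int
    used_tokens: int = 0

    @property
    def available_tokens(self) -> int:
        return self.max_tokens - self.used_tokens


class Conversation(BaseModel):
    messages: list[Message] = []
    metadata: dict = {}

    @property
    def total_tokens(self) -> int:
        return sum(m.token_count for m in self.messages)
```
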
<verify>python -c "from src.models.conversation import Message, Conversation; msg = Message(role='user', content='test'); print(msg.role)"</verify>
<done>Conversation data structures support message creation and management</done>
</task>

<task type="auto">
<name>Task 2: Implement context manager with compression</name>
<files>src/models/context_manager.py</files>
<action>
Create a ContextManager class following the research patterns:

1. Implement sliding-window context management
2. Add hybrid compression: summarize old messages, preserve recent ones
3. Trigger compression at 70% of the context window (from CONTEXT.md)
4. Prioritize user instructions and explicit requests during compression
5. Implement semantic importance scoring for message retention
6. Support different model context sizes (adaptive based on the model)

Key methods:
- add_message(message): Add a message to the conversation and check whether compression is needed
- get_context_for_model(model_key): Return context within the model's token limit
- compress_conversation(target_ratio): Apply the hybrid compression strategy
- estimate_tokens(text): Estimate the token count for text (approximate)
- get_conversation_summary(): Generate a summary of compressed messages

Follow the research anti-patterns guidance: don't ignore context window overflow, and use proven compression algorithms. A sketch of the manager follows this action block.
</action>
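
A minimal sketch of the manager's shape, assuming the Conversation/Message classes from Task 1. The 4-characters-per-token estimate and the drop-oldest placeholder inside compress_conversation are simplifications; the real hybrid summarization strategy replaces them:

```python
"""Context manager sketch (Plan 01-02, Task 2); compression trigger follows the 70% rule above."""
from .conversation import Conversation, Message

COMPRESSION_TRIGGER = 0.70  # start compressing at 70% of the context window


class ContextManager:
    def __init__(self, max_tokens: int = 8192) -> None:
        self.max_tokens = max_tokens
        self.conversation = Conversation()

    def estimate_tokens(self, text: str) -> int:
        # Rough heuristic (~4 characters per token); swap in a real tokenizer later.
        return max(1, len(text) // 4)

    def add_message(self, message: Message) -> None:
        message.token_count = message.token_count or self.estimate_tokens(message.content)
        self.conversation.messages.append(message)
        if self.conversation.total_tokens > self.max_tokens * COMPRESSION_TRIGGER:
            self.compress_conversation(target_ratio=0.5)

    def compress_conversation(self, target_ratio: float) -> None:
        # Placeholder for the hybrid strategy: summarize old messages, keep recent and
        # high-importance ones. Here we only drop the oldest low-importance messages.
        msgs = self.conversation.messages
        budget = int(self.max_tokens * target_ratio)
        while msgs and self.conversation.total_tokens > budget:
            idx = next((i for i, m in enumerate(msgs) if m.importance_score < 0.5), 0)
            msgs.pop(idx)

    def get_context_for_model(self, model_key: str, model_max_tokens: int) -> list[Message]:
        # Return the most recent messages that fit the target model's window.
        out, used = [], 0
        for m in reversed(self.conversation.messages):
            if used + m.token_count > model_max_tokens:
                break
            out.append(m)
            used += m.token_count
        return list(reversed(out))
```
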
<verify>python -c "from src.models.context_manager import ContextManager; cm = ContextManager(); print(hasattr(cm, 'add_message') and hasattr(cm, 'compress_conversation'))"</verify>
<done>Context manager handles conversation history with intelligent compression</done>
</task>

</tasks>

<verification>
Verify conversation management:

1. Messages can be added to and retrieved from the conversation
2. Context compression triggers at the correct thresholds
3. Important messages are preserved during compression
4. Token estimation works reasonably well
5. Context adapts to different model window sizes
</verification>

<success_criteria>
Conversation context system operational:

- Message storage and retrieval works correctly
- Context window management prevents overflow
- Intelligent compression preserves important information
- System ready for integration with model switching
</success_criteria>

<output>
After completion, create `.planning/phases/01-model-interface/01-02-SUMMARY.md`
</output>
.planning/phases/01-model-interface/01-03-PLAN.md (new file, 178 lines)
@@ -0,0 +1,178 @@
---
phase: 01-model-interface
plan: 03
type: execute
wave: 2
depends_on: ["01-01", "01-02"]
files_modified: ["src/models/model_manager.py", "src/mai.py", "src/__main__.py"]
autonomous: true

must_haves:
  truths:
    - "Model can be selected and loaded based on available resources"
    - "System automatically switches models when resources constrained"
    - "Conversation context is preserved during model switching"
    - "Basic Mai class can generate responses using the model system"
  artifacts:
    - path: "src/models/model_manager.py"
      provides: "Intelligent model selection and switching logic"
      min_lines: 80
    - path: "src/mai.py"
      provides: "Core Mai orchestration class"
      min_lines: 40
    - path: "src/__main__.py"
      provides: "CLI entry point for testing"
      min_lines: 20
  key_links:
    - from: "src/models/model_manager.py"
      to: "src/models/lmstudio_adapter.py"
      via: "model loading operations"
      pattern: "from.*lmstudio_adapter import"
    - from: "src/models/model_manager.py"
      to: "src/models/resource_monitor.py"
      via: "resource checks"
      pattern: "from.*resource_monitor import"
    - from: "src/models/model_manager.py"
      to: "src/models/context_manager.py"
      via: "context retrieval"
      pattern: "from.*context_manager import"
    - from: "src/mai.py"
      to: "src/models/model_manager.py"
      via: "model management"
      pattern: "from.*model_manager import"
---

<objective>
Integrate all components into an intelligent model switching system.

Purpose: Combine the LM Studio client, resource monitoring, and context management into a cohesive system that can intelligently select and switch models based on resources and conversation needs.
Output: Working ModelManager with intelligent switching and basic Mai orchestration.
</objective>

<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/01-model-interface/01-RESEARCH.md
@.planning/phases/01-model-interface/01-CONTEXT.md
@.planning/codebase/ARCHITECTURE.md
@.planning/codebase/STRUCTURE.md
@.planning/phases/01-model-interface/01-01-SUMMARY.md
@.planning/phases/01-model-interface/01-02-SUMMARY.md
</context>

<tasks>

<task type="auto">
<name>Task 1: Implement ModelManager with intelligent switching</name>
<files>src/models/model_manager.py</files>
<action>
Create a ModelManager class that orchestrates all model operations:

1. Load the model configuration from config/models.yaml
2. Implement intelligent model selection based on:
   - Available system resources (from ResourceMonitor)
   - Task complexity and conversation context
   - Model capability tiers
3. Add dynamic model switching during a conversation (from CONTEXT.md)
4. Implement fallback chains when the primary model fails
5. Handle model loading/unloading with proper resource cleanup
6. Support silent switching without user notification

Key methods:
- __init__: Load config, initialize adapters and monitors
- select_best_model(conversation_context): Choose the optimal model
- switch_model(target_model_key): Handle the model transition
- generate_response(message, conversation): Generate a response with auto-switching
- get_current_model_status(): Return the current model and resource usage
- preload_model(model_key): Background model loading

Follow the CONTEXT.md decisions:
- Silent switching with no user notifications
- Dynamic switching mid-task if the model struggles
- Smart context transfer during switches
- Auto-retry on model failures

Use the research patterns for resource-aware selection and implement graceful degradation when no model fits the constraints. A sketch of the selection logic follows this action block.
</action>
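
A minimal sketch of the resource-aware selection and switching core, assuming the adapter, monitor, and config loader from Plan 01-01 (the `config_loader` module name is a hypothetical placeholder for wherever that loader ends up):

```python
"""ModelManager selection sketch (Plan 01-03, Task 1)."""
from typing import Optional

from .config_loader import load_models_config  # hypothetical module from the Plan 01-01 config sketch
from .lmstudio_adapter import LMStudioAdapter
from .resource_monitor import ResourceMonitor


class ModelManager:
    def __init__(self, config_path: str = "config/models.yaml") -> None:
        self.adapter = LMStudioAdapter()
        self.monitor = ResourceMonitor()
        self.config = load_models_config(config_path)
        self.current_model_key: Optional[str] = None

    def select_best_model(self, conversation_context=None):
        """Pick the largest model whose requirements fit current resources."""
        snap = self.monitor.get_current_resources()
        candidates = [
            m for m in self.config.models
            if m.min_memory_gb <= snap["available_memory_gb"]
        ]
        if not candidates:
            # Graceful degradation: fall back to the smallest configured model.
            return min(self.config.models, key=lambda m: m.min_memory_gb)
        return max(candidates, key=lambda m: m.min_memory_gb)

    def switch_model(self, target_model_key: str) -> None:
        """Silently swap models, unloading the previous one first."""
        if self.current_model_key == target_model_key:
            return
        if self.current_model_key:
            self.adapter.unload_model(self.current_model_key)
        self.adapter.load_model(target_model_key)
        self.current_model_key = target_model_key
```
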
<verify>python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print(hasattr(mm, 'select_best_model') and hasattr(mm, 'generate_response'))"</verify>
<done>ModelManager can intelligently select and switch models based on resources</done>
</task>

<task type="auto">
<name>Task 2: Create core Mai orchestration class</name>
<files>src/mai.py</files>
<action>
Create the core Mai class following the architecture patterns:

1. Initialize the ModelManager, ContextManager, and other subsystems
2. Provide the main conversation interface:
   - process_message(user_input): Process a message and return the response
   - get_conversation_history(): Retrieve the conversation context
   - get_system_status(): Return the current model and resource status
3. Implement the basic conversation flow using ModelManager
4. Add error handling and graceful degradation
5. Support both synchronous and async operation (asyncio)
6. Include basic logging of model switches and resource events

Key methods:
- __init__: Initialize all subsystems
- process_message(message): Main conversation entry point
- get_status(): Return system state for monitoring
- shutdown(): Clean up resources

Follow the architecture: the Mai class is the main coordinator and delegates to specialized subsystems. Keep its logic simple; most complexity should live in ModelManager and ContextManager. A sketch of the class follows this action block.
</action>
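
A minimal sketch of the coordinator showing pure delegation (import paths assume src/ is a package and will depend on the final packaging; generate_response is the ModelManager method planned in Task 1):

```python
"""Mai coordinator sketch (Plan 01-03, Task 2)."""
import logging

from .models.context_manager import ContextManager
from .models.conversation import Message
from .models.model_manager import ModelManager

log = logging.getLogger("mai")


class Mai:
    def __init__(self) -> None:
        self.models = ModelManager()
        self.context = ContextManager()

    def process_message(self, user_input: str) -> str:
        """Main entry point: record the message, pick a model, generate a reply."""
        self.context.add_message(Message(role="user", content=user_input))
        reply = self.models.generate_response(user_input, self.context.conversation)
        self.context.add_message(Message(role="assistant", content=reply))
        return reply

    def get_status(self) -> dict:
        return {
            "model": self.models.current_model_key,
            "resources": self.models.monitor.get_current_resources(),
        }

    def shutdown(self) -> None:
        log.info("Shutting down Mai")
```
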
<verify>python -c "from src.mai import Mai; mai = Mai(); print(hasattr(mai, 'process_message') and hasattr(mai, 'get_status'))"</verify>
<done>Core Mai class orchestrates conversation processing with model switching</done>
</task>

<task type="auto">
<name>Task 3: Create CLI entry point for testing</name>
<files>src/__main__.py</files>
<action>
Create the CLI entry point following the project structure:

1. Implement __main__.py with a command-line interface
2. Add a simple interactive chat loop for testing model switching
3. Include status commands to show the current model and resources
4. Support basic configuration and model management commands
5. Add proper signal handling for graceful shutdown
6. Include help text and usage examples

Commands:
- chat: Interactive conversation mode
- status: Show the current model and system resources
- models: List available models
- switch <model>: Manual model override for testing

Use argparse for command-line parsing and follow standard Python package entry-point patterns. A sketch of the entry point follows this action block.
</action>
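
A minimal argparse sketch with the four commands above; the `mai` program name assumes a matching entry point or package alias in pyproject.toml, and signal handling is reduced to catching Ctrl-C in the chat loop:

```python
"""CLI entry point sketch for src/__main__.py (Plan 01-03, Task 3)."""
import argparse
import sys

from .mai import Mai


def main() -> int:
    parser = argparse.ArgumentParser(prog="mai", description="Mai local assistant")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("chat", help="Interactive conversation mode")
    sub.add_parser("status", help="Show current model and system resources")
    sub.add_parser("models", help="List available models")
    switch = sub.add_parser("switch", help="Manual model override for testing")
    switch.add_argument("model", help="Model key to switch to")

    args = parser.parse_args()
    mai = Mai()

    if args.command == "status":
        print(mai.get_status())
    elif args.command == "models":
        for key, name, size_gb in mai.models.adapter.list_models():
            print(f"{key}\t{name}\t{size_gb:.1f} GB")
    elif args.command == "switch":
        mai.models.switch_model(args.model)
    else:  # chat
        try:
            while True:
                print(mai.process_message(input("you> ")))
        except (EOFError, KeyboardInterrupt):
            pass  # graceful shutdown on Ctrl-D / Ctrl-C
    return 0


if __name__ == "__main__":
    sys.exit(main())
```
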
<verify>python -m mai --help shows usage information and commands</verify>
<done>CLI interface provides working chat and system monitoring commands</done>
</task>

</tasks>

<verification>
Verify the integrated system:

1. ModelManager can select appropriate models based on resources
2. Conversation processing works with automatic model switching
3. CLI interface allows testing chat and monitoring
4. Context is preserved during model switches
5. System gracefully handles model loading failures
6. Resource monitoring triggers appropriate model changes
</verification>

<success_criteria>
Complete model interface system:

- Intelligent model selection based on system resources
- Seamless conversation processing with automatic switching
- Working CLI interface for testing and monitoring
- Foundation ready for integration with memory and personality systems
</success_criteria>

<output>
After completion, create `.planning/phases/01-model-interface/01-03-SUMMARY.md`
</output>