diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 47f5c0b..493ce70 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -15,6 +15,11 @@ Mai's development is organized into three major milestones, each delivering dist
 - Intelligently switch between models based on task and availability
 - Manage model context efficiently (conversation history, system prompt, token budget)
 
+**Plans:** 3 plans in 2 waves
+- [ ] 01-01-PLAN.md — LM Studio connectivity and resource monitoring foundation
+- [ ] 01-02-PLAN.md — Conversation context management and memory system
+- [ ] 01-03-PLAN.md — Intelligent model switching integration
+
 ### Phase 2: Safety & Sandboxing
 - Implement sandbox execution environment for generated code
 - Multi-level security assessment (LOW/MEDIUM/HIGH/BLOCKED)
diff --git a/.planning/phases/01-model-interface/01-01-PLAN.md b/.planning/phases/01-model-interface/01-01-PLAN.md
new file mode 100644
index 0000000..069cdff
--- /dev/null
+++ b/.planning/phases/01-model-interface/01-01-PLAN.md
@@ -0,0 +1,188 @@
+---
+phase: 01-model-interface
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified: ["src/models/__init__.py", "src/models/lmstudio_adapter.py", "src/models/resource_monitor.py", "config/models.yaml", "requirements.txt", "pyproject.toml"]
+autonomous: true
+
+must_haves:
+  truths:
+    - "LM Studio client can connect and list available models"
+    - "System resources (CPU/RAM/GPU) are monitored in real-time"
+    - "Configuration defines models and their resource requirements"
+  artifacts:
+    - path: "src/models/lmstudio_adapter.py"
+      provides: "LM Studio client and model discovery"
+      min_lines: 50
+    - path: "src/models/resource_monitor.py"
+      provides: "System resource monitoring"
+      min_lines: 40
+    - path: "config/models.yaml"
+      provides: "Model definitions and resource profiles"
+      contains: "models:"
+  key_links:
+    - from: "src/models/lmstudio_adapter.py"
+      to: "LM Studio server"
+      via: "lmstudio-python SDK"
+      pattern: "import lmstudio"
+    - from: "src/models/resource_monitor.py"
+      to: "system APIs"
+      via: "psutil library"
+      pattern: "import psutil"
+---
+
+Establish LM Studio connectivity and resource monitoring foundation.
+
+Purpose: Create the core infrastructure for model discovery and system resource tracking, enabling intelligent model selection in later plans.
+Output: Working LM Studio client, resource monitor, and model configuration system.
+
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/01-model-interface/01-RESEARCH.md
+@.planning/phases/01-model-interface/01-CONTEXT.md
+@.planning/codebase/ARCHITECTURE.md
+@.planning/codebase/STRUCTURE.md
+@.planning/codebase/STACK.md
+
+Task 1: Create project foundation and dependencies
+Files: requirements.txt, pyproject.toml, src/models/__init__.py
+
+Create the Python project structure with the required dependencies:
+1. Create pyproject.toml with project metadata and the lmstudio, psutil, and pydantic dependencies
+2. Create requirements.txt as a fallback for pip install
+3. Create src/models/__init__.py with proper imports and version info
+4. Create the basic src/ directory structure if it does not exist
+5. Set up the Python package structure following PEP 518
+
+Dependencies from research:
+- lmstudio >= 1.0.1 (official LM Studio SDK)
+- psutil >= 6.1.0 (system resource monitoring)
+- pydantic >= 2.10 (configuration validation)
+- gpu-tracker >= 5.0.1 (GPU monitoring, optional)
+
+Follow packaging best practices with proper metadata, authors, and optional dependencies.
+
+Verify: pip install -e . succeeds and imports work: python -c "import lmstudio, psutil, pydantic"
+Done: Project structure created with all dependencies installable via pip
+
+Task 2: Implement LM Studio adapter and model discovery
+Files: src/models/lmstudio_adapter.py
+
+Create the LM Studio client following research patterns:
+1. Implement an LMStudioAdapter class using the lmstudio-python SDK
+2. Add a context manager for safe client handling: get_client()
+3. Implement list_available_models() using lms.list_downloaded_models()
+4. Add a load_model() method with error handling and fallback logic
+5. Include model validation and capability detection
+6. Follow Pattern 1 from research: Model Client Factory
+
+Key methods:
+- __init__: Initialize client configuration
+- list_models(): Return a list of (model_key, display_name, size_gb)
+- load_model(model_key): Load a model with timeout and error handling
+- unload_model(model_key): Clean up model resources
+- get_model_info(model_key): Get model metadata and context window
+
+Use proper error handling for LM Studio not running, model loading failures, and network issues.
+
+Verify: Unit test passes: python -c "from src.models.lmstudio_adapter import LMStudioAdapter; adapter = LMStudioAdapter(); print(len(adapter.list_models()) >= 0)"
+Done: LM Studio adapter can connect and list available models, and handles errors gracefully
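+
+As a reference for the adapter shape, a minimal sketch follows. It assumes the lmstudio-python SDK named above; the attribute names on downloaded-model entries (model_key, display_name, size_bytes) are assumptions to verify against the installed SDK version, not confirmed API.
+
+```python
+# Hedged sketch of LMStudioAdapter; model-entry attribute names are
+# assumptions to check against the lmstudio-python documentation.
+import lmstudio as lms
+
+
+class LMStudioAdapter:
+    """Thin wrapper around the LM Studio SDK for discovery and loading."""
+
+    def list_models(self) -> list[tuple[str, str, float]]:
+        """Return (model_key, display_name, size_gb) for downloaded LLMs."""
+        results = []
+        for m in lms.list_downloaded_models("llm"):
+            size_gb = getattr(m, "size_bytes", 0) / 1e9  # assumed attribute
+            name = getattr(m, "display_name", m.model_key)  # assumed attribute
+            results.append((m.model_key, name, size_gb))
+        return results
+
+    def load_model(self, model_key: str):
+        """Load (or attach to) a model; fail cleanly if LM Studio is down."""
+        try:
+            return lms.llm(model_key)
+        except Exception as exc:  # narrow to SDK exception types in practice
+            raise RuntimeError(f"Could not load {model_key!r}: {exc}") from exc
+```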
+
+Task 3: Implement system resource monitoring
+Files: src/models/resource_monitor.py
+
+Create the ResourceMonitor class following research patterns:
+1. Monitor CPU usage (psutil.cpu_percent)
+2. Track available memory (psutil.virtual_memory)
+3. Monitor GPU VRAM if available (gpu-tracker library)
+4. Provide a resource snapshot with current usage and availability
+5. Add resource trend tracking for load prediction
+6. Implement should_switch_model() logic based on thresholds
+
+Key methods:
+- get_current_resources(): Return a dict with memory_percent, cpu_percent, available_memory_gb, gpu_vram_gb
+- get_resource_trend(window_minutes=5): Return the resource usage trend
+- can_load_model(model_size_gb): Check whether enough resources are available
+- is_system_overloaded(): Return True if resource usage exceeds thresholds
+
+Follow Pattern 2 from research: Resource-Aware Model Selection.
+Set sensible thresholds: 80% memory/CPU usage triggers model downgrading.
+
+Verify: python -c "from src.models.resource_monitor import ResourceMonitor; monitor = ResourceMonitor(); print('memory_percent' in monitor.get_current_resources())"
+Done: Resource monitor provides real-time system metrics and trend analysis
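+
+A minimal sketch of the snapshot and threshold logic, using documented psutil calls; the 80% thresholds mirror this task, while the 20% headroom margin is an assumption and GPU metrics are left out for brevity.
+
+```python
+# Sketch of ResourceMonitor built on psutil; GPU tracking omitted here.
+import psutil
+
+
+class ResourceMonitor:
+    MEMORY_THRESHOLD = 80.0  # percent; above this, downgrade models
+    CPU_THRESHOLD = 80.0
+
+    def get_current_resources(self) -> dict:
+        vm = psutil.virtual_memory()
+        return {
+            "memory_percent": vm.percent,
+            "available_memory_gb": vm.available / 1e9,
+            "cpu_percent": psutil.cpu_percent(interval=0.1),
+        }
+
+    def can_load_model(self, model_size_gb: float) -> bool:
+        # Require the model plus a 20% safety margin to fit in free RAM.
+        return psutil.virtual_memory().available / 1e9 >= model_size_gb * 1.2
+
+    def is_system_overloaded(self) -> bool:
+        snapshot = self.get_current_resources()
+        return (snapshot["memory_percent"] > self.MEMORY_THRESHOLD
+                or snapshot["cpu_percent"] > self.CPU_THRESHOLD)
+```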
+
+Task 4: Create model configuration system
+Files: config/models.yaml
+
+Create the model configuration following the research architecture:
+1. Define model categories by capability tier (small, medium, large)
+2. Specify resource requirements for each model
+3. Set context window sizes and token limits
+4. Define model switching rules and fallback chains
+5. Include model metadata (display names, descriptions)
+
+Example structure:
+models:
+  - key: "qwen/qwen3-4b-2507"
+    display_name: "Qwen3 4B"
+    category: "medium"
+    min_memory_gb: 4
+    min_vram_gb: 2
+    context_window: 8192
+    capabilities: ["chat", "reasoning"]
+  - key: "qwen/qwen2.5-7b-instruct"
+    display_name: "Qwen2.5 7B Instruct"
+    category: "large"
+    min_memory_gb: 8
+    min_vram_gb: 4
+    context_window: 32768
+    capabilities: ["chat", "reasoning", "analysis"]
+
+Include fallback chains for graceful degradation when resources are constrained.
+
+Verify: YAML validation passes: python -c "import yaml; yaml.safe_load(open('config/models.yaml'))"
+Done: Model configuration defines available models with resource requirements and fallback chains
+
+Verify core connectivity and monitoring:
+1. LM Studio adapter can list available models
+2. Resource monitor returns valid system metrics
+3. Model configuration loads without errors
+4. All dependencies import correctly
+5. Error handling works when LM Studio is not running
+
+Core infrastructure ready for model management:
+- LM Studio client connects and discovers models
+- System resources are monitored in real-time
+- Model configuration defines resource requirements
+- Foundation supports intelligent model switching
+
+After completion, create `.planning/phases/01-model-interface/01-01-SUMMARY.md`
\ No newline at end of file
diff --git a/.planning/phases/01-model-interface/01-02-PLAN.md b/.planning/phases/01-model-interface/01-02-PLAN.md
new file mode 100644
index 0000000..3f4bbfb
--- /dev/null
+++ b/.planning/phases/01-model-interface/01-02-PLAN.md
@@ -0,0 +1,126 @@
+---
+phase: 01-model-interface
+plan: 02
+type: execute
+wave: 1
+depends_on: []
+files_modified: ["src/models/context_manager.py", "src/models/conversation.py"]
+autonomous: true
+
+must_haves:
+  truths:
+    - "Conversation history is stored and retrieved correctly"
+    - "Context window is managed to prevent overflow"
+    - "Old messages are compressed when approaching limits"
+  artifacts:
+    - path: "src/models/context_manager.py"
+      provides: "Conversation context and memory management"
+      min_lines: 60
+    - path: "src/models/conversation.py"
+      provides: "Message data structures and types"
+      min_lines: 30
+  key_links:
+    - from: "src/models/context_manager.py"
+      to: "src/models/conversation.py"
+      via: "import conversation types"
+      pattern: "from.*conversation import"
+    - from: "src/models/context_manager.py"
+      to: "future model manager"
+      via: "context passing interface"
+      pattern: "def get_context_for_model"
+---
+
+Implement conversation context management and memory system.
+
+Purpose: Create the foundation for managing conversation history, context windows, and memory compression before model switching logic is added.
+Output: Working context manager with message storage, compression, and token budget management.
+
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/01-model-interface/01-RESEARCH.md
+@.planning/phases/01-model-interface/01-CONTEXT.md
+@.planning/codebase/ARCHITECTURE.md
+@.planning/codebase/STRUCTURE.md
+
+Task 1: Create conversation data structures
+Files: src/models/conversation.py
+
+Create conversation data models following the research architecture:
+1. Define a Message class with role, content, timestamp, and metadata
+2. Define a Conversation class to manage the message sequence
+3. Define a ContextBudget class for token budget tracking
+4. Include message importance scoring for compression decisions
+5. Add Pydantic models for validation and serialization
+6. Support message types: user, assistant, system, tool_call
+
+Key classes:
+- Message: role, content, timestamp, token_count, importance_score
+- Conversation: messages list, metadata, total_tokens
+- ContextBudget: max_tokens, used_tokens, available_tokens
+- MessageMetadata: source, context, priority flags
+
+Use dataclasses or Pydantic BaseModel for type safety and validation. Include proper type hints throughout.
+
+Verify: python -c "from src.models.conversation import Message, Conversation; msg = Message(role='user', content='test'); print(msg.role)"
+Done: Conversation data structures support message creation and management
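+
+A sketch of the core types with Pydantic v2, consistent with the verify command above; the field defaults are illustrative assumptions.
+
+```python
+# Sketch of the conversation types; defaults are illustrative assumptions.
+from datetime import datetime, timezone
+from typing import Literal
+
+from pydantic import BaseModel, Field
+
+
+class Message(BaseModel):
+    role: Literal["user", "assistant", "system", "tool_call"]
+    content: str
+    timestamp: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
+    token_count: int = 0
+    importance_score: float = 0.0
+
+
+class Conversation(BaseModel):
+    messages: list[Message] = Field(default_factory=list)
+    metadata: dict = Field(default_factory=dict)
+
+    @property
+    def total_tokens(self) -> int:
+        return sum(m.token_count for m in self.messages)
+```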
+
+Task 2: Implement context manager with compression
+Files: src/models/context_manager.py
+
+Create the ContextManager class following research patterns:
+1. Implement sliding-window context management
+2. Add hybrid compression: summarize old messages, preserve recent ones
+3. Trigger compression at 70% of the context window (from CONTEXT.md)
+4. Prioritize user instructions and explicit requests during compression
+5. Implement semantic importance scoring for message retention
+6. Support different model context sizes (adaptive based on the model)
+
+Key methods:
+- add_message(message): Add a message to the conversation and check whether compression is needed
+- get_context_for_model(model_key): Return context within the model's token limit
+- compress_conversation(target_ratio): Apply the hybrid compression strategy
+- estimate_tokens(text): Estimate the token count for text (approximate)
+- get_conversation_summary(): Generate a summary of compressed messages
+
+Follow the research anti-patterns: don't ignore context window overflow; use proven compression algorithms.
+
+Verify: python -c "from src.models.context_manager import ContextManager; cm = ContextManager(); print(hasattr(cm, 'add_message') and hasattr(cm, 'compress_conversation'))"
+Done: Context manager handles conversation history with intelligent compression
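+
+A sketch of the 70% trigger and token estimate, assuming the Message and Conversation types sketched under Task 1; the four-characters-per-token estimate is a rough heuristic, not a tokenizer, so swap in the model's real tokenizer where accuracy matters.
+
+```python
+# Sketch of the compression trigger; summarization itself is elided.
+from src.models.conversation import Conversation, Message
+
+COMPRESSION_TRIGGER = 0.70  # from CONTEXT.md: compress at 70% of the window
+
+
+def estimate_tokens(text: str) -> int:
+    # Rough heuristic: about 4 characters per token for English text.
+    return max(1, len(text) // 4)
+
+
+class ContextManager:
+    def __init__(self, max_tokens: int = 8192):
+        self.max_tokens = max_tokens
+        self.conversation = Conversation()
+
+    def add_message(self, message: Message) -> None:
+        message.token_count = estimate_tokens(message.content)
+        self.conversation.messages.append(message)
+        if self.conversation.total_tokens > self.max_tokens * COMPRESSION_TRIGGER:
+            self.compress_conversation(target_ratio=0.5)
+
+    def compress_conversation(self, target_ratio: float) -> None:
+        # Hybrid strategy: keep recent messages verbatim and fold older,
+        # low-importance ones into a single summary message.
+        ...
+```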
+
+Verify conversation management:
+1. Messages can be added to and retrieved from the conversation
+2. Context compression triggers at the correct thresholds
+3. Important messages are preserved during compression
+4. Token estimation works reasonably well
+5. Context adapts to different model window sizes
+
+Conversation context system operational:
+- Message storage and retrieval work correctly
+- Context window management prevents overflow
+- Intelligent compression preserves important information
+- System ready for integration with model switching
+
+After completion, create `.planning/phases/01-model-interface/01-02-SUMMARY.md`
\ No newline at end of file
diff --git a/.planning/phases/01-model-interface/01-03-PLAN.md b/.planning/phases/01-model-interface/01-03-PLAN.md
new file mode 100644
index 0000000..49abb8a
--- /dev/null
+++ b/.planning/phases/01-model-interface/01-03-PLAN.md
@@ -0,0 +1,178 @@
+---
+phase: 01-model-interface
+plan: 03
+type: execute
+wave: 2
+depends_on: ["01-01", "01-02"]
+files_modified: ["src/models/model_manager.py", "src/mai.py", "src/__main__.py"]
+autonomous: true
+
+must_haves:
+  truths:
+    - "Model can be selected and loaded based on available resources"
+    - "System automatically switches models when resources constrained"
+    - "Conversation context is preserved during model switching"
+    - "Basic Mai class can generate responses using the model system"
+  artifacts:
+    - path: "src/models/model_manager.py"
+      provides: "Intelligent model selection and switching logic"
+      min_lines: 80
+    - path: "src/mai.py"
+      provides: "Core Mai orchestration class"
+      min_lines: 40
+    - path: "src/__main__.py"
+      provides: "CLI entry point for testing"
+      min_lines: 20
+  key_links:
+    - from: "src/models/model_manager.py"
+      to: "src/models/lmstudio_adapter.py"
+      via: "model loading operations"
+      pattern: "from.*lmstudio_adapter import"
+    - from: "src/models/model_manager.py"
+      to: "src/models/resource_monitor.py"
+      via: "resource checks"
+      pattern: "from.*resource_monitor import"
+    - from: "src/models/model_manager.py"
+      to: "src/models/context_manager.py"
+      via: "context retrieval"
+      pattern: "from.*context_manager import"
+    - from: "src/mai.py"
+      to: "src/models/model_manager.py"
+      via: "model management"
+      pattern: "from.*model_manager import"
+---
+
+Integrate all components into an intelligent model switching system.
+
+Purpose: Combine the LM Studio client, resource monitoring, and context management into a cohesive system that can intelligently select and switch models based on resources and conversation needs.
+Output: Working ModelManager with intelligent switching and basic Mai orchestration.
+
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/phases/01-model-interface/01-RESEARCH.md
+@.planning/phases/01-model-interface/01-CONTEXT.md
+@.planning/codebase/ARCHITECTURE.md
+@.planning/codebase/STRUCTURE.md
+@.planning/phases/01-model-interface/01-01-SUMMARY.md
+@.planning/phases/01-model-interface/01-02-SUMMARY.md
+
+Task 1: Implement ModelManager with intelligent switching
+Files: src/models/model_manager.py
+
+Create a ModelManager class that orchestrates all model operations:
+1. Load the model configuration from config/models.yaml
+2. Implement intelligent model selection based on:
+   - Available system resources (from ResourceMonitor)
+   - Task complexity and conversation context
+   - Model capability tiers
+3. Add dynamic model switching during conversation (from CONTEXT.md)
+4. Implement fallback chains when the primary model fails
+5. Handle model loading/unloading with proper resource cleanup
+6. Support silent switching without user notification
+
+Key methods:
+- __init__: Load config, initialize adapters and monitors
+- select_best_model(conversation_context): Choose the optimal model
+- switch_model(target_model_key): Handle the model transition
+- generate_response(message, conversation): Generate a response with auto-switching
+- get_current_model_status(): Return the current model and resource usage
+- preload_model(model_key): Background model loading
+
+Follow the CONTEXT.md decisions:
+- Silent switching with no user notifications
+- Dynamic switching mid-task if the model struggles
+- Smart context transfer during switches
+- Auto-retry on model failures
+
+Use the research patterns for resource-aware selection and implement graceful degradation when no model fits the constraints.
+
+Verify: python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print(hasattr(mm, 'select_best_model') and hasattr(mm, 'generate_response'))"
+Done: ModelManager can intelligently select and switch models based on resources
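+
+A sketch of resource-aware selection with a fallback chain, assuming the models.yaml schema from plan 01-01 and the ResourceMonitor sketched there; treating min_memory_gb as a capability proxy and walking largest-to-smallest is an assumed heuristic, not the settled design.
+
+```python
+# Sketch of select_best_model: largest configured model that fits resources.
+import yaml
+
+from src.models.resource_monitor import ResourceMonitor
+
+
+class ModelManager:
+    def __init__(self, config_path: str = "config/models.yaml"):
+        with open(config_path) as f:
+            self.models = yaml.safe_load(f)["models"]
+        self.monitor = ResourceMonitor()
+        self.current_model = None
+
+    def select_best_model(self, conversation_context=None) -> dict:
+        """Pick the most capable model that fits available memory."""
+        # Walk the fallback chain from largest requirement to smallest.
+        for model in sorted(self.models, key=lambda m: m["min_memory_gb"],
+                            reverse=True):
+            if self.monitor.can_load_model(model["min_memory_gb"]):
+                return model
+        raise RuntimeError("No configured model fits current resources")
+```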
+
+Task 2: Create core Mai orchestration class
+Files: src/mai.py
+
+Create the core Mai class following the architecture patterns:
+1. Initialize ModelManager, ContextManager, and other systems
+2. Provide the main conversation interface:
+   - process_message(user_input): Process a message and return the response
+   - get_conversation_history(): Retrieve conversation context
+   - get_system_status(): Return current model and resource status
+3. Implement the basic conversation flow using ModelManager
+4. Add error handling and graceful degradation
+5. Support both synchronous and async operation (asyncio)
+6. Include basic logging of model switches and resource events
+
+Key methods:
+- __init__: Initialize all subsystems
+- process_message(message): Main conversation entry point
+- get_status(): Return system state for monitoring
+- shutdown(): Clean up resources
+
+Follow the architecture: the Mai class is the main coordinator and delegates to specialized subsystems. Keep its logic simple; most complexity should live in ModelManager and ContextManager.
+
+Verify: python -c "from src.mai import Mai; mai = Mai(); print(hasattr(mai, 'process_message') and hasattr(mai, 'get_status'))"
+Done: Core Mai class orchestrates conversation processing with model switching
+
+Task 3: Create CLI entry point for testing
+Files: src/__main__.py
+
+Create the CLI entry point following the project structure:
+1. Implement __main__.py with a command-line interface
+2. Add a simple interactive chat loop for testing model switching
+3. Include status commands to show the current model and resources
+4. Support basic configuration and model management commands
+5. Add proper signal handling for graceful shutdown
+6. Include help text and usage examples
+
+Commands:
+- chat: Interactive conversation mode
+- status: Show current model and system resources
+- models: List available models
+- switch <model>: Manual model override for testing
+
+Use argparse for command-line parsing. Follow standard Python package entry-point patterns.
+
+Verify: python -m mai --help shows usage information and commands
+Done: CLI interface provides working chat and system monitoring commands
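+
+A sketch of the argparse wiring for the four commands; the Mai methods called are the ones named in Task 2, and the chat loop is deliberately minimal, with "models" and "switch" handling elided.
+
+```python
+# Sketch of src/__main__.py; signal handling and two commands are elided.
+import argparse
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(prog="mai", description="Mai assistant CLI")
+    sub = parser.add_subparsers(dest="command", required=True)
+    sub.add_parser("chat", help="Interactive conversation mode")
+    sub.add_parser("status", help="Show current model and system resources")
+    sub.add_parser("models", help="List available models")
+    switch = sub.add_parser("switch", help="Manual model override for testing")
+    switch.add_argument("model", help="Model key to load")
+
+    args = parser.parse_args()
+    from src.mai import Mai  # deferred import keeps --help fast
+    mai = Mai()
+    if args.command == "status":
+        print(mai.get_status())
+    elif args.command == "chat":
+        while True:
+            try:
+                print(mai.process_message(input("you> ")))
+            except (EOFError, KeyboardInterrupt):
+                break
+
+
+if __name__ == "__main__":
+    main()
+```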
+
+Verify the integrated system:
+1. ModelManager can select appropriate models based on resources
+2. Conversation processing works with automatic model switching
+3. CLI interface allows testing chat and monitoring
+4. Context is preserved during model switches
+5. System gracefully handles model loading failures
+6. Resource monitoring triggers appropriate model changes
+
+Complete model interface system:
+- Intelligent model selection based on system resources
+- Seamless conversation processing with automatic switching
+- Working CLI interface for testing and monitoring
+- Foundation ready for integration with memory and personality systems
+
+After completion, create `.planning/phases/01-model-interface/01-03-SUMMARY.md`
\ No newline at end of file