diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index d3ea950..2d52a88 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -38,6 +38,12 @@ Mai's development is organized into three major milestones, each delivering dist
 - Request more resources when bottlenecks detected
 - Graceful scaling from low-end hardware to high-end systems
 
+**Plans:** 4 plans in 2 waves
+- [ ] 03-01-PLAN.md — Enhanced GPU detection with pynvml support
+- [ ] 03-02-PLAN.md — Hardware tier detection and management system
+- [ ] 03-03-PLAN.md — Proactive scaling with hybrid monitoring
+- [ ] 03-04-PLAN.md — Personality-driven resource communication
+
 ### Phase 4: Memory & Context Management
 - Store conversation history locally (file-based or lightweight DB)
 - Recall past conversations and learn from them
diff --git a/.planning/phases/03-resource-management/03-01-PLAN.md b/.planning/phases/03-resource-management/03-01-PLAN.md
new file mode 100644
index 0000000..99384f6
--- /dev/null
+++ b/.planning/phases/03-resource-management/03-01-PLAN.md
@@ -0,0 +1,113 @@
+---
+phase: 03-resource-management
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified: [pyproject.toml, src/models/resource_monitor.py]
+autonomous: true
+user_setup: []
+
+must_haves:
+  truths:
+    - "Enhanced resource monitor can detect NVIDIA GPU VRAM using pynvml"
+    - "GPU detection falls back gracefully when GPU unavailable"
+    - "Resource monitoring remains cross-platform compatible"
+  artifacts:
+    - path: "src/models/resource_monitor.py"
+      provides: "Enhanced GPU detection with pynvml support"
+      contains: "pynvml"
+      min_lines: 250
+    - path: "pyproject.toml"
+      provides: "pynvml dependency for GPU monitoring"
+      contains: "pynvml"
+  key_links:
+    - from: "src/models/resource_monitor.py"
+      to: "pynvml library"
+      via: "import pynvml"
+      pattern: "import pynvml"
+---
+
+<objective>
+Enhance GPU detection and monitoring capabilities by integrating pynvml for precise NVIDIA GPU VRAM tracking, while maintaining cross-platform compatibility and graceful fallbacks.
+
+Purpose: Provide accurate GPU resource detection for intelligent model selection and proactive scaling decisions.
+Output: Enhanced ResourceMonitor with reliable GPU VRAM monitoring across different hardware configurations.
+</objective>
+
+<execution_context>
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+
+# Current implementation
+@src/models/resource_monitor.py
+@pyproject.toml
+</context>
+
+<tasks>
+
+<task>
+  <name>Add pynvml dependency to project</name>
+  <files>pyproject.toml</files>
+  <action>Add pynvml>=11.0.0 to the main dependencies array in pyproject.toml. This makes NVIDIA GPU monitoring available by default rather than as an optional extra.</action>
+  <verify>grep -n "pynvml" pyproject.toml shows the dependency added correctly</verify>
+  <done>pynvml dependency is available for GPU monitoring</done>
+</task>
+
+<task>
+  <name>Enhance ResourceMonitor with pynvml GPU detection</name>
+  <files>src/models/resource_monitor.py</files>
+  <action>
+Enhance the _get_gpu_memory() method to use pynvml for precise NVIDIA GPU VRAM detection:
+
+1. Add the pynvml import at the top of the file
+2. Replace the current _get_gpu_memory() implementation with pynvml-based detection:
+   - Initialize pynvml with proper error handling
+   - Get the GPU handle and memory info through the pynvml APIs
+   - Return total, used, and free VRAM in GB
+   - Handle NVMLError gracefully and fall back to the existing gpu-tracker logic
+   - Ensure pynvml.nvmlShutdown() is always called in a finally block
+3. Update get_current_resources() to include detailed GPU info:
+   - gpu_total_vram_gb: Total VRAM capacity
+   - gpu_used_vram_gb: Currently used VRAM
+   - gpu_free_vram_gb: Available VRAM
+   - gpu_utilization_percent: GPU utilization (if available)
+4. Add GPU temperature monitoring if available via pynvml
+5. Maintain backward compatibility with the existing return format
+
+The enhanced GPU detection should:
+- Try pynvml first for NVIDIA GPUs
+- Fall back to gpu-tracker for other vendors
+- Return 0 values if no GPU is detected
+- Handle all exceptions gracefully
+- Log GPU detection results at debug level
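+
+A rough sketch of the detection core (illustrative, not the final implementation; it assumes pynvml's standard NVML bindings, and the helper name and return keys are placeholders):
+
+```python
+import logging
+
+import pynvml
+
+logger = logging.getLogger(__name__)
+
+_ZEROS = {"gpu_total_vram_gb": 0.0, "gpu_used_vram_gb": 0.0, "gpu_free_vram_gb": 0.0}
+
+def get_nvidia_vram_gb(device_index: int = 0) -> dict:
+    """Query one NVIDIA GPU via NVML; return zeros when NVML or the GPU is absent."""
+    try:
+        pynvml.nvmlInit()
+    except pynvml.NVMLError as exc:  # no NVIDIA driver or no GPU: caller falls back
+        logger.debug("NVML unavailable: %s", exc)
+        return dict(_ZEROS)
+    try:
+        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
+        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # total/used/free in bytes
+        gib = 1024 ** 3
+        return {
+            "gpu_total_vram_gb": mem.total / gib,
+            "gpu_used_vram_gb": mem.used / gib,
+            "gpu_free_vram_gb": mem.free / gib,
+        }
+    except pynvml.NVMLError as exc:
+        logger.debug("NVML query failed: %s", exc)
+        return dict(_ZEROS)
+    finally:
+        pynvml.nvmlShutdown()  # always release NVML, per the step above
+```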
+  </action>
+  <verify>python -c "from src.models.resource_monitor import ResourceMonitor; rm = ResourceMonitor(); resources = rm.get_current_resources(); print('GPU detection:', {k: v for k, v in resources.items() if 'gpu' in k})" returns GPU metrics without errors</verify>
+  <done>ResourceMonitor provides accurate GPU VRAM monitoring using pynvml with proper fallbacks</done>
+</task>
+
+</tasks>
+
+<verification>
+Test enhanced resource monitoring across different configurations:
+- Systems with NVIDIA GPUs (pynvml should work)
+- Systems with AMD/Intel GPUs (fallback to gpu-tracker)
+- Systems without GPUs (graceful zero values)
+- Cross-platform compatibility (Linux, Windows, macOS)
+
+Verify monitoring overhead remains < 1% CPU usage.
+</verification>
+
+<success_criteria>
+ResourceMonitor successfully detects and reports GPU VRAM using pynvml when available, falls back gracefully to other methods, maintains cross-platform compatibility, and provides detailed GPU metrics for intelligent model selection.
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/03-resource-management/03-01-SUMMARY.md`
+</output>
\ No newline at end of file
diff --git a/.planning/phases/03-resource-management/03-02-PLAN.md b/.planning/phases/03-resource-management/03-02-PLAN.md
new file mode 100644
index 0000000..2e36e3f
--- /dev/null
+++ b/.planning/phases/03-resource-management/03-02-PLAN.md
@@ -0,0 +1,164 @@
+---
+phase: 03-resource-management
+plan: 02
+type: execute
+wave: 1
+depends_on: []
+files_modified: [src/resource/__init__.py, src/resource/tiers.py, src/config/resource_tiers.yaml]
+autonomous: true
+user_setup: []
+
+must_haves:
+  truths:
+    - "Hardware tier system detects and classifies system capabilities"
+    - "Tier definitions are configurable and maintainable"
+    - "Model mapping uses tiers for intelligent selection"
+  artifacts:
+    - path: "src/resource/tiers.py"
+      provides: "Hardware tier detection and management system"
+      min_lines: 80
+    - path: "src/config/resource_tiers.yaml"
+      provides: "Configurable hardware tier definitions"
+      min_lines: 30
+    - path: "src/resource/__init__.py"
+      provides: "Resource management module initialization"
+  key_links:
+    - from: "src/resource/tiers.py"
+      to: "src/config/resource_tiers.yaml"
+      via: "YAML configuration loading"
+      pattern: "yaml.safe_load|yaml.load"
+    - from: "src/resource/tiers.py"
+      to: "src/models/resource_monitor.py"
+      via: "Resource monitoring integration"
+      pattern: "ResourceMonitor"
+---
+
+<objective>
+Create a hardware tier detection and management system that classifies systems into performance tiers (low_end, mid_range, high_end) with configurable thresholds and intelligent model mapping.
+
+Purpose: Enable Mai to adapt gracefully from low-end hardware to high-end systems by understanding hardware capabilities and selecting appropriate models.
+Output: Tier detection system with configurable definitions and model mapping capabilities.
+</objective>
+
+<execution_context>
+@~/.opencode/get-shit-done/workflows/execute-plan.md
+@~/.opencode/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+
+# Research-based architecture
+@.planning/phases/03-resource-management/03-RESEARCH.md
+</context>
+
+<tasks>
+
+<task>
+  <name>Create resource module structure</name>
+  <files>src/resource/__init__.py</files>
+  <action>
+Create the resource module directory and __init__.py file. The __init__.py should expose the main resource management classes created in this phase:
+- HardwareTierDetector (from tiers.py)
+- ProactiveScaler (from scaling.py)
+- ResourcePersonality (from personality.py)
+
+Since scaling.py and personality.py only land in Wave 2, guard those imports (try/except ImportError or lazy imports) so the package stays importable in the meantime. Include a module docstring explaining the resource management system's purpose.
+  </action>
+  <verify>ls -la src/resource/ shows the directory exists with an __init__.py file</verify>
+  <done>Resource module structure is established for Phase 3 components</done>
+</task>
+
+<task>
+  <name>Create configurable hardware tier definitions</name>
+  <files>src/config/resource_tiers.yaml</files>
+  <action>
+Create a YAML configuration file defining hardware tiers based on the research patterns. Include:
+
+1. Three tiers: low_end, mid_range, high_end
+2. Resource thresholds for each tier:
+   - RAM amounts (min/max in GB)
+   - CPU core counts (min/max)
+   - GPU requirements (required/optional)
+   - GPU VRAM thresholds
+3. Preferred model categories for each tier
+4. Performance characteristics and expectations
+5. Scaling thresholds specific to each tier
+
+Example structure:
+```yaml
+tiers:
+  low_end:
+    ram_gb: {min: 2, max: 4}
+    cpu_cores: {min: 2, max: 4}
+    gpu_required: false
+    preferred_models: ["small"]
+    scaling_thresholds:
+      memory_percent: 75
+      cpu_percent: 80
+
+  mid_range:
+    ram_gb: {min: 4, max: 8}
+    cpu_cores: {min: 4, max: 8}
+    gpu_required: false
+    preferred_models: ["small", "medium"]
+    scaling_thresholds:
+      memory_percent: 80
+      cpu_percent: 85
+
+  high_end:
+    ram_gb: {min: 8, max: null}
+    cpu_cores: {min: 6, max: null}
+    gpu_required: true
+    gpu_vram_gb: {min: 6}
+    preferred_models: ["medium", "large"]
+    scaling_thresholds:
+      memory_percent: 85
+      cpu_percent: 90
+```
+
+Include comments explaining each threshold's purpose.
+  </action>
+  <verify>python -c "import yaml; print('YAML valid:', yaml.safe_load(open('src/config/resource_tiers.yaml')))" loads the file without errors</verify>
+  <done>Hardware tier definitions are configurable and well-documented</done>
+</task>
+
+<task>
+  <name>Implement HardwareTierDetector class</name>
+  <files>src/resource/tiers.py</files>
+  <action>
+Create the HardwareTierDetector class that:
+1. Loads tier definitions from resource_tiers.yaml
+2. Detects current system resources using ResourceMonitor
+3. Determines the hardware tier based on resource thresholds
+4. Provides model recommendations for the detected tier
+5. Supports tier-specific scaling thresholds
+
+Key methods:
+- load_tier_config(): Load YAML configuration
+- detect_current_tier(): Determine system tier from resources
+- get_preferred_models(): Return model preferences for tier
+- get_scaling_thresholds(): Return tier-specific thresholds
+- is_gpu_required(): Check if tier requires GPU
+- can_upgrade_model(): Check if system can handle larger models
+
+Include proper error handling for configuration loading and resource detection. The detector should integrate with the enhanced ResourceMonitor from Plan 01.
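+
+The tier-matching core could look something like this sketch (it assumes the resource keys produced by the enhanced ResourceMonitor, e.g. ram_gb, cpu_cores, gpu_total_vram_gb; all names are illustrative):
+
+```python
+def detect_current_tier(resources: dict, tiers: dict) -> str:
+    """Return the highest tier whose minimum thresholds this system meets."""
+
+    def meets(tier: dict) -> bool:
+        if resources.get("ram_gb", 0) < tier["ram_gb"]["min"]:
+            return False
+        if resources.get("cpu_cores", 0) < tier["cpu_cores"]["min"]:
+            return False
+        if tier.get("gpu_required", False):
+            min_vram = tier.get("gpu_vram_gb", {}).get("min", 0)
+            if resources.get("gpu_total_vram_gb", 0) < min_vram:
+                return False
+        return True
+
+    # Check from the top down so a capable system lands in the highest tier.
+    for name in ("high_end", "mid_range", "low_end"):
+        if name in tiers and meets(tiers[name]):
+            return name
+    return "low_end"  # conservative fallback
+```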
+ python -c "from src.resource.tiers import HardwareTierDetector; htd = HardwareTierDetector(); tier = htd.detect_current_tier(); print('Detected tier:', tier)" returns a valid tier name + HardwareTierDetector accurately classifies system capabilities and provides tier-based recommendations + + + + + +Test hardware tier detection across simulated system configurations: +- Low-end systems (2-4GB RAM, 2-4 CPU cores, no GPU) +- Mid-range systems (4-8GB RAM, 4-8 CPU cores, optional GPU) +- High-end systems (8GB+ RAM, 6+ CPU cores, GPU required) + +Verify tier recommendations align with research patterns and model mapping is logical. + + + +HardwareTierDetector successfully classifies systems into appropriate tiers, loads configuration correctly, integrates with ResourceMonitor, and provides accurate model recommendations based on detected capabilities. + + + +After completion, create `.planning/phases/03-resource-management/03-02-SUMMARY.md` + \ No newline at end of file diff --git a/.planning/phases/03-resource-management/03-03-PLAN.md b/.planning/phases/03-resource-management/03-03-PLAN.md new file mode 100644 index 0000000..a62659d --- /dev/null +++ b/.planning/phases/03-resource-management/03-03-PLAN.md @@ -0,0 +1,169 @@ +--- +phase: 03-resource-management +plan: 03 +type: execute +wave: 2 +depends_on: [03-01, 03-02] +files_modified: [src/resource/scaling.py, src/models/model_manager.py] +autonomous: true +user_setup: [] + +must_haves: + truths: + - "Proactive scaling prevents performance degradation before it impacts users" + - "Hybrid monitoring combines continuous checks with pre-flight validation" + - "Graceful degradation completes current tasks before model switching" + artifacts: + - path: "src/resource/scaling.py" + provides: "Proactive scaling algorithms with hybrid monitoring" + min_lines: 150 + - path: "src/models/model_manager.py" + provides: "Enhanced model manager with proactive scaling integration" + contains: "ProactiveScaler" + min_lines: 650 + key_links: + - from: "src/resource/scaling.py" + to: "src/models/resource_monitor.py" + via: "Resource monitoring for scaling decisions" + pattern: "ResourceMonitor" + - from: "src/resource/scaling.py" + to: "src/resource/tiers.py" + via: "Hardware tier-based scaling thresholds" + pattern: "HardwareTierDetector" + - from: "src/models/model_manager.py" + to: "src/resource/scaling.py" + via: "Proactive scaling integration" + pattern: "ProactiveScaler" +--- + + +Implement proactive scaling algorithms that combine continuous background monitoring with pre-flight checks to prevent performance degradation before it impacts users, with graceful degradation cascades and stabilization periods. + +Purpose: Enable Mai to anticipate resource constraints and scale models proactively while maintaining smooth user experience. +Output: Proactive scaling system with hybrid monitoring, graceful degradation, and intelligent stabilization. + + + +@~/.opencode/get-shit-done/workflows/execute-plan.md +@~/.opencode/get-shit-done/templates/summary.md + + + +@.planning/PROJECT.md +@.planning/ROADMAP.md +@.planning/STATE.md + +# Enhanced components from previous plans +@src/models/resource_monitor.py +@src/resource/tiers.py + +# Research-based scaling patterns +@.planning/phases/03-resource-management/03-RESEARCH.md + + + + + + Implement ProactiveScaler class + src/resource/scaling.py + Create the ProactiveScaler class implementing hybrid monitoring and proactive scaling: + +1. 
+  </action>
+  <verify>python -c "from src.resource.scaling import ProactiveScaler; ps = ProactiveScaler(); print('ProactiveScaler initialized:', hasattr(ps, 'check_preflight_resources'))" confirms the class structure</verify>
+  <done>ProactiveScaler implements hybrid monitoring with graceful degradation</done>
+</task>
+
+<task>
+  <name>Integrate proactive scaling into ModelManager</name>
+  <files>src/models/model_manager.py</files>
+  <action>
+Enhance ModelManager to integrate proactive scaling:
+
+1. **Add ProactiveScaler Integration:**
+   - Import and initialize ProactiveScaler in __init__
+   - Start continuous monitoring on initialization
+   - Pass resource monitor and tier detector references
+
+2. **Enhance generate_response with Proactive Scaling:**
+   - Add a pre-flight resource check before generation
+   - Implement graceful degradation if resources are constrained
+   - Use proactive scaling recommendations for model selection
+   - Track performance metrics for scaling decisions
+
+3. **Update Model Selection Logic:**
+   - Incorporate tier-based preferences
+   - Use scaling thresholds from HardwareTierDetector
+   - Factor in trend analysis predictions
+   - Apply stabilization periods for upgrades
+
+4. **Add Resource-Constrained Handling:**
+   - Complete the current response with a smaller model if needed
+   - Switch models proactively based on scaling predictions
+   - Handle resource exhaustion gracefully
+   - Maintain conversation context through switches
+
+5. **Performance Tracking:**
+   - Track response times and failure rates
+   - Monitor resource usage during generation
+   - Feed metrics back to ProactiveScaler
+   - Adjust scaling behavior based on observed performance
+
+6. **Cleanup and Shutdown:**
+   - Stop continuous monitoring in shutdown()
+   - Clean up scaling state and resources
+   - Log scaling decisions and outcomes
+
+Ensure backward compatibility and maintain silent switching behavior per Phase 1 decisions.
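+
+Illustratively, the pre-flight hook could sit in generate_response like this (a sketch only; _resource_monitor, _proactive_scaler, _switch_to_smaller_model, and _current_model are hypothetical names standing in for whatever ModelManager actually uses):
+
+```python
+class _ScalingHooks:
+    """Sketch: where the pre-flight check wraps generation in ModelManager."""
+
+    def generate_response(self, prompt: str) -> str:
+        usage = self._resource_monitor.get_current_resources()
+        if not self._proactive_scaler.check_preflight_resources(usage["memory_percent"]):
+            # Degrade before generating rather than failing mid-response;
+            # the switch itself stays silent, per the Phase 1 decisions.
+            self._switch_to_smaller_model()
+            self._proactive_scaler.record_switch()
+        return self._current_model.generate(prompt)
+```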
+ python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print('Proactive scaling integrated:', hasattr(mm, '_proactive_scaler'))" confirms integration + ModelManager integrates proactive scaling for intelligent resource management + + + + + +Test proactive scaling behavior under various scenarios: +- Gradual resource increase (should detect and upgrade after stabilization) +- Sudden resource decrease (should immediately degrade gracefully) +- Stable resource usage (should not trigger unnecessary switches) +- Mixed workload patterns (should adapt scaling thresholds appropriately) + +Verify stabilization periods prevent thrashing and graceful degradation maintains user experience. + + + +ProactiveScaler successfully combines continuous monitoring with pre-flight checks, implements graceful degradation cascades, respects stabilization periods, and integrates seamlessly with ModelManager for intelligent resource management. + + + +After completion, create `.planning/phases/03-resource-management/03-03-SUMMARY.md` + \ No newline at end of file diff --git a/.planning/phases/03-resource-management/03-04-PLAN.md b/.planning/phases/03-resource-management/03-04-PLAN.md new file mode 100644 index 0000000..8eb76b2 --- /dev/null +++ b/.planning/phases/03-resource-management/03-04-PLAN.md @@ -0,0 +1,171 @@ +--- +phase: 03-resource-management +plan: 04 +type: execute +wave: 2 +depends_on: [03-01, 03-02] +files_modified: [src/resource/personality.py, src/models/model_manager.py] +autonomous: true +user_setup: [] + +must_haves: + truths: + - "Personality-driven communication engages users with resource discussions" + - "Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona is implemented" + - "Resource requests balance personality with helpful technical guidance" + artifacts: + - path: "src/resource/personality.py" + provides: "Personality-driven resource communication system" + min_lines: 100 + - path: "src/models/model_manager.py" + provides: "Model manager with personality communication integration" + contains: "ResourcePersonality" + min_lines: 680 + key_links: + - from: "src/resource/personality.py" + to: "src/models/model_manager.py" + via: "Personality communication for resource events" + pattern: "ResourcePersonality" + - from: "src/resource/personality.py" + to: "src/resource/scaling.py" + via: "Personality messages for scaling events" + pattern: "format_resource_request" +--- + + +Implement the "Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin" personality system for resource discussions, providing engaging communication about resource constraints, capability changes, and optimization suggestions. + +Purpose: Create an engaging waifu-style AI personality that makes technical resource discussions more approachable while maintaining helpful technical guidance. +Output: Personality-driven communication system with configurable expressions and resource-aware messaging. + + + +@~/.opencode/get-shit-done/workflows/execute-plan.md +@~/.opencode/get-shit-done/templates/summary.md + + + +@.planning/PROJECT.md +@.planning/ROADMAP.md +@.planning/STATE.md + +# Context-based personality requirements +@.planning/phases/03-resource-management/03-CONTEXT.md + +# Research-based communication patterns +@.planning/phases/03-resource-management/03-RESEARCH.md + + + + + + Implement ResourcePersonality class + src/resource/personality.py + Create the ResourcePersonality class implementing the Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona: + +1. 
+  </action>
+  <verify>python -c "from src.resource.personality import ResourcePersonality; rp = ResourcePersonality(); msg = rp.format_resource_request('memory', 'run complex analysis'); print('Personality message:', msg)" generates personality-driven messages</verify>
+  <done>ResourcePersonality implements the Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona</done>
+</task>
+
+<task>
+  <name>Integrate personality communication into ModelManager</name>
+  <files>src/models/model_manager.py</files>
+  <action>
+Enhance ModelManager to integrate personality-driven communication:
+
+1. **Add Personality Integration:**
+   - Import and initialize ResourcePersonality in __init__
+   - Add personality communication to the model switching logic
+   - Connect personality to scaling events
+
+2. **Enhance Model Switching with Personality:**
+   - Use personality for capability downgrade notifications
+   - Send personality messages for significant resource constraints
+   - Provide optional technical tips for optimization
+   - Maintain silent switching for upgrades (per Phase 1 decisions)
+
+3. **Add Resource Constraint Communication:**
+   - Generate personality messages when significantly constrained
+   - Offer helpful suggestions with personality flair
+   - Include optional technical details for interested users
+   - Track user response patterns for future improvements
+
+4. **Context-Aware Communication:**
+   - Consider conversation context when deciding message tone
+   - Adjust personality intensity based on interaction history
+   - Provide technical tips only when appropriate
+   - Balance engagement with usefulness
+
+5. **Integration Points:**
+   - Connect to ProactiveScaler for scaling event notifications
+   - Use ResourceMonitor metrics for constraint detection
+   - Leverage HardwareTierDetector for tier-appropriate suggestions
+   - Maintain conversation context through personality interactions
+
+6. **Message Delivery:**
+   - Return personality messages alongside regular responses
+   - Separate personality messages from core functionality
+   - Allow users to disable personality if desired
+   - Log personality interactions for analysis
+
+Ensure personality enhances rather than interferes with core functionality, and maintains the helpful technical guidance expected from a mentor-like figure.
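+
+For the delivery side, one option is a small result envelope so persona chatter never mixes into the core answer (a sketch; ChatResult, the helper, and the flag name are invented for illustration, and only format_downgrade_notice comes from this plan):
+
+```python
+from dataclasses import dataclass, field
+
+@dataclass
+class ChatResult:
+    """Core answer plus an optional persona side-channel."""
+    response: str
+    personality_messages: list[str] = field(default_factory=list)
+
+def with_downgrade_notice(result: ChatResult, personality, from_model: str,
+                          to_model: str, personality_enabled: bool = True) -> ChatResult:
+    # Users who disable personality simply never see the side-channel.
+    if personality_enabled:
+        result.personality_messages.append(
+            personality.format_downgrade_notice(from_model, to_model, reason="resource pressure")
+        )
+    return result
+```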
+  </action>
+  <verify>python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print('Personality integrated:', hasattr(mm, '_personality'))" confirms personality integration</verify>
+  <done>ModelManager integrates personality communication for engaging resource discussions</done>
+</task>
+
+</tasks>
+
+<verification>
+Test personality communication across different scenarios:
+- Resource constraints with appropriate personality expressions
+- Capability downgrades with tsun-heavy notices
+- Resource improvements with subtle dere moments
+- Technical tips that balance simplicity with useful information
+
+Verify personality maintains consistency, enhances user engagement without being overwhelming, and provides genuinely helpful guidance.
+</verification>
+
+<success_criteria>
+ResourcePersonality successfully implements the Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona with appropriate emotional range, context-aware communication, and helpful technical guidance that enhances user engagement with resource management.
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/03-resource-management/03-04-SUMMARY.md`
+</output>
\ No newline at end of file