docs(03): create phase plan

Phase 3: Resource Management
- 4 plans in 2 waves
- 2 parallel, 2 sequential
- Ready for execution
Mai Development
2026-01-27 17:58:09 -05:00
parent a37b61acce
commit 1e071398ff
5 changed files with 623 additions and 0 deletions


@@ -38,6 +38,12 @@ Mai's development is organized into three major milestones, each delivering dist
- Request more resources when bottlenecks detected
- Graceful scaling from low-end hardware to high-end systems
**Plans:** 4 plans in 2 waves
- [ ] 03-01-PLAN.md — Enhanced GPU detection with pynvml support
- [ ] 03-02-PLAN.md — Hardware tier detection and management system
- [ ] 03-03-PLAN.md — Proactive scaling with hybrid monitoring
- [ ] 03-04-PLAN.md — Personality-driven resource communication
### Phase 4: Memory & Context Management
- Store conversation history locally (file-based or lightweight DB)
- Recall past conversations and learn from them


@@ -0,0 +1,113 @@
---
phase: 03-resource-management
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [pyproject.toml, src/models/resource_monitor.py]
autonomous: true
user_setup: []
must_haves:
truths:
- "Enhanced resource monitor can detect NVIDIA GPU VRAM using pynvml"
- "GPU detection falls back gracefully when GPU unavailable"
- "Resource monitoring remains cross-platform compatible"
artifacts:
- path: "src/models/resource_monitor.py"
provides: "Enhanced GPU detection with pynvml support"
contains: "pynvml"
min_lines: 250
- path: "pyproject.toml"
provides: "pynvml dependency for GPU monitoring"
contains: "pynvml"
key_links:
- from: "src/models/resource_monitor.py"
to: "pynvml library"
via: "import pynvml"
pattern: "import pynvml"
---
<objective>
Enhance GPU detection and monitoring capabilities by integrating pynvml for precise NVIDIA GPU VRAM tracking while maintaining cross-platform compatibility and graceful fallbacks.
Purpose: Provide accurate GPU resource detection for intelligent model selection and proactive scaling decisions.
Output: Enhanced ResourceMonitor with reliable GPU VRAM monitoring across different hardware configurations.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
# Current implementation
@src/models/resource_monitor.py
@pyproject.toml
</context>
<tasks>
<task type="auto">
<name>Add pynvml dependency to project</name>
<files>pyproject.toml</files>
<action>Add pynvml>=11.0.0 to the main dependencies array in pyproject.toml. This ensures NVIDIA GPU monitoring capabilities are available by default rather than being optional.</action>
<verify>grep -n "pynvml" pyproject.toml shows the dependency added correctly</verify>
<done>pynvml dependency is available for GPU monitoring</done>
</task>
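For reference, the resulting entry would look roughly like this (assuming a PEP 621-style `[project]` table; adjust to match the project's actual pyproject layout):

```toml
[project]
dependencies = [
    # ...existing dependencies...
    "pynvml>=11.0.0",  # NVIDIA GPU VRAM monitoring
]
```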
<task type="auto">
<name>Enhance ResourceMonitor with pynvml GPU detection</name>
<files>src/models/resource_monitor.py</files>
<action>
Enhance the _get_gpu_memory() method to use pynvml for precise NVIDIA GPU VRAM detection:
1. Add pynvml import at the top of the file
2. Replace the current _get_gpu_memory() implementation with pynvml-based detection:
- Initialize pynvml with proper error handling
- Get GPU handle and memory info using pynvml APIs
- Return total, used, and free VRAM in GB
- Handle NVMLError gracefully and fall back to the existing gpu-tracker logic
- Ensure pynvml.nvmlShutdown() is always called in a finally block
3. Update get_current_resources() to include detailed GPU info:
- gpu_total_vram_gb: Total VRAM capacity
- gpu_used_vram_gb: Currently used VRAM
- gpu_free_vram_gb: Available VRAM
- gpu_utilization_percent: GPU utilization (if available)
4. Add GPU temperature monitoring if available via pynvml
5. Maintain backward compatibility with existing return format
The enhanced GPU detection should:
- Try pynvml first for NVIDIA GPUs
- Fall back to gpu-tracker for other vendors
- Return 0 values if no GPU detected
- Handle all exceptions gracefully
- Log GPU detection results at debug level
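A minimal sketch of this flow, assuming only pynvml's documented NVML bindings (resource-key names follow the list above; the gpu-tracker fallback call is elided):
```python
import logging

logger = logging.getLogger(__name__)

def _get_gpu_memory() -> dict:
    """Return total/used/free VRAM in GB; zeros when no NVIDIA GPU is usable."""
    try:
        import pynvml
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values in bytes
            gb = 1024 ** 3
            return {
                "gpu_total_vram_gb": mem.total / gb,
                "gpu_used_vram_gb": mem.used / gb,
                "gpu_free_vram_gb": mem.free / gb,
            }
        finally:
            pynvml.nvmlShutdown()  # always release NVML, even on error
    except Exception as exc:  # ImportError, pynvml.NVMLError, ...
        logger.debug("pynvml GPU detection unavailable: %s", exc)
        # Fall back to gpu-tracker here; zeros shown for brevity.
        return {k: 0.0 for k in ("gpu_total_vram_gb", "gpu_used_vram_gb", "gpu_free_vram_gb")}
```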
</action>
<verify>python -c "from src.models.resource_monitor import ResourceMonitor; rm = ResourceMonitor(); resources = rm.get_current_resources(); print('GPU detection:', {k: v for k, v in resources.items() if 'gpu' in k})" returns GPU metrics without errors</verify>
<done>ResourceMonitor provides accurate GPU VRAM monitoring using pynvml with proper fallbacks</done>
</task>
</tasks>
<verification>
Test enhanced resource monitoring across different configurations:
- Systems with NVIDIA GPUs (pynvml should work)
- Systems with AMD/Intel GPUs (fallback to gpu-tracker)
- Systems without GPUs (graceful zero values)
- Cross-platform compatibility (Linux, Windows, macOS)
Verify monitoring overhead remains < 1% CPU usage.
</verification>
<success_criteria>
ResourceMonitor successfully detects and reports GPU VRAM using pynvml when available, falls back gracefully to other methods, maintains cross-platform compatibility, and provides detailed GPU metrics for intelligent model selection.
</success_criteria>
<output>
After completion, create `.planning/phases/03-resource-management/03-01-SUMMARY.md`
</output>


@@ -0,0 +1,164 @@
---
phase: 03-resource-management
plan: 02
type: execute
wave: 1
depends_on: []
files_modified: [src/resource/__init__.py, src/resource/tiers.py, src/config/resource_tiers.yaml]
autonomous: true
user_setup: []
must_haves:
truths:
- "Hardware tier system detects and classifies system capabilities"
- "Tier definitions are configurable and maintainable"
- "Model mapping uses tiers for intelligent selection"
artifacts:
- path: "src/resource/tiers.py"
provides: "Hardware tier detection and management system"
min_lines: 80
- path: "src/config/resource_tiers.yaml"
provides: "Configurable hardware tier definitions"
min_lines: 30
- path: "src/resource/__init__.py"
provides: "Resource management module initialization"
key_links:
- from: "src/resource/tiers.py"
to: "src/config/resource_tiers.yaml"
via: "YAML configuration loading"
pattern: "yaml.safe_load|yaml.load"
- from: "src/resource/tiers.py"
to: "src/models/resource_monitor.py"
via: "Resource monitoring integration"
pattern: "ResourceMonitor"
---
<objective>
Create a hardware tier detection and management system that classifies systems into performance tiers (low_end, mid_range, high_end) with configurable thresholds and intelligent model mapping.
Purpose: Enable Mai to adapt gracefully from low-end hardware to high-end systems by understanding hardware capabilities and selecting appropriate models.
Output: Tier detection system with configurable definitions and model mapping capabilities.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
# Research-based architecture
@.planning/phases/03-resource-management/03-RESEARCH.md
</context>
<tasks>
<task type="auto">
<name>Create resource module structure</name>
<files>src/resource/__init__.py</files>
<action>Create the resource module directory and __init__.py file. The __init__.py should expose the main resource management classes that will be created in this phase:
- HardwareTierDetector (from tiers.py)
- ProactiveScaler (from scaling.py)
- ResourcePersonality (from personality.py)
Include proper module docstring explaining the resource management system's purpose.</action>
<verify>ls -la src/resource/ shows the directory exists with __init__.py file</verify>
<done>Resource module structure is established for Phase 3 components</done>
</task>
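A plausible shape for the file, given the module names planned for this phase (note scaling.py and personality.py arrive in wave 2, so these imports only resolve once those plans land):

```python
"""Resource management for Mai: hardware tier detection, proactive
scaling, and personality-driven resource communication."""

from .tiers import HardwareTierDetector
from .scaling import ProactiveScaler           # created in plan 03-03
from .personality import ResourcePersonality   # created in plan 03-04

__all__ = ["HardwareTierDetector", "ProactiveScaler", "ResourcePersonality"]
```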
<task type="auto">
<name>Create configurable hardware tier definitions</name>
<files>src/config/resource_tiers.yaml</files>
<action>Create a YAML configuration file defining hardware tiers based on the research patterns. Include:
1. Three tiers: low_end, mid_range, high_end
2. Resource thresholds for each tier:
- RAM amounts (min/max in GB)
- CPU core counts (min/max)
- GPU requirements (required/optional)
- GPU VRAM thresholds
3. Preferred model categories for each tier
4. Performance characteristics and expectations
5. Scaling thresholds specific to each tier
Example structure:
```yaml
tiers:
low_end:
ram_gb: {min: 2, max: 4}
cpu_cores: {min: 2, max: 4}
gpu_required: false
preferred_models: ["small"]
scaling_thresholds:
memory_percent: 75
cpu_percent: 80
mid_range:
ram_gb: {min: 4, max: 8}
cpu_cores: {min: 4, max: 8}
gpu_required: false
preferred_models: ["small", "medium"]
scaling_thresholds:
memory_percent: 80
cpu_percent: 85
high_end:
ram_gb: {min: 8, max: null}
cpu_cores: {min: 6, max: null}
gpu_required: true
gpu_vram_gb: {min: 6}
preferred_models: ["medium", "large"]
scaling_thresholds:
memory_percent: 85
cpu_percent: 90
```
Include comments explaining each threshold's purpose.</action>
<verify>python -c "import yaml; print('YAML valid:', yaml.safe_load(open('src/config/resource_tiers.yaml')))" loads the file without errors</verify>
<done>Hardware tier definitions are configurable and well-documented</done>
</task>
<task type="auto">
<name>Implement HardwareTierDetector class</name>
<files>src/resource/tiers.py</files>
<action>Create the HardwareTierDetector class that:
1. Loads tier definitions from resource_tiers.yaml
2. Detects current system resources using ResourceMonitor
3. Determines hardware tier based on resource thresholds
4. Provides model recommendations for detected tier
5. Supports tier-specific scaling thresholds
Key methods:
- load_tier_config(): Load YAML configuration
- detect_current_tier(): Determine system tier from resources
- get_preferred_models(): Return model preferences for tier
- get_scaling_thresholds(): Return tier-specific thresholds
- is_gpu_required(): Check if tier requires GPU
- can_upgrade_model(): Check if system can handle larger models
Include proper error handling for configuration loading and resource detection. The detector should integrate with the enhanced ResourceMonitor from Plan 01.</action>
<verify>python -c "from src.resource.tiers import HardwareTierDetector; htd = HardwareTierDetector(); tier = htd.detect_current_tier(); print('Detected tier:', tier)" returns a valid tier name</verify>
<done>HardwareTierDetector accurately classifies system capabilities and provides tier-based recommendations</done>
</task>
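A compact sketch of the detector under the YAML schema above. The ResourceMonitor keys (`ram_total_gb`, `cpu_cores`, `gpu_total_vram_gb`) are assumptions about the plan 03-01 output, and only minimum thresholds are checked for brevity:

```python
import yaml

class HardwareTierDetector:
    def __init__(self, config_path: str = "src/config/resource_tiers.yaml"):
        self.tiers = self.load_tier_config(config_path)

    def load_tier_config(self, config_path: str) -> dict:
        with open(config_path) as fh:
            return yaml.safe_load(fh)["tiers"]

    def detect_current_tier(self) -> str:
        from src.models.resource_monitor import ResourceMonitor
        res = ResourceMonitor().get_current_resources()
        # Walk from the highest tier down; first tier whose minimums are met wins.
        for name in ("high_end", "mid_range", "low_end"):
            tier = self.tiers[name]
            if res.get("ram_total_gb", 0) < tier["ram_gb"]["min"]:
                continue
            if res.get("cpu_cores", 0) < tier["cpu_cores"]["min"]:
                continue
            if tier.get("gpu_required") and (
                res.get("gpu_total_vram_gb", 0) < tier.get("gpu_vram_gb", {}).get("min", 0)
            ):
                continue
            return name
        return "low_end"

    def get_preferred_models(self, tier: str) -> list[str]:
        return self.tiers[tier]["preferred_models"]

    def get_scaling_thresholds(self, tier: str) -> dict:
        return self.tiers[tier]["scaling_thresholds"]
```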
</tasks>
<verification>
Test hardware tier detection across simulated system configurations:
- Low-end systems (2-4GB RAM, 2-4 CPU cores, no GPU)
- Mid-range systems (4-8GB RAM, 4-8 CPU cores, optional GPU)
- High-end systems (8GB+ RAM, 6+ CPU cores, GPU required)
Verify tier recommendations align with research patterns and model mapping is logical.
</verification>
<success_criteria>
HardwareTierDetector successfully classifies systems into appropriate tiers, loads configuration correctly, integrates with ResourceMonitor, and provides accurate model recommendations based on detected capabilities.
</success_criteria>
<output>
After completion, create `.planning/phases/03-resource-management/03-02-SUMMARY.md`
</output>


@@ -0,0 +1,169 @@
---
phase: 03-resource-management
plan: 03
type: execute
wave: 2
depends_on: [03-01, 03-02]
files_modified: [src/resource/scaling.py, src/models/model_manager.py]
autonomous: true
user_setup: []
must_haves:
truths:
- "Proactive scaling prevents performance degradation before it impacts users"
- "Hybrid monitoring combines continuous checks with pre-flight validation"
- "Graceful degradation completes current tasks before model switching"
artifacts:
- path: "src/resource/scaling.py"
provides: "Proactive scaling algorithms with hybrid monitoring"
min_lines: 150
- path: "src/models/model_manager.py"
provides: "Enhanced model manager with proactive scaling integration"
contains: "ProactiveScaler"
min_lines: 650
key_links:
- from: "src/resource/scaling.py"
to: "src/models/resource_monitor.py"
via: "Resource monitoring for scaling decisions"
pattern: "ResourceMonitor"
- from: "src/resource/scaling.py"
to: "src/resource/tiers.py"
via: "Hardware tier-based scaling thresholds"
pattern: "HardwareTierDetector"
- from: "src/models/model_manager.py"
to: "src/resource/scaling.py"
via: "Proactive scaling integration"
pattern: "ProactiveScaler"
---
<objective>
Implement proactive scaling algorithms that combine continuous background monitoring with pre-flight checks to prevent performance degradation before it impacts users, with graceful degradation cascades and stabilization periods.
Purpose: Enable Mai to anticipate resource constraints and scale models proactively while maintaining smooth user experience.
Output: Proactive scaling system with hybrid monitoring, graceful degradation, and intelligent stabilization.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
# Enhanced components from previous plans
@src/models/resource_monitor.py
@src/resource/tiers.py
# Research-based scaling patterns
@.planning/phases/03-resource-management/03-RESEARCH.md
</context>
<tasks>
<task type="auto">
<name>Implement ProactiveScaler class</name>
<files>src/resource/scaling.py</files>
<action>Create the ProactiveScaler class implementing hybrid monitoring and proactive scaling:
1. **Hybrid Monitoring Architecture:**
- Continuous background monitoring thread/task
- Pre-flight checks before each model operation
- Resource trend analysis with configurable windows
- Performance metrics tracking (response times, failure rates)
2. **Proactive Scaling Logic:**
- Scale at 80% resource usage (configurable per tier)
- Consider overall system load context
- Implement stabilization periods (5 minutes for upgrades)
- Prevent thrashing with hysteresis
3. **Graceful Degradation Cascade:**
- Complete current task at lower quality
- Switch to smaller model after completion
- Notify user of capability changes
- Suggest resource optimizations
4. **Key Methods:**
- start_continuous_monitoring(): Background monitoring loop
- check_preflight_resources(): Quick validation before operations
- analyze_resource_trends(): Predictive scaling decisions
- initiate_graceful_degradation(): Controlled capability reduction
- should_upgrade_model(): Check if resources allow upgrade
5. **Integration Points:**
- Use enhanced ResourceMonitor for accurate metrics
- Use HardwareTierDetector for tier-specific thresholds
- Provide callbacks for model switching
- Log scaling decisions with context
Include proper async handling for background monitoring and thread-safe state management.</action>
<verify>python -c "from src.resource.scaling import ProactiveScaler; ps = ProactiveScaler(); print('ProactiveScaler initialized:', hasattr(ps, 'check_preflight_resources'))" confirms the class structure</verify>
<done>ProactiveScaler implements hybrid monitoring with graceful degradation</done>
</task>
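The core anti-thrashing idea — separate scale-up and scale-down thresholds plus a hold-off timer — in minimal form. The 80% and 5-minute figures come from the plan above; the 60% scale-down threshold and everything else here is illustrative:

```python
import time

class ProactiveScaler:
    def __init__(self, scale_up_pct: float = 80.0, scale_down_pct: float = 60.0,
                 stabilization_s: int = 300):
        # Distinct up/down thresholds give hysteresis, so the scaler
        # does not flap when usage hovers near a single cutoff.
        self.scale_up_pct = scale_up_pct
        self.scale_down_pct = scale_down_pct
        self.stabilization_s = stabilization_s  # 5-minute upgrade hold-off
        self._last_switch = 0.0

    def check_preflight_resources(self, usage_pct: float) -> bool:
        """Quick gate before a model operation; False means degrade first."""
        return usage_pct < self.scale_up_pct

    def should_upgrade_model(self, usage_pct: float) -> bool:
        calm = usage_pct < self.scale_down_pct
        settled = time.monotonic() - self._last_switch >= self.stabilization_s
        return calm and settled

    def record_switch(self) -> None:
        """Call after any model switch to restart the stabilization window."""
        self._last_switch = time.monotonic()
```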
<task type="auto">
<name>Integrate proactive scaling into ModelManager</name>
<files>src/models/model_manager.py</files>
<action>Enhance ModelManager to integrate proactive scaling:
1. **Add ProactiveScaler Integration:**
- Import and initialize ProactiveScaler in __init__
- Start continuous monitoring on initialization
- Pass resource monitor and tier detector references
2. **Enhance generate_response with Proactive Scaling:**
- Add pre-flight resource check before generation
- Implement graceful degradation if resources constrained
- Use proactive scaling recommendations for model selection
- Track performance metrics for scaling decisions
3. **Update Model Selection Logic:**
- Incorporate tier-based preferences
- Use scaling thresholds from HardwareTierDetector
- Factor in trend analysis predictions
- Apply stabilization periods for upgrades
4. **Add Resource-Constrained Handling:**
- Complete current response with smaller model if needed
- Switch models proactively based on scaling predictions
- Handle resource exhaustion gracefully
- Maintain conversation context through switches
5. **Performance Tracking:**
- Track response times and failure rates
- Monitor resource usage during generation
- Feed metrics back to ProactiveScaler
- Adjust scaling behavior based on observed performance
6. **Cleanup and Shutdown:**
- Stop continuous monitoring in shutdown()
- Clean up scaling state and resources
- Log scaling decisions and outcomes
Ensure backward compatibility and maintain silent switching behavior per Phase 1 decisions.</action>
<verify>python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print('Proactive scaling integrated:', hasattr(mm, '_proactive_scaler'))" confirms integration</verify>
<done>ModelManager integrates proactive scaling for intelligent resource management</done>
</task>
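How the pre-flight check and degradation cascade might thread through generate_response, using the ProactiveScaler sketched above (`_generate` and `_switch_to_smaller_model` are hypothetical stand-ins for ModelManager internals):

```python
def generate_response(self, prompt: str) -> str:
    usage = self._resource_monitor.get_current_resources().get("memory_percent", 0.0)
    degrade_after = not self._proactive_scaler.check_preflight_resources(usage)
    # Graceful degradation: the in-flight request always finishes first.
    reply = self._generate(prompt)  # hypothetical existing internal call
    if degrade_after:
        self._switch_to_smaller_model()  # hypothetical helper
        self._proactive_scaler.record_switch()
    return reply
```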
</tasks>
<verification>
Test proactive scaling behavior under various scenarios:
- Gradual resource increase (should detect and upgrade after stabilization)
- Sudden resource decrease (should immediately degrade gracefully)
- Stable resource usage (should not trigger unnecessary switches)
- Mixed workload patterns (should adapt scaling thresholds appropriately)
Verify stabilization periods prevent thrashing and graceful degradation maintains user experience.
</verification>
<success_criteria>
ProactiveScaler successfully combines continuous monitoring with pre-flight checks, implements graceful degradation cascades, respects stabilization periods, and integrates seamlessly with ModelManager for intelligent resource management.
</success_criteria>
<output>
After completion, create `.planning/phases/03-resource-management/03-03-SUMMARY.md`
</output>


@@ -0,0 +1,171 @@
---
phase: 03-resource-management
plan: 04
type: execute
wave: 2
depends_on: [03-01, 03-02]
files_modified: [src/resource/personality.py, src/models/model_manager.py]
autonomous: true
user_setup: []
must_haves:
truths:
- "Personality-driven communication engages users with resource discussions"
- "Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona is implemented"
- "Resource requests balance personality with helpful technical guidance"
artifacts:
- path: "src/resource/personality.py"
provides: "Personality-driven resource communication system"
min_lines: 100
- path: "src/models/model_manager.py"
provides: "Model manager with personality communication integration"
contains: "ResourcePersonality"
min_lines: 680
key_links:
- from: "src/resource/personality.py"
to: "src/models/model_manager.py"
via: "Personality communication for resource events"
pattern: "ResourcePersonality"
- from: "src/resource/personality.py"
to: "src/resource/scaling.py"
via: "Personality messages for scaling events"
pattern: "format_resource_request"
---
<objective>
Implement the "Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin" personality system for resource discussions, providing engaging communication about resource constraints, capability changes, and optimization suggestions.
Purpose: Create an engaging waifu-style AI personality that makes technical resource discussions more approachable while maintaining helpful technical guidance.
Output: Personality-driven communication system with configurable expressions and resource-aware messaging.
</objective>
<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
# Context-based personality requirements
@.planning/phases/03-resource-management/03-CONTEXT.md
# Research-based communication patterns
@.planning/phases/03-resource-management/03-RESEARCH.md
</context>
<tasks>
<task type="auto">
<name>Implement ResourcePersonality class</name>
<files>src/resource/personality.py</files>
<action>Create the ResourcePersonality class implementing the Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona:
1. **Persona Definition:**
- Drowsy: Slightly tired, laid-back tone
- Dere: Sweet/caring moments underneath
- Tsun: Abrasive exterior, defensive
- Onee-san: Mature, mentor-like attitude
- Hex-Mentor: Technical expertise in systems/resources
- Gremlin: Playful chaos, mischief
2. **Personality Patterns:**
- Resource requests: "Ugh, give me more resources if you wanna {suggestion}... *sigh* I guess I can try anyway."
- Downgrade notices: "Tch. Things are getting tough, so I had to downgrade a bit. Don't blame me if I'm slower!"
- Upgrade notifications: "Heh, finally got some breathing room. Maybe I can actually think properly now."
- Technical tips: Optional detailed explanations for users who want to learn
3. **Key Methods:**
- format_resource_request(constraint, suggestion): Generate personality-driven resource requests
- format_downgrade_notice(from_model, to_model, reason): Notify capability reductions
- format_upgrade_notice(to_model): Inform of capability improvements
- format_technical_tip(constraint, actionable_advice): Optional technical guidance
- should_show_technical_details(): Context-aware decision about detail level
4. **Emotion State Management:**
- Track current mood based on resource situation
- Adjust tone based on constraint severity
- Show dere moments when resources are plentiful
- Increase tsun tendencies when constrained
5. **Message Templates:**
- Configurable message templates for different scenarios
- Personality variations for different constraint types
- Localizable structure for future language support
6. **Context Awareness:**
- Consider user's technical expertise level
- Adjust complexity of explanations
- Remember previous interactions for consistency
Include comprehensive documentation of the persona's characteristics and communication patterns.</action>
<verify>python -c "from src.resource.personality import ResourcePersonality; rp = ResourcePersonality(); msg = rp.format_resource_request('memory', 'run complex analysis'); print('Personality message:', msg)" generates personality-driven messages</verify>
<done>ResourcePersonality implements Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona</done>
</task>
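A tone-only sketch — the template strings are adapted from the patterns above, while mood tracking and context awareness are left out:

```python
import random

class ResourcePersonality:
    REQUEST_TEMPLATES = [
        "Ugh, give me more {constraint} if you wanna {suggestion}... "
        "*sigh* I guess I can try anyway.",
    ]
    DOWNGRADE_TEMPLATES = [
        "Tch. Things are getting tough, so I had to drop from {from_model} "
        "to {to_model}. Don't blame me if I'm slower!",
    ]

    def format_resource_request(self, constraint: str, suggestion: str) -> str:
        return random.choice(self.REQUEST_TEMPLATES).format(
            constraint=constraint, suggestion=suggestion)

    def format_downgrade_notice(self, from_model: str, to_model: str, reason: str) -> str:
        notice = random.choice(self.DOWNGRADE_TEMPLATES).format(
            from_model=from_model, to_model=to_model)
        return f"{notice} ({reason})"
```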
<task type="auto">
<name>Integrate personality communication into ModelManager</name>
<files>src/models/model_manager.py</files>
<action>Enhance ModelManager to integrate personality-driven communication:
1. **Add Personality Integration:**
- Import and initialize ResourcePersonality in __init__
- Add personality communication to model switching logic
- Connect personality to scaling events
2. **Enhance Model Switching with Personality:**
- Use personality for capability downgrade notifications
- Send personality messages for significant resource constraints
- Provide optional technical tips for optimization
- Maintain silent switching for upgrades (per Phase 1 decisions)
3. **Add Resource Constraint Communication:**
- Generate personality messages when significantly constrained
- Offer helpful suggestions with personality flair
- Include optional technical details for interested users
- Track user response patterns for future improvements
4. **Context-Aware Communication:**
- Consider conversation context when deciding message tone
- Adjust personality intensity based on interaction history
- Provide technical tips only when appropriate
- Balance engagement with usefulness
5. **Integration Points:**
- Connect to ProactiveScaler for scaling event notifications
- Use ResourceMonitor metrics for constraint detection
- Leverage HardwareTierDetector for tier-appropriate suggestions
- Maintain conversation context through personality interactions
6. **Message Delivery:**
- Return personality messages alongside regular responses
- Separate personality messages from core functionality
- Allow users to disable personality if desired
- Log personality interactions for analysis
Ensure the personality enhances core functionality rather than interfering with it, and that it maintains the helpful technical guidance expected from a mentor-like figure.</action>
<verify>python -c "from src.models.model_manager import ModelManager; mm = ModelManager(); print('Personality integrated:', hasattr(mm, '_personality'))" confirms personality integration</verify>
<done>ModelManager integrates personality communication for engaging resource discussions</done>
</task>
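One way the delivery rule could look — personality text kept separate from the core reply and gated by a user setting (the dict shape and `personality_enabled` flag are assumptions):

```python
def _attach_personality(self, reply: str, event_msg: str | None) -> dict:
    out = {"response": reply}
    if event_msg and self.config.get("personality_enabled", True):
        out["personality_note"] = event_msg  # never mixed into the core reply
    return out
```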
</tasks>
<verification>
Test personality communication across different scenarios:
- Resource constraints with appropriate personality expressions
- Capability downgrades with tsun-heavy notices
- Resource improvements with subtle dere moments
- Technical tips that balance simplicity with useful information
Verify personality maintains consistency, enhances user engagement without being overwhelming, and provides genuinely helpful guidance.
</verification>
<success_criteria>
ResourcePersonality successfully implements the Drowsy Dere-Tsun Onee-san Hex-Mentor Gremlin persona with appropriate emotional range, context-aware communication, and helpful technical guidance that enhances user engagement with resource management.
</success_criteria>
<output>
After completion, create `.planning/phases/03-resource-management/03-04-SUMMARY.md`
</output>