Phase 3: Resource Management - 4 plan(s) in 2 wave(s) - 2 parallel, 2 sequential - Ready for execution
| phase | plan | type | wave | depends_on | files_modified | autonomous | user_setup | must_haves |
|---|---|---|---|---|---|---|---|---|
| 03-resource-management | 01 | execute | 1 | | | true | | |
Purpose: Provide accurate GPU resource detection for intelligent model selection and proactive scaling decisions. Output: Enhanced ResourceMonitor with reliable GPU VRAM monitoring across different hardware configurations.
<execution_context>
@/.opencode/get-shit-done/workflows/execute-plan.md
@/.opencode/get-shit-done/templates/summary.md
</execution_context>
Current implementation
@src/models/resource_monitor.py @pyproject.toml
Task 1: Add pynvml dependency to pyproject.toml
Add pynvml>=11.0.0 to the main dependencies array in pyproject.toml. This ensures NVIDIA GPU monitoring capabilities are available by default rather than being optional.
Verify: grep -n "pynvml" pyproject.toml shows the dependency added correctly.
Done: pynvml dependency is available for GPU monitoring.

Task 2: Enhance ResourceMonitor with pynvml GPU detection
Files: src/models/resource_monitor.py
Enhance the _get_gpu_memory() method to use pynvml for precise NVIDIA GPU VRAM detection:
- Add the pynvml import at the top of the file
- Replace the current _get_gpu_memory() implementation with pynvml-based detection:
  - Initialize pynvml with proper error handling
  - Get the GPU handle and memory info using the pynvml APIs
  - Return total, used, and free VRAM in GB
  - Handle NVMLError gracefully and fall back to the existing gpu-tracker logic
  - Ensure nvmlShutdown() is always called in a finally block
- Update get_current_resources() to include detailed GPU info (see the sketches after this list):
  - gpu_total_vram_gb: total VRAM capacity
  - gpu_used_vram_gb: currently used VRAM
  - gpu_free_vram_gb: available VRAM
  - gpu_utilization_percent: GPU utilization (if available)
- Add GPU temperature monitoring if available via pynvml
- Maintain backward compatibility with the existing return format
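A minimal sketch of the pynvml path, shown as module-level functions for brevity (in the real class these would be ResourceMonitor methods). It assumes the existing gpu-tracker logic sits behind a helper named _gpu_memory_from_gpu_tracker(), which is a hypothetical name; the actual layout in src/models/resource_monitor.py may differ.

```python
import logging

try:
    import pynvml  # provided by the pynvml dependency added in Task 1
except ImportError:
    pynvml = None

logger = logging.getLogger(__name__)


def _gpu_memory_from_gpu_tracker() -> dict:
    """Placeholder for the existing gpu-tracker fallback; returns zeros here."""
    return {
        "gpu_total_vram_gb": 0.0,
        "gpu_used_vram_gb": 0.0,
        "gpu_free_vram_gb": 0.0,
        "gpu_utilization_percent": 0.0,
    }


def _get_gpu_memory() -> dict:
    """Return GPU VRAM figures in GB, preferring pynvml for NVIDIA GPUs."""
    if pynvml is None:
        return _gpu_memory_from_gpu_tracker()

    initialized = False
    try:
        pynvml.nvmlInit()
        initialized = True
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU only
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)   # bytes: total / used / free
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        # Temperature could be added via
        # pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU).
        gib = 1024 ** 3
        result = {
            "gpu_total_vram_gb": mem.total / gib,
            "gpu_used_vram_gb": mem.used / gib,
            "gpu_free_vram_gb": mem.free / gib,
            "gpu_utilization_percent": float(util.gpu),
        }
        logger.debug("pynvml GPU detection: %s", result)
        return result
    except pynvml.NVMLError as exc:
        logger.debug("pynvml failed (%s); falling back to gpu-tracker", exc)
        return _gpu_memory_from_gpu_tracker()
    finally:
        if initialized:
            pynvml.nvmlShutdown()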
The enhanced GPU detection should:
- Try pynvml first for NVIDIA GPUs
- Fall back to gpu-tracker for other vendors
- Return 0 values if no GPU detected
- Handle all exceptions gracefully
- Log GPU detection results at debug level

Verify: `python -c "from src.models.resource_monitor import ResourceMonitor; rm = ResourceMonitor(); resources = rm.get_current_resources(); print('GPU detection:', {k: v for k, v in resources.items() if 'gpu' in k})"` returns GPU metrics without errors.
Done: ResourceMonitor provides accurate GPU VRAM monitoring using pynvml with proper fallbacks.
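To keep the existing return format intact, the GPU figures can simply be merged into whatever get_current_resources() already returns. A skeleton of that integration, where _collect_base_resources() is a hypothetical stand-in for the current CPU/RAM collection logic:

```python
class ResourceMonitor:
    # ...existing __init__ and collection logic elided...

    def get_current_resources(self) -> dict:
        """Existing resource snapshot, extended with the gpu_* keys."""
        resources = self._collect_base_resources()  # hypothetical existing CPU/RAM helper
        resources.update(self._get_gpu_memory())    # adds gpu_* keys, zeros when no GPU
        return resources
```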
Verify monitoring overhead remains < 1% CPU usage.
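One way to sanity-check that target, assuming ResourceMonitor is polled roughly once per second in production (an assumption; adjust poll_interval_s to the real cadence):

```python
import time

from src.models.resource_monitor import ResourceMonitor


def estimate_overhead(samples: int = 100, poll_interval_s: float = 1.0) -> float:
    """Estimate the fraction of one CPU core consumed by resource polling."""
    rm = ResourceMonitor()
    start_cpu = time.process_time()
    for _ in range(samples):
        rm.get_current_resources()
    cpu_per_call = (time.process_time() - start_cpu) / samples
    return cpu_per_call / poll_interval_s


if __name__ == "__main__":
    print(f"Estimated monitoring overhead: {estimate_overhead():.2%} of one CPU core")
```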
<success_criteria> ResourceMonitor successfully detects and reports GPU VRAM using pynvml when available, falls back gracefully to other methods, maintains cross-platform compatibility, and provides detailed GPU metrics for intelligent model selection. </success_criteria>
After completion, create `.planning/phases/03-resource-management/03-01-SUMMARY.md`