---
phase: 03-resource-management
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [pyproject.toml, src/models/resource_monitor.py]
autonomous: true
user_setup: []
must_haves:
  truths:
    - "Enhanced resource monitor can detect NVIDIA GPU VRAM using pynvml"
    - "GPU detection falls back gracefully when GPU unavailable"
    - "Resource monitoring remains cross-platform compatible"
  artifacts:
    - path: "src/models/resource_monitor.py"
      provides: "Enhanced GPU detection with pynvml support"
      contains: "pynvml"
      min_lines: 250
    - path: "pyproject.toml"
      provides: "pynvml dependency for GPU monitoring"
      contains: "pynvml"
  key_links:
    - from: "src/models/resource_monitor.py"
      to: "pynvml library"
      via: "import pynvml"
      pattern: "import pynvml"
---

## Objective

Enhance GPU detection and monitoring by integrating pynvml for precise NVIDIA GPU VRAM tracking while maintaining cross-platform compatibility and graceful fallbacks.

Purpose: Provide accurate GPU resource detection for intelligent model selection and proactive scaling decisions.

Output: Enhanced ResourceMonitor with reliable GPU VRAM monitoring across different hardware configurations.

## Context

@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Current implementation
@src/models/resource_monitor.py
@pyproject.toml

## Tasks

### Task 1: Add pynvml dependency to pyproject.toml

Files: pyproject.toml

Add `pynvml>=11.0.0` to the main `dependencies` array in pyproject.toml. This makes NVIDIA GPU monitoring available by default rather than as an optional extra.

Verify: `grep -n "pynvml" pyproject.toml` shows the dependency added correctly.

Done when: the pynvml dependency is available for GPU monitoring.

### Task 2: Enhance ResourceMonitor with pynvml GPU detection

Files: src/models/resource_monitor.py

Rework the `_get_gpu_memory()` method to use pynvml for precise NVIDIA GPU VRAM detection:

1. Add the `pynvml` import at the top of the file.
2. Replace the current `_get_gpu_memory()` implementation with pynvml-based detection:
   - Initialize pynvml with proper error handling.
   - Get the GPU handle and memory info via the pynvml APIs.
   - Return total, used, and free VRAM in GB.
   - Handle `pynvml.NVMLError` gracefully and fall back to the existing gpu-tracker logic.
   - Ensure `pynvml.nvmlShutdown()` is always called in a `finally` block.
3. Update `get_current_resources()` to include detailed GPU info:
   - `gpu_total_vram_gb`: total VRAM capacity
   - `gpu_used_vram_gb`: currently used VRAM
   - `gpu_free_vram_gb`: available VRAM
   - `gpu_utilization_percent`: GPU utilization (if available)
4. Add GPU temperature monitoring if available via pynvml.
5. Maintain backward compatibility with the existing return format.

The enhanced GPU detection should:

- Try pynvml first for NVIDIA GPUs.
- Fall back to gpu-tracker for other vendors.
- Return zero values if no GPU is detected.
- Handle all exceptions gracefully.
- Log GPU detection results at debug level.

A hedged sketch of this detection path follows the testing section below.

Verify: `python -c "from src.models.resource_monitor import ResourceMonitor; rm = ResourceMonitor(); resources = rm.get_current_resources(); print('GPU detection:', {k: v for k, v in resources.items() if 'gpu' in k})"` returns GPU metrics without errors.

Done when: ResourceMonitor provides accurate GPU VRAM monitoring using pynvml with proper fallbacks.

## Testing

Test enhanced resource monitoring across different configurations:

- Systems with NVIDIA GPUs (pynvml should work)
- Systems with AMD/Intel GPUs (fallback to gpu-tracker)
- Systems without GPUs (graceful zero values)
- Cross-platform compatibility (Linux, Windows, macOS)

Verify monitoring overhead remains below 1% CPU usage.
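For reference, a minimal sketch of the detection path described in Task 2, assuming a module-level function for clarity; the real change lives inside ResourceMonitor. The `_get_gpu_memory_fallback` helper and the exact metric key names are assumptions for illustration, while the pynvml calls themselves (`nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetMemoryInfo`, `nvmlDeviceGetUtilizationRates`, `nvmlDeviceGetTemperature`, `nvmlShutdown`) are the library's standard API.

```python
import logging

log = logging.getLogger(__name__)


def _get_gpu_memory() -> dict:
    """Return GPU VRAM metrics in GB, preferring pynvml for NVIDIA GPUs."""
    zeros = {
        "gpu_total_vram_gb": 0.0,
        "gpu_used_vram_gb": 0.0,
        "gpu_free_vram_gb": 0.0,
        "gpu_utilization_percent": 0.0,
    }
    try:
        import pynvml
    except ImportError:
        log.debug("pynvml not installed; using fallback GPU detection")
        return _get_gpu_memory_fallback(zeros)

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        # No NVIDIA driver or GPU: hand off to the non-NVIDIA fallback.
        log.debug("NVML init failed (%s); using fallback GPU detection", exc)
        return _get_gpu_memory_fallback(zeros)

    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values are in bytes
        gib = 1024 ** 3
        metrics = {
            "gpu_total_vram_gb": mem.total / gib,
            "gpu_used_vram_gb": mem.used / gib,
            "gpu_free_vram_gb": mem.free / gib,
        }
        try:
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            metrics["gpu_utilization_percent"] = float(util.gpu)
        except pynvml.NVMLError:
            metrics["gpu_utilization_percent"] = 0.0
        try:
            metrics["gpu_temperature_c"] = float(
                pynvml.nvmlDeviceGetTemperature(
                    handle, pynvml.NVML_TEMPERATURE_GPU
                )
            )
        except pynvml.NVMLError:
            pass  # temperature is optional; omit when unsupported
        log.debug("pynvml GPU detection: %s", metrics)
        return metrics
    except pynvml.NVMLError as exc:
        log.debug("NVML query failed (%s); using fallback GPU detection", exc)
        return _get_gpu_memory_fallback(zeros)
    finally:
        # Always release NVML, per the finally-block requirement in Task 2.
        pynvml.nvmlShutdown()


def _get_gpu_memory_fallback(zeros: dict) -> dict:
    """Hypothetical stand-in for the existing gpu-tracker path."""
    return zeros
```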
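One design note on the sketch: calling `nvmlInit()`/`nvmlShutdown()` on every poll is the simplest way to satisfy the finally-block requirement, but if profiling shows it threatens the sub-1% CPU overhead target, initializing NVML once in `ResourceMonitor.__init__` and shutting down on teardown is a reasonable alternative.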
## Success Criteria

ResourceMonitor detects and reports GPU VRAM using pynvml when available, falls back gracefully to other methods, maintains cross-platform compatibility, and provides detailed GPU metrics for intelligent model selection.

## Output

After completion, create `.planning/phases/03-resource-management/03-01-SUMMARY.md`.