Mai/.planning/phases/03-resource-management/03-01-PLAN.md

---
phase: 03-resource-management
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [pyproject.toml, src/models/resource_monitor.py]
autonomous: true
user_setup: []

must_haves:
  truths:
    - "Enhanced resource monitor can detect NVIDIA GPU VRAM using pynvml"
    - "GPU detection falls back gracefully when GPU unavailable"
    - "Resource monitoring remains cross-platform compatible"
  artifacts:
    - path: "src/models/resource_monitor.py"
      provides: "Enhanced GPU detection with pynvml support"
      contains: "pynvml"
      min_lines: 250
    - path: "pyproject.toml"
      provides: "pynvml dependency for GPU monitoring"
      contains: "pynvml"
  key_links:
    - from: "src/models/resource_monitor.py"
      to: "pynvml library"
      via: "import pynvml"
      pattern: "import pynvml"
---

<objective>
Enhance GPU detection and monitoring capabilities by integrating pynvml for precise NVIDIA GPU VRAM tracking while maintaining cross-platform compatibility and graceful fallbacks.

Purpose: Provide accurate GPU resource detection for intelligent model selection and proactive scaling decisions.
Output: Enhanced ResourceMonitor with reliable GPU VRAM monitoring across different hardware configurations.
</objective>

<execution_context>
@~/.opencode/get-shit-done/workflows/execute-plan.md
@~/.opencode/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md

# Current implementation
@src/models/resource_monitor.py
@pyproject.toml
</context>

<tasks>

<task type="auto">
  <name>Add pynvml dependency to project</name>
  <files>pyproject.toml</files>
  <action>Add pynvml>=11.0.0 to the main dependencies array in pyproject.toml. This ensures NVIDIA GPU monitoring capabilities are available by default rather than being optional.</action>
  <verify>grep -n "pynvml" pyproject.toml shows the dependency added correctly</verify>
  <done>pynvml dependency is available for GPU monitoring</done>
</task>

<task type="auto">
  <name>Enhance ResourceMonitor with pynvml GPU detection</name>
  <files>src/models/resource_monitor.py</files>
  <action>
Enhance the _get_gpu_memory() method to use pynvml for precise NVIDIA GPU VRAM detection:

1. Add pynvml import at the top of the file
2. Replace the current _get_gpu_memory() implementation with pynvml-based detection:
   - Initialize pynvml with proper error handling
   - Get GPU handle and memory info using pynvml APIs
   - Return total, used, and free VRAM in GB
   - Handle NVMLError gracefully and fall back to the existing gpu-tracker logic
   - Ensure pynvml.nvmlShutdown() is always called in a finally block once nvmlInit() has succeeded
3. Update get_current_resources() to include detailed GPU info:
   - gpu_total_vram_gb: Total VRAM capacity
   - gpu_used_vram_gb: Currently used VRAM
   - gpu_free_vram_gb: Available VRAM
   - gpu_utilization_percent: GPU utilization (if available)
4. Add GPU temperature monitoring if available via pynvml
5. Maintain backward compatibility with existing return format

The enhanced GPU detection should:
- Try pynvml first for NVIDIA GPUs
- Fall back to gpu-tracker for other vendors
- Return zero values if no GPU is detected
- Handle all exceptions gracefully
- Log GPU detection results at debug level
</action>
  <verify>python -c "from src.models.resource_monitor import ResourceMonitor; rm = ResourceMonitor(); resources = rm.get_current_resources(); print('GPU detection:', {k: v for k, v in resources.items() if 'gpu' in k})" returns GPU metrics without errors</verify>
  <done>ResourceMonitor provides accurate GPU VRAM monitoring using pynvml with proper fallbacks</done>
</task>

</tasks>

<verification>
Test enhanced resource monitoring across different configurations:
- Systems with NVIDIA GPUs (pynvml should work)
- Systems with AMD/Intel GPUs (fallback to gpu-tracker)
- Systems without GPUs (graceful zero values)
- Cross-platform compatibility (Linux, Windows, macOS)

Verify monitoring overhead remains < 1% CPU usage.
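
One rough way to check the overhead target is to compare the process CPU time consumed by a polling loop with the wall-clock time it spans. The sketch below uses only the standard library; `measure_cpu_overhead`, the polling interval, and the sample count are assumptions for illustration, and the result covers only CPU time charged to the Python process itself.

```python
import time


def measure_cpu_overhead(poll, interval_s=1.0, samples=10):
    """Return the fraction of wall-clock time this process spent on CPU
    while calling poll() once per interval (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(samples):
        poll()
        time.sleep(interval_s)  # sleeping does not count as CPU time
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


if __name__ == "__main__":
    # Replace the lambda with the real call, e.g. rm.get_current_resources.
    fraction = measure_cpu_overhead(lambda: None, interval_s=0.1, samples=5)
    print(f"monitoring overhead: {fraction:.2%} of one core")
```

The measured fraction depends heavily on the polling interval, so the check is only meaningful when run at the interval the monitor uses in practice.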
</verification>

<success_criteria>
ResourceMonitor successfully detects and reports GPU VRAM using pynvml when available, falls back gracefully to other methods, maintains cross-platform compatibility, and provides detailed GPU metrics for intelligent model selection.
</success_criteria>

<output>
After completion, create `.planning/phases/03-resource-management/03-01-SUMMARY.md`
</output>