Mai/.planning/phases/03-resource-management/03-01-PLAN.md
Mai Development 1e071398ff
docs(03): create phase plan
Phase 3: Resource Management
- 4 plan(s) in 2 wave(s)
- 2 parallel, 2 sequential
- Ready for execution
2026-01-27 17:58:09 -05:00

phase: 03-resource-management
plan: 01
type: execute
wave: 1
depends_on: (empty)
files_modified:
  • pyproject.toml
  • src/models/resource_monitor.py
autonomous: true
user_setup: (empty)
must_haves:
  truths:
    • Enhanced resource monitor can detect NVIDIA GPU VRAM using pynvml
    • GPU detection falls back gracefully when GPU unavailable
    • Resource monitoring remains cross-platform compatible
  artifacts:
    • path: src/models/resource_monitor.py
      provides: Enhanced GPU detection with pynvml support
      contains: pynvml
      min_lines: 250
    • path: pyproject.toml
      provides: pynvml dependency for GPU monitoring
      contains: pynvml
  key_links:
    • from: src/models/resource_monitor.py
      to: pynvml library
      via: import pynvml
      pattern: import pynvml

Enhance GPU detection and monitoring capabilities by integrating pynvml for precise NVIDIA GPU VRAM tracking while maintaining cross-platform compatibility and graceful fallbacks.

Purpose: Provide accurate GPU resource detection for intelligent model selection and proactive scaling decisions.

Output: Enhanced ResourceMonitor with reliable GPU VRAM monitoring across different hardware configurations.

<execution_context> @/.opencode/get-shit-done/workflows/execute-plan.md @/.opencode/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md

Current implementation

@src/models/resource_monitor.py @pyproject.toml

Task: Add pynvml dependency to project

Files: pyproject.toml

Action: Add pynvml>=11.0.0 to the main dependencies array in pyproject.toml. This ensures NVIDIA GPU monitoring capabilities are available by default rather than being optional.

Verify: grep -n "pynvml" pyproject.toml shows the dependency added correctly

Done: pynvml dependency is available for GPU monitoring

Task: Enhance ResourceMonitor with pynvml GPU detection

Files: src/models/resource_monitor.py

Action: Enhance the _get_gpu_memory() method to use pynvml for precise NVIDIA GPU VRAM detection:
  1. Add pynvml import at the top of the file
  2. Replace the current _get_gpu_memory() implementation with pynvml-based detection:
    • Initialize pynvml with proper error handling
    • Get GPU handle and memory info using pynvml APIs
    • Return total, used, and free VRAM in GB
    • Handle NVMLError gracefully and fall back to the existing gpu-tracker logic
    • Ensure pynvml.nvmlShutdown() is always called in a finally block
  3. Update get_current_resources() to include detailed GPU info:
    • gpu_total_vram_gb: Total VRAM capacity
    • gpu_used_vram_gb: Currently used VRAM
    • gpu_free_vram_gb: Available VRAM
    • gpu_utilization_percent: GPU utilization (if available)
  4. Add GPU temperature monitoring if available via pynvml
  5. Maintain backward compatibility with existing return format
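
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `_get_gpu_memory()`: the function name `query_nvidia_gpu`, the `gpu_temperature_c` key, and the convention of returning `None` to signal "fall back to the existing gpu-tracker logic" are assumptions made for the example. The pynvml calls themselves (`nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetMemoryInfo`, `nvmlDeviceGetUtilizationRates`, `nvmlDeviceGetTemperature`, `nvmlShutdown`) are the library's standard NVML bindings.

```python
import logging

try:
    import pynvml  # optional at import time so non-NVIDIA systems still load
except ImportError:
    pynvml = None

logger = logging.getLogger(__name__)
_BYTES_PER_GB = 1024 ** 3


def query_nvidia_gpu(index=0):
    """Return VRAM metrics for one NVIDIA GPU, or None if pynvml cannot be used.

    Hypothetical helper: a None result tells the caller to fall back to
    another detection method (for example the existing gpu-tracker logic).
    """
    if pynvml is None:
        logger.debug("pynvml not installed; skipping NVML detection")
        return None

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as exc:
        # Raised when no NVIDIA driver or NVML shared library is present.
        logger.debug("NVML initialisation failed: %s", exc)
        return None

    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values are in bytes
        info = {
            "gpu_total_vram_gb": mem.total / _BYTES_PER_GB,
            "gpu_used_vram_gb": mem.used / _BYTES_PER_GB,
            "gpu_free_vram_gb": mem.free / _BYTES_PER_GB,
            "gpu_utilization_percent": None,
            "gpu_temperature_c": None,  # assumed key name, not from the plan
        }
        # Utilization and temperature are optional: some devices do not report them.
        try:
            info["gpu_utilization_percent"] = float(
                pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
            )
        except pynvml.NVMLError:
            pass
        try:
            info["gpu_temperature_c"] = float(
                pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            )
        except pynvml.NVMLError:
            pass
        logger.debug("NVML GPU %d: %s", index, info)
        return info
    except pynvml.NVMLError as exc:
        logger.debug("NVML query failed: %s", exc)
        return None
    finally:
        # Always release NVML once nvmlInit() has succeeded.
        try:
            pynvml.nvmlShutdown()
        except pynvml.NVMLError:
            pass
```

Keeping the optional metrics in their own `try` blocks means a device that does not expose utilization or temperature still reports its VRAM figures.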

The enhanced GPU detection should:

  • Try pynvml first for NVIDIA GPUs
  • Fall back to gpu-tracker for other vendors
  • Return 0 values if no GPU detected
  • Handle all exceptions gracefully
  • Log GPU detection results at debug level

Verify: python -c "from src.models.resource_monitor import ResourceMonitor; rm = ResourceMonitor(); resources = rm.get_current_resources(); print('GPU detection:', {k: v for k, v in resources.items() if 'gpu' in k})" returns GPU metrics without errors

Done: ResourceMonitor provides accurate GPU VRAM monitoring using pynvml with proper fallbacks

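
The fallback order described above for GPU detection (pynvml first, then gpu-tracker, then zero values) could be expressed as an ordered chain of detector callables. The sketch below is one possible shape under stated assumptions: `detect_gpu`, `ZERO_GPU_INFO`, and the idea of passing detectors as plain functions are hypothetical and not taken from the existing ResourceMonitor. The gpu-tracker call is deliberately left out because its API is not shown in this plan.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)

GpuInfo = Dict[str, float]
Detector = Callable[[], Optional[GpuInfo]]

# Values reported when no detector finds a GPU.
ZERO_GPU_INFO: GpuInfo = {
    "gpu_total_vram_gb": 0.0,
    "gpu_used_vram_gb": 0.0,
    "gpu_free_vram_gb": 0.0,
}


def detect_gpu(detectors: Iterable[Detector]) -> GpuInfo:
    """Try each detector in order and return the first non-empty result."""
    for detector in detectors:
        name = getattr(detector, "__name__", repr(detector))
        try:
            info = detector()
        except Exception as exc:  # broad on purpose: detection must never raise
            logger.debug("GPU detector %s failed: %s", name, exc)
            continue
        if info:
            logger.debug("GPU detected via %s: %s", name, info)
            # Fill any keys the detector did not report with zeros.
            return {**ZERO_GPU_INFO, **info}
        logger.debug("GPU detector %s found nothing", name)
    logger.debug("No GPU detected; reporting zero values")
    return dict(ZERO_GPU_INFO)
```

A caller would pass the detectors in priority order, for example an NVML-based function followed by a wrapper around the gpu-tracker logic, so adding a vendor later only means adding one more callable to the list.
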
Test enhanced resource monitoring across different configurations:

  • Systems with NVIDIA GPUs (pynvml should work)
  • Systems with AMD/Intel GPUs (fallback to gpu-tracker)
  • Systems without GPUs (graceful zero values)
  • Cross-platform compatibility (Linux, Windows, macOS)

Verify monitoring overhead remains < 1% CPU usage.
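
One way to check the overhead target is to compare the CPU time the process consumes while polling against the elapsed wall-clock time. The helper below is a rough sketch using only the standard library; `measure_overhead_percent` and its parameters are hypothetical names, and the result is a percentage of a single core rather than of the whole machine.

```python
import time
from typing import Callable


def measure_overhead_percent(
    poll: Callable[[], object],
    interval_s: float = 1.0,
    duration_s: float = 10.0,
) -> float:
    """Call poll() every interval_s seconds and report CPU time as % of wall time."""
    cpu_start = time.process_time()   # CPU seconds used by this process
    wall_start = time.perf_counter()  # elapsed real time
    while time.perf_counter() - wall_start < duration_s:
        poll()
        time.sleep(interval_s)  # sleeping does not count towards process_time
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return 100.0 * cpu_used / wall_used
```

In practice `poll` would be something like `rm.get_current_resources` on a ResourceMonitor instance, with `interval_s` set to the monitoring interval the application actually uses, since the measured percentage depends directly on how often the monitor is polled.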

<success_criteria> ResourceMonitor successfully detects and reports GPU VRAM using pynvml when available, falls back gracefully to other methods, maintains cross-platform compatibility, and provides detailed GPU metrics for intelligent model selection. </success_criteria>

After completion, create `.planning/phases/03-resource-management/03-01-SUMMARY.md`