Rosie/PROJECT_PLAN.md
Dani a657979bfd Initial commit: Desktop Waifu MVP foundation
- Project structure with modular architecture
- State management system (emotion states, conversation history)
- Transparent, draggable PyQt6 window
- OpenGL rendering widget with placeholder cube
- Discord bot framework (commands, event handlers)
- Complete documentation (README, project plan, research findings)
- Environment configuration template
- Dependencies defined in requirements.txt

MVP features working:
- Transparent window appears at bottom-right
- Window can be dragged around
- Placeholder 3D cube renders and rotates
- Emotion state changes on interaction
- Event-driven state management

Next steps: VRM model loading and rendering
2025-09-30 18:42:54 -04:00


Desktop Waifu Project

Overview

A desktop companion application driven by an LLM that can interact both on the desktop and on Discord.

Tech Stack

  • Language: Python
  • Character Model: VRM format
  • LLM: Local (model TBD)
  • Distribution: .exe packaging
  • Platforms: Desktop app + Discord bot

Core Features

Desktop Visuals

  • VRM model rendering in transparent window
  • Draggable character
  • Sound effects on interaction (squeaks, touch sounds)
  • Multiple poses/expressions controlled by LLM
  • Always on top unless asked to hide
  • VRM animations (TBD - see notes below)

AI & Interaction

  • Local LLM (custom/TBD)
  • Memory/context persistence
  • AI-chosen personality
  • Always-on chat interface
  • Voice input (STT)
  • Voice output (TTS - local)
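
Memory/context persistence could start as a rolling JSON log that is reloaded between sessions and flattened into a prompt prefix for the local LLM. A minimal sketch (the class name, file name, and turn cap here are illustrative, not part of the plan):

```python
import json
from pathlib import Path


class ConversationMemory:
    """Rolling conversation history persisted to disk between sessions."""

    def __init__(self, path="memory.json", max_turns=50):
        self.path = Path(path)
        self.max_turns = max_turns
        self.turns = []
        if self.path.exists():
            self.turns = json.loads(self.path.read_text())

    def add(self, role, text):
        self.turns.append({"role": role, "text": text})
        # Keep only the most recent turns so the LLM context stays bounded.
        self.turns = self.turns[-self.max_turns:]

    def save(self):
        self.path.write_text(json.dumps(self.turns))

    def as_context(self):
        # Flatten history into a prompt prefix for the local LLM.
        return "\n".join(f"{t['role']}: {t['text']}" for t in self.turns)
```

The post-MVP plan mentions a database for memory; this file-backed version is a stand-in with the same interface a database-backed version could keep.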

Discord Integration

  • Respond in servers/channels and DMs
  • Proactive messaging
  • Desktop-Discord state sync

System Integration

  • System access (notifications, apps, searches, etc.)
  • Designed for cross-platform deployment
  • Future: OS-level integration

VRM Animation Notes

VRM models support:

  • Blend shapes (facial expressions): smile, blink, surprised, angry, sad, etc.
  • Bone animations: waving, pointing, head tilts, body movements
  • Presets: animation clips embedded in the VRM file, if the model includes them
  • IK (Inverse Kinematics): dynamic movements like looking at cursor

We can trigger these based on:

  • LLM emotional state (happy → smile + wave)
  • User interaction (grabbed → surprised expression + squeak)
  • Idle states (occasional blinks, breathing animation)
  • Context (thinking → hand on chin pose)
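
The triggers above amount to a lookup from emotional state to VRM outputs. A sketch of that mapping (blend-shape names like "joy" and "sorrow" follow common VRM expression presets, but the real names and clips depend on the model file; the sound file is hypothetical):

```python
# Hypothetical mapping from LLM emotional state to VRM outputs.
EMOTION_MAP = {
    "happy":     {"blend_shape": "joy",       "clip": "wave",         "sound": None},
    "grabbed":   {"blend_shape": "surprised", "clip": None,           "sound": "squeak.wav"},
    "thinking":  {"blend_shape": "neutral",   "clip": "hand_on_chin", "sound": None},
    "sad":       {"blend_shape": "sorrow",    "clip": None,           "sound": None},
}


def outputs_for(emotion):
    # Fall back to a neutral idle pose for unknown or idle states.
    return EMOTION_MAP.get(
        emotion, {"blend_shape": "neutral", "clip": "idle", "sound": None}
    )
```

Idle blinking and breathing would run on a timer on top of whatever this table selects.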

MVP (Phase 1)

  1. VRM model renders on screen
  2. Transparent, draggable window
  3. Basic sound on interaction
  4. Simple chat interface (text only)
  5. Basic LLM connection (local)
  6. Simple expression changes (happy/neutral/sad)
  7. Discord bot integration (respond in servers/DMs, basic sync with desktop)

Post-MVP Features

  • Voice I/O (STT/TTS)
  • Advanced animations
  • Full system integration (notifications, app control, etc.)
  • Memory persistence (database)
  • Proactive messaging
  • .exe packaging
  • Cross-platform support

Architecture

Components

  1. VRM Renderer (PyOpenGL + VRM loader)
  2. LLM Backend (local inference)
  3. Audio System (TTS, STT, sound effects)
  4. Discord Client (discord.py)
  5. State Manager (sync between desktop/Discord)
  6. System Interface (OS interactions)
  7. GUI Framework (PyQt/tkinter with transparency)
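
The State Manager (component 5) can be a simple publish/subscribe hub, so the renderer and the Discord client react to the same state changes without knowing about each other. A minimal sketch (event and method names are illustrative):

```python
from collections import defaultdict


class StateManager:
    """Central hub: components subscribe to events, and the desktop and
    Discord frontends stay in sync by reacting to the same changes."""

    def __init__(self):
        self.state = {"emotion": "neutral"}
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        self._subscribers[event].append(callback)

    def emit(self, event, payload=None):
        for callback in self._subscribers[event]:
            callback(payload)

    def set_emotion(self, emotion):
        self.state["emotion"] = emotion
        # Both the VRM renderer and the Discord client listen for this.
        self.emit("emotion_changed", emotion)
```

This matches the event-driven state management already working in the MVP commit, where emotion state changes on interaction.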

Data Flow

User Input (voice/text/click)
    ↓
State Manager
    ↓
LLM Processing
    ↓
Output (animation + voice + text + actions)
    ↓
Desktop Display + Discord Bot
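
One pass through this flow can be sketched as a single function, with the components injected as callables (all names here are stand-ins for the real modules, and the LLM reply shape is an assumption):

```python
def handle_input(user_text, llm, renderer, discord_out):
    """One pass through the data flow: input -> LLM -> fan-out to outputs."""
    # LLM Processing: the model returns both a reply and an emotional state.
    reply = llm(user_text)
    # Output fan-out: animation on the desktop, text to both frontends.
    renderer(reply["emotion"])
    discord_out(reply["text"])
    return reply
```

Keeping the components as plain callables makes each stage swappable (stub LLM during development, real local model later) and easy to test in isolation.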

Tech Stack Candidates

VRM Rendering

  • PyVRM (if available)
  • PyOpenGL + custom VRM parser
  • Unity Python wrapper (heavy)
  • Godot Python binding (alternative)

LLM

  • llama.cpp Python bindings
  • Ollama API
  • Custom model server
  • Transformers library
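
Of these candidates, Ollama exposes a plain HTTP API, which keeps the LLM backend decoupled from the app. A minimal sketch using only the standard library (assumes a local Ollama server on its default port; the model name is a placeholder):

```python
import json
import urllib.request


def build_ollama_request(model, prompt):
    # Payload for Ollama's /api/generate endpoint; stream=False returns
    # the whole completion in one JSON response.
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(model, prompt, host="http://localhost:11434"):
    # Requires a running Ollama server with the model pulled.
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_ollama_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

llama.cpp bindings would run in-process instead, trading the extra server for a heavier app; either fits behind the same `llm(prompt)` callable used elsewhere in this plan.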

Voice

  • TTS: pyttsx3, Coqui TTS, XTTS
  • STT: Whisper (local), Vosk

Discord

  • discord.py

Packaging

  • PyInstaller
  • Nuitka (better performance)

Current Status

  • Phase: Planning
  • Last Updated: 2025-09-30

Next Steps

  1. Research VRM rendering in Python
  2. Create basic window with transparency
  3. Load and display VRM model
  4. Implement dragging
  5. Add sound effects