Rosie/PROJECT_PLAN.md

# Desktop Waifu Project

## Overview
Desktop companion application controlled by LLM that can interact both on desktop and Discord.

## Tech Stack
- **Language**: Python
- **Character Model**: VRM format
- **LLM**: Local (TBD which model)
- **Distribution**: .exe packaging
- **Platforms**: Desktop app + Discord bot

## Core Features

### Desktop Visuals
- VRM model rendering in transparent window
- Draggable character
- Sound effects on interaction (squeaks, touch sounds)
- Multiple poses/expressions controlled by LLM
- Always on top unless asked to hide
- VRM animations (TBD - see notes below)

### AI & Interaction
- Local LLM (custom/TBD)
- Memory/context persistence
- AI-chosen personality
- Always-on chat interface
- Voice input (STT)
- Voice output (TTS - local)

### Discord Integration
- Respond in servers/channels and DMs
- Proactive messaging
- Desktop-Discord state sync

### System Integration
- System access (notifications, apps, searches, etc.)
- Designed for cross-platform deployment
- Future: OS-level integration

## VRM Animation Notes
VRM models support:
- **Blend shapes** (facial expressions): smile, blink, surprised, angry, sad, etc.
- **Bone animations**: waving, pointing, head tilts, body movements
- **Presets**: if your VRM has animation clips embedded
- **IK (Inverse Kinematics)**: dynamic movements like looking at cursor

We can trigger these based on:
- LLM emotional state (happy → smile + wave)
- User interaction (grabbed → surprised expression + squeak)
- Idle states (occasional blinks, breathing animation)
- Context (thinking → hand on chin pose)

## MVP (Phase 1)
1. VRM model renders on screen
2. Transparent, draggable window
3. Basic sound on interaction
4. Simple chat interface (text only)
5. Basic LLM connection (local)
6. Simple expression changes (happy/neutral/sad)
7. **Discord bot integration** (respond in servers/DMs, basic sync with desktop)

## Post-MVP Features
- Voice I/O (STT/TTS)
- Advanced animations
- Full system integration (notifications, app control, etc.)
- Memory persistence (database)
- Proactive messaging
- .exe packaging
- Cross-platform support

## Architecture

### Components
1. **VRM Renderer** (PyOpenGL + VRM loader)
2. **LLM Backend** (local inference)
3. **Audio System** (TTS, STT, sound effects)
4. **Discord Client** (discord.py)
5. **State Manager** (sync between desktop/Discord)
6. **System Interface** (OS interactions)
7. **GUI Framework** (PyQt/tkinter with transparency)

### Data Flow
```
User Input (voice/text/click)
    ↓
State Manager
    ↓
LLM Processing
    ↓
Output (animation + voice + text + actions)
    ↓
Desktop Display + Discord Bot
```

## Tech Stack Candidates

### VRM Rendering
- PyVRM (if available)
- PyOpenGL + custom VRM parser
- Unity Python wrapper (heavy)
- Godot Python binding (alternative)

### LLM
- llama.cpp Python bindings
- Ollama API
- Custom model server
- Transformers library

### Voice
- **TTS**: pyttsx3, Coqui TTS, XTTS
- **STT**: Whisper (local), Vosk

### Discord
- discord.py

### Packaging
- PyInstaller
- Nuitka (better performance)

## Current Status
- Phase: Planning
- Last Updated: 2025-09-30

## Next Steps
1. Research VRM rendering in Python
2. Create basic window with transparency
3. Load and display VRM model
4. Implement dragging
5. Add sound effects