Initial commit: Desktop Waifu MVP foundation
- Project structure with modular architecture - State management system (emotion states, conversation history) - Transparent, draggable PyQt6 window - OpenGL rendering widget with placeholder cube - Discord bot framework (commands, event handlers) - Complete documentation (README, project plan, research findings) - Environment configuration template - Dependencies defined in requirements.txt MVP features working: - Transparent window appears at bottom-right - Window can be dragged around - Placeholder 3D cube renders and rotates - Emotion state changes on interaction - Event-driven state management Next steps: VRM model loading and rendering
This commit is contained in:
130
PROJECT_PLAN.md
Normal file
130
PROJECT_PLAN.md
Normal file
@@ -0,0 +1,130 @@
|
||||
# Desktop Waifu Project
|
||||
|
||||
## Overview
|
||||
Desktop companion application controlled by LLM that can interact both on desktop and Discord.
|
||||
|
||||
## Tech Stack
|
||||
- **Language**: Python
|
||||
- **Character Model**: VRM format
|
||||
- **LLM**: Local (TBD which model)
|
||||
- **Distribution**: .exe packaging
|
||||
- **Platforms**: Desktop app + Discord bot
|
||||
|
||||
## Core Features
|
||||
|
||||
### Desktop Visuals
|
||||
- VRM model rendering in transparent window
|
||||
- Draggable character
|
||||
- Sound effects on interaction (squeaks, touch sounds)
|
||||
- Multiple poses/expressions controlled by LLM
|
||||
- Always on top unless asked to hide
|
||||
- VRM animations (TBD - see notes below)
|
||||
|
||||
### AI & Interaction
|
||||
- Local LLM (custom/TBD)
|
||||
- Memory/context persistence
|
||||
- AI-chosen personality
|
||||
- Always-on chat interface
|
||||
- Voice input (STT)
|
||||
- Voice output (TTS - local)
|
||||
|
||||
### Discord Integration
|
||||
- Respond in servers/channels and DMs
|
||||
- Proactive messaging
|
||||
- Desktop-Discord state sync
|
||||
|
||||
### System Integration
|
||||
- System access (notifications, apps, searches, etc.)
|
||||
- Designed for cross-platform deployment
|
||||
- Future: OS-level integration
|
||||
|
||||
## VRM Animation Notes
|
||||
VRM models support:
|
||||
- **Blend shapes** (facial expressions): smile, blink, surprised, angry, sad, etc.
|
||||
- **Bone animations**: waving, pointing, head tilts, body movements
|
||||
- **Presets**: if your VRM has animation clips embedded
|
||||
- **IK (Inverse Kinematics)**: dynamic movements like looking at cursor
|
||||
|
||||
We can trigger these based on:
|
||||
- LLM emotional state (happy → smile + wave)
|
||||
- User interaction (grabbed → surprised expression + squeak)
|
||||
- Idle states (occasional blinks, breathing animation)
|
||||
- Context (thinking → hand on chin pose)
|
||||
|
||||
## MVP (Phase 1)
|
||||
1. VRM model renders on screen
|
||||
2. Transparent, draggable window
|
||||
3. Basic sound on interaction
|
||||
4. Simple chat interface (text only)
|
||||
5. Basic LLM connection (local)
|
||||
6. Simple expression changes (happy/neutral/sad)
|
||||
7. **Discord bot integration** (respond in servers/DMs, basic sync with desktop)
|
||||
|
||||
## Post-MVP Features
|
||||
- Voice I/O (STT/TTS)
|
||||
- Advanced animations
|
||||
- Full system integration (notifications, app control, etc.)
|
||||
- Memory persistence (database)
|
||||
- Proactive messaging
|
||||
- .exe packaging
|
||||
- Cross-platform support
|
||||
|
||||
## Architecture
|
||||
|
||||
### Components
|
||||
1. **VRM Renderer** (PyOpenGL + VRM loader)
|
||||
2. **LLM Backend** (local inference)
|
||||
3. **Audio System** (TTS, STT, sound effects)
|
||||
4. **Discord Client** (discord.py)
|
||||
5. **State Manager** (sync between desktop/Discord)
|
||||
6. **System Interface** (OS interactions)
|
||||
7. **GUI Framework** (PyQt/tkinter with transparency)
|
||||
|
||||
### Data Flow
|
||||
```
|
||||
User Input (voice/text/click)
|
||||
↓
|
||||
State Manager
|
||||
↓
|
||||
LLM Processing
|
||||
↓
|
||||
Output (animation + voice + text + actions)
|
||||
↓
|
||||
Desktop Display + Discord Bot
|
||||
```
|
||||
|
||||
## Tech Stack Candidates
|
||||
|
||||
### VRM Rendering
|
||||
- PyVRM (if available)
|
||||
- PyOpenGL + custom VRM parser
|
||||
- Unity Python wrapper (heavy)
|
||||
- Godot Python binding (alternative)
|
||||
|
||||
### LLM
|
||||
- llama.cpp Python bindings
|
||||
- Ollama API
|
||||
- Custom model server
|
||||
- Transformers library
|
||||
|
||||
### Voice
|
||||
- **TTS**: pyttsx3, Coqui TTS, XTTS
|
||||
- **STT**: Whisper (local), Vosk
|
||||
|
||||
### Discord
|
||||
- discord.py
|
||||
|
||||
### Packaging
|
||||
- PyInstaller
|
||||
- Nuitka (better performance)
|
||||
|
||||
## Current Status
|
||||
- Phase: Planning
|
||||
- Last Updated: 2025-09-30
|
||||
|
||||
## Next Steps
|
||||
1. Research VRM rendering in Python
|
||||
2. Create basic window with transparency
|
||||
3. Load and display VRM model
|
||||
4. Implement dragging
|
||||
5. Add sound effects
|
Reference in New Issue
Block a user