- Project structure with modular architecture - State management system (emotion states, conversation history) - Transparent, draggable PyQt6 window - OpenGL rendering widget with placeholder cube - Discord bot framework (commands, event handlers) - Complete documentation (README, project plan, research findings) - Environment configuration template - Dependencies defined in requirements.txt MVP features working: - Transparent window appears at bottom-right - Window can be dragged around - Placeholder 3D cube renders and rotates - Emotion state changes on interaction - Event-driven state management Next steps: VRM model loading and rendering
131 lines
3.2 KiB
Markdown
131 lines
3.2 KiB
Markdown
# Desktop Waifu Project
|
|
|
|
## Overview
|
|
Desktop companion application controlled by LLM that can interact both on desktop and Discord.
|
|
|
|
## Tech Stack
|
|
- **Language**: Python
|
|
- **Character Model**: VRM format
|
|
- **LLM**: Local (TBD which model)
|
|
- **Distribution**: .exe packaging
|
|
- **Platforms**: Desktop app + Discord bot
|
|
|
|
## Core Features
|
|
|
|
### Desktop Visuals
|
|
- VRM model rendering in transparent window
|
|
- Draggable character
|
|
- Sound effects on interaction (squeaks, touch sounds)
|
|
- Multiple poses/expressions controlled by LLM
|
|
- Always on top unless asked to hide
|
|
- VRM animations (TBD - see notes below)
|
|
|
|
### AI & Interaction
|
|
- Local LLM (custom/TBD)
|
|
- Memory/context persistence
|
|
- AI-chosen personality
|
|
- Always-on chat interface
|
|
- Voice input (STT)
|
|
- Voice output (TTS - local)
|
|
|
|
### Discord Integration
|
|
- Respond in servers/channels and DMs
|
|
- Proactive messaging
|
|
- Desktop-Discord state sync
|
|
|
|
### System Integration
|
|
- System access (notifications, apps, searches, etc.)
|
|
- Designed for cross-platform deployment
|
|
- Future: OS-level integration
|
|
|
|
## VRM Animation Notes
|
|
VRM models support:
|
|
- **Blend shapes** (facial expressions): smile, blink, surprised, angry, sad, etc.
|
|
- **Bone animations**: waving, pointing, head tilts, body movements
|
|
- **Presets**: if your VRM has animation clips embedded
|
|
- **IK (Inverse Kinematics)**: dynamic movements like looking at cursor
|
|
|
|
We can trigger these based on:
|
|
- LLM emotional state (happy → smile + wave)
|
|
- User interaction (grabbed → surprised expression + squeak)
|
|
- Idle states (occasional blinks, breathing animation)
|
|
- Context (thinking → hand on chin pose)
|
|
|
|
## MVP (Phase 1)
|
|
1. VRM model renders on screen
|
|
2. Transparent, draggable window
|
|
3. Basic sound on interaction
|
|
4. Simple chat interface (text only)
|
|
5. Basic LLM connection (local)
|
|
6. Simple expression changes (happy/neutral/sad)
|
|
7. **Discord bot integration** (respond in servers/DMs, basic sync with desktop)
|
|
|
|
## Post-MVP Features
|
|
- Voice I/O (STT/TTS)
|
|
- Advanced animations
|
|
- Full system integration (notifications, app control, etc.)
|
|
- Memory persistence (database)
|
|
- Proactive messaging
|
|
- .exe packaging
|
|
- Cross-platform support
|
|
|
|
## Architecture
|
|
|
|
### Components
|
|
1. **VRM Renderer** (PyOpenGL + VRM loader)
|
|
2. **LLM Backend** (local inference)
|
|
3. **Audio System** (TTS, STT, sound effects)
|
|
4. **Discord Client** (discord.py)
|
|
5. **State Manager** (sync between desktop/Discord)
|
|
6. **System Interface** (OS interactions)
|
|
7. **GUI Framework** (PyQt/tkinter with transparency)
|
|
|
|
### Data Flow
|
|
```
|
|
User Input (voice/text/click)
|
|
↓
|
|
State Manager
|
|
↓
|
|
LLM Processing
|
|
↓
|
|
Output (animation + voice + text + actions)
|
|
↓
|
|
Desktop Display + Discord Bot
|
|
```
|
|
|
|
## Tech Stack Candidates
|
|
|
|
### VRM Rendering
|
|
- PyVRM (if available)
|
|
- PyOpenGL + custom VRM parser
|
|
- Unity Python wrapper (heavy)
|
|
- Godot Python binding (alternative)
|
|
|
|
### LLM
|
|
- llama.cpp Python bindings
|
|
- Ollama API
|
|
- Custom model server
|
|
- Transformers library
|
|
|
|
### Voice
|
|
- **TTS**: pyttsx3, Coqui TTS, XTTS
|
|
- **STT**: Whisper (local), Vosk
|
|
|
|
### Discord
|
|
- discord.py
|
|
|
|
### Packaging
|
|
- PyInstaller
|
|
- Nuitka (better performance)
|
|
|
|
## Current Status
|
|
- Phase: Planning
|
|
- Last Updated: 2025-09-30
|
|
|
|
## Next Steps
|
|
1. Research VRM rendering in Python
|
|
2. Create basic window with transparency
|
|
3. Load and display VRM model
|
|
4. Implement dragging
|
|
5. Add sound effects
|