Completed so far:
- Project structure with modular architecture
- State management system (emotion states, conversation history)
- Transparent, draggable PyQt6 window
- OpenGL rendering widget with placeholder cube
- Discord bot framework (commands, event handlers)
- Complete documentation (README, project plan, research findings)
- Environment configuration template
- Dependencies defined in requirements.txt
MVP features working:
- Transparent window appears at bottom-right
- Window can be dragged around
- Placeholder 3D cube renders and rotates
- Emotion state changes on interaction
- Event-driven state management
Next steps: VRM model loading and rendering
Desktop Waifu Project
Overview
A desktop companion application driven by an LLM, able to interact both on the desktop and over Discord.
Tech Stack
- Language: Python
- Character Model: VRM format
- LLM: Local (TBD which model)
- Distribution: .exe packaging
- Platforms: Desktop app + Discord bot
Core Features
Desktop Visuals
- VRM model rendering in transparent window
- Draggable character
- Sound effects on interaction (squeaks, touch sounds)
- Multiple poses/expressions controlled by LLM
- Always on top unless asked to hide
- VRM animations (TBD - see notes below)
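The draggable-window behavior above reduces to simple offset arithmetic, independent of the GUI toolkit: on mouse press, record where inside the window the cursor grabbed; on every move, reposition the window so that grab point stays under the cursor. A framework-agnostic sketch (function names are illustrative; in PyQt6 these would live in the mouse event handlers):

```python
def press_offset(window_pos, cursor_pos):
    """On mouse press: remember where inside the window the cursor grabbed."""
    return (cursor_pos[0] - window_pos[0], cursor_pos[1] - window_pos[1])

def drag_position(cursor_pos, offset):
    """On mouse move: new top-left corner that keeps the grab point under the cursor."""
    return (cursor_pos[0] - offset[0], cursor_pos[1] - offset[1])
```

Keeping this logic toolkit-free makes it trivial to unit-test without spinning up a window.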
AI & Interaction
- Local LLM (custom/TBD)
- Memory/context persistence
- AI-chosen personality
- Always-on chat interface
- Voice input (STT)
- Voice output (TTS - local)
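Memory/context persistence can start as a single SQLite table in the standard library, with the last N messages fed back to the LLM as context. A minimal sketch, assuming a simple role/content schema (table and class names are illustrative):

```python
import sqlite3

class ConversationMemory:
    """Append-only conversation log; the last N rows become LLM context."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY, role TEXT, content TEXT)"
        )

    def add(self, role, content):
        self.db.execute(
            "INSERT INTO messages (role, content) VALUES (?, ?)",
            (role, content),
        )
        self.db.commit()

    def recent(self, n=20):
        """Most recent n messages, oldest first, ready to prepend to a prompt."""
        rows = self.db.execute(
            "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?", (n,)
        ).fetchall()
        return list(reversed(rows))
```

Passing a file path instead of `":memory:"` gives persistence across restarts with no other changes.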
Discord Integration
- Respond in servers/channels and DMs
- Proactive messaging
- Desktop-Discord state sync
System Integration
- System access (notifications, apps, searches, etc.)
- Designed for cross-platform deployment
- Future: OS-level integration
VRM Animation Notes
VRM models support:
- Blend shapes (facial expressions): smile, blink, surprised, angry, sad, etc.
- Bone animations: waving, pointing, head tilts, body movements
- Animation presets: pre-authored clips, if the VRM file embeds them
- IK (Inverse Kinematics): dynamic movements like looking at cursor
We can trigger these based on:
- LLM emotional state (happy → smile + wave)
- User interaction (grabbed → surprised expression + squeak)
- Idle states (occasional blinks, breathing animation)
- Context (thinking → hand on chin pose)
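The triggers above can be expressed as a lookup table mapping each emotion state or interaction event to a bundle of presentation cues. A minimal sketch; the blend-shape, clip, and sound names are illustrative, since the real names come from whatever the chosen VRM file defines:

```python
# Maps LLM emotion states / interaction events to presentation cues.
# All blend-shape, clip, and sound names below are placeholders.
TRIGGERS = {
    "happy":    {"blend_shape": "smile",     "clip": "wave",         "sound": None},
    "grabbed":  {"blend_shape": "surprised", "clip": None,           "sound": "squeak.wav"},
    "idle":     {"blend_shape": "blink",     "clip": "breathe",      "sound": None},
    "thinking": {"blend_shape": "neutral",   "clip": "hand_on_chin", "sound": None},
}

def cues_for(state, default="idle"):
    """Look up presentation cues, falling back to the idle state."""
    return TRIGGERS.get(state, TRIGGERS[default])
```

Keeping the mapping in data rather than code means the LLM's emotion vocabulary can grow without touching the renderer.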
MVP (Phase 1)
- VRM model renders on screen
- Transparent, draggable window
- Basic sound on interaction
- Simple chat interface (text only)
- Basic LLM connection (local)
- Simple expression changes (happy/neutral/sad)
- Discord bot integration (respond in servers/DMs, basic sync with desktop)
Post-MVP Features
- Voice I/O (STT/TTS)
- Advanced animations
- Full system integration (notifications, app control, etc.)
- Memory persistence (database)
- Proactive messaging
- .exe packaging
- Cross-platform support
Architecture
Components
- VRM Renderer (PyOpenGL + VRM loader)
- LLM Backend (local inference)
- Audio System (TTS, STT, sound effects)
- Discord Client (discord.py)
- State Manager (sync between desktop/Discord)
- System Interface (OS interactions)
- GUI Framework (PyQt6 with transparency; tkinter considered as a fallback)
Data Flow
User Input (voice/text/click)
↓
State Manager
↓
LLM Processing
↓
Output (animation + voice + text + actions)
↓
Desktop Display + Discord Bot
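The flow above suggests an event-driven State Manager at the center: inputs arrive as events, and both the desktop display and the Discord bot subscribe to state changes so they stay in sync. A minimal publish/subscribe sketch (the API shape is an assumption, not the project's actual implementation):

```python
class StateManager:
    """Central hub: inputs arrive as events; desktop and Discord subscribe to changes."""

    def __init__(self):
        self.state = {"emotion": "neutral"}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback(key, value) fired on every state change."""
        self._subscribers.append(callback)

    def dispatch(self, key, value):
        """Apply a state change and notify all subscribers."""
        self.state[key] = value
        for cb in self._subscribers:
            cb(key, value)
```

The desktop renderer and the Discord client each register one callback, so neither side has to poll the other.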
Tech Stack Candidates
VRM Rendering
- PyVRM (if available)
- PyOpenGL + custom VRM parser
- Unity Python wrapper (heavy)
- Godot Python binding (alternative)
LLM
- llama.cpp Python bindings
- Ollama API
- Custom model server
- Transformers library
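Since the LLM backend is still undecided, the rest of the app can code against a small interface that any of the candidates above (llama.cpp bindings, the Ollama API, a custom server, Transformers) can implement. A hedged sketch with a stand-in echo backend for wiring things up before a model is chosen:

```python
from typing import Protocol

class LLMBackend(Protocol):
    """Minimal interface any candidate backend should satisfy."""
    def generate(self, prompt: str) -> str: ...

class EchoBackend:
    """Stand-in backend: lets the app be developed before a real model exists."""
    def generate(self, prompt: str) -> str:
        return f"(echo) {prompt}"

def respond(backend: LLMBackend, prompt: str) -> str:
    """The rest of the app only ever calls this, never a concrete backend."""
    return backend.generate(prompt)
```

Swapping in a real backend later is then a one-line change at the construction site.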
Voice
- TTS: pyttsx3, Coqui TTS, XTTS
- STT: Whisper (local), Vosk
Discord
- discord.py
Packaging
- PyInstaller
- Nuitka (better performance)
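A first cut at the PyInstaller build might look like the following; the entry-point filename and asset directory are assumptions about the project layout:

```shell
# One-file windowed build; "main.py" and "assets" are illustrative paths.
# Note: --add-data uses ";" as the separator on Windows, ":" elsewhere.
pyinstaller --onefile --windowed --name desktop-waifu \
  --add-data "assets;assets" \
  main.py
```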
Current Status
- Phase: Planning
- Last Updated: 2025-09-30
Next Steps
- Research VRM rendering in Python
- Create basic window with transparency
- Load and display VRM model
- Implement dragging
- Add sound effects