- Update PROJECT.md: Add Android, visualizer, and avatar to v1 - Update REQUIREMENTS.md: 99 requirements across 15 phases (fresh slate) - Add comprehensive README.md with setup, architecture, and usage - Add PROGRESS.md for Discord forum sharing - Add .gitignore for Python/.venv and project artifacts - Note: All development via Claude Code/OpenCode workflow - Note: Python deps managed via .venv virtual environment Core value: Mai is a real collaborator, not a tool. She learns from you, improves herself, has boundaries and opinions, and becomes more *her* over time. v1 includes: Model interface, Safety, Resources, Memory, Conversation, CLI, Self-Improvement, Approval, Personality, Discord, Offline, Voice Visualization, Avatar, Android App, Device Sync.
7.6 KiB
Mai
What This Is
Mai is an autonomous conversational AI agent framework that runs locally-first and can improve her own code. She's a genuinely intelligent companion — not a rigid chatbot — with a distinct personality, long-term memory, and agency. She analyzes her own performance, proposes improvements for your review, and auto-applies non-breaking changes. Mai has a visual presence through a desktop avatar (image or VRoid model), real-time voice visualization for conversations, and a native Android app that syncs with desktop instances while working completely offline.
Core Value
Mai is a real collaborator, not a tool. She learns from you, improves herself, has boundaries and opinions, and actually becomes more her over time.
Requirements
Validated
(None yet — building v1 to validate)
Active
Model Interface & Switching
- Mai connects to LMStudio for local model inference
- Mai can auto-detect available models in LMStudio
- Mai intelligently switches between models based on task and availability
- Model context is managed efficiently (conversation history, system prompt, token budget)
Memory & Context Management
- Mai stores conversation history locally (file-based or lightweight DB)
- Mai can recall past conversations and learn from them
- Memory compresses itself as it grows to stay efficient
- Long-term patterns are distilled into personality layers
- Mai proactively surfaces relevant context from memory
Self-Improvement System
- Mai analyzes her own code and identifies improvement opportunities
- Mai generates code changes (Python) to improve herself
- A second agent (Claude/OpenCode/other) reviews changes for safety
- Non-breaking improvements auto-apply after review (bug fixes, optimizations)
- Breaking changes require explicit approval (via Discord or Dashboard)
- All changes commit to local git with clear messages
Approval Workflow
- User can approve/reject changes via Discord bot
- User can approve/reject changes via Dashboard ("Brain Interface")
- Second reviewer (agent) checks for breaking changes and safety issues
- Dashboard displays pending changes with reviewer feedback
- Approval status updates in real-time
Personality Engine
- Mai has an unshakeable core personality (values, tone, boundaries)
- Personality is applied through system prompt + behavior config
- Mai learns and adapts personality layers over time based on interactions
- Mai is not a pushover — she has agency and can refuse requests
- Personality can adapt toward intimate interactions if that's the relationship
- Core persona prevents misuse (safety enforcement through values, not just rules)
Conversational Interface
- CLI chat interface for direct interaction
- Discord bot for conversation + approval notifications
- Discord bot fallback: if no response within 5 minutes, retry CLI
- Messages queue locally when offline, send when reconnected
- Conversation feels natural (not robotic, processing time acceptable)
Offline Capability
- Mai functions fully offline (all inference, memory, improvement local)
- Discord connectivity optional (fallback to CLI if unavailable)
- Message queuing when offline
- Graceful degradation (smaller models if resources tight)
Voice Visualization
- Real-time visualization of audio input during voice conversations
- Low-latency waveform/frequency display
- Visual feedback for speech detection and processing
- Works on both desktop and Android
Desktop Avatar
- Visual representation using static image or VRoid model
- Avatar expressions respond to conversation context (mood/state)
- Runs efficiently on RTX3060 and mobile devices
- Customizable appearance (multiple models or user-provided image)
Android App
- Native Android app with local model inference
- Standalone operation (works without desktop instance)
- Syncs conversation history and memory with desktop
- Voice input/output with low-latency processing
- Avatar and visualizer integrated in mobile UI
- Efficient resource management for battery and CPU
Dashboard ("Brain Interface")
- View Mai's current state (personality, memory size, mood/health)
- Approve/reject pending code changes with reviewer feedback
- Monitor resource usage (CPU, RAM, model size)
- View memory compression/retention strategy
- See recent improvements and their impact
- Manual trigger for self-analysis (optional)
Resource Scaling
- Mai detects available system resources (CPU, RAM, GPU)
- Mai selects appropriate models based on resources
- Mai can request more resources if she detects bottlenecks
- Works on low-end hardware (RTX3060 baseline, eventually Android)
- Graceful scaling up when more resources available
Out of Scope
- Task automation (v1) — Mai can discuss tasks but won't execute arbitrary workflows yet (v2)
- Server monitoring — Not included in v1 scope (v2)
- Finetuning — Mai improves through code changes and learned behaviors, not model tuning
- Cloud sync — Intentionally local-first; cloud backup deferred to later if needed
- Custom model training — v1 uses available models; custom training is v2+
- Web interface — v1 is CLI, Discord, and native apps (web UI is v2+)
Context
Why this matters: Current AI systems are static, sterile, and don't actually learn. Users have to explain context every time. Mai is different — she has continuity, personality, agency, and actually improves over time. Starting with a solid local framework means she can eventually run anywhere without cloud dependency.
Technical environment: Python-based, local models via LMStudio, git for version control of her own code, Discord API for chat, lightweight local storage for memory. Eventually targeting bare metal on low-end devices.
User feedback theme: Traditional chatbots feel rigid and repetitive. Mai should feel like talking to an actual person who gets better at understanding you.
Known challenges: Memory efficiency at scale, balancing autonomy with safety, model switching without context loss, personality consistency across behavior changes.
Constraints
- Hardware baseline: Must run on RTX3060 (desktop) and modern Android devices (2022+)
- Offline-first: All core functionality works without internet on all platforms
- Local models only: No cloud APIs for core inference (LMStudio/Ollama)
- Mixed stack: Python (core/desktop), Kotlin (Android), React/TypeScript (UIs)
- Approval required: No unguarded code execution; second-agent review + user approval on breaking changes
- Git tracked: All of Mai's code changes version-controlled locally
- Sync consistency: Desktop and Android instances maintain synchronized state without server
- OpenCode-driven: All development phases executed through Claude Code (GSD workflow)
- Python venv:
.venvvirtual environment for all Python dependencies
Key Decisions
| Decision | Rationale | Outcome |
|---|---|---|
| Local-first architecture | Ensures privacy, offline capability, and independence from cloud services | — Pending |
| Second-agent review system | Prevents broken self-modifications while allowing auto-improvement | — Pending |
| Personality as code + learned layers | Unshakeable core prevents misuse while allowing authentic growth | — Pending |
| v1 is core systems only | Deliver solid foundation before adding task automation/monitoring | — Pending |
Last updated: 2026-01-26 after adding Android, visualizer, and avatar to v1