docs: map existing codebase
- STACK.md - Technologies and dependencies
- ARCHITECTURE.md - System design and patterns
- STRUCTURE.md - Directory layout
- CONVENTIONS.md - Code style and patterns
- TESTING.md - Test structure
- INTEGRATIONS.md - External services
- CONCERNS.md - Technical debt and issues

.planning/codebase/ARCHITECTURE.md (new file, 177 lines)

# Architecture

**Analysis Date:** 2026-01-26

## Pattern Overview

**Overall:** Layered modular architecture with clear separation of concerns

**Key Characteristics:**
- Modular layer separation (Model Interface, Memory, Conversation, Interfaces, Safety, Core Personality)
- Local-first, offline-capable design with graceful degradation
- Plugin-like interface system allowing CLI and Discord without tight coupling
- Sandboxed execution environment for self-improvement code
- Bidirectional feedback loops between conversation, memory, and personality

## Layers

**Model Interface (Inference Layer):**
- Purpose: Abstract model inference operations and handle model switching
- Location: `src/models/`
- Contains: Model adapters, resource monitoring, context management
- Depends on: Local Ollama/LMStudio, system resource API
- Used by: Conversation engine, core Mai reasoning

**Memory System (Persistence Layer):**
- Purpose: Store and retrieve conversation history, patterns, learned behaviors
- Location: `src/memory/`
- Contains: SQLite operations, vector search, compression logic, pattern extraction
- Depends on: Local SQLite database, embeddings generation
- Used by: Conversation engine for context retrieval, personality learning

**Conversation Engine (Reasoning Layer):**
- Purpose: Orchestrate multi-turn conversations with context awareness
- Location: `src/conversation/`
- Contains: Turn handling, context window management, clarifying question logic, reasoning transparency
- Depends on: Model Interface, Memory System, Personality System
- Used by: Interface layers (CLI, Discord)

**Personality System (Behavior Layer):**
- Purpose: Enforce core values and enable personality adaptation
- Location: `src/personality/`
- Contains: Core personality rules, learned behavior layers, guardrails, values enforcement
- Depends on: Configuration files (YAML), Memory System for learned patterns
- Used by: Conversation Engine for decision making and refusal logic

**Safety & Execution Sandbox (Security Layer):**
- Purpose: Validate and execute generated code safely with risk assessment
- Location: `src/safety/`
- Contains: Risk analysis, Docker sandbox management, AST validation, audit logging
- Depends on: Docker runtime, code analysis libraries
- Used by: Self-improvement system for generated code execution

**Self-Improvement System (Autonomous Layer):**
- Purpose: Analyze own code, generate improvements, manage review and approval workflow
- Location: `src/selfmod/`
- Contains: Code analysis, improvement generation, review coordination, git integration
- Depends on: Safety layer, second-agent review API, git operations, code parser
- Used by: Core Mai autonomous operation

**Interface Adapters (Presentation Layer):**
- Purpose: Translate between external communication channels and core conversation engine
- Location: `src/interfaces/`
- Contains: CLI handler, Discord bot, message queuing, approval workflow
- Depends on: Conversation Engine, self-improvement system
- Used by: External communication channels (terminal, Discord)

## Data Flow

**Conversation Flow:**

1. User message arrives via interface (CLI or Discord)
2. Message queued if offline, held in memory if online
3. Interface adapter passes to Conversation Engine
4. Conversation Engine queries Memory System for relevant context
5. Context + message passed to Model Interface with system prompt (includes personality)
6. Model generates response
7. Response returned to Conversation Engine
8. Conversation Engine stores turn in Memory System
9. Response sent back through interface to user
10. Memory System may trigger asynchronous compression if history grows
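
A minimal sketch of steps 3-9, assuming hypothetical collaborator objects (the class and method names are illustrative, not the project's actual API):

```python
# Hypothetical sketch of the conversation turn pipeline (steps 3-9 above);
# the collaborators and their methods are assumptions, not the real API.
class ConversationEngine:
    def __init__(self, model, memory, personality):
        self.model = model              # Model Interface layer
        self.memory = memory            # Memory System layer
        self.personality = personality  # supplies the system prompt

    async def handle_turn(self, user_message: str) -> str:
        # Step 4: pull relevant context from the Memory System
        context = await self.memory.retrieve_context(user_message)
        # Step 5: combine system prompt (personality), context, and message
        prompt = f"{self.personality.system_prompt()}\n{context}\n{user_message}"
        # Steps 6-7: model generates a response
        response = await self.model.generate(prompt)
        # Step 8: persist the turn before returning it to the interface (step 9)
        await self.memory.store_turn(user_message, response)
        return response
```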

**Self-Improvement Flow:**

1. Self-Improvement System analyzes own code (triggered by timer or explicit request)
2. Generates potential improvements as Python code patches
3. Performs AST validation and basic static analysis
4. Submits for second-agent review with risk classification
5. If LOW risk: auto-approved, sent to Safety layer for execution
6. If MEDIUM risk: user approval required via CLI or Discord reactions
7. If HIGH/BLOCKED risk: blocked, logged, user notified
8. Approved changes executed in Docker sandbox with resource limits
9. Execution results captured, logged, committed to git with clear message
10. Breaking changes require explicit user approval before commit
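
A sketch of the risk gate in steps 5-7; the execution and approval hooks are injected callables here because the real Safety-layer and interface APIs are not defined yet:

```python
from enum import Enum
from typing import Awaitable, Callable

class Risk(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    BLOCKED = "BLOCKED"

async def gate_change(
    change: str,
    risk: Risk,
    execute: Callable[[str], Awaitable[str]],    # Safety-layer sandbox hook
    ask_user: Callable[[str], Awaitable[bool]],  # CLI/Discord approval hook
) -> str | None:
    if risk is Risk.LOW:
        return await execute(change)             # step 5: auto-approved
    if risk is Risk.MEDIUM and await ask_user(change):
        return await execute(change)             # step 6: user approved
    # step 7: HIGH/BLOCKED (or declined MEDIUM) changes never execute
    print(f"blocked ({risk.value}): change logged and user notified")
    return None
```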

**State Management:**
- Conversation state: Maintained in Memory System as persisted history
- Model state: Loaded fresh per request, no state persistence between calls
- Personality state: Mix of code-enforced rules and learned behavior layers in Memory
- Resource state: Monitored continuously, triggering model downgrade if limits approached
- Approval state: Tracked in git commits, audit log, and in-memory queue

## Key Abstractions

**ModelAdapter:**
- Purpose: Abstract different local model providers (Ollama, LMStudio)
- Examples: `src/models/ollama_adapter.py`, `src/models/model_manager.py`
- Pattern: Strategy pattern with resource-aware selection logic
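
A minimal sketch of the strategy pattern with resource-aware selection; the interface and the VRAM-based check are assumptions, not the project's actual API:

```python
from abc import ABC, abstractmethod

# Illustrative adapter interface; attribute and method names are assumed.
class ModelAdapter(ABC):
    name: str
    min_vram_gb: float

    @abstractmethod
    async def generate(self, prompt: str) -> str: ...

class ModelManager:
    def __init__(self, adapters: list[ModelAdapter]):
        # Largest model first, so selection degrades gracefully under pressure
        self.adapters = sorted(adapters, key=lambda a: a.min_vram_gb, reverse=True)

    def select(self, free_vram_gb: float) -> ModelAdapter:
        for adapter in self.adapters:
            if adapter.min_vram_gb <= free_vram_gb:
                return adapter
        raise RuntimeError("no model fits available VRAM")
```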

**ContextWindow:**
- Purpose: Manage token budget and conversation history within model limits
- Examples: `src/conversation/context_manager.py`
- Pattern: Intelligent windowing with semantic importance weighting
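
A minimal sketch of budget-aware windowing; the greedy importance-first selection and the whitespace token estimate are simplifying assumptions:

```python
# Sketch of importance-weighted windowing; turn dicts are assumed to carry
# `text`, `importance`, and `index` keys.
def fit_window(turns: list[dict], budget: int) -> list[dict]:
    """Keep the most important turns that fit the token budget."""
    def estimate(turn: dict) -> int:
        return len(turn["text"].split())  # crude whitespace token estimate

    chosen, used = [], 0
    for turn in sorted(turns, key=lambda t: t["importance"], reverse=True):
        cost = estimate(turn)
        if used + cost <= budget:
            chosen.append(turn)
            used += cost
    return sorted(chosen, key=lambda t: t["index"])  # restore chronology
```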

**MemoryStore:**
- Purpose: Unified interface to conversation history, patterns, and learned behaviors
- Examples: `src/memory/store.py`, `src/memory/vector_search.py`
- Pattern: Repository pattern with multiple index types

**PersonalityRules:**
- Purpose: Encode Mai's core values as evaluable constraints
- Examples: `src/personality/core_rules.py`, `config/personality.yaml`
- Pattern: Rule engine with value-based decision making

**SandboxExecutor:**
- Purpose: Execute generated code safely with resource limits and audit trail
- Examples: `src/safety/executor.py`, `src/safety/risk_analyzer.py`
- Pattern: Facade wrapping Docker API with security checks
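
A sketch of such a facade using docker-py's documented `containers.run` parameters; the image name and limits are placeholders, and audit logging is elided:

```python
import docker  # docker-py (Docker SDK for Python)

class SandboxExecutor:
    """Illustrative facade over the Docker API; not the project's real class."""

    def __init__(self, image: str = "python:3.12-slim"):
        self.client = docker.from_env()
        self.image = image

    def run(self, code: str) -> str:
        # network_disabled plus resource caps are the core of the sandbox
        output = self.client.containers.run(
            self.image,
            ["python", "-c", code],
            mem_limit="256m",
            nano_cpus=500_000_000,  # half a CPU core
            network_disabled=True,
            remove=True,            # clean up the container afterwards
        )
        return output.decode()
```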

**ApprovalWorkflow:**
- Purpose: Coordinate user and agent approval for code changes
- Examples: `src/interfaces/approval_handler.py`, `src/selfmod/reviewer.py`
- Pattern: State machine with async notification coordination

## Entry Points

**CLI Entry:**
- Location: `src/interfaces/cli.py` / `__main__.py`
- Triggers: `python -m mai` or `mai` command
- Responsibilities: Initialize conversation session, handle user input loop, display responses, manage approval prompts

**Discord Entry:**
- Location: `src/interfaces/discord_bot.py`
- Triggers: Discord message events
- Responsibilities: Extract message context, route to conversation engine, format response, handle reactions for approvals

**Self-Improvement Entry:**
- Location: `src/selfmod/scheduler.py`
- Triggers: Timer-based (periodic analysis) or explicit trigger from conversation
- Responsibilities: Analyze code, generate improvements, initiate review workflow

**Core Mai Entry:**
- Location: `src/mai.py` (main class)
- Triggers: System startup
- Responsibilities: Initialize all systems (models, memory, personality), coordinate between layers

## Error Handling

**Strategy:** Graceful degradation with clear user communication

**Patterns:**
- Model unavailable: Fall back to smaller model if available, notify user of reduced capabilities
- Memory retrieval failure: Continue conversation without historical context, log error
- Network error: Queue offline messages, retry on reconnection (Discord only)
- Unsafe code generated: Block execution, log with risk analysis, notify user
- Syntax error in generated code: Reject change, log, generate new proposal

## Cross-Cutting Concerns

**Logging:** Structured logging with severity levels throughout codebase. Use Python `logging` module with JSON formatter for production. Log all: model selections, memory operations, safety decisions, approval workflows, code changes.
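
A minimal stdlib sketch of such a JSON formatter (a library like python-json-logger would also work); field names are an assumption:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record; field set is illustrative."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
```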

**Validation:** Input validation at interface boundaries. AST validation for generated code. Type hints throughout codebase with mypy enforcement.

**Authentication:** None required for local CLI. Discord bot authenticated via token (environment variable). API calls between services use simple function calls (single-process model).

---

*Architecture analysis: 2026-01-26*

.planning/codebase/CONCERNS.md (new file, 297 lines)

# Codebase Concerns

**Analysis Date:** 2026-01-26

## Tech Debt

**Incomplete Memory System Integration:**
- Issue: Memory manager initializes gracefully but may fail silently when dependencies are missing
- Files: `src/mai/memory/manager.py`
- Impact: Memory features degrade silently; users don't know compression or retrieval is disabled
- Fix approach: Add explicit logging and health checks on startup, expose memory system status in CLI

**Large Monolithic Memory Manager:**
- Issue: MemoryManager is 1036 lines with multiple responsibilities (storage, compression, retrieval orchestration)
- Files: `src/mai/memory/manager.py`
- Impact: Difficult to test individual memory subsystems; changes affect multiple concerns simultaneously
- Fix approach: Extract retrieval delegation and compression orchestration into separate coordinator classes

**Conversation Engine Complexity:**
- Issue: ConversationEngine is 648 lines handling timing, state, decomposition, reasoning, interruption, and metrics
- Files: `src/mai/conversation/engine.py`
- Impact: High cognitive load for maintainers; hard to isolate bugs in specific subsystems
- Fix approach: Separate concerns into a focused orchestrator (engine) and behavior modules (timing/reasoning/decomposition are already separated but loosely coupled)

**Permission/Approval System Fragility:**
- Issue: ApprovalSystem uses regex pattern matching for risk analysis with hardcoded patterns
- Files: `src/mai/sandbox/approval_system.py`
- Impact: Pattern-matching approach is fragile (false positives/negatives); patterns not maintainable as code evolves
- Fix approach: Replace regex with AST-based code analysis for more reliable risk detection; move risk patterns to configuration

**Docker Executor Dependency Chain:**
- Issue: DockerExecutor falls back silently to unavailable state if Docker isn't installed
- Files: `src/mai/sandbox/docker_executor.py`
- Impact: Approval system thinks code is sandboxed when Docker is missing, creating a false sense of security
- Fix approach: Require explicit Docker availability check at startup; block code execution if Docker unavailable and user requests sandboxing

## Known Bugs

**Session Persistence Restoration:**
- Symptoms: "ConversationState object has no attribute 'set_conversation_history'" error when restarting CLI
- Files: `src/mai/conversation/state.py`, `src/app/__main__.py`
- Trigger: Start conversation, exit CLI, restart CLI session
- Workaround: None - session restoration broken; users lose conversation history
- Status: Identified in Phase 6 UAT but remediation code not applied (commit c70ee88 "Complete fresh slate" removed implementation)

**Session File Feedback Missing:**
- Symptoms: Users don't see where/when session files are created
- Files: `src/app/__main__.py`
- Trigger: Create new session or use /session command
- Workaround: Manually check ~/.mai/ for session.json
- Status: Identified in Phase 6 UAT as major issue (test 3 failed)

**Resource Display Color Coding:**
- Symptoms: Resource monitoring displays plain text instead of color-coded status indicators
- Files: `src/app/__main__.py`
- Trigger: Run CLI and observe resource display during conversation
- Workaround: Parse output manually to understand resource status
- Status: Identified in Phase 6 UAT as minor issue (test 5 failed); root cause: Rich console loses color output in non-terminal environments

## Security Considerations

**Approval System Risk Analysis Insufficient:**
- Risk: Regex-based risk detection can be bypassed with obfuscated code (e.g., string concatenation to build dangerous commands)
- Files: `src/mai/sandbox/approval_system.py`
- Current mitigation: Hardcoded high-risk patterns (os.system, exec, eval); fallback to block on unrecognized patterns
- Recommendations:
  - Implement AST-based code analysis for more reliable detection
  - Add code deobfuscation step before risk analysis
  - Create risk assessment database with test cases and known bypasses
  - Require explicit Docker verification before allowing code execution

**Docker Fallback Security Gap:**
- Risk: Code could execute without actual sandboxing if Docker unavailable, creating a false sense of security
- Files: `src/mai/sandbox/docker_executor.py`
- Current mitigation: AuditLogger records all execution; approval system presents requests regardless
- Recommendations:
  - Fail-safe: Block code execution if Docker unavailable and user hasn't explicitly allowed non-sandboxed execution (see the sketch after this list)
  - Add warning dialog explaining sandbox unavailability
  - Log all non-sandboxed execution attempts explicitly
  - Require explicit override from user with confirmation
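
A sketch of the fail-safe check, using real docker-py calls; how the block is surfaced to the user is left out:

```python
import docker
from docker.errors import DockerException

def sandbox_available() -> bool:
    try:
        docker.from_env().ping()  # raises if the daemon is unreachable
        return True
    except DockerException:
        return False

def execute_change(code: str, allow_unsandboxed: bool = False) -> None:
    # Fail closed: refuse to run generated code without a sandbox unless
    # the user has explicitly opted in to unsandboxed execution.
    if not sandbox_available() and not allow_unsandboxed:
        raise RuntimeError(
            "Docker unavailable: refusing to execute generated code "
            "without a sandbox (explicit override required)"
        )
    ...  # hand off to the executor
```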

**Approval Preference Learning Risk:**
- Risk: User can set "auto_allow" on risky code patterns; once learned, code execution auto-approves without user intervention
- Files: `src/mai/sandbox/approval_system.py` (lines with `user_preferences` and `auto_allow`)
- Current mitigation: Auto-allow only applies to LOW risk level code
- Recommendations:
  - Require explicit user confirmation before enabling auto-allow (not just responding "a")
  - Log all auto-approved executions in audit trail with reason
  - Add periodic review mechanism for auto-allow rules (e.g., "You have X auto-approved rules, review them?" on startup)
  - Restrict auto-allow to strictly limited operation types (print, basic math, not file operations)

## Performance Bottlenecks

**Memory Retrieval Search Not Optimized:**
- Problem: ContextRetriever does full database scans for semantic similarity without indexing
- Files: `src/mai/memory/retrieval.py`
- Cause: Vector similarity search likely using brute-force nearest-neighbor without FAISS or similar
- Improvement path:
  - Add a FAISS vector index for semantic search acceleration (see the sketch after this list)
  - Implement result caching for frequent queries
  - Add search result pagination to avoid loading entire result sets
  - Benchmark retrieval latency and set targets (e.g., <500ms for top-10 similar conversations)
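
A minimal FAISS sketch; the embedding dimension and the random stand-in vectors are placeholders:

```python
import faiss
import numpy as np

dim = 384  # e.g., a small sentence-embedding model's output size
# IndexFlatL2 is exact (still exhaustive, but SIMD-optimized); an IVF index
# would give sublinear approximate search at larger scales.
index = faiss.IndexFlatL2(dim)

embeddings = np.random.rand(1000, dim).astype("float32")  # stand-in vectors
index.add(embeddings)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 10)  # top-10 nearest conversations
```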

**Conversation State History Accumulation:**
- Problem: ConversationState.conversation_history grows unbounded during long sessions
- Files: `src/mai/conversation/state.py`
- Cause: No automatic truncation or archival of old turns; all conversation turns kept in memory
- Improvement path:
  - Implement a sliding window of recent turns (e.g., keep the last 50 turns in memory; see the sketch after this list)
  - Archive old turns to disk and load on demand
  - Add compression trigger at configurable message count
  - Monitor memory usage and alert when conversation history exceeds threshold
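
A minimal sketch of the sliding window; `archive_turn` is a hypothetical hook for the disk-archival step:

```python
from collections import deque

class ConversationHistory:
    """Illustrative bounded history; not the project's actual class."""

    def __init__(self, max_turns: int = 50):
        self.recent: deque = deque(maxlen=max_turns)

    def append(self, turn: dict) -> None:
        if len(self.recent) == self.recent.maxlen:
            archive_turn(self.recent[0])  # oldest turn falls out of the window
        self.recent.append(turn)

def archive_turn(turn: dict) -> None:
    ...  # write the evicted turn to disk for on-demand reload
```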

**Memory Manager Compression Not Scheduled:**
- Problem: Manual `compress_conversation()` calls required; no automatic compression scheduling
- Files: `src/mai/memory/manager.py`
- Cause: Compression is triggered manually or not at all; no background task or event-driven compression
- Improvement path:
  - Implement background compression task triggered by conversation age or message count
  - Add periodic compression sweep for all old conversations
  - Make compression interval configurable (e.g., compress every 500 messages or 24 hours)
  - Track compression effectiveness and adjust thresholds

## Fragile Areas

**Ollama Integration Dependency:**
- Files: `src/mai/model/ollama_client.py`, `src/mai/core/interface.py`
- Why fragile: Hard-coded Ollama endpoint assumption; no fallback model provider; no retry logic for model inference
- Safe modification:
  - Use dependency injection for model provider (interface-based)
  - Add configurable model provider endpoints
  - Implement retry logic with exponential backoff for transient failures
  - Add model availability detection at startup
- Test coverage: Limited tests for model switching and unavailability scenarios

**Git Integration Fragility:**
- Files: `src/mai/git/committer.py`, `src/mai/git/workflow.py`
- Why fragile: Assumes clean git state; no handling for merge conflicts, detached HEAD, or dirty working directory
- Safe modification:
  - Add pre-commit git status validation
  - Handle merge conflict detection and defer commits
  - Implement conflict resolution strategy (manual review or aborting)
  - Test against all git states (detached HEAD, dirty working tree, conflicted merge)
- Test coverage: No tests for edge cases like merge conflicts

**Conversation State Serialization Round-Trip:**
- Files: `src/mai/conversation/state.py`, `src/mai/models/conversation.py`
- Why fragile: ConversationTurn -> Ollama message -> ConversationTurn conversion can lose context
- Safe modification:
  - Add comprehensive unit tests for serialization round-trip
  - Document serialization format and invariants
  - Add validation after deserialization (verify message count, order, role integrity)
  - Create fixture tests with edge cases (unicode, very long messages, special characters)
- Test coverage: No existing tests for message serialization/deserialization

**Docker Configuration Hardcoding:**
- Files: `src/mai/sandbox/docker_executor.py`
- Why fragile: Docker image names, CPU limits, memory limits hardcoded as class constants
- Safe modification:
  - Move Docker config to configuration file
  - Add validation on startup that Docker limits match system resources
  - Document all Docker configuration assumptions
  - Make limits tunable per system resource profile
- Test coverage: Docker integration tests likely mocked; no testing on actual Docker variations

## Scaling Limits

**Memory Database Size Growth:**
- Current capacity: SQLite with no explicit limits; storage grows with every conversation
- Limit: SQLite performance degrades significantly above ~1GB; queries become slow
- Scaling path:
  - Implement database rotation (archive old conversations, start new DB periodically)
  - Add migration path to PostgreSQL for production deployments
  - Implement automatic old conversation archival (move to cold storage after 30 days)
  - Add database vacuum and index optimization on scheduled basis

**Conversation Context Window Management:**
- Current capacity: Model context window determined by Ollama model selection (varies)
- Limit: ConversationEngine doesn't prevent context overflow; will fail when history exceeds model limit
- Scaling path:
  - Track token count of conversation history and refuse new messages before overflow
  - Implement automatic compression trigger at 80% context usage (see the sketch after this list)
  - Add model switching logic to use larger-context models if available
  - Document context budget requirements per model
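
A sketch of the 80% trigger; the whitespace token count stands in for a real tokenizer, and `compress` is a hypothetical hook:

```python
COMPRESSION_THRESHOLD = 0.8  # compress once 80% of the context is used

def maybe_compress(history: list[str], context_limit: int) -> None:
    used = sum(len(turn.split()) for turn in history)  # crude token estimate
    if used >= COMPRESSION_THRESHOLD * context_limit:
        compress(history)  # summarize/evict older turns before overflow

def compress(history: list[str]) -> None:
    ...  # delegate to the memory system's compression logic
```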

**Approval History Unbounded Growth:**
- Current capacity: ApprovalSystem.approval_history list grows indefinitely
- Limit: Memory accumulation over time; each approval decision stored in memory forever
- Scaling path:
  - Archive approval history to database after threshold (e.g., 1000 decisions)
  - Implement approval history rotation with configurable retention
  - Add aggregate statistics (approval patterns) instead of storing raw history
  - Clean up approval history on startup or scheduled task

## Dependencies at Risk

**Ollama Dependency and Model Availability:**
- Risk: Hard requirement on Ollama being available and having models installed
- Impact: Mai cannot function without Ollama; no fallback to cloud inference or other providers
- Migration plan:
  - Implement abstract model provider interface
  - Add support for OpenAI/other cloud models as fallback (even if v1 is offline-first)
  - Document minimum Ollama model requirements
  - Add diagnostic tool to check Ollama health on startup

**Docker Dependency for Sandboxing:**
- Risk: Docker required for code execution safety; no alternative sandbox implementations
- Impact: Users without Docker can't safely execute generated code; no graceful degradation
- Migration plan:
  - Implement abstract executor interface (not just DockerExecutor)
  - Add noop executor for testing
  - Consider lightweight alternatives (seccomp, chroot, or bubblewrap) for Linux systems
  - Add explicit warning if Docker unavailable

**Rich Library Terminal Detection:**
- Risk: Rich disables colors in non-terminal environments; users see degraded UX
- Impact: Resource monitoring and status displays lack visual feedback in non-terminal contexts
- Migration plan:
  - Use Console(force_terminal=True) to force color output when desired
  - Add configuration option for color preference
  - Implement fallback emoji/unicode indicators for non-color environments
  - Test in various terminal emulators and SSH sessions

## Missing Critical Features

**Session Data Portability:**
- Problem: Session files are JSON but no export/import mechanism; can't back up or migrate sessions
- Blocks: Users can't back up conversations; losing ~/.mai/session.json loses all context
- Fix: Add export/import commands (/export, /import) and document session file format

**Conversation Memory Persistence:**
- Problem: Conversation history is session-scoped (stored in memory); not saved to memory system
- Blocks: Long-term pattern learning relies on memory system but conversations aren't automatically stored
- Fix: Implement automatic conversation archival to memory system after session ends

**User Preference Learning Audit Trail:**
- Problem: User preferences for auto-approval learned silently; no visibility into what patterns auto-approve
- Blocks: Users can't audit their own auto-approval rules; hard to recover from accidentally enabling auto-allow
- Fix: Add /preferences or /audit command to show all learned rules and allow revocation

**Resource Constraint Graceful Degradation:**
- Problem: System shows resource usage but doesn't adapt model selection or conversation behavior
- Blocks: Mai can't suggest switching to smaller models when resources are tight
- Fix: Implement resource-aware model recommendation system

**Approval Change Logging:**
- Problem: Approval decisions not tracked in git; can't audit "who approved what when"
- Blocks: No accountability trail for approval decisions
- Fix: Log all approval decisions to git with commit messages including timestamp and user

## Test Coverage Gaps

**Docker Executor Network Isolation:**
- What's not tested: Whether network access is actually restricted in Docker containers
- Files: `src/mai/sandbox/docker_executor.py`
- Risk: Code might have network access despite supposed isolation
- Priority: High (security-critical)

**Session Persistence Edge Cases:**
- What's not tested: Very large conversations (1000+ messages), unicode characters, special characters
- Files: `src/mai/conversation/state.py`, session persistence code
- Risk: Session files corrupt or lose data with edge case inputs
- Priority: High (data loss)

**Approval System Obfuscation Bypass:**
- What's not tested: Obfuscated code patterns, string concatenation attacks, bytecode approaches
- Files: `src/mai/sandbox/approval_system.py`
- Risk: Risky code could slip through as "low risk" via obfuscation
- Priority: High (security-critical)

**Memory Compression Round-Trip Data Loss:**
- What's not tested: Whether compressed conversations can be exactly reconstructed
- Files: `src/mai/memory/compression.py`, `src/mai/memory/storage.py`
- Risk: Compression could lose important context patterns; compression metrics may be misleading
- Priority: Medium (data integrity)

**Model Switching During Active Conversation:**
- What's not tested: Switching models mid-conversation, context migration, embedding space changes
- Files: `src/mai/model/switcher.py`, `src/mai/conversation/engine.py`
- Risk: Context might not transfer correctly when models switch
- Priority: Medium (feature reliability)

**Offline Queue Conflict Resolution:**
- What's not tested: What happens when offline messages conflict with new context on reconnection
- Files: `src/mai/conversation/engine.py` (offline queueing)
- Risk: Offline messages might create an incoherent conversation when reconnected
- Priority: Medium (conversation coherence)

**Resource Detector System Resource Edge Cases:**
- What's not tested: GPU detection on systems with unusual hardware, CPU count on virtual systems
- Files: `src/mai/model/resource_detector.py`
- Risk: Wrong model selection due to misdetected resources
- Priority: Low (graceful degradation usually handles this)

---

*Concerns audit: 2026-01-26*

.planning/codebase/CONVENTIONS.md (new file, 298 lines)

# Coding Conventions

**Analysis Date:** 2026-01-26

## Status

**Note:** This codebase is in planning phase. No source code has been written yet. These conventions are **prescriptive** for the Mai project and should be applied to all code from the first commit forward.

## Naming Patterns

**Files:**
- Python modules: `lowercase_with_underscores.py` (PEP 8)
- Configuration files: `config.yaml`, `.env.example`
- Test files: `test_module_name.py` (co-located with source)
- Example: `src/memory/storage.py`, `src/memory/test_storage.py`

**Functions:**
- Use `snake_case` for all function names (PEP 8)
- Private functions: Prefix with single underscore `_private_function()`
- Async functions: Use `async def async_operation()` naming
- Example: `def get_conversation_history()`, `async def stream_response()`

**Variables:**
- Use `snake_case` for all variable names
- Constants: `UPPERCASE_WITH_UNDERSCORES`
- Private module variables: Prefix with `_`
- Example: `conversation_history`, `MAX_CONTEXT_TOKENS`, `_internal_cache`

**Types:**
- Classes: `PascalCase`
- Enums: `PascalCase` (inherit from `Enum`)
- TypedDict: `PascalCase` with `Dict` suffix
- Example: `class ConversationManager`, `class ErrorLevel(Enum)`, `class MemoryConfigDict(TypedDict)`

**Directories:**
- Core modules: `src/[module_name]/` (lowercase, plural when appropriate)
- Example: `src/models/`, `src/memory/`, `src/safety/`, `src/interfaces/`

## Code Style

**Formatting:**
- Tool: **Ruff** (formatter and linter)
- Line length: 88 characters (Ruff default)
- Quote style: Double quotes (`"string"`)
- Indentation: 4 spaces (no tabs)

**Linting:**
- Tool: **Ruff**
- Configuration enforced via `.ruff.toml` (when created)
- All imports must pass ruff checks
- No unused imports allowed
- Type hints required for public functions

**Python Version:**
- Minimum: Python 3.10+
- Use modern type hints: built-in generics (`list[str]`, `dict[str, int]`) over `typing.List`/`typing.Dict`
- Use `str | None` instead of `Optional[str]` (union syntax)

## Import Organization

**Order:**
1. Standard library imports (`import os`, `import sys`)
2. Third-party imports (`import discord`, `import numpy`)
3. Local imports (`from src.memory import Storage`)
4. Blank line between each group

**Example:**

```python
import asyncio
import json
from pathlib import Path
from typing import Optional

import discord
from dotenv import load_dotenv

from src.memory import ConversationStorage
from src.models import ModelManager
```

**Path Aliases:**
- Use absolute imports rooted at `src/`
- Avoid deep relative imports (no `../../../`)
- Example: `from src.safety import SandboxExecutor` not `from ...safety import SandboxExecutor`

## Error Handling

**Patterns:**
- Define domain-specific exceptions in `src/exceptions.py`
- Use exception hierarchy (base `MaiException`, specific subclasses)
- Always include context in exceptions (error code, details, suggestions)
- Example:

```python
class MaiException(Exception):
    """Base exception for Mai framework."""

    def __init__(self, code: str, message: str, details: dict | None = None):
        self.code = code
        self.message = message
        self.details = details or {}
        super().__init__(f"[{code}] {message}")


class ModelError(MaiException):
    """Raised when model inference fails."""
    pass


class MaiMemoryError(MaiException):
    """Raised when memory operations fail.

    Named MaiMemoryError to avoid shadowing the built-in MemoryError.
    """
    pass
```

- Log before raising (see Logging section)
- Use context managers for cleanup (async context managers for async code)
- Never catch bare `Exception` - catch specific exceptions

## Logging

**Framework:** `logging` module (Python standard library)

**Patterns:**
- Create logger per module: `logger = logging.getLogger(__name__)`
- Log levels guide:
  - `DEBUG`: Detailed diagnostic info (token counts, decision trees)
  - `INFO`: Significant operational events (conversation started, model loaded)
  - `WARNING`: Unexpected but handled conditions (fallback triggered, retry)
  - `ERROR`: Failed operation (model error, memory access failed)
  - `CRITICAL`: System-level failures (cannot recover)
- Structured logging preferred (include operation context)
- Example:

```python
import logging

logger = logging.getLogger(__name__)


async def invoke_model(prompt: str, model: str) -> str:
    logger.debug(f"Invoking model={model} with token_count={len(prompt.split())}")
    try:
        response = await model_manager.generate(prompt)
        logger.info(f"Model response generated, length={len(response)}")
        return response
    except ModelError as e:
        logger.error(f"Model invocation failed: {e.code}", exc_info=True)
        raise
```

## Comments

**When to Comment:**
- Complex logic requiring explanation (multi-step algorithms, non-obvious decisions)
- Important context that code alone cannot convey (why a workaround exists)
- Do NOT comment obvious code (`x = 1  # set x to 1` is noise)
- Do NOT duplicate what the code already says

**Docstrings:**
- Use Google-style docstrings for all public functions/classes
- Include return type even if type hints exist (for readability)
- Example:

```python
async def get_memory_context(
    query: str,
    max_tokens: int = 2000,
) -> str:
    """Retrieve relevant memory context for a query.

    Performs vector similarity search on conversation history,
    compresses results to fit token budget, and returns formatted context.

    Args:
        query: The search query for memory retrieval.
        max_tokens: Maximum tokens in returned context (default 2000).

    Returns:
        Formatted memory context as markdown-structured string.

    Raises:
        MaiMemoryError: If database query fails or storage is corrupted.
    """
```

## Function Design

**Size:**
- Target: Functions under 50 lines (hard limit: 100 lines)
- Break complex logic into smaller helper functions
- One responsibility per function (single responsibility principle)

**Parameters:**
- Maximum 4 positional parameters
- Use keyword-only arguments for optional params: `def func(required, *, optional=None)`
- Use dataclasses or TypedDict for complex parameter groups
- Example:

```python
# Good: Clear structure
async def approve_change(
    change_id: str,
    *,
    reviewer_id: str,
    decision: Literal["approve", "reject"],
    reason: str | None = None,
) -> None:
    pass


# Bad: Too many params
async def approve_change(change_id, reviewer_id, decision, reason, timestamp, context, metadata):
    pass
```

**Return Values:**
- Explicitly return values (no implicit `None` returns unless documented)
- Use `Optional[T]` or `T | None` in type hints for nullable returns
- Prefer returning data objects over tuples: return `Result` not `(status, data, error)`
- Async functions return awaitable, not callbacks

## Module Design

**Exports:**
- Define `__all__` in each module to be explicit about public API
- Example in `src/memory/__init__.py`:

```python
from src.memory.storage import ConversationStorage
from src.memory.compression import MemoryCompressor

__all__ = ["ConversationStorage", "MemoryCompressor"]
```

**Barrel Files:**
- Use `__init__.py` to export key classes/functions from submodules
- Keep import chains shallow (max 2 levels deep)
- Example structure:

```
src/
├── memory/
│   ├── __init__.py (exports Storage, Compressor)
│   ├── storage.py
│   └── compression.py
```

**Async/Await:**
- All I/O operations (database, API calls, file I/O) must be async
- Use `asyncio` for concurrency, not threading
- Async context managers for resource management:

```python
async def process_request(prompt: str) -> str:
    async with model_manager.get_session() as session:
        response = await session.generate(prompt)
        return response
```

## Type Hints

**Requirements:**
- All public function signatures must have type hints
- Use `from __future__ import annotations` for forward references
- Prefer union syntax: `str | None` over `Optional[str]`
- Use `Literal` for string enums: `Literal["approve", "reject"]`
- Example:

```python
from __future__ import annotations

from typing import Literal


def evaluate_risk(code: str) -> Literal["LOW", "MEDIUM", "HIGH", "BLOCKED"]:
    """Evaluate code risk level."""
    pass
```

## Configuration

**Pattern:**
- Use YAML for human-editable config files
- Use environment variables for secrets (never commit `.env`)
- Validation at import time (fail fast if config invalid)
- Example:

```python
# config.py
import os
from pathlib import Path


class Config:
    DEBUG = os.getenv("DEBUG", "false").lower() == "true"
    MODELS_PATH = Path(os.getenv("MODELS_PATH", "~/.mai/models")).expanduser()
    MAX_CONTEXT_TOKENS = int(os.getenv("MAX_CONTEXT_TOKENS", "8000"))

    # Validate on import
    if not MODELS_PATH.exists():
        raise RuntimeError(f"Models path does not exist: {MODELS_PATH}")
```

---

*Convention guide: 2026-01-26*

*Status: Prescriptive for Mai v1 implementation*

.planning/codebase/INTEGRATIONS.md (new file, 129 lines)

# External Integrations

**Analysis Date:** 2026-01-26

## APIs & External Services

**Model Inference:**
- LMStudio - Local model server for inference and model switching
  - SDK/Client: LMStudio Python API
  - Auth: None (local service, no authentication required)
  - Configuration: model_path env var, endpoint URL
- Ollama - Alternative local model management system
  - SDK/Client: Ollama REST API (HTTP)
  - Auth: None (local service)
  - Purpose: Model loading, switching, inference with resource detection
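
A sketch of a non-streaming call against Ollama's documented `/api/generate` route; the model name is a placeholder:

```python
import requests

def ollama_generate(
    prompt: str,
    model: str = "llama3",                      # placeholder model name
    endpoint: str = "http://localhost:11434",   # Ollama's default port
) -> str:
    resp = requests.post(
        f"{endpoint}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]  # full completion text
```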

**Communication & Approvals:**
- Discord - Bot interface for conversation and change approvals
  - SDK/Client: discord.py library
  - Auth: DISCORD_BOT_TOKEN env variable
  - Purpose: Multi-turn conversations, approval reactions (thumbs up/down), status updates
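
A sketch of reaction-based approval with discord.py; the emoji mapping and the `resolve_approval` hook are assumptions about Mai's workflow:

```python
import os

import discord

intents = discord.Intents.default()
intents.reactions = True
client = discord.Client(intents=intents)

async def resolve_approval(message_id: int, approved: bool) -> None:
    ...  # hand the decision to the approval workflow (hypothetical hook)

@client.event
async def on_reaction_add(reaction: discord.Reaction, user: discord.User):
    if user.bot:
        return  # ignore the bot's own reactions
    if str(reaction.emoji) == "👍":
        await resolve_approval(reaction.message.id, approved=True)
    elif str(reaction.emoji) == "👎":
        await resolve_approval(reaction.message.id, approved=False)

client.run(os.environ["DISCORD_BOT_TOKEN"])
```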

## Data Storage

**Databases:**
- SQLite3 (local file-based)
  - Connection: Local file path, no remote connection
  - Client: Python sqlite3 (stdlib) or SQLAlchemy ORM
  - Purpose: Persistent conversation history, memory compression, learned patterns
  - Location: Local filesystem (.db files)

**File Storage:**
- Local filesystem only - Git-tracked code changes, conversation history backups
- No cloud storage integration in v1

**Caching:**
- In-memory caching for current conversation context
- Redis: Not used in v1 (local-first constraint)
- Model context window management: Token-based cache within model inference

## Authentication & Identity

**Auth Provider:**
- Custom local auth - No external identity provider
- Implementation:
  - Discord user ID as conversation context identifier
  - Optional local password/PIN for CLI access
  - No OAuth/cloud identity providers (offline-first requirement)

## Monitoring & Observability

**Error Tracking:**
- None (local only, no error reporting service)
- Local audit logging to SQLite instead

**Logs:**
- File-based logging to `.logs/` directory
- Format: Structured JSON logs with timestamp, level, context
- Rotation: Size-based or time-based rotation strategy
- No external log aggregation (offline-first)

## CI/CD & Deployment

**Hosting:**
- Local machine only (desktop/laptop with RTX 3060+)
- No cloud hosting in v1

**CI Pipeline:**
- GitHub Actions for Discord webhook on push
- Workflow: `.github/workflows/discord_sync.yml`
- Trigger: Push events
- Action: POST to Discord webhook for notification

**Git Integration:**
- All Mai's self-modifications committed automatically with git
- Local git repo tracking all code changes
- Commit messages include decision context and review results

## Environment Configuration

**Required env vars:**
- `DISCORD_BOT_TOKEN` - Discord bot authentication
- `LMSTUDIO_ENDPOINT` - LMStudio API URL (default: localhost:8000)
- `OLLAMA_ENDPOINT` - Ollama API URL (optional alternative, default: localhost:11434)
- `DISCORD_USER_ID` - User Discord ID for approval requests
- `MEMORY_DB_PATH` - SQLite database file location
- `MODEL_CACHE_DIR` - Directory for model files
- `CPU_CORES_AVAILABLE` - System CPU count for resource management
- `GPU_VRAM_AVAILABLE` - VRAM in GB for model selection
- `SANDBOX_DOCKER_IMAGE` - Docker image ID for code sandbox execution

**Secrets location:**
- `.env` file (python-dotenv) for local development
- Environment variables for production/runtime
- Git-ignored: `.env` not committed

## Webhooks & Callbacks

**Incoming:**
- Discord message webhooks - Handled by discord.py bot event listeners
- No external webhook endpoints in v1

**Outgoing:**
- Discord webhook for git notifications (configured in GitHub Actions)
  - Endpoint: Stored in GitHub secrets as WEBHOOK
  - Triggered on: git push events
  - Payload: Git commit information (author, message, timestamp)

**Model Callback Handling:**
- LMStudio streaming callbacks for token-by-token responses
- Ollama streaming responses for incremental model output

## Code Execution Sandbox

**Sandbox Environment:**
- Docker container with resource limits
- SDK: Docker SDK for Python (docker-py)
- Environment: Isolated Linux container
- Resource limits: CPU cores, RAM, network restrictions

**Risk Assessment:**
- Multi-level risk evaluation (LOW/MEDIUM/HIGH/BLOCKED)
- AST validation before container execution
- Second-agent review via Claude/OpenCode API

---

*Integration audit: 2026-01-26*

.planning/codebase/STACK.md (new file, 93 lines)

# Technology Stack

**Analysis Date:** 2026-01-26

## Languages

**Primary:**
- Python 3.x - Core Mai agent codebase, local model inference, self-improvement system

**Secondary:**
- YAML - Configuration files for personality, behavior settings
- JSON - Configuration, metadata, API responses
- SQL - Memory storage and retrieval queries

## Runtime

**Environment:**
- Python (local execution, no remote runtime)
- LMStudio or Ollama - Local model inference server

**Package Manager:**
- pip - Python package management
- Lockfile: requirements.txt or poetry.lock (typical Python approach)

## Frameworks

**Core:**
- No web framework for v1 (CLI/Discord only)

**Model Inference:**
- LMStudio Python SDK - Local model switching and inference
- Ollama API - Alternative local model management per requirements

**Discord Integration:**
- discord.py - Discord bot API client

**CLI:**
- Click or Typer - Command-line interface building

**Testing:**
- pytest - Unit/integration test framework
- pytest-asyncio - Async test support for Discord bot testing

**Build/Dev:**
- Git - Version control for Mai's own code changes
- Docker - Sandbox execution environment for safety

## Key Dependencies

**Critical:**
- LMStudio Python Client - Model loading, switching, inference with token management
- discord.py - Discord bot functionality for approval workflows
- SQLite3 - Lightweight persistent storage (Python stdlib)
- Docker SDK for Python - Sandbox execution management

**Infrastructure:**
- requests - HTTP client for Discord API fallback and Ollama API communication
- PyYAML - Personality configuration parsing
- pydantic - Data validation for internal structures
- python-dotenv - Environment variable management for secrets
- GitPython - Programmatic git operations for committing self-improvements

## Configuration

**Environment:**
- .env file - Discord bot token, model paths, resource thresholds
- Environment variables - Runtime configuration loaded at startup
- personality.yaml - Core personality values and learned behavior layers
- config.json - Resource limits, model preferences, memory settings

**Build:**
- setup.py or pyproject.toml - Package metadata and dependency declaration
- Dockerfile - Sandbox execution environment specification
- .dockerignore - Docker build optimization

## Platform Requirements

**Development:**
- Python 3.10+ (required by the union-syntax type hints prescribed in CONVENTIONS.md)
- Git (for version control and self-modification tracking)
- Docker (for sandbox execution environment)
- LMStudio or Ollama running locally (for model inference)

**Production (Runtime):**
- RTX 3060 GPU minimum (per project constraints)
- 16GB+ RAM (for model loading and context management)
- Linux/macOS/Windows with Python 3.10+
- Docker daemon (for sandboxed code execution)
- Local LMStudio/Ollama instance (no cloud models)

---

*Stack analysis: 2026-01-26*

.planning/codebase/STRUCTURE.md (new file, 258 lines)

# Codebase Structure

**Analysis Date:** 2026-01-26

## Directory Layout

```
mai/
├── src/
│   ├── __main__.py              # CLI entry point
│   ├── mai.py                   # Core Mai class, orchestration
│   ├── models/
│   │   ├── __init__.py
│   │   ├── adapter.py           # Base model adapter interface
│   │   ├── ollama_adapter.py    # Ollama/LMStudio implementation
│   │   ├── model_manager.py     # Model selection and switching logic
│   │   └── resource_monitor.py  # CPU, RAM, GPU tracking
│   ├── memory/
│   │   ├── __init__.py
│   │   ├── store.py             # SQLite conversation store
│   │   ├── vector_search.py     # Semantic similarity search
│   │   ├── compression.py       # History compression and summarization
│   │   └── pattern_extractor.py # Learning and pattern recognition
│   ├── conversation/
│   │   ├── __init__.py
│   │   ├── engine.py            # Main conversation orchestration
│   │   ├── context_manager.py   # Token budget and window management
│   │   ├── turn_handler.py      # Single turn processing
│   │   └── reasoning.py         # Reasoning transparency and clarification
│   ├── personality/
│   │   ├── __init__.py
│   │   ├── core_rules.py        # Unshakeable core values enforcement
│   │   ├── learned_behaviors.py # Personality adaptation from interactions
│   │   ├── guardrails.py        # Safety constraints and refusal logic
│   │   └── config_loader.py     # YAML personality configuration
│   ├── safety/
│   │   ├── __init__.py
│   │   ├── executor.py          # Docker sandbox execution wrapper
│   │   ├── risk_analyzer.py     # Risk classification (LOW/MEDIUM/HIGH/BLOCKED)
│   │   ├── ast_validator.py     # Syntax and import validation
│   │   └── audit_log.py         # Immutable execution history
│   ├── selfmod/
│   │   ├── __init__.py
│   │   ├── analyzer.py          # Code analysis and improvement detection
│   │   ├── generator.py         # Improvement code generation
│   │   ├── scheduler.py         # Periodic and on-demand analysis trigger
│   │   ├── reviewer.py          # Second-agent review coordination
│   │   └── git_manager.py       # Git commit integration
│   ├── interfaces/
│   │   ├── __init__.py
│   │   ├── cli.py               # CLI chat interface
│   │   ├── discord_bot.py       # Discord bot implementation
│   │   ├── message_handler.py   # Shared message processing
│   │   ├── approval_handler.py  # Change approval workflow
│   │   └── offline_queue.py     # Message queueing during disconnection
│   └── utils/
│       ├── __init__.py
│       ├── config.py            # Configuration loading
│       ├── logging.py           # Structured logging setup
│       ├── validators.py        # Input validation helpers
│       └── helpers.py           # Shared utility functions
├── config/
│   ├── personality.yaml         # Core personality configuration
│   ├── models.yaml              # Model definitions and resource limits
│   ├── safety_rules.yaml        # Risk assessment rules
│   └── logging.yaml             # Logging configuration
├── tests/
│   ├── unit/
│   │   ├── test_models.py
│   │   ├── test_memory.py
│   │   ├── test_conversation.py
│   │   ├── test_personality.py
│   │   ├── test_safety.py
│   │   └── test_selfmod.py
│   ├── integration/
│   │   ├── test_conversation_flow.py
│   │   ├── test_selfmod_workflow.py
│   │   └── test_interfaces.py
│   └── fixtures/
│       ├── mock_models.py
│       ├── test_data.py
│       └── sample_conversations.json
├── scripts/
│   ├── setup_ollama.py          # Initial model downloading
│   ├── init_db.py               # Database schema initialization
│   └── verify_environment.py    # Pre-flight checks
├── docker/
│   └── Dockerfile               # Sandbox execution environment
├── .env.example                 # Environment variables template
├── pyproject.toml               # Project metadata and dependencies
├── requirements.txt             # Python dependencies
├── pytest.ini                   # Test configuration
├── Makefile                     # Development commands
└── README.md                    # Project overview
```
|
||||||
|
|
||||||
|
## Directory Purposes

**src/:**
- Purpose: All application code
- Contains: Python modules organized by architectural layer
- Key files: `mai.py` (core), `__main__.py` (CLI entry)

**src/models/:**
- Purpose: Model inference abstraction
- Contains: Adapter interfaces, Ollama client, resource monitoring
- Key files: `model_manager.py` (selection logic), `resource_monitor.py` (constraints)

**src/memory/:**
- Purpose: Persistent storage and retrieval
- Contains: SQLite operations, vector search, compression
- Key files: `store.py` (main interface), `vector_search.py` (semantic search)

**src/conversation/:**
- Purpose: Multi-turn conversation orchestration
- Contains: Turn handling, context windowing, reasoning transparency
- Key files: `engine.py` (main coordinator), `context_manager.py` (token budget)

**src/personality/:**
- Purpose: Values enforcement and personality adaptation
- Contains: Core rules, learned behaviors, guardrails
- Key files: `core_rules.py` (unshakeable values), `learned_behaviors.py` (adaptation)

**src/safety/:**
- Purpose: Code execution sandboxing and risk assessment
- Contains: Docker wrapper, AST validation, risk classification, audit logging
- Key files: `executor.py` (sandbox wrapper), `risk_analyzer.py` (classification)

**src/selfmod/:**
- Purpose: Autonomous code improvement and review
- Contains: Code analysis, improvement generation, approval workflow
- Key files: `analyzer.py` (detection), `reviewer.py` (second-agent coordination)

**src/interfaces/:**
- Purpose: External communication adapters
- Contains: CLI handler, Discord bot, approval system
- Key files: `cli.py` (terminal UI), `discord_bot.py` (Discord integration)

**src/utils/:**
- Purpose: Shared utilities and helpers
- Contains: Configuration loading, logging, validation
- Key files: `config.py` (env/file loading), `logging.py` (structured logs)

**config/:**
- Purpose: Non-code configuration files
- Contains: YAML definitions for personality, models, and safety rules
- Key files: `personality.yaml` (core values), `models.yaml` (resource profiles)

**tests/:**
- Purpose: Test suites organized by type
- Contains: Unit tests (layer isolation), integration tests (flows), fixtures (test data)
- Key files: Each test file mirrors the `src/` structure

**scripts/:**
- Purpose: One-off setup and maintenance scripts
- Contains: Model setup, database initialization, environment verification
- Key files: `setup_ollama.py` (first-time model setup)

**docker/:**
- Purpose: Container configuration for sandboxed execution
- Contains: Dockerfile for the isolation environment
- Key files: `Dockerfile` (build recipe)

## Key File Locations

**Entry Points:**
- `src/__main__.py`: CLI entry; `python -m mai` launches here (sketched below)
- `src/interfaces/discord_bot.py`: Discord bot main loop
- `src/mai.py`: Core Mai class, system initialization

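For orientation, a minimal sketch of what `src/__main__.py` could look like under this layout. The `Mai` constructor and a `run_cli` coroutine in `cli.py` are assumptions for illustration, not a settled API:

```python
# src/__main__.py -- hypothetical sketch; the Mai and run_cli APIs are assumed
import asyncio

from src.mai import Mai                  # assumed core class (src/mai.py)
from src.interfaces.cli import run_cli   # assumed CLI entry coroutine


def main() -> None:
    """Launch Mai with the CLI interface (`python -m mai`)."""
    mai = Mai()  # assumed to load config/ and initialize the layers
    asyncio.run(run_cli(mai))


if __name__ == "__main__":
    main()
```
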
**Configuration:**
- `config/personality.yaml`: Core values, interaction patterns, refusal rules
- `config/models.yaml`: Available models, resource requirements, context windows
- `.env.example`: Required environment variables template

**Core Logic:**
- `src/mai.py`: Main orchestration
- `src/conversation/engine.py`: Conversation turn processing
- `src/selfmod/analyzer.py`: Improvement opportunity detection
- `src/safety/executor.py`: Safe code execution

**Testing:**
- `tests/unit/`: Layer-isolated tests (no dependencies between layers)
- `tests/integration/`: End-to-end flow tests
- `tests/fixtures/`: Mock objects and test data

## Naming Conventions

**Files:**
- Module files: `snake_case.py` (e.g., `model_manager.py`)
- Entry points: `__main__.py` for packages, standalone scripts at package root
- Config files: `snake_case.yaml` (e.g., `personality.yaml`)
- Test files: `test_*.py` (e.g., `test_conversation.py`)

**Directories:**
- Feature areas: `snake_case` (e.g., `src/selfmod/`)
- No abbreviations except `selfmod` (self-modification), which is the project standard
- Each layer is a top-level directory under `src/`

**Functions/Classes:**
- Classes: `PascalCase` (e.g., `ModelManager`, `ConversationEngine`)
- Functions: `snake_case` (e.g., `generate_response()`, `validate_code()`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_CONTEXT_TOKENS`)
- Private methods/functions: prefix with `_` (e.g., `_internal_method()`)

**Types:**
- Use type hints throughout: `def process(msg: str) -> str:`
- Complex types go in `src/utils/types.py` or stay local to the module

## Where to Add New Code

**New Feature (e.g., a new communication interface like Slack):**
- Primary code: `src/interfaces/slack_adapter.py` (new adapter following the `discord_bot.py` pattern; see the sketch after this list)
- Tests: `tests/unit/test_slack_adapter.py` and `tests/integration/test_slack_interface.py`
- Configuration: Add to `src/interfaces/__init__.py` imports and `config/interfaces.yaml` if needed
- Entry hook: Modify `src/mai.py` to initialize the new adapter

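A minimal sketch of what such an adapter could look like. This is hedged: no Slack library has been chosen, so the injected client, its `post_message` call, and the shared `handle_message` coroutine signature are all assumptions:

```python
# src/interfaces/slack_adapter.py -- hypothetical sketch, not a settled API
from src.interfaces.message_handler import handle_message  # assumed shared entry point


class SlackAdapter:
    """Adapter that forwards Slack events to the shared message handler."""

    def __init__(self, client) -> None:
        self.client = client  # injected Slack client (library not chosen yet)

    async def on_message(self, channel: str, user: str, text: str) -> None:
        # Route every inbound message through the same pipeline the CLI and
        # Discord interfaces use, so personality and safety checks apply
        # uniformly across interfaces.
        reply = await handle_message(user_id=user, content=text)
        await self.client.post_message(channel=channel, text=reply)
```
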
**New Component/Module (e.g., advanced memory with graph databases):**
- Implementation: `src/memory/graph_store.py` (new module in the appropriate layer)
- Interface: Follow existing patterns (e.g., inherit from the `src/memory/store.py` base; see the sketch below)
- Tests: Corresponding test in `tests/unit/test_memory.py`, or a new file if complex
- Integration: Modify `src/mai.py` initialization to use the new component behind a feature flag

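A sketch of how such a module could slot in, assuming `store.py` defines a base class with `store`/`get` methods. The base-class name, method signatures, and the `_link_related` helper are illustrative assumptions:

```python
# src/memory/graph_store.py -- hypothetical sketch under assumed interfaces
from src.memory.store import BaseStore  # assumed abstract base in store.py


class GraphStore(BaseStore):
    """Memory backend that adds relationship edges between conversations."""

    def store(self, conversation: dict) -> str:
        conversation_id = super().store(conversation)  # reuse base persistence
        self._link_related(conversation_id)            # hypothetical edge-building step
        return conversation_id

    def _link_related(self, conversation_id: str) -> None:
        # Private helper, per naming conventions; would connect this
        # conversation to semantically similar ones so retrieval can
        # walk the graph.
        ...
```
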
**Utilities (e.g., a new helper function):**
- Shared helpers: `src/utils/helpers.py` (functions), or a new file like `src/utils/math_utils.py` if substantial
- Internal helpers: Keep them in the module where they are used (don't over-extract)
- Tests: Add to `tests/unit/test_utils.py`

**Configuration:**
- Static rules: Add to the appropriate YAML file in `config/`
- Dynamic config: Load in `src/utils/config.py` (see the sketch below)
- Env-driven: Add to `.env.example` with documentation

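A minimal sketch of the YAML-plus-environment loading that `src/utils/config.py` could provide. The `MAI_*` variable naming and the precedence rule (environment overrides file) are assumptions, not documented decisions:

```python
# src/utils/config.py -- hypothetical sketch; precedence rules are assumed
import os
from pathlib import Path

import yaml  # PyYAML, a plausible dependency given the YAML configs


def load_config(name: str, config_dir: Path = Path("config")) -> dict:
    """Load config/<name>.yaml, letting MAI_* environment variables override.

    Example (assumed convention): MAI_MODELS_DEFAULT=llama3 would override
    the `default` key loaded from models.yaml.
    """
    with open(config_dir / f"{name}.yaml", "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}

    # Environment values win over file values (requires Python 3.9+ for removeprefix).
    prefix = f"MAI_{name.upper()}_"
    for key, value in os.environ.items():
        if key.startswith(prefix):
            config[key.removeprefix(prefix).lower()] = value
    return config
```
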
## Special Directories

**tests/fixtures/:**
- Purpose: Reusable test data and mock objects
- Generated: No, hand-created
- Committed: Yes, part of the repository

**config/:**
- Purpose: Non-code configuration
- Generated: No, hand-maintained
- Committed: Yes, except secrets (use `.env`)

**.env (not committed):**
- Purpose: Local environment overrides and secrets
- Generated: No, copied from `.env.example` and filled in locally
- Committed: No (in `.gitignore`)

**docker/:**
- Purpose: Sandbox environment for safe execution
- Generated: No, hand-maintained
- Committed: Yes

---

*Structure analysis: 2026-01-26*

415
.planning/codebase/TESTING.md
Normal file
@@ -0,0 +1,415 @@
# Testing Patterns

**Analysis Date:** 2026-01-26

## Status

**Note:** This codebase is in the planning phase; no tests have been written yet. These patterns are **prescriptive** for the Mai project and should be applied from the first test file forward.

## Test Framework

**Runner:**
- **pytest** - Test discovery and execution
- Version: Latest stable (6.x or higher)
- Config: `pytest.ini` or `pyproject.toml` (create with initial setup)

**Assertion Library:**
- Built-in `assert` statements
- `pytest` fixtures for setup/teardown
- `pytest.raises()` for exception testing

**Run Commands:**
```bash
pytest                              # Run all tests in the tests/ directory
pytest -v                           # Verbose output with test names
pytest -k "test_memory"             # Run tests matching a pattern
pytest --cov=src                    # Coverage report (requires pytest-cov)
pytest --cov=src --cov-report=html  # HTML coverage report
pytest -x                           # Stop on first failure
pytest -s                           # Show print output during tests
```

## Test File Organization

**Location:**
- **Co-located pattern**: Test files live next to source files, i.e. `src/[module]/test_[component].py`
- Alternative: All tests in a single `tests/` directory with a mirrored structure

**Recommended pattern for Mai:**
```
src/
├── memory/
│   ├── __init__.py
│   ├── storage.py
│   └── test_storage.py   # Co-located tests
├── models/
│   ├── __init__.py
│   ├── manager.py
│   └── test_manager.py
└── safety/
    ├── __init__.py
    ├── sandbox.py
    └── test_sandbox.py
```

**Naming:**
- Test files: `test_*.py` or `*_test.py`
- Test classes: `TestComponentName`
- Test functions: `test_specific_behavior_with_context`
- Example: `test_retrieves_conversation_history_within_token_limit`

**Test Organization:**
- One test class per component under test
- Group related tests in a single class
- One assertion per test (or tightly related assertions)

## Test Structure

**Suite Organization:**
```python
import pytest
from src.memory.storage import ConversationStorage


class TestConversationStorage:
    """Test suite for ConversationStorage."""

    @pytest.fixture
    def storage(self) -> ConversationStorage:
        """Provide a storage instance for testing."""
        return ConversationStorage(path=":memory:")  # Use in-memory DB

    @pytest.fixture
    def sample_conversation(self) -> dict:
        """Provide sample conversation data."""
        return {
            "messages": [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there"},
            ]
        }

    def test_stores_and_retrieves_conversation(self, storage, sample_conversation):
        """Test that conversations can be stored and retrieved."""
        conversation_id = storage.store(sample_conversation)
        retrieved = storage.get(conversation_id)
        assert retrieved == sample_conversation

    def test_raises_error_on_missing_conversation(self, storage):
        """Test that missing conversations raise an appropriate error."""
        # Use KeyError (or a custom error) -- not the built-in MemoryError,
        # which Python reserves for out-of-memory conditions.
        with pytest.raises(KeyError):
            storage.get("nonexistent_id")
```

**Patterns:**

- **Setup pattern**: Use `@pytest.fixture` for setup; avoid `setUp()` methods
- **Teardown pattern**: Use fixture cleanup (yield pattern)
- **Assertion pattern**: One logical assertion per test (may involve multiple `assert` statements on related data)

```python
@pytest.fixture
def model_manager():
    """Set up model manager and clean up after the test."""
    manager = ModelManager()
    manager.initialize()
    yield manager
    manager.shutdown()  # Cleanup


def test_loads_available_models(model_manager):
    """Test model discovery and loading."""
    models = model_manager.list_available()
    assert len(models) > 0
    assert all(isinstance(m, str) for m in models)
```

## Async Testing

**Pattern:**
```python
import asyncio

import pytest


@pytest.mark.asyncio
async def test_async_model_invocation():
    """Test async model inference."""
    manager = ModelManager()
    response = await manager.generate("test prompt")
    assert len(response) > 0
    assert isinstance(response, str)


@pytest.mark.asyncio
async def test_concurrent_memory_access():
    """Test that memory handles concurrent access."""
    storage = ConversationStorage()
    tasks = [
        storage.store({"id": i, "text": f"msg {i}"})
        for i in range(10)
    ]
    ids = await asyncio.gather(*tasks)
    assert len(ids) == 10
```

- Use the `@pytest.mark.asyncio` decorator (provided by the `pytest-asyncio` plugin)
- Use `async def` for the test function signature
- Use `await` for async calls
- Async and sync fixtures can be mixed

## Mocking

**Framework:** `unittest.mock` (Python standard library)

**Patterns:**

```python
from unittest.mock import Mock, AsyncMock, patch
import pytest


def test_handles_model_error():
    """Test error handling when the model fails."""
    mock_model = Mock()
    mock_model.generate.side_effect = RuntimeError("Model offline")

    manager = ModelManager(model=mock_model)
    with pytest.raises(ModelError):
        manager.invoke("prompt")


@pytest.mark.asyncio
async def test_retries_on_transient_failure():
    """Test retry logic for transient failures."""
    mock_api = AsyncMock()
    mock_api.call.side_effect = [
        Exception("Temporary failure"),
        "success",
    ]

    result = await retry_with_backoff(mock_api.call, max_retries=2)
    assert result == "success"
    assert mock_api.call.call_count == 2


@patch("src.models.manager.requests.get")
def test_fetches_model_list(mock_get):
    """Test fetching the model list from the API."""
    mock_get.return_value.json.return_value = {"models": ["model1", "model2"]}

    manager = ModelManager()
    models = manager.get_remote_models()
    assert models == ["model1", "model2"]
```

**What to Mock:**
- External API calls (Discord, LMStudio API)
- Database operations (SQLite in production; use in-memory for tests)
- File I/O (use temporary directories)
- Slow operations (model inference can be stubbed)
- System resources (CPU, RAM monitoring); see the sketch below

**What NOT to Mock:**
- Core business logic (the logic you're testing)
- Data structure operations (dict, list operations)
- Internal module calls within the same component
- Internal helper functions

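For the system-resources case, a small sketch of patching a hypothetical `ResourceMonitor` from `src/models/resource_monitor.py`. The class name, `available_ram_gb()` method, `select_model()` call, and `size_gb` attribute are illustrative assumptions:

```python
# Hypothetical sketch: the resource_monitor API shown here is assumed
from unittest.mock import patch

from src.models.manager import ModelManager  # as in the examples above


@patch("src.models.resource_monitor.ResourceMonitor.available_ram_gb")
def test_selects_small_model_when_ram_is_low(mock_ram):
    """Test that model selection respects a simulated RAM ceiling."""
    mock_ram.return_value = 2.0  # pretend only 2 GB of RAM is free

    manager = ModelManager()
    model = manager.select_model()
    assert model.size_gb <= 2.0  # assumed attribute on the selected model
```
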
## Fixtures and Factories

**Test Data Pattern:**

```python
# conftest.py - shared fixtures
import pytest
from pathlib import Path
from src.memory.storage import ConversationStorage


@pytest.fixture
def temp_db():
    """Provide a temporary SQLite database."""
    db_path = Path("/tmp/test_mai.db")
    yield db_path
    if db_path.exists():
        db_path.unlink()


@pytest.fixture
def conversation_factory():
    """Factory for creating test conversations."""
    def _make_conversation(num_messages: int = 3) -> dict:
        messages = []
        for i in range(num_messages):
            role = "user" if i % 2 == 0 else "assistant"
            messages.append({
                "role": role,
                "content": f"Message {i+1}",
                # Wrap at 24 so the hour stays valid for long conversations
                "timestamp": f"2026-01-26T{i % 24:02d}:00:00Z"
            })
        return {"messages": messages}
    return _make_conversation


def test_stores_long_conversation(temp_db, conversation_factory):
    """Test storing conversations with many messages."""
    storage = ConversationStorage(path=temp_db)
    long_convo = conversation_factory(num_messages=100)

    conv_id = storage.store(long_convo)
    retrieved = storage.get(conv_id)
    assert len(retrieved["messages"]) == 100
```

**Location:**
- Shared fixtures: `tests/conftest.py` (pytest auto-discovers them)
- Component-specific fixtures: In test files or in subdirectory `conftest.py` files
- Factories: In `tests/factories.py` or within `conftest.py`

## Coverage

**Requirements:**
- **Target: 80% code coverage minimum** for core modules
- Critical paths (safety, memory, inference): 90%+ coverage
- UI/CLI: 70% (lower due to interaction complexity)

**View Coverage:**
```bash
pytest --cov=src --cov-report=term-missing
pytest --cov=src --cov-report=html
# Then open htmlcov/index.html in a browser
pytest --cov=src --cov-fail-under=80  # Fail the run below the 80% target
```

**Configure in `pyproject.toml`:**
```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "--cov=src --cov-report=term-missing --cov-report=html"
```

## Test Types

**Unit Tests:**
- Scope: Single function or class method
- Dependencies: Mocked
- Speed: Fast (<100ms per test)
- Location: `test_component.py` in the source directory
- Example: `test_tokenizer_splits_input_correctly`

**Integration Tests:**
- Scope: Multiple components working together
- Dependencies: Real services (in-memory DB, local files)
- Speed: Medium (100ms - 1s per test)
- Location: `tests/integration/test_*.py`
- Example: `test_conversation_engine_with_memory_retrieval`

```python
# tests/integration/test_conversation_flow.py
@pytest.mark.asyncio
async def test_full_conversation_with_memory():
    """Test the complete conversation flow, including memory retrieval."""
    memory = ConversationStorage(path=":memory:")
    engine = ConversationEngine(memory=memory)

    # Store context
    memory.store({"id": "ctx1", "content": "User prefers Python"})

    # Have a conversation
    response = await engine.chat("What language should I use?")

    # Verify the context was used (case-insensitive match)
    assert "python" in response.lower()
```

**E2E Tests:**
- Scope: Full system, end to end
- Framework: **Not required for v1** (added in v2)
- Would test: CLI input → Model → Discord output
- Deferred until the Discord/CLI interfaces are complete

## Common Patterns

**Error Testing:**
```python
def test_invalid_input_raises_validation_error():
    """Test that validation catches malformed input."""
    with pytest.raises(ValueError) as exc_info:
        storage.store({"invalid": "structure"})
    assert "missing required field" in str(exc_info.value)


def test_logs_error_details():
    """Test that errors log useful debugging info."""
    with patch("src.logger") as mock_logger:
        try:
            risky_operation()
        except OperationError:
            pass
        mock_logger.error.assert_called_once()
        call_args = mock_logger.error.call_args
        assert "operation_id" in str(call_args)
```

**Performance Testing:**
```python
def test_memory_retrieval_within_performance_budget(benchmark):
    """Test that memory queries complete within the time budget."""
    storage = ConversationStorage()
    query = "what did we discuss earlier"

    result = benchmark(storage.retrieve_similar, query)
    assert len(result) > 0

# Run with: pytest --benchmark-only (requires the pytest-benchmark plugin)
```

**Data Validation Testing:**
```python
@pytest.mark.parametrize("input_val,expected", [
    ("hello", "hello"),
    ("HELLO", "hello"),
    (" hello ", "hello"),
    ("", ValueError),
])
def test_normalizes_input(input_val, expected):
    """Test input normalization with multiple cases."""
    if isinstance(expected, type) and issubclass(expected, Exception):
        with pytest.raises(expected):
            normalize(input_val)
    else:
        assert normalize(input_val) == expected
```

## Configuration

**pytest.ini (create at project root):**
```ini
[pytest]
testpaths = src tests
addopts = -v --tb=short --strict-markers
markers =
    asyncio: marks async tests
    slow: marks slow tests
    integration: marks integration tests
```

**Alternative: pyproject.toml:**
```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "-v --tb=short"
markers = [
    "asyncio: async test",
    "slow: slow test",
    "integration: integration test",
]
```

## Test Execution in CI/CD

**GitHub Actions workflow (when created):**
```yaml
- name: Run tests
  run: pytest --cov=src --cov-report=xml

- name: Upload coverage
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
```

---

*Testing guide: 2026-01-26*
*Status: Prescriptive for Mai v1 implementation*