docs: map existing codebase
- STACK.md - Technologies and dependencies - ARCHITECTURE.md - System design and patterns - STRUCTURE.md - Directory layout - CONVENTIONS.md - Code style and patterns - TESTING.md - Test structure - INTEGRATIONS.md - External services - CONCERNS.md - Technical debt and issues
This commit is contained in:
297
.planning/codebase/CONCERNS.md
Normal file
297
.planning/codebase/CONCERNS.md
Normal file
@@ -0,0 +1,297 @@
|
||||
# Codebase Concerns
|
||||
|
||||
**Analysis Date:** 2026-01-26
|
||||
|
||||
## Tech Debt
|
||||
|
||||
**Incomplete Memory System Integration:**
|
||||
- Issue: Memory manager gracefully initializes but may fail silently when dependencies are missing
|
||||
- Files: `src/mai/memory/manager.py`
|
||||
- Impact: Memory features degrade ungracefully; users don't know compression or retrieval is disabled
|
||||
- Fix approach: Add explicit logging and health checks on startup, expose memory system status in CLI
|
||||
|
||||
**Large Monolithic Memory Manager:**
|
||||
- Issue: MemoryManager is 1036 lines with multiple responsibilities (storage, compression, retrieval orchestration)
|
||||
- Files: `src/mai/memory/manager.py`
|
||||
- Impact: Difficult to test individual memory subsystems; changes affect multiple concerns simultaneously
|
||||
- Fix approach: Extract retrieval delegation and compression orchestration into separate coordinator classes
|
||||
|
||||
**Conversation Engine Complexity:**
|
||||
- Issue: ConversationEngine is 648 lines handling timing, state, decomposition, reasoning, interruption, and metrics
|
||||
- Files: `src/mai/conversation/engine.py`
|
||||
- Impact: High cognitive load for maintainers; hard to isolate bugs in specific subsystems
|
||||
- Fix approach: Separate concerns into focused orchestrator (engine) and behavior modules (timing/reasoning/decomposition are already separated but loosely coupled)
|
||||
|
||||
**Permission/Approval System Fragility:**
|
||||
- Issue: ApprovalSystem uses regex pattern matching for risk analysis with hardcoded patterns
|
||||
- Files: `src/mai/sandbox/approval_system.py`
|
||||
- Impact: Pattern-matching approach is fragile (false positives/negatives); patterns not maintainable as code evolves
|
||||
- Fix approach: Replace regex with AST-based code analysis for more reliable risk detection; move risk patterns to configuration
|
||||
|
||||
**Docker Executor Dependency Chain:**
|
||||
- Issue: DockerExecutor falls back silently to unavailable state if Docker isn't installed
|
||||
- Files: `src/mai/sandbox/docker_executor.py`
|
||||
- Impact: Approval system thinks code is sandboxed when Docker is missing; security false sense of safety
|
||||
- Fix approach: Require explicit Docker availability check at startup; block code execution if Docker unavailable and user requests sandboxing
|
||||
|
||||
## Known Bugs
|
||||
|
||||
**Session Persistence Restoration:**
|
||||
- Symptoms: "ConversationState object has no attribute 'set_conversation_history'" error when restarting CLI
|
||||
- Files: `src/mai/conversation/state.py`, `src/app/__main__.py`
|
||||
- Trigger: Start conversation, exit CLI, restart CLI session
|
||||
- Workaround: None - session restoration broken; users lose conversation history
|
||||
- Status: Identified in Phase 6 UAT but remediation code not applied (commit c70ee88 "Complete fresh slate" removed implementation)
|
||||
|
||||
**Session File Feedback Missing:**
|
||||
- Symptoms: Users don't see where/when session files are created
|
||||
- Files: `src/app/__main__.py`
|
||||
- Trigger: Create new session or use /session command
|
||||
- Workaround: Manually check ~/.mai/session.json directory
|
||||
- Status: Identified in Phase 6 UAT as major issue (test 3 failed)
|
||||
|
||||
**Resource Display Color Coding:**
|
||||
- Symptoms: Resource monitoring displays plain text instead of color-coded status indicators
|
||||
- Files: `src/app/__main__.py`
|
||||
- Trigger: Run CLI and observe resource display during conversation
|
||||
- Workaround: Parse output manually to understand resource status
|
||||
- Status: Identified in Phase 6 UAT as minor issue (test 5 failed); root cause: Rich console loses color output in non-terminal environments
|
||||
|
||||
## Security Considerations
|
||||
|
||||
**Approval System Risk Analysis Insufficient:**
|
||||
- Risk: Regex-based risk detection can be bypassed with obfuscated code (e.g., string concatenation to build dangerous commands)
|
||||
- Files: `src/mai/sandbox/approval_system.py`
|
||||
- Current mitigation: Hardcoded high-risk patterns (os.system, exec, eval); fallback to block on unrecognized patterns
|
||||
- Recommendations:
|
||||
- Implement AST-based code analysis for more reliable detection
|
||||
- Add code deobfuscation step before risk analysis
|
||||
- Create risk assessment database with test cases and known bypasses
|
||||
- Require explicit docker verification before allowing code execution
|
||||
|
||||
**Docker Fallback Security Gap:**
|
||||
- Risk: Code could execute without actual sandboxing if Docker unavailable, creating false sense of security
|
||||
- Files: `src/mai/sandbox/docker_executor.py`
|
||||
- Current mitigation: AuditLogger records all execution; approval system presents requests regardless
|
||||
- Recommendations:
|
||||
- Fail-safe: Block code execution if Docker unavailable and user hasn't explicitly allowed non-sandboxed execution
|
||||
- Add warning dialog explaining sandbox unavailability
|
||||
- Log all non-sandboxed execution attempts explicitly
|
||||
- Require explicit override from user with confirmation
|
||||
|
||||
**Approval Preference Learning Risk:**
|
||||
- Risk: User can set "auto_allow" on risky code patterns; once learned, code execution auto-approves without user intervention
|
||||
- Files: `src/mai/sandbox/approval_system.py` (lines with `user_preferences` and `auto_allow`)
|
||||
- Current mitigation: Auto-allow only applies to LOW risk level code
|
||||
- Recommendations:
|
||||
- Require explicit user confirmation before enabling auto-allow (not just responding "a")
|
||||
- Log all auto-approved executions in audit trail with reason
|
||||
- Add periodic review mechanism for auto-allow rules (e.g., "You have X auto-approved rules, review them?" on startup)
|
||||
- Restrict auto-allow to strictly limited operation types (print, basic math, not file operations)
|
||||
|
||||
## Performance Bottlenecks
|
||||
|
||||
**Memory Retrieval Search Not Optimized:**
|
||||
- Problem: ContextRetriever does full database scans for semantic similarity without indexing
|
||||
- Files: `src/mai/memory/retrieval.py`
|
||||
- Cause: Vector similarity search likely using brute-force nearest-neighbor without FAISS or similar
|
||||
- Improvement path:
|
||||
- Add FAISS vector index for semantic search acceleration
|
||||
- Implement result caching for frequent queries
|
||||
- Add search result pagination to avoid loading entire result sets
|
||||
- Benchmark retrieval latency and set targets (e.g., <500ms for top-10 similar conversations)
|
||||
|
||||
**Conversation State History Accumulation:**
|
||||
- Problem: ConversationState.conversation_history grows unbounded during long sessions
|
||||
- Files: `src/mai/conversation/state.py`
|
||||
- Cause: No automatic truncation or archival of old turns; all conversation turns kept in memory
|
||||
- Improvement path:
|
||||
- Implement sliding window of recent turns (e.g., keep last 50 turns in memory)
|
||||
- Archive old turns to disk and load on demand
|
||||
- Add compression trigger at configurable message count
|
||||
- Monitor memory usage and alert when conversation history exceeds threshold
|
||||
|
||||
**Memory Manager Compression Not Scheduled:**
|
||||
- Problem: Manual `compress_conversation()` calls required; no automatic compression scheduling
|
||||
- Files: `src/mai/memory/manager.py`
|
||||
- Cause: Compression is triggered manually or not at all; no background task or event-driven compression
|
||||
- Improvement path:
|
||||
- Implement background compression task triggered by conversation age or message count
|
||||
- Add periodic compression sweep for all old conversations
|
||||
- Make compression interval configurable (e.g., compress every 500 messages or 24 hours)
|
||||
- Track compression effectiveness and adjust thresholds
|
||||
|
||||
## Fragile Areas
|
||||
|
||||
**Ollama Integration Dependency:**
|
||||
- Files: `src/mai/model/ollama_client.py`, `src/mai/core/interface.py`
|
||||
- Why fragile: Hard-coded Ollama endpoint assumption; no fallback model provider; no retry logic for model inference
|
||||
- Safe modification:
|
||||
- Use dependency injection for model provider (interface-based)
|
||||
- Add configurable model provider endpoints
|
||||
- Implement retry logic with exponential backoff for transient failures
|
||||
- Add model availability detection at startup
|
||||
- Test coverage: Limited tests for model switching and unavailability scenarios
|
||||
|
||||
**Git Integration Fragility:**
|
||||
- Files: `src/mai/git/committer.py`, `src/mai/git/workflow.py`
|
||||
- Why fragile: Assumes clean git state; no handling for merge conflicts, detached HEAD, or dirty working directory
|
||||
- Safe modification:
|
||||
- Add pre-commit git status validation
|
||||
- Handle merge conflict detection and defer commits
|
||||
- Implement conflict resolution strategy (manual review or aborting)
|
||||
- Test against all git states (detached HEAD, dirty working tree, conflicted merge)
|
||||
- Test coverage: No tests for edge cases like merge conflicts
|
||||
|
||||
**Conversation State Serialization Round-Trip:**
|
||||
- Files: `src/mai/conversation/state.py`, `src/mai/models/conversation.py`
|
||||
- Why fragile: ConversationTurn -> Ollama message -> ConversationTurn conversion can lose context
|
||||
- Safe modification:
|
||||
- Add comprehensive unit tests for serialization round-trip
|
||||
- Document serialization format and invariants
|
||||
- Add validation after deserialization (verify message count, order, role integrity)
|
||||
- Create fixture tests with edge cases (unicode, very long messages, special characters)
|
||||
- Test coverage: No existing tests for message serialization/deserialization
|
||||
|
||||
**Docker Configuration Hardcoding:**
|
||||
- Files: `src/mai/sandbox/docker_executor.py`
|
||||
- Why fragile: Docker image names, CPU limits, memory limits hardcoded as class constants
|
||||
- Safe modification:
|
||||
- Move Docker config to configuration file
|
||||
- Add validation on startup that Docker limits match system resources
|
||||
- Document all Docker configuration assumptions
|
||||
- Make limits tunable per system resource profile
|
||||
- Test coverage: Docker integration tests likely mocked; no testing on actual Docker variations
|
||||
|
||||
## Scaling Limits
|
||||
|
||||
**Memory Database Size Growth:**
|
||||
- Current capacity: SQLite with no explicit limits; storage grows with every conversation
|
||||
- Limit: SQLite performance degrades significantly above ~1GB; queries become slow
|
||||
- Scaling path:
|
||||
- Implement database rotation (archive old conversations, start new DB periodically)
|
||||
- Add migration path to PostgreSQL for production deployments
|
||||
- Implement automatic old conversation archival (move to cold storage after 30 days)
|
||||
- Add database vacuum and index optimization on scheduled basis
|
||||
|
||||
**Conversation Context Window Management:**
|
||||
- Current capacity: Model context window determined by Ollama model selection (varies)
|
||||
- Limit: ConversationEngine doesn't prevent context overflow; will fail when history exceeds model limit
|
||||
- Scaling path:
|
||||
- Track token count of conversation history and refuse new messages before overflow
|
||||
- Implement automatic compression trigger at 80% context usage
|
||||
- Add model switching logic to use larger-context models if available
|
||||
- Document context budget requirements per model
|
||||
|
||||
**Approval History Unbounded Growth:**
|
||||
- Current capacity: ApprovalSystem.approval_history list grows indefinitely
|
||||
- Limit: Memory accumulation over time; each approval decision stored in memory forever
|
||||
- Scaling path:
|
||||
- Archive approval history to database after threshold (e.g., 1000 decisions)
|
||||
- Implement approval history rotation with configurable retention
|
||||
- Add aggregate statistics (approval patterns) instead of storing raw history
|
||||
- Clean up approval history on startup or scheduled task
|
||||
|
||||
## Dependencies at Risk
|
||||
|
||||
**Ollama Dependency and Model Availability:**
|
||||
- Risk: Hard requirement on Ollama being available and having models installed
|
||||
- Impact: Mai cannot function without Ollama; no fallback to cloud inference or other providers
|
||||
- Migration plan:
|
||||
- Implement abstract model provider interface
|
||||
- Add support for OpenAI/other cloud models as fallback (even if v1 is offline-first)
|
||||
- Document minimum Ollama model requirements
|
||||
- Add diagnostic tool to check Ollama health on startup
|
||||
|
||||
**Docker Dependency for Sandboxing:**
|
||||
- Risk: Docker required for code execution safety; no alternative sandbox implementations
|
||||
- Impact: Users without Docker can't safely execute generated code; no graceful degradation
|
||||
- Migration plan:
|
||||
- Implement abstract executor interface (not just DockerExecutor)
|
||||
- Add noop executor for testing
|
||||
- Consider lightweight alternatives (seccomp, chroot, or bubblewrap) for Linux systems
|
||||
- Add explicit warning if Docker unavailable
|
||||
|
||||
**Rich Library Terminal Detection:**
|
||||
- Risk: Rich disables colors in non-terminal environments; users see degraded UX
|
||||
- Impact: Resource monitoring and status displays lack visual feedback in non-terminal contexts
|
||||
- Migration plan:
|
||||
- Use Console(force_terminal=True) to force color output when desired
|
||||
- Add configuration option for color preference
|
||||
- Implement fallback emoji/unicode indicators for non-color environments
|
||||
- Test in various terminal emulators and SSH sessions
|
||||
|
||||
## Missing Critical Features
|
||||
|
||||
**Session Data Portability:**
|
||||
- Problem: Session files are JSON but no export/import mechanism; can't backup or migrate sessions
|
||||
- Blocks: Users can't back up conversations; losing ~/.mai/session.json loses all context
|
||||
- Fix: Add export/import commands (/export, /import) and document session file format
|
||||
|
||||
**Conversation Memory Persistence:**
|
||||
- Problem: Conversation history is session-scoped (stored in memory); not saved to memory system
|
||||
- Blocks: Long-term pattern learning relies on memory system but conversations aren't automatically stored
|
||||
- Fix: Implement automatic conversation archival to memory system after session ends
|
||||
|
||||
**User Preference Learning Audit Trail:**
|
||||
- Problem: User preferences for auto-approval learned silently; no visibility into what patterns auto-approve
|
||||
- Blocks: Users can't audit their own auto-approval rules; hard to recover from accidentally enabling auto-allow
|
||||
- Fix: Add /preferences or /audit command to show all learned rules and allow revocation
|
||||
|
||||
**Resource Constraint Graceful Degradation:**
|
||||
- Problem: System shows resource usage but doesn't adapt model selection or conversation behavior
|
||||
- Blocks: Mai can't suggest switching to smaller models when resources tight
|
||||
- Fix: Implement resource-aware model recommendation system
|
||||
|
||||
**Approval Change Logging:**
|
||||
- Problem: Approval decisions not tracked in git; can't audit "who approved what when"
|
||||
- Blocks: No accountability trail for approval decisions
|
||||
- Fix: Log all approval decisions to git with commit messages including timestamp and user
|
||||
|
||||
## Test Coverage Gaps
|
||||
|
||||
**Docker Executor Network Isolation:**
|
||||
- What's not tested: Whether network actually restricted in Docker containers
|
||||
- Files: `src/mai/sandbox/docker_executor.py`
|
||||
- Risk: Code might have network access despite supposed isolation
|
||||
- Priority: High (security-critical)
|
||||
|
||||
**Session Persistence Edge Cases:**
|
||||
- What's not tested: Very large conversations (1000+ messages), unicode characters, special characters
|
||||
- Files: `src/mai/conversation/state.py`, session persistence code
|
||||
- Risk: Session files corrupt or lose data with edge case inputs
|
||||
- Priority: High (data loss)
|
||||
|
||||
**Approval System Obfuscation Bypass:**
|
||||
- What's not tested: Obfuscated code patterns, string concatenation attacks, bytecode approaches
|
||||
- Files: `src/mai/sandbox/approval_system.py`
|
||||
- Risk: Risky code could slip through as "low risk" via obfuscation
|
||||
- Priority: High (security-critical)
|
||||
|
||||
**Memory Compression Round-Trip Data Loss:**
|
||||
- What's not tested: Whether compressed conversations can be exactly reconstructed
|
||||
- Files: `src/mai/memory/compression.py`, `src/mai/memory/storage.py`
|
||||
- Risk: Compression could lose important context patterns; compression metrics may be misleading
|
||||
- Priority: Medium (data integrity)
|
||||
|
||||
**Model Switching During Active Conversation:**
|
||||
- What's not tested: Switching models mid-conversation, context migration, embedding space changes
|
||||
- Files: `src/mai/model/switcher.py`, `src/mai/conversation/engine.py`
|
||||
- Risk: Context might not transfer correctly when models switch
|
||||
- Priority: Medium (feature reliability)
|
||||
|
||||
**Offline Queue Conflict Resolution:**
|
||||
- What's not tested: What happens when offline messages conflict with new context when reconnecting
|
||||
- Files: `src/mai/conversation/engine.py` (offline queueing)
|
||||
- Risk: Offline messages might create incoherent conversation when reconnected
|
||||
- Priority: Medium (conversation coherence)
|
||||
|
||||
**Resource Detector System Resource Edge Cases:**
|
||||
- What's not tested: GPU detection on systems with unusual hardware, CPU count on virtual systems
|
||||
- Files: `src/mai/model/resource_detector.py`
|
||||
- Risk: Wrong model selection due to misdetected resources
|
||||
- Priority: Low (graceful degradation usually handles this)
|
||||
|
||||
---
|
||||
|
||||
*Concerns audit: 2026-01-26*
|
||||
Reference in New Issue
Block a user