Files
Mai/.planning/codebase/CONCERNS.md
Mai Development f238a958a0 docs: map existing codebase
- STACK.md - Technologies and dependencies
- ARCHITECTURE.md - System design and patterns
- STRUCTURE.md - Directory layout
- CONVENTIONS.md - Code style and patterns
- TESTING.md - Test structure
- INTEGRATIONS.md - External services
- CONCERNS.md - Technical debt and issues
2026-01-26 23:14:44 -05:00

16 KiB

Codebase Concerns

Analysis Date: 2026-01-26

Tech Debt

Incomplete Memory System Integration:

  • Issue: Memory manager gracefully initializes but may fail silently when dependencies are missing
  • Files: src/mai/memory/manager.py
  • Impact: Memory features degrade ungracefully; users don't know compression or retrieval is disabled
  • Fix approach: Add explicit logging and health checks on startup, expose memory system status in CLI

Large Monolithic Memory Manager:

  • Issue: MemoryManager is 1036 lines with multiple responsibilities (storage, compression, retrieval orchestration)
  • Files: src/mai/memory/manager.py
  • Impact: Difficult to test individual memory subsystems; changes affect multiple concerns simultaneously
  • Fix approach: Extract retrieval delegation and compression orchestration into separate coordinator classes

Conversation Engine Complexity:

  • Issue: ConversationEngine is 648 lines handling timing, state, decomposition, reasoning, interruption, and metrics
  • Files: src/mai/conversation/engine.py
  • Impact: High cognitive load for maintainers; hard to isolate bugs in specific subsystems
  • Fix approach: Separate concerns into focused orchestrator (engine) and behavior modules (timing/reasoning/decomposition are already separated but loosely coupled)

Permission/Approval System Fragility:

  • Issue: ApprovalSystem uses regex pattern matching for risk analysis with hardcoded patterns
  • Files: src/mai/sandbox/approval_system.py
  • Impact: Pattern-matching approach is fragile (false positives/negatives); patterns not maintainable as code evolves
  • Fix approach: Replace regex with AST-based code analysis for more reliable risk detection; move risk patterns to configuration

Docker Executor Dependency Chain:

  • Issue: DockerExecutor falls back silently to unavailable state if Docker isn't installed
  • Files: src/mai/sandbox/docker_executor.py
  • Impact: Approval system thinks code is sandboxed when Docker is missing; security false sense of safety
  • Fix approach: Require explicit Docker availability check at startup; block code execution if Docker unavailable and user requests sandboxing

Known Bugs

Session Persistence Restoration:

  • Symptoms: "ConversationState object has no attribute 'set_conversation_history'" error when restarting CLI
  • Files: src/mai/conversation/state.py, src/app/__main__.py
  • Trigger: Start conversation, exit CLI, restart CLI session
  • Workaround: None - session restoration broken; users lose conversation history
  • Status: Identified in Phase 6 UAT but remediation code not applied (commit c70ee88 "Complete fresh slate" removed implementation)

Session File Feedback Missing:

  • Symptoms: Users don't see where/when session files are created
  • Files: src/app/__main__.py
  • Trigger: Create new session or use /session command
  • Workaround: Manually check ~/.mai/session.json directory
  • Status: Identified in Phase 6 UAT as major issue (test 3 failed)

Resource Display Color Coding:

  • Symptoms: Resource monitoring displays plain text instead of color-coded status indicators
  • Files: src/app/__main__.py
  • Trigger: Run CLI and observe resource display during conversation
  • Workaround: Parse output manually to understand resource status
  • Status: Identified in Phase 6 UAT as minor issue (test 5 failed); root cause: Rich console loses color output in non-terminal environments

Security Considerations

Approval System Risk Analysis Insufficient:

  • Risk: Regex-based risk detection can be bypassed with obfuscated code (e.g., string concatenation to build dangerous commands)
  • Files: src/mai/sandbox/approval_system.py
  • Current mitigation: Hardcoded high-risk patterns (os.system, exec, eval); fallback to block on unrecognized patterns
  • Recommendations:
    • Implement AST-based code analysis for more reliable detection
    • Add code deobfuscation step before risk analysis
    • Create risk assessment database with test cases and known bypasses
    • Require explicit docker verification before allowing code execution

Docker Fallback Security Gap:

  • Risk: Code could execute without actual sandboxing if Docker unavailable, creating false sense of security
  • Files: src/mai/sandbox/docker_executor.py
  • Current mitigation: AuditLogger records all execution; approval system presents requests regardless
  • Recommendations:
    • Fail-safe: Block code execution if Docker unavailable and user hasn't explicitly allowed non-sandboxed execution
    • Add warning dialog explaining sandbox unavailability
    • Log all non-sandboxed execution attempts explicitly
    • Require explicit override from user with confirmation

Approval Preference Learning Risk:

  • Risk: User can set "auto_allow" on risky code patterns; once learned, code execution auto-approves without user intervention
  • Files: src/mai/sandbox/approval_system.py (lines with user_preferences and auto_allow)
  • Current mitigation: Auto-allow only applies to LOW risk level code
  • Recommendations:
    • Require explicit user confirmation before enabling auto-allow (not just responding "a")
    • Log all auto-approved executions in audit trail with reason
    • Add periodic review mechanism for auto-allow rules (e.g., "You have X auto-approved rules, review them?" on startup)
    • Restrict auto-allow to strictly limited operation types (print, basic math, not file operations)

Performance Bottlenecks

Memory Retrieval Search Not Optimized:

  • Problem: ContextRetriever does full database scans for semantic similarity without indexing
  • Files: src/mai/memory/retrieval.py
  • Cause: Vector similarity search likely using brute-force nearest-neighbor without FAISS or similar
  • Improvement path:
    • Add FAISS vector index for semantic search acceleration
    • Implement result caching for frequent queries
    • Add search result pagination to avoid loading entire result sets
    • Benchmark retrieval latency and set targets (e.g., <500ms for top-10 similar conversations)

Conversation State History Accumulation:

  • Problem: ConversationState.conversation_history grows unbounded during long sessions
  • Files: src/mai/conversation/state.py
  • Cause: No automatic truncation or archival of old turns; all conversation turns kept in memory
  • Improvement path:
    • Implement sliding window of recent turns (e.g., keep last 50 turns in memory)
    • Archive old turns to disk and load on demand
    • Add compression trigger at configurable message count
    • Monitor memory usage and alert when conversation history exceeds threshold

Memory Manager Compression Not Scheduled:

  • Problem: Manual compress_conversation() calls required; no automatic compression scheduling
  • Files: src/mai/memory/manager.py
  • Cause: Compression is triggered manually or not at all; no background task or event-driven compression
  • Improvement path:
    • Implement background compression task triggered by conversation age or message count
    • Add periodic compression sweep for all old conversations
    • Make compression interval configurable (e.g., compress every 500 messages or 24 hours)
    • Track compression effectiveness and adjust thresholds

Fragile Areas

Ollama Integration Dependency:

  • Files: src/mai/model/ollama_client.py, src/mai/core/interface.py
  • Why fragile: Hard-coded Ollama endpoint assumption; no fallback model provider; no retry logic for model inference
  • Safe modification:
    • Use dependency injection for model provider (interface-based)
    • Add configurable model provider endpoints
    • Implement retry logic with exponential backoff for transient failures
    • Add model availability detection at startup
  • Test coverage: Limited tests for model switching and unavailability scenarios

Git Integration Fragility:

  • Files: src/mai/git/committer.py, src/mai/git/workflow.py
  • Why fragile: Assumes clean git state; no handling for merge conflicts, detached HEAD, or dirty working directory
  • Safe modification:
    • Add pre-commit git status validation
    • Handle merge conflict detection and defer commits
    • Implement conflict resolution strategy (manual review or aborting)
    • Test against all git states (detached HEAD, dirty working tree, conflicted merge)
  • Test coverage: No tests for edge cases like merge conflicts

Conversation State Serialization Round-Trip:

  • Files: src/mai/conversation/state.py, src/mai/models/conversation.py
  • Why fragile: ConversationTurn -> Ollama message -> ConversationTurn conversion can lose context
  • Safe modification:
    • Add comprehensive unit tests for serialization round-trip
    • Document serialization format and invariants
    • Add validation after deserialization (verify message count, order, role integrity)
    • Create fixture tests with edge cases (unicode, very long messages, special characters)
  • Test coverage: No existing tests for message serialization/deserialization

Docker Configuration Hardcoding:

  • Files: src/mai/sandbox/docker_executor.py
  • Why fragile: Docker image names, CPU limits, memory limits hardcoded as class constants
  • Safe modification:
    • Move Docker config to configuration file
    • Add validation on startup that Docker limits match system resources
    • Document all Docker configuration assumptions
    • Make limits tunable per system resource profile
  • Test coverage: Docker integration tests likely mocked; no testing on actual Docker variations

Scaling Limits

Memory Database Size Growth:

  • Current capacity: SQLite with no explicit limits; storage grows with every conversation
  • Limit: SQLite performance degrades significantly above ~1GB; queries become slow
  • Scaling path:
    • Implement database rotation (archive old conversations, start new DB periodically)
    • Add migration path to PostgreSQL for production deployments
    • Implement automatic old conversation archival (move to cold storage after 30 days)
    • Add database vacuum and index optimization on scheduled basis

Conversation Context Window Management:

  • Current capacity: Model context window determined by Ollama model selection (varies)
  • Limit: ConversationEngine doesn't prevent context overflow; will fail when history exceeds model limit
  • Scaling path:
    • Track token count of conversation history and refuse new messages before overflow
    • Implement automatic compression trigger at 80% context usage
    • Add model switching logic to use larger-context models if available
    • Document context budget requirements per model

Approval History Unbounded Growth:

  • Current capacity: ApprovalSystem.approval_history list grows indefinitely
  • Limit: Memory accumulation over time; each approval decision stored in memory forever
  • Scaling path:
    • Archive approval history to database after threshold (e.g., 1000 decisions)
    • Implement approval history rotation with configurable retention
    • Add aggregate statistics (approval patterns) instead of storing raw history
    • Clean up approval history on startup or scheduled task

Dependencies at Risk

Ollama Dependency and Model Availability:

  • Risk: Hard requirement on Ollama being available and having models installed
  • Impact: Mai cannot function without Ollama; no fallback to cloud inference or other providers
  • Migration plan:
    • Implement abstract model provider interface
    • Add support for OpenAI/other cloud models as fallback (even if v1 is offline-first)
    • Document minimum Ollama model requirements
    • Add diagnostic tool to check Ollama health on startup

Docker Dependency for Sandboxing:

  • Risk: Docker required for code execution safety; no alternative sandbox implementations
  • Impact: Users without Docker can't safely execute generated code; no graceful degradation
  • Migration plan:
    • Implement abstract executor interface (not just DockerExecutor)
    • Add noop executor for testing
    • Consider lightweight alternatives (seccomp, chroot, or bubblewrap) for Linux systems
    • Add explicit warning if Docker unavailable

Rich Library Terminal Detection:

  • Risk: Rich disables colors in non-terminal environments; users see degraded UX
  • Impact: Resource monitoring and status displays lack visual feedback in non-terminal contexts
  • Migration plan:
    • Use Console(force_terminal=True) to force color output when desired
    • Add configuration option for color preference
    • Implement fallback emoji/unicode indicators for non-color environments
    • Test in various terminal emulators and SSH sessions

Missing Critical Features

Session Data Portability:

  • Problem: Session files are JSON but no export/import mechanism; can't backup or migrate sessions
  • Blocks: Users can't back up conversations; losing ~/.mai/session.json loses all context
  • Fix: Add export/import commands (/export, /import) and document session file format

Conversation Memory Persistence:

  • Problem: Conversation history is session-scoped (stored in memory); not saved to memory system
  • Blocks: Long-term pattern learning relies on memory system but conversations aren't automatically stored
  • Fix: Implement automatic conversation archival to memory system after session ends

User Preference Learning Audit Trail:

  • Problem: User preferences for auto-approval learned silently; no visibility into what patterns auto-approve
  • Blocks: Users can't audit their own auto-approval rules; hard to recover from accidentally enabling auto-allow
  • Fix: Add /preferences or /audit command to show all learned rules and allow revocation

Resource Constraint Graceful Degradation:

  • Problem: System shows resource usage but doesn't adapt model selection or conversation behavior
  • Blocks: Mai can't suggest switching to smaller models when resources tight
  • Fix: Implement resource-aware model recommendation system

Approval Change Logging:

  • Problem: Approval decisions not tracked in git; can't audit "who approved what when"
  • Blocks: No accountability trail for approval decisions
  • Fix: Log all approval decisions to git with commit messages including timestamp and user

Test Coverage Gaps

Docker Executor Network Isolation:

  • What's not tested: Whether network actually restricted in Docker containers
  • Files: src/mai/sandbox/docker_executor.py
  • Risk: Code might have network access despite supposed isolation
  • Priority: High (security-critical)

Session Persistence Edge Cases:

  • What's not tested: Very large conversations (1000+ messages), unicode characters, special characters
  • Files: src/mai/conversation/state.py, session persistence code
  • Risk: Session files corrupt or lose data with edge case inputs
  • Priority: High (data loss)

Approval System Obfuscation Bypass:

  • What's not tested: Obfuscated code patterns, string concatenation attacks, bytecode approaches
  • Files: src/mai/sandbox/approval_system.py
  • Risk: Risky code could slip through as "low risk" via obfuscation
  • Priority: High (security-critical)

Memory Compression Round-Trip Data Loss:

  • What's not tested: Whether compressed conversations can be exactly reconstructed
  • Files: src/mai/memory/compression.py, src/mai/memory/storage.py
  • Risk: Compression could lose important context patterns; compression metrics may be misleading
  • Priority: Medium (data integrity)

Model Switching During Active Conversation:

  • What's not tested: Switching models mid-conversation, context migration, embedding space changes
  • Files: src/mai/model/switcher.py, src/mai/conversation/engine.py
  • Risk: Context might not transfer correctly when models switch
  • Priority: Medium (feature reliability)

Offline Queue Conflict Resolution:

  • What's not tested: What happens when offline messages conflict with new context when reconnecting
  • Files: src/mai/conversation/engine.py (offline queueing)
  • Risk: Offline messages might create incoherent conversation when reconnected
  • Priority: Medium (conversation coherence)

Resource Detector System Resource Edge Cases:

  • What's not tested: GPU detection on systems with unusual hardware, CPU count on virtual systems
  • Files: src/mai/model/resource_detector.py
  • Risk: Wrong model selection due to misdetected resources
  • Priority: Low (graceful degradation usually handles this)

Concerns audit: 2026-01-26