docs: map existing codebase

- STACK.md - Technologies and dependencies
- ARCHITECTURE.md - System design and patterns
- STRUCTURE.md - Directory layout
- CONVENTIONS.md - Code style and patterns
- TESTING.md - Test structure
- INTEGRATIONS.md - External services
- CONCERNS.md - Technical debt and issues
Commit f238a958a0 (parent b1d71bc22b) by Mai Development, 2026-01-26 23:14:44 -05:00
7 changed files with 1667 additions and 0 deletions

# Architecture
**Analysis Date:** 2026-01-26
## Pattern Overview
**Overall:** Layered modular architecture with clear separation of concerns
**Key Characteristics:**
- Modular layer separation (Model Interface, Memory, Conversation, Interfaces, Safety, Core Personality)
- Local-first, offline-capable design with graceful degradation
- Plugin-like interface system allowing CLI and Discord without tight coupling
- Sandboxed execution environment for self-improvement code
- Bidirectional feedback loops between conversation, memory, and personality
## Layers
**Model Interface (Inference Layer):**
- Purpose: Abstract model inference operations and handle model switching
- Location: `src/models/`
- Contains: Model adapters, resource monitoring, context management
- Depends on: Local Ollama/LMStudio, system resource API
- Used by: Conversation engine, core Mai reasoning
**Memory System (Persistence Layer):**
- Purpose: Store and retrieve conversation history, patterns, learned behaviors
- Location: `src/memory/`
- Contains: SQLite operations, vector search, compression logic, pattern extraction
- Depends on: Local SQLite database, embeddings generation
- Used by: Conversation engine for context retrieval, personality learning
**Conversation Engine (Reasoning Layer):**
- Purpose: Orchestrate multi-turn conversations with context awareness
- Location: `src/conversation/`
- Contains: Turn handling, context window management, clarifying question logic, reasoning transparency
- Depends on: Model Interface, Memory System, Personality System
- Used by: Interface layers (CLI, Discord)
**Personality System (Behavior Layer):**
- Purpose: Enforce core values and enable personality adaptation
- Location: `src/personality/`
- Contains: Core personality rules, learned behavior layers, guardrails, values enforcement
- Depends on: Configuration files (YAML), Memory System for learned patterns
- Used by: Conversation Engine for decision making and refusal logic
**Safety & Execution Sandbox (Security Layer):**
- Purpose: Validate and execute generated code safely with risk assessment
- Location: `src/safety/`
- Contains: Risk analysis, Docker sandbox management, AST validation, audit logging
- Depends on: Docker runtime, code analysis libraries
- Used by: Self-improvement system for generated code execution
**Self-Improvement System (Autonomous Layer):**
- Purpose: Analyze own code, generate improvements, manage review and approval workflow
- Location: `src/selfmod/`
- Contains: Code analysis, improvement generation, review coordination, git integration
- Depends on: Safety layer, second-agent review API, git operations, code parser
- Used by: Core Mai autonomous operation
**Interface Adapters (Presentation Layer):**
- Purpose: Translate between external communication channels and core conversation engine
- Location: `src/interfaces/`
- Contains: CLI handler, Discord bot, message queuing, approval workflow
- Depends on: Conversation Engine, self-improvement system
- Used by: External communication channels (terminal, Discord)
## Data Flow
**Conversation Flow:**
1. User message arrives via interface (CLI or Discord)
2. Message queued if offline, held in memory if online
3. Interface adapter passes to Conversation Engine
4. Conversation Engine queries Memory System for relevant context
5. Context + message passed to Model Interface with system prompt (includes personality)
6. Model generates response
7. Response returned to Conversation Engine
8. Conversation Engine stores turn in Memory System
9. Response sent back through interface to user
10. Memory System may trigger asynchronous compression if history grows
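The steps above can be sketched as a single orchestration function. This is a minimal illustration, not the actual Mai API: the stub classes, method names, and return strings are all invented here.

```python
# Minimal sketch of the conversation flow above; class and method names
# are illustrative stand-ins, not the actual Mai API.

class MemoryStub:
    def retrieve(self, message: str) -> list[str]:
        # step 4: fetch relevant context for the incoming message
        return ["earlier turn about scheduling"]

    def store(self, message: str, response: str) -> None:
        # step 8: persist the completed turn
        self.last_turn = (message, response)


class ModelStub:
    def generate(self, context: list[str], message: str) -> str:
        # steps 5-6: context + message produce the model response
        return f"reply to {message!r} using {len(context)} context item(s)"


def handle_turn(message: str, memory: MemoryStub, model: ModelStub) -> str:
    context = memory.retrieve(message)           # step 4
    response = model.generate(context, message)  # steps 5-6
    memory.store(message, response)              # step 8
    return response                              # step 9


memory, model = MemoryStub(), ModelStub()
print(handle_turn("hi", memory, model))
```

The interface adapters (steps 1-3 and 9) and the asynchronous compression trigger (step 10) sit outside this core loop.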
**Self-Improvement Flow:**
1. Self-Improvement System analyzes own code (triggered by timer or explicit request)
2. Generates potential improvements as Python code patches
3. Performs AST validation and basic static analysis
4. Submits for second-agent review with risk classification
5. If LOW risk: auto-approved, sent to Safety layer for execution
6. If MEDIUM risk: user approval required via CLI or Discord reactions
7. If HIGH/BLOCKED risk: blocked, logged, user notified
8. Approved changes executed in Docker sandbox with resource limits
9. Execution results captured, logged, committed to git with clear message
10. Breaking changes require explicit user approval before commit
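The risk-routing decision in steps 5-7 reduces to a small dispatch; this hypothetical sketch elides the logging and notification the real workflow performs.

```python
# Hypothetical sketch of the risk routing in steps 5-7 above.
from enum import Enum


class Risk(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    BLOCKED = "BLOCKED"


def route(risk: Risk) -> str:
    if risk is Risk.LOW:
        return "auto-approve"   # step 5: straight to the Safety layer
    if risk is Risk.MEDIUM:
        return "ask-user"       # step 6: CLI prompt or Discord reaction
    return "block"              # step 7: HIGH/BLOCKED, log and notify


print(route(Risk.LOW), route(Risk.MEDIUM), route(Risk.BLOCKED))
# auto-approve ask-user block
```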
**State Management:**
- Conversation state: Maintained in Memory System as persisted history
- Model state: Loaded fresh per request, no state persistence between calls
- Personality state: Mix of code-enforced rules and learned behavior layers in Memory
- Resource state: Monitored continuously, triggering model downgrade if limits approached
- Approval state: Tracked in git commits, audit log, and in-memory queue
## Key Abstractions
**ModelAdapter:**
- Purpose: Abstract different model providers (Ollama local models)
- Examples: `src/models/ollama_adapter.py`, `src/models/model_manager.py`
- Pattern: Strategy pattern with resource-aware selection logic
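A toy illustration of the resource-aware strategy selection: the adapter names and the RAM threshold below are invented for the sketch, not taken from the codebase.

```python
# Strategy pattern with resource-aware selection, as described above.
# Adapter names and the 48 GB threshold are illustrative assumptions.
class SmallModelAdapter:
    name = "small-local-model"


class LargeModelAdapter:
    name = "large-local-model"


def select_adapter(free_ram_gb: float):
    # pick the strategy the current resources can actually support
    return LargeModelAdapter() if free_ram_gb >= 48 else SmallModelAdapter()


print(select_adapter(64).name, select_adapter(8).name)
# large-local-model small-local-model
```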
**ContextWindow:**
- Purpose: Manage token budget and conversation history within model limits
- Examples: `src/conversation/context_manager.py`
- Pattern: Intelligent windowing with semantic importance weighting
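The budget-windowing core can be sketched without the semantic weighting; whitespace splitting stands in for a real tokenizer here.

```python
# Illustrative token-budget windowing: keep the newest turns that fit the
# budget. The real ContextWindow adds semantic importance weighting on top.
def window(turns: list[str], max_tokens: int) -> list[str]:
    kept, used = [], 0
    for turn in reversed(turns):       # walk newest-first
        cost = len(turn.split())       # crude token estimate
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order


print(window(["a b c", "d e", "f g h i", "j"], max_tokens=5))
# ['f g h i', 'j']
```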
**MemoryStore:**
- Purpose: Unified interface to conversation history, patterns, and learned behaviors
- Examples: `src/memory/store.py`, `src/memory/vector_search.py`
- Pattern: Repository pattern with multiple index types
**PersonalityRules:**
- Purpose: Encode Mai's core values as evaluable constraints
- Examples: `src/personality/core_rules.py`, `config/personality.yaml`
- Pattern: Rule engine with value-based decision making
**SandboxExecutor:**
- Purpose: Execute generated code safely with resource limits and audit trail
- Examples: `src/safety/executor.py`, `src/safety/risk_analyzer.py`
- Pattern: Facade wrapping Docker API with security checks
**ApprovalWorkflow:**
- Purpose: Coordinate user and agent approval for code changes
- Examples: `src/interfaces/approval_handler.py`, `src/selfmod/reviewer.py`
- Pattern: State machine with async notification coordination
## Entry Points
**CLI Entry:**
- Location: `src/interfaces/cli.py` / `__main__.py`
- Triggers: `python -m mai` or `mai` command
- Responsibilities: Initialize conversation session, handle user input loop, display responses, manage approval prompts
**Discord Entry:**
- Location: `src/interfaces/discord_bot.py`
- Triggers: Discord message events
- Responsibilities: Extract message context, route to conversation engine, format response, handle reactions for approvals
**Self-Improvement Entry:**
- Location: `src/selfmod/scheduler.py`
- Triggers: Timer-based (periodic analysis) or explicit trigger from conversation
- Responsibilities: Analyze code, generate improvements, initiate review workflow
**Core Mai Entry:**
- Location: `src/mai.py` (main class)
- Triggers: System startup
- Responsibilities: Initialize all systems (models, memory, personality), coordinate between layers
## Error Handling
**Strategy:** Graceful degradation with clear user communication
**Patterns:**
- Model unavailable: Fall back to smaller model if available, notify user of reduced capabilities
- Memory retrieval failure: Continue conversation without historical context, log error
- Network error: Queue offline messages, retry on reconnection (Discord only)
- Unsafe code generated: Block execution, log with risk analysis, notify user
- Syntax error in generated code: Reject change, log, generate new proposal
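The "fall back to a smaller model" pattern can be sketched as a preference-ordered retry loop; the model callables and the RuntimeError standing in for a ModelError are illustrative.

```python
# Sketch of the model-fallback pattern above; names are illustrative.
def generate_with_fallback(prompt: str, models: list) -> str:
    errors = []
    for model in models:  # models ordered by preference, largest first
        try:
            return model(prompt)
        except RuntimeError as exc:
            errors.append(exc)  # would also log and notify the user here
    raise RuntimeError(f"all models failed: {errors}")


def big_model(prompt: str) -> str:
    raise RuntimeError("out of memory")  # simulate the large model failing


def small_model(prompt: str) -> str:
    return f"small-model reply to {prompt!r}"


print(generate_with_fallback("hello", [big_model, small_model]))
# small-model reply to 'hello'
```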
## Cross-Cutting Concerns
**Logging:** Structured logging with severity levels throughout codebase. Use Python `logging` module with JSON formatter for production. Log all: model selections, memory operations, safety decisions, approval workflows, code changes.
**Validation:** Input validation at interface boundaries. AST validation for generated code. Type hints throughout codebase with mypy enforcement.
**Authentication:** None required for local CLI. Discord bot authenticated via token (environment variable). API calls between services use simple function calls (single-process model).
---
*Architecture analysis: 2026-01-26*

# Codebase Concerns
**Analysis Date:** 2026-01-26
## Tech Debt
**Incomplete Memory System Integration:**
- Issue: Memory manager gracefully initializes but may fail silently when dependencies are missing
- Files: `src/mai/memory/manager.py`
- Impact: Memory features degrade silently; users don't know compression or retrieval is disabled
- Fix approach: Add explicit logging and health checks on startup, expose memory system status in CLI
**Large Monolithic Memory Manager:**
- Issue: MemoryManager is 1036 lines with multiple responsibilities (storage, compression, retrieval orchestration)
- Files: `src/mai/memory/manager.py`
- Impact: Difficult to test individual memory subsystems; changes affect multiple concerns simultaneously
- Fix approach: Extract retrieval delegation and compression orchestration into separate coordinator classes
**Conversation Engine Complexity:**
- Issue: ConversationEngine is 648 lines handling timing, state, decomposition, reasoning, interruption, and metrics
- Files: `src/mai/conversation/engine.py`
- Impact: High cognitive load for maintainers; hard to isolate bugs in specific subsystems
- Fix approach: Separate concerns into focused orchestrator (engine) and behavior modules (timing/reasoning/decomposition are already separated but loosely coupled)
**Permission/Approval System Fragility:**
- Issue: ApprovalSystem uses regex pattern matching for risk analysis with hardcoded patterns
- Files: `src/mai/sandbox/approval_system.py`
- Impact: Pattern-matching approach is fragile (false positives/negatives); patterns not maintainable as code evolves
- Fix approach: Replace regex with AST-based code analysis for more reliable risk detection; move risk patterns to configuration
**Docker Executor Dependency Chain:**
- Issue: DockerExecutor falls back silently to unavailable state if Docker isn't installed
- Files: `src/mai/sandbox/docker_executor.py`
- Impact: Approval system thinks code is sandboxed when Docker is missing; security false sense of safety
- Fix approach: Require explicit Docker availability check at startup; block code execution if Docker unavailable and user requests sandboxing
## Known Bugs
**Session Persistence Restoration:**
- Symptoms: "ConversationState object has no attribute 'set_conversation_history'" error when restarting CLI
- Files: `src/mai/conversation/state.py`, `src/app/__main__.py`
- Trigger: Start conversation, exit CLI, restart CLI session
- Workaround: None - session restoration broken; users lose conversation history
- Status: Identified in Phase 6 UAT but remediation code not applied (commit c70ee88 "Complete fresh slate" removed implementation)
**Session File Feedback Missing:**
- Symptoms: Users don't see where/when session files are created
- Files: `src/app/__main__.py`
- Trigger: Create new session or use /session command
- Workaround: Manually check ~/.mai/session.json directory
- Status: Identified in Phase 6 UAT as major issue (test 3 failed)
**Resource Display Color Coding:**
- Symptoms: Resource monitoring displays plain text instead of color-coded status indicators
- Files: `src/app/__main__.py`
- Trigger: Run CLI and observe resource display during conversation
- Workaround: Parse output manually to understand resource status
- Status: Identified in Phase 6 UAT as minor issue (test 5 failed); root cause: Rich console loses color output in non-terminal environments
## Security Considerations
**Approval System Risk Analysis Insufficient:**
- Risk: Regex-based risk detection can be bypassed with obfuscated code (e.g., string concatenation to build dangerous commands)
- Files: `src/mai/sandbox/approval_system.py`
- Current mitigation: Hardcoded high-risk patterns (os.system, exec, eval); fallback to block on unrecognized patterns
- Recommendations:
- Implement AST-based code analysis for more reliable detection
- Add code deobfuscation step before risk analysis
- Create risk assessment database with test cases and known bypasses
- Require explicit docker verification before allowing code execution
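A minimal example of the AST-based detection recommended above: walk the parse tree for calls instead of matching regexes. The `DANGEROUS` set is a tiny illustrative subset, and as the second call shows, even this is not bypass-proof.

```python
# Minimal AST-based risk check: inspect Call nodes rather than raw text.
import ast

DANGEROUS = {"eval", "exec", "system"}  # illustrative subset only


def flagged_calls(code: str) -> list[str]:
    found = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            fn = node.func
            # handle both bare names (eval(...)) and attributes (os.system(...))
            name = fn.id if isinstance(fn, ast.Name) else getattr(fn, "attr", None)
            if name in DANGEROUS:
                found.append(name)
    return found


print(flagged_calls("import os\nos.system('ls')"))  # ['system']
print(flagged_calls("x = 'sys' + 'tem'"))           # [] - getattr tricks still evade this
```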
**Docker Fallback Security Gap:**
- Risk: Code could execute without actual sandboxing if Docker unavailable, creating false sense of security
- Files: `src/mai/sandbox/docker_executor.py`
- Current mitigation: AuditLogger records all execution; approval system presents requests regardless
- Recommendations:
- Fail-safe: Block code execution if Docker unavailable and user hasn't explicitly allowed non-sandboxed execution
- Add warning dialog explaining sandbox unavailability
- Log all non-sandboxed execution attempts explicitly
- Require explicit override from user with confirmation
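The fail-safe recommendation reduces to one guard: execution proceeds only when Docker is confirmed, or the user has explicitly opted into unsandboxed runs. Function and flag names below are illustrative.

```python
# Fail-safe sketch: no silent fallback when Docker is unavailable.
def may_execute(docker_available: bool, user_allows_unsandboxed: bool) -> bool:
    if docker_available:
        return True
    # Docker missing: require the explicit user override (and log it)
    return user_allows_unsandboxed


print(may_execute(True, False), may_execute(False, False), may_execute(False, True))
# True False True
```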
**Approval Preference Learning Risk:**
- Risk: User can set "auto_allow" on risky code patterns; once learned, code execution auto-approves without user intervention
- Files: `src/mai/sandbox/approval_system.py` (lines with `user_preferences` and `auto_allow`)
- Current mitigation: Auto-allow only applies to LOW risk level code
- Recommendations:
- Require explicit user confirmation before enabling auto-allow (not just responding "a")
- Log all auto-approved executions in audit trail with reason
- Add periodic review mechanism for auto-allow rules (e.g., "You have X auto-approved rules, review them?" on startup)
- Restrict auto-allow to strictly limited operation types (print, basic math, not file operations)
## Performance Bottlenecks
**Memory Retrieval Search Not Optimized:**
- Problem: ContextRetriever does full database scans for semantic similarity without indexing
- Files: `src/mai/memory/retrieval.py`
- Cause: Vector similarity search likely using brute-force nearest-neighbor without FAISS or similar
- Improvement path:
- Add FAISS vector index for semantic search acceleration
- Implement result caching for frequent queries
- Add search result pagination to avoid loading entire result sets
- Benchmark retrieval latency and set targets (e.g., <500ms for top-10 similar conversations)
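The result-caching step can be sketched independently of the index: memoize top-k retrievals for repeated queries. The toy Jaccard scan below is the brute-force part a FAISS index would replace; the corpus is invented.

```python
# Result caching for repeated retrieval queries, as suggested above.
from functools import lru_cache

DOCS = {  # invented toy corpus: doc id -> term set
    "d1": frozenset({"meeting", "notes"}),
    "d2": frozenset({"cooking", "notes"}),
    "d3": frozenset({"meeting", "agenda"}),
}


@lru_cache(maxsize=256)
def top_k(query_terms: frozenset, k: int = 2) -> tuple:
    def score(terms):  # Jaccard similarity stands in for vector similarity
        return len(terms & query_terms) / len(terms | query_terms)

    # the brute-force scan a FAISS index would accelerate
    ranked = sorted(DOCS, key=lambda d: score(DOCS[d]), reverse=True)
    return tuple(ranked[:k])


print(top_k(frozenset({"meeting", "notes"})))  # ('d1', 'd2')
```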
**Conversation State History Accumulation:**
- Problem: ConversationState.conversation_history grows unbounded during long sessions
- Files: `src/mai/conversation/state.py`
- Cause: No automatic truncation or archival of old turns; all conversation turns kept in memory
- Improvement path:
- Implement sliding window of recent turns (e.g., keep last 50 turns in memory)
- Archive old turns to disk and load on demand
- Add compression trigger at configurable message count
- Monitor memory usage and alert when conversation history exceeds threshold
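The in-memory sliding window can be as small as a bounded deque; the 50-turn limit below mirrors the example figure above.

```python
# Bounded in-memory history: a deque evicts the oldest turn automatically.
from collections import deque

history = deque(maxlen=50)  # keep only the 50 most recent turns
for i in range(120):
    history.append(f"turn-{i}")

print(len(history), history[0], history[-1])
# 50 turn-70 turn-119
```

Archival to disk would hook into the eviction point rather than discarding turns outright.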
**Memory Manager Compression Not Scheduled:**
- Problem: Manual `compress_conversation()` calls required; no automatic compression scheduling
- Files: `src/mai/memory/manager.py`
- Cause: Compression is triggered manually or not at all; no background task or event-driven compression
- Improvement path:
- Implement background compression task triggered by conversation age or message count
- Add periodic compression sweep for all old conversations
- Make compression interval configurable (e.g., compress every 500 messages or 24 hours)
- Track compression effectiveness and adjust thresholds
## Fragile Areas
**Ollama Integration Dependency:**
- Files: `src/mai/model/ollama_client.py`, `src/mai/core/interface.py`
- Why fragile: Hard-coded Ollama endpoint assumption; no fallback model provider; no retry logic for model inference
- Safe modification:
- Use dependency injection for model provider (interface-based)
- Add configurable model provider endpoints
- Implement retry logic with exponential backoff for transient failures
- Add model availability detection at startup
- Test coverage: Limited tests for model switching and unavailability scenarios
**Git Integration Fragility:**
- Files: `src/mai/git/committer.py`, `src/mai/git/workflow.py`
- Why fragile: Assumes clean git state; no handling for merge conflicts, detached HEAD, or dirty working directory
- Safe modification:
- Add pre-commit git status validation
- Handle merge conflict detection and defer commits
- Implement conflict resolution strategy (manual review or aborting)
- Test against all git states (detached HEAD, dirty working tree, conflicted merge)
- Test coverage: No tests for edge cases like merge conflicts
**Conversation State Serialization Round-Trip:**
- Files: `src/mai/conversation/state.py`, `src/mai/models/conversation.py`
- Why fragile: ConversationTurn -> Ollama message -> ConversationTurn conversion can lose context
- Safe modification:
- Add comprehensive unit tests for serialization round-trip
- Document serialization format and invariants
- Add validation after deserialization (verify message count, order, role integrity)
- Create fixture tests with edge cases (unicode, very long messages, special characters)
- Test coverage: No existing tests for message serialization/deserialization
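A round-trip test of the kind recommended above is short to write: turn to message dict and back must be lossless. `ConversationTurn` here is a stand-in dataclass, not the real model.

```python
# Serialization round-trip check: turn -> message dict -> turn is lossless.
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationTurn:  # stand-in for the real model class
    role: str
    content: str


def to_message(turn: ConversationTurn) -> dict:
    return {"role": turn.role, "content": turn.content}


def from_message(msg: dict) -> ConversationTurn:
    return ConversationTurn(role=msg["role"], content=msg["content"])


turn = ConversationTurn("user", "héllo, unicode must survive")
assert from_message(to_message(turn)) == turn
print("round-trip ok")
```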
**Docker Configuration Hardcoding:**
- Files: `src/mai/sandbox/docker_executor.py`
- Why fragile: Docker image names, CPU limits, memory limits hardcoded as class constants
- Safe modification:
- Move Docker config to configuration file
- Add validation on startup that Docker limits match system resources
- Document all Docker configuration assumptions
- Make limits tunable per system resource profile
- Test coverage: Docker integration tests likely mocked; no testing on actual Docker variations
## Scaling Limits
**Memory Database Size Growth:**
- Current capacity: SQLite with no explicit limits; storage grows with every conversation
- Limit: unindexed similarity scans slow noticeably as the database grows into the gigabyte range; SQLite itself handles far larger files, but query latency becomes the bottleneck
- Scaling path:
- Implement database rotation (archive old conversations, start new DB periodically)
- Add migration path to PostgreSQL for production deployments
- Implement automatic old conversation archival (move to cold storage after 30 days)
- Add database vacuum and index optimization on scheduled basis
**Conversation Context Window Management:**
- Current capacity: Model context window determined by Ollama model selection (varies)
- Limit: ConversationEngine doesn't prevent context overflow; will fail when history exceeds model limit
- Scaling path:
- Track token count of conversation history and refuse new messages before overflow
- Implement automatic compression trigger at 80% context usage
- Add model switching logic to use larger-context models if available
- Document context budget requirements per model
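The 80%-usage compression trigger suggested above is a one-line predicate; whitespace splitting approximates token counting here.

```python
# Compression trigger at a configurable fraction of the context limit.
def should_compress(history: list[str], context_limit: int,
                    threshold: float = 0.8) -> bool:
    used = sum(len(turn.split()) for turn in history)  # crude token count
    return used >= threshold * context_limit


turns = ["a b c"] * 10                            # ~30 "tokens"
print(should_compress(turns, context_limit=40))   # False (30 < 32)
print(should_compress(turns, context_limit=30))   # True  (30 >= 24)
```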
**Approval History Unbounded Growth:**
- Current capacity: ApprovalSystem.approval_history list grows indefinitely
- Limit: Memory accumulation over time; each approval decision stored in memory forever
- Scaling path:
- Archive approval history to database after threshold (e.g., 1000 decisions)
- Implement approval history rotation with configurable retention
- Add aggregate statistics (approval patterns) instead of storing raw history
- Clean up approval history on startup or scheduled task
## Dependencies at Risk
**Ollama Dependency and Model Availability:**
- Risk: Hard requirement on Ollama being available and having models installed
- Impact: Mai cannot function without Ollama; no fallback to cloud inference or other providers
- Migration plan:
- Implement abstract model provider interface
- Add support for OpenAI/other cloud models as fallback (even if v1 is offline-first)
- Document minimum Ollama model requirements
- Add diagnostic tool to check Ollama health on startup
**Docker Dependency for Sandboxing:**
- Risk: Docker required for code execution safety; no alternative sandbox implementations
- Impact: Users without Docker can't safely execute generated code; no graceful degradation
- Migration plan:
- Implement abstract executor interface (not just DockerExecutor)
- Add noop executor for testing
- Consider lightweight alternatives (seccomp, chroot, or bubblewrap) for Linux systems
- Add explicit warning if Docker unavailable
**Rich Library Terminal Detection:**
- Risk: Rich disables colors in non-terminal environments; users see degraded UX
- Impact: Resource monitoring and status displays lack visual feedback in non-terminal contexts
- Migration plan:
- Use `Console(force_terminal=True)` to force color output when desired
- Add configuration option for color preference
- Implement fallback emoji/unicode indicators for non-color environments
- Test in various terminal emulators and SSH sessions
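The fallback-indicator idea can be sketched without Rich: emit ANSI color only when explicitly requested (analogous to `force_terminal`), plain markers otherwise. The marker strings are invented.

```python
# Plain-marker fallback for non-color environments; color only on request.
def status_marker(level: str, force_color: bool = False) -> str:
    plain = {"ok": "[OK]", "warn": "[WARN]", "crit": "[CRIT]"}
    colored = {
        "ok": "\x1b[32mOK\x1b[0m",     # green
        "warn": "\x1b[33mWARN\x1b[0m", # yellow
        "crit": "\x1b[31mCRIT\x1b[0m", # red
    }
    return (colored if force_color else plain)[level]


print(status_marker("warn"))  # [WARN]
```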
## Missing Critical Features
**Session Data Portability:**
- Problem: Session files are JSON but no export/import mechanism; can't backup or migrate sessions
- Blocks: Users can't back up conversations; losing ~/.mai/session.json loses all context
- Fix: Add export/import commands (/export, /import) and document session file format
**Conversation Memory Persistence:**
- Problem: Conversation history is session-scoped (stored in memory); not saved to memory system
- Blocks: Long-term pattern learning relies on memory system but conversations aren't automatically stored
- Fix: Implement automatic conversation archival to memory system after session ends
**User Preference Learning Audit Trail:**
- Problem: User preferences for auto-approval learned silently; no visibility into what patterns auto-approve
- Blocks: Users can't audit their own auto-approval rules; hard to recover from accidentally enabling auto-allow
- Fix: Add /preferences or /audit command to show all learned rules and allow revocation
**Resource Constraint Graceful Degradation:**
- Problem: System shows resource usage but doesn't adapt model selection or conversation behavior
- Blocks: Mai can't suggest switching to smaller models when resources tight
- Fix: Implement resource-aware model recommendation system
**Approval Change Logging:**
- Problem: Approval decisions not tracked in git; can't audit "who approved what when"
- Blocks: No accountability trail for approval decisions
- Fix: Log all approval decisions to git with commit messages including timestamp and user
## Test Coverage Gaps
**Docker Executor Network Isolation:**
- What's not tested: Whether network actually restricted in Docker containers
- Files: `src/mai/sandbox/docker_executor.py`
- Risk: Code might have network access despite supposed isolation
- Priority: High (security-critical)
**Session Persistence Edge Cases:**
- What's not tested: Very large conversations (1000+ messages), unicode characters, special characters
- Files: `src/mai/conversation/state.py`, session persistence code
- Risk: Session files corrupt or lose data with edge case inputs
- Priority: High (data loss)
**Approval System Obfuscation Bypass:**
- What's not tested: Obfuscated code patterns, string concatenation attacks, bytecode approaches
- Files: `src/mai/sandbox/approval_system.py`
- Risk: Risky code could slip through as "low risk" via obfuscation
- Priority: High (security-critical)
**Memory Compression Round-Trip Data Loss:**
- What's not tested: Whether compressed conversations can be exactly reconstructed
- Files: `src/mai/memory/compression.py`, `src/mai/memory/storage.py`
- Risk: Compression could lose important context patterns; compression metrics may be misleading
- Priority: Medium (data integrity)
**Model Switching During Active Conversation:**
- What's not tested: Switching models mid-conversation, context migration, embedding space changes
- Files: `src/mai/model/switcher.py`, `src/mai/conversation/engine.py`
- Risk: Context might not transfer correctly when models switch
- Priority: Medium (feature reliability)
**Offline Queue Conflict Resolution:**
- What's not tested: What happens when offline messages conflict with new context when reconnecting
- Files: `src/mai/conversation/engine.py` (offline queueing)
- Risk: Offline messages might create incoherent conversation when reconnected
- Priority: Medium (conversation coherence)
**Resource Detector System Resource Edge Cases:**
- What's not tested: GPU detection on systems with unusual hardware, CPU count on virtual systems
- Files: `src/mai/model/resource_detector.py`
- Risk: Wrong model selection due to misdetected resources
- Priority: Low (graceful degradation usually handles this)
---
*Concerns audit: 2026-01-26*

# Coding Conventions
**Analysis Date:** 2026-01-26
## Status
**Note:** This codebase is in planning phase. No source code has been written yet. These conventions are **prescriptive** for the Mai project and should be applied to all code from the first commit forward.
## Naming Patterns
**Files:**
- Python modules: `lowercase_with_underscores.py` (PEP 8)
- Configuration files: `config.yaml`, `.env.example`
- Test files: `test_module_name.py` (co-located with source)
- Example: `src/memory/storage.py`, `src/memory/test_storage.py`
**Functions:**
- Use `snake_case` for all function names (PEP 8)
- Private functions: Prefix with single underscore `_private_function()`
- Async functions: Use `async def async_operation()` naming
- Example: `def get_conversation_history()`, `async def stream_response()`
**Variables:**
- Use `snake_case` for all variable names
- Constants: `UPPERCASE_WITH_UNDERSCORES`
- Private module variables: Prefix with `_`
- Example: `conversation_history`, `MAX_CONTEXT_TOKENS`, `_internal_cache`
**Types:**
- Classes: `PascalCase`
- Enums: `PascalCase` (inherit from `Enum`)
- TypedDict: `PascalCase` with `Dict` suffix
- Example: `class ConversationManager`, `class ErrorLevel(Enum)`, `class MemoryConfigDict(TypedDict)`
**Directories:**
- Core modules: `src/[module_name]/` (lowercase, plural when appropriate)
- Example: `src/models/`, `src/memory/`, `src/safety/`, `src/interfaces/`
## Code Style
**Formatting:**
- Tool: **Ruff** (formatter and linter)
- Line length: 88 characters (Ruff default)
- Quote style: Double quotes (`"string"`)
- Indentation: 4 spaces (no tabs)
**Linting:**
- Tool: **Ruff**
- Configuration enforced via `.ruff.toml` (when created)
- All imports must pass ruff checks
- No unused imports allowed
- Type hints required for public functions
**Python Version:**
- Minimum: Python 3.10+
- Use modern type hints; import only the names you need (e.g., `from typing import Literal`), never `from typing import *`
- Use `str | None` instead of `Optional[str]` (union syntax)
## Import Organization
**Order:**
1. Standard library imports (`import os`, `import sys`)
2. Third-party imports (`import discord`, `import numpy`)
3. Local imports (`from src.memory import Storage`)
4. Blank line between each group
**Example:**
```python
import asyncio
import json
from pathlib import Path
from typing import Optional

import discord
from dotenv import load_dotenv

from src.memory import ConversationStorage
from src.models import ModelManager
```
**Path Aliases:**
- Use relative imports from `src/` root
- Avoid deep relative imports (no `../../../`)
- Example: `from src.safety import SandboxExecutor` not `from ...safety import SandboxExecutor`
## Error Handling
**Patterns:**
- Define domain-specific exceptions in `src/exceptions.py`
- Use exception hierarchy (base `MaiException`, specific subclasses)
- Always include context in exceptions (error code, details, suggestions)
- Example:
```python
class MaiException(Exception):
    """Base exception for Mai framework."""

    def __init__(self, code: str, message: str, details: dict | None = None):
        self.code = code
        self.message = message
        self.details = details or {}
        super().__init__(f"[{code}] {message}")


class ModelError(MaiException):
    """Raised when model inference fails."""


class MemoryStoreError(MaiException):
    """Raised when memory operations fail.

    Named MemoryStoreError to avoid shadowing the built-in MemoryError.
    """
```
- Log before raising (see Logging section)
- Use context managers for cleanup (async context managers for async code)
- Never catch bare `Exception` - catch specific exceptions
## Logging
**Framework:** `logging` module (Python standard library)
**Patterns:**
- Create logger per module: `logger = logging.getLogger(__name__)`
- Log levels guide:
- `DEBUG`: Detailed diagnostic info (token counts, decision trees)
- `INFO`: Significant operational events (conversation started, model loaded)
- `WARNING`: Unexpected but handled conditions (fallback triggered, retry)
- `ERROR`: Failed operation (model error, memory access failed)
- `CRITICAL`: System-level failures (cannot recover)
- Structured logging preferred (include operation context)
- Example:
```python
import logging

logger = logging.getLogger(__name__)


async def invoke_model(prompt: str, model: str) -> str:
    logger.debug(f"Invoking model={model} with token_count={len(prompt.split())}")
    try:
        response = await model_manager.generate(prompt)
        logger.info(f"Model response generated, length={len(response)}")
        return response
    except ModelError as e:
        logger.error(f"Model invocation failed: {e.code}", exc_info=True)
        raise
```
## Comments
**When to Comment:**
- Complex logic requiring explanation (multi-step algorithms, non-obvious decisions)
- Important context that code alone cannot convey (why a workaround exists)
- Do NOT comment obvious code (`x = 1 # set x to 1` is noise)
- Do NOT duplicate what the code already says
**JSDoc/Docstrings:**
- Use Google-style docstrings for all public functions/classes
- Include return type even if type hints exist (for readability)
- Example:
```python
async def get_memory_context(
    query: str,
    max_tokens: int = 2000,
) -> str:
    """Retrieve relevant memory context for a query.

    Performs vector similarity search on conversation history,
    compresses results to fit token budget, and returns formatted context.

    Args:
        query: The search query for memory retrieval.
        max_tokens: Maximum tokens in returned context (default 2000).

    Returns:
        Formatted memory context as markdown-structured string.

    Raises:
        MemoryError: If database query fails or storage is corrupted.
    """
```
## Function Design
**Size:**
- Target: Functions under 50 lines (hard limit: 100 lines)
- Break complex logic into smaller helper functions
- One responsibility per function (single responsibility principle)
**Parameters:**
- Maximum 4 positional parameters
- Use keyword-only arguments for optional params: `def func(required, *, optional=None)`
- Use dataclasses or TypedDict for complex parameter groups
- Example:
```python
# Good: clear structure, keyword-only optionals
async def approve_change(
    change_id: str,
    *,
    reviewer_id: str,
    decision: Literal["approve", "reject"],
    reason: str | None = None,
) -> None:
    pass


# Bad: too many positional params
async def approve_change(change_id, reviewer_id, decision, reason, timestamp, context, metadata):
    pass
```
**Return Values:**
- Explicitly return values (no implicit `None` returns unless documented)
- Use `Optional[T]` or `T | None` in type hints for nullable returns
- Prefer returning data objects over tuples: return `Result` not `(status, data, error)`
- Async functions return awaitable, not callbacks
## Module Design
**Exports:**
- Define `__all__` in each module to be explicit about public API
- Example in `src/memory/__init__.py`:
```python
from src.memory.storage import ConversationStorage
from src.memory.compression import MemoryCompressor

__all__ = ["ConversationStorage", "MemoryCompressor"]
```
**Barrel Files:**
- Use `__init__.py` to export key classes/functions from submodules
- Keep import chains shallow (max 2 levels deep)
- Example structure:
```
src/
├── memory/
│ ├── __init__.py (exports Storage, Compressor)
│ ├── storage.py
│ └── compression.py
```
**Async/Await:**
- All I/O operations (database, API calls, file I/O) must be async
- Use `asyncio` for concurrency, not threading
- Async context managers for resource management:
```python
async def process_request(prompt: str) -> str:
    async with model_manager.get_session() as session:
        response = await session.generate(prompt)
        return response
```
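One way such a session manager might be implemented is with `contextlib.asynccontextmanager`; the `Session` class below is a stand-in (real acquisition/release logic is assumed):

```python
import asyncio
from contextlib import asynccontextmanager

class Session:
    """Stand-in for a model session; real resource acquisition is assumed."""
    def __init__(self) -> None:
        self.closed = False
    async def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"
    async def close(self) -> None:
        self.closed = True

@asynccontextmanager
async def get_session():
    session = Session()          # acquire the resource
    try:
        yield session            # hand it to the caller
    finally:
        await session.close()    # always released, even on error

async def demo() -> str:
    async with get_session() as session:
        return await session.generate("hi")
```

The `finally` block guarantees cleanup even if the caller's body raises, which is the reason to prefer context managers over manual `open`/`close` pairs.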
## Type Hints
**Requirements:**
- All public function signatures must have type hints
- Use `from __future__ import annotations` for forward references
- Prefer union syntax: `str | None` over `Optional[str]`
- Use `Literal` for string enums: `Literal["approve", "reject"]`
- Example:
```python
from __future__ import annotations
from typing import Literal
def evaluate_risk(code: str) -> Literal["LOW", "MEDIUM", "HIGH", "BLOCKED"]:
"""Evaluate code risk level."""
pass
```
## Configuration
**Pattern:**
- Use YAML for human-editable config files
- Use environment variables for secrets (never commit `.env`)
- Validation at import time (fail fast if config invalid)
- Example:
```python
# config.py
import os
from pathlib import Path
class Config:
DEBUG = os.getenv("DEBUG", "false").lower() == "true"
MODELS_PATH = Path(os.getenv("MODELS_PATH", "~/.mai/models")).expanduser()
MAX_CONTEXT_TOKENS = int(os.getenv("MAX_CONTEXT_TOKENS", "8000"))
# Validate on import (fail fast); note the class-qualified reference
if not Config.MODELS_PATH.exists():
    raise RuntimeError(f"Models path does not exist: {Config.MODELS_PATH}")
```
---
*Convention guide: 2026-01-26*
*Status: Prescriptive for Mai v1 implementation*

# External Integrations
**Analysis Date:** 2026-01-26
## APIs & External Services
**Model Inference:**
- LMStudio - Local model server for inference and model switching
- SDK/Client: LMStudio Python API
- Auth: None (local service, no authentication required)
- Configuration: model_path env var, endpoint URL
- Ollama - Alternative local model management system
- SDK/Client: Ollama REST API (HTTP)
- Auth: None (local service)
- Purpose: Model loading, switching, inference with resource detection
**Communication & Approvals:**
- Discord - Bot interface for conversation and change approvals
- SDK/Client: discord.py library
- Auth: DISCORD_BOT_TOKEN env variable
- Purpose: Multi-turn conversations, approval reactions (thumbs up/down), status updates
## Data Storage
**Databases:**
- SQLite3 (local file-based)
- Connection: Local file path, no remote connection
- Client: Python sqlite3 (stdlib) or SQLAlchemy ORM
- Purpose: Persistent conversation history, memory compression, learned patterns
- Location: Local filesystem (.db files)
**File Storage:**
- Local filesystem only - Git-tracked code changes, conversation history backups
- No cloud storage integration in v1
**Caching:**
- In-memory caching for current conversation context
- Redis: Not used in v1 (local-first constraint)
- Model context window management: Token-based cache within model inference
## Authentication & Identity
**Auth Provider:**
- Custom local auth - No external identity provider
- Implementation:
- Discord user ID as conversation context identifier
- Optional local password/PIN for CLI access
- No OAuth/cloud identity providers (offline-first requirement)
## Monitoring & Observability
**Error Tracking:**
- None (local only, no error reporting service)
- Local audit logging to SQLite instead
**Logs:**
- File-based logging to `.logs/` directory
- Format: Structured JSON logs with timestamp, level, context
- Rotation: Size-based or time-based rotation strategy
- No external log aggregation (offline-first)
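A sketch of how the structured JSON log lines could be produced with the stdlib; the `JsonFormatter` name and field set are illustrative, not an existing Mai module:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

def make_logger(name: str = "mai") -> logging.Logger:
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

For the rotation described above, a real setup would swap `StreamHandler` for `logging.handlers.RotatingFileHandler` (size-based) or `TimedRotatingFileHandler` pointed at the `.logs/` directory.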
## CI/CD & Deployment
**Hosting:**
- Local machine only (desktop/laptop with RTX 3060+)
- No cloud hosting in v1
**CI Pipeline:**
- GitHub Actions for Discord webhook on push
- Workflow: `.github/workflows/discord_sync.yml`
- Trigger: Push events
- Action: POST to Discord webhook for notification
**Git Integration:**
- All Mai's self-modifications committed automatically with git
- Local git repo tracking all code changes
- Commit messages include decision context and review results
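A hypothetical helper showing one possible shape for such a commit message; the trailer names are assumptions, and the actual commit would be made via GitPython (e.g. `repo.index.commit(...)`):

```python
def build_commit_message(
    summary: str,
    *,
    decision_context: str,
    review_result: str,
) -> str:
    """Assemble a commit message embedding decision context and review outcome."""
    return (
        f"{summary}\n\n"
        f"Decision-Context: {decision_context}\n"
        f"Review-Result: {review_result}\n"
    )

# Assumed usage with GitPython:
#   repo.index.add(["src/memory/store.py"])
#   repo.index.commit(build_commit_message(
#       "selfmod: cache embeddings",
#       decision_context="reduce retrieval latency",
#       review_result="approved",
#   ))
```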
## Environment Configuration
**Required env vars:**
- `DISCORD_BOT_TOKEN` - Discord bot authentication
- `LMSTUDIO_ENDPOINT` - LMStudio API URL (default: localhost:8000)
- `OLLAMA_ENDPOINT` - Ollama API URL (optional alternative, default: localhost:11434)
- `DISCORD_USER_ID` - User Discord ID for approval requests
- `MEMORY_DB_PATH` - SQLite database file location
- `MODEL_CACHE_DIR` - Directory for model files
- `CPU_CORES_AVAILABLE` - System CPU count for resource management
- `GPU_VRAM_AVAILABLE` - VRAM in GB for model selection
- `SANDBOX_DOCKER_IMAGE` - Docker image ID for code sandbox execution
**Secrets location:**
- `.env` file (Python-dotenv) for local development
- Environment variables for production/runtime
- Git-ignored: `.env` not committed
## Webhooks & Callbacks
**Incoming:**
- Discord message webhooks - Handled by discord.py bot event listeners
- No external webhook endpoints in v1
**Outgoing:**
- Discord webhook for git notifications (configured in GitHub Actions)
- Endpoint: Stored in GitHub secrets as WEBHOOK
- Triggered on: git push events
- Payload: Git commit information (author, message, timestamp)
**Model Callback Handling:**
- LMStudio streaming callbacks for token-by-token responses
- Ollama streaming responses for incremental model output
## Code Execution Sandbox
**Sandbox Environment:**
- Docker container with resource limits
- SDK: Docker SDK for Python (docker-py)
- Environment: Isolated Linux container
- Resource limits: CPU cores, RAM, network restrictions
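A sketch of how those limits might translate into docker-py keyword arguments; the specific values and the helper name are assumptions:

```python
def sandbox_run_options(*, cpu_cores: float, ram_mb: int) -> dict:
    """Translate resource limits into docker-py `containers.run` kwargs."""
    return {
        "nano_cpus": int(cpu_cores * 1_000_000_000),  # CPU quota in billionths of a core
        "mem_limit": f"{ram_mb}m",                    # RAM cap
        "network_disabled": True,                     # no network inside the sandbox
        "remove": True,                               # delete the container after exit
    }

# Assumed usage with docker-py:
#   import docker
#   client = docker.from_env()
#   client.containers.run(image, command,
#                         **sandbox_run_options(cpu_cores=1, ram_mb=512))
```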
**Risk Assessment:**
- Multi-level risk evaluation (LOW/MEDIUM/HIGH/BLOCKED)
- AST validation before container execution
- Second-agent review via Claude/OpenCode API
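A minimal sketch of the AST import check; the deny-list here is illustrative (the real rules would live in `config/safety_rules.yaml`):

```python
import ast

# Illustrative deny-list, not the project's actual rule set.
BLOCKED_MODULES = {"os", "subprocess", "socket"}

def risk_from_imports(code: str) -> str:
    """Classify code as BLOCKED if it imports a denied module, else LOW."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return "BLOCKED"  # unparseable code never reaches the sandbox
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom):
            names = {(node.module or "").split(".")[0]}
        else:
            continue
        if names & BLOCKED_MODULES:
            return "BLOCKED"
    return "LOW"
```

A full analyzer would also inspect calls (`eval`, `exec`, `__import__`) and attribute access, grading results into the MEDIUM/HIGH tiers rather than the binary shown here.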
---
*Integration audit: 2026-01-26*

# Technology Stack
**Analysis Date:** 2026-01-26
## Languages
**Primary:**
- Python 3.x - Core Mai agent codebase, local model inference, self-improvement system
**Secondary:**
- YAML - Configuration files for personality, behavior settings
- JSON - Configuration, metadata, API responses
- SQL - Memory storage and retrieval queries
## Runtime
**Environment:**
- Python (local execution, no remote runtime)
- LMStudio or Ollama - Local model inference server
**Package Manager:**
- pip - Python package management
- Lockfile: requirements.txt or poetry.lock (typical Python approach)
## Frameworks
**Core:**
- No web framework for v1 (CLI/Discord only)
**Model Inference:**
- LMStudio Python SDK - Local model switching and inference
- Ollama API - Alternative local model management per requirements
**Discord Integration:**
- discord.py - Discord bot API client
**CLI:**
- Click or Typer - Command-line interface building
**Testing:**
- pytest - Unit/integration test framework
- pytest-asyncio - Async test support for Discord bot testing
**Build/Dev:**
- Git - Version control for Mai's own code changes
- Docker - Sandbox execution environment for safety
## Key Dependencies
**Critical:**
- LMStudio Python Client - Model loading, switching, inference with token management
- discord.py - Discord bot functionality for approval workflows
- SQLite3 - Lightweight persistent storage (Python stdlib)
- Docker SDK for Python - Sandbox execution management
**Infrastructure:**
- requests - HTTP client for Discord API fallback and Ollama API communication
- PyYAML - Personality configuration parsing
- pydantic - Data validation for internal structures
- python-dotenv - Environment variable management for secrets
- GitPython - Programmatic git operations for committing self-improvements
## Configuration
**Environment:**
- .env file - Discord bot token, model paths, resource thresholds
- environment variables - Runtime configuration loaded at startup
- personality.yaml - Core personality values and learned behavior layers
- config.json - Resource limits, model preferences, memory settings
**Build:**
- setup.py or pyproject.toml - Package metadata and dependency declaration
- Dockerfile - Sandbox execution environment specification
- .dockerignore - Docker build optimization
## Platform Requirements
**Development:**
- Python 3.8+ (for type hints and async/await)
- Git (for version control and self-modification tracking)
- Docker (for sandbox execution environment)
- LMStudio or Ollama running locally (for model inference)
**Production (Runtime):**
- RTX 3060 GPU minimum (per project constraints)
- 16GB+ RAM (for model loading and context management)
- Linux/macOS/Windows with Python 3.8+
- Docker daemon (for sandboxed code execution)
- Local LMStudio/Ollama instance (no cloud models)
---
*Stack analysis: 2026-01-26*

# Codebase Structure
**Analysis Date:** 2026-01-26
## Directory Layout
```
mai/
├── src/
│ ├── __main__.py # CLI entry point
│ ├── mai.py # Core Mai class, orchestration
│ ├── models/
│ │ ├── __init__.py
│ │ ├── adapter.py # Base model adapter interface
│ │ ├── ollama_adapter.py # Ollama/LMStudio implementation
│ │ ├── model_manager.py # Model selection and switching logic
│ │ └── resource_monitor.py # CPU, RAM, GPU tracking
│ ├── memory/
│ │ ├── __init__.py
│ │ ├── store.py # SQLite conversation store
│ │ ├── vector_search.py # Semantic similarity search
│ │ ├── compression.py # History compression and summarization
│ │ └── pattern_extractor.py # Learning and pattern recognition
│ ├── conversation/
│ │ ├── __init__.py
│ │ ├── engine.py # Main conversation orchestration
│ │ ├── context_manager.py # Token budget and window management
│ │ ├── turn_handler.py # Single turn processing
│ │ └── reasoning.py # Reasoning transparency and clarification
│ ├── personality/
│ │ ├── __init__.py
│ │ ├── core_rules.py # Unshakeable core values enforcement
│ │ ├── learned_behaviors.py # Personality adaptation from interactions
│ │ ├── guardrails.py # Safety constraints and refusal logic
│ │ └── config_loader.py # YAML personality configuration
│ ├── safety/
│ │ ├── __init__.py
│ │ ├── executor.py # Docker sandbox execution wrapper
│ │ ├── risk_analyzer.py # Risk classification (LOW/MEDIUM/HIGH/BLOCKED)
│ │ ├── ast_validator.py # Syntax and import validation
│ │ └── audit_log.py # Immutable execution history
│ ├── selfmod/
│ │ ├── __init__.py
│ │ ├── analyzer.py # Code analysis and improvement detection
│ │ ├── generator.py # Improvement code generation
│ │ ├── scheduler.py # Periodic and on-demand analysis trigger
│ │ ├── reviewer.py # Second-agent review coordination
│ │ └── git_manager.py # Git commit integration
│ ├── interfaces/
│ │ ├── __init__.py
│ │ ├── cli.py # CLI chat interface
│ │ ├── discord_bot.py # Discord bot implementation
│ │ ├── message_handler.py # Shared message processing
│ │ ├── approval_handler.py # Change approval workflow
│ │ └── offline_queue.py # Message queueing during disconnection
│ └── utils/
│ ├── __init__.py
│ ├── config.py # Configuration loading
│ ├── logging.py # Structured logging setup
│ ├── validators.py # Input validation helpers
│ └── helpers.py # Shared utility functions
├── config/
│ ├── personality.yaml # Core personality configuration
│ ├── models.yaml # Model definitions and resource limits
│ ├── safety_rules.yaml # Risk assessment rules
│ └── logging.yaml # Logging configuration
├── tests/
│ ├── unit/
│ │ ├── test_models.py
│ │ ├── test_memory.py
│ │ ├── test_conversation.py
│ │ ├── test_personality.py
│ │ ├── test_safety.py
│ │ └── test_selfmod.py
│ ├── integration/
│ │ ├── test_conversation_flow.py
│ │ ├── test_selfmod_workflow.py
│ │ └── test_interfaces.py
│ └── fixtures/
│ ├── mock_models.py
│ ├── test_data.py
│ └── sample_conversations.json
├── scripts/
│ ├── setup_ollama.py # Initial model downloading
│ ├── init_db.py # Database schema initialization
│ └── verify_environment.py # Pre-flight checks
├── docker/
│ └── Dockerfile # Sandbox execution environment
├── .env.example # Environment variables template
├── pyproject.toml # Project metadata and dependencies
├── requirements.txt # Python dependencies
├── pytest.ini # Test configuration
├── Makefile # Development commands
└── README.md # Project overview
```
## Directory Purposes
**src/:**
- Purpose: All application code
- Contains: Python modules organized by architectural layer
- Key files: `mai.py` (core), `__main__.py` (CLI entry)
**src/models/:**
- Purpose: Model inference abstraction
- Contains: Adapter interfaces, Ollama client, resource monitoring
- Key files: `model_manager.py` (selection logic), `resource_monitor.py` (constraints)
**src/memory/:**
- Purpose: Persistent storage and retrieval
- Contains: SQLite operations, vector search, compression
- Key files: `store.py` (main interface), `vector_search.py` (semantic search)
**src/conversation/:**
- Purpose: Multi-turn conversation orchestration
- Contains: Turn handling, context windowing, reasoning transparency
- Key files: `engine.py` (main coordinator), `context_manager.py` (token budget)
**src/personality/:**
- Purpose: Values enforcement and personality adaptation
- Contains: Core rules, learned behaviors, guardrails
- Key files: `core_rules.py` (unshakeable values), `learned_behaviors.py` (adaptation)
**src/safety/:**
- Purpose: Code execution sandboxing and risk assessment
- Contains: Docker wrapper, AST validation, risk classification, audit logging
- Key files: `executor.py` (sandbox wrapper), `risk_analyzer.py` (classification)
**src/selfmod/:**
- Purpose: Autonomous code improvement and review
- Contains: Code analysis, improvement generation, approval workflow
- Key files: `analyzer.py` (detection), `reviewer.py` (second-agent coordination)
**src/interfaces/:**
- Purpose: External communication adapters
- Contains: CLI handler, Discord bot, approval system
- Key files: `cli.py` (terminal UI), `discord_bot.py` (Discord integration)
**src/utils/:**
- Purpose: Shared utilities and helpers
- Contains: Configuration loading, logging, validation
- Key files: `config.py` (env/file loading), `logging.py` (structured logs)
**config/:**
- Purpose: Non-code configuration files
- Contains: YAML personality, models, safety rules definitions
- Key files: `personality.yaml` (core values), `models.yaml` (resource profiles)
**tests/:**
- Purpose: Test suites organized by type
- Contains: Unit tests (layer isolation), integration tests (flows), fixtures (test data)
- Key files: Each test file mirrors `src/` structure
**scripts/:**
- Purpose: One-off setup and maintenance scripts
- Contains: Database initialization, environment verification
- Key files: `setup_ollama.py` (first-time model setup)
**docker/:**
- Purpose: Container configuration for sandboxed execution
- Contains: Dockerfile for isolation environment
- Key files: `Dockerfile` (build recipe)
## Key File Locations
**Entry Points:**
- `src/__main__.py`: CLI entry, `python -m mai` launches here
- `src/interfaces/discord_bot.py`: Discord bot main loop
- `src/mai.py`: Core Mai class, system initialization
**Configuration:**
- `config/personality.yaml`: Core values, interaction patterns, refusal rules
- `config/models.yaml`: Available models, resource requirements, context windows
- `.env.example`: Required environment variables template
**Core Logic:**
- `src/mai.py`: Main orchestration
- `src/conversation/engine.py`: Conversation turn processing
- `src/selfmod/analyzer.py`: Improvement opportunity detection
- `src/safety/executor.py`: Safe code execution
**Testing:**
- `tests/unit/`: Layer-isolated tests (no dependencies between layers)
- `tests/integration/`: End-to-end flow tests
- `tests/fixtures/`: Mock objects and test data
## Naming Conventions
**Files:**
- Module files: `snake_case.py` (e.g., `model_manager.py`)
- Entry points: `__main__.py` for packages, standalone scripts at package root
- Config files: `snake_case.yaml` (e.g., `personality.yaml`)
- Test files: `test_*.py` (e.g., `test_conversation.py`)
**Directories:**
- Feature areas: `snake_case` (e.g., `src/selfmod/`)
- No abbreviations except `selfmod` (self-modification) which is project standard
- Each layer is a top-level directory under `src/`
**Functions/Classes:**
- Classes: `PascalCase` (e.g., `ModelManager`, `ConversationEngine`)
- Functions: `snake_case` (e.g., `generate_response()`, `validate_code()`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_CONTEXT_TOKENS`)
- Private methods/functions: prefix with `_` (e.g., `_internal_method()`)
**Types:**
- Use type hints throughout: `def process(msg: str) -> str:`
- Complex types in `src/utils/types.py` or local to module
## Where to Add New Code
**New Feature (e.g., new communication interface like Slack):**
- Primary code: `src/interfaces/slack_adapter.py` (new adapter following discord_bot.py pattern)
- Tests: `tests/unit/test_slack_adapter.py` and `tests/integration/test_slack_interface.py`
- Configuration: Add to `src/interfaces/__init__.py` imports and `config/interfaces.yaml` if needed
- Entry hook: Modify `src/mai.py` to initialize new adapter
**New Component/Module (e.g., advanced memory with graph databases):**
- Implementation: `src/memory/graph_store.py` (new module in appropriate layer)
- Interface: Follow existing patterns (e.g., inherit from `src/memory/store.py` base)
- Tests: Corresponding test in `tests/unit/test_memory.py` or new file if complex
- Integration: Modify `src/mai.py` initialization to use new component with feature flag
**Utilities (e.g., new helper function):**
- Shared helpers: `src/utils/helpers.py` (functions) or new file like `src/utils/math_utils.py` if substantial
- Internal helpers: Keep in the module where used (don't over-extract)
- Tests: Add to `tests/unit/test_utils.py`
**Configuration:**
- Static rules: Add to appropriate YAML in `config/`
- Dynamic config: Load in `src/utils/config.py`
- Env-driven: Add to `.env.example` with documentation
## Special Directories
**tests/fixtures/:**
- Purpose: Reusable test data and mock objects
- Generated: No, hand-created
- Committed: Yes, part of repository
**config/:**
- Purpose: Non-code configuration
- Generated: No, hand-maintained
- Committed: Yes, except secrets (use `.env`)
**.env (not committed):**
- Purpose: Local environment overrides and secrets
- Generated: No, copied from `.env.example` and filled locally
- Committed: No (in .gitignore)
**docker/:**
- Purpose: Sandbox environment for safe execution
- Generated: No, hand-maintained
- Committed: Yes
---
*Structure analysis: 2026-01-26*

# Testing Patterns
**Analysis Date:** 2026-01-26
## Status
**Note:** This codebase is in planning phase. No tests have been written yet. These patterns are **prescriptive** for the Mai project and should be applied from the first test file forward.
## Test Framework
**Runner:**
- **pytest** - Test discovery and execution
- Version: Latest stable (6.x or higher)
- Config: `pytest.ini` or `pyproject.toml` (create with initial setup)
**Assertion Library:**
- Built-in `assert` statements
- `pytest` fixtures for setup/teardown
- `pytest.raises()` for exception testing
**Run Commands:**
```bash
pytest # Run all tests in tests/ directory
pytest -v # Verbose output with test names
pytest -k "test_memory" # Run tests matching pattern
pytest --cov=src # Generate coverage report
pytest --cov=src --cov-report=html # Generate HTML coverage
pytest -x # Stop on first failure
pytest -s # Show print output during tests
```
## Test File Organization
**Location:**
- Two common patterns: co-located tests (`src/[module]/test_[component].py`, next to the source) or a central `tests/` directory mirroring the `src/` layout
- Mai uses co-located tests for unit tests; integration tests live in `tests/integration/`
**Recommended pattern for Mai:**
```
src/
├── memory/
│ ├── __init__.py
│ ├── storage.py
│ └── test_storage.py # Co-located tests
├── models/
│ ├── __init__.py
│ ├── manager.py
│ └── test_manager.py
└── safety/
├── __init__.py
├── sandbox.py
└── test_sandbox.py
```
**Naming:**
- Test files: `test_*.py` or `*_test.py`
- Test classes: `TestComponentName`
- Test functions: `test_specific_behavior_with_context`
- Example: `test_retrieves_conversation_history_within_token_limit`
**Test Organization:**
- One test class per component being tested
- Group related tests in a single class
- One assertion per test (or tightly related assertions)
## Test Structure
**Suite Organization:**
```python
import pytest
from src.memory.storage import ConversationStorage
class TestConversationStorage:
"""Test suite for ConversationStorage."""
@pytest.fixture
def storage(self) -> ConversationStorage:
"""Provide a storage instance for testing."""
return ConversationStorage(path=":memory:") # Use in-memory DB
@pytest.fixture
def sample_conversation(self) -> dict:
"""Provide sample conversation data."""
return {
"messages": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
}
def test_stores_and_retrieves_conversation(self, storage, sample_conversation):
"""Test that conversations can be stored and retrieved."""
conversation_id = storage.store(sample_conversation)
retrieved = storage.get(conversation_id)
assert retrieved == sample_conversation
def test_raises_error_on_missing_conversation(self, storage):
"""Test that missing conversations raise appropriate error."""
with pytest.raises(MemoryError):
storage.get("nonexistent_id")
```
**Patterns:**
- **Setup pattern**: Use `@pytest.fixture` for setup, avoid `setUp()` methods
- **Teardown pattern**: Use fixture cleanup (yield pattern)
- **Assertion pattern**: One logical assertion per test (may involve multiple `assert` statements on related data)
```python
@pytest.fixture
def model_manager():
"""Set up model manager and clean up after test."""
manager = ModelManager()
manager.initialize()
yield manager
manager.shutdown() # Cleanup
def test_loads_available_models(model_manager):
"""Test model discovery and loading."""
models = model_manager.list_available()
assert len(models) > 0
assert all(isinstance(m, str) for m in models)
```
## Async Testing
**Pattern:**
```python
import pytest
import asyncio
@pytest.mark.asyncio
async def test_async_model_invocation():
"""Test async model inference."""
manager = ModelManager()
response = await manager.generate("test prompt")
assert len(response) > 0
assert isinstance(response, str)
@pytest.mark.asyncio
async def test_concurrent_memory_access():
"""Test that memory handles concurrent access."""
storage = ConversationStorage()
tasks = [
storage.store({"id": i, "text": f"msg {i}"})
for i in range(10)
]
ids = await asyncio.gather(*tasks)
assert len(ids) == 10
```
- Use `@pytest.mark.asyncio` decorator
- Use `async def` for test function signature
- Use `await` for async calls
- Can mix async fixtures and sync fixtures
## Mocking
**Framework:** `unittest.mock` (Python standard library)
**Patterns:**
```python
from unittest.mock import Mock, AsyncMock, patch, MagicMock
import pytest
def test_handles_model_error():
"""Test error handling when model fails."""
mock_model = Mock()
mock_model.generate.side_effect = RuntimeError("Model offline")
manager = ModelManager(model=mock_model)
with pytest.raises(ModelError):
manager.invoke("prompt")
@pytest.mark.asyncio
async def test_retries_on_transient_failure():
"""Test retry logic for transient failures."""
mock_api = AsyncMock()
mock_api.call.side_effect = [
Exception("Temporary failure"),
"success"
]
result = await retry_with_backoff(mock_api.call, max_retries=2)
assert result == "success"
assert mock_api.call.call_count == 2
@patch("src.models.manager.requests.get")
def test_fetches_model_list(mock_get):
"""Test fetching model list from API."""
mock_get.return_value.json.return_value = {"models": ["model1", "model2"]}
manager = ModelManager()
models = manager.get_remote_models()
assert models == ["model1", "model2"]
```
**What to Mock:**
- External API calls (Discord, LMStudio API)
- Database operations (SQLite in production, use in-memory for tests)
- File I/O (use temporary directories)
- Slow operations (model inference can be stubbed)
- System resources (CPU, RAM monitoring)
**What NOT to Mock:**
- Core business logic (the logic you're testing)
- Data structure operations (dict, list operations)
- Internal module calls within the same component
- Internal helper functions
## Fixtures and Factories
**Test Data Pattern:**
```python
# conftest.py - shared fixtures
import pytest
from pathlib import Path
from src.memory.storage import ConversationStorage
@pytest.fixture
def temp_db(tmp_path: Path) -> Path:
    """Provide a per-test SQLite database path (pytest cleans up tmp_path)."""
    return tmp_path / "test_mai.db"
@pytest.fixture
def conversation_factory():
"""Factory for creating test conversations."""
def _make_conversation(num_messages: int = 3) -> dict:
messages = []
for i in range(num_messages):
role = "user" if i % 2 == 0 else "assistant"
messages.append({
"role": role,
"content": f"Message {i+1}",
"timestamp": f"2026-01-26T{i:02d}:00:00Z"
})
return {"messages": messages}
return _make_conversation
def test_stores_long_conversation(temp_db, conversation_factory):
"""Test storing conversations with many messages."""
storage = ConversationStorage(path=temp_db)
long_convo = conversation_factory(num_messages=100)
conv_id = storage.store(long_convo)
retrieved = storage.get(conv_id)
assert len(retrieved["messages"]) == 100
```
**Location:**
- Shared fixtures: `tests/conftest.py` (pytest auto-discovers)
- Component-specific fixtures: In test files or subdirectory `conftest.py` files
- Factories: In `tests/factories.py` or within `conftest.py`
## Coverage
**Requirements:**
- **Target: 80% code coverage minimum** for core modules
- Critical paths (safety, memory, inference): 90%+ coverage
- UI/CLI: 70% (lower due to interaction complexity)
**View Coverage:**
```bash
pytest --cov=src --cov-report=term-missing
pytest --cov=src --cov-report=html
# Then open htmlcov/index.html in browser
```
**Configure in `pyproject.toml`:**
```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "--cov=src --cov-report=term-missing --cov-report=html"
```
## Test Types
**Unit Tests:**
- Scope: Single function or class method
- Dependencies: Mocked
- Speed: Fast (<100ms per test)
- Location: `test_component.py` in source directory
- Example: `test_tokenizer_splits_input_correctly`
**Integration Tests:**
- Scope: Multiple components working together
- Dependencies: Real services (in-memory DB, local files)
- Speed: Medium (100ms - 1s per test)
- Location: `tests/integration/test_*.py`
- Example: `test_conversation_engine_with_memory_retrieval`
```python
# tests/integration/test_conversation_flow.py
@pytest.mark.asyncio
async def test_full_conversation_with_memory():
"""Test complete conversation flow including memory retrieval."""
memory = ConversationStorage(path=":memory:")
engine = ConversationEngine(memory=memory)
# Store context
memory.store({"id": "ctx1", "content": "User prefers Python"})
# Have conversation
response = await engine.chat("What language should I use?")
# Verify context was used
    assert "python" in response.lower()
```
**E2E Tests:**
- Scope: Full system end-to-end
- Framework: **Not required for v1** (added in v2)
- Would test: CLI input → Model → Discord output
- Deferred until Discord/CLI interfaces complete
## Common Patterns
**Error Testing:**
```python
def test_invalid_input_raises_validation_error():
"""Test that validation catches malformed input."""
with pytest.raises(ValueError) as exc_info:
storage.store({"invalid": "structure"})
assert "missing required field" in str(exc_info.value)
def test_logs_error_details():
"""Test that errors log useful debugging info."""
with patch("src.logger") as mock_logger:
try:
risky_operation()
except OperationError:
pass
mock_logger.error.assert_called_once()
call_args = mock_logger.error.call_args
assert "operation_id" in str(call_args)
```
**Performance Testing** (requires the `pytest-benchmark` plugin, which provides the `benchmark` fixture):
```python
def test_memory_retrieval_within_performance_budget(benchmark):
"""Test that memory queries complete within time budget."""
storage = ConversationStorage()
query = "what did we discuss earlier"
result = benchmark(storage.retrieve_similar, query)
assert len(result) > 0
# Run with: pytest --benchmark-only
```
**Data Validation Testing:**
```python
@pytest.mark.parametrize("input_val,expected", [
("hello", "hello"),
("HELLO", "hello"),
(" hello ", "hello"),
("", ValueError),
])
def test_normalizes_input(input_val, expected):
"""Test input normalization with multiple cases."""
if isinstance(expected, type) and issubclass(expected, Exception):
with pytest.raises(expected):
normalize(input_val)
else:
assert normalize(input_val) == expected
```
## Configuration
**pytest.ini (create at project root):**
```ini
[pytest]
testpaths = src tests
addopts = -v --tb=short --strict-markers
markers =
asyncio: marks async tests
slow: marks slow tests
integration: marks integration tests
```
**Alternative: pyproject.toml:**
```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "-v --tb=short"
markers = [
"asyncio: async test",
"slow: slow test",
"integration: integration test",
]
```
## Test Execution in CI/CD
**GitHub Actions workflow (when created):**
```yaml
- name: Run tests
run: pytest --cov=src --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage.xml
```
---
*Testing guide: 2026-01-26*
*Status: Prescriptive for Mai v1 implementation*