# Testing Patterns
**Analysis Date:** 2026-01-26
## Status
**Note:** This codebase is in planning phase. No tests have been written yet. These patterns are **prescriptive** for the Mai project and should be applied from the first test file forward.
## Test Framework
**Runner:**
- **pytest** - Test discovery and execution
- Version: Latest stable (7.x or later)
- Config: `pytest.ini` or `pyproject.toml` (create with initial setup)
**Assertion Library:**
- Built-in `assert` statements
- `pytest` fixtures for setup/teardown
- `pytest.raises()` for exception testing
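A quick illustration of the plain-`assert` style; pytest rewrites assertions so a failure reports the actual operand values (the test name and numbers here are illustrative):
```python
def test_token_budget_arithmetic():
    """On failure, pytest shows the values of budget, used, and the result."""
    budget, used = 4096, 1024
    assert budget - used == 3072
```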
**Run Commands:**
```bash
pytest # Run all tests in tests/ directory
pytest -v # Verbose output with test names
pytest -k "test_memory" # Run tests matching pattern
pytest --cov=src # Generate coverage report
pytest --cov=src --cov-report=html # Generate HTML coverage
pytest -x # Stop on first failure
pytest -s # Show print output during tests
```
## Test File Organization
**Location:**
- **Co-located pattern** (recommended for unit tests): test files live next to source files, as `src/[module]/test_[component].py`
- **Separate-directory pattern**: tests live under `tests/` with a mirrored structure; used here for integration tests (see Test Types below)
**Recommended pattern for Mai:**
```
src/
├── memory/
│   ├── __init__.py
│   ├── storage.py
│   └── test_storage.py    # Co-located tests
├── models/
│   ├── __init__.py
│   ├── manager.py
│   └── test_manager.py
└── safety/
    ├── __init__.py
    ├── sandbox.py
    └── test_sandbox.py
```
**Naming:**
- Test files: `test_*.py` or `*_test.py`
- Test classes: `TestComponentName`
- Test functions: `test_specific_behavior_with_context`
- Example: `test_retrieves_conversation_history_within_token_limit`
**Test Organization:**
- One test class per component being tested
- Group related tests in a single class
- One assertion per test (or tightly related assertions)
## Test Structure
**Suite Organization:**
```python
import pytest

from src.memory.storage import ConversationStorage


class TestConversationStorage:
    """Test suite for ConversationStorage."""

    @pytest.fixture
    def storage(self) -> ConversationStorage:
        """Provide a storage instance for testing."""
        return ConversationStorage(path=":memory:")  # Use in-memory DB

    @pytest.fixture
    def sample_conversation(self) -> dict:
        """Provide sample conversation data."""
        return {
            "messages": [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there"},
            ]
        }

    def test_stores_and_retrieves_conversation(self, storage, sample_conversation):
        """Test that conversations can be stored and retrieved."""
        conversation_id = storage.store(sample_conversation)
        retrieved = storage.get(conversation_id)
        assert retrieved == sample_conversation

    def test_raises_error_on_missing_conversation(self, storage):
        """Test that missing conversations raise an appropriate error."""
        with pytest.raises(KeyError):  # or a project-defined not-found error
            storage.get("nonexistent_id")
```
**Patterns:**
- **Setup pattern**: Use `@pytest.fixture` for setup, avoid `setUp()` methods
- **Teardown pattern**: Use fixture cleanup (yield pattern)
- **Assertion pattern**: One logical assertion per test (may involve multiple `assert` statements on related data)
```python
import pytest

from src.models.manager import ModelManager


@pytest.fixture
def model_manager():
    """Set up the model manager and clean up after the test."""
    manager = ModelManager()
    manager.initialize()
    yield manager
    manager.shutdown()  # Cleanup


def test_loads_available_models(model_manager):
    """Test model discovery and loading."""
    models = model_manager.list_available()
    assert len(models) > 0
    assert all(isinstance(m, str) for m in models)
```
## Async Testing
**Pattern:**
```python
import asyncio

import pytest

from src.memory.storage import ConversationStorage
from src.models.manager import ModelManager


@pytest.mark.asyncio
async def test_async_model_invocation():
    """Test async model inference."""
    manager = ModelManager()
    response = await manager.generate("test prompt")
    assert len(response) > 0
    assert isinstance(response, str)


@pytest.mark.asyncio
async def test_concurrent_memory_access():
    """Test that memory handles concurrent access."""
    storage = ConversationStorage()
    tasks = [
        storage.store({"id": i, "text": f"msg {i}"})
        for i in range(10)
    ]
    ids = await asyncio.gather(*tasks)
    assert len(ids) == 10
```
- Use the `@pytest.mark.asyncio` decorator (provided by the `pytest-asyncio` plugin)
- Use `async def` for the test function signature
- Use `await` for async calls
- Async and sync fixtures can be mixed, as shown below
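A minimal sketch of mixing the two, assuming `ConversationStorage` gains async `connect()`/`close()` methods (hypothetical names); `pytest-asyncio` provides the `@pytest_asyncio.fixture` decorator for async fixtures:
```python
import pytest
import pytest_asyncio

from src.memory.storage import ConversationStorage


@pytest_asyncio.fixture
async def connected_storage():
    """Async setup and teardown via the yield pattern."""
    storage = ConversationStorage(path=":memory:")
    await storage.connect()  # hypothetical async setup
    yield storage
    await storage.close()  # hypothetical async teardown


@pytest.mark.asyncio
async def test_uses_async_fixture(connected_storage):
    """Async fixtures are requested the same way as sync ones."""
    conv_id = await connected_storage.store({"messages": []})
    assert conv_id is not None
```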
## Mocking
**Framework:** `unittest.mock` (Python standard library)
**Patterns:**
```python
from unittest.mock import Mock, AsyncMock, patch, MagicMock

import pytest

from src.models.manager import ModelManager

# ModelError and retry_with_backoff are project-level names used illustratively.


def test_handles_model_error():
    """Test error handling when the model fails."""
    mock_model = Mock()
    mock_model.generate.side_effect = RuntimeError("Model offline")
    manager = ModelManager(model=mock_model)
    with pytest.raises(ModelError):
        manager.invoke("prompt")


@pytest.mark.asyncio
async def test_retries_on_transient_failure():
    """Test retry logic for transient failures."""
    mock_api = AsyncMock()
    mock_api.call.side_effect = [
        Exception("Temporary failure"),
        "success",
    ]
    result = await retry_with_backoff(mock_api.call, max_retries=2)
    assert result == "success"
    assert mock_api.call.call_count == 2


@patch("src.models.manager.requests.get")
def test_fetches_model_list(mock_get):
    """Test fetching the model list from the API."""
    mock_get.return_value.json.return_value = {"models": ["model1", "model2"]}
    manager = ModelManager()
    models = manager.get_remote_models()
    assert models == ["model1", "model2"]
```
**What to Mock:**
- External API calls (Discord, LMStudio API)
- Database operations (use an in-memory SQLite database in place of the production file)
- File I/O (use temporary directories; see the `tmp_path` sketch after this list)
- Slow operations (model inference can be stubbed)
- System resources (CPU, RAM monitoring)
**What NOT to Mock:**
- Core business logic (the logic you're testing)
- Data structure operations (dict, list operations)
- Internal module calls within the same component
- Internal helper functions
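For the file I/O case, pytest's built-in `tmp_path` fixture replaces hand-rolled temporary paths; a minimal sketch (the transcript file is illustrative):
```python
from pathlib import Path


def test_writes_transcript_to_disk(tmp_path: Path):
    """tmp_path gives each test a fresh directory that pytest cleans up."""
    transcript = tmp_path / "transcript.txt"  # illustrative file name
    transcript.write_text("user: hello\nassistant: hi\n")
    assert transcript.read_text().startswith("user:")
```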
## Fixtures and Factories
**Test Data Pattern:**
```python
# conftest.py - shared fixtures
import pytest
from pathlib import Path

from src.memory.storage import ConversationStorage


@pytest.fixture
def temp_db():
    """Provide a temporary SQLite database."""
    db_path = Path("/tmp/test_mai.db")
    yield db_path
    if db_path.exists():
        db_path.unlink()


@pytest.fixture
def conversation_factory():
    """Factory for creating test conversations."""
    def _make_conversation(num_messages: int = 3) -> dict:
        messages = []
        for i in range(num_messages):
            role = "user" if i % 2 == 0 else "assistant"
            messages.append({
                "role": role,
                "content": f"Message {i+1}",
                "timestamp": f"2026-01-26T{i:02d}:00:00Z",
            })
        return {"messages": messages}
    return _make_conversation


def test_stores_long_conversation(temp_db, conversation_factory):
    """Test storing conversations with many messages."""
    storage = ConversationStorage(path=temp_db)
    long_convo = conversation_factory(num_messages=100)
    conv_id = storage.store(long_convo)
    retrieved = storage.get(conv_id)
    assert len(retrieved["messages"]) == 100
```
**Location:**
- Shared fixtures: `tests/conftest.py` (pytest auto-discovers)
- Component-specific fixtures: In test files or subdirectory `conftest.py` files
- Factories: In `tests/factories.py` or within `conftest.py`
## Coverage
**Requirements:**
- **Target: 80% code coverage minimum** for core modules
- Critical paths (safety, memory, inference): 90%+ coverage
- UI/CLI: 70% (lower due to interaction complexity)
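To enforce the floor automatically, `pytest-cov` can fail the run when coverage drops below the threshold:
```bash
pytest --cov=src --cov-fail-under=80
```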
**View Coverage:**
```bash
pytest --cov=src --cov-report=term-missing
pytest --cov=src --cov-report=html
# Then open htmlcov/index.html in browser
```
**Configure in `pyproject.toml`:**
```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "--cov=src --cov-report=term-missing --cov-report=html"
```
## Test Types
**Unit Tests:**
- Scope: Single function or class method
- Dependencies: Mocked
- Speed: Fast (<100ms per test)
- Location: `test_component.py` in source directory
- Example: `test_tokenizer_splits_input_correctly`
**Integration Tests:**
- Scope: Multiple components working together
- Dependencies: Real services (in-memory DB, local files)
- Speed: Medium (100ms - 1s per test)
- Location: `tests/integration/test_*.py`
- Example: `test_conversation_engine_with_memory_retrieval`
```python
# tests/integration/test_conversation_flow.py
import pytest

from src.memory.storage import ConversationStorage
# ConversationEngine's module path is not yet fixed; import omitted here.


@pytest.mark.asyncio
async def test_full_conversation_with_memory():
    """Test the complete conversation flow, including memory retrieval."""
    memory = ConversationStorage(path=":memory:")
    engine = ConversationEngine(memory=memory)

    # Store context
    memory.store({"id": "ctx1", "content": "User prefers Python"})

    # Have a conversation
    response = await engine.chat("What language should I use?")

    # Verify the stored context was used
    assert "python" in response.lower()
```
**E2E Tests:**
- Scope: Full system end-to-end
- Framework: **Not required for v1** (planned for v2)
- Would test: CLI input → Model → Discord output
- Deferred until Discord/CLI interfaces complete
## Common Patterns
**Error Testing:**
```python
import pytest
from unittest.mock import patch

# storage, risky_operation, and OperationError are illustrative project names.


def test_invalid_input_raises_validation_error():
    """Test that validation catches malformed input."""
    with pytest.raises(ValueError) as exc_info:
        storage.store({"invalid": "structure"})
    assert "missing required field" in str(exc_info.value)


def test_logs_error_details():
    """Test that errors log useful debugging info."""
    with patch("src.logger") as mock_logger:
        try:
            risky_operation()
        except OperationError:
            pass
        mock_logger.error.assert_called_once()
        call_args = mock_logger.error.call_args
        assert "operation_id" in str(call_args)
```
**Performance Testing:**
```python
# Requires the pytest-benchmark plugin, which provides the `benchmark` fixture.
def test_memory_retrieval_within_performance_budget(benchmark):
    """Test that memory queries complete within the time budget."""
    storage = ConversationStorage()
    query = "what did we discuss earlier"
    result = benchmark(storage.retrieve_similar, query)
    assert len(result) > 0

# Run with: pytest --benchmark-only
```
**Data Validation Testing:**
```python
@pytest.mark.parametrize("input_val,expected", [
    ("hello", "hello"),
    ("HELLO", "hello"),
    (" hello ", "hello"),
    ("", ValueError),
])
def test_normalizes_input(input_val, expected):
    """Test input normalization with multiple cases."""
    if isinstance(expected, type) and issubclass(expected, Exception):
        with pytest.raises(expected):
            normalize(input_val)
    else:
        assert normalize(input_val) == expected
```
## Configuration
**pytest.ini (create at project root):**
```ini
[pytest]
testpaths = src tests
addopts = -v --tb=short --strict-markers
markers =
    asyncio: marks async tests
    slow: marks slow tests
    integration: marks integration tests
```
**Alternative: pyproject.toml:**
```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "-v --tb=short"
markers = [
    "asyncio: async test",
    "slow: slow test",
    "integration: integration test",
]
```
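Registered markers are applied with `@pytest.mark.<name>` and selected with `-m`; a minimal sketch (the test body is a placeholder):
```python
import time

import pytest


@pytest.mark.slow
def test_full_index_rebuild():
    """Marked slow so quick runs can deselect it."""
    time.sleep(2)  # stand-in for an expensive operation
    assert True
```
Run `pytest -m "not slow"` to skip it; with `--strict-markers`, any unregistered marker fails the run instead of hiding a typo.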
## Test Execution in CI/CD
**GitHub Actions workflow (when created):**
```yaml
- name: Run tests
  run: pytest --cov=src --cov-report=xml
- name: Upload coverage
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
```
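For context, a sketch of the job these steps would live in; the Python version and the `.[dev]` extra are assumptions:
```yaml
name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"  # assumes pytest + pytest-cov in a dev extra
      - name: Run tests
        run: pytest --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
```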
---
*Testing guide: 2026-01-26*
*Status: Prescriptive for Mai v1 implementation*