# Testing Patterns

**Analysis Date:** 2026-01-26

## Status
**Note:** This codebase is in the planning phase. No tests have been written yet. These patterns are prescriptive for the Mai project and should be applied from the first test file forward.
## Test Framework

**Runner:**

- pytest - Test discovery and execution
- Version: Latest stable (6.x or higher)
- Config: `pytest.ini` or `pyproject.toml` (create with initial setup)

**Assertion Library:**

- Built-in `assert` statements
- pytest fixtures for setup/teardown
- `pytest.raises()` for exception testing
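Built-in asserts and `pytest.raises()` in action; `normalize` and its import path are hypothetical (the helper reappears in the Data Validation section below):

```python
import pytest

from src.utils.text import normalize  # hypothetical module path


def test_lowercases_input():
    assert normalize("HELLO") == "hello"


def test_rejects_empty_input():
    with pytest.raises(ValueError):
        normalize("")
```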
**Run Commands:**

```bash
pytest                               # Run all tests in tests/ directory
pytest -v                            # Verbose output with test names
pytest -k "test_memory"              # Run tests matching pattern
pytest --cov=src                     # Generate coverage report
pytest --cov=src --cov-report=html   # Generate HTML coverage
pytest -x                            # Stop on first failure
pytest -s                            # Show print output during tests
```
## Test File Organization

**Location:**

- Co-located pattern: test files live next to source files, as `src/[module]/test_[component].py`
- Alternative: all tests in a single `tests/` directory with a mirrored structure

**Recommended pattern for Mai:**

```
src/
├── memory/
│   ├── __init__.py
│   ├── storage.py
│   └── test_storage.py   # Co-located tests
├── models/
│   ├── __init__.py
│   ├── manager.py
│   └── test_manager.py
└── safety/
    ├── __init__.py
    ├── sandbox.py
    └── test_sandbox.py
```
**Naming:**

- Test files: `test_*.py` or `*_test.py`
- Test classes: `TestComponentName`
- Test functions: `test_specific_behavior_with_context`
- Example: `test_retrieves_conversation_history_within_token_limit`

**Test Organization:**

- One test class per component under test
- Group related tests in a single class
- One assertion per test (or tightly related assertions)
## Test Structure

**Suite Organization:**

```python
import pytest

from src.memory.storage import ConversationStorage


class TestConversationStorage:
    """Test suite for ConversationStorage."""

    @pytest.fixture
    def storage(self) -> ConversationStorage:
        """Provide a storage instance for testing."""
        return ConversationStorage(path=":memory:")  # Use in-memory DB

    @pytest.fixture
    def sample_conversation(self) -> dict:
        """Provide sample conversation data."""
        return {
            "messages": [
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there"},
            ]
        }

    def test_stores_and_retrieves_conversation(self, storage, sample_conversation):
        """Test that conversations can be stored and retrieved."""
        conversation_id = storage.store(sample_conversation)
        retrieved = storage.get(conversation_id)
        assert retrieved == sample_conversation

    def test_raises_error_on_missing_conversation(self, storage):
        """Test that missing conversations raise the appropriate error."""
        # MemoryError is assumed to be a project-specific exception here,
        # not Python's built-in out-of-memory error
        with pytest.raises(MemoryError):
            storage.get("nonexistent_id")
```
**Patterns:**

- Setup pattern: use `@pytest.fixture` for setup; avoid `setUp()` methods
- Teardown pattern: use fixture cleanup (yield pattern)
- Assertion pattern: one logical assertion per test (may involve multiple `assert` statements on related data)
```python
import pytest

from src.models.manager import ModelManager  # assumed import path


@pytest.fixture
def model_manager():
    """Set up model manager and clean up after the test."""
    manager = ModelManager()
    manager.initialize()
    yield manager
    manager.shutdown()  # Cleanup


def test_loads_available_models(model_manager):
    """Test model discovery and loading."""
    models = model_manager.list_available()
    assert len(models) > 0
    assert all(isinstance(m, str) for m in models)
```
## Async Testing

**Pattern:**

```python
import asyncio

import pytest

from src.memory.storage import ConversationStorage
from src.models.manager import ModelManager


@pytest.mark.asyncio
async def test_async_model_invocation():
    """Test async model inference."""
    manager = ModelManager()
    response = await manager.generate("test prompt")
    assert len(response) > 0
    assert isinstance(response, str)


@pytest.mark.asyncio
async def test_concurrent_memory_access():
    """Test that memory handles concurrent access."""
    storage = ConversationStorage()
    tasks = [
        storage.store({"id": i, "text": f"msg {i}"})
        for i in range(10)
    ]
    ids = await asyncio.gather(*tasks)
    assert len(ids) == 10
```
- Use the `@pytest.mark.asyncio` decorator (provided by the pytest-asyncio plugin)
- Use `async def` for the test function signature
- Use `await` for async calls
- Async and sync fixtures can be mixed (see the sketch below)
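Async fixtures use the pytest-asyncio plugin's `@pytest_asyncio.fixture` decorator. A minimal sketch; the async `connect()`/`close()` methods on `ConversationStorage` are assumptions:

```python
import pytest
import pytest_asyncio

from src.memory.storage import ConversationStorage


@pytest_asyncio.fixture
async def async_storage():
    """Async fixture: set up storage, clean up after the test."""
    storage = ConversationStorage(path=":memory:")
    await storage.connect()  # hypothetical async initializer
    yield storage
    await storage.close()    # hypothetical async cleanup


@pytest.mark.asyncio
async def test_store_with_async_fixture(async_storage):
    conv_id = await async_storage.store({"id": 1, "text": "msg 1"})
    assert conv_id is not None
```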
## Mocking

**Framework:** `unittest.mock` (Python standard library)

**Patterns:**

```python
from unittest.mock import AsyncMock, Mock, patch

import pytest

from src.models.manager import ModelManager, ModelError  # assumed import paths
from src.utils.retry import retry_with_backoff  # hypothetical helper location


def test_handles_model_error():
    """Test error handling when the model fails."""
    mock_model = Mock()
    mock_model.generate.side_effect = RuntimeError("Model offline")
    manager = ModelManager(model=mock_model)
    with pytest.raises(ModelError):
        manager.invoke("prompt")


@pytest.mark.asyncio
async def test_retries_on_transient_failure():
    """Test retry logic for transient failures."""
    mock_api = AsyncMock()
    mock_api.call.side_effect = [
        Exception("Temporary failure"),
        "success",
    ]
    result = await retry_with_backoff(mock_api.call, max_retries=2)
    assert result == "success"
    assert mock_api.call.call_count == 2


@patch("src.models.manager.requests.get")
def test_fetches_model_list(mock_get):
    """Test fetching the model list from the API."""
    mock_get.return_value.json.return_value = {"models": ["model1", "model2"]}
    manager = ModelManager()
    models = manager.get_remote_models()
    assert models == ["model1", "model2"]
```
**What to Mock:**

- External API calls (Discord, LMStudio API)
- Database operations (SQLite in production; use in-memory for tests)
- File I/O (use temporary directories; see the sketch after these lists)
- Slow operations (model inference can be stubbed)
- System resources (CPU, RAM monitoring)

**What NOT to Mock:**

- Core business logic (the logic you're testing)
- Data structure operations (dict, list operations)
- Internal module calls within the same component
- Internal helper functions
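For file I/O, pytest's built-in `tmp_path` fixture often removes the need for mocking entirely; a sketch, with `ConversationExporter` and its import path as hypothetical:

```python
from pathlib import Path

from src.export import ConversationExporter  # hypothetical component


def test_writes_conversation_log(tmp_path: Path):
    """tmp_path is pytest's built-in per-test temporary directory."""
    log_file = tmp_path / "conversation.log"
    exporter = ConversationExporter(output=log_file)
    exporter.write({"role": "user", "content": "Hello"})
    assert log_file.exists()
    assert "Hello" in log_file.read_text()
```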
## Fixtures and Factories

**Test Data Pattern:**

```python
# conftest.py - shared fixtures
from pathlib import Path

import pytest

from src.memory.storage import ConversationStorage


@pytest.fixture
def temp_db():
    """Provide a temporary SQLite database."""
    db_path = Path("/tmp/test_mai.db")
    yield db_path
    if db_path.exists():
        db_path.unlink()


@pytest.fixture
def conversation_factory():
    """Factory for creating test conversations."""
    def _make_conversation(num_messages: int = 3) -> dict:
        messages = []
        for i in range(num_messages):
            role = "user" if i % 2 == 0 else "assistant"
            messages.append({
                "role": role,
                "content": f"Message {i+1}",
                # Wrap the hour so timestamps stay valid for long conversations
                "timestamp": f"2026-01-26T{i % 24:02d}:00:00Z",
            })
        return {"messages": messages}
    return _make_conversation


def test_stores_long_conversation(temp_db, conversation_factory):
    """Test storing conversations with many messages."""
    storage = ConversationStorage(path=temp_db)
    long_convo = conversation_factory(num_messages=100)
    conv_id = storage.store(long_convo)
    retrieved = storage.get(conv_id)
    assert len(retrieved["messages"]) == 100
```
**Location:**

- Shared fixtures: `tests/conftest.py` (pytest auto-discovers it)
- Component-specific fixtures: in test files or in per-directory `conftest.py` files
- Factories: in `tests/factories.py` or within `conftest.py`
## Coverage

**Requirements:**

- Target: 80% code coverage minimum for core modules
- Critical paths (safety, memory, inference): 90%+ coverage
- UI/CLI: 70% (lower due to interaction complexity)

**View Coverage:**

```bash
pytest --cov=src --cov-report=term-missing
pytest --cov=src --cov-report=html
# Then open htmlcov/index.html in a browser
```
**Configure in `pyproject.toml`:**

```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "--cov=src --cov-report=term-missing --cov-report=html"
```
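To enforce the 80% target automatically, coverage.py's `fail_under` option can be added to the same file (a sketch; the threshold is taken from the Requirements above):

```toml
[tool.coverage.report]
fail_under = 80       # exit non-zero if total coverage drops below 80%
show_missing = true
```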
## Test Types

**Unit Tests:**

- Scope: single function or class method
- Dependencies: mocked
- Speed: fast (<100 ms per test)
- Location: `test_component.py` in the source directory
- Example: `test_tokenizer_splits_input_correctly` (see the sketch after this list)
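A sketch of that unit test; `Tokenizer` and its import path are assumptions:

```python
from src.models.tokenizer import Tokenizer  # hypothetical module


def test_tokenizer_splits_input_correctly():
    """Unit scope: one method, no model loaded, no I/O."""
    tokenizer = Tokenizer()
    assert tokenizer.split("hello world") == ["hello", "world"]
```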
**Integration Tests:**

- Scope: multiple components working together
- Dependencies: real services (in-memory DB, local files)
- Speed: medium (100 ms - 1 s per test)
- Location: `tests/integration/test_*.py`
- Example: `test_conversation_engine_with_memory_retrieval`
```python
# tests/integration/test_conversation_flow.py
import pytest

from src.engine import ConversationEngine  # assumed import path
from src.memory.storage import ConversationStorage


@pytest.mark.asyncio
async def test_full_conversation_with_memory():
    """Test complete conversation flow including memory retrieval."""
    memory = ConversationStorage(path=":memory:")
    engine = ConversationEngine(memory=memory)

    # Store context
    memory.store({"id": "ctx1", "content": "User prefers Python"})

    # Have conversation
    response = await engine.chat("What language should I use?")

    # Verify context was used (case-insensitive match)
    assert "python" in response.lower()
```
**E2E Tests:**

- Scope: full system end-to-end
- Framework: not required for v1 (added in v2)
- Would test: CLI input → Model → Discord output
- Deferred until the Discord/CLI interfaces are complete
## Common Patterns

**Error Testing:**

```python
from unittest.mock import patch

import pytest


def test_invalid_input_raises_validation_error(storage):
    """Test that validation catches malformed input (storage comes from a fixture)."""
    with pytest.raises(ValueError) as exc_info:
        storage.store({"invalid": "structure"})
    assert "missing required field" in str(exc_info.value)


def test_logs_error_details():
    """Test that errors log useful debugging info."""
    with patch("src.logger") as mock_logger:
        try:
            risky_operation()
        except OperationError:
            pass
    mock_logger.error.assert_called_once()
    call_args = mock_logger.error.call_args
    assert "operation_id" in str(call_args)
```
**Performance Testing** (the `benchmark` fixture is provided by the pytest-benchmark plugin):

```python
def test_memory_retrieval_within_performance_budget(benchmark):
    """Test that memory queries complete within the time budget."""
    storage = ConversationStorage()
    query = "what did we discuss earlier"
    result = benchmark(storage.retrieve_similar, query)
    assert len(result) > 0

# Run with: pytest --benchmark-only
```
**Data Validation Testing:**

```python
@pytest.mark.parametrize("input_val,expected", [
    ("hello", "hello"),
    ("HELLO", "hello"),
    (" hello ", "hello"),
    ("", ValueError),
])
def test_normalizes_input(input_val, expected):
    """Test input normalization across multiple cases."""
    if isinstance(expected, type) and issubclass(expected, Exception):
        with pytest.raises(expected):
            normalize(input_val)
    else:
        assert normalize(input_val) == expected
```
## Configuration

**`pytest.ini` (create at project root):**

```ini
[pytest]
testpaths = src tests
addopts = -v --tb=short --strict-markers
markers =
    asyncio: marks async tests
    slow: marks slow tests
    integration: marks integration tests
```
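With markers registered, subsets of the suite can be selected from the command line:

```bash
pytest -m "not slow"     # skip tests marked @pytest.mark.slow
pytest -m integration    # run only integration-marked tests
```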
**Alternative: `pyproject.toml`:**

```toml
[tool.pytest.ini_options]
testpaths = ["src", "tests"]
addopts = "-v --tb=short"
markers = [
    "asyncio: async test",
    "slow: slow test",
    "integration: integration test",
]
```
## Test Execution in CI/CD

**GitHub Actions workflow (when created):**

```yaml
- name: Run tests
  run: pytest --cov=src --cov-report=xml
- name: Upload coverage
  uses: codecov/codecov-action@v3
  with:
    files: ./coverage.xml
```
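A fuller job sketch around those steps; the Python version, dependency file, and the checkout/setup-python action versions are assumptions:

```yaml
name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"  # assumed version
      - run: pip install -r requirements.txt pytest pytest-cov pytest-asyncio
      - name: Run tests
        run: pytest --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
```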
---

*Testing guide: 2026-01-26. Status: prescriptive for Mai v1 implementation.*