Testing Guide¶

Guidelines for writing tests in Pivot.

Philosophy¶

Real assurance over passing tests. Tests prove correctness, not just exercise code paths. A test that mocks internal logic proves nothing - it just confirms mocks return what you told them to.

Structure¶

No class Test* - Use flat def test_* functions; group with comment separators if needed
No @pytest.mark.skip - If a test isn't ready, don't write it yet
No lazy imports - All imports at module level, never inside test functions
No duplicate library code - Import and test the real library, don't reimplement helpers

Naming¶

Files: test_<module>.py
Functions: test_<behavior> - e.g., test_helper_function_change_triggers_rerun not test_fingerprint

Module-Level Helpers¶

Inline functions inside tests do NOT capture module imports in closures - getclosurevars() can't see them. Always define helpers at module level with _helper_ prefix:

# Module level - works
def _helper_uses_math():
    return math.pi

# Inline in test - FAILS fingerprinting
def test_it():
    def uses_math():  # math won't be in closure!
        return math.pi

Parametrization¶

Consolidate repetitive tests with @pytest.mark.parametrize
Put data in parameters, not logic - No if/else in test bodies based on parameters
Use pytest.param(id="...") for readable test names
Consolidate when: Same test logic, different input data
Keep separate when: Different behaviors, complex assertions, unique edge cases

Fixtures¶

Global State (autouse)¶

conftest.py has autouse fixtures that reset state between tests: clean_registry, reset_pivot_state.

Never manually reset these in individual tests or create duplicate fixtures.

usefixtures¶

When a test needs a fixture's side effect but not its return value:

@pytest.mark.usefixtures("set_project_root")
def test_something(tmp_path: pathlib.Path) -> None:
    ...

Available Fixtures¶

tmp_pipeline_dir - Temporary directory for pipeline tests
sample_data_file - Create sample CSV
set_project_root - Set project root to tmp_path
git_repo - Create a git repo with commit function

Mocking¶

Use mocker or monkeypatch - Never manual assignment (they auto-restore)
Always autospec=True when mocking functions/methods (catches signature mismatches)
Exception: No autospec when patching to a literal value (None, {}, etc.)

Mock Boundaries Only¶

Mock external boundaries (network, filesystem in unit tests, time, randomness). Never mock internal functions to control return values.

Signs of circular mock testing:

Mock returns X, test asserts X is returned
Mocking the function you're trying to test
Mock setup mirrors the assertion exactly

Test Behavior, Not Implementation¶

No private attribute access in assertions - Use public interfaces
No position-based CLI output parsing - Use simple containment checks or --json output

Assertions¶

Use assertion messages that appear in failure output:

assert x, "Should have y"  # Good - message in failure output

# Don't use inline comments
assert x  # Should have y  # Bad - comment not shown

CLI Integration Tests¶

Every CLI command needs an integration test that:

Creates real filesystem (tmp_path or runner.isolated_filesystem())
Writes actual files (Python stages, data, .git)
Runs actual CLI via runner.invoke()
Verifies both output AND filesystem state

Required test cases:

Success paths
Error paths
Output formats (--json, --md)

Coverage¶

Minimum: 90%
Critical files (100%): fingerprint.py, lock.py, dag.py, scheduler.py
CLI/explain: 80-85% acceptable

Fingerprint Tests¶

All fingerprint tests live in tests/fingerprint/. Before modifying fingerprinting behavior, consult tests/fingerprint/README.md for the change detection matrix. Update it when adding tests.

Debugging¶

pytest -x       # Stop on first failure
pytest -s       # Show print statements
pytest --pdb    # Debugger on failure
pytest --lf     # Re-run failed tests

Cross-Process Tests¶

When testing multiprocessing behavior, use file-based state instead of shared memory:

# Bad - shared mutable state silently fails in multiprocessing
execution_log = list[str]()

def my_stage():
    execution_log.append("ran")  # Each process has its own copy!

# Good - file-based logging for cross-process communication
def my_stage():
    with open("log.txt", "a") as f:
        f.write("ran\n")

TypedDict Test Patterns¶

When tests construct TypedDict instances (like PipelineReloaded, StageCompleted), they must include all required fields:

# Good - includes all required fields
event: types.PipelineReloaded = {
    "type": "pipeline_reloaded",
    "stages": ["stage_a", "stage_b"],  # Required field
    "stages_added": ["stage_b"],
    "stages_removed": [],
    "stages_modified": [],
    "error": None,
}

# Bad - missing required "stages" field (basedpyright will catch this)
event: types.PipelineReloaded = {
    "type": "pipeline_reloaded",
    "stages_added": ["stage_b"],
    ...
}

When adding required fields to TypedDicts:

Search for all test usages: rg "TypedDictName" tests/
Update all instances with the new field
Run uv run basedpyright . locally before pushing