Fingerprinting
Pivot fingerprints your stage functions so it knows when their code changes. Unlike file-level hashing, which triggers on any whitespace or comment edit, Pivot's fingerprinting is based on the Abstract Syntax Tree (AST): it captures the structure of your code and ignores cosmetic changes.
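One way to see the difference is to hash the same function both ways. This is a minimal sketch using only the standard library. `hashlib.sha256` stands in for whatever digest Pivot actually uses, and the two source strings are invented for illustration.

```python
import ast
import hashlib

def text_hash(source: str) -> str:
    # File-level style: every byte counts, including comments and spacing.
    return hashlib.sha256(source.encode()).hexdigest()

def ast_hash(source: str) -> str:
    # AST style: comments are dropped by the parser, and ast.dump()
    # omits line/column positions by default, so layout does not matter.
    return hashlib.sha256(ast.dump(ast.parse(source)).encode()).hexdigest()

original = "def f(x):\n    return x * 2\n"
cosmetic = "def f(x):\n    # double the input\n    return x*2    \n"

print(text_hash(original) == text_hash(cosmetic))  # False
print(ast_hash(original) == ast_hash(cosmetic))    # True
```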
What Gets Tracked
When you register a stage, Pivot builds a manifest — a dict mapping logical keys to hashes:
| Manifest key | What it represents |
|---|---|
| `self:<name>` | The stage function's own AST |
| `func:<name>` | Helper functions called by the stage |
| `class:<name>` | User-defined classes referenced by the stage |
| `mod:<module>.<attr>` | Attributes accessed on imported user modules |
| `const:<name>` | Global constants (primitives, frozen collections) |
| `schema:<name>` | Pydantic model JSON schemas |
| `loader:<class>:<method>` | Reader/Writer method ASTs and config |
How It Works
For each stage function, Pivot:
1. Parses the AST: `inspect.getsource()` → `ast.parse()` → normalize (strip docstrings, rename `def` to `func`) → `ast.dump()` → `xxhash64`
2. Walks closure variables: `inspect.getclosurevars()` finds the globals and nonlocals the function references
3. Recurses transitively: every user-code callable found in step 2 gets the same treatment, and its dependencies are merged into the manifest
4. Inspects type hints: user-defined classes in annotations (including Pydantic models) are fingerprinted
5. Hashes Pydantic schemas: `model_json_schema()` captures field types, defaults, and validators
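The first step can be approximated with the standard library alone. The sketch below works on a source string (Pivot obtains it with `inspect.getsource()`), strips docstrings, renames the outer function to `func`, and hashes `ast.dump()`. `fingerprint_source` is a hypothetical name, and `hashlib.blake2b` with an 8-byte digest stands in for xxhash64, which lives in a third-party package; Pivot's exact normalization rules may differ.

```python
import ast
import hashlib
import textwrap

def fingerprint_source(source: str) -> str:
    """Hash a function's normalized AST (sketch of step 1)."""
    tree = ast.parse(textwrap.dedent(source))
    # Strip a leading docstring from every function and class in the snippet.
    for node in list(ast.walk(tree)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                # Keep the body non-empty so the tree stays valid.
                node.body = body[1:] or [ast.Pass()]
    # Rename the outer def so the function's own name does not affect the hash.
    top = tree.body[0]
    if isinstance(top, (ast.FunctionDef, ast.AsyncFunctionDef)):
        top.name = "func"
    dumped = ast.dump(tree)  # comments and layout are already gone here
    # blake2b with an 8-byte digest stands in for xxhash64 (a 64-bit hash).
    return hashlib.blake2b(dumped.encode(), digest_size=8).hexdigest()

v1 = '''
def train(data):
    """Fit the model."""
    return data * 0.5
'''
v2 = '''
def train(data):
    """Fit the model on cleaned data."""
    # scale down
    return data * 0.5
'''
v3 = '''
def train(data):
    """Fit the model."""
    return data * 0.7
'''
print(fingerprint_source(v1) == fingerprint_source(v2))  # True: only docstring/comment differ
print(fingerprint_source(v1) == fingerprint_source(v3))  # False: a constant changed
```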
The manifest is stored in the per-stage lock file. On the next run, the worker recomputes the manifest and compares it key-by-key against the locked version. Any difference triggers re-execution.
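A key-by-key comparison is simple to sketch with plain dicts. `diff_manifests` is a hypothetical helper, the short hash strings are placeholders, and the real lock-file format is not shown here.

```python
def diff_manifests(locked: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests key by key (sketch)."""
    return {
        "added": sorted(current.keys() - locked.keys()),
        "removed": sorted(locked.keys() - current.keys()),
        "changed": sorted(k for k in locked.keys() & current.keys() if locked[k] != current[k]),
    }

# Placeholder hashes; a real lock file would hold actual digests.
locked = {"self:train": "a1", "func:normalize": "b2", "const:THRESHOLD": "0.5"}
current = {"self:train": "a1", "func:normalize": "b3", "func:clip_outliers": "c4"}

changes = diff_manifests(locked, current)
needs_rerun = any(changes.values())  # any added, removed, or changed key
print(changes)
# {'added': ['func:clip_outliers'], 'removed': ['const:THRESHOLD'], 'changed': ['func:normalize']}
```

Treating added and removed keys as changes matters: a stage that starts calling a new helper should re-run even if none of the existing hashes moved.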
Transitive Tracking
If `train()` calls `normalize()`, which calls `clip_outliers()`, all three
functions appear in the manifest. Changing `clip_outliers()` triggers a
re-run of `train()` even though `train()` itself didn't change.
Module-level attributes are tracked too. If your stage does
`from myproject import config` and then accesses `config.THRESHOLD`, the
value of `THRESHOLD` is captured (via `repr()` for primitives, or an AST hash
for callables).
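The closure walk and transitive recursion (steps 2 and 3 above) can be sketched with `inspect.getclosurevars()`. `walk_dependencies` is a hypothetical name; where Pivot would store an AST hash for each helper, this sketch records the helper's qualified name, and it does not restrict itself to user code the way Pivot does. The `train`/`normalize`/`clip_outliers` bodies are invented to match the example above.

```python
import inspect

PRIMITIVES = (bool, int, float, str, bytes, type(None))

def walk_dependencies(fn, manifest=None) -> dict[str, str]:
    """Collect func:/const: entries reachable from fn (sketch of steps 2-3)."""
    if manifest is None:
        manifest = {}
    refs = inspect.getclosurevars(fn)
    for name, value in {**refs.globals, **refs.nonlocals}.items():
        if inspect.isfunction(value):
            key = f"func:{name}"
            if key not in manifest:  # also guards against recursion cycles
                # Pivot would store the helper's AST hash; we record its name.
                manifest[key] = value.__qualname__
                walk_dependencies(value, manifest)  # recurse transitively
        elif isinstance(value, PRIMITIVES):
            # Primitive constants are captured by value via repr().
            manifest[f"const:{name}"] = repr(value)
    return manifest

THRESHOLD = 3.0

def clip_outliers(values):
    out = []
    for v in values:
        out.append(min(max(v, -THRESHOLD), THRESHOLD))
    return out

def normalize(values):
    clipped = clip_outliers(values)
    scale = max(clipped) or 1.0
    return [v / scale for v in clipped]

def train(values):
    return normalize(values)

# keys: func:normalize, func:clip_outliers, const:THRESHOLD
print(walk_dependencies(train))
```

`train` never mentions `clip_outliers` or `THRESHOLD` directly; both are picked up only because the walk recurses into `normalize`.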
What Triggers a Re-Run
| Change | Triggers re-run? |
|---|---|
| Rename a variable | Yes |
| Change a numeric constant | Yes |
| Add/remove a function argument | Yes |
| Change a helper function your stage calls | Yes |
| Change a Pydantic field type or default | Yes |
| Edit a docstring | No |
| Add/remove comments | No |
| Change whitespace/formatting | No |
| Change an unused import | No |
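One way to see why the "unused import" row says No: `inspect.getclosurevars()` reports only the names a function body actually references, so a module-level import that the stage never touches does not show up among its globals. The `stage` function here is made up for illustration.

```python
import inspect
import json  # imported at module level but never used by stage()
import math

def stage(x):
    return math.sqrt(x)

refs = inspect.getclosurevars(stage).globals
print(sorted(refs))  # ['math']
```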
Surprises and Pitfalls
Lambdas
Lambda functions have no stable source location across Python runs. Pivot
falls back to `id(func)`, which is non-deterministic, so your stage
will re-run every time. Always use named functions:
```python
# Bad: re-runs every time
pipeline.register(lambda data: data.dropna(), name="clean")

# Good: stable fingerprint
def clean(data):
    return data.dropna()

pipeline.register(clean)
```
Mutable Captured Variables
Pivot cannot track mutations to mutable objects captured from an enclosing
scope. If your stage closes over a list, dict, or mutable instance, Pivot
raises StageDefinitionError at registration time:
```python
config = {"threshold": 0.5}  # Mutable dict

def process(data):
    return data[data["score"] > config["threshold"]]  # Error!
```
Fix: pass the value via StageParams or declare it as a
Dep input. For truly static config, use a frozen dataclass or
frozenset.
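A registration-time check along these lines can be sketched with `inspect.getclosurevars()`. This is an illustration, not Pivot's implementation: `StageDefinitionError` is defined locally as a stand-in for Pivot's exception, `check_captured_values` and `Config` are hypothetical names, and only built-in containers are detected (spotting arbitrary mutable instances is harder and not attempted here).

```python
import inspect
from dataclasses import dataclass

class StageDefinitionError(Exception):
    """Local stand-in for Pivot's StageDefinitionError."""

MUTABLE_TYPES = (list, dict, set, bytearray)

def check_captured_values(fn) -> None:
    """Raise if fn references a mutable built-in container (sketch)."""
    refs = inspect.getclosurevars(fn)
    for name, value in {**refs.globals, **refs.nonlocals}.items():
        if isinstance(value, MUTABLE_TYPES):
            raise StageDefinitionError(
                f"{fn.__name__} captures mutable {type(value).__name__} {name!r}"
            )

config = {"threshold": 0.5}  # mutable dict: rejected

def process(data):
    return data[data["score"] > config["threshold"]]

@dataclass(frozen=True)
class Config:
    threshold: float = 0.5

FROZEN_CONFIG = Config()  # frozen dataclass instance: accepted

def process_frozen(data):
    return data[data["score"] > FROZEN_CONFIG.threshold]

check_captured_values(process_frozen)  # passes silently
try:
    check_captured_values(process)
except StageDefinitionError as exc:
    print(exc)  # process captures mutable dict 'config'
```

The functions are only inspected, never called, so the pandas-style indexing in their bodies does not need pandas to run this sketch.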
To suppress the check (at your own risk), set
`core.unsafe_fingerprinting = true` in your Pivot config or
`PIVOT_UNSAFE_FINGERPRINTING=1` in the environment.
Dynamic Name Access
Pivot rejects patterns that bypass static analysis:
- `globals()` / `locals()`: runtime namespace access
- `getattr(obj, variable)`: dynamic attribute lookup
- `importlib.import_module()`: dynamic imports

Each of these can pull in dependencies that static fingerprinting cannot see, so changes to them would go undetected. Use direct attribute access and static imports instead.
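A static detector for these patterns can be sketched with `ast.walk()` over the parsed source. `find_dynamic_access` is a hypothetical name and Pivot's actual rules may differ; for example, this sketch does not catch `from importlib import import_module`.

```python
import ast
import textwrap

def find_dynamic_access(source: str) -> list[str]:
    """Flag calls that static fingerprinting cannot follow (sketch)."""
    problems = []
    for node in ast.walk(ast.parse(textwrap.dedent(source))):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in ("globals", "locals"):
            problems.append(f"{func.id}()")
        elif (isinstance(func, ast.Name) and func.id == "getattr"
              and len(node.args) >= 2
              and not isinstance(node.args[1], ast.Constant)):
            # getattr(obj, "name") is fine; getattr(obj, variable) is not.
            problems.append("getattr with non-constant name")
        elif (isinstance(func, ast.Attribute) and func.attr == "import_module"
              and isinstance(func.value, ast.Name) and func.value.id == "importlib"):
            problems.append("importlib.import_module()")
    return problems

src = '''
def stage(obj, field):
    mod = importlib.import_module("myproject.config")
    return getattr(obj, field), getattr(obj, "name"), globals()
'''
print(sorted(find_dynamic_access(src)))
```

The source is only parsed, never executed, so `importlib` does not need to be imported for the check to work.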
@pivot.no_fingerprint()
For stages where AST fingerprinting doesn't work (C extensions, generated
code, complex metaprogramming), opt out with the @pivot.no_fingerprint()
decorator:
```python
import pivot

@pivot.no_fingerprint()
def external_model_stage(data):
    ...

@pivot.no_fingerprint(code_deps=["scripts/train.sh", "configs/model.yaml"])
def shell_stage(data):
    ...
```
With `@pivot.no_fingerprint()`, Pivot falls back to file-level hashing:
it hashes the entire source file containing the function. The optional
`code_deps` argument lets you list additional files that should be
considered part of the stage's code.
Use sparingly. File-level hashing is less precise: any change anywhere in the file triggers a re-run.
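A file-level fallback of this kind can be sketched as one digest over the stage's source file followed by each `code_deps` file. `file_level_fingerprint` is a hypothetical name, and `hashlib.sha256` and the length-prefixing scheme are choices made for this sketch, not necessarily what Pivot does.

```python
import hashlib
from pathlib import Path

def file_level_fingerprint(source_file: str, code_deps: tuple[str, ...] = ()) -> str:
    """Hash an entire source file plus any extra code_deps files (sketch)."""
    digest = hashlib.sha256()
    for path in (source_file, *code_deps):
        data = Path(path).read_bytes()
        # Length-prefix each file so the boundary between files is unambiguous.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

For the `shell_stage` example, the source file could be located with `inspect.getsourcefile(shell_stage)`. The resulting hash changes whenever any byte of that module or of either listed file changes, including a comment on an unrelated function, which is the imprecision the warning above describes.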
Performance
Fingerprinting runs once during pipeline discovery (single-threaded, in the coordinator process). Results are cached at two levels:
- In-memory: a `WeakKeyDictionary` cache avoids re-parsing the same function within a single process.
- Persistent: AST hashes and full manifests are cached in StateDB, keyed by `(file_path, mtime, size, inode)`. If the source file hasn't changed, the cached hash is reused without parsing.
For a project with 125 stages, fingerprinting typically completes in under 100ms on subsequent runs.
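The two cache levels can be sketched as a `weakref.WeakKeyDictionary` in front of a persistent store keyed by file metadata from `os.stat()`. Assumptions: a plain dict stands in for StateDB, `cached_fingerprint` and `stat_key` are hypothetical names, and the function's qualified name is added to the persistent key so that two functions in one file do not share an entry.

```python
import os
import weakref

# In-memory layer: entries disappear when the function object is garbage-collected.
memory_cache = weakref.WeakKeyDictionary()
# Persistent layer: a plain dict standing in for StateDB.
persistent_cache = {}

def stat_key(path: str) -> tuple:
    """(file_path, mtime, size, inode) for the given source file."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_mtime_ns, st.st_size, st.st_ino)

def cached_fingerprint(fn, path: str, compute) -> str:
    """Return fn's fingerprint, calling compute(path) only on a double cache miss."""
    if fn in memory_cache:
        return memory_cache[fn]
    key = (fn.__qualname__, *stat_key(path))
    value = persistent_cache.get(key)
    if value is None:
        # The file's metadata is new to the cache: parse and hash it.
        value = compute(path)
        persistent_cache[key] = value
    memory_cache[fn] = value
    return value
```

Checking `os.stat()` is far cheaper than reading and parsing the file, which is why a warm persistent cache keeps repeat runs fast.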
Python Version Dependency
AST fingerprinting hashes the Abstract Syntax Tree, not bytecode, but AST structure can still vary between Python versions (e.g., 3.12 and 3.13 may parse some constructs differently). The same source code can therefore produce different fingerprints under different Python versions, causing unnecessary stage re-runs when you switch.
To avoid surprises:
- Pin your Python version in your project (e.g., `.python-version`)
- Use uv to manage your Python environment consistently
- All team members and CI should use the same Python version
Relationship to Other Concepts
The fingerprint manifest is one of three inputs to the caching skip-detection algorithm, alongside parameters and dependency hashes. A stage skips only when all three match.
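The skip rule reduces to a three-way equality check. This is a minimal sketch with a hypothetical `StageInputs` container and `can_skip` helper; the field names and placeholder values are illustrative, not Pivot's lock-file schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StageInputs:
    manifest: dict    # fingerprint manifest: key -> hash
    params: dict      # stage parameters
    dep_hashes: dict  # dependency path -> content hash

def can_skip(locked: StageInputs, current: StageInputs) -> bool:
    # Skip only when code, parameters, and dependencies all match exactly.
    return (locked.manifest == current.manifest
            and locked.params == current.params
            and locked.dep_hashes == current.dep_hashes)

locked = StageInputs({"self:train": "a1"}, {"lr": 0.1}, {"data.csv": "d4"})
same = StageInputs({"self:train": "a1"}, {"lr": 0.1}, {"data.csv": "d4"})
new_lr = StageInputs({"self:train": "a1"}, {"lr": 0.2}, {"data.csv": "d4"})
print(can_skip(locked, same), can_skip(locked, new_lr))  # True False
```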