
Migrating from DVC

This guide helps you migrate an existing DVC pipeline to Pivot.

Key Differences

| Feature | DVC | Pivot |
|---|---|---|
| Pipeline definition | dvc.yaml | pivot.yaml |
| Parameters | params.yaml | Python StageParams classes |
| Code tracking | Manual deps: entries for .py files | Automatic code fingerprinting |
| Stage execution | Shell commands | Python functions |
| Lock file | Single dvc.lock | Per-stage .pivot/stages/*.lock |
| Cache | .dvc/cache/ | .pivot/cache/ |

Concept Mapping

Stages

DVC:

# dvc.yaml
stages:
  preprocess:
    cmd: python scripts/preprocess.py
    deps:
      - data/raw.csv
      - scripts/preprocess.py
    outs:
      - data/processed.csv

Pivot:

# pivot.yaml
stages:
  preprocess:
    python: scripts.preprocess.run
    deps:
      raw: data/raw.csv
    outs:
      processed: data/processed.csv

# scripts/preprocess.py
from typing import Annotated, TypedDict

import pandas
from pivot import loaders, outputs


class PreprocessOutputs(TypedDict):
    processed: Annotated[pandas.DataFrame, outputs.Out("data/processed.csv", loaders.CSV())]


def run(
    raw: Annotated[pandas.DataFrame, outputs.Dep("data/raw.csv", loaders.CSV())],
) -> PreprocessOutputs:
    df = raw.dropna()
    return {"processed": df}

Note: Pivot automatically tracks code changes. You don't need to list .py files in deps.
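This page does not say how Pivot computes its code fingerprints. One common technique is to hash a function's compiled bytecode together with the constants and names it references, so that editing a comment leaves the hash unchanged but editing the logic changes it. The sketch below shows that technique with only the standard library. It is not Pivot's implementation: it does not follow helper functions the stage calls, and bytecode (and therefore the hash) differs between Python versions.

```python
import hashlib
import types
from typing import Any, Callable


def _feed_code(code: types.CodeType, h: Any) -> None:
    # Bytecode captures the function's logic; comments never reach it
    h.update(code.co_code)
    # Global and attribute names the code references (e.g. "dropna")
    h.update(repr(code.co_names).encode())
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # Nested functions, lambdas and comprehensions: recurse, because the
            # repr() of a code object embeds a memory address
            _feed_code(const, h)
        else:
            h.update(repr(const).encode())


def fingerprint(func: Callable[..., Any]) -> str:
    """Hex digest that changes when the function's compiled logic changes."""
    h = hashlib.sha256()
    _feed_code(func.__code__, h)
    return h.hexdigest()
```

A real tool would also fingerprint the functions and modules a stage calls into; this sketch stops at the stage function itself.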

Parameters

DVC:

# params.yaml
train:
  learning_rate: 0.01
  epochs: 100

# dvc.yaml
stages:
  train:
    cmd: python train.py
    params:
      - train.learning_rate
      - train.epochs

Pivot:

# train.py
from pivot.stage_def import StageParams


class TrainParams(StageParams):
    learning_rate: float = 0.01
    epochs: int = 100


def train(params: TrainParams):
    # Dep-annotated inputs would be declared as additional parameters
    print(f"LR: {params.learning_rate}")
    ...

# pivot.yaml
stages:
  train:
    python: train.train
    params:
      learning_rate: 0.05  # Override defaults

Benefits:

  • Type checking and IDE support
  • Validation at parse time
  • Parameter changes detected via fingerprinting
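To make "validation at parse time" and "changes detected via fingerprinting" concrete, here is a minimal sketch of both ideas. StageParams is presumably a Pydantic model (the export limitations below mention Pydantic parameter types), but to stay runnable without Pydantic the sketch uses a frozen dataclass. apply_overrides and params_fingerprint are hypothetical helpers, not part of Pivot.

```python
import dataclasses
import hashlib
import json
from typing import Any, Type, TypeVar

P = TypeVar("P")


@dataclasses.dataclass(frozen=True)
class TrainParams:
    # Stand-in for the StageParams subclass above; class defaults are the baseline
    learning_rate: float = 0.01
    epochs: int = 100


def apply_overrides(cls: Type[P], overrides: dict[str, Any]) -> P:
    """Validate pivot.yaml-style overrides against the declared field types."""
    declared = {f.name: f.type for f in dataclasses.fields(cls)}
    checked: dict[str, Any] = {}
    for key, value in overrides.items():
        if key not in declared:
            raise ValueError(f"unknown parameter {key!r} for {cls.__name__}")
        expected = declared[key]
        # YAML loads 1 as an int; accept it where a float is declared
        if expected is float and type(value) is int:
            value = float(value)
        # bool is a subclass of int, so reject it explicitly for non-bool fields
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{key}: expected {expected.__name__}, got bool")
        if not isinstance(value, expected):
            raise TypeError(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
        checked[key] = value
    return cls(**checked)


def params_fingerprint(params: Any) -> str:
    # Canonical JSON (sorted keys) so the digest depends only on the values
    payload = json.dumps(dataclasses.asdict(params), sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

With the pivot.yaml override above, apply_overrides(TrainParams, {"learning_rate": 0.05}) yields learning_rate=0.05 and epochs=100. Its fingerprint differs from the defaults' fingerprint, which is what would mark the stage as stale. A typo such as epochs: "ten" fails before anything runs.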

Metrics and Plots

DVC:

# dvc.yaml
stages:
  train:
    cmd: python train.py
    metrics:
      - metrics.json:
          cache: false
    plots:
      - plots/loss.png

Pivot:

import pathlib
from typing import Annotated, TypedDict
from pivot import loaders, outputs

class TrainOutputs(TypedDict):
    model: Annotated[pathlib.Path, outputs.Out("model.pkl", loaders.PathOnly())]
    metrics: Annotated[dict, outputs.Metric("metrics.json")]
    plot: Annotated[pathlib.Path, outputs.Plot("plots/loss.png", loaders.PathOnly())]

# pivot.yaml
stages:
  train:
    python: train.train
    outs:
      model: model.pkl
    metrics:
      metrics: metrics.json
    plots:
      plot: plots/loss.png
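The metrics output is annotated as a plain dict written to metrics.json. Assuming Pivot serializes that dict to JSON (this page does not show the exact formatting it uses), the sketch below illustrates why deterministic serialization matters. With sorted keys and fixed indentation, the same metric values always produce the same bytes and therefore the same file hash. write_metric and read_metric are hypothetical helpers.

```python
import json
import pathlib
from typing import Any


def write_metric(values: dict[str, Any], path: pathlib.Path) -> None:
    # Sorted keys and fixed indentation make the bytes depend only on the values,
    # not on dict insertion order, so unchanged metrics give an unchanged file
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(values, sort_keys=True, indent=2) + "\n")


def read_metric(path: pathlib.Path) -> dict[str, Any]:
    return json.loads(path.read_text())
```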

Remote Storage

DVC:

dvc remote add -d myremote s3://mybucket/cache
dvc push
dvc pull

Pivot:

pivot config set remotes.origin s3://mybucket/cache
pivot config set default_remote origin
pivot push
pivot pull
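Read literally, the first command stores a URL under a named remote (remotes.origin), and the second points default_remote at that name. The sketch below models only that key structure, using a hypothetical set_dotted helper. It does not describe Pivot's actual config file format or location, which this page doesn't cover.

```python
from typing import Any


def set_dotted(config: dict[str, Any], key: str, value: Any) -> None:
    # "remotes.origin" -> config["remotes"]["origin"], creating nested tables as needed
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def default_remote_url(config: dict[str, Any]) -> str:
    # default_remote holds a remote *name*; the URL lives under remotes.<name>
    return config["remotes"][config["default_remote"]]
```

Applying the two commands from the example in order leaves {"remotes": {"origin": "s3://mybucket/cache"}, "default_remote": "origin"}. Adding a second remote later does not disturb the first.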

Migration Steps

Step 1: Create pivot.yaml

Convert your dvc.yaml to pivot.yaml:

# pivot.yaml
stages:
  preprocess:
    python: scripts.preprocess.run
    deps:
      raw: data/raw.csv
    outs:
      processed: data/processed.csv

  train:
    python: scripts.train.run
    deps:
      data: data/processed.csv
    outs:
      model: models/model.pkl
    metrics:
      metrics: metrics.json
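The pivot.yaml above never states that train runs after preprocess. The order follows from train's dep data/processed.csv being an output of preprocess, the same path matching DVC uses to build its DAG. The sketch below derives that order, assuming stages are linked by exact path strings. It loads the YAML as a plain Python dict to avoid a YAML dependency, and uses graphlib from the standard library (Python 3.9+). execution_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Any

# The pivot.yaml above, as a YAML parser would load it
PIPELINE: dict[str, Any] = {
    "stages": {
        "preprocess": {
            "python": "scripts.preprocess.run",
            "deps": {"raw": "data/raw.csv"},
            "outs": {"processed": "data/processed.csv"},
        },
        "train": {
            "python": "scripts.train.run",
            "deps": {"data": "data/processed.csv"},
            "outs": {"model": "models/model.pkl"},
            "metrics": {"metrics": "metrics.json"},
        },
    }
}


def execution_order(stages: dict[str, Any]) -> list[str]:
    # Map every produced path (outs, metrics, plots) to the stage that writes it
    producers: dict[str, str] = {}
    for name, stage in stages.items():
        for section in ("outs", "metrics", "plots"):
            for path in stage.get(section, {}).values():
                if path in producers:
                    raise ValueError(f"{path} is produced by both {producers[path]} and {name}")
                producers[path] = name
    # A stage depends on the producer of each dep path; paths no stage produces
    # (such as data/raw.csv) are plain input files and add no edge
    graph = {
        name: {producers[p] for p in stage.get("deps", {}).values() if p in producers}
        for name, stage in stages.items()
    }
    # static_order() yields every stage after its predecessors and raises
    # graphlib.CycleError if stages depend on each other in a loop
    return list(TopologicalSorter(graph).static_order())
```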

Step 2: Convert Scripts to Functions

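Rewrite each script that DVC runs as a shell command into a Python function. Inputs arrive as annotated parameters and outputs are returned, instead of being read and written inside the script: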

Before (scripts/preprocess.py):

import pandas as pd

df = pd.read_csv("data/raw.csv")
df = df.dropna()
df.to_csv("data/processed.csv", index=False)

After (scripts/preprocess.py):

from typing import Annotated, TypedDict

import pandas
from pivot import loaders, outputs


class PreprocessOutputs(TypedDict):
    processed: Annotated[pandas.DataFrame, outputs.Out("data/processed.csv", loaders.CSV())]


def run(
    raw: Annotated[pandas.DataFrame, outputs.Dep("data/raw.csv", loaders.CSV())],
) -> PreprocessOutputs:
    df = raw.dropna()
    return {"processed": df}
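The converted function no longer opens any files; the Annotated metadata says what to load and where to save. This page does not show how Pivot consumes that metadata. Below is a minimal sketch of one way a runner could do it with typing.get_type_hints(include_extras=True). Dep, Out, CsvLoader and run_stage are hypothetical stand-ins, not Pivot's classes, and plain csv replaces pandas so the example runs with the standard library alone.

```python
import csv
import dataclasses
import pathlib
import typing
from typing import Annotated, Any, Callable, Optional, TypedDict

Rows = list[dict[str, str]]


@dataclasses.dataclass(frozen=True)
class CsvLoader:
    """Hypothetical loader: CSV rows as dicts of strings."""

    def load(self, path: pathlib.Path) -> Rows:
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def save(self, rows: Rows, path: pathlib.Path) -> None:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]) if rows else [])
            writer.writeheader()
            writer.writerows(rows)


@dataclasses.dataclass(frozen=True)
class Dep:
    """Hypothetical stand-in for outputs.Dep: where an input lives, how to load it."""
    path: str
    loader: CsvLoader


@dataclasses.dataclass(frozen=True)
class Out:
    """Hypothetical stand-in for outputs.Out: where an output goes, how to save it."""
    path: str
    loader: CsvLoader


def _find(annotation: Any, kind: type) -> Optional[Any]:
    # Annotated[T, meta, ...] -> (T, meta, ...); return the first metadata of the wanted kind
    for meta in typing.get_args(annotation)[1:]:
        if isinstance(meta, kind):
            return meta
    return None


def run_stage(func: Callable[..., Any], root: pathlib.Path) -> None:
    hints = typing.get_type_hints(func, include_extras=True)
    # 1. Load every Dep-annotated parameter from disk
    kwargs = {}
    for name, annotation in hints.items():
        dep = _find(annotation, Dep)
        if name != "return" and dep is not None:
            kwargs[name] = dep.loader.load(root / dep.path)
    # 2. Call the stage as an ordinary function
    result = func(**kwargs)
    # 3. Save each key of the returned TypedDict according to its Out metadata
    for key, annotation in typing.get_type_hints(hints["return"], include_extras=True).items():
        out = _find(annotation, Out)
        if out is not None:
            out.loader.save(result[key], root / out.path)


class PreprocessOutputs(TypedDict):
    processed: Annotated[Rows, Out("data/processed.csv", CsvLoader())]


def run(raw: Annotated[Rows, Dep("data/raw.csv", CsvLoader())]) -> PreprocessOutputs:
    # Keep rows with no empty fields, loosely mirroring DataFrame.dropna()
    return {"processed": [row for row in raw if all(row.values())]}
```

A side benefit of this design is that run can be unit-tested by passing rows directly, with no files involved.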

Step 3: Convert Parameters to StageParams

Before (params.yaml + train.py):

import yaml

with open("params.yaml") as f:
    params = yaml.safe_load(f)["train"]

lr = params["learning_rate"]

After (train.py):

from pivot.stage_def import StageParams


class TrainParams(StageParams):
    learning_rate: float = 0.01
    epochs: int = 100


def run(params: TrainParams):
    # Dep-annotated inputs would be declared as additional parameters
    lr = params.learning_rate
    ...

Step 4: Configure Remote

# Copy remote URL from DVC
dvc remote list
# myremote  s3://mybucket/cache

# Configure in Pivot
pivot config set remotes.origin s3://mybucket/cache
pivot config set default_remote origin
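With several DVC remotes, you can generate the Pivot commands from the output of dvc remote list instead of copying each URL by hand. The sketch below assumes one "name url" pair per line, as in the example above, and URLs without spaces. It also assumes you pass in the default remote's name yourself (dvc remote default with no argument prints it). pivot_remote_commands and its rename mapping are hypothetical.

```python
from typing import Optional


def pivot_remote_commands(
    remote_list: str, default: str, rename: Optional[dict[str, str]] = None
) -> list[str]:
    """Turn `dvc remote list` output into `pivot config set` commands."""
    rename = rename or {}
    commands = []
    for line in remote_list.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        dvc_name, url = fields[0], fields[1]
        # Optionally rename remotes, e.g. myremote -> origin as in this guide
        name = rename.get(dvc_name, dvc_name)
        commands.append(f"pivot config set remotes.{name} {url}")
    commands.append(f"pivot config set default_remote {rename.get(default, default)}")
    return commands
```

For the example above, pivot_remote_commands("myremote  s3://mybucket/cache", default="myremote", rename={"myremote": "origin"}) returns the same two commands shown in this step.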

Step 5: Run and Verify

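# Keep a copy of the DVC-produced output before Pivot overwrites it
cp data/processed.csv data/processed.csv.dvc_backup

# Run pipeline
pivot repro

# Compare Pivot's output with the DVC copy
diff data/processed.csv data/processed.csv.dvc_backup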
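diff is fine for text outputs like CSVs, but binary outputs such as models/model.pkl only need a yes/no answer. The sketch below checks several outputs at once. It assumes each output has a copy with the .dvc_backup suffix used in the diff command, and mismatched_outputs is a hypothetical helper. Byte-for-byte equality is strict: a pickle or a CSV with different float formatting can differ in bytes while holding equivalent data.

```python
import filecmp
import pathlib


def mismatched_outputs(
    root: pathlib.Path, outputs: list[str], suffix: str = ".dvc_backup"
) -> list[str]:
    """Return the outputs whose Pivot version differs from, or lacks, a DVC backup."""
    mismatched = []
    for rel in outputs:
        current = root / rel
        backup = root / (rel + suffix)
        if not current.exists() or not backup.exists():
            mismatched.append(rel)
        # shallow=False compares file contents instead of trusting os.stat() signatures
        elif not filecmp.cmp(current, backup, shallow=False):
            mismatched.append(rel)
    return mismatched
```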

Step 6: Export for Validation (Optional)

Pivot can export back to DVC format for validation:

pivot export

# DVC should report that no stages need to run
dvc repro --dry

Running Side-by-Side

During migration, you can run both tools:

# Run with Pivot
pivot repro

# Validate outputs match DVC
pivot export
dvc repro --dry  # Should show nothing to run

Export Command

Export Pivot pipeline to DVC format:

# Generate dvc.yaml
pivot export

# Custom output path
pivot export --output my-pipeline.yaml

# Export specific stages
pivot export preprocess train

Limitations of Export

The export captures:

  • Stage commands (as Python function calls)
  • Dependencies
  • Outputs (with cache/persist settings)
  • Metrics and plots

Not exported:

  • Automatic code fingerprinting (DVC doesn't support this)
  • Mutex groups
  • Pydantic parameter types (exported as plain values)
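To see what the mapping involves, and how much is lost, the sketch below hand-converts a pivot.yaml stage into a dvc.yaml stage. It is not what pivot export produces. The command string is passed in unchanged because this page does not show the exact command pivot export writes. As a partial DVC-side substitute for code fingerprinting, the sketch lists the stage's own module file as a dep. That catches edits to that file but not to helpers it imports. to_dvc_stage and module_file are hypothetical.

```python
from typing import Any


def module_file(python_ref: str) -> str:
    # "scripts.preprocess.run" -> "scripts/preprocess.py"
    module, _, _func = python_ref.rpartition(".")
    return module.replace(".", "/") + ".py"


def to_dvc_stage(stage: dict[str, Any], cmd: str) -> dict[str, Any]:
    # Pivot names each dep/out; dvc.yaml just lists paths
    deps = list(stage.get("deps", {}).values())
    # Manual code tracking, DVC style: depend on the stage's own .py file
    deps.append(module_file(stage["python"]))
    dvc: dict[str, Any] = {"cmd": cmd, "deps": deps}
    if stage.get("outs"):
        dvc["outs"] = list(stage["outs"].values())
    if stage.get("metrics"):
        # Same form as the DVC metrics example earlier on this page
        dvc["metrics"] = [{p: {"cache": False}} for p in stage["metrics"].values()]
    if stage.get("plots"):
        dvc["plots"] = list(stage["plots"].values())
    return dvc
```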

FAQs

Do I need to migrate all at once?

No. You can migrate stage by stage. As long as output paths match, downstream DVC stages can consume Pivot outputs.

What about my existing cache?

Pivot uses a different cache format. You'll need to re-run stages to populate the Pivot cache. Your DVC cache remains intact.
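The reason one tool's cache can't serve the other is that both stores are content-addressed: an entry's location is derived from a hash of the file's contents. If the hash algorithm or directory layout differs, lookups by one tool never find the other's entries. DVC keys its cache by MD5; this page does not say what Pivot uses. The sketch below is a generic content-addressed store, not either tool's exact format, and store and file_digest are hypothetical helpers.

```python
import hashlib
import pathlib
import shutil


def file_digest(path: pathlib.Path, algorithm: str) -> str:
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large data files are never fully loaded into memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def store(cache_dir: pathlib.Path, path: pathlib.Path, algorithm: str) -> pathlib.Path:
    digest = file_digest(path, algorithm)
    # Two-character fan-out directory, then the rest of the digest
    dest = cache_dir / digest[:2] / digest[2:]
    if not dest.exists():  # identical content is stored only once
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
    return dest
```

Storing the same data/processed.csv with "md5" and with "sha256" yields two entries with identical bytes at unrelated paths. That is why re-running stages, rather than converting the DVC cache, is the simplest way to populate the new cache.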

Can I use params.yaml with Pivot?

Pivot supports params.yaml for overrides, but the primary source should be Python StageParams classes. This gives you type checking and IDE support.

What about dvc plots and metrics?

Pivot has equivalent Metric and Plot output types. The workflow is similar:

# DVC
dvc metrics show
dvc plots show

# Pivot
pivot metrics show
pivot plots show
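Metrics files are often nested JSON. Commands like dvc metrics show and dvc metrics diff report them as dotted keys such as train.loss. The output format of pivot metrics show is not documented on this page. The sketch below flattens nested metrics and lists changed values, which is a quick way to compare a DVC-era metrics.json with the one Pivot produces. flatten and diff_metrics are hypothetical helpers.

```python
from typing import Any


def flatten(metrics: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    # {"train": {"loss": 0.1}} -> {"train.loss": 0.1}
    flat: dict[str, Any] = {}
    for key, value in metrics.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def diff_metrics(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    # Only keys whose values changed, appeared, or disappeared (missing side is None)
    a, b = flatten(old), flatten(new)
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}
```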