Build layer · Layer 4 of 5 · ~10 min read
L4
Orchestration

Orchestration — what runs when.

This layer has one job: run the right thing at the right time, and tell you when it didn't. Everything else is overkill until you've outgrown it.

Our take

Most companies need cron with good logging. Not a DAG tool.

First, the words Orchestration / DAG

Orchestration is the conductor that tells every job when to start — and what to wait for.

A pipeline is a recipe. Some steps can happen in parallel (chop the onions while the water boils). Some have to wait (you can't plate until the rice is done). Orchestration is the kitchen manager who tracks what's finished, what's next, and what to do when the oven breaks.

A DAG is the recipe written as a dependency map. Ingest orders → build stg_orders → build fct_revenue. Arrows never point backwards. A DAG tool (Airflow, Dagster, Prefect) runs the map, reruns failed steps, and shows you a pretty graph.
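The dependency map above fits in a dict, and the standard library can already walk it in order. A minimal sketch using Python's stdlib `graphlib` (the step names are the ones from the text):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# The pipeline as a dependency map: each key lists the steps it waits for.
deps = {
    "stg_orders": {"ingest_orders"},
    "fct_revenue": {"stg_orders"},
}

# static_order() yields steps so that every arrow points forward;
# a cycle (an arrow pointing backwards) raises CycleError instead.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['ingest_orders', 'stg_orders', 'fct_revenue']
```

This is the entire trick a DAG tool performs; everything else it adds is retries, backfills, and the pretty graph.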

Why start with cron
If your "graph" is basically one long line — ingest, then transform, then export — you don't need a DAG tool. You need a scheduler, a log file, and a Slack alert. Cron does that in 10 lines. Graduate to a DAG tool only when you have real fan-out (many consumers, many schedules, backfills, mixed languages).
01 Stage one · year 1

Your ingest tool's scheduler + your transform tool's CLI in CI.

Fivetran schedule · dbt Cloud / run in CI · 1 webhook

Ingestion runs hourly. Transform runs on successful ingestion. One webhook between them. That's the whole orchestrator.
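The "one webhook" can be as small as this: a hypothetical glue handler that receives a Fivetran sync-complete event and triggers a dbt Cloud job run. The account and job IDs, the token, and the Fivetran payload field (`sync_end`) are assumptions; check your account's dbt Cloud API version before relying on the endpoint shape.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical IDs and token — replace with your own.
DBT_CLOUD_URL = "https://cloud.getdbt.com/api/v2/accounts/1234/jobs/5678/run/"
DBT_TOKEN = "dbt-cloud-api-token"


def build_trigger_request(cause="Fivetran sync completed"):
    """Build the POST that asks dbt Cloud to run the transform job."""
    body = json.dumps({"cause": cause}).encode()
    return urllib.request.Request(
        DBT_CLOUD_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {DBT_TOKEN}",
            "Content-Type": "application/json",
        },
    )


class Hook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Assumed Fivetran payload shape: only react to completed syncs.
        if event.get("event") == "sync_end":
            urllib.request.urlopen(build_trigger_request(), timeout=10)
        self.send_response(200)
        self.end_headers()


def serve(port=8080):
    HTTPServer(("", port), Hook).serve_forever()
```

In practice most teams skip even this and use Fivetran's built-in dbt Cloud integration; the sketch just shows how little the orchestrator at this stage actually is.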

Fits
< 50 models · one schedule · one consumer cadence
Ops
~0 hrs/week
Cost
already paid for
You graduate when
Multiple consumers need different SLAs from the same model.
Backfills become a first-class operation, not an afternoon.
Custom Python steps mixed in with SQL.
02 Stage two · real orchestrator

Asset-based scheduling. Think in tables that should be fresh, not tasks that should run.

Dagster · Prefect · Airflow (last resort)

Declare what each asset depends on and how fresh it needs to be. The orchestrator figures out the schedule, the backfill, the partial-rerun. You stop writing DAGs by hand.

Fits
50–500+ assets · multi-team · real SLAs
Ops
1–2 hrs/week, reclaimed in reliability
Cost
hosted: $ · self-run: eng time
Anti-pattern
Orchestrator as the first thing you set up. Three weeks on a DAG tool, nothing useful shipped. The orchestrator is a consequence of complexity, not a prerequisite for it.
Artifact orchestration/assets/shopify_orders.py ~18 lines · Dagster
# orchestration/assets/shopify_orders.py
from dagster import asset, AutoMaterializePolicy, FreshnessPolicy

# write_parquet and today are this repo's local ingest helpers

@asset(
    key_prefix=["raw_shopify"],
    auto_materialize_policy=AutoMaterializePolicy.eager(),
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=20),
    metadata={"owner": "ops-data-team", "pii": False},
)
def orders(context, shopify_api):
    """Pull Shopify orders updated in the last 24h. Lands in raw/."""
    rows = list(shopify_api.fetch_paginated("orders", since="24h"))
    context.log.info(f"Fetched {len(rows)} order rows")
    write_parquet(f"raw/shopify/orders/dt={today()}/", rows)
    return rows

Load-bearing: the FreshnessPolicy is the contract. If raw orders go more than 20 minutes stale, Dagster alerts before the dashboard goes wrong. The owner and PII flag travel with the asset, so observability and access control inherit them automatically.
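Stripped of the framework, that freshness contract is just a timestamp comparison. A minimal stdlib sketch (hypothetical, not Dagster's implementation) of the same 20-minute rule:

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Mirrors FreshnessPolicy(maximum_lag_minutes=20) from the asset above.
MAX_LAG = timedelta(minutes=20)


def is_stale(path, now=None, max_lag=MAX_LAG):
    """True if the asset file hasn't been rewritten within max_lag."""
    now = now or datetime.now(timezone.utc)
    mtime = datetime.fromtimestamp(Path(path).stat().st_mtime, timezone.utc)
    return now - mtime > max_lag
```

A DAG tool earns its keep not by making this check harder, but by wiring it to materializations, alerts, and backfills without you writing the plumbing.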