This layer has one job: run the right thing at the right time, and tell you when it didn't. Everything else is overkill until you've outgrown it.
Most companies need cron with good logging. Not a DAG tool.
A pipeline is a recipe. Some steps can happen in parallel (chop the onions while the water boils). Some have to wait (you can't plate until the rice is done). Orchestration is the kitchen manager who tracks what's finished, what's next, and what to do when the oven breaks.
A DAG is the recipe written as a dependency map. Ingest orders → build stg_orders → build fct_revenue. Arrows never point backwards. A DAG tool (Airflow, Dagster, Prefect) runs the map, reruns failed steps, and shows you a pretty graph.
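The dependency map above can be sketched in plain Python without any orchestrator. This is a minimal illustration, not any tool's API: the dict encodes "arrows never point backwards," and the stdlib's `graphlib` produces a valid run order (and would raise `CycleError` if an arrow did point backwards).

```python
# The recipe as a dependency map: each asset -> the set of assets it needs.
from graphlib import TopologicalSorter

dag = {
    "ingest_orders": set(),                 # no upstream dependencies
    "stg_orders": {"ingest_orders"},        # waits for ingestion
    "fct_revenue": {"stg_orders"},          # waits for staging
}

# A DAG tool's scheduler is, at its core, this plus retries and a UI.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['ingest_orders', 'stg_orders', 'fct_revenue']
```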
Ingestion runs hourly. Transform runs on successful ingestion. One webhook between them. That's the whole orchestrator.
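A sketch of that whole orchestrator, under stated assumptions: the crontab line, the webhook URL, and the `run_ingestion` stand-in are all hypothetical. The only real logic is the ordering guarantee, where the transform webhook fires only after ingestion succeeds.

```python
# Hypothetical crontab entry (hourly, with logging -- the "good logging" part):
#   0 * * * *  /usr/bin/python3 /opt/pipelines/ingest.py >> /var/log/ingest.log 2>&1
import logging
import urllib.request

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ingest")

TRANSFORM_WEBHOOK = "https://ci.internal/hooks/run-transform"  # hypothetical URL

def run_ingestion() -> int:
    """Stand-in for the real ingestion job; returns row count."""
    return 1200

def trigger_transform() -> None:
    """The one webhook between the two jobs."""
    req = urllib.request.Request(TRANSFORM_WEBHOOK, method="POST")
    urllib.request.urlopen(req, timeout=10)

def main(trigger=trigger_transform) -> None:
    try:
        rows = run_ingestion()
        log.info("ingestion ok, %d rows", rows)
    except Exception:
        log.exception("ingestion failed; transform NOT triggered")
        raise
    trigger()  # transform runs only on successful ingestion
    log.info("transform webhook fired")

if __name__ == "__main__":
    main()
```

The `trigger` parameter exists only to make the ordering testable; in production the default webhook call is the whole handoff.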
Declare what each asset depends on and how fresh it needs to be. The orchestrator figures out the schedule, the backfill, the partial-rerun. You stop writing DAGs by hand.
```python
# orchestration/assets/shopify_orders.py
from dagster import asset, AutoMaterializePolicy, FreshnessPolicy

# Project helpers (paths illustrative): paginated API client, parquet writer, date stamp.
from . import extract
from .io_helpers import write_parquet, today


@asset(
    key_prefix=["raw_shopify"],
    auto_materialize_policy=AutoMaterializePolicy.eager(),
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=20),
    metadata={"owner": "ops-data-team", "pii": False},
)
def orders(context):
    """Pull Shopify orders updated in the last 24h. Lands in raw/."""
    rows = list(extract.fetch_paginated("orders", since="24h"))
    context.log.info(f"Fetched {len(rows)} order rows")
    write_parquet(f"raw/shopify/orders/dt={today()}/", rows)
    return rows
```
Load-bearing: the FreshnessPolicy is the contract. If raw orders go more than 20 minutes stale, Dagster raises the alarm before the dashboard goes wrong. The owner and PII flag travel with the asset, so observability and access control inherit them automatically.
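What "fires when stale" means can be shown in plain Python. This is a sketch of the check, not Dagster's internals; Dagster evaluates FreshnessPolicy for you against each asset's last materialization time.

```python
# The 20-minute contract, reduced to a timestamp comparison.
from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(minutes=20)  # mirrors maximum_lag_minutes=20

def is_stale(last_materialized: datetime, now: datetime) -> bool:
    """True when the asset has exceeded its freshness contract."""
    return now - last_materialized > MAX_LAG

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(minutes=25), now))  # True  -> alert fires
print(is_stale(now - timedelta(minutes=5), now))   # False -> within contract
```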