This layer has one job: run the right thing at the right time, and tell you when it didn't. Everything else is overkill until you've outgrown it.
Most companies need cron with good logging. Not a DAG tool.
A pipeline is a recipe. Some steps can happen in parallel (chop the onions while the water boils). Some have to wait (you can't plate until the rice is done). Orchestration is the kitchen manager who tracks what's finished, what's next, and what to do when the oven breaks.
A DAG is the recipe written as a dependency map. Ingest orders → build stg_orders → build fct_revenue. Arrows never point backwards. A DAG tool (Airflow, Dagster, Prefect) runs the map, reruns failed steps, and shows you a pretty graph.
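The dependency map above can be sketched in plain Python without any orchestrator. This is a minimal illustration, not any tool's API: the dict encodes "arrows never point backwards," and the stdlib's `graphlib` produces a valid run order (and would raise `CycleError` if an arrow did point backwards).

```python
# The recipe as a dependency map: each asset -> the set of assets it needs.
from graphlib import TopologicalSorter

dag = {
    "ingest_orders": set(),                 # no upstream dependencies
    "stg_orders": {"ingest_orders"},        # waits for ingestion
    "fct_revenue": {"stg_orders"},          # waits for staging
}

# A DAG tool's scheduler is, at its core, this plus retries and a UI.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['ingest_orders', 'stg_orders', 'fct_revenue']
```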
Ingestion runs hourly. Transform runs on successful ingestion. One webhook between them. That's the whole orchestrator.
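A sketch of that whole orchestrator, under stated assumptions: the crontab line, the webhook URL, and the `run_ingestion` stand-in are all hypothetical. The only real logic is the ordering guarantee, where the transform webhook fires only after ingestion succeeds.

```python
# Hypothetical crontab entry (hourly, with logging -- the "good logging" part):
#   0 * * * *  /usr/bin/python3 /opt/pipelines/ingest.py >> /var/log/ingest.log 2>&1
import logging
import urllib.request

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ingest")

TRANSFORM_WEBHOOK = "https://ci.internal/hooks/run-transform"  # hypothetical URL

def run_ingestion() -> int:
    """Stand-in for the real ingestion job; returns row count."""
    return 1200

def trigger_transform() -> None:
    """The one webhook between the two jobs."""
    req = urllib.request.Request(TRANSFORM_WEBHOOK, method="POST")
    urllib.request.urlopen(req, timeout=10)

def main(trigger=trigger_transform) -> None:
    try:
        rows = run_ingestion()
        log.info("ingestion ok, %d rows", rows)
    except Exception:
        log.exception("ingestion failed; transform NOT triggered")
        raise
    trigger()  # transform runs only on successful ingestion
    log.info("transform webhook fired")

if __name__ == "__main__":
    main()
```

The `trigger` parameter exists only to make the ordering testable; in production the default webhook call is the whole handoff.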
Declare what each asset depends on and how fresh it needs to be. The orchestrator figures out the schedule, the backfill, the partial-rerun. You stop writing DAGs by hand.
```python
# orchestration/assets/shopify_orders.py
from dagster import asset, AutoMaterializePolicy, FreshnessPolicy

# Project helpers (paths illustrative): paginated API client, parquet writer, date stamp.
from . import extract
from .io_helpers import write_parquet, today


@asset(
    key_prefix=["raw_shopify"],
    auto_materialize_policy=AutoMaterializePolicy.eager(),
    freshness_policy=FreshnessPolicy(maximum_lag_minutes=20),
    metadata={"owner": "ops-data-team", "pii": False},
)
def orders(context):
    """Pull Shopify orders updated in the last 24h. Lands in raw/."""
    rows = list(extract.fetch_paginated("orders", since="24h"))
    context.log.info(f"Fetched {len(rows)} order rows")
    write_parquet(f"raw/shopify/orders/dt={today()}/", rows)
    return rows
```
Load-bearing: the FreshnessPolicy is the contract. If raw orders go more than 20 minutes stale, Dagster raises the alarm before the dashboard goes wrong. The owner and PII flag travel with the asset, so observability and access control inherit them automatically.
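What "fires when stale" means can be shown in plain Python. This is a sketch of the check, not Dagster's internals; Dagster evaluates FreshnessPolicy for you against each asset's last materialization time.

```python
# The 20-minute contract, reduced to a timestamp comparison.
from datetime import datetime, timedelta, timezone

MAX_LAG = timedelta(minutes=20)  # mirrors maximum_lag_minutes=20

def is_stale(last_materialized: datetime, now: datetime) -> bool:
    """True when the asset has exceeded its freshness contract."""
    return now - last_materialized > MAX_LAG

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(minutes=25), now))  # True  -> alert fires
print(is_stale(now - timedelta(minutes=5), now))   # False -> within contract
```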