Back to main page
For builders · The playbook

The shape of a pipeline that an AI can actually use.

This page is for the technical people in the room. It's the architecture we keep arriving at — after shipping this for founders, ops teams, and internal tools. Not a framework. Not a vendor pitch. Just the opinions we've formed from doing the work.

If you're going to build this yourself: here's the map. If you'd rather we do it: that's at the bottom.

Opinionated · playbook ~12 min read Last updated · April 2026 What we deliberately skip
01 The shape

Every working AI-on-business-data system has the same shape. Five build layers + two cross-cutting disciplines.

The five layers are sequential — you build them in order. The two disciplines (Trust + Identity & Compliance) thread through every layer from L1. Skip one of the layers, the layer above it starts lying to you. Skip a discipline, the whole platform quietly rots in nine months.

fig 01The whole stack, end to end
D1 Trust observability · data quality · freshness · schema drift · output validation DESIGNED IN AT L1 · PRESENT AT L1, L2, L3, L4, L5 FROM SOURCES 10–30 per business CRM Billing / payments Product DB Ad platforms Support tickets Spreadsheets Carrier / 3PL Marketing tools Custom / internal 25-source rule audit them at L1 L1 Source Audit + inventory Map questions, not systems Per-source SLA, owner, rotation PRE-CODE whiteboard work ~10–30 sources flagged + owned L2 Ingest Three patterns: webhooks polling batch picked per source. Idempotent writes. Reconciliation backstops. raw.* to the lake L3 Model SOURCE OF TRUTH RAW as-is · append-only never edit by hand STAGING cleaned · typed 1:1 with source MARTS business concepts customer · order · churn one row, six systems SQL IN GIT · TESTED · VERSIONED L4 Orchestration Run the right thing at the right time. Step Functions, scheduled jobs, DAGs. Lane-aware Slack delivery. Runbook-linked alarms. 12 reports/day L5 AI Interface PAYOFF Tools, not SQL. Request envelope state machine. Eval harness. Governance gate. Cost per call. $0.012/extraction tracked from day 1 TO CONSUMERS humans + AI AI chat AI agents Operational dashboards Daily KPI brief Slack lanes Email reports BI / Sheets 12 named reports running today D2 Identity & Compliance auth · IAM · KMS · PII handling · multi-tenant isolation · audit trail DESIGNED IN AT L1 · PRESENT AT L1, L2, L3, L4, L5 5 BUILD LAYERS · 2 CROSS-CUTTING DISCIPLINES · 1 LAKE
Build layers (L1–L5)
Disciplines (D1, D2)
Lake — source of truth
Principle · 01

Model the business, not the apps.

Marts are business concepts — customer, deal, order — not database tables. If the AI has to join six things to answer a question, you haven't finished the job.

Principle · 02

The storage layer is the contract.

Every consumer reads from marts — whether they live in a lake or a warehouse. Change a source? Absorb it in staging. Dashboards and AI never see the chaos underneath.

Principle · 03

AI reads through tools, not SQL.

Giving an LLM raw table access is how you get confident, wrong answers on prod. Typed tools with named metrics are how you get trustworthy ones.

02 The layers

Drill into any layer. Each gets its own page.

Five build layers go in order. Two cross-cutting disciplines thread through every layer from L1. Each card below jumps to a focused page with the full prose, code, war stories, and a vendor question.

BUILD LAYERS Sequential. You build them in order.
L1Source

Map questions, not systems.

Audit + inventory of every source that holds business data. The exercise that determines whether the build succeeds or quietly fails.

  • Map questions, not systems
  • Expect 25 sources, not 5
  • Per-source SLA + owner + rotation, written at L1
Read L1 — Source
L2Ingest

Get data in. Reliably.

Three patterns picked per source — webhooks, polling, batch. Idempotent writes. Reconciliation backstops on every webhook.

  • One connector per source, owned by us
  • Idempotent extracts, append-only writes
  • Retries with jitter, monitored freshness
Read L2 — Ingest
L3Model

Start with a lake. Six systems → one row.

Storage tiers (raw / staging / marts) plus the SQL-in-git transform layer that joins sources into business concepts. The biggest layer; the source of truth.

  • Three zones: raw → staging → marts
  • Lake first, warehouse only when hot
  • Every model: a SELECT, reviewed, tested, versioned
Read L3 — Model
L4Orchestration

What runs when.

Step Functions / DAGs that run the right job at the right time and route the result to the right person via the right channel.

  • Start with cron + good logging
  • Every job has an owner + freshness contract
  • One alert channel humans actually watch
Read L4 — Orchestration
L5AI interface

Tools, not SQL. The payoff layer.

Tools (not raw SQL) that the LLM calls. Request envelope state machine. Eval harness. Human-approval gate for irreversible actions.

  • Tools, not raw SQL
  • 8–15 tools, not 80
  • Every response cites its source tables
Read L5 — AI interface
The shift
Data engineering stops feeling like a backlog of one-off exports and starts feeling like software — the moment every number in your business has a name, a test, and a git commit.
— The rule we build by
09 Rhythm

How we actually build this in four to six weeks.

Not a gantt chart. A sequence. Each week has one thing that has to be true by Friday.

Week 01 · Map
Find the decisions.
  • Interview the three or four people who actually run the business.
  • List the questions they ask weekly. List the ones they've given up on.
  • Inventory every system data lives in. Including the spreadsheets.
  • By Friday: a one-pager of marts we need to build, ranked.
Week 02 · Plumb
Raw data, landing clean.
  • Stand up ingestion for the top 4–6 sources.
  • Storage provisioned (S3 + Glue + Athena by default). Zones created.
  • Staging models, 1:1 with source. Basic tests.
  • By Friday: all critical raw data refreshes on a schedule, freshness alert live.
Week 03–04 · Model
Business concepts.
  • Build marts for each ranked concept (customer, deal, order…).
  • Match numbers against the reports the business currently trusts.
  • Wire up at least one dashboard per mart. Get someone to break it.
  • By Friday of 04: numbers tie out. Someone on the team vouches for each mart.
Week 05–06 · Interface
Make it usable.
  • Stand up the AI interface — 8–12 tools, covering the weekly questions.
  • Ship one live sync (sheet, Slack bot, embedded view) per key workflow.
  • Handover doc. Runbook. Who owns what.
  • By Friday of 06: the team uses this instead of whatever they used before.
Note
Week 01 is the one people want to skip. It's also the one that determines whether week 06 ships something useful. Do not skip.
10 Opinions

What we deliberately skip — and what we build instead.

Listing these here because they're usually the first bullets on someone else's architecture doc. Some we think are the wrong order of operations. Others — we disagree on principle and do the harder, more valuable version.

Our stack · AWS
We build on AWS and love it. Glue + S3 + Redshift/Athena + Lambda + Step Functions covers almost everything we ship. That said — there are a million ways to build this. Snowflake, BigQuery, Databricks, Fabric, or a laptop-sized setup on DuckDB all work. Pick the cloud you already pay for, the one your team knows, and go.
opinion, not religion
Not now
Vector DBs for business data

Your pipeline and your orders aren't a semantic-search problem. They're a joins-and-aggregates problem. Don't vectorize what you can query directly.

Not now
Real-time streaming

15-minute refreshes are real-time for 99% of business decisions. Streaming is real work you don't need yet. Ask if anyone has ever said "if only this were 30 seconds fresher."

Not now
Custom LLMs / fine-tuning

Fine-tuning fixes vocabulary, not data access. If your AI gives wrong answers, the fix is almost always upstream — in the marts, not the model.

We do build
A real data catalog — with agents that search it

A catalog isn't just docs. We build one that's queryable by agents — so AI can discover what exists, find the right mart, and connect the dots across systems. The catalog is the agent's memory of your business.

We do build
Custom connectors to the SaaS apps that matter

When the managed connector doesn't cover a custom object, a rate-limited endpoint, or the specific field your team lives in — we write the connector. It lands the data the way the business needs, not the way the vendor defaults to.

We do build
Agents that compose tools, not chatbots that answer questions

A single well-scoped agent, with tight tools and strong feedback loops, beats a five-agent swarm every time. Once one agent is solid, we add a second — carefully. Multi-agent is earned, not assumed.

11 Engineer FAQ

The questions every CTO emails after reading this. Answered up front.

Not exhaustive. The questions that come up in every evaluation call.

Q · Tooling

Why dbt and not SQLMesh / SQLFluff alone?

dbt is what the team you're hiring already knows. SQLMesh has a tighter execution model and a few features we like (virtual data environments, native Python models), and we'll use it on greenfield engagements where the client team is open to it. But on most builds, the consultant who picks up the project from us, the analyst who writes the next mart, and the platform engineer who debugs at 3am all know dbt. Pick the boring tool the team can run.

Q · Cloud

Why AWS specifically? Could you build this on Snowflake-only?

You can absolutely build this on Snowflake-only — or BigQuery-only, or Databricks-only — and we have. AWS is our default because S3 + Glue + Athena + Lambda gives us a lake-first architecture for cents on the dollar versus warehouse-first economics, and because most clients already have an AWS bill we can plug into.

If you're already deep on Snowflake, we'll build the marts in Snowflake and skip the lake split. If you're on Databricks, we'll lean into Delta Lake. The five build layers and two disciplines don't change; the SKUs do.

Q · Connectors

Why custom Python connectors over Fivetran / Airbyte / Stitch?

For the long tail — and where the actual value lives — managed connectors don't cover the surface. Custom objects, rate-limited or paginated quirks, that one field your team lives in, an internal API, an SFTP drop, a legacy DB. Fivetran covers the easy 60%. Custom Python covers the 40% that determines whether the AI can actually answer.

We're not religious about it: if Fivetran covers a source cleanly, we use it and pay the per-row fee. The moat is in writing the connector for the source that doesn't have a managed option.

Q · Agents

Why one tightly-scoped agent over a multi-agent swarm?

One well-scoped agent with a small toolset and tight feedback loops beats a five-agent swarm 95% of the time. Multi-agent introduces coordination overhead, error-propagation, debugging hell, and latency — for value that's usually a delta on top of "did the first agent already nail it." We add a second agent only when there's a clear failure mode the first one structurally can't handle (e.g. a planning agent that proposes a plan, a separate executor that runs it).

Q · Orchestration

Why Dagster over Airflow / Prefect?

Dagster's asset-based mental model fits how data actually flows — every asset has owners, freshness, lineage, metadata. Airflow's task-graph model treats data as a side-effect of tasks, which is backwards. Prefect 2/3 is closer to Dagster but lighter — fine for smaller setups. On bigger engagements where lineage and freshness contracts matter, Dagster wins.

If you're already running Airflow and have a team that knows it, we'll write Airflow DAGs. The orchestrator is the most replaceable layer in the stack.

Q · Real-time

Why not streaming / real-time?

15-minute refreshes are real-time for 99% of business decisions. Streaming adds Kafka, Flink/Spark Streaming, exactly-once semantics, and a permanent uplift in operational cost. We do streaming when there's a concrete decision a human or system needs to make in <1min. Otherwise: micro-batch every 5–15min, monitor freshness, ship.

Q · Lake format

Iceberg vs Delta vs Hudi?

Iceberg, in 2026, on AWS. Glue Catalog supports it natively, Athena queries it directly, and the ecosystem (dbt-iceberg, Trino, Snowflake/Databricks both read it) is the most portable. Delta is great if you're committed to Databricks; Hudi has the strongest streaming-upsert story but the smallest ecosystem. The honest answer: any of the three is fine — Iceberg is what we default to because the catalog story is cleanest on the cloud most of our clients use.

Q · Migrations

What if we already have an old warehouse / Looker setup we don't want to throw away?

You don't throw it away. The new foundation can read the old warehouse as a source — we don't ask anyone to rip out a working BI stack. The migration story we ship most often: existing Looker/Sigma/Tableau keeps running on the old marts; new AI workflows + new dashboards point at the new clean marts; over the next 6–12 months, dashboards drift toward the new foundation as people rebuild. The old stack stays useful while the new one earns its place.

That's the sauce. Now do something with it.

This is the shape of every pipeline we ship. Copy it. Disagree with it. Argue with us about it. The goal is that technical teams stop reinventing this stack from scratch every time — and start arguing about which layer to sharpen first.

If you'd rather we build it

4–6 weeks. One fixed price. You end up with the diagram above.

Same opinions, shipped on your stack. You'll be running the whole thing by week six — we document everything, and hand you the repo, the runbook, and the team that knows it.

Book a 30-min walkthrough →
Want more like this

I'm narrating each of these layers on YouTube.

One video per layer. Live code. Real war stories. The same opinions on this page, with more color. Subscribe if that's useful — no newsletter, no upsell, just the videos.

Subscribe on YouTube

Want the markdown source of this playbook to fork or annotate? Email me and I'll send it. Or stash this page — it’ll keep being updated as we ship more.