For Builders — The opinionated architecture

01 The shape

Every working AI-on-business-data system has the same shape. Five build layers + two cross-cutting disciplines.

The five layers are sequential — you build them in order. The two disciplines (Trust + Identity & Compliance) thread through every layer from L1. Skip one of the layers, the layer above it starts lying to you. Skip a discipline, the whole platform quietly rots in nine months.

fig 01The whole stack, end to end

Source

Audit + inventory the systems that hold the data. Most teams skip this and build a one-system answer to a 6-system question.

Ingest

Three patterns — real-time webhooks, scheduled polling, batch — picked per source. Idempotent writes, reconciliation backstops.

Model

Lake-first storage (raw / processed / metrics tiers) plus the schema work that turns 6 systems into one row about a single customer.

Orchestration

Step Functions, scheduled queries, lane-aware delivery. The chain that turns lake queries into reports the right person actually sees.

AI Interface

Request envelope state machine, eval harness, governance gate. The payoff layer where the lake turns into AI features your team uses every day.

Disciplines (cross-cutting · designed in at L1 · present at every layer)

D1 — Trust. Observability, data quality, freshness, schema drift. The discipline that prevents silent failures at every layer.
D2 — Identity & Compliance. Auth, IAM, PII handling, audit, multi-tenancy. The discipline that keeps the lake safe at every layer.

Build layers (L1–L5)

Disciplines (D1, D2)

Lake — source of truth

Principle · 01

Model the business, not the apps.

Marts are business concepts — customer, deal, order — not database tables. If the AI has to join six things to answer a question, you haven't finished the job.

Principle · 02

The storage layer is the contract.

Every consumer reads from marts — whether they live in a lake or a warehouse. Change a source? Absorb it in staging. Dashboards and AI never see the chaos underneath.

Principle · 03

AI reads through tools, not SQL.

Giving an LLM raw table access is how you get confident, wrong answers on prod. Typed tools with named metrics are how you get trustworthy ones.

02 The layers

Drill into any layer. Each gets its own page.

Five build layers go in order. Two cross-cutting disciplines thread through every layer from L1. Each card below jumps to a focused page with the full prose, code, war stories, and a vendor question.

BUILD LAYERS Sequential. You build them in order.

L1Source

Map questions, not systems.

Audit + inventory of every source that holds business data. The exercise that determines whether the build succeeds or quietly fails.

Map questions, not systems
Expect 25 sources, not 5
Per-source SLA + owner + rotation, written at L1

Read L1 — Source →

L2Ingest

Get data in. Reliably.

Three patterns picked per source — webhooks, polling, batch. Idempotent writes. Reconciliation backstops on every webhook.

One connector per source, owned by us
Idempotent extracts, append-only writes
Retries with jitter, monitored freshness

Read L2 — Ingest →

L3Model

Start with a lake. Six systems → one row.

Storage tiers (raw / staging / marts) plus the SQL-in-git transform layer that joins sources into business concepts. The biggest layer; the source of truth.

Three zones: raw → staging → marts
Lake first, warehouse only when hot
Every model: a SELECT, reviewed, tested, versioned

Read L3 — Model →

L4Orchestration

What runs when.

Step Functions / DAGs that run the right job at the right time and route the result to the right person via the right channel.

Start with cron + good logging
Every job has an owner + freshness contract
One alert channel humans actually watch

Read L4 — Orchestration →

L5AI interface

Tools, not SQL. The payoff layer.

Tools (not raw SQL) that the LLM calls. Request envelope state machine. Eval harness. Human-approval gate for irreversible actions.

Tools, not raw SQL
8–15 tools, not 80
Every response cites its source tables

Read L5 — AI interface →

CROSS-CUTTING DISCIPLINES Designed in at L1. Present at every layer. Not bolted on later.

D1Trust

The lake is correct. Or you have nothing.

Observability + data quality + freshness + schema-drift detection. Threads through L1 → L5. The discipline that prevents silent failures.

Four monitors, always on: freshness, volume, schema, nulls
Every alert names an owner + downstream blast radius
One alert channel humans actually watch

Read D1 — Trust →

D2Identity & Compliance

The lake is safe. Auth, IAM, PII, audit.

Auth + IAM + KMS + PII + multi-tenant + audit trail. Threads through L1 → L5. The discipline that keeps the platform compliance-audit-ready.

Credentials in Secrets Manager — never .env files
Multi-tenant isolation in the platform, not application code
Never PII in LLM prompts; references-by-ID only

Read D2 — Identity & Compliance →

09 Rhythm

How we actually build this in four to six weeks.

Not a gantt chart. A sequence. Each week has one thing that has to be true by Friday.

Week 01 · Map

Find the decisions.

Interview the three or four people who actually run the business.
List the questions they ask weekly. List the ones they've given up on.
Inventory every system data lives in. Including the spreadsheets.
By Friday: a one-pager of marts we need to build, ranked.

Week 02 · Plumb

Raw data, landing clean.

Stand up ingestion for the top 4–6 sources.
Storage provisioned (S3 + Glue + Athena by default). Zones created.
Staging models, 1:1 with source. Basic tests.
By Friday: all critical raw data refreshes on a schedule, freshness alert live.

Week 03–04 · Model

Business concepts.

Build marts for each ranked concept (customer, deal, order…).
Match numbers against the reports the business currently trusts.
Wire up at least one dashboard per mart. Get someone to break it.
By Friday of 04: numbers tie out. Someone on the team vouches for each mart.

Week 05–06 · Interface

Make it usable.

Stand up the AI interface — 8–12 tools, covering the weekly questions.
Ship one live sync (sheet, Slack bot, embedded view) per key workflow.
Handover doc. Runbook. Who owns what.
By Friday of 06: the team uses this instead of whatever they used before.

Note

Week 01 is the one people want to skip. It's also the one that determines whether week 06 ships something useful. Do not skip.

10 Opinions

What we deliberately skip — and what we build instead.

Listing these here because they're usually the first bullets on someone else's architecture doc. Some we think are the wrong order of operations. Others — we disagree on principle and do the harder, more valuable version.

Our stack · AWS

We build on AWS and love it. Glue + S3 + Redshift/Athena + Lambda + Step Functions covers almost everything we ship. That said — there are a million ways to build this. Snowflake, BigQuery, Databricks, Fabric, or a laptop-sized setup on DuckDB all work. Pick the cloud you already pay for, the one your team knows, and go.

opinion, not religion

Not now

Vector DBs for business data

Your pipeline and your orders aren't a semantic-search problem. They're a joins-and-aggregates problem. Don't vectorize what you can query directly.

Not now

Real-time streaming

15-minute refreshes are real-time for 99% of business decisions. Streaming is real work you don't need yet. Ask if anyone has ever said "if only this were 30 seconds fresher."

Not now

Custom LLMs / fine-tuning

Fine-tuning fixes vocabulary, not data access. If your AI gives wrong answers, the fix is almost always upstream — in the marts, not the model.

We do build

A real data catalog — with agents that search it

A catalog isn't just docs. We build one that's queryable by agents — so AI can discover what exists, find the right mart, and connect the dots across systems. The catalog is the agent's memory of your business.

We do build

Custom connectors to the SaaS apps that matter

When the managed connector doesn't cover a custom object, a rate-limited endpoint, or the specific field your team lives in — we write the connector. It lands the data the way the business needs, not the way the vendor defaults to.

We do build

Agents that compose tools, not chatbots that answer questions

A single well-scoped agent, with tight tools and strong feedback loops, beats a five-agent swarm every time. Once one agent is solid, we add a second — carefully. Multi-agent is earned, not assumed.

11 Engineer FAQ

The questions every CTO emails after reading this. Answered up front.

Not exhaustive. The questions that come up in every evaluation call.

Q · Tooling

Why dbt and not SQLMesh / SQLFluff alone?

dbt is what the team you're hiring already knows. SQLMesh has a tighter execution model and a few features we like (virtual data environments, native Python models), and we'll use it on greenfield engagements where the client team is open to it. But on most builds, the consultant who picks up the project from us, the analyst who writes the next mart, and the platform engineer who debugs at 3am all know dbt. Pick the boring tool the team can run.

Q · Cloud

Why AWS specifically? Could you build this on Snowflake-only?

You can absolutely build this on Snowflake-only — or BigQuery-only, or Databricks-only — and we have. AWS is our default because S3 + Glue + Athena + Lambda gives us a lake-first architecture for cents on the dollar versus warehouse-first economics, and because most clients already have an AWS bill we can plug into.

If you're already deep on Snowflake, we'll build the marts in Snowflake and skip the lake split. If you're on Databricks, we'll lean into Delta Lake. The five build layers and two disciplines don't change; the SKUs do.

Q · Connectors

Why custom Python connectors over Fivetran / Airbyte / Stitch?

For the long tail — and where the actual value lives — managed connectors don't cover the surface. Custom objects, rate-limited or paginated quirks, that one field your team lives in, an internal API, an SFTP drop, a legacy DB. Fivetran covers the easy 60%. Custom Python covers the 40% that determines whether the AI can actually answer.

We're not religious about it: if Fivetran covers a source cleanly, we use it and pay the per-row fee. The moat is in writing the connector for the source that doesn't have a managed option.

Q · Agents

Why one tightly-scoped agent over a multi-agent swarm?

One well-scoped agent with a small toolset and tight feedback loops beats a five-agent swarm 95% of the time. Multi-agent introduces coordination overhead, error-propagation, debugging hell, and latency — for value that's usually a delta on top of "did the first agent already nail it." We add a second agent only when there's a clear failure mode the first one structurally can't handle (e.g. a planning agent that proposes a plan, a separate executor that runs it).

Q · Orchestration

Why Dagster over Airflow / Prefect?

Dagster's asset-based mental model fits how data actually flows — every asset has owners, freshness, lineage, metadata. Airflow's task-graph model treats data as a side-effect of tasks, which is backwards. Prefect 2/3 is closer to Dagster but lighter — fine for smaller setups. On bigger engagements where lineage and freshness contracts matter, Dagster wins.

If you're already running Airflow and have a team that knows it, we'll write Airflow DAGs. The orchestrator is the most replaceable layer in the stack.

Q · Real-time

Why not streaming / real-time?

15-minute refreshes are real-time for 99% of business decisions. Streaming adds Kafka, Flink/Spark Streaming, exactly-once semantics, and a permanent uplift in operational cost. We do streaming when there's a concrete decision a human or system needs to make in <1min. Otherwise: micro-batch every 5–15min, monitor freshness, ship.

Q · Lake format

Iceberg vs Delta vs Hudi?

Iceberg, in 2026, on AWS. Glue Catalog supports it natively, Athena queries it directly, and the ecosystem (dbt-iceberg, Trino, Snowflake/Databricks both read it) is the most portable. Delta is great if you're committed to Databricks; Hudi has the strongest streaming-upsert story but the smallest ecosystem. The honest answer: any of the three is fine — Iceberg is what we default to because the catalog story is cleanest on the cloud most of our clients use.

Q · Migrations

What if we already have an old warehouse / Looker setup we don't want to throw away?

You don't throw it away. The new foundation can read the old warehouse as a source — we don't ask anyone to rip out a working BI stack. The migration story we ship most often: existing Looker/Sigma/Tableau keeps running on the old marts; new AI workflows + new dashboards point at the new clean marts; over the next 6–12 months, dashboards drift toward the new foundation as people rebuild. The old stack stays useful while the new one earns its place.

That's the sauce. Now do something with it.

This is the shape of every pipeline we ship. Copy it. Disagree with it. Argue with us about it. The goal is that technical teams stop reinventing this stack from scratch every time — and start arguing about which layer to sharpen first.

If you'd rather we build it

4–6 weeks. One fixed price. You end up with the diagram above.

Same opinions, shipped on your stack. You'll be running the whole thing by week six — we document everything, and hand you the repo, the runbook, and the team that knows it.

Book a 30-min walkthrough →

Want more like this

I'm narrating each of these layers on YouTube.

One video per layer. Live code. Real war stories. The same opinions on this page, with more color. Subscribe if that's useful — no newsletter, no upsell, just the videos.

Subscribe on YouTube

Want the markdown source of this playbook to fork or annotate? Email me and I'll send it. Or stash this page — it’ll keep being updated as we ship more.