Build layer · 3 of 5 · ~10 min read
L3 · Model — storage + transform together

Model — start with a lake. The schema work that turns 6 systems into one row.

The decision everyone wants to overthink. L3 Model is where the lake gets structured and queryable. It has two halves we cover in order: storage (the lake's tiers and zones, immediately below) and transform (SQL in git with tests, further down). Together they're how the lake becomes a single source of truth.

Our take

Most companies don't need a warehouse. They need a lake: S3 + Glue + Athena. You can run the whole business from there for a long time. A warehouse is what you reach for when you need hotter, faster answers or the query costs on the lake stop making sense.

First, the words · Lake vs warehouse

Two ways to keep your data. One is a garage. One is a filing cabinet.

A data lake is like a big garage — you throw files in (JSON, CSV, logs, whatever), label the shelves, and dig things out when you need them. Cheap to keep stuff there. Slower to find a specific thing. You only pay when you go looking.

A data warehouse is like a filing cabinet — everything pre-sorted, indexed, and formatted. Fast to pull any single thing. But you pay rent on the cabinet whether you open it today or not, and you have to convert every document before it goes in.

Lake · garage
Files in folders, on cheap shelves.
.json .parquet .csv .log

Good at: keeping everything, cheaply. Handling weird shapes (JSON, logs, images). Paying only when queried.

Less good at: sub-second dashboards. High-frequency querying.

Warehouse · filing cabinet
Rows & columns, pre-sorted, indexed.
CUSTOMERS · ORDERS · PRODUCTS · REVENUE · id, name, email...

Good at: fast dashboards. Frequent BI queries. Sub-second responses your CEO wants.

Less good at: keeping cheap raw exhaust (logs, clicks, raw events) at scale.

Upgrade when…
Dashboards take > 3 seconds, scan bills overtake a warehouse subscription, or the AI tool needs sub-second answers. Then you add a warehouse on top of the lake — not instead of it. The lake is still the source of truth.
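A rough, illustrative break-even (the volumes here are invented for scale): 500 Athena queries a day scanning ~2 GB each is about 1 TB/day, roughly 30 TB and about $150 a month at the ~$5/TB scan rate, comfortably below most warehouse subscriptions. Ten times that query volume, or unpartitioned full-table scans, and the comparison flips.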
Stage A · default
Data lake. S3 + Glue + Athena.

Parquet in object storage. Schema-on-read via Glue catalog. Query with Athena. Pay per scan, not per uptime. Start here; stay here longer than you think you should.

S3 · AWS Glue · Athena · Iceberg / Parquet
Stage B · when you need hot
Add a warehouse for the queries that can't wait.

Dashboards timing out? AI tools needing sub-second responses? Athena scan bills beating warehouse subscriptions? Now you add Snowflake / BigQuery / Redshift — for the hot marts. Lake stays as the source of truth.

Snowflake · BigQuery · Redshift · DuckDB / MotherDuck
Stage C · 10B+ rows
You have a storage problem. Hire for it.

Partitioning strategy, cost governance, hybrid lake/warehouse patterns, Iceberg table maintenance — this is a dedicated role, not a side-quest.

dedicated data eng
Anti-pattern
"We'll just query prod Postgres." You'll take down prod. Or you'll build a read replica and slowly reinvent a warehouse on top of it with worse query ergonomics. Either way, you end up here. Skip ahead.
Inside the storage · Raw → Staging → Marts

Think of your data like a restaurant kitchen. Three stations. One direction.

Raw is the delivery dock — crates of ingredients arriving from suppliers, exactly as they were shipped. Nobody cooks from here. Nobody edits the crates. If something's wrong, you can trace it back to the supplier.

Staging is the prep station — peel, chop, rinse, label. Every item gets cleaned and standardized. Same shapes, same units, same names. The menu doesn't happen here yet; this is just getting ingredients ready to be cooked with.

Marts is the plated dish — the thing the customer (your dashboard, your AI, your finance team) actually eats. One row per customer. One row per order. Pre-joined, pre-computed, business-ready. This is the only thing consumers ever read.

Why this order
If someone "cleans up" raw directly, your kitchen is now non-reproducible — you can't re-cook a dish when a supplier changes. Staging is the one place mess gets resolved, and marts is the one place the business reads from. Three zones. One direction. No shortcuts.
Structure

Three zones, every time: raw → staging → marts.

raw mirrors sources exactly. staging cleans and types, 1:1 with source tables. marts models the business — one row per customer, one row per order, pre-joined, pre-aggregated where it matters.

Consumers only ever read marts. Staging is for modelers. Raw is for debugging and disaster recovery.

fig · Raw → Staging → Marts · what each zone actually contains
RAW · as-is from source: raw.shopify_orders, raw.hubspot_deals, raw.stripe_charges, raw.support_tickets · append-only · never hand-edit · kept for debug / disaster recovery
STAGING · cleaned, typed, 1:1: stg.orders, stg.deals, stg.charges, stg.tickets · tested types + nulls · renamed like humans · modeler-facing only
MARTS · business concepts: dim_customer, fct_pipeline, fct_revenue_daily, dim_account · one row per entity · read by AI + dashboards
Default · Lake-first (S3 + Iceberg + Athena)
Storage cost — pennies per GB-month. Pay for the data you keep, not the queries you might run.
Query cost — pay-per-scan (Athena ≈ $5/TB scanned). With partitioning, most queries are cents.
Pick when: You’re starting cold. You want low fixed cost. You’ll add a warehouse for hot marts when query latency hurts.

The alternative · Warehouse-first (Snowflake / BigQuery)
Storage cost — bundled into compute pricing. Cheap when small; not when you keep five years of clickstream.
Query cost — credits for warehouse uptime. Predictable monthly bill; tougher when usage spikes.
Pick when: You already have one and a team that knows it. Sub-second BI matters. You don’t want to operate two storage layers.
Artifact · s3://your-co-data/ (lake layout) · ~12 lines · tree
# Bucket layout — your-co-data
s3://your-co-data/
├── raw/              <- as-is from source. partitioned by ingest date.
│   ├── shopify/orders/dt=2026-04-30/
│   ├── netsuite/transactions/dt=2026-04-30/
│   └── klaviyo/events/dt=2026-04-30/
├── staging/          <- typed, deduped, 1:1 with source. dbt-managed.
│   ├── stg_shopify__orders/
│   └── stg_netsuite__txns/
└── marts/            <- business concepts. customers, orders, ad_spend.
    ├── core/
    └── marketing/

Load-bearing · Raw is append-only — we never transform in place. If a model is wrong, fix the model and rebuild from raw. Touching raw means giving up your safety net the first time something breaks.
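
To make the layout concrete, here is a hedged sketch of how one raw prefix gets registered in the Glue catalog and queried from Athena. The database, table, and column names are illustrative, not prescribed above.

Athena DDL + query (sketch)
-- register the raw Shopify orders prefix as a Glue/Athena table (schema-on-read).
-- assumes a raw_shopify database already exists in the catalog.
CREATE EXTERNAL TABLE raw_shopify.orders (
  id            bigint,
  customer_id   bigint,
  total_price   decimal(12,2),
  processed_at  timestamp
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://your-co-data/raw/shopify/orders/';

-- pick up newly landed dt= partitions
MSCK REPAIR TABLE raw_shopify.orders;

-- query with a partition filter: Athena scans one day's files, not the whole bucket
SELECT count(*) AS orders, sum(total_price) AS gross
FROM raw_shopify.orders
WHERE dt = '2026-04-30';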

L3 · Model continues — Transform

SQL in git. With tests.

Storage (above) gives you the lake's structure. Transform (below) gives you its queryable shape. Together they're L3 Model. This is where “data engineering” becomes “software engineering.” Once you've done it this way, you can't go back.

Our take

Every model is a SELECT. Every SELECT is reviewed, tested, and versioned.

First, the words · Transform tool

A transform tool turns your collection of SQL queries into software.

Without one: SQL lives in ten places. A query in a dashboard, a saved query in someone's editor, a stored procedure in the database, a script in a notebook. When a number changes, nobody can tell you why, because nobody tracks which query produced it.

With one (dbt, SQLMesh, etc.): every query is a file in a git repo. The tool reads all the files, figures out what depends on what, runs them in the right order, runs tests against each result ("this ID column is never null," "this table is unique by customer"), and fails the build if anything is off. Same workflow as any software codebase — pull requests, reviews, CI checks.

It gives you three superpowers: dependency resolution (model B depends on model A — run A first), tests (the computer checks the work), and versioning (git-blame every number).
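
A minimal sketch of what dependency resolution looks like in practice, assuming a dbt-style ref() and hypothetical model names:

-- marts/fct_orders.sql (hypothetical)
-- selecting from ref('stg_orders') tells the tool to build stg_orders first
-- and to rebuild fct_orders whenever stg_orders changes
select
  order_id,
  customer_id,
  total_amount
from {{ ref('stg_orders') }}
where cancelled_at is null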

Category is mature
Any tool that does those three things is fine. dbt is the default. Don't spend a month picking.
Rule 02

Use a transform tool that compiles SQL and runs tests.

You need three things: dependency resolution (model B depends on model A), tests (this column is never null, this ID is unique), and CI (broken SQL fails PR before prod). Any tool that does those three is fine. The category is mature; don't litigate it.
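
As a sketch of the CI piece, assuming GitHub Actions and a ci target defined in profiles.yml (none of this is prescribed above, and any CI runner works the same way):

# .github/workflows/dbt-ci.yml (illustrative)
name: dbt-ci
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-athena-community
      - run: dbt deps
      # dbt build compiles every model and runs every test; any failure fails the PR
      - run: dbt build --target ci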

Pattern

Name columns the way an LLM reads them, not the way SQL devs type them.

Avoid: ltv_amt_usd_ttm. Compact, efficient, impenetrable to anyone who wasn't in the room when it was named. The LLM guesses, and guesses wrong.

Prefer: lifetime_revenue_usd. Plain English. Unit in the name. No guessing. Works whether the reader is an analyst, an agent, or yourself in six months.
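
As a sketch, the rename happens once, in staging, so every downstream model (and every LLM reading the schema) sees the readable name. The source column names here are hypothetical:

-- staging model: translate source abbreviations into plain-English, unit-suffixed names
select
  customer_id,
  ltv_amt_usd_ttm as lifetime_revenue_usd   -- plain English, unit in the name
from {{ source('crm', 'customers') }}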

Why tests? · Data tests

Data tests are assertions the computer checks every time the pipeline runs.

Same idea as software unit tests, but about your data. You declare a rule — "every row in dim_customer has a non-null customer_id" or "order_id is unique in fct_orders" — and the test runs against the real output after every build. If it fails, the build fails, and the bad table never gets published to dashboards or AI.

Five tests catch 90% of "the numbers look weird today" incidents: not_null, unique, accepted_values (is this enum one of the expected options), relationships (every foreign key exists in the parent table), and freshness (this table was updated in the last N hours).
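
A hedged sketch of how those five checks are declared in dbt-style YAML. The freshness check lives on the source, the rest on model columns; every name below is hypothetical:

# sources: freshness check, warn if the raw table hasn't loaded recently
sources:
  - name: raw_shopify
    tables:
      - name: orders
        loaded_at_field: _ingested_at
        freshness:
          warn_after: {count: 24, period: hour}

# models: not_null, unique, accepted_values, relationships
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'paid', 'refunded']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id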

Opinion

Write tests like you'd write any other tests.

For every mart: uniqueness on the grain, non-null on join keys, referential integrity against staging, accepted values on enums. Adding a new column? Add a test. It takes thirty seconds and saves a production outage.

Example

What a mart actually looks like.

One row per customer. Every metric the business cares about, pre-computed. This is the table both your dashboards and your AI read from.

fig · The dependency graph a transform tool walks · with tests
raw.customers, raw.orders, raw.tickets → stg_customers, stg_orders, stg_tickets (tests: unique, not_null, typed) → dim_customer, fct_revenue_daily (tests: unique, relationships, grain) → Dashboards, AI tools, Sheets sync
Every box has tests before it ships; break a test → break the build.
marts/dim_customer.sql — abbreviated
-- one row per customer, joined to every relevant source
with orders as (
  select
    customer_id,
    count(*)                as lifetime_orders,
    sum(gross_amount)       as lifetime_revenue,
    max(ordered_at)         as last_order_at
  from staging.orders
  group by 1
),

tickets as (
  select customer_id,
    count(*)                as lifetime_tickets,
    count(*) filter (where status = 'open') as open_tickets
  from staging.support_tickets
  group by 1
)

select
  c.customer_id,
  c.email,
  c.signed_up_at,
  coalesce(o.lifetime_revenue, 0) as lifetime_revenue,
  coalesce(o.lifetime_orders,  0) as lifetime_orders,
  o.last_order_at,
  coalesce(t.lifetime_tickets, 0) as lifetime_tickets,
  coalesce(t.open_tickets, 0)     as open_tickets,
  -- business-defined status, not a source-system field
  case
    when o.last_order_at > current_date - interval '90 days' then 'active'
    when o.last_order_at > current_date - interval '365 days' then 'lapsing'
    else 'dormant'
  end as lifecycle_stage

from staging.customers c
left join orders   o using (customer_id)
left join tickets  t using (customer_id)
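
The matching tests for this mart might look like the following. A sketch only: the file path is hypothetical, and the enum values and column names mirror the model above.

# marts/_schema.yml (sketch)
models:
  - name: dim_customer
    columns:
      - name: customer_id
        tests:
          - unique        # uniqueness on the grain
          - not_null      # non-null on the join key
          - relationships:            # referential integrity against staging
              to: ref('stg_customers')
              field: customer_id
      - name: lifecycle_stage
        tests:
          - accepted_values:
              values: ['active', 'lapsing', 'dormant']
      - name: lifetime_revenue
        tests: [not_null]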
Heuristic

Name columns like a human.

lifetime_revenue, not ltv_amt_usd. The LLM on the other end is going to read these names and reason about them. Good names reduce hallucinations more than any prompt engineering.

Artifact · models/staging/stg_shopify__orders.sql + _schema.yml · ~30 lines · dbt
-- models/staging/stg_shopify__orders.sql
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
  id::TEXT                         AS order_id,
  customer_id::TEXT                AS customer_id,
  total_price::NUMERIC             AS total_amount,
  total_discounts::NUMERIC         AS discount_amount,
  total_tax::NUMERIC               AS tax_amount,
  cancelled_at::TIMESTAMP          AS cancelled_at,
  COALESCE(cancelled_at, processed_at)::TIMESTAMP AS effective_at,
  _ingested_at::TIMESTAMP          AS ingested_at
FROM {{ source('raw_shopify', 'orders') }}
{% if is_incremental() %}
  WHERE _ingested_at > (SELECT MAX(ingested_at) FROM {{ this }})
{% endif %}

# models/staging/_schema.yml
models:
  - name: stg_shopify__orders
    columns:
      - name: order_id
        tests: [unique, not_null]
      - name: total_amount
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1000000

Load-bearing · Tests are non-optional. Every column that drives a business decision gets a not_null. Every monetary column gets a range. If a model has no tests, it doesn’t merge — that's the rule.