Build layer 5 of 5 · the payoff · ~10 min read

AI interface — tools, not SQL. The payoff layer.

The fifth and final build layer: where the AI actually queries the lake. The reflex is to build a chatbot, and it's almost always wrong. The pattern that holds up in production is structured AI features built on a request-envelope state machine. This is also where most DIY attempts go sideways.

Our take

Expose the modeled lake through a small set of typed tools. The LLM calls them. Never writes SQL. Wrap every irreversible action in a human-approval gate. Track cost per call from day one.
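The approval gate can be as simple as a queue that holds an action until a human signs off. A minimal sketch, assuming an in-memory store; the names (`PendingAction`, `ApprovalGate`) are illustrative, not a real API:

```python
# Minimal sketch of a human-approval gate, assuming an in-memory queue.
import uuid
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PendingAction:
    id: str
    tool: str
    args: dict
    status: str = "pending"   # pending -> approved -> executed (or rejected)

class ApprovalGate:
    """Irreversible tools enqueue here instead of executing directly."""

    def __init__(self) -> None:
        self.queue: dict[str, PendingAction] = {}

    def request(self, tool: str, args: dict) -> PendingAction:
        action = PendingAction(id=str(uuid.uuid4()), tool=tool, args=args)
        self.queue[action.id] = action
        return action             # surface this to a human; don't run it yet

    def approve(self, action_id: str, run: Callable[[dict], Any]) -> Any:
        action = self.queue[action_id]
        action.status = "approved"
        result = run(action.args)  # only now does the side effect happen
        action.status = "executed"
        return result
```

The point of the shape: the LLM can only *request* the irreversible action; a human holds the `approve` call.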

First, the words MCP & tool-calling

Tools are the menu. The LLM picks from the menu. It doesn't write its own recipes.

Tool-calling: instead of letting the LLM write SQL against raw tables (where it can hallucinate wrong joins or miss a filter and report "total revenue" for a subset), you give it a short list of named functions: get_revenue_by_period(start, end), list_at_risk_customers(threshold), compare_periods(a, b). The model picks one, fills in the arguments, and your code runs the actual query.

MCP — Model Context Protocol — is the standardized way to describe those tools to the model. Same format whether you're plugging into Claude, ChatGPT, Cursor, or a custom agent. Define the tool once, it works everywhere.
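Concretely, MCP describes each tool as a name, a description, and a JSON Schema for its inputs. A sketch of that shape (the field values are illustrative):

```python
# Sketch of an MCP-style tool descriptor: name, description, and a
# JSON Schema for the inputs. Values here are illustrative.
get_revenue_by_period = {
    "name": "get_revenue_by_period",
    "description": "Total revenue between two dates, via the semantic layer.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "start": {"type": "string", "format": "date"},
            "end":   {"type": "string", "format": "date"},
        },
        "required": ["start", "end"],
    },
}
```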

Behind each tool lives a semantic layer: a dictionary of named metrics with exactly one definition each. When the tool says "revenue," there's exactly one SQL behind it — not six slightly-different versions scattered across dashboards.
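One way to enforce "exactly one SQL per metric" is a plain registry: the tool can only resolve names that exist in it. A minimal sketch under that assumption; `METRICS` and `metric_sql` are hypothetical names:

```python
# Illustrative semantic-layer registry: one metric name maps to exactly
# one SQL definition. Unknown names fail loudly; no ad-hoc definitions.
METRICS: dict[str, str] = {
    "revenue": (
        "SELECT date_trunc('{grain}', order_date) AS period, "
        "SUM(amount) AS value FROM fct_orders GROUP BY 1"
    ),
}

def metric_sql(name: str, grain: str = "month") -> str:
    if name not in METRICS:
        raise KeyError(f"unknown metric: {name}")
    return METRICS[name].format(grain=grain)
```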

Why not just let the LLM write SQL?
Because it's a demo, not a product. On a real warehouse with 200 tables, the LLM guesses the join, picks the wrong date column, forgets a filter — and confidently returns a wrong number. Tools constrain the surface area to the queries you've already validated. Boring. Reliable. Shippable.
Figure · One tool call: user question to grounded answer. The flow runs USER → LLM → TOOL (MCP) → SEMANTIC LAYER → MARTS: the user asks "what's my open pipeline?", the LLM calls get_pipeline({ stage: "qualified" }), the tool resolves metric("pipeline").where({...}), the semantic layer runs SELECT … FROM fct_pipeline, and rows flow back as structured data plus a summary the LLM narrates, quoting from the result. The LLM never writes SQL; every call is logged and auditable forever; row-level access is enforced in the tool, not the model.
Rule 05

Think in verbs, not in tables.

Good tools are verbs the business uses: get_pipeline, list_at_risk_customers, compare_periods. Bad tools are thin wrappers around SQL: query_warehouse(sql). The first shape is constrained and auditable; the second is an LLM with a loaded gun.

From a recent build

One tool — find_stuck_deals — replaced a year of Slack back-and-forth.

One verb. Takes a stage and a staleness threshold. Returns deals, summary, and suggested next steps. Replaced a channel where people asked the sales ops lead the same question twice a day for a year.

Opinion

Every tool returns structured results with a natural-language summary.

The structured part is for downstream tools and renderers. The summary is what the LLM quotes back to the user. Forcing both keeps answers grounded — the LLM has to reconcile what it wants to say with what the data actually returned.

Example

A tool definition, end to end.

One tool. Strongly typed inputs. A query it actually runs (against marts, through the semantic layer). A result shape the model can narrate.

tools/get_pipeline.ts — shape, not vendor-specific
// Tool surface the LLM sees — names and types are the prompt.
export const getPipeline = defineTool({
  name: "get_pipeline",
  description:
    "Return open sales pipeline, optionally filtered by owner, " +
    "stage, or close-date range. Returns total value, deal count, " +
    "weighted forecast, and a short human-readable summary.",

  input: z.object({
    owner:     z.string().optional(),
    stage:     z.enum(["qualified", "proposal", "negotiation"]).optional(),
    close_by:  z.string().datetime().optional(),
  }),

  async run({ owner, stage, close_by }, ctx) {
    // 1. Enforce row-level access for the calling user.
    const scoped = await scopeToUser(ctx.user);

    // 2. Call the semantic layer — NOT raw SQL.
    const rows = await metric("pipeline").by("deal").where({
      owner, stage,
      close_date_lte: close_by,
      account_id_in:  scoped.accounts,
    });

    // 3. Shape it for both tool-chaining AND narration.
    const weighted = sum(rows, r => r.amount * r.stage_prob);
    return {
      structured: {
        total_value:        sum(rows, "amount"),
        deal_count:         rows.length,
        weighted_forecast:  weighted,
        deals:              rows,
      },
      summary:
        `${rows.length} open deals worth ` +
        `$${fmt(sum(rows, "amount"))}. ` +
        `Weighted forecast: $${fmt(weighted)}.`,
    };
  },
});
Heuristic

Ship 8–15 tools, not 80.

Every tool you add makes the model slower and more confused. A small, well-named set of verbs beats an exhaustive surface. When in doubt, compose existing tools instead of adding new ones.
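Composition here can mean a plain function that calls an existing tool twice, not a new query surface. A hypothetical sketch, where `get_metric` stands in for any already-shipped metric tool:

```python
# Hypothetical composition: a period comparison built on top of an
# existing metric tool instead of a new tool with its own SQL.
def compare_periods(get_metric, a: tuple[str, str], b: tuple[str, str]) -> dict:
    va, vb = get_metric(*a), get_metric(*b)
    return {
        "a": va,
        "b": vb,
        "delta": vb - va,
        "pct_change": (vb - va) / va if va else None,
    }
```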

Opinion

Log every tool call. Forever.

Input, output, user, timestamp. This is your audit trail, your debugging log, and your eval dataset — all at once. Tool-call logs are the single most valuable artifact a pipeline produces after six months. Keep them.
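A decorator is one cheap way to get this for every tool at once. A sketch, assuming tools take a `user` plus keyword args and return the structured-plus-summary shape above; `CALL_LOG` stands in for an append-only store, and the `cost_usd` field assumes your tool reports its own spend:

```python
# Sketch of a tool-call logging wrapper. CALL_LOG stands in for an
# append-only store; in production this never gets truncated.
import functools
import time

CALL_LOG: list[dict] = []

def logged(tool_name: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(user: str, **args):
            t0 = time.time()
            result = fn(user, **args)
            CALL_LOG.append({
                "tool": tool_name,
                "user": user,
                "input": args,
                "output_summary": result.get("summary"),
                "cost_usd": result.get("cost_usd"),  # hypothetical field
                "ts": t0,
                "duration_s": round(time.time() - t0, 3),
            })
            return result
        return inner
    return wrap
```

Six months in, these records double as your eval dataset: real questions, real arguments, real answers.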

Artifact tools/metrics.py ~22 lines · Python · MCP tool
# tools/metrics.py — what the agent sees
from typing import Literal
from semantic_layer import resolve_metric

def query_metric(
    metric: Literal["revenue", "orders", "cac", "ltv", "active_customers"],
    grain: Literal["day", "week", "month"],
    start_date: str,            # YYYY-MM-DD
    end_date: str,
    segment: str | None = None  # e.g. 'channel:paid_search'
) -> dict:
    """Query a defined business metric over a time window.

    Returns: {"metric", "grain", "rows": [{"period", "value", "source_tables": [...]}]}
    Every row cites the source tables that produced it.
    """
    return resolve_metric(metric, grain, start_date, end_date, segment).to_dict()

Load-bearing

One tightly-scoped tool that only queries pre-defined metrics beats letting the agent write raw SQL every time. The Literal enum prevents hallucinated metric names; the response shape is fixed; source-table citations are baked into the contract — not optional, not after-the-fact.