dbt (Data Build Tool)

Key idea:

dbt is a tool for transforming data inside a warehouse with SQL. You define models as SQL SELECT statements; dbt builds the dependency graph (DAG) from the references between them, materialises each model as a table or view, runs tests, and generates documentation. It is a core part of what is called the "modern data stack". It ships as open-source dbt Core and as the SaaS product dbt Cloud. Used by: Airbnb, Monzo, HelloFresh, thousands of startup data teams.

Details

  • Models: .sql files, one per table/view
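  • ref(): references another model by name; dbt resolves it to the real table and uses these references to build the DAG automatically
  • Materialisations: view / table / incremental / ephemeral (snapshots are a separate resource type for tracking row history)
  • Tests: not_null, unique, accepted_values, relationships, custom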
  • Docs: auto-generated with column descriptions + lineage graph
  • Adapters: Snowflake, BigQuery, Redshift, Postgres, DuckDB, ClickHouse
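
To make the ref() bullet concrete, here is a minimal sketch of two models in separate files. The model names completed_orders and customer_revenue, and the columns order_id and customer_id, are hypothetical; an upstream model or seed named orders is assumed to exist.

```sql
-- models/completed_orders.sql (hypothetical model)
-- ref('orders') tells dbt that this model depends on orders.
SELECT order_id, customer_id, amount
FROM {{ ref('orders') }}
WHERE status = 'completed'

-- models/customer_revenue.sql (hypothetical model, separate file)
-- ref('completed_orders') adds the edge completed_orders -> customer_revenue.
SELECT customer_id, SUM(amount) AS revenue
FROM {{ ref('completed_orders') }}
GROUP BY customer_id
```

From these references alone dbt can order the builds: orders, then completed_orders, then customer_revenue. Running `dbt run --select +customer_revenue` builds the final model together with everything upstream of it.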

Example

-- models/orders_summary.sql
{{ config(materialized='table') }}

SELECT
  DATE_TRUNC('day', order_date) AS day,
  COUNT(*) AS orders,
  SUM(amount) AS revenue
FROM {{ ref('orders') }}
WHERE status = 'completed'
GROUP BY 1

# schema.yml
models:
  - name: orders_summary
    columns:
      - name: day
        tests: [not_null, unique]
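
The tests in schema.yml are generic tests. A custom check can also be written as a singular test: a SQL file under tests/ that selects the rows violating a rule. The sketch below assumes revenue should never be negative; the file name is hypothetical.

```sql
-- tests/assert_orders_summary_revenue_non_negative.sql (hypothetical file name)
-- A singular test passes when the query returns zero rows;
-- every row it returns is reported as a failure.
SELECT
  day,
  revenue
FROM {{ ref('orders_summary') }}
WHERE revenue < 0
```

`dbt test` runs this check alongside the not_null and unique tests declared in schema.yml.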

Frequently Asked Questions

dbt Core vs Cloud?

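Core: free, open-source CLI that you host and schedule yourself. Cloud: SaaS that adds a web IDE, scheduling, docs hosting, and CI/CD, at roughly $100-200/dev/mo. A common split: small teams run Core with an orchestrator such as Airflow; enterprises use Cloud.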

Alternatives?

SQLMesh (newer; supports SQL and Python models), hand-written SQL tasks in Apache Airflow, Dataform (Google). For pipelines that are not SQL-based: Fivetran or Airbyte for loading, plus Python for transformation.

Incremental models?

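Set materialized="incremental" and a unique_key. The first run builds the full table. On later runs dbt processes only the rows selected by a filter you write yourself (usually inside an is_incremental() block), then inserts new rows and updates existing ones, matched on unique_key. dbt does not detect changed rows on its own; the filter defines what counts as new. On large tables this can cost far less than a full refresh.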
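
A minimal sketch of an incremental model follows. It assumes the upstream orders model has an order_id key and an updated_at timestamp; both column names and the model name are hypothetical.

```sql
-- models/orders_incremental.sql (hypothetical model)
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT
  order_id,
  status,
  amount,
  updated_at
FROM {{ ref('orders') }}

{% if is_incremental() %}
  -- Rendered only on incremental runs: keep rows newer than the latest
  -- row already loaded. {{ this }} refers to the existing target table.
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

On the first run, or with `dbt run --full-refresh`, the filter is skipped and the table is rebuilt from scratch. How the update is executed (for example merge or delete+insert) depends on the adapter and the configured incremental strategy.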