dbt (data build tool)

In short:

dbt is a tool for transforming data inside a warehouse using SQL. The paradigm: you define models as SQL SELECT statements; dbt builds a DAG from the ref() calls between them, materializes each model as a table or view, runs tests, and generates documentation. It is a core part of what is called the "modern data stack". It ships as the open-source dbt-core and the SaaS product dbt Cloud. Used by Airbnb, Monzo, HelloFresh, and thousands of startup data teams.

Below: details, an example, and frequently asked questions.

Details

  • Models: .sql files, one per table/view
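  • ref(): lineage-aware references between models, from which dbt derives the DAG automatically
  • Materializations: view / table / incremental / ephemeral (snapshots are a separate resource type for tracking row history)
  • Tests: not_null, unique, accepted_values, relationships, custom
  • Docs: auto-generated, with column descriptions and a lineage graph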
  • Adapters: Snowflake, BigQuery, Redshift, Postgres, DuckDB, ClickHouse
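
The "custom" entry in the tests bullet can be sketched as a singular test: a SQL file in the tests/ directory that selects the rows violating an assertion, so the test passes when the query returns zero rows. This is a minimal sketch, not a definitive implementation; the file name is hypothetical, and it assumes a model named orders_summary with day and revenue columns.

```sql
-- tests/assert_revenue_not_negative.sql (hypothetical file name)
-- Singular test: dbt treats every returned row as a failure.
-- Assumes a model called orders_summary with day and revenue columns.
SELECT
  day,
  revenue
FROM {{ ref('orders_summary') }}
WHERE revenue < 0
```

Running dbt test executes this query together with the generic tests declared in YAML.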

Example

-- models/orders_summary.sql
{{ config(materialized='table') }}

SELECT
  DATE_TRUNC('day', order_date) AS day,
  COUNT(*) AS orders,
  SUM(amount) AS revenue
FROM {{ ref('orders') }}
WHERE status = 'completed'
GROUP BY 1

# models/schema.yml
models:
  - name: orders_summary
    columns:
      - name: day
        tests: [not_null, unique]
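
To illustrate how ref() turns separate files into a DAG, here is a sketch of a downstream model that reads from orders_summary. The model name weekly_revenue is hypothetical, and DATE_TRUNC with a string unit assumes the same SQL dialect as the example above (for example Snowflake, Postgres, or DuckDB).

```sql
-- models/weekly_revenue.sql (hypothetical downstream model)
{{ config(materialized='view') }}

-- ref('orders_summary') tells dbt that this model depends on orders_summary,
-- so orders_summary is always built first.
SELECT
  DATE_TRUNC('week', day) AS week,
  SUM(orders) AS orders,
  SUM(revenue) AS revenue
FROM {{ ref('orders_summary') }}
GROUP BY 1
```

With this file in place, dbt run --select +weekly_revenue builds weekly_revenue together with everything upstream of it.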

Frequently asked questions

dbt Core vs Cloud?

Core: free, CLI-based, self-hosted. Cloud: SaaS with a web IDE, scheduling, docs hosting, and CI/CD, at roughly $100-200 per developer per month. For small teams, Core plus Airflow is the usual choice; for enterprises, Cloud.

Alternatives?

SQLMesh (newer, written in Python), Apache Airflow tasks, Dataform (Google). For pipelines that are not SQL-based: Fivetran or Airbyte for extract and load, plus Python for transformation.

Incremental models?

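Set materialized='incremental' and, usually, a unique_key. dbt does not detect changed rows on its own: the model filters for new or changed rows inside an is_incremental() block, and dbt then inserts those rows, updating existing rows that match unique_key (via MERGE or delete+insert, depending on the adapter). On large tables this can cut warehouse cost substantially compared with a full refresh.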
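
A minimal sketch of an incremental model follows. The model name and the order_id and updated_at columns are assumptions made for illustration; only the orders model and its order_date, status, and amount columns appear in the example above.

```sql
-- models/orders_incremental.sql (hypothetical model name)
{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

SELECT
  order_id,      -- assumed primary key of orders
  order_date,
  status,
  amount,
  updated_at     -- assumed last-modified timestamp
FROM {{ ref('orders') }}

{% if is_incremental() %}
  -- Runs only when the target table already exists and no full refresh
  -- was requested: keep rows modified since the last build.
  WHERE updated_at > (
    SELECT COALESCE(MAX(updated_at), '1900-01-01') FROM {{ this }}
  )
{% endif %}
```

On the first run the filter is skipped and the whole table is built; later runs process only the filtered rows. dbt run --full-refresh rebuilds the table from scratch when the logic changes.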