
What is CDC (Change Data Capture)

In short:

CDC (Change Data Capture) is a pattern for streaming database changes to downstream systems in real time. Instead of periodically running SELECT over an entire table, a log-based CDC tool reads the database's transaction log (Postgres WAL, MySQL binlog). Tools: Debezium (the most popular, built on Kafka Connect), AWS DMS, Maxwell, Airbyte. Use cases: syncing a DB to a search index (Elasticsearch), to a cache (Redis), or to a data warehouse (Snowflake), and event-driven architectures.
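To make the log-based idea concrete, here is a toy in-memory model. It is a minimal sketch, not how any real database stores its WAL: every write appends a record to a log, and a reader resumes from the last position it consumed, roughly as a Postgres replication slot tracks a confirmed LSN. The classes `ToyDatabase`, `LogRecord` and `LogReader` are hypothetical, and LSNs are simplified to a counter. Because deletes are log records too, the reader sees them without ever querying the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogRecord:
    lsn: int              # position in the log (simplified to a counter)
    op: str               # "insert" | "update" | "delete"
    table: str
    key: int
    row: Optional[dict]   # new row image; None for deletes


@dataclass
class ToyDatabase:
    tables: dict = field(default_factory=dict)
    log: List[LogRecord] = field(default_factory=list)

    def _append(self, op: str, table: str, key: int, row: Optional[dict]) -> None:
        # Every committed write is also appended to the log.
        self.log.append(LogRecord(len(self.log) + 1, op, table, key, row))

    def upsert(self, table: str, key: int, row: dict) -> None:
        t = self.tables.setdefault(table, {})
        op = "update" if key in t else "insert"
        t[key] = row
        self._append(op, table, key, dict(row))

    def delete(self, table: str, key: int) -> None:
        del self.tables[table][key]
        self._append("delete", table, key, None)


class LogReader:
    """Hypothetical CDC reader: resumes from the last confirmed position."""

    def __init__(self, db: ToyDatabase) -> None:
        self.db = db
        self.confirmed_lsn = 0

    def poll(self) -> List[LogRecord]:
        # Read only records after the confirmed position; no table scan.
        new = [r for r in self.db.log if r.lsn > self.confirmed_lsn]
        if new:
            # This sketch auto-confirms; a real consumer acknowledges after processing.
            self.confirmed_lsn = new[-1].lsn
        return new
```

For example, an insert, an update and a delete of the same key come back from `poll()` as three records in commit order, and a second `poll()` returns an empty list.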

Below: details, an example, related terms, FAQ.

Details

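  • Log-based: reads the binlog/WAL (low overhead; no queries against source tables while streaming, apart from an optional initial snapshot)
  • Trigger-based: triggers on INSERT/UPDATE/DELETE write each change to a separate change (outbox) table (higher overhead: every write does extra work inside its transaction)
  • Timestamp-based: polls an updated_at column (misses deletes, and intermediate updates between polls)
  • Debezium: Java-based, supports Postgres, MySQL, MongoDB, Oracle, SQL Server
  • Typical output: one Kafka topic per table, carrying JSON events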
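The difference between trigger-based and timestamp-based capture can be shown with Python's built-in `sqlite3` module as a stand-in database (an assumption for illustration; Postgres and MySQL have their own trigger syntax). The tables `users` and `users_changes` are hypothetical, and `updated_at` is a logical clock that the application sets itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    updated_at INTEGER NOT NULL
);
-- Trigger-based capture: every change is copied into a change table.
CREATE TABLE users_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    id INTEGER NOT NULL
);
CREATE TRIGGER users_ins AFTER INSERT ON users
BEGIN INSERT INTO users_changes (op, id) VALUES ('insert', NEW.id); END;
CREATE TRIGGER users_upd AFTER UPDATE ON users
BEGIN INSERT INTO users_changes (op, id) VALUES ('update', NEW.id); END;
CREATE TRIGGER users_del AFTER DELETE ON users
BEGIN INSERT INTO users_changes (op, id) VALUES ('delete', OLD.id); END;
""")


def poll_by_timestamp(conn, since):
    """Timestamp-based capture: fetch rows whose updated_at is newer than `since`."""
    return conn.execute(
        "SELECT id, name, updated_at FROM users WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()


# Hypothetical workload.
conn.execute("INSERT INTO users VALUES (1, 'Ann', 1)")
conn.execute("INSERT INTO users VALUES (2, 'Bob', 2)")
last_seen = max(row[2] for row in poll_by_timestamp(conn, 0))  # first poll sees both rows
conn.execute("UPDATE users SET name = 'Anna', updated_at = 3 WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 2")

polled = poll_by_timestamp(conn, last_seen)
changes = conn.execute("SELECT op, id FROM users_changes ORDER BY seq").fetchall()
print("polling sees:", polled)  # only the update of id=1; the deleted row is simply gone
print("trigger log:", changes)  # insert, insert, update, delete
```

The polling query can only return rows that still exist, so the delete of `id=2` never shows up, while the trigger log records it. The price is that every INSERT/UPDATE/DELETE on `users` now performs a second write.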

Example

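# Debezium Kafka Connect config for Postgres (the server must run with wal_level=logical)
# database.user / database.password values below are placeholders
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "plugin.name": "pgoutput",
  "database.hostname": "pg.internal",
  "database.port": "5432",
  "database.user": "debezium",
  "database.password": "change-me",
  "database.dbname": "mydb",
  "slot.name": "debezium_slot",
  "publication.name": "debezium_pub",
  "topic.prefix": "mydb"
}
# Output: Kafka topics mydb.public.users, mydb.public.orders, ...
# (naming: <topic.prefix>.<schema>.<table>)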
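Downstream, a consumer turns these events into, say, cache updates. The sketch below assumes Debezium's JSON converter, where each event value carries `op` (`c` create, `u` update, `d` delete, `r` snapshot read) plus `before`/`after` row images, wrapped in a `payload` field when schemas are enabled. Reading from Kafka is left out: `apply_change` is a hypothetical helper that takes the record value as a string, a plain dict stands in for Redis, and the primary key column is assumed to be named `id`.

```python
import json
from typing import Optional


def apply_change(cache: dict, value_json: Optional[str]) -> None:
    """Apply one Debezium change event (Kafka record value as JSON) to a dict cache."""
    if value_json is None:
        # Tombstone (null value) emitted after a delete for log compaction: nothing to do.
        return
    value = json.loads(value_json)
    # With schemas enabled the event is {"schema": ..., "payload": {...}}.
    payload = value.get("payload", value)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, update: store the new row image.
        row = payload["after"]
        cache[row["id"]] = row
    elif op == "d":
        # Delete: the old row image (at least its key columns) is in "before".
        cache.pop(payload["before"]["id"], None)
    # Other ops (e.g. truncate) are ignored in this sketch.


cache: dict = {}
apply_change(cache, '{"payload": {"op": "c", "before": null, "after": {"id": 1, "email": "a@example.com"}}}')
apply_change(cache, '{"payload": {"op": "u", "before": null, "after": {"id": 1, "email": "b@example.com"}}}')
print(cache)
apply_change(cache, '{"payload": {"op": "d", "before": {"id": 1}, "after": null}}')
apply_change(cache, None)  # the tombstone that follows a delete
print(cache)
```

With Postgres' default replica identity, `before` on a delete holds only the primary key columns, which is all this helper needs. By default Debezium also follows each delete with a tombstone so Kafka log compaction can drop the key; the helper skips it.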

Related terms


Frequently asked questions

CDC vs Event Sourcing?

CDC captures changes from an existing database, transparently to the applications. In Event Sourcing, the event log itself is the database (the application writes events). The two complement each other; they are not synonyms.
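A small sketch of the distinction, using a hypothetical bank-account example: with Event Sourcing the application appends domain events and derives state by replaying them; with CDC the application only updates a row, and the change event is derived afterwards as before/after row images. The names `Event`, `replay_balance` and `row_change` are illustrative, and the dict shape is a simplified, Debezium-like envelope.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    kind: str    # "Deposited" | "Withdrawn"
    amount: int


def replay_balance(events: List[Event]) -> int:
    """Event Sourcing: the event log is the source of truth; state is a replay."""
    balance = 0
    for e in events:
        balance += e.amount if e.kind == "Deposited" else -e.amount
    return balance


def row_change(before: dict, after: dict) -> dict:
    """CDC: the app just updated a row; the event only carries row images."""
    return {"op": "u", "before": before, "after": after}


history = [Event("Deposited", 100), Event("Withdrawn", 30)]
print(replay_balance(history))
print(row_change({"id": 1, "balance": 100}, {"id": 1, "balance": 70}))
```

The CDC event shows that the balance went from 100 to 70 but not that a withdrawal caused it; that intent lives only in the event-sourced log.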

Is Debezium production-ready?

Yes. It is backed by Red Hat, and Netflix, WePay and Shopify run it in production. The main gotcha: schema changes on source tables require careful handling.

What about polling as an alternative?

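Polling is simpler to set up, but it misses deletes, puts a higher load on the DB, and adds latency. Log-based Debezium issues no SELECTs against the source tables once the initial snapshot is done.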