What is CDC (Change Data Capture)

Key idea:

CDC (Change Data Capture) is a pattern for streaming database changes to downstream systems in near real time. Instead of periodically running SELECT over the whole table, a CDC tool reads the database's transaction log (Postgres WAL, MySQL binlog). Tools: Debezium (widely used, built on Kafka Connect), AWS DMS, Maxwell, Airbyte. Use cases: syncing a DB to a search index (Elasticsearch), a cache (Redis), or a data warehouse (Snowflake), and feeding event-driven architectures.

Below: details, example, FAQ.

Details

  • Log-based: reads the binlog/WAL (low overhead, no query load on the source tables)
  • Trigger-based: triggers on INSERT/UPDATE/DELETE write each change to an outbox table (higher overhead, since every write does extra work)
  • Timestamp-based: polls an updated_at column (misses deletes, and intermediate updates between polls)
  • Debezium: Java-based, supports Postgres/MySQL/MongoDB/Oracle/SQL Server
  • Typical output: one Kafka topic per table, carrying JSON change events
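
To make the "JSON events" output concrete, here is a minimal sketch of a consumer that applies Debezium-style change events to an in-memory cache. It assumes the usual Debezium envelope fields (op, before, after). The function name apply_change, the key column id, and the sample events are illustrative, not taken from a real deployment, and a plain dict stands in for Redis.

```python
import json

def apply_change(cache, raw_event, key_field="id"):
    """Apply one Debezium-style change event (a JSON string) to a dict cache."""
    if raw_event is None:
        return  # Kafka tombstone record (null value): nothing to apply
    event = json.loads(raw_event)
    # With schemas enabled in the JSON converter, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]  # "c" = create, "u" = update, "d" = delete, "r" = snapshot read
    if op in ("c", "u", "r"):
        row = payload["after"]
        cache[row[key_field]] = row
    elif op == "d":
        row = payload["before"]  # the deleted row (at least its key)
        cache.pop(row[key_field], None)

# Hypothetical events, in the order they would arrive on one topic partition.
events = [
    '{"op": "c", "before": null, "after": {"id": 1, "email": "a@example.com"}}',
    '{"op": "u", "before": {"id": 1, "email": "a@example.com"}, "after": {"id": 1, "email": "b@example.com"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "email": "c@example.com"}}',
    '{"op": "d", "before": {"id": 2, "email": "c@example.com"}, "after": null}',
]

cache = {}
for raw in events:
    apply_change(cache, raw)

print(cache)  # {1: {'id': 1, 'email': 'b@example.com'}}
```

Unlike timestamp polling, the delete arrives as an explicit event, so the cache can drop row 2.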

Example

# Debezium Kafka Connect config for Postgres
# (port, user, and password values are placeholders)
{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "plugin.name": "pgoutput",
  "database.hostname": "pg.internal",
  "database.port": "5432",
  "database.user": "debezium",
  "database.password": "<password>",
  "database.dbname": "mydb",
  "slot.name": "debezium_slot",
  "publication.name": "debezium_pub",
  "topic.prefix": "mydb"
}
# Output: Kafka topics mydb.public.users, mydb.public.orders, ...
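
A config like this is usually submitted to a Kafka Connect worker through its REST API (POST /connectors with a name and a config map). The sketch below only builds the request. The worker URL http://localhost:8083 and the connector name mydb-connector are assumptions, and the actual network call is left commented out.

```python
import json
import urllib.request

def build_connector_request(connect_url, name, config):
    """Build a POST /connectors request for the Kafka Connect REST API."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Same keys as the example config. A real deployment would also need
# connection credentials (database.user, database.password), omitted here.
config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "pg.internal",
    "database.dbname": "mydb",
    "slot.name": "debezium_slot",
    "publication.name": "debezium_pub",
    "topic.prefix": "mydb",
}

# Assumed worker address and connector name, for illustration only.
request = build_connector_request("http://localhost:8083", "mydb-connector", config)
print(request.get_method(), request.full_url)

# To send it to a running worker, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```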

Frequently Asked Questions

CDC vs Event Sourcing?

CDC captures changes from an existing DB and is transparent to apps. With Event Sourcing (ES), the event log itself is the source of truth, and the app writes events explicitly. The two are complementary, not synonyms.
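
A toy sketch of the difference, with hypothetical names throughout (Deposited, Withdrawn, replay, capture): in Event Sourcing the app appends events and derives state by replaying them, while in CDC the app just overwrites a row and a capture layer derives the change record.

```python
from dataclasses import dataclass

# Event Sourcing: the app writes events; current state is a fold over the log.
@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def replay(events):
    """Rebuild the balance by replaying every event from the start."""
    balance = 0
    for event in events:
        if isinstance(event, Deposited):
            balance += event.amount
        elif isinstance(event, Withdrawn):
            balance -= event.amount
    return balance

event_log = [Deposited(100), Withdrawn(30)]
es_balance = replay(event_log)

# CDC: the app only updates the row; the change record is derived
# from the before/after images, without the app knowing about it.
def capture(before, after):
    """Stand-in for what a log reader emits for an UPDATE."""
    return {"op": "u", "before": before, "after": after}

row_before = {"id": 1, "balance": 100}
row_after = {**row_before, "balance": 70}
change = capture(row_before, row_after)

print(es_balance)                  # 70
print(change["after"]["balance"])  # 70
```

The ES log records why the balance changed (a withdrawal); the CDC record only shows that the row changed from 100 to 70.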

Is Debezium production-ready?

Yes. The project has Red Hat backing, and companies reported to run it in production include Netflix, WePay, and Shopify. The main gotcha is schema changes, which require careful handling.

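Is polling an alternative?

Polling is simpler to set up, but it misses deletes, puts recurring query load on the DB, and adds latency of up to one polling interval. Debezium is log-based: after the initial snapshot it streams from the transaction log and runs no polling SELECTs against the source tables.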
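
The missed-delete problem is easy to reproduce. The sketch below uses an in-memory SQLite table as a stand-in for the source DB. The users table, the poll helper, and the integer "timestamps" are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ann", 10), (2, "bob", 11)],
)

def poll(conn, since):
    """Return rows changed after `since`, as a timestamp-based poller would."""
    return conn.execute(
        "SELECT id, name, updated_at FROM users WHERE updated_at > ? ORDER BY id",
        (since,),
    ).fetchall()

# First poll sees both rows; remember the high-water mark.
first = poll(conn, 0)
last_seen = max(row[2] for row in first)

# Between polls: one update and one delete.
conn.execute("UPDATE users SET name = 'anna', updated_at = 12 WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 2")

# Second poll picks up the update, but the deleted row simply is not there.
second = poll(conn, last_seen)
print(second)  # [(1, 'anna', 12)]
```

A downstream copy fed only by these polls would keep user 2 forever, which is why polling setups often fall back to soft deletes (a deleted flag) or periodic full reconciliation.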