
Apache Avro

Key idea:

Apache Avro — a row-oriented binary data format developed at the Apache Software Foundation (2009). A foundation for serialization in Kafka and other streaming systems. Key features: schema-first design (schemas defined in JSON) and schema evolution (fields can be added or removed backward-compatibly). Compact wire format, fast serialize/deserialize. Used by: Confluent's Kafka Schema Registry, Apache Pulsar, Airbyte.

Below: details, example, related terms, FAQ.


Details

  • Row-oriented (vs Parquet columnar) — better for streaming, individual row writes
  • Schema: JSON-defined, embedded in each Avro file's header; Kafka messages typically carry only a short schema ID (resolved via Schema Registry) rather than the full schema
  • Schema Registry: shared central schemas for Kafka (Confluent standard)
  • Evolution: add optional fields backward-compatible
  • Languages: Java, Python, C++, Go, Rust, JavaScript

Example

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}

# Python (fastavro)
import fastavro

# parse the User schema defined above
schema = fastavro.parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "age", "type": ["null", "int"], "default": None},
    ],
})
records = [{"id": 1, "email": "alice@example.com", "age": None}]
with open('users.avro', 'wb') as out:
    fastavro.writer(out, schema, records)


Frequently Asked Questions

Avro vs Protobuf?

Avro keeps the schema with the data (in the file header, or referenced by a registry ID) and resolves it dynamically at read time. Protobuf compiles the schema into generated code. Protobuf offers stronger compile-time type safety; Avro handles streaming pipelines with frequently changing schemas more gracefully.

When Avro vs Parquet?

Avro: streaming (Kafka), one message = one record, efficient row-at-a-time writes. Parquet: batch analytics, columnar scans and compression. Complementary — a common pattern is ingesting as Avro, then storing as Parquet for analytics.

Need Schema Registry?

For production Kafka — yes. It enforces schema-compatibility rules and stops breaking changes before they reach consumers. Run it via Confluent Cloud or self-host it.