Apache Avro — row-oriented binary serialization format from the Apache Software Foundation (first released 2009). A foundation for serialization in Kafka and other streaming systems. Key features: schema-first (schemas defined in JSON) plus schema evolution (fields can be added or removed without breaking readers, provided defaults are supplied). Compact wire format, fast serialization/deserialization. Used by: Confluent Schema Registry, Apache Pulsar, Airbyte.
Below: details, example, related terms, FAQ.
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
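Schema evolution in practice: a v2 of this schema can add a field, but backward compatibility requires the new field to carry a default so records written with v1 can still be read (the `country` field here is a hypothetical addition for illustration):

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null},
    {"name": "country", "type": "string", "default": "unknown"}
  ]
}
```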
# Python (fastavro)
import fastavro

schema = fastavro.parse_schema({"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"}, {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": None}]})
records = [{"id": 1, "email": "a@example.com", "age": None}]
with open('users.avro', 'wb') as out:
    fastavro.writer(out, schema, records)

Avro vs Protobuf? Avro keeps the schema with the message/file and resolves it dynamically; Protobuf compiles the schema into generated code. Protobuf is more type-safe, but Avro is the better fit for streaming pipelines whose schemas change over time.
Avro vs Parquet? Avro is row-oriented: streaming (Kafka), one message = one record. Parquet is columnar: batch analytics, fast column scans. Complementary: a common pattern is to stream in Avro and land data as Parquet for analytics.
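The row-vs-columnar distinction can be sketched in a few lines of plain Python (a conceptual illustration only, not the actual Avro or Parquet file layouts):

```python
# The same three records in the two physical layouts.
rows = [  # row-oriented (Avro-style): one record = one contiguous unit
    {"id": 1, "age": 34},
    {"id": 2, "age": 28},
    {"id": 3, "age": 41},
]

# column-oriented (Parquet-style): one field = one contiguous array
columns = {
    "id": [r["id"] for r in rows],
    "age": [r["age"] for r in rows],
}

# A streaming consumer appends/reads whole records, so rows win there.
# An analytic scan like avg(age) touches only one column, so columns win.
avg_age = sum(columns["age"]) / len(columns["age"])
print(avg_age)
```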
Do you need a Schema Registry? For production Kafka — yes. It enforces schema evolution compatibility rules and prevents breaking changes from reaching consumers. Run it via Confluent Cloud or self-hosted.
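The core backward-compatibility rule enforced for added fields can be sketched with a simplified check (real compatibility checking also covers removed fields, type promotion, and aliases; `is_backward_compatible` is a hypothetical helper, not a Registry API):

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Simplified rule: every field the new schema adds must carry a
    default, so readers on the new schema can still decode old data."""
    old_fields = {f["name"] for f in old_schema["fields"]}
    return all(
        "default" in f
        for f in new_schema["fields"]
        if f["name"] not in old_fields
    )

v1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"}]}
ok = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "age", "type": ["null", "int"], "default": None}]}
bad = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "age", "type": "int"}]}  # no default -> breaking change

print(is_backward_compatible(v1, ok))   # True
print(is_backward_compatible(v1, bad))  # False
```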