Apache Avro — a row-oriented binary data format, developed under the ASF (2009). The standard serialization format in Kafka and other streaming systems. Key features: schema-first (schemas defined in JSON) + schema evolution (fields can be added/dropped backward-compatibly). Compact wire format, fast serialize/deserialize. Used by: Confluent Kafka Schema Registry, Apache Pulsar, Airbyte.
Below: details, an example, related terms, FAQ.
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
    {"name": "age", "type": ["null", "int"], "default": null}
  ]
}
# Python (fastavro)
import fastavro

schema = fastavro.parse_schema(user_schema)  # user_schema: the JSON record definition above, as a Python dict
records = [{"id": 1, "email": "a@example.com", "age": None}]
with open('users.avro', 'wb') as out:
    fastavro.writer(out, schema, records)

Avro: schema travels with the message/file, resolved dynamically. Protobuf: schema compiled into code. Protobuf is more type-safe, but Avro handles streaming with changing schemas better.
Avro: streaming (Kafka), one message = one record. Parquet: batch analytics, columnar scans. They complement each other.
For production Kafka — yes. A registry enforces schema evolution rules and prevents breaking changes from being published. Run it via Confluent Cloud or self-hosted.