Apache Iceberg

In brief:

Apache Iceberg is an open table format for very large analytic tables. It adds ACID transactions, schema evolution, time travel, and flexible partitioning on top of Parquet/ORC files in object storage such as S3. It was created at Netflix, open-sourced in 2018, and is now an ASF top-level project. As of 2024 it is supported by Snowflake (Iceberg tables), BigQuery, Databricks, and AWS (S3 Tables, with native Iceberg support). Its main competitor is Delta Lake, which is led by Databricks.

Below: details, an example, and an FAQ.

Details

  • Metadata layer: tracks data files, partitions, and per-file statistics
  • ACID: atomic commits with snapshot isolation; supports the write-audit-publish pattern
  • Schema evolution: add or drop columns without rewriting data files
  • Time travel: query the table as of a specific snapshot or timestamp
  • Hidden partitioning: partition by a transform such as year(ts); queries filter on ts directly and never reference a partition column
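
To make the hidden-partitioning bullet more concrete, here is a minimal Python sketch. It is not Iceberg's implementation. It assumes the month transform maps a date to the number of months since 1970-01, as the Iceberg table spec describes. The names month_transform, prune_files, and the file names are hypothetical and exist only for illustration. The sketch shows how a filter on the raw date column could be translated into a filter on stored partition values, so the user never names a partition column.

```python
from datetime import date


def month_transform(d: date) -> int:
    """Months elapsed since 1970-01 (assumed to mirror Iceberg's month transform)."""
    return (d.year - 1970) * 12 + (d.month - 1)


# Hypothetical metadata: data file name -> partition value recorded for that file.
files = {
    "sales-a.parquet": month_transform(date(2026, 1, 15)),
    "sales-b.parquet": month_transform(date(2026, 2, 3)),
    "sales-c.parquet": month_transform(date(2026, 3, 9)),
}


def prune_files(files: dict[str, int], start: date, end: date) -> list[str]:
    """Keep only files whose month partition can contain rows in [start, end]."""
    lo, hi = month_transform(start), month_transform(end)
    return sorted(name for name, part in files.items() if lo <= part <= hi)


# A query filtering on date between 2026-02-10 and 2026-03-01 only needs two files.
selected = prune_files(files, date(2026, 2, 10), date(2026, 3, 1))
print(selected)  # ['sales-b.parquet', 'sales-c.parquet']
```

Pruning works at month granularity here, so sales-b.parquet is kept even though it may hold rows from before 2026-02-10; the engine would still apply the row-level filter when reading it.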

Example

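-- Spark SQL with an Iceberg catalog named prod
-- Hidden partitioning: rows are grouped by the month of the date column
CREATE TABLE prod.db.sales (
  id bigint,
  date date,
  amount decimal(18,2)
) USING iceberg
PARTITIONED BY (month(date));

-- Time travel: read the table as it was at the given timestamp (Spark 3.3 or later)
SELECT * FROM prod.db.sales
FOR TIMESTAMP AS OF '2026-03-01 00:00:00';

-- Schema evolution: a metadata-only change, existing data files are not rewritten
ALTER TABLE prod.db.sales ADD COLUMN region string;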
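
The time travel query above can be pictured as a lookup in the table's snapshot log. The Python sketch below is a toy model, not the Iceberg API: the Snapshot and ToyTable classes and the file names are hypothetical, and it assumes commits arrive in chronological order. It illustrates two ideas from this page: a commit appends a new immutable snapshot rather than editing an old one, and a timestamp resolves to the latest snapshot committed at or before it.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    data_files: tuple[str, ...]  # files visible to readers of this snapshot


class ToyTable:
    """Toy stand-in for a table's snapshot log (hypothetical, not the Iceberg API)."""

    def __init__(self) -> None:
        self.snapshots: list[Snapshot] = []

    def commit(self, committed_at: datetime, added: list[str]) -> Snapshot:
        # A commit never modifies earlier snapshots; it appends a new one.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, committed_at, previous + tuple(added))
        self.snapshots.append(snap)
        return snap

    def as_of(self, ts: datetime) -> Snapshot:
        # Resolve a timestamp to the latest snapshot committed at or before it.
        times = [s.committed_at for s in self.snapshots]
        idx = bisect_right(times, ts)
        if idx == 0:
            raise ValueError("no snapshot exists at or before the requested timestamp")
        return self.snapshots[idx - 1]


table = ToyTable()
table.commit(datetime(2026, 2, 1), ["f1.parquet"])
table.commit(datetime(2026, 3, 15), ["f2.parquet"])

old = table.as_of(datetime(2026, 3, 1))
print(old.snapshot_id, old.data_files)  # 1 ('f1.parquet',)
```

Because old snapshots are never mutated, a reader pinned to snapshot 1 keeps seeing only f1.parquet even after the second commit lands.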

Frequently asked questions

Iceberg vs Delta Lake?

Iceberg is open (governed by the ASF) and multi-engine (Spark, Trino, Flink, Snowflake). Delta Lake is Databricks-led and Spark-first. The two formats are converging from 2025 onward: Delta UniForm lets Iceberg clients read Delta tables.

Which query engines support it?

Apache Spark, Trino, Dremio, Snowflake, Starburst, Presto, DuckDB, Amazon Athena, and Google BigQuery. As of 2025, nearly every analytic engine can read Iceberg tables.

Is it reliable in production?

Yes. Netflix has run it at petabyte scale since 2019, and Apple, Expedia, Pinterest, and Adobe all use it. Its ACID guarantees and schema evolution are proven in production.