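Data Lake: storage for raw data in any format (JSON, CSV, Parquet, Avro) on object storage or a distributed file system (S3, GCS, HDFS). Schema-on-read: structure is applied when the data is queried.
Data Warehouse: structured data optimised for analytical queries (Snowflake, BigQuery, Redshift). Schema-on-write: data must match the table schema when it is loaded.
Lakehouse (Databricks, Iceberg): a hybrid that keeps data in lake storage and adds warehouse-style tables and a SQL query engine on top.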
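The difference between schema-on-write and schema-on-read can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how any of the products named above work internally: the `Warehouse` and `Lake` classes, the `ORDER_SCHEMA` dictionary, and the `conforms` helper are all hypothetical names invented for this sketch.

```python
import json

# Hypothetical schema for an "orders" record: field name -> required Python type.
ORDER_SCHEMA = {"date": str, "country": str, "revenue": float}


def conforms(record: dict) -> bool:
    """Return True if the record has exactly the fields and types the schema expects."""
    return (
        record.keys() == ORDER_SCHEMA.keys()
        and all(isinstance(record[k], t) for k, t in ORDER_SCHEMA.items())
    )


class Warehouse:
    """Schema-on-write: a record is validated before it is stored."""

    def __init__(self) -> None:
        self.rows = []

    def write(self, record: dict) -> None:
        if not conforms(record):
            raise ValueError(f"rejected at write time: {record!r}")
        self.rows.append(record)

    def read(self) -> list:
        # Everything stored is already known to match the schema.
        return list(self.rows)


class Lake:
    """Schema-on-read: raw lines are stored untouched and interpreted when queried."""

    def __init__(self) -> None:
        self.raw_lines = []

    def write(self, raw_line: str) -> None:
        self.raw_lines.append(raw_line)  # no validation at write time

    def read(self) -> list:
        rows = []
        for line in self.raw_lines:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed input only surfaces at read time
            if isinstance(record, dict) and conforms(record):
                rows.append(record)
        return rows
```

In this sketch the lake accepts anything, including a line that is not valid JSON, and only the reader discovers which records are usable. The warehouse refuses a non-conforming record at load time, so readers never see it.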
-- Snowflake (Data Warehouse) — structured SQL
SELECT date, country, SUM(revenue)
FROM orders
WHERE date >= '2026-01-01'
GROUP BY 1, 2;
-- Data Lake + Iceberg + Trino (Lakehouse) — same SQL
-- over Parquet files in S3
SELECT date, country, SUM(revenue)
FROM iceberg.prod.orders
WHERE date >= DATE '2026-01-01'
GROUP BY 1, 2;

Lake: raw or semi-structured data at petabyte scale, for example ML training data. Warehouse: structured data behind BI dashboards that need fast queries. Lakehouse: a general-purpose compromise between the two.
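Lake storage on S3 costs roughly $0.02 per GB per month. Snowflake warehouse compute is billed at about $2-4 per credit. For 10 TB, that works out to roughly $200 per month for lake storage alone, versus about $4k per month for warehouse storage plus compute.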
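The arithmetic behind these figures can be checked with a short sketch. It uses only the approximate prices quoted above. The function names are hypothetical, 1 TB is assumed to be 1,000 GB, and the credit calculation is an upper bound that pretends the whole warehouse bill goes to compute, since the text gives no separate warehouse storage price.

```python
# Approximate prices quoted in the text; real pricing varies by region, tier, and edition.
S3_USD_PER_GB_MONTH = 0.02
CREDIT_USD_RANGE = (2.0, 4.0)      # price per Snowflake credit, low and high
WAREHOUSE_USD_PER_MONTH = 4000.0   # quoted storage + compute estimate for 10 TB
GB_PER_TB = 1000                   # assumption: decimal terabytes


def lake_storage_usd(terabytes: float) -> float:
    """Monthly lake storage cost, excluding any query compute."""
    return terabytes * GB_PER_TB * S3_USD_PER_GB_MONTH


def credits_affordable(budget_usd: float) -> tuple:
    """Credits a monthly budget buys at the high and at the low credit price."""
    low_price, high_price = CREDIT_USD_RANGE
    return budget_usd / high_price, budget_usd / low_price


if __name__ == "__main__":
    lake = lake_storage_usd(10)
    fewest, most = credits_affordable(WAREHOUSE_USD_PER_MONTH)
    print(f"Lake storage for 10 TB: ${lake:,.0f} per month")
    print(f"Warehouse budget buys {fewest:,.0f} to {most:,.0f} credits per month")
    print(f"Warehouse / lake cost ratio: {WAREHOUSE_USD_PER_MONTH / lake:.0f}x")
```

The comparison is not like for like: the lake figure covers storage only, so a query engine running over the lake adds its own compute cost on top.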
A lake without governance turns into a data swamp. Remedies: a data catalog (AWS Glue, Unity Catalog) and a table format such as Iceberg for ACID transactions and schema evolution.