What is Log Aggregation

Key idea:

Log aggregation is the practice of collecting logs from multiple services into a central, searchable store. The reason is simple: grepping across 50 servers doesn't scale. Stack options: ELK (Elasticsearch + Logstash + Kibana) is powerful but expensive at scale; Loki (Grafana) is cheaper; Splunk is enterprise-priced; CloudWatch Logs and Datadog Logs are SaaS options. Critical features: search, alerting, retention, and correlation with traces.
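Concretely, an aggregated log record is usually a structured JSON line carrying the fields used for search and trace correlation. A minimal sketch (the field names `service`, `trace_id`, etc. are illustrative, not a standard):

```python
import json
from datetime import datetime, timezone

# One log record as a structured JSON line, ready for a central store.
# Field names (service, trace_id, ...) are illustrative, not a standard.
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "level": "ERROR",
    "service": "checkout",
    "host": "web-17",
    "trace_id": "4bf92f3577b34da6",  # ties this line to a distributed trace
    "message": "payment gateway timeout",
}
line = json.dumps(record)
print(line)
```

The `trace_id` field is what makes correlation with traces possible: the same ID appears in the tracing backend, so a search can pivot from a failed request to every log line it produced.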

Below: details, an example, and FAQ.

Details

  • Collector (node level): Filebeat, Fluent Bit, Vector, Promtail
  • Pipeline: parsing (JSON, multiline), enrichment (host, trace_id), routing
  • Storage: Elasticsearch (indexed, $$), Loki (chunks, $), S3 + Athena (archive, cheapest)
  • Retention: hot (7d, fast) + warm (30d, slower) + cold (1y+, S3)
  • Cost volatility: leaving DEBUG logs on in prod can mean 10× spend, so log-level discipline is critical

Example

# Fluent Bit config: tail nginx access logs and ship them to Loki
[INPUT]
    Name  tail
    Path  /var/log/nginx/access.log
    Tag   nginx.access

[OUTPUT]
    Name    loki
    Match   nginx.*
    Host    grafana-loki
    Port    3100
    Labels  service=nginx, host=${HOSTNAME}
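Once shipped, the logs can be queried by the labels set above, via Grafana or `logcli`. A LogQL sketch (`web-17` is a hypothetical hostname value):

```
# select by labels, then grep for 5xx responses at query time
{service="nginx", host="web-17"} |= " 500 "
```

Note the split: labels narrow the stream cheaply, and the `|=` line filter does the text matching only over what the labels selected.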

Frequently Asked Questions

ELK vs Loki?

ELK indexes full text, so search is fast but storage is expensive at scale. Loki indexes only Prometheus-style labels and greps log content at query time, which makes it roughly 10× cheaper. For high volume, pick Loki; for complex full-text search, ELK.

Cost control?

Sampling (e.g. drop 90% of INFO logs), log-level discipline (INFO/WARN/ERROR, not DEBUG, in prod), and short hot retention (TTL under 30 days).
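The sampling rule is easy to sketch. A hypothetical head-sampling policy (keep all WARN/ERROR, keep ~10% of INFO, drop DEBUG entirely):

```python
import random

def should_keep(level: str, info_rate: float = 0.1) -> bool:
    """Hypothetical head-sampling rule for log cost control."""
    if level in ("WARN", "ERROR"):
        return True            # never drop signal
    if level == "INFO":
        return random.random() < info_rate  # keep ~10% of INFO
    return False               # DEBUG never ships from prod

random.seed(0)
logs = ["INFO"] * 1000 + ["ERROR"] * 10
kept = [lvl for lvl in logs if should_keep(lvl)]
print(len(kept))  # ~110: all 10 ERROR plus roughly 100 sampled INFO
```

The key design choice is that sampling is level-aware: uniform sampling across all levels would drop errors, which defeats the purpose of aggregating logs in the first place.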

Centralise multi-region?

Ingest in the nearest region with async replication to a central store, or keep separate per-region stores and use federated search (Loki federation, CloudWatch cross-account).