Log aggregation: the practice of collecting logs from multiple services into a central, searchable store. Reason: grepping across 50 servers doesn't scale. Stack options: ELK (Elasticsearch + Logstash + Kibana), powerful but expensive; Loki (Grafana), cheaper; Splunk (enterprise, $$$); CloudWatch Logs or Datadog Logs (SaaS). Critical features: search, alerts, retention, correlation with traces.
Below: details, example, related terms, FAQ.
# Fluent Bit config: tail the nginx access log and ship it to Loki
[INPUT]
    Name   tail
    Path   /var/log/nginx/access.log
[OUTPUT]
    Name   loki
    Match  *
    Host   grafana-loki
    Port   3100
    Labels host=${HOSTNAME}, service=nginx

ELK: full-text indexed, fast search, expensive at scale. Loki: indexes only Prometheus-like labels and greps line content at query time, roughly 10× cheaper. For high volume: Loki. For complex search: ELK.
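The indexing trade-off between the two can be sketched in a few lines of Python. This is a toy model, not how Elasticsearch or Loki are implemented: the class names `FullTextStore` and `LabelStore` are hypothetical, and the point is only that one store pays at write time (every token indexed) while the other pays at query time (scan the matching streams).

```python
from collections import defaultdict


class FullTextStore:
    """ELK-style toy: every token is indexed when the line is written."""

    def __init__(self):
        self.lines = []
        self.index = defaultdict(set)  # token -> ids of lines containing it

    def ingest(self, labels, line):
        line_id = len(self.lines)
        self.lines.append(line)
        for token in line.lower().split():
            self.index[token].add(line_id)

    def search(self, token):
        # Index lookup only; no line content is scanned.
        ids = self.index.get(token.lower(), ())
        return [self.lines[i] for i in sorted(ids)]


class LabelStore:
    """Loki-style toy: only labels are indexed, content is grepped on query."""

    def __init__(self):
        self.streams = defaultdict(list)  # frozen label set -> raw lines

    def ingest(self, labels, line):
        self.streams[frozenset(labels.items())].append(line)

    def search(self, selector, needle):
        wanted = set(selector.items())
        hits = []
        for stream_labels, lines in self.streams.items():
            if wanted <= stream_labels:  # label match narrows the scan
                hits.extend(l for l in lines if needle in l)
        return hits
```

In the toy, the full-text index grows with the number of distinct tokens, while the label index grows only with the number of distinct label sets. That is the intuition behind the cost difference, and also why keeping label cardinality low matters for a label-indexed store.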
Keeping volume and cost down: sampling (e.g. drop 90% of INFO logs), log-level discipline (INFO/WARN/ERROR in prod, no DEBUG), and a TTL (keep less than 30 days in hot storage).
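Sampling can also be applied inside the application, before a log line ever reaches the shipper. The following is a minimal sketch using Python's standard `logging` module; `InfoSampler` and `build_logger` are hypothetical names, and the 10% keep ratio simply mirrors the "drop 90% of INFO" figure above.

```python
import logging
import random


class InfoSampler(logging.Filter):
    """Keep every WARNING-or-higher record, but only a fraction of INFO."""

    def __init__(self, keep_ratio=0.1, rng=None):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.rng = rng if rng is not None else random.Random()

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True  # never sample away warnings or errors
        if record.levelno == logging.INFO:
            return self.rng.random() < self.keep_ratio
        return False  # DEBUG and below are dropped entirely


def build_logger(name="app", keep_ratio=0.1, rng=None, handler=None):
    """Return a logger whose handler samples INFO records."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # log-level discipline: no DEBUG in prod
    logger.propagate = False
    if handler is None:
        handler = logging.StreamHandler()
    handler.addFilter(InfoSampler(keep_ratio, rng))
    logger.addHandler(handler)
    return logger
```

Random sampling keeps the example short, but it loses the ability to follow a single request end to end. A common alternative is to sample per request or trace ID, so that either all or none of a request's INFO lines are kept.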
Multi-region: ingest in the nearest region and replicate asynchronously. Alternatively, keep a separate store per region and query them through federated search (Loki federation, CloudWatch cross-account).
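The federated-search option boils down to fanning a query out to each regional store and merging the results by timestamp. A minimal sketch of that merge step follows, assuming each backend is a plain callable that returns matching entries; `LogEntry` and `federated_search` are hypothetical names, not the API of Loki or CloudWatch.

```python
import heapq
from typing import Callable, Dict, Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: float  # seconds since epoch; first field, so it drives sorting
    region: str
    line: str


Backend = Callable[[str], Iterable[LogEntry]]


def federated_search(backends: Dict[str, Backend], query: str) -> List[LogEntry]:
    """Query every regional store and return one time-ordered result list."""
    per_region = []
    for search in backends.values():
        # Sort defensively; a real store would normally return ordered results.
        per_region.append(sorted(search(query)))
    # heapq.merge combines already-sorted inputs without re-sorting everything.
    return list(heapq.merge(*per_region))
```

The sketch queries regions one after another. In practice the calls would likely run in parallel with a per-region timeout, so that one slow or unreachable region degrades the result instead of blocking it.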