Observability: the ability to understand a system's internal state from its external outputs. Three pillars: **metrics** (numbers over time, e.g. CPU usage, QPS), **logs** (discrete events, e.g. errors, audit trail), **traces** (the path of one request across distributed services). Difference from monitoring: monitoring = watching known unknowns (you already know high CPU is a problem). Observability = exploring unknown unknowns (a new type of bug nobody anticipated).
Below: details, example, related terms, FAQ.
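```javascript
// OpenTelemetry-instrumented code
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-app');
const span = tracer.startSpan('db-query');
try {
  await db.query('SELECT ...'); // `db` is the application's existing database client
} catch (err) {
  span.recordException(err); // attach the failure to the span before rethrowing
  span.setStatus({ code: SpanStatusCode.ERROR });
  throw err;
} finally {
  span.end(); // marks the span finished; the configured exporter ships it to Jaeger/Tempo
}
```

Monitoring = alerts on predetermined conditions. Observability = ad-hoc investigation of failures nobody predicted. The overlap is large, but observability goes deeper.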
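The contrast between a predetermined condition and an ad-hoc investigation can be sketched in a few lines. This is a minimal illustration, not how any particular tool works: the `events` array, its fields (`route`, `status`, `region`, `build`), and the helper names are all hypothetical.

```javascript
// Hypothetical structured events, one per handled request.
const events = [
  { route: "/checkout", status: 500, region: "eu", build: "v42" },
  { route: "/checkout", status: 200, region: "us", build: "v41" },
  { route: "/checkout", status: 500, region: "eu", build: "v42" },
  { route: "/cart", status: 200, region: "eu", build: "v41" },
  { route: "/cart", status: 200, region: "us", build: "v41" },
];

// Monitoring: a condition chosen before the incident happens.
function errorRate(evts) {
  const errors = evts.filter((e) => e.status >= 500).length;
  return evts.length === 0 ? 0 : errors / evts.length;
}

function shouldAlert(evts, threshold) {
  return errorRate(evts) > threshold;
}

// Observability: a question asked after something looks wrong.
// Group the failing events by any field to see where they cluster.
function countErrorsBy(evts, field) {
  const counts = {};
  for (const e of evts) {
    if (e.status < 500) continue;
    counts[e[field]] = (counts[e[field]] || 0) + 1;
  }
  return counts;
}

console.log(shouldAlert(events, 0.25)); // true (2 of 5 requests failed)
console.log(countErrorsBy(events, "build")); // { v42: 2 }
console.log(countErrorsBy(events, "route")); // { '/checkout': 2 }
```

The alert only says that something is wrong. Slicing the same events by a field nobody planned to alert on (here `build`) is what points at a likely cause, which is why wide, structured events matter for the second kind of question.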
Minimum: metrics + logs. Add traces once you run microservices or another distributed architecture. In a monolith, start with the first two.
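As a rough picture of what "metrics + logs" means inside a monolith, the sketch below emits both signals from one request handler. Everything in it is an assumption for illustration: the `Counter` class, `formatLog`, `handleRequest`, and the metric name `http_requests_total` are hypothetical, and a real service would more likely use a metrics client and a logging library than hand-rolled versions.

```javascript
// Metric: a number that changes over time. Here, a labelled in-memory counter.
class Counter {
  constructor(name) {
    this.name = name;
    this.values = new Map(); // serialized labels -> count
  }
  // Labels are keyed by their JSON form, so pass keys in a consistent order.
  inc(labels = {}) {
    const key = JSON.stringify(labels);
    this.values.set(key, (this.values.get(key) || 0) + 1);
  }
  get(labels = {}) {
    return this.values.get(JSON.stringify(labels)) || 0;
  }
}

// Log: one structured event per line, so it can be filtered by field later.
function formatLog(level, message, fields = {}) {
  return JSON.stringify({ ts: new Date().toISOString(), level, message, ...fields });
}

const httpRequests = new Counter("http_requests_total");

// Hypothetical request handler that emits both signals.
function handleRequest(route, status) {
  httpRequests.inc({ route, status });
  const level = status >= 500 ? "error" : "info";
  const line = formatLog(level, "request handled", { route, status });
  console.log(line);
  return line;
}

handleRequest("/cart", 200);
handleRequest("/checkout", 500);
```

The counter answers "how many, and is it trending up?", while the log line keeps the detail of each individual event. With only these two signals, a single-process application is already covered for most questions; traces start to pay off once a request crosses process boundaries.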
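Small team: Datadog (SaaS, all-in-one) or Grafana Cloud (typically cheaper). Self-hosted: Prometheus + Loki + Tempo + Grafana. This is close to Grafana's LGTM stack (Loki, Grafana, Tempo, Mimir), with Prometheus handling metrics in place of Mimir.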