Observability is the ability to understand a system's internal state from its external outputs. Three pillars: **metrics** (numbers over time, such as CPU and QPS), **logs** (events, such as errors and the audit trail), **traces** (a request's path through distributed services). How it differs from monitoring: monitoring covers known unknowns (CPU is high), while observability is about exploring unknown unknowns (a new type of bug).
Below: details, an example, related terms, FAQ.
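```javascript
// OpenTelemetry-instrumented code
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-app');
const span = tracer.startSpan('db-query');
try {
  await db.query('SELECT ...'); // `db` is assumed to be the application's own database client
} catch (err) {
  span.recordException(err); // attach the error to the span before rethrowing
  span.setStatus({ code: SpanStatusCode.ERROR });
  throw err;
} finally {
  span.end(); // hands the finished span to the configured exporter (e.g. Jaeger or Tempo)
}
```

Monitoring means alerting on predetermined conditions. Observability means ad-hoc investigation through exploration. The two overlap heavily, but observability goes deeper.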
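The contrast can be sketched in plain JavaScript, assuming request events are kept as wide records with arbitrary fields. The event shape, the field names (`route`, `region`, `build`, `latencyMs`), the 500 ms threshold, and both helper functions are hypothetical and chosen only for illustration. The alert rule answers a question fixed in advance, while the ad-hoc grouping slices the same data by a dimension nobody planned for.

```javascript
// Hypothetical request events; in practice these would come from logs or traces.
const events = [
  { route: '/checkout', region: 'eu', build: 'a1', latencyMs: 120 },
  { route: '/checkout', region: 'eu', build: 'b2', latencyMs: 900 },
  { route: '/checkout', region: 'us', build: 'b2', latencyMs: 950 },
  { route: '/search', region: 'us', build: 'a1', latencyMs: 80 },
];

// Monitoring: a predetermined condition, decided before the incident.
function latencyAlert(evts, thresholdMs) {
  const avg = evts.reduce((sum, e) => sum + e.latencyMs, 0) / evts.length;
  return avg > thresholdMs;
}

// Observability: group by any field after the fact to explore the data.
function avgLatencyBy(evts, field) {
  const groups = new Map();
  for (const e of evts) {
    const g = groups.get(e[field]) ?? { sum: 0, count: 0 };
    g.sum += e.latencyMs;
    g.count += 1;
    groups.set(e[field], g);
  }
  return Object.fromEntries(
    [...groups].map(([key, g]) => [key, g.sum / g.count])
  );
}

console.log(latencyAlert(events, 500));     // the alert says that latency is high
console.log(avgLatencyBy(events, 'build')); // slicing by build hints at where to look
```

In this made-up data the alert fires, but only the grouping by `build` shows that the slow requests all come from one build.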
At minimum: metrics and logs. Traces become worthwhile once you have microservices or a distributed system. In a monolith, start with the first two.
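A minimal sketch of that starting point for a monolith, using only plain JavaScript: an in-process counter as the metric and one JSON line per event as the log. The names `counters`, `increment`, `logEvent`, and `handleRequest` are hypothetical. A real service would more likely use a metrics client and a logging library instead.

```javascript
// Metric: a number that changes over time, keyed by name and labels.
const counters = new Map();
function increment(name, labels = {}) {
  const key = `${name}${JSON.stringify(labels)}`;
  counters.set(key, (counters.get(key) ?? 0) + 1);
  return counters.get(key);
}

// Log: one structured event per line, easy to search later.
function logEvent(level, message, fields = {}) {
  const line = JSON.stringify({ ts: new Date().toISOString(), level, message, ...fields });
  console.log(line);
  return line;
}

// Hypothetical request handler that emits both signals.
function handleRequest(route, work) {
  try {
    const result = work();
    increment('requests_total', { route, status: 'ok' });
    return result;
  } catch (err) {
    increment('requests_total', { route, status: 'error' });
    logEvent('error', 'request failed', { route, error: err.message });
    throw err;
  }
}

handleRequest('/health', () => 'ok');
try {
  handleRequest('/orders', () => { throw new Error('db timeout'); });
} catch (_) {
  // Already counted and logged inside handleRequest.
}
```

The counter tells you how often requests fail, and the log line tells you why a particular one did.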
Small team: Datadog (SaaS, all-in-one) or Grafana Cloud (often cheaper). Self-hosted: Prometheus + Loki + Tempo + Grafana (close to Grafana's LGTM stack, in which the M stands for Mimir rather than Prometheus).