Monitoring RAG Pipelines

Anatoly Oshmanovsky

Published: 22.06.2026 · ~3 min read

Short answer. A RAG pipeline is a sequence of components: an embedding service, a vector DB, a retriever, and an LLM API. A failure or slowdown at any stage degrades the answers. Monitoring RAG comes down to three things: checking each component's availability over HTTP, tracking per-stage latency, and running a heartbeat for background indexing. enterno.io provides the external availability layer, with checks from RU, EU and US; it does not replace output-quality evaluation.

RAG components that fail

In a typical RAG, every stage can fail:

The embedding service: unavailable or slow;
The vector DB (Qdrant, Weaviate, pgvector, etc.): down or degrading on latency;
The retriever / search API: returns errors or empty results;
The LLM API: 429/5xx responses, rising response time;
Indexing: a background job stops updating the store.

Three layers of RAG monitoring

Component availability — HTTP health checks of each service.
Stage latency — where time is actually lost.
Index freshness — a heartbeat for background indexing.

The trickiest RAG problem isn't a crash but silent degradation: the vector DB responds, but slowly, or indexing quietly stops, and the user gets a delayed answer or one built on stale documents.

Component	Typical failure	How to monitor
Embedding service	Unavailable or slow	HTTP monitor
Vector DB	Down, rising latency	HTTP monitor + latency
Retriever / search API	Errors or empty results	HTTP monitor
LLM API	429/5xx, slow response	HTTP monitor
Indexing	Didn't refresh the store in time	Heartbeat
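Silent degradation is easier to catch when each check result is classified by both status code and response time. The sketch below is one way to do that in shell. The function name classify_check and the default 1.0 s threshold are assumptions for illustration, not enterno.io settings.

```shell
# classify_check HTTP_CODE LATENCY_SECONDS [THRESHOLD_SECONDS]
# Hypothetical helper: prints OK, DEGRADED or DOWN for one check result.
# The 1.0 s default threshold is an assumed value; tune it per component.
classify_check() {
  code=$1
  latency=$2
  threshold=${3:-1.0}

  # Anything other than 2xx counts as DOWN
  # (curl reports 000 when no HTTP response arrives at all).
  case "$code" in
    2??) ;;
    *) echo "DOWN"; return 0 ;;
  esac

  # The shell has no floating-point arithmetic, so awk does the comparison.
  if awk -v t="$latency" -v max="$threshold" 'BEGIN { exit (t > max) ? 0 : 1 }'; then
    echo "DEGRADED"
  else
    echo "OK"
  fi
}
```

For example, `classify_check 200 2.4` prints DEGRADED: the service answered, but slower than the threshold allows.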

Health-checking components

Set up a check for each critical pipeline service:

# Vector DB
curl -o /dev/null -s -w "vectordb %{http_code} %{time_total}s\n" \
  https://vectordb.internal.example.com/healthz

# Retriever / search API
curl -o /dev/null -s -w "retriever %{http_code} %{time_total}s\n" \
  https://retriever.example.com/health

# LLM API
curl -o /dev/null -s -w "llm %{http_code} %{time_total}s\n" \
  https://api.example-llm.com/v1/health

Add each check to enterno.io as a separate HTTP monitor — so you instantly see which component took the pipeline down.
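For a quick local view of which component failed, the same curl checks can be run in one pass and filtered. This is a minimal sketch: probe and failed_components are hypothetical helper names, the 10-second timeout is an assumed value, and the URLs are the placeholders used above.

```shell
# probe NAME URL
# Prints "NAME HTTP_CODE TIME_TOTAL" for one endpoint.
# --max-time 10 is an assumed timeout so a hung service cannot stall the script.
probe() {
  curl -o /dev/null -s --max-time 10 -w "$1 %{http_code} %{time_total}\n" "$2"
}

# failed_components
# Reads "NAME HTTP_CODE TIME_TOTAL" lines on stdin and prints the names
# of components whose status is not 2xx (curl reports 000 on no response).
failed_components() {
  awk '$2 !~ /^2[0-9][0-9]$/ { print $1 }'
}

# Example usage (placeholder URLs from this article):
# {
#   probe vectordb  https://vectordb.internal.example.com/healthz
#   probe retriever https://retriever.example.com/health
#   probe llm       https://api.example-llm.com/v1/health
# } | failed_components
```

An empty result means every probed component returned a 2xx status; otherwise the output lists the failing stages by name.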

Heartbeat for indexing

Background reindexing should signal completion. Have the job ping a heartbeat after each successful update:

# After a successful reindex
curl -fsS https://enterno.io/api/heartbeat/INDEX_TOKEN \
  -o /dev/null && echo "index heartbeat sent"

If indexing doesn't run on time, you'll learn about a stale index before users start getting outdated answers.
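One way to make sure the heartbeat fires only after a successful update is to wrap the job itself. In this sketch, run_with_heartbeat and send_heartbeat are hypothetical names, INDEX_TOKEN is the placeholder from the example above, the 10-second timeout is an assumed value, and ./reindex.sh in the usage note stands in for your own indexing command.

```shell
# Heartbeat endpoint; INDEX_TOKEN is a placeholder for your real token.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://enterno.io/api/heartbeat/INDEX_TOKEN}"

# send_heartbeat
# Pings the heartbeat URL; -f makes curl fail on HTTP errors.
send_heartbeat() {
  curl -fsS --max-time 10 -o /dev/null "$HEARTBEAT_URL"
}

# run_with_heartbeat COMMAND [ARGS...]
# Runs the reindex command and pings the heartbeat only if it exits with 0.
# On failure the heartbeat is skipped and the command's exit code is returned.
run_with_heartbeat() {
  if "$@"; then
    send_heartbeat && echo "index heartbeat sent"
  else
    rc=$?
    echo "reindex failed with exit code $rc; heartbeat not sent" >&2
    return "$rc"
  fi
}
```

A cron entry could then call `run_with_heartbeat ./reindex.sh`, so a failed or skipped run leaves the heartbeat silent and the missed window triggers the alert.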

What to monitor beyond availability

Stage latency: separate monitors on the retriever and the LLM API.
SSL certificates and DNS of the pipeline's external services.
Cost: LLM tokens spent on answer generation (log these in your own tracing).
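For the stage-latency item, comparing timings side by side shows where time is lost. The sketch below assumes check output in the "name code time" format produced by the curl commands earlier in this article; slowest_stage is a hypothetical helper name.

```shell
# slowest_stage
# Reads lines like "vectordb 200 0.310s" on stdin (name, HTTP code, total time)
# and prints the name and time of the slowest stage.
slowest_stage() {
  awk '
    {
      t = $3
      sub(/s$/, "", t)      # drop the trailing "s" unit if present
      t += 0                # force a numeric value
      if (NR == 1 || t > max) { max = t; name = $1 }
    }
    END { if (NR > 0) printf "%s %.3f\n", name, max }
  '
}
```

Piping the output of the per-component checks into `slowest_stage` names the bottleneck directly instead of leaving it to be read off by eye.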

The line: availability, not quality

Let's be honest: enterno.io doesn't score retrieval relevance or compute RAG quality metrics (retrieval precision/recall, faithfulness). That needs eval tools. enterno.io answers "is each pipeline component alive and how fast does it respond" — and that's the layer that most often breaks production.

FAQ

Does enterno.io evaluate retrieval quality?

No, that's a job for eval tools. enterno.io covers component availability and latency plus an indexing heartbeat.

How do I tell which stage is slow?

Create a separate monitor per service and compare latency — the bottleneck becomes obvious.

What about a stale index?

Use a background-indexing heartbeat: it alerts you if reindexing did not complete within the expected window.

Can I monitor from Russia?

Yes, checks run from ru-msk, with EU and US added on paid tiers.

Cover the pipeline: create HTTP checks for each component on the monitors page and connect a heartbeat for indexing.

