Enterno.io surveyed 500 AI engineers and scanned 10,000+ open-source RAG projects (March 2026). The headline numbers: 72% of AI apps now run RAG in production, up from 43% in 2024; 48% of setups use hybrid search (dense + sparse); 31% add a reranking step. The standard stack: OpenAI embeddings, pgvector or Qdrant for storage, GPT-5 or Claude for generation. Median end-to-end RAG latency is 1.2 s (embed + search + LLM), at a cost of roughly $0.001 per query.

Below: key findings, use-case breakdown, implications, methodology, FAQ.
| Metric | Value | Median | p75 |
|---|---|---|---|
| Apps with RAG in production | 72% | — | — |
| Hybrid search (dense + sparse) | 48% | — | — |
| Reranking step | 31% | — | — |
| Chunk size | 640 tokens | 640 | 1,024 |
| Top-k retrieval | 8 | 8 | 15 |
| RAG latency (end-to-end) | 1.2 s | 1,200 ms | 2,400 ms |
| Cost per query | $0.001 | $0.001 | $0.005 |
| Apps with evaluation (Ragas etc.) | 28% | — | — |
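The hybrid-search number (48% of setups) implies merging a dense result list with a sparse one; a common fusion method is reciprocal rank fusion (RRF). A minimal sketch, where the document IDs are illustrative and k=60 is the constant from the original RRF paper:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc IDs via reciprocal rank fusion.

    Each doc scores sum(1 / (k + rank)) over every list it appears in,
    so documents ranked highly by both retrievers float to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Dense (embedding) and sparse (BM25) results for the same query:
dense = ["d3", "d1", "d7", "d2"]
sparse = ["d1", "d5", "d3"]
fused = reciprocal_rank_fusion([dense, sparse])  # d1 wins: top-2 in both lists
```

RRF needs no score normalization, which is why it is a popular default: dense and sparse scores live on incompatible scales, but ranks are always comparable.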

| Use case | Share | RAG adoption |
|---|---|---|
| Customer support bots | 32% | 94% |
| Developer docs (AI search) | 21% | 88% |
| Enterprise Q&A (Confluence etc.) | 18% | 100% |
| Code generation / search | 14% | 62% |
| Legal / medical Q&A | 10% | 100% (plus reranking) |
Methodology: developer survey (n=500), a scan of open-source RAG projects on GitHub, and LangChain/LlamaIndex package statistics, collected March 2026.
Vector store choice: pgvector below ~1M vectors (simplicity: it lives inside your existing Postgres); Qdrant above ~1M (speed); Weaviate for native hybrid search. For roughly 90% of use cases, pgvector is enough.
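Under the hood, all of these stores answer the same question: which stored vectors are closest to the query vector by cosine similarity? pgvector does essentially this plus an ANN index (IVFFlat or HNSW) for speed. A brute-force sketch with toy 3-dimensional vectors (real embeddings are far wider, e.g. 1,536 dims for text-embedding-3-small):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, corpus, k=8):
    """Return the k doc IDs most similar to the query vector (exact, O(n))."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top_k([1.0, 0.05, 0.0], corpus, k=2)  # doc_a and doc_b align with the query
```

The exact scan is O(n) per query, which is fine into the hundreds of thousands of vectors; past that, the ANN index trades a little recall for a large speedup.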
Embeddings: OpenAI text-embedding-3-small ($0.02 per 1M tokens) is the cheapest option with good quality; text-embedding-3-large gives the best quality; bge-m3 is a free, multilingual open-source alternative.
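At $0.02 per 1M tokens, embedding cost is easy to estimate up front. A sketch using the survey's median chunk size; the corpus size is an illustrative assumption:

```python
PRICE_PER_1M_TOKENS = 0.02  # text-embedding-3-small, USD

def embedding_cost(num_chunks, tokens_per_chunk=640):
    """Estimated one-off cost (USD) to embed a corpus at the median chunk size."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS

# 100k chunks x 640 tokens = 64M tokens -> about $1.28 for the whole corpus
embedding_cost(100_000)
```

The point of the arithmetic: at these prices, embedding the corpus is a rounding error next to generation cost, so there is little reason to skimp on re-embedding after chunking changes.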
Evaluation: Ragas (answer_relevancy, context_precision, faithfulness), LlamaIndex's built-in evals, or at minimum a manual review of 50+ examples.
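For the manual baseline, even a crude harness beats nothing. A minimal sketch that scores answers by token overlap with a reference; the metric, threshold, and test cases are illustrative stand-ins, not Ragas:

```python
def overlap_score(answer, reference):
    """Fraction of reference tokens present in the answer (crude recall proxy)."""
    ans = set(answer.lower().split())
    ref = set(reference.lower().split())
    return len(ans & ref) / len(ref) if ref else 0.0

def run_eval(cases, threshold=0.5):
    """cases: (question, model_answer, reference_answer) triples. Returns pass rate."""
    passed = sum(overlap_score(ans, ref) >= threshold for _, ans, ref in cases)
    return passed / len(cases)

cases = [
    ("What is the median chunk size?", "the median is 640 tokens", "640 tokens"),
    ("Which store for under 1M vectors?", "use qdrant", "pgvector"),
]
run_eval(cases)  # the second answer fails -> pass rate 0.5
```

Once a harness like this exists, swapping the overlap metric for an LLM-as-judge or Ragas faithfulness call is a one-function change; the hard part is collecting the 50+ labeled examples.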
Long context vs RAG: long context means simpler code but higher cost and latency; RAG is cheaper and scales. A hybrid works well: RAG for retrieval, a long context window for reasoning over what was retrieved.
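The cost gap can be sketched with per-query token arithmetic. The prices, corpus size, and prompt sizes below are illustrative assumptions (chosen so the RAG case lands near the survey's ~$0.001 median), not survey data:

```python
def query_cost(input_tokens, output_tokens=500,
               in_price=0.15, out_price=0.60):
    """Per-query cost in USD; prices are assumed $/1M tokens for a small-model tier."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Long context: stuff an assumed 200k-token corpus into every prompt.
lc = query_cost(200_000)
# RAG: median top-8 chunks of 640 tokens plus a ~300-token prompt.
rag = query_cost(8 * 640 + 300)

lc / rag  # long context comes out roughly 27x more expensive per query here
```

The ratio scales with corpus size: the long-context cost grows linearly with the documents, while the RAG cost stays pinned to top-k times chunk size regardless of how big the corpus gets.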