Enterno.io surveyed 500 AI engineers and analyzed 10k+ open-source RAG projects (March 2026). 72% of apps use RAG in production (up from 43% in 2024). Hybrid search (dense + sparse) appears in 48% of setups; a reranking step in 31%. Standard stack: OpenAI embeddings + pgvector or Qdrant + GPT-5 or Claude for generation. Median end-to-end RAG latency: 1.2 s (embed + search + LLM). Cost: ~$0.001 per query.
Below: key results, platform breakdown, implications, methodology, FAQ.
| Metric | Value | Median | p75 |
|---|---|---|---|
| Apps with RAG in production | 72% | — | — |
| Hybrid search (dense + sparse) | 48% | — | — |
| Reranking step | 31% | — | — |
| Chunk size | 640 tokens | 640 | 1,024 |
| Top-k retrieval | 8 | 8 | 15 |
| End-to-end RAG latency | 1.2 s | 1,200 ms | 2,400 ms |
| Cost per query | $0.001 | $0.001 | $0.005 |
| Apps with evaluation (Ragas etc.) | 28% | — | — |
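The 48% hybrid-search figure usually means fusing a dense (embedding) ranking with a sparse (BM25) ranking; a common fusion rule is Reciprocal Rank Fusion. A minimal sketch, with made-up doc IDs standing in for real index results:

```python
def rrf_fuse(ranked_lists, k=60, top_k=8):
    """Combine several ranked lists of doc IDs into one list.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

dense = ["d3", "d1", "d7", "d2"]   # hypothetical: from embedding similarity
sparse = ["d1", "d9", "d3", "d5"]  # hypothetical: from BM25 keyword match
print(rrf_fuse([dense, sparse], top_k=3))  # → ['d1', 'd3', 'd9']
```

RRF needs no score calibration between the two retrievers, which is why it is a common default before an optional reranking step.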
| Platform | Share | Detail |
|---|---|---|
| Customer support bots | 32% | RAG: 94% |
| Developer docs (AI search) | 21% | RAG: 88% |
| Enterprise Q&A (Confluence etc.) | 18% | RAG: 100% |
| Code generation / search | 14% | RAG: 62% |
| Legal / medical Q&A | 10% | RAG: 100% + reranking |
Methodology: developer survey (n=500) + scan of GitHub OSS projects + LangChain/LlamaIndex package stats. March 2026.
Vector database choice: pgvector for < 1M vectors (simplicity); Qdrant above 1M (speed); Weaviate for native hybrid search. pgvector covers 90% of use cases.
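Why simplicity wins at small scale: below ~1M vectors, even brute-force cosine top-k is often fast enough, so pgvector inside the existing Postgres beats running a separate vector service. A toy sketch (the 2-d vectors are stand-ins for real embeddings):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, docs, k=2):
    """docs: {doc_id: vector}; returns the k IDs most similar to query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]

docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))  # → ['a', 'b']
```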
Embedding model choice: OpenAI text-embedding-3-small ($0.02/1M tokens) is the cheapest with good quality; text-embedding-3-large gives the best quality. Open-source: bge-m3 is multilingual and free.
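A back-of-envelope check that the ~$0.001/query median is plausible. The embedding price and the chunk/top-k medians come from the survey; the query length and LLM input price are assumptions for illustration only:

```python
EMBED_PRICE_PER_TOKEN = 0.02 / 1_000_000      # text-embedding-3-small (survey)
LLM_INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000  # ASSUMED; varies by model

query_tokens = 50    # assumed typical question length
chunk_tokens = 640   # median chunk size from the table
top_k = 8            # median top-k from the table

embed_cost = query_tokens * EMBED_PRICE_PER_TOKEN
prompt_tokens = query_tokens + top_k * chunk_tokens  # context stuffed into LLM
llm_cost = prompt_tokens * LLM_INPUT_PRICE_PER_TOKEN

print(f"embed ${embed_cost:.6f} + llm input ${llm_cost:.6f} "
      f"= ${embed_cost + llm_cost:.6f} per query")
```

Under these assumptions the total lands around $0.0008, i.e. the LLM input tokens dominate and the embedding call is essentially free.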
Evaluation: Ragas (answer_relevancy, context_precision, faithfulness); LlamaIndex evals; plus manual review of 50+ examples.
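What a manual eval over 50+ examples can look like in spirit: a crude substring check as a proxy for the faithfulness idea. This is NOT the Ragas API, just a toy stand-in; real setups use Ragas or LlamaIndex evals:

```python
def faithful(answer_fact: str, contexts: list) -> bool:
    """Crude proxy: is the answer's key fact present in any retrieved chunk?"""
    return any(answer_fact.lower() in c.lower() for c in contexts)

examples = [  # hypothetical eval set; aim for 50+ items in practice
    {"fact": "pgvector", "contexts": ["pgvector is a Postgres extension"]},
    {"fact": "Qdrant",   "contexts": ["Weaviate supports hybrid search"]},
]
score = sum(faithful(e["fact"], e["contexts"]) for e in examples) / len(examples)
print(f"faithfulness proxy: {score:.0%}")  # → faithfulness proxy: 50%
```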
Long context (LC) vs RAG: LC means simpler code but higher cost and latency; RAG is cheaper and scales. Hybrid approach: RAG for retrieval + LC for reasoning over the retrieved chunks.
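The cost/latency gap comes down to prompt size: long context stuffs the whole corpus into every call, while RAG sends only top-k chunks. A sketch with an assumed corpus size and the survey's median top-k and chunk size:

```python
corpus_tokens = 200_000    # ASSUMED knowledge-base size stuffed into context
rag_prompt = 8 * 640 + 50  # median top-k * median chunk size + question

print(f"long-context prompt: {corpus_tokens} tokens")
print(f"RAG prompt: {rag_prompt} tokens "
      f"(~{corpus_tokens // rag_prompt}x fewer input tokens per query)")
```

Since LLM input tokens dominate per-query cost, that ratio translates almost directly into the cost and latency difference the survey reports.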