RAG chatbot in 30 minutes: (1) Chunk documents into 500-1000 tokens, (2) Embed via OpenAI text-embedding-3-small ($0.02/1M), (3) Store in Qdrant (Rust open-source), (4) User query → embed → similaritySearch top-5 chunks, (5) Inject into prompt → Claude/GPT-5 generates answer with sources. Stack: Node.js + LangChain.js + Qdrant. Cost: ~$0.001 per query.
Below: step-by-step, working examples, common pitfalls, FAQ.
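The core of step 4 (query → embed → top-5 similarity search) is just cosine similarity over vectors, which the vector DB handles at scale. A minimal pure-Python sketch with toy 3-dimensional vectors standing in for real 1536-dimensional embeddings (function names are illustrative, not from any library):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunk_vecs, k=5):
    # Rank chunk indices by similarity to the query, best first.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy "embeddings": three chunks, one query.
chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, chunks, k=2))  # → [0, 2]
```

Qdrant does exactly this (with HNSW approximation instead of a full scan) when `distance` is set to `Cosine`.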
Run Qdrant locally:

```shell
docker run -p 6333:6333 qdrant/qdrant
```

**LangChain.js full pipeline**

```javascript
import { QdrantVectorStore } from '@langchain/qdrant';
import { OpenAIEmbeddings, ChatOpenAI } from '@langchain/openai';

// `chunks` is an array of Document objects produced by your text splitter.
const store = await QdrantVectorStore.fromDocuments(
  chunks,
  new OpenAIEmbeddings({ model: 'text-embedding-3-small' }),
  { url: 'http://qdrant:6333', collectionName: 'docs' }
);
const docs = await store.similaritySearch(query, 5);
const llm = new ChatOpenAI({ model: 'gpt-5' });
const answer = await llm.invoke([
  // Join the page contents, not the Document objects themselves.
  { role: 'system', content: `Context: ${docs.map(d => d.pageContent).join('\n')}` },
  { role: 'user', content: query }
]);
```

**Qdrant HNSW tuning**

```http
PUT /collections/docs
{
  "vectors": { "size": 1536, "distance": "Cosine" },
  "hnsw_config": { "m": 16, "ef_construct": 100 }
}
```

**Python (LlamaIndex)**

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

docs = SimpleDirectoryReader('./docs').load_data()
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query('Your question')
```

**Chunking strategy**

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800, chunk_overlap=100,
    separators=['\n\n', '\n', '.', ' ']
)
```

**Hybrid search (dense + sparse)**

```python
# Qdrant: create named vectors (dense + sparse BM25),
# then batch search and fuse the two result lists with weights.
```
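A common way to fuse dense and sparse result lists is Reciprocal Rank Fusion (RRF), which recent Qdrant versions also expose natively in the Query API. A self-contained sketch (the function and the toy document IDs are illustrative):

```python
def rrf_fuse(dense_ids, sparse_ids, k=60):
    # Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
    # where rank is the 1-based position of d in that list. k=60 is the
    # value from the original RRF paper; it damps the top-rank advantage.
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # top hits from vector search
sparse = ["b", "d", "a"]  # top hits from BM25
print(rrf_fuse(dense, sparse))  # → ['b', 'a', 'd', 'c']
```

Documents that appear high in both lists ("b" here) win, which is exactly the behavior you want from hybrid search.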
A corpus of ~100 small documents works with the basic pipeline as-is. Past ~10k documents, add a reranking step to keep answer quality. Past ~100k, shard the vector DB and move to hybrid search.
Embeddings cost $0.02 per 1M tokens; LLM generation runs $0.15 to $15 per 1M tokens depending on the model. At 1,000 queries/day, expect roughly $0.50 to $5 per day.
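The per-query figure is easy to sanity-check. A sketch under assumed per-query volumes (~20 query tokens embedded, ~1,500 prompt tokens for query plus 5 chunks, ~300 completion tokens) and illustrative low-end rates:

```python
M = 1_000_000  # rates below are USD per 1M tokens

def per_query_cost(embed_tokens, prompt_tokens, completion_tokens,
                   embed_rate=0.02, input_rate=0.15, output_rate=0.60):
    # embed_rate matches text-embedding-3-small; input/output rates are
    # illustrative low-end LLM prices, not quotes for a specific model.
    return (embed_tokens / M * embed_rate
            + prompt_tokens / M * input_rate
            + completion_tokens / M * output_rate)

cost = per_query_cost(20, 1500, 300)
print(f"${cost:.6f}/query, ${cost * 1000:.2f} per 1k queries")  # ≈ $0.0004/query
```

This lands near the ~$0.001/query estimate above; swapping in a premium model's rates pushes it toward the top of the $0.50 to $5/day range.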
Model choice: Claude Opus 4.7 is strongest on long context; GPT-5 is a balanced default; Gemini 2.5 offers a 2M-token context window; self-hosted Llama 3 70B has no per-token cost.
For evaluation, Ragas (Python) measures context_precision, context_recall, and answer_relevancy; set minimum thresholds for these metrics in CI so retrieval regressions fail the build.
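The CI gate itself can be a few lines. A sketch that checks a dict of averaged metric scores (as you would aggregate from a Ragas evaluation run) against floors; the threshold values are illustrative assumptions, not recommendations:

```python
# Illustrative floors; tune them against your own eval set.
THRESHOLDS = {
    "context_precision": 0.80,
    "context_recall": 0.75,
    "answer_relevancy": 0.85,
}

def failing_metrics(scores, thresholds=THRESHOLDS):
    # Return the metrics below their floor; an empty list means the gate passes.
    return [name for name, floor in thresholds.items()
            if scores.get(name, 0.0) < floor]

scores = {"context_precision": 0.91, "context_recall": 0.70, "answer_relevancy": 0.88}
print(failing_metrics(scores))  # → ['context_recall']
```

In CI, exit non-zero whenever the returned list is non-empty, so a drop in any metric blocks the merge.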