How to Add Semantic Search to a Site

Key idea:

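Semantic search complements Ctrl+F-style keyword search by matching meaning rather than exact words: (1) Embed all content (articles, products) at index time, e.g. with OpenAI text-embedding-3-small at $0.02 per 1M tokens, (2) Store the vectors in Qdrant or pgvector, (3) User query → embed → ANN (approximate nearest neighbor) search → top 5 to 10 results, (4) UI: Algolia DocSearch-style results with "did you mean" suggestions and snippets. Cost: 10k articles → about $0.50 for indexing, plus roughly $0.001 per query.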
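To make step (3) concrete, here is a minimal sketch of embedding-based retrieval using exact cosine similarity over an in-memory array. It is a brute-force stand-in for the ANN index that Qdrant or pgvector would provide, not how those systems work internally. The three-dimensional vectors are hand-made toy values rather than real embeddings, and the names `cosineSimilarity` and `topK` are illustrative.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vector dimensions differ');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) search: score every document, keep the best k.
function topK(queryVector, docs, k = 5) {
  return docs
    .map(doc => ({ id: doc.id, title: doc.title, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors (assumption): real text-embedding-3-small vectors have 1536 dimensions.
const docs = [
  { id: 1, title: 'Fixing TLS certificate issues', vector: [0.9, 0.1, 0.0] },
  { id: 2, title: 'Choosing a laptop', vector: [0.0, 0.2, 0.9] },
  { id: 3, title: 'Renewing an SSL certificate', vector: [0.8, 0.3, 0.1] },
];
const queryVector = [1.0, 0.2, 0.0]; // stands in for the embedding of "SSL errors"

console.log(topK(queryVector, docs, 2).map(hit => hit.title));
```

Brute force is fine for a few thousand vectors; an ANN index trades a little recall for much faster lookups as the collection grows.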

Below: step-by-step, working examples, common pitfalls, FAQ.

Step-by-Step Setup

  1. Extract content: title + body → 500-1000 word chunks
  2. Batch embed: about 100 inputs per request is a safe batch size (the OpenAI API accepts up to 2,048 inputs per request, subject to token limits)
  3. Store in Qdrant with payload (id, title, url, snippet)
  4. Create search endpoint: POST /api/search {query}
  5. Embed the query (same model as at index time) → search top-10 → return results with url + highlighted snippet
  6. UI: live search (debounce 300ms), loading state
  7. Analytics: log queries to understand intent
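
Step 1 can be sketched as a small helper that splits an article body into fixed-size word windows and prefixes each chunk with the title. This is one simple approach, not the only one: the 800-word default is an assumption inside the 500-1000 range, the field names (`articleId`, `chunkIndex`, `text`) are hypothetical, and splitting on paragraph or heading boundaries may work better for your content.

```javascript
// Split one article into chunks of at most `maxWords` words.
// Each chunk keeps the title so the embedding retains some context.
function chunkArticle(article, maxWords = 800) {
  const words = article.body.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += maxWords) {
    chunks.push({
      articleId: article.id,
      chunkIndex: chunks.length,
      title: article.title,
      url: article.url,
      text: `${article.title}\n${words.slice(start, start + maxWords).join(' ')}`,
    });
  }
  return chunks;
}

// Example: a 2000-word body yields chunks of 800, 800 and 400 words.
const article = { id: 1, title: 'Demo', url: '/demo', body: Array(2000).fill('word').join(' ') };
const chunks = chunkArticle(article);
console.log(chunks.length); // 3
```

Each chunk's `text` is what gets embedded, while `title` and `url` go into the stored payload so a hit can link back to the article.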

Working Examples

Each scenario below is followed by its config or code.
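
Indexing pipeline (Node):

    import { OpenAI } from 'openai';
    import { QdrantClient } from '@qdrant/js-client-rest';

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
    const qdrant = new QdrantClient({ url: 'http://localhost:6333' });

    // Split an array into batches of at most `size` items.
    const chunkArray = (items, size) =>
      Array.from({ length: Math.ceil(items.length / size) }, (_, i) => items.slice(i * size, (i + 1) * size));

    // `articles` is assumed to be an array of { id, title, body, url }.
    // The 'articles' collection must already exist with vector size 1536.
    for (const batch of chunkArray(articles, 100)) {
      const embeds = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: batch.map(a => a.title + ' ' + a.body), // each input must fit the model's token limit
      });
      await qdrant.upsert('articles', {
        points: batch.map((a, i) => ({
          id: a.id, // Qdrant point IDs must be unsigned integers or UUIDs
          vector: embeds.data[i].embedding,
          payload: { title: a.title, url: a.url, snippet: a.body.slice(0, 200) },
        })),
      });
    }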
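
Search endpoint:

    // Assumes an Express `app` with app.use(express.json()) and the
    // `openai` and `qdrant` clients from the indexing pipeline.
    app.post('/api/search', async (req, res) => {
      const { query } = req.body ?? {};
      if (typeof query !== 'string' || !query.trim()) {
        return res.status(400).json({ error: 'query is required' });
      }
      try {
        // Embed the query with the same model used at index time.
        const { data } = await openai.embeddings.create({ model: 'text-embedding-3-small', input: query });
        const results = await qdrant.search('articles', { vector: data[0].embedding, limit: 10, with_payload: true });
        res.json({ results: results.map(r => r.payload) });
      } catch (err) {
        res.status(502).json({ error: 'search failed' });
      }
    });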
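
Hybrid search (keyword + semantic):

    // Sketch: combine BM25 and embedding results with Reciprocal Rank Fusion (RRF).
    // `keywordSearch` and `semanticSearch` are placeholders for your Elasticsearch
    // and Qdrant queries; each is assumed to return hits ordered best-first,
    // and every hit is assumed to carry an `id`.
    const bm25 = await keywordSearch(query);
    const semantic = await semanticSearch(query);

    const K = 60; // RRF smoothing constant
    const scores = new Map();
    for (const hits of [bm25, semantic]) {
      hits.forEach((hit, i) => {
        // i is the 0-based position within this list, so rank = i + 1
        scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (K + i + 1));
      });
    }
    // IDs ordered by fused score, highest first.
    const fused = [...scores].sort((a, b) => b[1] - a[1]).map(([id]) => id);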
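
pgvector in Next.js:

    -- db.sql
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE TABLE articles (
      id bigint PRIMARY KEY,
      title text,
      url text,
      embedding vector(1536) -- must match the embedding model's dimension
    );
    CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);

    // app.ts: `db` is assumed to be a node-postgres Pool, `embedding` a number[].
    // <=> is cosine distance; the vector is passed as a '[1,2,3]' text literal.
    const { rows } = await db.query(
      'SELECT id, title, url FROM articles ORDER BY embedding <=> $1::vector LIMIT 5',
      [JSON.stringify(embedding)]
    );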
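
Frontend live search:

    // Inside a React component. `useDebounce` is assumed to be a hook (custom or
    // from a hooks library) that returns the value once it has been stable for 300 ms.
    const [query, setQuery] = useState('');
    const [results, setResults] = useState([]);
    const debouncedQuery = useDebounce(query, 300);

    useEffect(() => {
      if (!debouncedQuery) { setResults([]); return; }
      const controller = new AbortController(); // cancels a stale request
      fetch('/api/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' }, // needed for the server to parse the body
        body: JSON.stringify({ query: debouncedQuery }),
        signal: controller.signal,
      })
        .then(r => r.json())
        .then(d => setResults(d.results))
        .catch(() => {}); // aborted or failed requests are ignored in this sketch
      return () => controller.abort();
    }, [debouncedQuery]);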

Common Pitfalls

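  • Wrong chunk size: chunks that are too small lose context; chunks that are too large blend several topics into one vector and dilute cosine similarity. 500-1000 words is a good starting point
  • No re-indexing on content updates → stale search results. Hook the indexer into your CMS update webhook
  • Embedding size mismatch: text-embedding-3-small produces 1536-dimensional vectors, text-embedding-3-large 3072. Switching models means recreating the collection and re-embedding all content
  • No filtering by category or date → results from the wrong content. Store those fields as metadata and filter on them
  • Semantic search without a keyword fallback → exact name matches can be missed. Hybrid search is recommended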
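
For the metadata-filter pitfall, a small helper can turn optional search facets into a Qdrant filter object. This is a sketch under assumptions: `category` and `published_ts` (a Unix timestamp in seconds) are hypothetical payload fields that would have to be stored at index time, since the examples in this article store only title and url.

```javascript
// Build a Qdrant filter from optional facets; returns undefined when no facet is set.
// `category` and `published_ts` are hypothetical payload field names.
function buildFilter({ category, publishedAfter } = {}) {
  const must = [];
  if (category) {
    must.push({ key: 'category', match: { value: category } });
  }
  if (publishedAfter instanceof Date) {
    // Compare against a numeric Unix timestamp (seconds) stored in the payload.
    must.push({ key: 'published_ts', range: { gte: Math.floor(publishedAfter.getTime() / 1000) } });
  }
  return must.length > 0 ? { must } : undefined;
}

const filter = buildFilter({ category: 'docs', publishedAfter: new Date('2024-01-01T00:00:00Z') });
console.log(JSON.stringify(filter));
// The result is passed alongside the query vector, e.g.:
// qdrant.search('articles', { vector, limit: 10, filter });
```

Filtering inside the vector search, rather than after it, keeps the result count stable: the engine returns the top matches among the points that satisfy the filter.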

Frequently Asked Questions

Cost for 10k articles?

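Indexing: about $0.50, one-time (plus re-embedding when content changes). Queries: at $0.001 per search, 1,000 searches/day comes to about $30/month. If you count only the embedding call, the figure is far lower: a 20-token query at $0.02 per 1M tokens costs about $0.0000004.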
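
The arithmetic behind these estimates can be sketched as follows. The price per 1M tokens and the per-search figure come from this article; the average of 2,500 tokens per article is an assumption, chosen because it reproduces the $0.50 indexing estimate, so substitute your own measured token counts.

```javascript
const PRICE_PER_MILLION_TOKENS = 0.02; // USD, text-embedding-3-small (figure quoted in this article)

// One-time cost of embedding the whole corpus.
function indexingCost(articleCount, avgTokensPerArticle) {
  return (articleCount * avgTokensPerArticle / 1_000_000) * PRICE_PER_MILLION_TOKENS;
}

// Recurring cost for a given search volume and per-search cost.
function monthlyQueryCost(searchesPerDay, costPerSearch, daysPerMonth = 30) {
  return searchesPerDay * daysPerMonth * costPerSearch;
}

// Assumption: 2,500 tokens per article on average (not stated in the article).
console.log(indexingCost(10_000, 2_500).toFixed(2));    // prints 0.50
console.log(monthlyQueryCost(1_000, 0.001).toFixed(2)); // prints 30.00
```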

Why is semantic better than keyword?

It understands synonyms ("how to fix" ≈ "troubleshoot") and related concepts ("SSL errors" finds "TLS cert issues"). For very short or exact-match queries, such as product names or error codes, keyword search is often better.

Algolia vs self-hosted?

Algolia: $40/mo minimum, polished UX but platform lock-in. Self-hosted (Qdrant): $5/mo VPS, your data, more work.

Enterno example?

Enterno articles and pSEO (programmatic SEO) pages all support semantic search via /search. See Docs (/en/help) or Articles (/en/articles).