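HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor (ANN) algorithm and the most popular index type in vector databases. It builds a multi-layer graph: the top layer is sparse, the bottom layer is dense and contains every vector. Search is a greedy descent from the top layer to the bottom one, which gives roughly O(log N) query complexity. It is used in Qdrant, Pinecone, Weaviate, and pgvector (opt-in). Parameters: M (connections per node, typically 16-64), ef_construction (candidate list size at build time; higher values give a better graph but a slower build), ef (candidate list size at query time; higher values give better recall but slower search).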
Below: details, an example, related terms, and an FAQ.
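The greedy descent can be illustrated with a minimal Python sketch. It is a toy model, not the code of any vector database: the names `greedy_descent`, `toy_vectors` and `toy_layers` are hypothetical, the graph is built by hand, and only a single candidate is tracked per layer (real HNSW keeps a list of `ef` candidates on the bottom layer).

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_descent(vectors, layers, entry, query):
    """Walk from the sparse top layer down to the dense bottom layer.

    vectors: dict mapping node id -> vector
    layers:  list of adjacency dicts, layers[0] is the top (sparsest) layer
    entry:   node id that exists in the top layer
    """
    current = entry
    for graph in layers:
        while True:
            # Pick the neighbor closest to the query on this layer.
            best = min(graph.get(current, ()),
                       key=lambda n: dist(vectors[n], query),
                       default=current)
            # Stop when no neighbor improves on the current node,
            # then continue from the same node one layer down.
            if dist(vectors[best], query) >= dist(vectors[current], query):
                break
            current = best
    return current

# Hand-built toy index: six points on a line, three layers.
toy_vectors = {0: (0, 0), 1: (2, 0), 2: (4, 0), 3: (6, 0), 4: (8, 0), 5: (10, 0)}
toy_layers = [
    {0: [4], 4: [0]},                                   # top: long jumps
    {0: [2], 2: [0, 4], 4: [2]},                        # middle
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]},  # bottom: all nodes
]

nearest = greedy_descent(toy_vectors, toy_layers, entry=0, query=(7.2, 0))
```

The upper layers let the search cover most of the distance in a few long hops, and the bottom layer refines the result among close neighbors.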
-- pgvector HNSW
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Query tuning
SET hnsw.ef_search = 100; -- runtime param
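SELECT * FROM docs ORDER BY embedding <=> query_vec LIMIT 5; -- query_vec: placeholder for the query embedding

HNSW vs. IVF: HNSW gives the best combination of recall and speed, but the whole index has to fit in RAM. IVF needs less RAM (centroids plus buckets) at the cost of lower recall or slower queries. For huge datasets (>100M vectors), use IVF with re-ranking.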
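To see why keeping everything in RAM becomes the limiting factor, here is a rough back-of-the-envelope estimate in Python. The function `hnsw_memory_bytes` is hypothetical, and the formula rests on stated assumptions: 4-byte floats, 4-byte neighbor ids, up to 2*M links per node on the bottom layer, and upper layers and per-node overhead ignored. Real engines differ.

```python
def hnsw_memory_bytes(n, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Rough lower bound on the RAM needed for an uncompressed HNSW index."""
    # Raw vector storage per node.
    vector_bytes = dim * bytes_per_float
    # Assumption: up to 2*M neighbor ids per node on the bottom layer;
    # links on the upper layers are ignored because few nodes reach them.
    link_bytes = 2 * m * bytes_per_link
    return n * (vector_bytes + link_bytes)

# 100M vectors with 768 dimensions and M = 16.
estimate = hnsw_memory_bytes(100_000_000, 768, 16)
estimate_gb = estimate / 1e9
```

Under these assumptions the raw vectors dominate the footprint, which is why large deployments usually combine the index with quantization or move to a disk-based design.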
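Filtering: pre-filtering can break graph connectivity. If nodes that fail the filter are skipped during traversal, the search may be unable to reach matching vectors that sit behind them, and recall drops. High-quality vector databases (Qdrant, Weaviate) implement filter-aware HNSW. pgvector added iterative index scans in 2024 (version 0.8.0) to improve filtered queries.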
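The connectivity problem can be shown on a toy graph. This Python sketch is only an illustration under simple assumptions: the graph is a hand-built chain, and `reachable`, `chain` and `passes_filter` are hypothetical names, not part of any database API.

```python
from collections import deque

def reachable(graph, start, allowed):
    """Return the nodes reachable from start when only allowed nodes may be visited."""
    if start not in allowed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in allowed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Chain 0 - 1 - 2 - 3 - 4; node 2 fails the (hypothetical) metadata filter.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
passes_filter = {0, 1, 3, 4}

without_filter = reachable(chain, 0, set(chain))
with_filter = reachable(chain, 0, passes_filter)
```

Nodes 3 and 4 match the filter but are no longer reachable from the entry point once node 2 is excluded from traversal. Filter-aware implementations address this, for example by adding extra links or by traversing non-matching nodes without returning them.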
DiskANN is an SSD-based ANN index. It is about 10× cheaper in memory and 2-3× slower than an in-memory index, and it targets billion-scale datasets. Milvus and MyScale support it.