HNSW Index — ANN for Vectors

Igor Verentsov

HNSW: Hierarchical Navigable Small World

By Igor Verentsov · Updated May 23, 2026

Key idea:

HNSW (Hierarchical Navigable Small World) — graph-based ANN algorithm, the most popular for vector DB. Builds a multi-layer graph: top layer sparse, bottom dense. Search: greedy descent from top to bottom. O(log N) complexity. Used in Qdrant, Pinecone, Weaviate, pgvector (opt-in). Parameters: M (connections per node, 16-64), ef_construction (build quality), ef (search quality).

Below: details, example, related terms, FAQ.

Free online tool — HTTP header checker: instant results, no signup.

Check your site →

Details

M: 16 default. Higher = better recall, more RAM
ef_construction: 100-500. Index build quality — one-time cost
ef: search-time param. Higher = better recall, slower. 32-200 typical
Memory: 4-10x vector size due to graph links. 1M × 1536-dim FP32 → ~30 GB
Recall: >95% at ef=100 for most datasets

Example

# pgvector HNSW
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query tuning
SET hnsw.ef_search = 100;  -- runtime param
SELECT * FROM docs ORDER BY embedding <=> query_vec LIMIT 5;

Related Terms

Understanding HNSW Parameters

The HNSW algorithm's performance heavily relies on its configurable parameters, which dictate how the index is constructed and how searches are conducted. Two critical parameters are M and ef.

M represents the maximum number of connections for each node in the graph. A higher value for M creates a denser graph, which can improve recall but at the cost of increased memory usage and slower construction times. Typical values range from 16 to 64. It's crucial to strike a balance based on the specific use case and available resources.

ef_construction determines the quality of the graph during its construction phase. Setting this parameter higher leads to a more accurate representation of the data structure but also increases the time required to build the index. A common practice is to set ef_construction to a value significantly higher than M, often in the range of 100 to 200.

On the other hand, ef affects the search quality and speed. A higher ef value allows for a more thorough search, as it specifies the number of neighbors to consider during the search process. Values for ef can typically range from 10 to 100, depending on the required precision and latency constraints.

In summary, understanding and tuning these parameters is essential for optimizing the HNSW index for your specific application.

HNSW vs. Other Approximate Nearest Neighbor Algorithms

The HNSW algorithm stands out among various Approximate Nearest Neighbor (ANN) techniques due to its unique graph-based approach. To understand its advantages, it's essential to compare it with other popular ANN algorithms like LSH (Locality-Sensitive Hashing) and Annoy (Approximate Nearest Neighbors Oh Yeah).

LSH is effective for high-dimensional data but suffers from a lack of flexibility in distance metrics. It relies on hash functions to group similar items, which can lead to lower recall rates in certain cases. In contrast, HNSW's graph-based structure allows for more flexible and accurate distance calculations, making it suitable for a broader range of applications.

Annoy, developed by Spotify, is another popular choice for ANN searches. It uses a forest of random projection trees to partition the data. While Annoy is efficient for read-heavy scenarios, it can be less effective in dynamic environments where data is frequently updated. HNSW, with its incremental graph updates, provides better support for real-time data changes.

In terms of performance, HNSW typically exhibits better search speeds and accuracy compared to both LSH and Annoy, particularly in scenarios with large datasets. The logarithmic search complexity of HNSW allows it to scale efficiently with increasing data volumes.

In conclusion, while each ANN algorithm has its strengths, HNSW's combination of flexibility, accuracy, and efficiency makes it a preferred choice for many modern applications involving vector databases.

Practical Implementation of HNSW in Vector Databases

Implementing the HNSW algorithm in a vector database can be straightforward with the right tools. Below is an example of how to configure HNSW parameters using the Qdrant vector database, which supports this algorithm.

First, ensure you have Qdrant installed. You can use Docker for easy setup:

docker run -p 6333:6333 qdrant/qdrant

Next, you can create an HNSW index with the following configuration:

{
  "hnsw_config": {
    "m": 32,
    "ef_construction": 200,
    "ef": 100
  }
}

Once the index is created, you can insert vectors into your HNSW index using the following command:

curl -X POST 'http://localhost:6333/collections/your_collection_name/points' -H 'Content-Type: application/json' -d '{
  "points": [
    {
      "id": 1,
      "vector": [0.1, 0.2, 0.3, ...]
    },
    {
      "id": 2,
      "vector": [0.4, 0.5, 0.6, ...]
    }
  ]
}'

For performing a search on your index, you can utilize the following command:

curl -X POST 'http://localhost:6333/collections/your_collection_name/points/search' -H 'Content-Type: application/json' -d '{
  "vector": [0.1, 0.2, 0.3, ...],
  "top": 5
'}

This command searches for the top 5 nearest neighbors to the specified vector. Adjust the parameters according to your needs to optimize performance and accuracy.

By following these steps, you can effectively implement and utilize the HNSW algorithm in your vector database applications.

Learn more

How-to

Glossary

What is CDC (Change Data Capture)

Research

Frequently Asked Questions

HNSW vs IVF?

HNSW: best recall + speed, but all in RAM. IVF: cheaper RAM (centroids + buckets), slower recall. For huge datasets (>100M) — IVF + re-ranking.

HNSW and filtering?

Prefilter can cut graph connectivity. Quality vector DBs (Qdrant, Weaviate) have filter-aware HNSW. pgvector added index filtering in 2024.

Is DiskANN an alternative?

DiskANN — SSD-based ANN. 10× cheaper memory, 2-3× slower. For billion-scale. Milvus, MyScale support it.

Try the live tool that powered this guide

Free plan — 10 monitors, checks every 5 min, no card required. Upgrade for 1-minute interval and multi-region monitoring.

Start free See pricing