3 min read

What is HNSW, the algorithm that powers vector search engines, and how is it relevant to you?

While working on implementing a recommendation system for one of my projects, I kept hearing about the HNSW (Hierarchical Navigable Small World) algorithm and its role in speeding up vector searches. Curious about its efficiency, I looked into how it works and why it's effective for large-scale data search. HNSW's unique structure—using layers and probabilistic skip lists—makes it ideal for fast and accurate nearest neighbor searches, which is what I needed to optimize the search experience in my app.

The HNSW algorithm is a top-performing technique for vector searches. It structures data in a way that enables ultra-fast searches and high recall, making it ideal for applications requiring quick, approximate nearest-neighbor (ANN) searches. Understanding HNSW can help you appreciate the core of how vector search engines like Pinecone operate, even though Pinecone abstracts much of this complexity for developers.


This blog is written by Jeremy Rivera at KushoAI. We're building the fastest way to test your APIs. It's completely free and you can sign up here.

Foundations of HNSW: How it Works

Proximity Graphs and Layers:

HNSW is a type of proximity graph, where nodes (data points) are connected based on proximity (such as Euclidean distance). By incorporating layers, HNSW achieves both efficient and accurate search paths. Upper layers connect nodes with long-range links, while lower layers have more densely packed nodes with short-range links.

import faiss
import numpy as np

# Set parameters for HNSW
d = 128  # Dimensionality of vectors
M = 32   # Number of neighbors (links) connected to each vertex

# Initialize the HNSW index
index = faiss.IndexHNSWFlat(d, M)

# Check the initial state of the HNSW index
print("HNSW index initialized:", index.hnsw)

Probabilistic Skip Lists:

Inspired by skip lists, which allow quick navigation across ordered lists, HNSW uses layers to "skip" unnecessary nodes, speeding up searches. The algorithm moves down layers during the search, narrowing down closer to the target each time.

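You can see the effect of this layered descent through Faiss's efSearch parameter, which controls how many candidates are kept while moving down the layers. The snippet below is a minimal sketch on random data; the efSearch values are illustrative, not recommendations.

import faiss
import numpy as np

d = 128  # Dimensionality of vectors
M = 32   # Number of neighbors (links) connected to each vertex

index = faiss.IndexHNSWFlat(d, M)

# Index a small random dataset so the example runs quickly
xb = np.random.random((10000, d)).astype('float32')
index.add(xb)

# A single random query vector
xq = np.random.random((1, d)).astype('float32')

# Small efSearch values skip more of the graph (faster, lower recall);
# larger values explore more neighbors per layer (slower, higher recall)
for ef in (16, 64, 256):
    index.hnsw.efSearch = ef
    distances, ids = index.search(xq, 5)
    print(f"efSearch={ef} -> top-5 ids:", ids[0])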

HNSW combines elements of navigable small world (NSW) graphs, creating a structure that uses short- and long-range links to achieve logarithmic search complexity. This results in quick searches with a high probability of locating nearest neighbors efficiently.

# Generate random vectors for demonstration
num_vectors = 1000000
xb = np.random.random((num_vectors, d)).astype('float32')

# Add vectors to the index
index.add(xb)

# Check the maximum level and distribution of levels
max_level = index.hnsw.max_level
levels_distribution = np.bincount(faiss.vector_to_array(index.hnsw.levels))

print("Max level in HNSW index:", max_level)
print("Distribution of vectors across levels:", levels_distribution)

HNSW in Vector Search Engines: Why it Matters

In vector search engines, HNSW underpins the ability to search vast datasets in milliseconds, which is crucial for real-time recommendation systems, personalized search, and AI-powered assistants such as Pinecone's. For instance, a query against a large collection of user or product embeddings can return relevant results in near real time.

Pinecone, for example, allows developers to integrate this kind of search without having to manage the complex index structures. Still, if you’re implementing similar functionality with other libraries like Faiss, understanding HNSW's structure helps in fine-tuning parameters like the number of neighbors and levels in your specific application, optimizing for both recall and speed.
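As a rough sketch of what that tuning looks like in Faiss: M and efConstruction shape the graph at build time, efSearch controls query-time effort, and recall can be measured against an exact brute-force search. The data is random and the parameter values are illustrative starting points.

import faiss
import numpy as np

d = 128
xb = np.random.random((100000, d)).astype('float32')
xq = np.random.random((100, d)).astype('float32')

# Exact (brute-force) baseline to measure recall against
flat = faiss.IndexFlatL2(d)
flat.add(xb)
_, true_ids = flat.search(xq, 10)

# HNSW index: M = neighbors per node, efConstruction = build-time effort
hnsw = faiss.IndexHNSWFlat(d, 32)
hnsw.hnsw.efConstruction = 200
hnsw.add(xb)

# efSearch = query-time effort: higher means better recall, slower queries
hnsw.hnsw.efSearch = 64
_, ann_ids = hnsw.search(xq, 10)

# Fraction of the true top-10 neighbors recovered by the HNSW index
recall = np.mean([len(set(a) & set(t)) / 10 for a, t in zip(ann_ids, true_ids)])
print("Recall@10:", recall)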

This blog is written by Jeremy Rivera at KushoAI. We're building an AI agent that tests your APIs for you. Bring in API information and watch KushoAI turn it into fully functional and exhaustive test suites in minutes.