Unlocking the Power of Vector Databases for RAG Systems

Uchechi Njoku
3 min read · Jul 12, 2024


In my previous post, I explored RAG (Retrieval-Augmented Generation), where the “R” stands for retrieval, as a framework for working with LLMs. If you missed it, you can catch up [here].

In the data space, there are multiple ways to retrieve data from your knowledge base for use in RAG. A popular method involves using Vector Databases. These specialized databases are designed to store, index, and query high-dimensional vector data, the kind of data RAG projects typically deal with given its large volume and heterogeneity. Some popular examples of vector databases include:

1. FAISS (Facebook AI Similarity Search):
Developed by Facebook AI, FAISS is a library for efficient similarity search and clustering of dense vectors. It is highly optimized for both CPU and GPU implementations and supports various indexing methods.

2. Milvus:
Milvus is an open-source vector database designed for scalable and efficient similarity search. It supports a variety of index types and provides features for managing large-scale vector data.

3. Annoy (Approximate Nearest Neighbors Oh Yeah):
Developed by Spotify, Annoy is a library that builds static, read-only structures for fast approximate nearest neighbor searches in high-dimensional spaces.

4. Elasticsearch with Vector Search:
Elasticsearch has added support for dense vector fields, enabling similarity search capabilities on top of its powerful full-text search engine.

5. Pinecone:
Pinecone is a managed vector database service that provides scalable, real-time vector search capabilities. It abstracts away the complexity of managing infrastructure and offers integration with various AI/ML tools.

Vector databases store and retrieve large datasets efficiently by indexing the data. At search time, the database retrieves entries based on the similarity between the search query and the indexed data. Some example vector index types include:

  • Flat (e.g. BruteForce)
  • Graph (e.g. HNSW)
  • Inverted (e.g. IVF)

Read more about vector indices [here].
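To make the flat and inverted index types concrete, here is a minimal NumPy sketch, not a production library: a brute-force (flat) search that scans every vector, and an IVF-style search that clusters the corpus and probes only the clusters nearest to the query. All function names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 1000                       # dimensionality, corpus size
corpus = rng.standard_normal((n, d)).astype("float32")
# A query that sits very close to corpus vector 42.
query = corpus[42] + 0.01 * rng.standard_normal(d).astype("float32")

def flat_search(corpus, query, k=1):
    """Flat (brute-force) index: compare the query against every vector."""
    dists = np.linalg.norm(corpus - query, axis=1)
    return np.argsort(dists)[:k]

def build_ivf(corpus, n_clusters=10, iters=5):
    """Inverted (IVF-style) index: partition the corpus with a few k-means steps."""
    centroids = corpus[rng.choice(len(corpus), n_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(
            np.linalg.norm(corpus[:, None] - centroids[None], axis=2), axis=1)
        for c in range(n_clusters):
            members = corpus[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    assign = np.argmin(
        np.linalg.norm(corpus[:, None] - centroids[None], axis=2), axis=1)
    return centroids, assign

def ivf_search(corpus, centroids, assign, query, k=1, n_probe=2):
    """Search only the vectors in the n_probe clusters nearest to the query."""
    nearest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    candidates = np.flatnonzero(np.isin(assign, nearest))
    dists = np.linalg.norm(corpus[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

centroids, assign = build_ivf(corpus)
print(flat_search(corpus, query))                    # exact nearest neighbor
print(ivf_search(corpus, centroids, assign, query))  # probes 2 of 10 clusters
```

The trade-off is visible in the shapes of the two searches: the flat index compares against all 1000 vectors, while the IVF search compares against only the candidates in the probed clusters, trading a small chance of missing the true neighbor for speed.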

Steps to Use a Vector Database in an LLM-Related Project

1. Transform Data into Vector Embeddings:
Vector embeddings are arrays of numbers that capture the meaning and associations in the data; common examples include word, sentence, document, and image embeddings. To create these embeddings from raw data, pre-trained ML models are used. You can find pre-trained models on Hugging Face.

2. Create Indices for the Embeddings.

3. Store the Indexed Embeddings in the Vector Database.

4. Search Vector Databases with a Query:
Vector databases perform searches based on similarity, a process called semantic search because it “interprets” the meaning of the words and phrases in the query.

The outcome of the query retrieved from the vector database is then used to augment the generation done by the LLM, resulting in RAG!
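The four steps above can be sketched end to end in a few lines. Note that the `embed` function below is a hypothetical stand-in (a hashed bag-of-words), used only so the example runs with no extra dependencies; in a real project you would use a pre-trained embedding model as described in step 1.

```python
import zlib
import numpy as np

def embed(text, d=512):
    """Toy stand-in for an embedding model: a hashed bag-of-words vector,
    normalized to unit length. A real project would use a pre-trained
    model (e.g. one from Hugging Face) instead."""
    vec = np.zeros(d)
    for token in text.lower().split():
        vec[zlib.crc32(token.strip(".,?!").encode()) % d] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Step 1: transform raw documents into vector embeddings.
docs = [
    "Vector databases store and index high-dimensional embeddings.",
    "RAG augments an LLM prompt with retrieved context.",
    "Spotify developed Annoy for approximate nearest neighbor search.",
]
# Steps 2-3: index the embeddings and store them (here, a plain matrix).
index = np.stack([embed(doc) for doc in docs])

# Step 4: embed the query and search by cosine similarity (semantic search).
def search(query, k=1):
    sims = index @ embed(query)   # cosine similarity: vectors are unit-norm
    return [docs[i] for i in np.argsort(-sims)[:k]]

# The retrieved document then augments the LLM prompt: the "A" and "G" in RAG.
context = search("Who developed Annoy?")[0]
prompt = f"Context: {context}\nQuestion: Who developed Annoy?"
print(prompt)
```

In a real system the matrix would live inside one of the vector databases listed earlier, and the final `prompt` would be sent to the LLM for generation.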

Two questions remain: how do we choose a vector database, and how do we measure the quality of the retrieval? That will be the focus of the next blog post.


Written by Uchechi Njoku

I am an early-stage researcher in Data Engineering for Data Science, a polyglot, and a traveler :-))
