Bldev's Blog

[RAG, AI] Vector Similarity Search and RAG

2025. 7. 12.

Vector Similarity Search

Vector similarity search (or vector search) is a data retrieval method that searches for data represented as vectors. A vector is a numerical representation of data such as words, sentences, documents, images, and videos. It is a concept primarily used in machine learning models and artificial intelligence applications. Vectors make it possible to determine the semantic relationship and similarity between data, enabling searches that are more relevant to the meaning and context of a query than simple keyword-based searches.
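One common similarity measure is cosine similarity, which scores how closely two vectors point in the same direction. A minimal pure-Python sketch (the three-dimensional vectors are toy values for illustration; real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [1.0, 2.0, 0.0]
doc_a = [2.0, 4.0, 0.0]   # same direction as the query
doc_b = [0.0, 0.0, 3.0]   # orthogonal to the query

print(cosine_similarity(query, doc_a))  # 1.0 (most similar)
print(cosine_similarity(query, doc_b))  # 0.0 (unrelated)
```

Because the score depends on direction rather than magnitude, two texts with similar meaning map to vectors with a high cosine similarity even if their raw values differ in scale.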

A vector database is a repository for vector data. To store data in a vector database, the data must first be converted into vectors. This conversion process is called embedding, vectorization, or vector embedding.

A pre-trained model that embeds data into vectors is called a vector embedding model. Examples include Word2Vec, GloVe, and FastText, as well as pre-trained transformer-based models like BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (a robustly optimized BERT approach).

A vector database not only stores vector data, but also searches for vector data by calculating the similarity between different vector data. This is called similarity search or semantic search.
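At its core, a similarity search ranks the stored vectors by their similarity to the query vector and returns the top matches. A toy in-memory "vector database" sketch (the documents and vectors are invented for illustration; real vector databases use approximate nearest-neighbor indexes instead of a full scan):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy in-memory store: (text, vector) pairs.
store = [
    ("doc about cats", [0.9, 0.1, 0.0]),
    ("doc about dogs", [0.8, 0.2, 0.1]),
    ("doc about cars", [0.0, 0.1, 0.9]),
]

def search(query_vec, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([1.0, 0.0, 0.0]))  # ['doc about cats', 'doc about dogs']
```

Semantic search works the same way, except the query vector comes from embedding the user's natural-language query.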

RAG

Retrieval-Augmented Generation (RAG) is a technique where an LLM first retrieves data it can reference when answering a query, and then generates the answer based on that data. The retrieval step typically uses vector similarity search against a dedicated vector database: the query is vectorized, and the stored vectors with the highest similarity to the query vector are returned. RAG, in other words, is a generation technique built on top of vector similarity search.

The LLM problems that RAG solves are as follows:

  • LLMs cannot answer questions about data they were not trained on, and their knowledge goes stale after the training cutoff. With RAG, the LLM uses the retrieved similar data as context, so it can respond based on more up-to-date information.
  • LLMs can hallucinate, generating plausible but inaccurate responses. With RAG, the retrieved similar data grounds the response in actual source material, producing more accurate and relevant answers and reducing hallucinations.

The operation process of RAG is as follows:

  1. Vectorize the data and store it in a vector database.
  2. Search the vector database with new data to retrieve the most similar data.
  3. Pass the similar data along with the question when querying the LLM. In other words, the prompt is Question + Similar Data.
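The three steps above can be sketched end to end in plain Python (the documents, vectors, and the `retrieve`/`build_prompt` helpers are illustrative; a real pipeline would use an embedding model for vectorization and send the prompt to an actual LLM):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Step 1: data vectorized (here by hand, as a stand-in for an embedding model)
# and stored in a "vector database".
documents = {
    "Our refund window is 30 days.": [0.9, 0.1],
    "Shipping takes 3-5 business days.": [0.1, 0.9],
}

def retrieve(query_vec, k=1):
    # Step 2: search the store for the data most similar to the query vector.
    ranked = sorted(documents.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    # Step 3: the prompt sent to the LLM is Question + Similar Data.
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long do refunds take?", [1.0, 0.0]))
```

The LLM then answers using the retrieved context rather than relying only on what it memorized during training.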

LangChain

LangChain is a framework that abstracts over LLM providers and integrates the building blocks needed to perform complex LLM-based tasks, making it easy to embed LLMs in various applications. It is one of the best-known frameworks for vector search and RAG; LlamaIndex is a similar framework.

Frameworks like LangChain make it easy to implement vector search and LLM-based chatbots. LangChain also provides abstractions for the embedding-model providers that vectorize data. With these features, you can implement a RAG-based LLM chatbot that loads data into a vector database and answers user queries from it in just a few lines of code.
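As a rough sketch of what that might look like — this assumes the `langchain-openai` and `langchain-community` packages plus `faiss-cpu` are installed and an OpenAI API key is configured; the class and package names follow recent LangChain releases and may differ in other versions:

```python
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate

docs = ["Our refund window is 30 days.", "Shipping takes 3-5 business days."]

# Step 1: vectorize the documents and store them in a vector database (FAISS).
store = FAISS.from_texts(docs, OpenAIEmbeddings())

# Step 2: retrieve the most similar documents for a query.
retriever = store.as_retriever(search_kwargs={"k": 1})

# Step 3: pass Question + Similar Data to the LLM.
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini")

question = "How long do refunds take?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = llm.invoke(prompt.format(context=context, question=question))
print(answer.content)
```

The embedding model, vector store, and LLM are all swappable behind LangChain's abstractions, which is what makes the framework convenient for this kind of pipeline.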
