Read on for more ideas on how to improve the performance of your RAG pipeline and make it production-ready.
This section describes the packages and API keys you need to follow along with this article.
Required packages
This article shows you how to implement simple and advanced RAG pipelines with LlamaIndex in Python, which you can install via pip:
pip install llama-index
This article uses LlamaIndex v0.10. If you are upgrading from an older LlamaIndex version, you need to run the following commands to reinstall LlamaIndex and get it running properly:
pip uninstall llama-index
pip install llama-index --upgrade --no-cache-dir --force-reinstall
LlamaIndex offers an option to store vector embeddings locally in JSON files for persistent storage, which is great for quickly prototyping an idea. However, since advanced RAG techniques aim for production-ready applications, we will use a vector database for persistent storage.
Because we need metadata storage and hybrid search capabilities in addition to storing the vector embeddings, we will use the open source vector database Weaviate (v3.26.2), which supports these features.
pip install weaviate-client llama-index-vector-stores-weaviate
API key
We will be using Weaviate Embedded, which you can use for free without registering for an API key. However, this tutorial uses an embedding model and LLM from OpenAI, for which you need an OpenAI API key. To obtain one, you need an OpenAI account and then "Create new secret key" under API keys.
Then, create a local .env file in your root directory and define your API key in it:
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
You can then load your API key using the following code:
# !pip install python-dotenv
import os
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
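If you want, you can add a quick sanity check that the key was actually picked up before making any API calls; this is a minimal optional sketch:
# Optional check: fail early if the key is missing from the environment
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY not found - check your .env file")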
This section describes how to implement a simple RAG pipeline using LlamaIndex. You can find the entire simple RAG pipeline in this Jupyter Notebook. For an implementation using LangChain, you can continue with this article (simple RAG pipeline with LangChain).
Step 1: Define the embedding model and LLM
First, define the embedding model and LLM in a global settings object. This way, you don't have to explicitly specify the models again in your code.
- Embedding model: Used to generate vector embeddings of document chunks and queries.
- LLM: Used to generate answers based on user queries and relevant context.
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
Step 2: Load the data
Next, create a local directory named data in your root directory and download some sample data from the LlamaIndex GitHub repository (MIT license).
!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'
The data can then be loaded for further processing.
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
).load_data()
Step 3: Split the document into nodes
Because the entire document is too large to fit into the LLM's context window, you need to partition it into smaller text chunks, which are called Nodes in LlamaIndex. To parse the loaded documents into nodes, use the SimpleNodeParser with a defined chunk size of 1024.
from llama_index.core.node_parser import SimpleNodeParser

node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)

# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)
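If you want to sanity-check the chunking step, you can inspect how many nodes were created and preview the first chunk; this is an optional sketch:
# Optional: inspect the parsed nodes
print(f"Number of nodes: {len(nodes)}")
print(nodes[0].get_content()[:200])  # preview the first text chunk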
Step 4: Build the index
Next, we build an index to store all external knowledge in Weaviate, an open source vector database.
First, you need to connect to a Weaviate instance. In this case, we are using Weaviate Embedded, which allows you to experiment in notebooks for free without an API key. For a production-ready solution, we recommend deploying Weaviate yourself, e.g., via Docker, or using a managed service.
import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(),
)
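To confirm that the embedded instance has started correctly, you can ask the client whether it is ready; a quick optional check with the v3 client:
# Optional: check that the embedded Weaviate instance is up
print(client.is_ready())  # expected to print True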
Next, build a VectorStoreIndex from the Weaviate client to store your data in and interact with.
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "MyExternalContext"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name=index_name,
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Setup the index
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
)
Step 5: Set up the query engine
Finally, set up the index as the query engine.
# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine()
Step 6: Run a simple RAG query on your data
Now you can run simple RAG queries on your data, as shown below.
# Run your naive RAG query
response = query_engine.query(
    "What happened at Interleaf?"
)
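The returned response object can be printed directly to see the generated answer. It also exposes the retrieved source nodes, which is handy for checking which context passages the answer was based on; a minimal sketch:
# Print the generated answer
print(response)

# Inspect the retrieved context passages and their similarity scores
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:100])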
This section covers a few simple adjustments you can make to turn the simple RAG pipeline above into an advanced one. This tutorial covers the following advanced RAG techniques: sentence window retrieval, hybrid search, and re-ranking.
We’ll only highlight the changes here, but you can find a complete end-to-end advanced RAG pipeline in this Jupyter Notebook.
The sentence window retrieval technique requires two adjustments. First, you need to adjust how your data is stored and post-processed: instead of the SimpleNodeParser, use the SentenceWindowNodeParser.
from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
The SentenceWindowNodeParser does two things:
- It splits the document into single sentences, which will be embedded.
- For each sentence, it creates a context window. If you specify window_size=3, the resulting window will be three sentences long, starting at the sentence before the embedded sentence and spanning the sentence after it. The window is stored as metadata (see the sketch below).
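To see what this produces, you can parse the documents with the sentence window parser and look at an arbitrary node; the variable name window_nodes and the node index are only for illustration:
# Parse the documents into sentence-level nodes (for inspection here;
# these nodes are what you would build the index from)
window_nodes = node_parser.get_nodes_from_documents(documents)

print(window_nodes[10].text)                       # the single sentence that gets embedded
print(window_nodes[10].metadata["window"])         # the surrounding sentence window
print(window_nodes[10].metadata["original_text"])  # the original sentence text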
During retrieval, the sentence that most closely matches the query is returned. After retrieval, you need to replace the sentence with the entire window from the metadata by defining a MetadataReplacementPostProcessor and using it in the list of node_postprocessors.
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)

...

query_engine = index.as_query_engine(
    node_postprocessors=[postproc],
)
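Once the index is built from the sentence-window nodes and the query engine is set up with this postprocessor, you can verify the replacement by comparing a retrieved node's original sentence with the window that is passed to the LLM; window_response here is just a local variable name:
window_response = query_engine.query(
    "What happened at Interleaf?"
)

retrieved_node = window_response.source_nodes[0].node
print(retrieved_node.metadata["original_text"])  # the single matched sentence
print(retrieved_node.metadata["window"])         # the full window used for generation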
Implementing hybrid search in LlamaIndex is as easy as two parameter changes to the query_engine, provided the underlying vector database supports hybrid search queries. The alpha parameter specifies the weighting between vector search and keyword-based search, where alpha=0 means keyword-based search and alpha=1 means pure vector search.
query_engine = index.as_query_engine(
    ...,
    vector_store_query_mode="hybrid",
    alpha=0.5,
    ...
)
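To get a feeling for the alpha parameter, you can build two query engines at the extremes and compare their answers on the same question; this is a rough sketch, and the engine names are just local variables:
# Keyword-leaning vs. vector-leaning retrieval (assuming the Weaviate
# backend accepts both extremes of the alpha weighting)
keyword_engine = index.as_query_engine(
    vector_store_query_mode="hybrid", alpha=0  # keyword-based search only
)
vector_engine = index.as_query_engine(
    vector_store_query_mode="hybrid", alpha=1  # pure vector search
)

print(keyword_engine.query("What happened at Interleaf?"))
print(vector_engine.query("What happened at Interleaf?"))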
Adding a reranker to your advanced RAG pipeline takes just three simple steps:
- First, define a reranker model. Here, we use BAAI/bge-reranker-base from Hugging Face.
- In the query engine, add the reranker model to the list of node_postprocessors.
- Increase the similarity_top_k in the query engine to retrieve more context passages, which can be reduced to top_n after reranking.
# !pip install torch sentence-transformers
from llama_index.core.postprocessor import SentenceTransformerRerank

# Define reranker model
rerank = SentenceTransformerRerank(
    top_n=2,
    model="BAAI/bge-reranker-base"
)

...

# Add reranker to query engine
query_engine = index.as_query_engine(
    similarity_top_k=6,
    ...,
    node_postprocessors=[rerank],
    ...,
)
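Querying works the same as before; the only difference is that the retriever now fetches six passages and the reranker narrows them down to the two most relevant ones before the answer is generated:
# Run the query against the re-ranked pipeline
response = query_engine.query(
    "What happened at Interleaf?"
)
print(response)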