
RAG - Query Transformation

Welcome back! I hope you enjoyed the first part of this series, in which we are exploring the RAG toolkit one portion at a time. It is highly suggested that you take a look at all the projects of this series step by step and, more importantly, that you code along with each one. If you don't code it out, you won't get it.

In this notebook we are going to take a look at how to create an assistant that will help us modify our question. This technique is called Query Transformation. Imagine Query Transformation as your search request going through a makeover montage in a movie. Your original query walks in a bit plain and straightforward, and then gets spruced up with the latest styles and smarts to become the most efficient, dashing version of itself. By rephrasing, optimizing, and enhancing your search terms, Query Transformation ensures that what you’re asking for is crystal clear and ready to fetch the best possible results. It’s like sending your query to a high-end stylist who makes sure it’s dressed to impress and ready to get exactly what you need!

It is taken for granted that you have already created a .env file, as requested in the first part of this series.

Modules Required

import os
from dotenv import load_dotenv
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import AzureChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import AzureOpenAIEmbeddings
from langchain.load import loads, dumps
from typing import List

load_dotenv()

The "new" modules we are going to use in this section are

  • langchain.prompts.ChatPromptTemplate: ChatPromptTemplate is used to create templates for chat prompts. These templates help standardize and structure the prompts that are fed into the language model, ensuring consistency and improving the quality of the generated responses.
  • langchain.load.loads: the loads function deserializes a LangChain object (such as a retrieved Document) from its JSON string representation.
  • langchain.load.dumps: the dumps function serializes a LangChain object into a JSON string. We will use this pair later to turn retrieved documents into hashable dictionary keys and back again; a quick round-trip sketch follows this list.
  • typing.List: List class from this module is used to indicate that a variable is expected to be a list of a certain type.
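
For a quick look at what this serialization means in practice, here is a minimal round-trip sketch with dumps and loads on a single Document (the page content is just a placeholder):

from langchain_core.documents import Document

doc = Document(page_content="hello RAG")
serialized = dumps(doc)        # a JSON string describing the Document
restored = loads(serialized)   # back to a Document object
print(restored.page_content == doc.page_content)  # True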

The main ingredient in our recipe throughout this series is always an LLM agent, which will assist us. As a result, the first step that takes us closer to this goal is to set up our LLM model.

LLM Agent and its Prompt

llm_075 = AzureChatOpenAI(deployment_name=os.getenv('LLM_35_Deployment'),
                         model_name=os.getenv('LLM_35_Model'),
                         azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
                         temperature=0.75,
                         )

In this call we have set the temperature argument to 0.75 (mind that it takes a value from a closed interval, commonly quoted as [0, 1], although the Azure OpenAI API accepts values up to 2). But what does this argument represent? Imagine the temperature argument in an LLM call as the spice level in a recipe. When you set the temperature low, it’s like adding just a pinch of spice, making the responses mild, predictable, and focused. Crank up the temperature, and it’s like dumping in hot sauce, making the responses more adventurous, creative, and sometimes a bit unpredictable. So, adjusting the temperature lets you control how bold or conservative the language model’s answers will be, ensuring your conversational dish is seasoned just to your taste! Since we need the agent to generate new questions that are similar to ours, yet better formatted, we need some of this spiciness.
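
For a quick feel of the difference, here is a minimal sketch of a second, "no spice" agent that uses the very same deployment but with temperature 0; invoking both with the same prompt a few times usually shows the low-temperature agent repeating itself almost verbatim, while llm_075 varies more.

llm_00 = AzureChatOpenAI(deployment_name=os.getenv('LLM_35_Deployment'),
                         model_name=os.getenv('LLM_35_Model'),
                         azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'),
                         temperature=0.0,  # near-deterministic, "mild" responses
                         )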

Next we are going to define a prompt. During the first part of this series we used an off-the-shelf prompt, which we requested through the hub module. Now we are going to create our own. The reason for that is to give our agent a "personality", or better, a purpose for its existence.

prompt = ChatPromptTemplate.from_template(
    """
    You are a smart assistant. Your task is to create 5 questions, each phrased differently and from various perspectives, based on the given question, to retrieve relevant documents from a vector database. By offering multiple perspectives on the user's question, your aim is to help the user mitigate some of the constraints of distance-based similarity search. List these alternative questions, each on a new line. Original question: {question}
    """
)

Chain Construction

Finally, we are ready to create our very first chain in this section. Initially, we pass our question into the chain. Then we insert it into the prompt, in the {question} placeholder. After having our prompt ready to go, we "send" it to our LLM agent. Lastly, using StrOutputParser() and a small lambda, we export the results in a readable, nicely formatted way for us humans! 🤖 In general, before setting up a chain it is suggested to think through your steps one by one, as simply as possible, define your functions/tools (if needed), and only then set it up. Do not start by defining everything up front, as later on you will miss something or will need to modify it, since those steps have to be "connected" somehow.

By connected, I mean the {question} part in the prompt building, or in other cases more information like {context} etc.


generate_queries = (
        {"question": RunnablePassthrough()}
        | prompt
        | llm_075
        | StrOutputParser()
        | (lambda x: "\n".join(line for line in x.split("\n") if line.strip()))  # drop any empty lines
)

result = generate_queries.invoke("What do you know about Query Transformation?")
print(result)

Fusion Scores and uniqueness

Since we have generated our better-formatted questions, we need to use them to get better, more relevant answers. There are many options you could experiment with. Here we are going to explore RAG-Fusion. Through this process, relevant information is retrieved for each question. A union of the retrievals is created that keeps only the unique documents among them. Finally, a rank is computed and, depending on our preference, an answer is presented.

def rrf(results: List[List], k=60):
    # Reciprocal Rank Fusion: combine several ranked lists into one re-ranked list
    fused_scores = {}
    for docs in results:
        for rank, doc in enumerate(docs):
            # Serialize the document so it can be used as a dictionary key
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the fused score with the RRF formula 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)
    # Sort the documents by fused score, highest first, after all lists are processed
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results

Initially, a dictionary is created to hold the fused score of each unique document retrieved (fused_scores). After that we iterate through each ranked list, and through each document together with its rank. We then create a key for each document by serializing it: either we add the document to the dictionary (if it does not exist yet) or we keep its existing score. Based on its rank, we update the score using the RRF formula 1/(rank + k). Finally, we return a list containing each document and its fused score as tuples, sorted from highest to lowest score.
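
To see the formula in action, here is a minimal toy sketch (the documents and their contents are made up purely for illustration) that feeds two small ranked lists into rrf:

from langchain_core.documents import Document

doc_a, doc_b, doc_c = (Document(page_content=t) for t in ("alpha", "beta", "gamma"))

# Two ranked lists, as if the retriever had answered two of our generated questions
toy_results = [[doc_a, doc_b], [doc_b, doc_c]]

for doc, score in rrf(toy_results, k=60):
    print(doc.page_content, round(score, 5))

# doc_b appears in both lists (ranks 0 and 1), so its fused score
# 1/(0+60) + 1/(1+60) ≈ 0.03306 places it first, ahead of doc_a and doc_c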

After having this tool defined and placed in our toolbox, we can either use the previously defined prompt, or create a new one to continue with. Just for practice, we are going to set up a new one that generates 3 questions, instead of the 5 we asked for in the previously created prompt.

prompt = ChatPromptTemplate.from_template(
    """
    You are a smart assistant. Your task is to create 3 questions, each phrased differently and from various perspectives, based on the provided question, to retrieve relevant documents from a vector database. By offering multiple perspectives on the user's question, your aim is to help the user address some of the limitations of distance-based similarity search. List these alternative questions, each on a new line. Original question: {question}
    """
    )

So we have our LLM agent ready to go, its purpose of existence set, and a new tool to generate the scores and retrieve the most relevant information. So you may wonder what is left. We are now ready to define a new chain that puts everything together.

generate_queries = (
        {"question": RunnablePassthrough()}
        | prompt
        | llm_075
        | StrOutputParser()
        | (lambda x: x.split("\n"))
)
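
One note before we wire everything together: the fusion chain below expects a retriever, the one we built in the first part of this series (which is exactly why PyPDFLoader, DirectoryLoader, RecursiveCharacterTextSplitter, Chroma and AzureOpenAIEmbeddings are imported above). If you are starting from a fresh notebook, a sketch along the following lines could recreate it; the ./data folder and the Embeddings_Deployment environment variable are assumptions, so adapt them to your own setup.

# Folder path and embeddings deployment name below are assumptions; adjust them to your setup
loader = DirectoryLoader("./data", glob="**/*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(documents)

embeddings = AzureOpenAIEmbeddings(azure_deployment=os.getenv('Embeddings_Deployment'))
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()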

fusion_retrieval_chain = (
        generate_queries
        | retriever.map()
        | rrf
)

fusion_retrieval_chain.invoke("What are the benefits of Bert?")
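
If you want to take the final step and actually answer the original question from the fused documents, a sketch like the following could work; mind that the prompt wording and the joining lambda are my own assumptions, not something dictated by RAG-Fusion itself.

answer_prompt = ChatPromptTemplate.from_template(
    """
    Answer the question based only on the following context:
    {context}

    Question: {question}
    """
)

answer_chain = (
        {
            # Turn the (document, score) tuples into one plain-text context block
            "context": fusion_retrieval_chain | (lambda scored: "\n\n".join(doc.page_content for doc, _ in scored)),
            "question": RunnablePassthrough(),
        }
        | answer_prompt
        | llm_075
        | StrOutputParser()
)

answer_chain.invoke("What are the benefits of Bert?")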

Food for thought

There are multiple other ways you could modify the purpose of the LLM agent, your assistant, so that it helps you the way you want. For example, imagine that you have a complex question and you are not sure how to provide it to your shiny, brand-new chain. Or, when you provide it to your chain, the results you get back are not satisfying. By giving your agent a slightly modified purpose, everything will be bright and shiny again. For example, you could use the following prompt within your chain:

decomposition_prompt = ChatPromptTemplate.from_template(
    """
    You are a helpful assistant capable of simplifying complex questions into smaller parts.
    Your goal is to break down the given question into several sub-questions that can each be answered separately, ultimately addressing the main question.
    List these sub-questions, each separated by a newline character.
    Original question: {question}
    Output (3 queries):
    """
)

Feeding this prompt to a new chain could save you some time and simplify your question, helping you reach your goal step by step.

query_generation_chain = (
        {"question": RunnablePassthrough()}
        | decomposition_prompt
        | llm_075
        | StrOutputParser()
        | (lambda x: x.split("\n"))
)

questions = query_generation_chain.invoke("What are the benefits of LDA?")
questions
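
As a hedged sketch of what could come next (assuming the retriever from above is available), you could loop over the generated sub-questions and retrieve documents for each one separately, before answering them one by one:

# Retrieve documents for each sub-question separately (sketch; assumes `retriever` exists)
for sub_question in questions:
    if sub_question.strip():  # skip any empty lines the model may have produced
        sub_docs = retriever.invoke(sub_question)
        print(f"{sub_question} -> {len(sub_docs)} documents retrieved")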

🚀 Imagine the possibilities and think about what you could do by defining a proper prompt based on your needs. 🚀 I hope you enjoyed this as much as I did and, of course, learned something from it! In the next post of this series, we will explore Hypothetical Document Embeddings. We'll create our own documents, allowing their embedding vectors to identify neighborhoods in the corpus embedding space where similar real documents can be retrieved based on vector similarity.

Be safe, code safer!