GroqRAG: Implementing advanced RAG models with Groq
Retrieval-Augmented Generation (RAG) enhances generative models by integrating information retrieved from relevant documents into the generation process. This guide details the implementation of an advanced, robust RAG pipeline using the Groq API in Python. The pipeline includes multi-step document retrieval, TF-IDF-based ranking to identify the most relevant documents, and response generation. It is designed to handle multiple documents stored in a data/ folder and to maintain conversational context across messages.
Step 1: Install Dependencies
pip install groq PyPDF2 scikit-learn numpy matplotlib
Step 2: Obtain API Key
export GROQ_API_KEY=<your-api-key-here>
Step 3: Implement Advanced RAG Pipeline with Groq API
Environment Setup: Import the necessary libraries and set up the API key and base URL:
import os
import glob
from collections import deque

import numpy as np
from groq import Groq
from PyPDF2 import PdfReader
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Set up your API key
API_KEY = os.getenv("GROQ_API_KEY")
BASE_URL = "https://api.groq.com/openai/v1"  # for reference; the Groq SDK targets this endpoint by default
client = Groq(api_key=API_KEY)
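The client will initialize even if the key is missing and only fail later at request time; a small defensive check (not part of the original snippet) fails fast instead:

if not API_KEY:
    raise RuntimeError("GROQ_API_KEY is not set; export it before running this script.")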
Function to Extract Text from PDF: Define a function to extract text from a local PDF file:
def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        # extract_text() can return None for image-only pages
        text += page.extract_text() or ""
    return text
Function to Load and Extract Text from Multiple PDFs:
Define a function to load and extract text from all PDF files in the data/
folder:
def load_and_extract_texts_from_pdfs(data_folder="data/"):
    texts = []
    file_paths = glob.glob(os.path.join(data_folder, "*.pdf"))
    for file_path in file_paths:
        try:
            text = extract_text_from_pdf(file_path)
            texts.append({"text": text, "file_path": file_path})
        except Exception as e:
            print(f"Error processing {file_path}: {e}")
    return texts
Function to Split Text into Chunks: Split the extracted text into manageable chunks for processing:
def split_text_into_chunks(text, chunk_size=512):
    chunks = []
    for i in range(0, len(text), chunk_size):
        chunks.append(text[i:i + chunk_size])
    return chunks
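Fixed-size slicing can cut sentences in half at chunk boundaries. A sliding-window variant with overlap is a common refinement; this is a sketch, and the overlap parameter is an assumption rather than part of the original pipeline:

def split_text_into_chunks_with_overlap(text, chunk_size=512, overlap=64):
    # Step by chunk_size - overlap so consecutive chunks share `overlap` characters
    step = max(chunk_size - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]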
Advanced Ranking and Retrieval with TF-IDF and Cosine Similarity: Use TF-IDF and cosine similarity to rank the documents by relevance to the input query; additional ranking criteria can be layered on top for better performance:
def retrieve_relevant_chunks(texts, query, top_k=5):
    corpus = [query] + [text["text"] for text in texts]
    tfidf_matrix = TfidfVectorizer().fit_transform(corpus)
    cosine_matrix = cosine_similarity(tfidf_matrix)
    similarity_scores = cosine_matrix[0][1:]  # query vs. each document; exclude the query itself
    # argsort is ascending, so reverse to put the most similar documents first
    ranked_indices = np.argsort(similarity_scores)[::-1][:top_k]
    relevant_texts = [texts[idx] for idx in ranked_indices]
    return relevant_texts
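The function above ranks whole documents, not individual chunks. One way to add the "additional ranking criteria" mentioned above is a second TF-IDF pass over the chunks of the top documents; this sketch reuses the helpers defined in this guide and is an illustrative extension, not part of the original pipeline:

def retrieve_relevant_chunk_texts(texts, query, top_k_docs=5, top_k_chunks=10):
    # First pass: narrow down to the most relevant documents
    docs = retrieve_relevant_chunks(texts, query, top_k=top_k_docs)
    chunks = []
    for doc in docs:
        chunks.extend(split_text_into_chunks(doc["text"]))
    # Second pass: score individual chunks against the query
    matrix = TfidfVectorizer().fit_transform([query] + chunks)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    best = np.argsort(scores)[::-1][:top_k_chunks]
    return [chunks[i] for i in best]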
Function to Generate a Response from Retrieved Chunks: Define a function that uses the retrieved chunks as context to generate a response using the Groq API:
def generate_response(chunks, query, context_history=None, model="llama3-8b-8192"):
    # chunks is a list of plain text strings produced by split_text_into_chunks
    context = " ".join(chunks)
    if context_history:
        context = " ".join(context_history) + " " + context
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": f"Context: {context} Query: {query}"}],
        model=model,
        stream=False,
    )
    return response.choices[0].message.content
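A variant worth considering (an assumption on my part, not the original design) is to place the retrieved context in a system message so the user turn carries only the question:

def generate_response_system_prompt(chunks, query, model="llama3-8b-8192"):
    context = " ".join(chunks)
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": f"Answer using only this context: {context}"},
            {"role": "user", "content": query},
        ],
        model=model,
    )
    return response.choices[0].message.content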
Maintaining Conversational Context: Implement a function to maintain conversational context using a deque for efficient context management:
def maintain_conversational_context(response, context_history, max_context_length=10):
    if len(context_history) >= max_context_length:
        context_history.popleft()
    context_history.append(response)
    return context_history
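Note that deque already supports this trimming natively via its maxlen argument; initializing the history as below makes the explicit popleft unnecessary:

context_history = deque(maxlen=10)  # oldest entries are dropped automatically on append
context_history.append(response)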
Multi-Step RAG Function: Implement the main RAG function that orchestrates the extraction, retrieval, chunking, and generation steps:
def rag_pipeline(data_folder, query, context_history):
    # Step 1: Load and extract texts from PDFs
    texts = load_and_extract_texts_from_pdfs(data_folder)
    print(f"Loaded and extracted texts from {len(texts)} PDFs")
    # Step 2: Retrieve relevant documents
    relevant_texts = retrieve_relevant_chunks(texts, query)
    print(f"Relevant texts: {[text['file_path'] for text in relevant_texts]}")
    # Step 3: Split relevant texts into chunks
    chunks = []
    for text in relevant_texts:
        chunks.extend(split_text_into_chunks(text["text"]))
    # Step 4: Generate response
    response = generate_response(chunks, query, context_history)
    return response
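A one-off call looks like this, assuming a data/ folder with PDFs exists locally (the query string is illustrative):

history = deque()
answer = rag_pipeline("data/", "Summarize the key findings across these documents.", history)
print(answer)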
Interactive Command Line Interface: Implement an interactive CLI to allow users to input queries and receive responses while maintaining conversational context:
def interactive_cli(data_folder="data/"):
    context_history = deque()
    print("Welcome to the RAG-powered conversational assistant! Type /bye to exit.")
    while True:
        user_input = input(">> user: ")
        if user_input.lower() == "/bye":
            print("Goodbye!")
            break
        response = rag_pipeline(data_folder, user_input, context_history)
        context_history = maintain_conversational_context(response, context_history)
        print(f">> groq: {response}")
**Run the Interactive CLI:** Start the interactive CLI to begin the conversation:
if __name__ == "__main__":
    interactive_cli()
Explanation:

- **Environment Setup:** The necessary libraries are imported, and the API key is read from an environment variable. The Groq client is initialized with that key.
- **Text Extraction:** The extract_text_from_pdf function reads a PDF file and extracts its text content. The load_and_extract_texts_from_pdfs function loads all PDFs from the data/ folder and extracts text from each.
- **Text Splitting:** The split_text_into_chunks function divides the extracted text into smaller, manageable chunks.
- **Advanced Ranking and Retrieval:** The retrieve_relevant_chunks function uses TF-IDF vectorization and cosine similarity to rank the documents by relevance to the query and select the top-k most relevant ones. Additional ranking criteria can be added for improved performance, as sketched earlier.
- **Response Generation:** The generate_response function uses the retrieved chunks as context to generate a response via the Groq API, incorporating historical context when available.
- **Maintaining Conversational Context:** The maintain_conversational_context function keeps a history of previous responses to ensure continuity in the conversation.
- **Multi-Step RAG Pipeline:** The rag_pipeline function integrates extraction, retrieval, chunking, and generation into a coherent workflow. It prints the number of loaded texts and the relevant texts' file paths, then generates the final response.
- **Interactive Command Line Interface:** An interactive CLI lets users input queries and receive responses while maintaining conversational context. The conversation continues until the user types /bye.
Make It More Innovative:
Feedback Mechanism:
def get_user_feedback():
    feedback = input(">> user: Please rate this response (1-5): ")
    try:
        rating = int(feedback)
        if 1 <= rating <= 5:
            return rating
        else:
            print("Invalid rating. Please enter a number between 1 and 5.")
            return get_user_feedback()
    except ValueError:
        print("Invalid input. Please enter a number between 1 and 5.")
        return get_user_feedback()
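To make the ratings actionable rather than discarded, one option is to append each rated exchange to a log file for later analysis; the file name and CSV schema below are assumptions, not part of the original article:

import csv
from datetime import datetime

def log_feedback(query, response, rating, path="feedback_log.csv"):
    # One row per rated exchange: timestamp, query, response, rating
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), query, response, rating])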
Updating the Main Loop with Feedback:
def interactive_cli(data_folder="data/"):
    context_history = deque()
    print("Welcome to the RAG-powered conversational assistant! Type /bye to exit.")
    while True:
        user_input = input(">> user: ")
        if user_input.lower() == "/bye":
            print("Goodbye!")
            break
        response = rag_pipeline(data_folder, user_input, context_history)
        context_history = maintain_conversational_context(response, context_history)
        print(f">> groq: {response}")
        # Get feedback from the user
        rating = get_user_feedback()
        print(f">> user: Rated the response as {rating}/5")
Implementation of Visual and Interactive Responses:
Generating Visual Summaries:
import matplotlib.pyplot as plt

def generate_visual_summary(texts):
    # Example: generate a bar chart showing the length of each document
    lengths = [len(text["text"]) for text in texts]
    file_names = [os.path.basename(text["file_path"]) for text in texts]
    plt.figure(figsize=(10, 5))
    plt.bar(file_names, lengths, color='blue')
    plt.xlabel('Document')
    plt.ylabel('Length (characters)')
    plt.title('Length of Documents')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
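On a headless machine, plt.show() will not open a window; saving the chart instead is a simple alternative (the file name is illustrative):

plt.savefig("document_lengths.png", dpi=150)  # use in place of plt.show() when there is no display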
Updating the Main Loop to Include Visual Summaries:
def interactive_cli(data_folder="data/"):
    context_history = deque()
    print("Welcome to the RAG-powered conversational assistant! Type /bye to exit.")
    while True:
        user_input = input(">> user: ")
        if user_input.lower() == "/bye":
            print("Goodbye!")
            break
        response = rag_pipeline(data_folder, user_input, context_history)
        context_history = maintain_conversational_context(response, context_history)
        print(f">> groq: {response}")
        # Get feedback from the user
        rating = get_user_feedback()
        print(f">> user: Rated the response as {rating}/5")
        # Generate and display a visual summary
        # (reloading the PDFs each turn is simple but redundant; cache the texts if speed matters)
        texts = load_and_extract_texts_from_pdfs(data_folder)
        generate_visual_summary(texts)
By incorporating these features, you can create a more engaging and advanced RAG pipeline that responds accurately, collects feedback that can later drive adaptation, provides visual insights, and maintains a seamless conversational experience.