How to Implement RAG With Amazon Bedrock and LangChain

RAG (retrieval-augmented generation) is at the forefront of AI-driven applications, revolutionizing how we interact with language models and retrieve information. Building RAG applications has never been more accessible, thanks to managed services like Amazon Bedrock.

Amazon Bedrock is a fully managed service, providing users access to a plethora of foundation models (FMs) from leading companies like Cohere, Stability AI, Mistral AI, and Anthropic. It also includes the Amazon Titan family of models, offering high-performing image, multimodal, and text model choices. This serverless service provides a playground to experiment with different FMs across different use cases; the models can also be customized with additional task-specific data and integrated into enterprise systems.

In this article, we will explore Amazon Bedrock for developing a large language model (LLM) application and harnessing RAG. We will focus on setting up Amazon Bedrock and highlight the potent Amazon Titan model using LangChain. We will also look at how using pgvector on Timescale's PostgreSQL cloud platform makes it easier to set up a vector database optimized for efficient storage and powering LLM applications with RAG.

Getting Started With Amazon Bedrock for RAG Applications

Amazon Bedrock is a comprehensive managed service that provides access to top-tier foundation models from leading artificial intelligence (AI) firms, such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon, through a unified API. 

In this section, we will walk you through setting up the Amazon Bedrock service, using the Amazon Titan foundation model, and discussing its pricing. In the later sections, we will integrate with LangChain for our RAG application.

Here's a step-by-step overview of what we’ll cover:

  • IAM user setup: Setting up permissions and access keys for Amazon Bedrock. 
  • Interacting with language models: Using boto3 to query and interact with Amazon Titan Text G1 - Express model.
  • Pricing: Understanding On-demand and Provisioned Throughput pricing for Amazon Titan Text Model. 
  • Sample dataset: Loading and saving a sample e-commerce toy dataset for analysis. 
  • Splitting dataset with LangChain: Utilizing LangChain for text splitting and chunking. 
  • Vector database: Introduction to pgvector on Timescale for embedding storage and search. 
  • Storing embeddings with LangChain: Creating and storing embeddings using LangChain, pgvector, and Timescale Vector. 
  • Retriever: Understanding document retrieval using LangChain's retriever functionality. 
  • Chains: Constructing a chatbot using LangChain's Chains. 
  • Prompts: Customizing chatbot interaction with prompts using LangChain.

Right, let’s get started.

Setting up an IAM user for Amazon Bedrock (optional)

The first step is to log in to Amazon Web Services.

Login page for Amazon Web Services

As a best practice, we will create an IAM user and grant it access to the Amazon Bedrock service. To create one, go to the IAM service as shown below:

Dashboard of Amazon Web Services

This will open the IAM dashboard. Under the Access management tab, go to Users.

IAM Dashboard for the root

From here, we need to create a user, which can be done by clicking the Create user button as seen below:

Creating the user dashboard

Provide the user details, including the username. Then, select the checkbox to provide user access to the AWS Management Console, and underneath it, check "I want to create an IAM user."

Specifying user details

In the next step, we will give the IAM user Amazon Bedrock permissions. We can then create a group that includes all the required permissions for Bedrock.

Setting permissions for the newly created user

After this, we can name the User group and attach the policy named AmazonBedrockFullAccess.

Creating a group for the user

Review the selected choices and proceed to create the user. Once created, save the credentials for the IAM user. You can then log in using those credentials.

Granting model access

Access to the Amazon Bedrock Foundation Models (FMs) isn't granted by default. Therefore, you must request access to the foundation models. This section will guide you through the process. After logging in with the IAM user using the saved credentials, you will be directed to the console home. You may notice that some services are disabled because their permissions are not attached. To proceed directly to Amazon Bedrock, follow the steps below:

Console Home for the IAM User

In the left-hand menu, go to Model access:

Amazon Bedrock Service page

We now have the necessary permissions to access the models. Click Manage model access and check the boxes for the models you need. For certain models, like Claude, you may first need to submit use case details before you can request access. For this tutorial, we will use the Amazon Titan model, as shown below.

Model access tab to manage model access

We are now ready to use our Amazon Bedrock models for LLM inference. One final step remains before that: creating access keys for the IAM user so we can create a boto3 client. Boto3, the AWS SDK for Python, is a Python API that allows developers to build applications on top of Amazon Web Services (AWS) infrastructure services.

To generate the keys, log in as the root user and navigate to the IAM service. Open the Users tab, choose the user for whom you wish to create the access key, and go to the Security credentials tab.

Security credentials for the IAM user using the root user

Scroll down to the Access key section and click on Create access key as shown below:

Creating Access keys for the IAM user

Save the keys, as these will be required to access Amazon Web Services using boto3. If you haven't installed the AWS CLI yet, install it using this guide.

Once installed, run the following command in your shell and enter the credentials:

> aws configure
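
The CLI will prompt for the access key pair you saved earlier, plus a default region and output format. The values below are placeholders; make sure the region you choose is one where Bedrock is available:

AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: us-east-1
Default output format [None]: json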

Interacting with language models in Amazon Bedrock

In this section, we will use boto3 to query the Titan Text G1 - Express model from the Titan series. Amazon Titan Text Express boasts a context length of up to 8,000 tokens, making it well suited for advanced language tasks like open-ended text generation and conversational chat, as well as for supporting RAG. Initially optimized for English, it offers multilingual support for over one hundred additional languages in preview. The API request has the following structure:

 "modelId": "amazon.titan-text-express-v1",
 "contentType": "application/json",
 "accept": "application/json",
 "body": "{\"inputText\":\"this is where you place your input text\",\"textGenerationConfig\":{\"maxTokenCount\":8192,\"stopSequences\":[],\"temperature\":0,\"topP\":1}}"

You can find this request structure in the Providers section, under Amazon Titan Text G1 - Express, as shown below:

Provider’s Section for Amazon Bedrock

The code below builds the request payload as a Python dictionary, serializes it to JSON, and sends it to the model using the invoke_model API. You can find the complete code in this Jupyter Notebook.

%pip install boto3 pandas langchain
import boto3
import json

# Create a client for the Bedrock runtime (uses the credentials from `aws configure`).
bedrock = boto3.client(service_name="bedrock-runtime")

prompt_data = """
What is Generative AI?
"""
model_id = "amazon.titan-text-express-v1"

payload = {
    "inputText": prompt_data,
    "textGenerationConfig": {
        "maxTokenCount": 4096,
        "stopSequences": ["User:"],
        "temperature": 0,
        "topP": 1
    }
}

body = json.dumps(payload)

# Send the request to the Titan Text G1 - Express model.
response = bedrock.invoke_model(
    body=body,
    modelId=model_id,
    accept="application/json",
    contentType="application/json"
)

# Parse the response body and extract the generated text.
response_body = json.loads(response.get("body").read())
response_text = response_body['results'][0]['outputText']
print(response_text)

Pricing

With Amazon Bedrock, charges apply for model inference and customization. You can opt for either On-Demand and Batch pricing, which is pay-as-you-go without time-based commitments, or Provisioned Throughput, where you commit to a time-based term for guaranteed performance.

On-Demand pricing

Here are the prices for the On-Demand Amazon Titan Text Model:

On-Demand and Batch pricing for the Amazon Titan model

For instance, an application developer makes hourly API calls to Amazon Bedrock, each asking the Amazon Titan Text - Lite model to summarize 2,000 tokens of input text into 1,000 tokens of output. At the On-Demand rates shown above ($0.0003 per 1,000 input tokens and $0.0004 per 1,000 output tokens at the time of writing), each call costs (2 × $0.0003) + (1 × $0.0004) = $0.001.

Provisioned Throughput pricing

The prices for Provisioned Throughput are provided below for the Amazon Titan Models:

Provisioned Throughput prices for the Amazon Titan model

An application developer purchases two model units of Titan Text Express with a one-month commitment for text summarization. At an hourly rate of $18.40 per model unit, this results in a monthly cost of $18.40 × 2 units × 24 hours × 31 days = $27,379.20.
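
As a quick sanity check of both figures, here is a short sketch using the rates quoted above (treat them as illustrative, since AWS pricing changes over time):

# On-Demand: summarize 2,000 input tokens into 1,000 output tokens with Titan Text Lite.
on_demand_cost = (2000 / 1000) * 0.0003 + (1000 / 1000) * 0.0004
print(round(on_demand_cost, 4))    # 0.001

# Provisioned Throughput: 2 model units of Titan Text Express at $18.40/hour for 31 days.
provisioned_cost = 18.40 * 2 * 24 * 31
print(round(provisioned_cost, 2))  # 27379.2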

Sample dataset

The sample application focuses on an e-commerce toy company. The dataset utilized in this notebook is a subset derived from a larger public retail dataset found on Kaggle. Specifically, the notebook's dataset comprises approximately 800 toy products, whereas the public dataset encompasses over 370,000 products spanning various categories.

Loading the dataset

The first step is to load the provided sample dataset into a Pandas data frame. For reference, you will find the first five rows of the dataset below: 

%pip install --upgrade lark pgvector langchain openai tiktoken psycopg2 timescale_vector
import pandas as pd

# Download and save the dataset in a Pandas dataframe.
DATASET_URL='https://github.com/GoogleCloudPlatform/python-docs-samples/raw/main/cloud-sql/postgres/pgvector/data/retail_toy_dataset.csv'
df = pd.read_csv(DATASET_URL)
df = df.loc[:, ['product_id', 'product_name', 'description', 'list_price']]
df.head(5)

The first five rows of the utilized dataset

Saving the dataset

The code below saves the dataset in CSV format:

df.to_csv('product.csv', sep=',', index=False, encoding='utf-8')

Splitting the dataset with LangChain

In a RAG workflow, the initial step typically involves text splitting, or chunking: breaking lengthy text documents into smaller segments, which are then embedded, indexed, and stored for subsequent use in information retrieval tasks. Here's a summary of the code below:

  • CSVLoader is configured to load data from a CSV file named product.csv, using , as the delimiter.
  • loader.load() loads the rows from the file given in the file_path parameter; each row becomes a document.
  • RecursiveCharacterTextSplitter is configured with a chunk_size of 1,500 characters and a chunk_overlap of 150 characters, so the text is split into 1,500-character chunks with 150 characters of overlap between adjacent chunks.
  • splitter.create_documents() splits the loaded page contents into those chunks and wraps each one in a document, as shown in the code below.

from langchain.document_loaders.csv_loader import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the products CSV; each row becomes one document.
loader = CSVLoader(file_path='product.csv', encoding='utf-8', csv_args={'delimiter': ','})
data = loader.load()

# Split into chunks of 1,500 characters, with 150 characters of overlap between adjacent chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)
splits = splitter.create_documents([datum.page_content for datum in data])
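
As a quick sanity check (not part of the original notebook), you can inspect how many chunks were produced and preview one of them:

print(len(splits))                   # number of chunks produced
print(splits[0].page_content[:200])  # first 200 characters of the first chunk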

Vector database

The most essential ingredient of a RAG application is a vector database: it indexes documents so that, given a question, the most relevant ones can be retrieved and passed to the LLM as context.

In this tutorial, we will leverage Timescale Vector, a vector database built on PostgreSQL and tailored for AI applications. Timescale Vector enhances pgvector and optimizes PostgreSQL for storing and querying millions of vector embeddings.

Here's what it offers:

  • Pgvector, the open-source PostgreSQL extension for vector storage and similarity search.
  • Enhanced vector search indexes for faster and more accurate similarity search on large-scale vectors using DiskANN-inspired indexing.
  • Quick time-based vector search through automatic time-based partitioning and indexing.
  • Familiar SQL interface for querying vector embeddings and relational data.

Benefits:

  • Simplifies operations by consolidating relational metadata, vector embeddings, and time-series data in a single database.
  • Leverages PostgreSQL's robust foundation, offering enterprise-grade features like streaming backups, replication, high availability, and row-level security.
  • Ensures a worry-free experience with enterprise-grade security and compliance.

Access Timescale Vector via Timescale’s cloud PostgreSQL platform on AWS. To start, sign up, create a new database, and follow the provided instructions. For more information, refer to the Create your first Timescale service guide.

After signing up, connect to your cloud database service by providing the service URI, which can be found under the Service section on the dashboard. 

The Connect to your service page in the Timescale console

The URI will look something like this: postgres://tsdbadmin:<password>@<id>.tsdb.cloud.timescale.com:<port>/tsdb?sslmode=require.

The password can be created by going to Project settings and clicking on Create credentials.

Database connectivity test

The following code verifies the connection string stored in CONNECTION and executes a simple "hello world" query. Successful execution indicates the database is accessible.

import psycopg2

# Paste your service URI here.
CONNECTION = "..."

conn = psycopg2.connect(CONNECTION)
cursor = conn.cursor()
# Use the cursor to interact with your database.
cursor.execute("SELECT 'hello world'")
print(cursor.fetchone())
# Expected output: ('hello world',)
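
Storing embeddings with LangChain

With the database reachable, we can embed our document chunks and store them in Timescale Vector. Below is a minimal sketch, assuming LangChain's BedrockEmbeddings wrapper around Amazon's amazon.titan-embed-text-v1 embedding model and LangChain's TimescaleVector vector store; the collection name content_embeddings matches the object shown in the console screenshot below, and the bedrock client and splits come from the earlier steps:

from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores.timescalevector import TimescaleVector

# Embedding client backed by the Bedrock runtime client created earlier.
embeddings = BedrockEmbeddings(client=bedrock, model_id="amazon.titan-embed-text-v1")

# Service URI of your Timescale database (same value as CONNECTION above).
SERVICE_URL = "..."

# Embed each chunk and store it in a collection named 'content_embeddings'.
db = TimescaleVector.from_documents(
    documents=splits,
    embedding=embeddings,
    collection_name="content_embeddings",
    service_url=SERVICE_URL,
)

This db object is the vector store we will use as a retriever in the next section.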
The Services page in the Timescale console with an arrow pointing to the name of object 'content_embeddings'

Retriever

A retriever is more general than a vector store: it returns documents for an unstructured query without needing to store them. While vector stores can serve as a retriever's backbone, other retriever types exist.

Retrievers take a string query as input and return a list of documents. All retrievers implement the method get_relevant_documents() and its asynchronous counterpart, aget_relevant_documents().

In the code below, the vector store is used as a retriever with the similarity search type, returning the three most relevant documents. The available search types depend on the vector store.

retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retriever.get_relevant_documents("I am looking for Playing cards game") # Will result in 3 most relevant documents related to the query.
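
In an async application, the asynchronous counterpart works the same way. Here's a small sketch (assuming no event loop is already running, as in a plain script rather than a notebook):

import asyncio

async def search():
    # Asynchronous counterpart of get_relevant_documents().
    return await retriever.aget_relevant_documents("I am looking for Playing cards game")

docs = asyncio.run(search())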

Chains

In the tutorial's concluding step, we'll construct our chatbot using the toy dataset. This process involves utilizing Chains, which orchestrate sequences of calls to LLMs, tools, or data preprocessing steps. But before we proceed, let's inform our LLM about its task using prompts.

Prompts

Prompt engineering takes a chatbot from general-purpose to specialized. LangChain offers prompt templates that streamline prompt creation, optionally integrating default messages, user input, chat history, and additional conversation context.

The ChatPromptTemplate class accepts a list of MessagePromptTemplate objects. LangChain provides various types of MessagePromptTemplate. The most frequently used ones include:

  • AIMessagePromptTemplate: generates an AI message
  • SystemMessagePromptTemplate: produces a system message
  • HumanMessagePromptTemplate: constructs a human message

The code below combines a system message holding the chatbot's instructions with a human message template that receives the retrieved context and the user's question:

from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

instructions = """You are a friendly chatbot capable of answering questions related to products. Users can ask questions about their descriptions
            and prices. Be polite and redirect the conversation to product information when necessary."""

human = """
The context is provided as: {context}
New human question: {question}
"""
prompt = ChatPromptTemplate(
    messages=[
        SystemMessagePromptTemplate.from_template(instructions),
        HumanMessagePromptTemplate.from_template(human), #User query will go here
    ],
    input_variables=['context','question'], # context provided by retriever and question by the user
)
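
To see exactly what the model will receive, you can render the template with sample values (the values here are illustrative, not from the dataset):

messages = prompt.format_messages(
    context="Jumbo Size Full Deck Playing Cards, $9.99",
    question="Do you have playing cards?",
)
print(messages)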

LCEL

The LangChain Expression Language (LCEL) offers a declarative approach to combine chains seamlessly. LCEL simplifies the construction of intricate chains from fundamental components and includes built-in features like streaming, parallelism, and logging. For example:

from langchain_core.runnables import RunnablePassthrough
from langchain_community.llms import Bedrock

# Wrap the Bedrock Titan model as a LangChain LLM.
llm = Bedrock(
    model_id=model_id,
    client=bedrock,
    model_kwargs={'maxTokenCount': 512}
)


rag_chain = (
    # The retriever fills {context}; RunnablePassthrough() forwards the user's question unchanged.
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)
rag_chain.invoke("I am looking for Playing cards game")

Here’s its output.

AIMessage(content='Sure! We have a Jumbo Size Full Deck Playing Cards available. These large poker game cards are perfect for parties, pranks, magic tricks, and comedy accessories. The full card deck is nearly 2x the size of a regular deck, measuring approximately 6 3/4" by 4 3/4" wide. It\'s a great gift for friends and family. Let me know if you would like to know more details or the price.')

As you can see, our model picked the right document and answered the question. By adding memory to the LLM, we can make a chatbot. But that’s for another article!

Conclusion

In this post, we learned how to set up Amazon Bedrock to access top-tier AI models like Amazon Titan and integrated it with LangChain for the RAG application. For our vector database needs, we chose pgvector on Timescale for its enhanced PostgreSQL offering tailored for AI applications. Get 90 days free of pgvector on Timescale and enjoy efficient storage, quick searches, and simplified querying in your AI projects.
