Many organizations are seeking ways to realize the value of their knowledge assets using interactive chat tools connected to document repositories. Cloud-based generative AI services such as Amazon Bedrock and semantic search capabilities in services such as Amazon OpenSearch Service have put this goal increasingly within reach. For organizations working within AWS GovCloud, specialized solutions may be required, and many published reference architectures do not apply to such use cases.
This repository enables the user to set up, run, and clean up a chatbot in an AWS GovCloud account. The chatbot provides a web-based interface for a user to ask questions and receive answers about information contained in a body of documents in .docx, .pdf, or Markdown (.md) format.
The user uploads a set of documents into an Amazon S3 bucket and opens a browser-based user interface to ask questions. The user receives answers based on the information in the document store, and references to the source documents to enable further exploration. An example screenshot of the user interface is below.
Completing this demonstration from setup to cleanup will take approximately 1.5 hours using the development and test deployment option. Using the production-like deployment option will take approximately 2.5 hours.
- Demonstrate how a generative AI-based chatbot can be deployed in AWS GovCloud
- Support commonly used document types: .pdf, .docx, and .md
- Use pay-as-you-go managed services such as Amazon Bedrock to minimize total cost and burden of a deployed solution
- Expose demonstration code in Amazon SageMaker Studio notebooks to enable understanding and ability to modify the code
- Demonstrate an advanced Retrieval Augmented Generation (RAG) technique using document summaries to filter search results to documents focused on the topic of the question presented by the user
- Demonstrate adjusting RAG search results based on document age to prefer newer content
- Demonstrate filtering harmful content from questions and answers using Guardrails for Amazon Bedrock
The architecture makes use of the following major components in AWS GovCloud:
- Amazon S3 bucket to store the documents used as the knowledge repository
- Amazon OpenSearch domain to host a searchable index of the document repository with semantic search capability
- Foundation Model on Amazon Bedrock to serve requests to answer user questions based on OpenSearch content matches
- Amazon SageMaker Studio to host demonstration notebooks and web user interface in development and test deployment
- Elastic Container Service (ECS) to host the web user interface in production-like deployment
- A Virtual Private Cloud (VPC) to support the OpenSearch, SageMaker Studio and ECS components
- AWS CloudFormation to build the cloud infrastructure used by the demonstration
The diagram below illustrates the high-level architecture.
- The Hugging Face distilroberta-v1 plugin is installed in OpenSearch by a Lambda function or a SageMaker Studio notebook
- New documents in the S3 bucket are summarized using Amazon Bedrock and indexed into OpenSearch by a Lambda function or a SageMaker Studio notebook
- New documents in the S3 bucket have their full text indexed into OpenSearch by a Lambda function or a SageMaker Studio notebook
- New documents in the S3 bucket have their date indexed into OpenSearch by a Lambda function or a SageMaker Studio notebook
- A user accesses the Streamlit web user interface, which runs on Elastic Container Service or in SageMaker Studio
- The user’s question is searched against the document summary index in OpenSearch to identify the most relevant documents. The user’s question is searched against the full text index in OpenSearch. Full text results from documents that have summary matches are promoted.
- The most relevant search results are provided as context along with the user’s question to a foundation model on Amazon Bedrock. The foundation model transforms the search results into a concise, meaningful answer. This answer along with document references are presented back to the user in the Streamlit web user interface.
This chatbot uses the semantic search capability of Amazon OpenSearch to find relevant content based on the meaning of the user's question rather than relying on a keyword match approach. This demonstration uses the Hugging Face distilroberta-v1 plugin to provide the embedding capability within OpenSearch. Semantic search typically provides superior results over lexical search, since the user may not use the exact same terminology or keywords as those contained in the source documents.
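As a rough illustration of what a semantic query against the full-text index might look like, the sketch below uses the OpenSearch neural query with opensearch-py. The index name, vector field name, model ID, and endpoint are placeholders (assumptions, not the repository's actual names), and authentication is omitted for brevity.

```python
# Minimal sketch of a semantic (neural) search against the full-text index.
# Index name, vector field, model_id, and endpoint are illustrative placeholders;
# the actual names are defined by the indexing code in this repository.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-opensearch-domain-endpoint", "port": 443}],
    use_ssl=True,
)

query = {
    "size": 5,
    "query": {
        "neural": {
            "embedding": {                             # vector field populated at indexing time
                "query_text": "How do I rotate credentials?",
                "model_id": "distilroberta-model-id",  # ID returned when the embedding model is deployed
                "k": 5,
            }
        }
    },
}

response = client.search(index="chatbot-full-text-index", body=query)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("document_name"))
```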
The search component includes a document summary feature as an advanced Retrieval Augmented Generation (RAG) technique. This can improve the relevance of answers by comparing the question posed by the user to summaries of the documents in the document base, and ranking the full-text results from documents with relevant summaries higher. This approach can help reduce occurrences of search results from mentions in less relevant or authoritative documents. The feature can be enabled or disabled by setting the parameter use_summary in the file /containers/streamlit/opensearch_retrieve_helper.py. More details are in the Tunable parameters section below. The diagram below illustrates how the document summary process can improve search results.
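In addition to the diagram, the sketch below shows one simplified way a summary match can promote full-text hits. The formula and weighting are illustrative assumptions; the exact scoring in opensearch_retrieve_helper.py may differ.

```python
# Illustrative sketch of how a summary match can promote full-text hits.
# The blending formula is a simplified assumption, not the repository's exact scoring.

def combined_score(full_text_score: float, summary_score: float,
                   summary_weight: float = 0.5) -> float:
    """Blend document-level summary relevance with chunk-level full-text relevance.
    A higher summary_weight favors documents whose summaries match the question."""
    return summary_weight * summary_score + (1.0 - summary_weight) * full_text_score

# Example: two chunks with equal full-text scores, but only document A's summary matched.
chunk_a = combined_score(full_text_score=0.70, summary_score=0.90)  # 0.80
chunk_b = combined_score(full_text_score=0.70, summary_score=0.00)  # 0.35
assert chunk_a > chunk_b  # document A's chunk is promoted in the context sent to Bedrock
```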
For many use cases, documents that are newer are more relevant, since older documents may contain out-of-date information. This chatbot includes a search component that can lower the likelihood of older documents appearing in search results based on configurable parameters.
To enable this technique, a date must be assigned to each document as it is ingested. For this demonstration, dates for .pdf and .docx files are set according to each document’s metadata creation date, and for .md files the date is set according to the last modified date in S3. Other approaches to determining document date may be more appropriate for your use case. The date of each document is set in the index_documents_helper.py file in the /containers/lambda_index folder. Each document’s date is stored in an OpenSearch index that can later be queried.
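The sketch below shows one way the per-format date assignment described above could be implemented. The library choices (pypdf, python-docx) are assumptions for illustration; the repository's index_documents_helper.py may use different ones.

```python
# Sketch of assigning a date to each document at ingestion time, following the
# per-format approach described above. Libraries used here are assumptions.
import boto3
from docx import Document   # python-docx
from pypdf import PdfReader

def document_date(local_path: str, bucket: str, key: str):
    key_lower = key.lower()
    if key_lower.endswith(".pdf"):
        return PdfReader(local_path).metadata.creation_date   # PDF metadata creation date
    if key_lower.endswith(".docx"):
        return Document(local_path).core_properties.created   # DOCX core properties creation date
    if key_lower.endswith(".md"):
        head = boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return head["LastModified"]                            # S3 last-modified date
    return None
```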
To enable age-based search, the parameter use_date must be set to True and years_until_no_value must be set to the age in years at which a document no longer has value. These parameters are located in the file opensearch_retrieve_helper.py in the /containers/streamlit folder for the production-like deployment, and in the file 3_search_indices.ipynb in the /sagemaker_studio/notebooks folder for development and test.
Based on the above parameters, each full text search relevance score is adjusted in proportion to the document’s age until the years_until_no_value age is reached, at which point the relevance score is zero. Full text hits with lower scores are less likely to appear than those with higher scores. All full text search hits with adjusted relevance scores below the value set in the full_text_hit_score_threshold parameter are ignored.
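The sketch below shows the linear decay this description implies: a new document keeps its full raw score, and the score falls to zero once the document reaches years_until_no_value. It is an illustrative sketch; the exact formula in opensearch_retrieve_helper.py may differ.

```python
# Illustrative sketch of the age-based score adjustment described above:
# a linear decay from the raw relevance score to zero at years_until_no_value.
from datetime import datetime, timezone

def age_adjusted_score(raw_score: float, doc_date: datetime,
                       years_until_no_value: float = 5.0) -> float:
    age_years = (datetime.now(timezone.utc) - doc_date).days / 365.25
    decay = max(0.0, 1.0 - age_years / years_until_no_value)  # 1.0 for new docs, 0.0 at the limit
    return raw_score * decay

# Example: with years_until_no_value = 5, a 2.5-year-old document
# keeps half of its raw relevance score.
```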
The chatbot includes a harmful content filter enabled by Guardrails for Amazon Bedrock. This provides configurable thresholds to filter content across the hate, insults, sexual, violence, misconduct, and prompt attack categories. If a user asks a question, or an answer is retrieved, that exceeds the guardrail threshold in any of these categories, the chatbot answers "Sorry, I cannot answer this question." or another block message configured in the CloudFormation stack parameter BedrockGuardrailsBlockMessage.
The CloudFormation template deploys a guardrail and a guardrail version. The chat.py file in the /containers/streamlit folder gets the guardrail ID and version from the CloudFormation stack and applies those in every request to Bedrock.
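The sketch below shows the general pattern of reading the guardrail details from the stack and attaching them to a Bedrock request. The CloudFormation output key names and the request body are illustrative assumptions, not the repository's exact code in chat.py.

```python
# Sketch of applying the stack's guardrail to a Bedrock request, in the spirit of chat.py.
# The output key names ("GuardrailId", "GuardrailVersion") and prompt are assumptions.
import json
import boto3

cfn = boto3.client("cloudformation")
outputs = {o["OutputKey"]: o["OutputValue"]
           for o in cfn.describe_stacks(StackName="chatbot-demo")["Stacks"][0]["Outputs"]}

bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",
    guardrailIdentifier=outputs["GuardrailId"],      # assumed output key
    guardrailVersion=outputs["GuardrailVersion"],    # assumed output key
    body=json.dumps({
        "inputText": "Context: ...\n\nQuestion: ...",
        "textGenerationConfig": {"temperature": 0.1, "topP": 0.9, "maxTokenCount": 512},
    }),
)
answer = json.loads(response["body"].read())
```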
Two deployment options are available. One must be chosen using the DeploymentMode parameter in CloudFormation at stack creation.
- Development and testing – uses SageMaker Studio notebooks to handle OpenSearch setup and index management, with the web user interface hosted in the SageMaker Studio environment and accessed by the local user. This is the default option.
- Production-like – uses Lambda functions to handle OpenSearch setup and index management, with the web user interface hosted in Elastic Container Service behind a load balancer and internet gateway.
Both deployment options use a common CloudFormation script that creates the following key resources:
- VPC with subnets, endpoints and security groups
- OpenSearch domain
- S3 bucket to contain the files in the document base
- IAM roles and permissions needed to perform the demonstration
The development and testing deployment provides the ability to see the code running in a SageMaker Studio environment to understand how it works, try modifications, and see the results in real time. After confirming the changes work as expected, files in the /containers folder of the code repository can be updated for building container images for the production-like deployment.
Container build is not required for the development and testing deployment.
1. Create the CloudFormation stack using the template located in the /cloudformation folder in the code repository.
   - Important: Name the stack chatbot-demo.
   - Set the CloudFormation stack parameter DeploymentMode to DevTest.
   - It will take 20-30 minutes for the stack to complete.
2. Request access to the Titan Text Express model in Amazon Bedrock - If you haven't previously requested access to the Titan Text Express foundation model in Amazon Bedrock for your account, you will need to do that using the instructions here. If you want to use Llama 3 models in place of Titan Text Express, you will need to also request those models using the same process.
3. Create the SageMaker domain – After the stack is complete, run the script create_sagemaker_domain.sh in the /sagemaker_studio folder of the code repository to create a SageMaker domain. CloudShell in the AWS console is a useful tool to run such a command. Download either the entire repository or the script at the link above to your local machine. In CloudShell, click Actions -> Upload file and select the file from your local machine. After the file is uploaded, at the CloudShell prompt, enter the following command:
sh create_sagemaker_domain.sh
Go to the SageMaker console and click Domains. Here you'll see the new domain chatbot-demo-domain listed. Wait until its status changes from Pending to InService before moving to the next step.
4. Create a user in the SageMaker domain and launch Studio – After the SageMaker domain is created and its status is InService, use the SageMaker console Domains page to create a user in the domain: click the domain named chatbot-demo-domain, then on the Domain Details page User profile tab, click the Add user button. In the Add user profile General settings page, keep the default settings and click Next. In the Studio settings Jupyter Lab version page, keep the default settings and click Submit. Launch SageMaker Studio with the user you created by clicking the Launch button and then Studio. It will take a few minutes for Studio to start.
5. Clone the repository into SageMaker Studio - After SageMaker Studio launches, click the Git icon in the far left panel. Click the Clone a Repository button. In the Clone Git Repository window, enter the URL for this repository:
https://github.com/aws-samples/rag-chatbot-with-bedrock-opensearch-and-document-summaries-in-govcloud.git
Leave the Project directory to clone into field blank and click the Clone button.
6. Copy the SageMaker files from the code repository to a folder – In SageMaker Studio, open a terminal window by selecting File -> New -> Terminal from the main menu. In the terminal, enter the following command to copy the files needed from the repository folders into a new folder called /chatbot:
sh rag-chatbot-with-bedrock-opensearch-and-document-summaries-in-govcloud/sagemaker_studio/copy_files_to_sagemaker_studio.sh
In the SageMaker Studio file browser pane on the left, navigate to the root SageMaker Studio folder by clicking the folder icon immediately to the left of the repository folder name /rag-chatbot-with-bedrock-opensearch-and-document-summaries-in-govcloud. In the file navigator root, find the folder /chatbot and open it by double clicking its name. Here in the /chatbot folder you'll see the files needed to run the chatbot in SageMaker Studio.
7. Provide document base files – In the S3 console, find the bucket named chatbot-demo-data-&lt;your AWS account ID&gt; and upload a few document files in .pdf, .docx, or .md format into the root of the bucket. These will provide the source material for the chatbot. Keep the number and size of document files reasonable; otherwise, indexing the files in the next step will take a long time.
8. Run notebooks in Studio – Open the first notebook, 1_create_indices.ipynb by double clicking its name in the SageMaker Studio file navigator. In the Set up notebook environment window, leave all settings at default and click the Select button. Wait a few minutes until the Starting notebook kernel... message clears. Run all cells in notebook 1 to create the OpenSearch indices. Once all cells in notebook 1 are complete, open the second notebook, 2_populate_indices_all.ipynb and run all cells in it to populate the OpenSearch indices with data from the files you uploaded to S3. Optionally, run notebook 3 to present questions and get responses. Follow the instructions and prerequisites in each notebook.
9. Ask questions in the web user interface hosted from SageMaker Studio – To launch the user interface within the Studio environment in single-user mode, open notebook 4, 4_run_web_user_interface.ipynb and follow the instructions in it.
The production-like deployment demonstrates using containers in Elastic Container Service (ECS) and Lambda to support many users and an evolving document base.
A Lambda function runs as a custom resource from the CloudFormation stack to initialize the OpenSearch cluster with the embedding model.
The document ingestion pipeline uses S3 event notifications to Simple Queue Service (SQS) to a Lambda function that handles documents being added to or removed from S3 by updating the OpenSearch indices based on S3 events. The use of SQS limits the concurrency of the OpenSearch Lambda indexing function to no more than two instances, which minimizes the potential for exceeding the capacity of the OpenSearch cluster. The diagram below illustrates the ingestion pipeline architecture.
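In addition to the diagram, the sketch below outlines the shape of such an indexing Lambda handler: each SQS record wraps an S3 event notification, and objects are indexed or removed accordingly. The helper functions are placeholders standing in for the code in /containers/lambda_index, not the repository's actual implementation.

```python
# Minimal sketch of an ingestion Lambda handler: SQS delivers S3 event
# notifications, and each record either indexes or removes a document.
import json

def index_document(bucket: str, key: str) -> None:
    """Placeholder for the repository's indexing helpers: summarize, chunk,
    and write the document's summary, full text, and date into OpenSearch."""

def remove_document(bucket: str, key: str) -> None:
    """Placeholder for removing a deleted document's entries from the indices."""

def handler(event, context):
    for sqs_record in event["Records"]:              # SQS batch
        s3_event = json.loads(sqs_record["body"])
        for rec in s3_event.get("Records", []):      # S3 notifications inside the SQS message
            bucket = rec["s3"]["bucket"]["name"]
            key = rec["s3"]["object"]["key"]
            if rec["eventName"].startswith("ObjectCreated"):
                index_document(bucket, key)
            elif rec["eventName"].startswith("ObjectRemoved"):
                remove_document(bucket, key)
```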
The web front end runs in a container on ECS Fargate behind an Application Load Balancer (ALB).
1. Build the required container images into Elastic Container Registry (ECR). A readme and a CloudFormation template are in the /containers folder of the repository to create AWS CodeBuild and ECR resources to perform the container build. Or you can choose to use your own process to build the images and store them in ECR.
   - The Streamlit user interface container is built using files in the code repository folder /containers/streamlit.
   - The Lambda function container for OpenSearch setup is built using files in the code repository folder /containers/lambda_setup_opensearch.
   - The Lambda function container for OpenSearch indexing is built using files in the code repository folder /containers/lambda_index.
2. Request access to the Titan Text Express model in Amazon Bedrock - If you haven't previously requested access to the Titan Text Express foundation model in Amazon Bedrock, you will need to do that using the instructions here. If you want to use Llama 3 models in place of Titan Text Express, you will need to also request those models using the same process.
3. Create the CloudFormation stack using the template located in the /cloudformation folder in the code repository.
   - Important: Name the stack chatbot-demo.
   - Set the CloudFormation stack parameters for the ECR repository names to point to the container images built in the step above: LambdaIndexEcrRepositoryName, LambdaOpenSearchSetupEcrRepositoryName, StreamlitImageEcrRepositoryName.
   - Set the CloudFormation stack parameter DeploymentMode to Prod.
4. Upload document base files to S3 – After the stack is complete, drop sample document base files into the S3 bucket created by the stack and wait several minutes for file indexing in OpenSearch to complete. You can monitor the progress of indexing in the CloudWatch logs for the Lambda function chatbot_prod_lambda_index, or in the web user interface using the document index status page described here.
5. Locate the URL of the ALB for the ECS service in the ECS console. Open the URL in a browser on port 8501 to open the web user interface and ask questions.
Amazon Bedrock provides a choice of models in the us-gov-west-1 region. The following options are available for Q&A in the chatbot:
- Amazon Titan Text G1 - Express (default) - A good choice for many use cases. Provides concise answers cost-effectively. Supports FedRAMP High use cases. More information here.
- Meta Llama 3 8B Instruct - A good choice where more expressive answers are desired and the use case falls within the Meta Llama 3 Acceptable Use Policy. More information here.
- Meta Llama 3 70B Instruct - Similar to Llama 3 8B Instruct for Q&A use cases. More information here.
See the RAG search configuration section below for details on how to select and configure the Q&A foundation model.
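As a rough sketch (not the repository's actual code), the request body sent to Bedrock differs by model family. The model IDs below follow Bedrock's published naming and should be confirmed for your account and region; the prompt and generation settings are illustrative.

```python
# Sketch of how the Q&A request body differs between the available model families.
# Model IDs and generation settings are assumptions for illustration.
import json

def build_request(model_id: str, prompt: str) -> dict:
    if model_id.startswith("amazon.titan"):
        body = {"inputText": prompt,
                "textGenerationConfig": {"temperature": 0.1, "topP": 0.9, "maxTokenCount": 512}}
    elif model_id.startswith("meta.llama3"):
        body = {"prompt": prompt, "temperature": 0.1, "top_p": 0.9, "max_gen_len": 512}
    else:
        raise ValueError(f"Unsupported model: {model_id}")
    return {"modelId": model_id, "body": json.dumps(body)}

# Example: build_request("amazon.titan-text-express-v1", "Context: ...\n\nQuestion: ...")
```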
Several tunable parameters can be changed to best align with the use case. Default values will work in most cases.
- FiltersConfig in the BedrockGuardrail resource sets the strength for each type of content filter. Additional information on the Bedrock Guardrails filter config is available in the AWS documentation at https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-bedrock-guardrail-contentfilterconfig.html
- Parameter BedrockGuardrailsBlockMessage sets the message given to the user if Bedrock Guardrails blocks the input or output.
- Text Gen section – These parameters select the foundation model and parameters used to present answers to the users based on the document context retrieved through OpenSearch. Titan Text Express is the default foundation model for Q&A. Conservative temperature and topP values are set by default to stay close to the content of the documents provided. A Llama 3 model may optionally be selected in this section by uncommenting the line for the desired model and commenting out the other models. Additional information on setting model parameters is available in the AWS documentation at https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html
- MaxLengthRagText – Sets the maximum length of context provided to the Amazon Bedrock foundation model from the document context retrieved through OpenSearch. Any context exceeding this length is truncated. Since context is sorted in reverse order of relevance score, the least relevant context is most likely to be truncated. This value should be set to no more than the number of characters supported by the context length of the selected foundation model.
- FullTextHitScoreThreshold – Defines the percentile cut-off of full text hit scores that will be included in the result. Any full text results with a relevance score less than this value times the highest result’s relevance score are excluded from the context.
- IncludeTextInReferences – When set to True, the text of material retrieved in each search will be shown with references. When set to False, only the document name and page or section will be shown. Default is False.
- UseSummary – If set to True, the relevance of all the text in a document from the document summary index is used as part of the overall relevance score for chunks. If set to False, the document summary index is not used, and only the full text relevance scores are used to determine the relevance of chunks. Default is True.
- SummaryWeightOverFullText – Sets the weighting of document summary result vs. full text result relevance scores in calculating the overall relevance score of a particular chunk. Higher values weight the document summary relevance more. Lower values weight the full text relevance more.
- SummaryHitScoreThreshold – Sets the percentage value used as a cut-off for relevance scores retrieved from the OpenSearch document summary index. Any document summary results with a relevance score less than this value times the highest document summary result’s relevance score are excluded from the context.
- UseDate – If set to True, the age of each document will be used to adjust the relevance of full text hit scores downward as the document ages until the YearsUntilNoValue age is reached, at which point the relevance score will become zero. If set to False, document date is not used to determine document relevance. Default is True.
- YearsUntilNoValue – Sets the number of years each document may age until it has no value, as described above.
- S3 Key to Weblink Conversion section – These parameters are used for the feature to convert .md file references in search results to corresponding web pages. Refer to the section Markdown S3 key to weblink reference feature in this document for more information.
- text_gen_config – This is used to set the configuration for Titan Text Express as the foundation model used to summarize documents for the document summary index. Conservative temperature and topP values are set by default to stay close to the original content. Additional information on these parameters is available in the AWS documentation at https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-text.html
- max_file_size – Sets the maximum file size that will be indexed into OpenSearch. Use this to filter out documents that are excessively large. Higher values may exceed the 15 minute maximum run time for Lambda.
- max_summary_length – Sets the maximum document summary size in characters. Documents are progressively summarized to fit within this limit. Smaller values produce more focused summaries. Larger values require less time to produce.
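The sketch below illustrates the idea of progressive summarization (re-summarizing until the result fits within max_summary_length) as described above. The prompt, pass structure, and summarize_with_bedrock helper are illustrative assumptions; the repository's ingestion code may chunk and summarize differently.

```python
# Illustrative sketch of progressive summarization with Titan Text Express.
# The prompt and loop structure are assumptions, not the repository's exact code.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize_with_bedrock(text: str) -> str:
    """One summarization pass; the generation config mirrors the conservative
    text_gen_config described above."""
    body = json.dumps({
        "inputText": f"Summarize the following text:\n\n{text}",
        "textGenerationConfig": {"temperature": 0.1, "topP": 0.9, "maxTokenCount": 512},
    })
    response = bedrock.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
    return json.loads(response["body"].read())["results"][0]["outputText"]

def progressive_summary(text: str, max_summary_length: int = 2000, max_passes: int = 5) -> str:
    """Repeatedly summarize until the summary fits within max_summary_length."""
    summary = text
    for _ in range(max_passes):
        if len(summary) <= max_summary_length:
            return summary
        summary = summarize_with_bedrock(summary)
    return summary[:max_summary_length]  # fall back to truncation if still too long
```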
The indexing status of the documents in the S3 bucket created by the CloudFormation stack can be viewed by selecting Index status in the side menu of the web user interface.
This shows a list of all the documents in the S3 bucket and the number of summary and full text index chunks in OpenSearch. If zero chunks are shown then indexing has likely not yet begun for that document. The date of each document as recorded in the date index can also be viewed by scrolling to the right.
The screenshot below shows an example of the feature.
The CloudFormation stack can deploy a CloudWatch Logs group and a CloudWatch dashboard to provide observability on Bedrock invocations.
This will show the number of requests made using Bedrock Guardrails, the number of blocked requests, and the number of Bedrock tokens consumed. These resources will be deployed if the CloudFormation stack parameter CreateBedrockInvocationLogs is set to yes.
When this option is enabled, the stack deploys a Lambda custom resource that will update the Bedrock invocation logging setting for the account with a new IAM role and CloudWatch Logs group. This will overwrite any existing Bedrock invocation logging setting in the account. For this reason, the default value for the CreateBedrockInvocationLogs parameter is no. Set the parameter to yes if you are sure you will not interfere with existing Bedrock invocation logging in the account.
The screenshot below shows an example of the CloudWatch dashboard.
By enabling Bedrock invocation logging, you will be able to see the details of all Bedrock requests, including the content of the request and the response. These are available in the log group created by the CloudFormation stack, which can be found by searching CloudWatch log groups for bedrock-invocation. For Q&A requests, the Bedrock Guardrail status is also included in the logs. Summarization requests made to Bedrock during document ingestion do not include Guardrails actions, as these are made without a Guardrail.
This feature is for special use cases where markdown formatted source documents in S3 are also the source for a web site accessible to users. By using this feature, the references provided by the Streamlit user interface can be configured as clickable links to the target web pages. This can help provide a better user experience by enabling one-click access to web page references.
To use this feature, set the parameters below in the file /containers/streamlit/rag_search.cfg:
- use_s3_key_to_weblink_conversion – Sets the feature on or off based on the Boolean value True or False. When set to True, the feature is enabled.
- s3_key_prefix_to_remove – Sets the portion of the S3 prefix to remove from markdown file S3 keys.
- weblink_prefix – Sets the prefix to add to the S3 key value after the s3_key_prefix_to_remove is removed.
- s3_key_suffix_to_remove – Sets the S3 key suffix to remove.
- weblink_suffix – Sets the suffix to add to the S3 key value after the s3_key_suffix_to_remove is removed.
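The sketch below shows the conversion these parameters imply. The parameter values used here are illustrative examples only; the real values come from rag_search.cfg.

```python
# Sketch of the S3-key-to-weblink conversion implied by the parameters above.
# Default values are illustrative placeholders, not values from the repository.

def s3_key_to_weblink(key: str,
                      s3_key_prefix_to_remove: str = "docs/",
                      weblink_prefix: str = "https://example.com/wiki/",
                      s3_key_suffix_to_remove: str = ".md",
                      weblink_suffix: str = ".html") -> str:
    if key.startswith(s3_key_prefix_to_remove):
        key = key[len(s3_key_prefix_to_remove):]
    if key.endswith(s3_key_suffix_to_remove):
        key = key[: -len(s3_key_suffix_to_remove)]
    return weblink_prefix + key + weblink_suffix

# Example: "docs/security/mfa.md" -> "https://example.com/wiki/security/mfa.html"
```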
To clean up, perform the following steps:
1. Empty the S3 document bucket – In the S3 console, select the bucket created by the CloudFormation stack, click the Empty button and confirm.
2. In development and testing mode, the SageMaker domain must be deleted using the following procedure:
   - Shut down Studio – If SageMaker Studio is running for a user created under the demonstration domain, shut it down by selecting Shut Down under the File menu and then selecting Shutdown All.
   - Delete applications – In the SageMaker console, select the domain, then select the user. If an application is still running for the user, delete it and wait for all pending application deletions to complete.
   - Delete user – In the SageMaker console, delete the user created under the demonstration domain and wait for the deletion to complete.
   - Delete SageMaker domain – Run the script delete_sagemaker_domain.sh in the sagemaker_studio folder of the code repository to delete the SageMaker domain. CloudShell in the AWS console is a useful tool to run such a command. In the SageMaker console, check the status of the domain and wait until it changes to Deleted.
3. Delete the stack – In the CloudFormation console, select the chatbot-demo stack, choose Delete and confirm. Deletion will take several minutes.
4. For production-like deployment, delete ECR and any other resources used for container build - If you used the provided CloudFormation template to create CodeBuild and ECR repositories, follow the cleanup instructions in the container readme. Otherwise, remove container images and any other resources you created.
This repository is intended for use in the AWS us-gov-west-1 region.
The code in this repository has also worked in the us-east-1 region. Regions and partitions are referred to by parameters in the code to help enable region portability. However, testing in regions outside us-gov-west-1 is limited. When deploying in development and test mode, use SageMaker Studio Classic.
- Jason Stollings, Senior Solutions Architect
Streamlit in SageMaker Studio portion adapted from the AWS Machine Learning Blog Build Streamlit apps in Amazon SageMaker Studio at https://aws.amazon.com/blogs/machine-learning/build-streamlit-apps-in-amazon-sagemaker-studio/
Semantic search in OpenSearch portion adapted from the GitHub repository aws-samples/semantic-search-with-amazon-opensearch at https://github.com/aws-samples/semantic-search-with-amazon-opensearch
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.