An NLQ (Natural Language Query) guidance using Amazon Bedrock and Amazon OpenSearch with the RAG technique for Retail Analytics.
This guidance demonstrates how to generate SQL statements for analyzing online retail customer behavior data. Use the steps in Next Steps to load the sample data from ./assets/data/init_mysql_db.sql.
We can then ask questions like the following (a hedged sketch of the generated SQL appears after this list):
- What are the most frequently purchased products in a specific category?
- How often do customers abandon their shopping carts before completing a purchase?
- What is the average order value on the website?
- What is the average time spent by a user on the website before making a purchase?
- How many customers return to make a second purchase within a specific time frame?
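As a rough illustration of the Text-to-SQL step, the third question above might be translated into SQL like the sketch below. The table and column names are hypothetical placeholders, not the actual schema from ./assets/data/init_mysql_db.sql.

```python
# Hypothetical Text-to-SQL translation; table/column names are placeholders,
# not the real schema shipped in ./assets/data/init_mysql_db.sql.
question = "What is the average order value on the website?"
generated_sql = """
SELECT AVG(order_amount) AS average_order_value
FROM orders;
"""
```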
- Overview
- Prerequisites
- Deployment Steps
- Deployment Validation
- Running the Guidance
- Next Steps
- Cleanup
This is a comprehensive framework designed to enable Generative BI capabilities on customized data sources (RDS/Redshift) hosted on AWS. It offers the following key features:
- Text-to-SQL functionality for querying customized data sources using natural language.
- User-friendly interface for adding, editing, and managing data sources, tables, and column descriptions.
- Performance enhancement through the integration of historical question-answer ranking and entity recognition.
- Customization of business information, including entity information, formulas, SQL samples, and analysis ideas for complex business problems.
- Agent task splitting to handle complex attribution analysis problems.
- Intuitive question-answering UI that provides insights into the underlying Text-to-SQL mechanism.
- Simple agent design interface for handling complex queries through a conversational approach.
As of May 2024, the cost of running this Guidance with the default settings in the US West (Oregon) Region (us-west-2) is approximately $1,337.80 per month for processing 2,000 requests.
The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in that Region for one month; a sketch of the Bedrock token-cost arithmetic follows the table.
AWS service | Dimensions | Cost [USD] per Month |
---|---|---|
Amazon ECS | 0.75 vCPU, 5 GB memory | $804.10 |
Amazon DynamoDB | 25 provisioned write & read capacity units per month | $14.04 |
Amazon Bedrock | 2,000 requests per month, with each request consuming 10,000 input tokens and 1,000 output tokens | $416.00 |
Amazon OpenSearch Service | 1 domain with m5.large.search | $103.66 |
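The Amazon Bedrock line item is driven purely by token volume. Below is a minimal sketch of the arithmetic, with per-1K-token prices back-solved to reproduce the $416.00 figure above; substitute the published prices of the model you actually use.

```python
# Sketch of the Amazon Bedrock cost arithmetic from the table above. The
# per-1K-token prices are illustrative values back-solved to match $416.00;
# substitute the published prices for your chosen model.
requests_per_month = 2000
input_tokens_per_request = 10_000
output_tokens_per_request = 1_000
price_per_1k_input = 0.018   # assumed USD per 1K input tokens
price_per_1k_output = 0.028  # assumed USD per 1K output tokens

monthly_cost = requests_per_month * (
    input_tokens_per_request / 1000 * price_per_1k_input
    + output_tokens_per_request / 1000 * price_per_1k_output
)
print(f"${monthly_cost:,.2f} per month")  # -> $416.00 per month
```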
Note: The CDK deployment is optimized for the Amazon Linux 2023 AMI. Deployment on another OS may require additional steps.
- VPC
- IAM role with specific permissions
- Amazon Bedrock
- Amazon ECS
- Amazon DynamoDB
- Amazon Cognito
- Amazon OpenSearch Service
- Elastic Load Balancing
- Amazon SageMaker (optional, if you need to deploy customized models)
- AWS Secrets Manager
us-west-2, us-east-2, us-east-1, ap-south-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, eu-central-1, eu-west-1, eu-west-3, or any other Region that supports the services used in this Guidance (notably Amazon Bedrock).
Please follow the instructions in the CDK Workshop to install the CDK toolkit. Make sure your environment has the authorization to create the required resources.
The default password for the Streamlit Web UI is empty. If you need to set a password, update it in `application/config_files/stauth_config.yaml`, for example:
```yaml
credentials:
  usernames:
    jsmith:
      email: jsmith@gmail.com
      name: John Smith
      password: XXXXXX # To be replaced with hashed password
    rbriggs:
      email: rbriggs@gmail.com
      name: Rebecca Briggs
      password: XXXXXX # To be replaced with hashed password
cookie:
  expiry_days: 30
  key: random_signature_key # Must be string
  name: random_cookie_name
preauthorized:
  emails:
  - melsby@gmail.com
```
Replace 'XXXXXX' with a hashed password. Use the Python code below to generate it (Python 3.8 or later is required):

```python
from streamlit_authenticator.utilities.hasher import Hasher

hashed_passwords = Hasher(['password123']).generate()
print(hashed_passwords)  # copy the hash into stauth_config.yaml
```
For global regions, execute the following commands.

Navigate to the CDK project directory:

```bash
cd generative-bi-using-rag/source/resources
```

Deploy the CDK stack, changing the region to your own if needed (for example, us-west-2 or us-east-1):

```bash
export AWS_DEFAULT_REGION=us-west-2
cdk bootstrap
cdk deploy GenBiMainStack
```
You will see the following output when the deployment succeeds:

```
GenBiMainStack.AOSDomainEndpoint = XXXXX.us-west-2.es.amazonaws.com
GenBiMainStack.APIEndpoint = XXXXX.us-west-2.elb.amazonaws.com
GenBiMainStack.FrontendEndpoint = XXXXX.us-west-2.elb.amazonaws.com
GenBiMainStack.StreamlitEndpoint = XXXXX.us-west-2.elb.amazonaws.com
```
Visit the FrontendEndpoint and StreamlitEndpoint URLs to verify that the endpoints are working.
After the CDK stack is deployed, wait around 40 minutes for the initialization to complete. Then, open the Admin Web UI in your browser: https://your-public-dns
- Create Data Connection: introduces how to connect to the database.
  - Select the `Data_Connection_Management` page.

  Background: after creating the database connection, you can select different tables from the database to create different Data Profiles.

- Create Data Profile:
  - Click `Data_Profile_Management`.
- Create Schema Description: after creating a Data Profile, add relevant descriptions to the schema of the data tables to help the large language model better understand them. The more detailed the schema description, the better. For some fields, provide as much detail as possible, including example data, synonyms, and the operations used in calculations. For example:
  - Time field: provide the time format.
  - Boolean field: indicate whether the data is 1/0 or True/False.

  To create a Schema Description:
  - Click `Schema_Description_Management`.
  - Table annotation: description of the data table.
  - Name: name of the field (automatically extracted by the system).
  - Datatype: data type of the field (automatically extracted by the system).
  - Comment: explanation of the field and interpretation of its meaning.
  - Annotation: synonyms or other notes for the field, example data (including common data examples), and instructions for operations on the field, such as sum or count. An example entry is sketched after this list.
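  As an illustration, a completed entry for a hypothetical `purchase_time` field might look like the sketch below; the field name and values are invented, not taken from the sample retail schema.

  ```python
  # Hypothetical schema-description entry; names and values are illustrative.
  field_description = {
      "Name": "purchase_time",    # automatically extracted by the system
      "Datatype": "datetime",     # automatically extracted by the system
      "Comment": "Timestamp when the customer completed the purchase.",
      "Annotation": (
          "Format: YYYY-MM-DD HH:MM:SS. Synonyms: order time, checkout time. "
          "Example data: 2024-05-01 13:45:00. Typical operations: MIN/MAX for "
          "date ranges, COUNT per day for trends."
      ),
  }
  ```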
- Modify Prompts: for different industries, adjust the Prompts accordingly. The available prompt types are:
  - Intent Prompt: mainly for intent recognition. There are currently four main types of intents: knowledge base intent, data query intent, refusal query intent, and chain-of-thought intent. A sketch of such a prompt follows this list.
  - Knowledge Prompt: when a knowledge base intent is recognized, answers related knowledge base questions. This is currently implemented through prompt engineering; later, this query can be connected to a knowledge base solution.
  - Text2SQL Prompt: the key Prompt for Text2SQL.
  - Data Summary Prompt: the main Prompt for data summarization, providing data to the LLM to generate summary text.
  - Data Visualization Prompt: uses the LLM to select appropriate visualization charts.
  - Agent Task Prompt: uses the LLM's understanding ability, combined with business knowledge, to break complex business problems down into sub-tasks across multiple dimensions.
  - Agent Data Analyse Prompt: for complex business problems that have been broken down into multiple sub-tasks, each sub-task involves data queries; after the queries complete, the data and questions are given to the LLM for data analysis.
  - Suggest Question Prompt: generates suggested questions based on the user's query.
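  As a rough illustration, an intent-classification prompt could be structured as follows. This is a minimal sketch assuming a simple classification instruction, not the prompt text shipped with this guidance.

  ```python
  # Hedged sketch of an Intent Prompt; the wording is illustrative and is not
  # the prompt text shipped with this guidance.
  INTENT_PROMPT = """\
  Classify the user's question into exactly one of the following intents:
  - knowledge_base: a general domain-knowledge question
  - data_query: a question answerable by generating SQL against the database
  - refusal: an out-of-scope question that should be declined
  - chain_of_thought: a complex question requiring multi-step analysis

  Question: {question}
  Intent:"""
  ```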
- Add Business Knowledge: since the large language model may lack relevant industry knowledge, supplement it with relevant business knowledge.
  Add business entity information:
  - Click `Entity_Management`.
  - Entity: the name of the entity.
  - Comment: the explanation of the entity, such as which field the entity needs to query, what the entity's value is in the table, the calculation logic of specific formulas, explanations of business names, etc. An example entry is sketched below.
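  For example, an entry for a hypothetical GMV metric could look like this; the metric, table, and formula are invented for illustration:

  ```python
  # Hypothetical business-entity entry; metric, table, and formula are illustrative.
  entity = {
      "Entity": "GMV",
      "Comment": (
          "Gross Merchandise Value; queried from the orders table and "
          "calculated as SUM(order_amount) over the selected period."
      ),
  }
  ```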
  Add sample question SQL information:
  - Click `Index_Management`.
  - Question: the user's question.
  - Answer (SQL): the query SQL for the question. A sample pair is sketched below.
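  A sample pair might look like the following; the table and column names are hypothetical placeholders, not the actual retail schema:

  ```python
  # Hypothetical question-SQL sample; table and column names are placeholders.
  sample = {
      "Question": "How many customers made a repeat purchase within 30 days?",
      "Answer (SQL)": """
          SELECT COUNT(DISTINCT a.customer_id)
          FROM orders a
          JOIN orders b
            ON a.customer_id = b.customer_id
           AND b.order_id <> a.order_id
           AND b.order_date BETWEEN a.order_date
                                AND DATE_ADD(a.order_date, INTERVAL 30 DAY);
      """,
  }
  ```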
  Add analysis ideas for complex questions:
  - Click `Agent_Cot_Management`.
  - Query: the user's complex business question.
  - Comment: analysis ideas for the question, in a dict structure where the keys are task_1, task_2, ..., and the values are the sub-tasks for the question. An example is sketched below.
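  For example, for a query like "Why did sales drop this month?", the Comment might be structured as follows; the sub-task wording is illustrative:

  ```python
  # The dict structure described above: keys are task_1, task_2, ...; the
  # values are the sub-tasks. The sub-task wording is illustrative.
  analysis_ideas = {
      "task_1": "Compare this month's total sales with last month's to size the drop.",
      "task_2": "Break the drop down by product category to find the main contributors.",
      "task_3": "Check traffic and conversion rates for the affected categories.",
  }
  ```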
- Delete the CDK stack:

```bash
cdk destroy GenBiMainStack
```