From b9fd0f542f82125cf107e3b15a60b61cfe287441 Mon Sep 17 00:00:00 2001
From: Michael Shteyn <5659756+mshteyn@users.noreply.github.com>
Date: Wed, 5 Jun 2024 16:41:44 -0400
Subject: [PATCH 1/4] Update README.md
---
README.md | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index 72aa64e..08c3dbd 100644
--- a/README.md
+++ b/README.md
@@ -55,16 +55,20 @@ Our final database consisted of 190,000 restaurant reviews from over 20,000 inde
## Exploratory Data Analysis

-A visualization of restaurants in the Philadelphia area sorted by how many times they have been reviewed by users through the Google Local API.
+Visualization of restaurants in the Philadelphia area sorted by how many times they have been reviewed by users through the Google Local API.

-A histogram depicting the lenghts of reviews stored in our vector database.
+Histogram depicting the lengths of reviews stored in our vector database.
## Modeling Approach
Text reviews were embedded as 1024-dimensional vectors using Alibaba's sentence transformer model (GTE-Large v1.5) and stored in a Pinecone vector database. User queries were embedded at runtime and compared to stored embeddings with cosine similarity. The top 5 closest reviews to the user query were retrieved from the database and provided the context with which Llama 2 (13B Instruction-tuned) was prompted before generating a response to the user query.
+
+
+Workflow.
+
## Model Evaluation

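Note on the Modeling Approach paragraph in the hunk above (not part of the README change): the retrieval step it describes, ranking stored review embeddings by cosine similarity to the query embedding and keeping the closest 5, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, review IDs, and 3-dimensional toy vectors are hypothetical, and in the project itself the embeddings are 1024-dimensional and the similarity search is handled by the Pinecone database rather than by code like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_reviews(query_vec, stored, k=5):
    """Return up to k (score, review_id) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), rid)
              for rid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for the stored review vectors.
stored = {
    "review_a": [1.0, 0.0, 0.0],
    "review_b": [0.6, 0.8, 0.0],
    "review_c": [0.0, 1.0, 0.0],
    "review_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]  # hypothetical embedded user query

hits = top_k_reviews(query, stored, k=2)
for score, review_id in hits:
    print(f"{review_id}: {score:.3f}")
```

The text of the reviews returned this way would then be placed in the prompt as context for the language model, as the paragraph describes.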
From 96cec5aefe00dbe9dd53c8fe0515da3bc20b0dd7 Mon Sep 17 00:00:00 2001
From: Michael Shteyn <5659756+mshteyn@users.noreply.github.com>
Date: Wed, 5 Jun 2024 16:42:18 -0400
Subject: [PATCH 2/4] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 08c3dbd..f4a8346 100644
--- a/README.md
+++ b/README.md
@@ -53,7 +53,7 @@ identify and filter reviews containing specific mention of food items.
Our final database consisted of 190,000 restaurant reviews from over 20,000 independent restaurants.
## Exploratory Data Analysis
-
+
Visualization of restaurants in the Philadelphia area sorted by how many times they have been reviewed by users through the Google Local API.
From 455d6409ced936af935a4ac5592586630ff9dd1e Mon Sep 17 00:00:00 2001
From: Michael Shteyn <5659756+mshteyn@users.noreply.github.com>
Date: Wed, 5 Jun 2024 16:42:52 -0400
Subject: [PATCH 3/4] Update README.md
---
README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index f4a8346..17af797 100644
--- a/README.md
+++ b/README.md
@@ -53,7 +53,7 @@ identify and filter reviews containing specific mention of food items.
Our final database consisted of 190,000 restaurant reviews from over 20,000 independent restaurants.
## Exploratory Data Analysis
-
+
Visualization of restaurants in the Philadelphia area sorted by how many times they have been reviewed by users through the Google Local API.
@@ -65,7 +65,7 @@ Histogram depicting the lengths of reviews stored in our vector database.
Text reviews were embedded as 1024-dimensional vectors using Alibaba's sentence transformer model (GTE-Large v1.5) and stored in a Pinecone vector database. User queries were embedded at runtime and compared to stored embeddings with cosine similarity. The top 5 closest reviews to the user query were retrieved from the database and provided the context with which Llama 2 (13B Instruction-tuned) was prompted before generating a response to the user query.
-
+
Workflow.
From d50e04a72d1684ea12d6c613cbacd5143e31c212 Mon Sep 17 00:00:00 2001
From: Michael Shteyn <5659756+mshteyn@users.noreply.github.com>
Date: Wed, 5 Jun 2024 16:43:45 -0400
Subject: [PATCH 4/4] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 17af797..d3a14c2 100644
--- a/README.md
+++ b/README.md
@@ -88,7 +88,7 @@ Flavor-Finder achieved an average score of 3.1 out of 4, outperforming the origi
## Challenges
-Significant GPU resources are required to load the necessary components of the model.
+GPU resources are required to perform inference efficiently.
Updating the vector database requires access to subscription-based Google API keys which were beyond our budget. We've developed a tool that enables live scraping of the Google Local API to perform database updates, which we have run within the limits of free use. As a result, our vector database is necessarily dated by the age of the dataset we had access to, containing reviews through 2020.