Merge pull request #33 from merefield/semantic_search
FEATURE: Semantic Search
merefield authored Aug 28, 2023
2 parents 5d8c2c4 + 281ded4 commit 262a0a4
Showing 23 changed files with 548 additions and 83 deletions.
9 changes: 9 additions & 0 deletions .github/workflows/plugin-tests.yml
@@ -59,6 +59,15 @@ jobs:
sudo -E -u postgres script/start_test_db.rb
sudo -u postgres psql -c "CREATE ROLE $PGUSER LOGIN SUPERUSER PASSWORD '$PGPASSWORD';"
- name: Install pg_embedding
run: |
sudo apt-get update
sudo apt-get install -y postgresql-server-dev-13
git clone https://github.com/neondatabase/pg_embedding.git
cd pg_embedding
make PG_CONFIG=/usr/lib/postgresql/13/bin/pg_config
make PG_CONFIG=/usr/lib/postgresql/13/bin/pg_config install
- name: Bundler cache
uses: actions/cache@v3
with:
18 changes: 18 additions & 0 deletions app/jobs/regular/chatbot_post_embedding_delete_job.rb
@@ -0,0 +1,18 @@
# frozen_string_literal: true

# Job is triggered on a Post destruction.
class ::Jobs::ChatbotPostEmbeddingDeleteJob < Jobs::Base
  sidekiq_options retry: false

  def execute(opts)
    begin
      post_id = opts[:id]

      ::DiscourseChatbot.progress_debug_message("101. Deleting a Post Embedding for Post id: #{post_id}")

      # Safe navigation: the Post may never have had an embedding stored.
      ::DiscourseChatbot::PostEmbedding.find_by(post_id: post_id)&.destroy!
    rescue => e
      # Retries are disabled for this job, so the failure is only logged.
      Rails.logger.error("OpenAIBot Post Embedding: There was a problem deleting the embedding for Post id #{post_id}: #{e}")
    end
  end
end
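
Review note: `find_by` returns `nil` when a Post has no stored embedding, and calling `destroy!` on `nil` raises `NoMethodError`. The following is a minimal sketch of guarding that call with safe navigation. `FakeEmbeddingStore` and its `Record` are hypothetical in-memory stand-ins for the ActiveRecord model, not part of the plugin.

```ruby
# Hypothetical in-memory stand-in for the PostEmbedding model.
class FakeEmbeddingStore
  Record = Struct.new(:post_id, :store) do
    # Removes this record's row from the owning store.
    def destroy!
      store.rows.delete(post_id)
      self
    end
  end

  attr_reader :rows

  def initialize(post_ids)
    @rows = post_ids.to_h { |id| [id, true] }
  end

  # Mirrors ActiveRecord's find_by: returns nil when nothing matches.
  def find_by(post_id:)
    rows.key?(post_id) ? Record.new(post_id, self) : nil
  end
end

store = FakeEmbeddingStore.new([1, 2])
store.find_by(post_id: 1)&.destroy!            # removes the row for Post 1
missing = store.find_by(post_id: 99)&.destroy! # no-op, evaluates to nil
```

With `&.`, deleting a Post that was never embedded is a silent no-op rather than an error that ends up in the log.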
20 changes: 20 additions & 0 deletions app/jobs/regular/chatbot_post_embedding_job.rb
@@ -0,0 +1,20 @@
# frozen_string_literal: true

# Job is triggered on an update to a Post.
class ::Jobs::ChatbotPostEmbeddingJob < Jobs::Base
  sidekiq_options retry: 5, dead: false

  def execute(opts)
    begin
      post_id = opts[:id]

      ::DiscourseChatbot.progress_debug_message("100. Creating/updating a Post Embedding for Post id: #{post_id}")

      post_embedding = ::DiscourseChatbot::PostEmbeddingProcess.new

      post_embedding.upsert_embedding(post_id)
    rescue => e
      Rails.logger.error("OpenAIBot Post Embedding: There was a problem, but will retry until the limit: #{e}")
      # Re-raise so the failure reaches Sidekiq and `retry: 5` can take effect.
      raise e
    end
  end
end
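
Review note: a job runner can only retry a failure it can see. The sketch below illustrates the difference between swallowing an exception and re-raising it after logging. `run_with_retries` is a hypothetical stand-in for a queue's retry loop and is not Sidekiq's API.

```ruby
# Hypothetical stand-in for a queue that re-runs a job whenever it raises.
def run_with_retries(limit)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    retry if attempts < limit
  end
  attempts
end

# Swallowing the error: the runner sees a successful first attempt.
swallowed = run_with_retries(5) do
  begin
    raise "embedding API unavailable"
  rescue => e
    e.message # logged, then dropped
  end
end

# Re-raising after logging: the runner keeps retrying up to the limit.
reraised = run_with_retries(5) do
  begin
    raise "embedding API unavailable"
  rescue => e
    raise e
  end
end

puts swallowed # 1
puts reraised  # 5
```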
7 changes: 7 additions & 0 deletions app/models/embedding.rb
@@ -0,0 +1,7 @@
# frozen_string_literal: true

class ::DiscourseChatbot::PostEmbedding < ActiveRecord::Base
  self.table_name = 'chatbot_post_embeddings'

  validates :post_id, presence: true, uniqueness: true
end
89 changes: 89 additions & 0 deletions config/locales/server.en.yml
@@ -47,6 +47,95 @@ en:
title: "The subject of this conversation is %{topic_title}"
first_post: "The first thing someone said was %{username} who said %{raw}"
post: "%{username} said %{raw}"
function:
calculator:
description: |
Useful for getting the result of a math expression. It is a general purpose calculator. It works with Ruby expressions.
You can also retrieve the current date from it and use the core Ruby Time class to calculate dates.
The input to this tool should be a valid mathematical expression that could be executed by the base Ruby programming language with no extensions.
Be certain to prefix any functions with 'Math.'
Usage:
Action Input: 1 + 1
Action Input: 3 * 2 / 4
Action Input: 9 - 7
Action Input: Time.now - 2 * 24 * 60 * 60
Action Input: Math.cbrt(13) + Math.cbrt(12)
Action Input: Math.sqrt(8)
Action Input: (4.1 + 2.3) / (2.0 - 5.6) * 3
parameters:
input: the mathematical expression you need to process and get the answer to. Make sure it is Ruby compatible.
error: "'%{parameter}' is an invalid mathematical expression; if you are trying to calculate dates, make sure to use the Ruby Time class"
forum_search:
description: |
Search the local forum for information that may help you answer the question. Especially useful when the forum specialises in the subject matter of the query.
Searching the local forum is preferable to searching google or the internet and should be considered higher priority. It is quicker and cheaper.
Input should be a search query. You can optionally also specify the number of posts you wish returned from your query.
Outputs text from the Post and a URL link to it that you can provide to the user. When presenting the URL in your reply, do not embed it in an anchor; just write the plain link.
parameters:
query: "search query for looking up information on the forum"
number_of_posts: "specify the number of posts you want returned from your query"
answer_summary: "The top %{number_of_posts} posts on the forum related to this query are, best match first:\n\n"
answer: "Number %{rank}: the post is at this web address: %{url}, it was written by '%{username}' on %{date} and the text is '%{raw}'.\n\n"
error: "'%{query}': my search for this on the forum failed."
google_search:
description: |
A wrapper around Google Search.
Useful for when you need to answer questions about current events.
Always one of the first options when you need to find information on the internet.
Input should be a search query.
parameters:
query: "search query for looking up information on the internet"
error: "%{query}: my search for this on the internet failed."
news:
description: |
A wrapper around the News API.
Useful for when you need to answer questions about current events or current affairs in the news.
Input should be a search query and a date from which to search news; if the request is about today, the search should use today's date.
parameters:
query: "query string for searching current news and events"
start_date: "start date from which to search for news in format YYYY-MM-DD"
answer: "The latest news about this is: "
error: "ERROR: Had trouble retrieving the news!"
stock_data:
description: |
An API for MarketStack stock data. You need to call it using the stock ticker. You can optionally also provide a specific date.
parameters:
ticker: "ticker for share or stock query"
date: "date for data in format YYYY-MM-DD"
answer: "Ticker %{ticker} had a day close of %{close} on %{date}, with a high of %{high} and a low of %{low}"
error: "ERROR: Had trouble retrieving information from Market Stack for stock market information!"
wikipedia:
description: |
A wrapper around Wikipedia.
Useful for when you need to answer general questions about
people, places, companies, facts, historical events, or other subjects.
Input should be a search query
parameters:
query: "query string for wikipedia search"
answer: "The relevant wikipedia page has the following summary: '%{summary}' and the article can be found at this url link: %{url}"
error: "ERROR: Had trouble retrieving information from Wikipedia!"
agent:
handle_function_call:
answer: "The answer is %{result}."
call_function:
error: "There was something wrong with your function arguments"
final_thought_answer:
opener: "To answer the question I will use these step by step instructions.\n\n"
thought_declaration: "I will use the %{function_name} function to calculate the answer with arguments %{arguments}.\n\n"
final_thought: "%{thoughts} Based on the above, I will now answer the question. This message will only be seen by me, so answer on the assumption that the user has not seen this message."

errors:
general: "Sorry, I'm not well right now. Let's talk some other time. Meanwhile, please ask the admin to check the logs, thank you!"
retries: "I've tried working out a response for you several times, but ultimately failed. Please contact the admin if this persists, thank you!"
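
Review note: the locale strings above use named `%{...}` placeholders that are filled in when the string is looked up. The sketch below shows the same substitution in plain Ruby with `Kernel#format`, which accepts the same placeholder syntax. It is a stand-in for `I18n.t`, not the plugin's code path, and the ticker and prices are made-up sample values.

```ruby
# Template copied from the stock_data answer string above.
template = "Ticker %{ticker} had a day close of %{close} on %{date}, " \
           "with a high of %{high} and a low of %{low}"

# Made-up sample values, for illustration only.
message = format(template, ticker: "ACME", close: 10.5, date: "2023-08-25", high: 11.0, low: 10.1)

puts message
# Ticker ACME had a day close of 10.5 on 2023-08-25, with a high of 11.0 and a low of 10.1

# A placeholder with no matching key raises KeyError, so every %{name}
# in a template needs a corresponding argument at the call site.
begin
  format(template, ticker: "ACME")
rescue KeyError => e
  missing_key_error = e.class
end
```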
6 changes: 4 additions & 2 deletions config/settings.yml
@@ -66,11 +66,13 @@ plugins:
default: gpt-3.5-turbo
choices:
  - gpt-3.5-turbo
- - gpt-3.5-turbo-16k
  - gpt-3.5-turbo-0613
+ - gpt-3.5-turbo-16k
+ - gpt-3.5-turbo-16k-0613
  - gpt-4
- - gpt-4-32k
  - gpt-4-0613
+ - gpt-4-32k
+ - gpt-4-32k-0613
chatbot_reply_job_time_delay:
client: false
default: 3
18 changes: 18 additions & 0 deletions db/migrate/20230820010101_enable_embedding_extension.rb
@@ -0,0 +1,18 @@
# frozen_string_literal: true

class EnableEmbeddingExtension < ActiveRecord::Migration[7.0]
  def change
    begin
      enable_extension :embedding
    rescue Exception => e
      if DB.query_single("SELECT 1 FROM pg_available_extensions WHERE name = 'embedding';").empty?
        STDERR.puts "----------------------------DISCOURSE CHATBOT ERROR----------------------------------"
        STDERR.puts " Discourse Chatbot now requires the embedding extension on the PostgreSQL database."
        STDERR.puts " Run a `./launcher rebuild app` to fix it on a standard install."
        STDERR.puts " Alternatively, you can remove Discourse Chatbot to rebuild."
        STDERR.puts "----------------------------DISCOURSE CHATBOT ERROR----------------------------------"
      end
      raise e
    end
  end
end
11 changes: 11 additions & 0 deletions db/migrate/20230820010103_create_chatbot_embeddings_table.rb
@@ -0,0 +1,11 @@
# frozen_string_literal: true

class CreateChatbotEmbeddingsTable < ActiveRecord::Migration[7.0]
  def change
    create_table :chatbot_embeddings do |t|
      t.integer :post_id, null: false, index: { unique: true }, foreign_key: true
      t.column :embedding, "real[]", null: false
      t.timestamps
    end
  end
end
16 changes: 16 additions & 0 deletions db/migrate/20230820010105_create_chatbot_embeddings_index.rb
@@ -0,0 +1,16 @@
# frozen_string_literal: true

class CreateChatbotEmbeddingsIndex < ActiveRecord::Migration[7.0]
  def up
    execute <<~SQL
      CREATE INDEX hnsw_index_on_chatbot_embeddings ON chatbot_embeddings USING hnsw(embedding)
      WITH (dims=1536, m=64, efconstruction=64, efsearch=64);
    SQL
  end

  def down
    execute <<~SQL
      DROP INDEX hnsw_index_on_chatbot_embeddings;
    SQL
  end
end
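
Review note: an HNSW index speeds up approximate nearest-neighbour lookups over the `embedding` column. The sketch below is the exact, brute-force equivalent in plain Ruby, useful as a mental model of what a semantic search query returns. The squared Euclidean distance, the three-dimensional vectors and the helper names are illustrative assumptions. The real embeddings have 1536 dimensions, and the distance operator the plugin queries with is not shown in this part of the diff.

```ruby
# Squared Euclidean distance between two equal-length vectors.
def squared_l2(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Returns the post ids of the `limit` embeddings closest to `query`,
# best match first. Exact search: every stored vector is compared.
def nearest(embeddings, query, limit)
  embeddings
    .min_by(limit) { |_post_id, vector| squared_l2(vector, query) }
    .map(&:first)
end

# Toy data: post_id => embedding (3 dimensions instead of 1536).
embeddings = {
  1 => [0.0, 0.0, 1.0],
  2 => [1.0, 0.0, 0.0],
  3 => [0.9, 0.1, 0.0]
}

closest = nearest(embeddings, [1.0, 0.0, 0.0], 2)
puts closest.inspect # [2, 3]
```

The brute-force version scans every row, which is the cost the index is meant to avoid as the number of embedded posts grows.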
13 changes: 13 additions & 0 deletions db/migrate/20230826010101_rename_chatbot_embeddings_table.rb
@@ -0,0 +1,13 @@

# frozen_string_literal: true

class RenameChatbotEmbeddingsTable < ActiveRecord::Migration[7.0]
  def change
    begin
      Migration::SafeMigrate.disable!
      rename_table :chatbot_embeddings, :chatbot_post_embeddings
    ensure
      Migration::SafeMigrate.enable!
    end
  end
end
7 changes: 7 additions & 0 deletions db/migrate/20230826010103_rename_chatbot_embeddings_index.rb
@@ -0,0 +1,7 @@
# frozen_string_literal: true

class RenameChatbotEmbeddingsIndex < ActiveRecord::Migration[7.0]
  def change
    rename_index :chatbot_post_embeddings, 'hnsw_index_on_chatbot_embeddings', 'hnsw_index_on_chatbot_post_embeddings'
  end
end
22 changes: 12 additions & 10 deletions lib/discourse_chatbot/bots/open_ai_agent.rb
@@ -3,19 +3,19 @@

module ::DiscourseChatbot

- class OpenAIAgent < Bot
+ class OpenAIAgent < OpenAIBotBase

def initialize
super

- @model_name = SiteSetting.chatbot_open_ai_model_custom ? SiteSetting.chatbot_open_ai_model_custom_name : SiteSetting.chatbot_open_ai_model

calculator_function = ::DiscourseChatbot::CalculatorFunction.new
wikipedia_function = ::DiscourseChatbot::WikipediaFunction.new
news_function = ::DiscourseChatbot::NewsFunction.new
google_search_function = ::DiscourseChatbot::GoogleSearchFunction.new
+ forum_search_function = ::DiscourseChatbot::ForumSearchFunction.new
stock_data_function = ::DiscourseChatbot::StockDataFunction.new
- functions = [calculator_function, wikipedia_function]

+ functions = [calculator_function, wikipedia_function, forum_search_function]

functions << news_function if !SiteSetting.chatbot_news_api_token.blank?
functions << google_search_function if !SiteSetting.chatbot_serp_api_key.blank?
@@ -106,7 +106,7 @@ def handle_function_call(res)
func_name = first_message["function_call"]["name"]
args_str = first_message["function_call"]["arguments"]
result = call_function(func_name, args_str)
- res_msg = { 'role' => 'assistant', 'content' => "The answer is #{result}." }
+ res_msg = { 'role' => 'assistant', 'content' => I18n.t("chatbot.prompt.agent.handle_function_call.answer", result: result) }
@internal_thoughts << res_msg
end

@@ -121,24 +121,26 @@ def call_function(func_name, args_str)
func = @func_mapping[func_name]
res = func.process(args)
res
- rescue
- "There was something wrong with your function arguments"
+ rescue
+ I18n.t("chatbot.prompt.agent.call_function.error")
end
end

def final_thought_answer
- thoughts = "To answer the question I will use these step by step instructions.\n\n"
+ thoughts = I18n.t("chatbot.prompt.agent.final_thought_answer.opener")
@internal_thoughts.each do |thought|
if thought.key?('function_call')
- thoughts += "I will use the #{thought['function_call']['name']} function to calculate the answer with arguments #{thought['function_call']['arguments']}.\n\n"
+ thoughts += I18n.t("chatbot.prompt.agent.final_thought_answer.thought_declaration", function_name: thought['function_call']['name'], arguments: thought['function_call']['arguments'])
else
thoughts += "#{thought['content']}\n\n"
end
end

final_thought = {
'role' => 'assistant',
- 'content' => "#{thoughts} Based on the above, I will now answer the question, this message will only be seen by me so answer with the assumption with that the user has not seen this message."
+ 'content' => I18n.t("chatbot.prompt.agent.final_thought_answer.final_thought", thoughts: thoughts)
}

final_thought
end

6 changes: 2 additions & 4 deletions lib/discourse_chatbot/bots/open_ai_bot.rb
@@ -3,7 +3,7 @@

module ::DiscourseChatbot

- class OpenAIBot < Bot
+ class OpenAIBot < OpenAIBotBase

def initialize
super
@@ -13,11 +13,9 @@ def get_response(prompt)
system_message = { "role": "system", "content": I18n.t("chatbot.prompt.system.basic") }
prompt.unshift(system_message)

- model_name = SiteSetting.chatbot_open_ai_model_custom ? SiteSetting.chatbot_open_ai_model_custom_name : SiteSetting.chatbot_open_ai_model
-
response = @client.chat(
parameters: {
- model: model_name,
+ model: @model_name,
messages: prompt,
max_tokens: SiteSetting.chatbot_max_response_tokens,
temperature: SiteSetting.chatbot_request_temperature / 100.0,
31 changes: 31 additions & 0 deletions lib/discourse_chatbot/bots/open_ai_bot_base.rb
@@ -0,0 +1,31 @@
# frozen_string_literal: true
require "openai"

module ::DiscourseChatbot

  class OpenAIBotBase < Bot
    def initialize
      ::OpenAI.configure do |config|
        config.access_token = SiteSetting.chatbot_open_ai_token
      end
      if !SiteSetting.chatbot_open_ai_model_custom_url.blank?
        ::OpenAI.configure do |config|
          config.uri_base = SiteSetting.chatbot_open_ai_model_custom_url
        end
      end
      if SiteSetting.chatbot_open_ai_model_custom_api_type == "azure"
        ::OpenAI.configure do |config|
          config.api_type = :azure
          config.api_version = SiteSetting.chatbot_open_ai_model_custom_api_version
        end
      end
      @client = ::OpenAI::Client.new
      @model_name = SiteSetting.chatbot_open_ai_model_custom ? SiteSetting.chatbot_open_ai_model_custom_name : SiteSetting.chatbot_open_ai_model
    end

    def get_response(prompt)
      raise "Overwrite me!"
    end

  end
end
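
Review note: the new base class resolves shared state such as the model name once and leaves `get_response` for subclasses to override. The sketch below shows that shape without Rails or the OpenAI client. `Settings`, `BotBase` and `EchoBot` are hypothetical names, and the setting values are placeholders.

```ruby
# Hypothetical stand-in for the relevant SiteSetting values.
Settings = Struct.new(:model_custom, :model_custom_name, :model, keyword_init: true)

class BotBase
  def initialize(settings)
    # Same rule as the diff: a custom model name wins when the custom flag is set.
    @model_name = settings.model_custom ? settings.model_custom_name : settings.model
  end

  def get_response(_prompt)
    raise "Overwrite me!"
  end
end

class EchoBot < BotBase
  def get_response(prompt)
    "#{@model_name}: #{prompt}"
  end
end

default_bot = EchoBot.new(Settings.new(model_custom: false, model_custom_name: "my-model", model: "gpt-3.5-turbo"))
custom_bot  = EchoBot.new(Settings.new(model_custom: true, model_custom_name: "my-model", model: "gpt-3.5-turbo"))

puts default_bot.get_response("hi") # gpt-3.5-turbo: hi
puts custom_bot.get_response("hi")  # my-model: hi
```

Calling `get_response` on the base class itself raises, which surfaces a subclass that forgot to override it.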