Chat-any (chat with any website)

1. Introduction

In today's information-driven world, accessing relevant and accurate data quickly is paramount. Traditional search methods can be time-consuming and often yield irrelevant results, creating a demand for more efficient information retrieval systems. My web application addresses this need by allowing users to input website URLs, which are then crawled to build a comprehensive knowledge base. This knowledge base is leveraged by a Retrieval-Augmented Generation (RAG) system, enabling users to interact with the website content through intuitive, conversational AI.

2. Overview

System architecture

The process begins when the user enters a website URL. The website is then crawled and its content is converted to text. This text is split into chunks and embedded with the embedding model. When the user submits a prompt, the system runs a similarity search in the embedding space to find the most relevant chunks, which are added to the original prompt. The augmented prompt is sent to the large language model (LLM), which generates a response grounded in the crawled content and returns it to the user.
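The split, embed, search, and augment steps described above can be sketched in a few lines. This is a minimal illustration, not the code in app.py: the word-count `embed` function is a toy stand-in for the real embedding model, and all function names and the prompt wording are assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved chunks to the user's question."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


# Hypothetical page content, used only to exercise the functions.
chunks = [
    "Streamlit runs the demo app.",
    "The embedding model is bge-small-en.",
    "Gemini generates the final answer.",
]
question = "Which embedding model is used?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In the real app the string returned by `build_prompt` would be what gets sent to the LLM; a dense embedding model would replace `embed` and `cosine` with vector lookups in an index.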

3. Installation

Install dependencies
```
pip install -r requirements.txt 
```
This project uses Gemini-pro as the LLM, so you need a Gemini API key. Get an API key, then add it to the .env file.
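A missing key is easier to diagnose if the app fails early with a clear message. The sketch below shows one way to do that; the variable name `GOOGLE_API_KEY` and the helper `get_api_key` are assumptions for illustration, so check app.py for the name it actually reads. It also assumes the .env file has already been loaded into the environment (for example by a dotenv loader).

```python
import os


def get_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise a readable error."""
    # `name` is a hypothetical variable name; the README does not specify it.
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to the .env file before starting the app"
        )
    return key
```

Under that assumption, the matching .env entry would be a single line of the form `GOOGLE_API_KEY=<your key>`.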

Optional

Caching embedding model

Create a weights/ directory, move into it, and run:

!git lfs install
!git clone https://huggingface.co/BAAI/bge-small-en

Run cd .. to move back to the previous directory.
Then uncomment the # os.environ["HF_HOME"] = "/workspaces/chat-any/weights" line in app.py (adjust the path if your checkout lives somewhere else).
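The path in that line is hardcoded, so it only matches a checkout at /workspaces/chat-any. A possible alternative, sketched here under the assumption that weights/ sits in the project root, is to build the path at runtime. The helper name `configure_hf_cache` is hypothetical and not part of app.py.

```python
import os
from pathlib import Path


def configure_hf_cache(project_root) -> Path:
    """Point HF_HOME at <project_root>/weights and return that path."""
    weights_dir = Path(project_root).resolve() / "weights"
    weights_dir.mkdir(parents=True, exist_ok=True)
    # Must be set before any Hugging Face library is imported,
    # because the cache location is read at import time.
    os.environ["HF_HOME"] = str(weights_dir)
    return weights_dir
```

In app.py this would be called near the top, for example `configure_hf_cache(Path(__file__).parent)`, before the embedding model is imported.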

4. Usage

Run the demo app with Streamlit:
```
streamlit run app.py
```

Demo

Limit

Only English is supported, because the embedding model is BAAI/bge-small-en from Hugging Face.
Inference is not optimized, so creating embeddings and other steps may take a while 🐢. Please take a deep breath and be patient, my friend! 🙏
