update

lrbmike · Aug 9, 2024 · 7ae7c8e
1 parent 10c0e8e
Showing 2 changed files with 56 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -11,7 +11,14 @@ playwright install
 ```
 
 ## Environment
-Currently, only Gemini and OpenAI models are available. Edit `.env` file
+Edit the `.env` file. Gemini is configured separately through `GOOGLE_API_KEY` and `GOOGLE_API_ENDPOINT`; other models are configured through `API_KEY` and `API_BASE_URL`.
+
+```
+GOOGLE_API_KEY=
+GOOGLE_API_ENDPOINT=
+API_KEY=
+API_BASE_URL=
+```
 
 ## Run
 Note that you can't start in `dev` mode: playwright fails in `dev` mode and the failure is reported as a `NotImplementedError`.
@@ -20,16 +27,23 @@ fastapi run app/main.py
 ```
 
 ## Use
+
+> The project uses LangChain's `init_chat_model` function to initialize a ChatModel from the model name and provider. The supported provider names are listed on the LangChain website: [inferring-model-provider](https://python.langchain.com/v0.2/docs/how_to/chat_models_universal_init/#inferring-model-provider)
+
+### Gemini Model
+
+You need to set `GOOGLE_API_KEY` or `GOOGLE_API_ENDPOINT` in the `.env` file first.
+
 scraper graph 
+
 ```shell
 curl -X POST https://your-domain/crawl/scraper_graph \
 -H "Content-Type: application/json" \
 -d '{
-    "prompt": "List me all the projects with their title、description、url、published",
+    "prompt": "List me all the articles with their title、description、link、published",
     "url": "https://techcrunch.com/category/artificial-intelligence/",
-    "llm_name": "Gemini",
+    "model_provider": "google_genai",
     "model_name": "gemini-1.5-flash-latest",
-    "embeddings_name": "models/text-embedding-004",
     "temperature": 0,
     "model_instance": true
 }'
@@ -42,15 +56,50 @@ curl -X POST https://your-domain/crawl/search_graph \
 -H "Content-Type: application/json" \
 -d '{
     "prompt": "List me all the traditional recipes from Chioggia",
-    "llm_name": "Gemini",
+    "model_provider": "google_genai",
     "model_name": "gemini-1.5-flash-latest",
-    "embeddings_name": "models/text-embedding-004",
     "temperature": 0,
     "model_instance": true
 }'
 ```
 
+### OpenAI Model
+
+You need to set `API_KEY` or `API_BASE_URL` in the `.env` file first.
+
+scraper graph 
+
+```shell
+curl -X POST https://your-domain/crawl/scraper_graph \
+-H "Content-Type: application/json" \
+-d '{
+    "prompt": "List me all the articles with their title、description、link、published",
+    "url": "https://techcrunch.com/category/artificial-intelligence/",
+    "model_provider": "openai",
+    "model_name": "gpt-4o-mini",
+    "temperature": 0,
+    "model_instance": false
+}'
+```
+
+search graph 
+
+```shell
+curl -X POST https://your-domain/crawl/search_graph \
+-H "Content-Type: application/json" \
+-d '{
+    "prompt": "List me all the traditional recipes from Chioggia",
+    "model_provider": "openai",
+    "model_name": "gpt-4o-mini",
+    "temperature": 0,
+    "model_instance": false
+}'
+```
+
+
+
 ## Docker
+
 The `Dockerfile` uses the `mcr.microsoft.com/playwright/python:v1.45.1-jammy` image, which provides a `playwright` environment, so nothing more needs to be installed.
 
 Or you can publish to [Render](https://render.com/)
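
Note on the README changes above: the curl examples can also be issued from Python. The following is a minimal sketch using only the standard library. The helper name `build_crawl_request` is hypothetical and not part of this repository; the endpoint paths and JSON fields are copied from the curl examples, and `https://your-domain` is the same placeholder host. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_crawl_request(base_url, graph, prompt, model_provider, model_name,
                        url=None, temperature=0, model_instance=False):
    """Build (but do not send) a POST request for /crawl/<graph>.

    Hypothetical helper: field names mirror the README curl examples.
    """
    payload = {
        "prompt": prompt,
        "model_provider": model_provider,
        "model_name": model_name,
        "temperature": temperature,
        "model_instance": model_instance,
    }
    # The scraper_graph example includes "url"; the search_graph example does not.
    if url is not None:
        payload["url"] = url
    return urllib.request.Request(
        f"{base_url}/crawl/{graph}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same values as the OpenAI scraper graph example in the README.
req = build_crawl_request(
    "https://your-domain",
    "scraper_graph",
    prompt="List me all the articles with their title、description、link、published",
    model_provider="openai",
    model_name="gpt-4o-mini",
    url="https://techcrunch.com/category/artificial-intelligence/",
)
body = json.loads(req.data)
```

Against a running deployment, the request could be sent with `urllib.request.urlopen(req)`. For the Gemini examples, the same helper would be called with `model_provider="google_genai"`, `model_name="gemini-1.5-flash-latest"`, and `model_instance=True`, matching the values shown in the README.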

diff --git a/app/modules/scrapegraphai.py b/app/modules/scrapegraphai.py
@@ -15,7 +15,7 @@ async def run_blocking_code_in_thread(blocking_func, *args):
 
 class ScrapeGraphAiEngine:
     """
-    Your can find the model_provider by langchain website:
+    You can find the model_provider values on the LangChain website:
     https://python.langchain.com/v0.2/docs/how_to/chat_models_universal_init/#inferring-model-provider
     """
     def __init__(