update
lrbmike committed Aug 9, 2024
1 parent 10c0e8e commit 7ae7c8e
Showing 2 changed files with 56 additions and 7 deletions.
61 changes: 55 additions & 6 deletions README.md
@@ -11,7 +11,14 @@ playwright install
```

## Environment
Currently, only Gemini and OpenAI models are available. Edit `.env` file
Edit the `.env` file. Because the Gemini model needs special handling, it is configured separately. Other models are configured via `API_KEY` and `API_BASE_URL`.

```
GOOGLE_API_KEY=
GOOGLE_API_ENDPOINT=
API_KEY=
API_BASE_URL=
```
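
As a rough illustration of how the four variables above split into a Gemini group and a generic group, here is a minimal Python sketch. The helper name `load_llm_settings` and the returned dictionary layout are assumptions made for this example; the project may read its configuration differently.

```python
import os


def load_llm_settings(env=None):
    """Group the four `.env` variables into Gemini and generic settings.

    Hypothetical helper for illustration only, not the project's own code.
    """
    env = os.environ if env is None else env
    return {
        # Gemini is configured through its own pair of variables.
        "gemini": {
            "api_key": env.get("GOOGLE_API_KEY", ""),
            "endpoint": env.get("GOOGLE_API_ENDPOINT", ""),
        },
        # Every other model shares API_KEY and API_BASE_URL.
        "default": {
            "api_key": env.get("API_KEY", ""),
            "base_url": env.get("API_BASE_URL", ""),
        },
    }


# Passing a plain dict instead of os.environ keeps the example self-contained.
settings = load_llm_settings({"API_KEY": "sk-example"})
print(settings["default"]["api_key"])
```

Unset variables come back as empty strings here, which makes it easy to check which group has been filled in.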

## Run
Note that you cannot start the app in `dev` mode: Playwright fails in `dev` mode and the failure is reported as a `NotImplementedError`.
@@ -20,16 +27,23 @@ fastapi run app/main.py
```

## Use

> The project uses LangChain's `init_chat_model` function to initialize a ChatModel from the model name and provider. The supported providers are listed on the LangChain website under [inferring-model-provider](https://python.langchain.com/v0.2/docs/how_to/chat_models_universal_init/#inferring-model-provider)
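
To make the role of `model_provider` and `model_name` more concrete, the sketch below shows one way the request fields could be passed to `init_chat_model`. The helper names `chat_model_args` and `make_chat_model` and the field-to-argument mapping are assumptions for illustration; the project's own wiring may differ. Running `make_chat_model` requires `langchain` plus the provider package (for example `langchain-openai`) and valid credentials.

```python
def chat_model_args(payload):
    """Map request fields to init_chat_model arguments (assumed mapping)."""
    return {
        "model": payload["model_name"],
        "model_provider": payload["model_provider"],
        # Extra keyword arguments such as temperature are forwarded to the model.
        "temperature": payload.get("temperature", 0),
    }


def make_chat_model(payload):
    """Create a chat model from a request payload (hypothetical helper)."""
    # Imported lazily so chat_model_args can be used without langchain installed.
    from langchain.chat_models import init_chat_model

    return init_chat_model(**chat_model_args(payload))


args = chat_model_args(
    {"model_provider": "openai", "model_name": "gpt-4o-mini", "temperature": 0}
)
print(args)
```
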
### Gemini Model

You need to set `GOOGLE_API_KEY` or `GOOGLE_API_ENDPOINT` in the `.env` file first.

scraper graph

```shell
curl -X POST https://your-domain/crawl/scraper_graph \
-H "Content-Type: application/json" \
-d '{
"prompt": "List me all the projects with their title、description、url、published",
"prompt": "List me all the articles with their title、description、link、published",
"url": "https://techcrunch.com/category/artificial-intelligence/",
"llm_name": "Gemini",
"model_provider": "google_genai",
"model_name": "gemini-1.5-flash-latest",
"embeddings_name": "models/text-embedding-004",
"temperature": 0,
"model_instance": true
}'
@@ -42,15 +56,50 @@ curl -X POST https://your-domain/crawl/search_graph \
-H "Content-Type: application/json" \
-d '{
"prompt": "List me all the traditional recipes from Chioggia",
"llm_name": "Gemini",
"model_provider": "google_genai",
"model_name": "gemini-1.5-flash-latest",
"embeddings_name": "models/text-embedding-004",
"temperature": 0,
"model_instance": true
}'
```

### OpenAI Model

You need to set `API_KEY` or `API_BASE_URL` in the `.env` file first.

scraper graph

```shell
curl -X POST https://your-domain/crawl/scraper_graph \
-H "Content-Type: application/json" \
-d '{
"prompt": "List me all the articles with their title、description、link、published",
"url": "https://techcrunch.com/category/artificial-intelligence/",
"model_provider": "openai",
"model_name": "gpt-4o-mini",
"temperature": 0,
"model_instance": false
}'
```

search graph

```shell
curl -X POST https://your-domain/crawl/search_graph \
-H "Content-Type: application/json" \
-d '{
"prompt": "List me all the traditional recipes from Chioggia",
"model_provider": "openai",
"model_name": "gpt-4o-mini",
"temperature": 0,
"model_instance": false
}'
```
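
For callers who prefer Python over curl, the same request can be built with the standard library. This is a minimal sketch: `build_crawl_request` is a hypothetical helper, `https://your-domain` is the placeholder from the examples above, and the shape of the response is not documented here, so the sending step is left commented out.

```python
import json
import urllib.request


def build_crawl_request(base_url, graph, payload):
    """Build a POST request equivalent to the curl calls above."""
    # ensure_ascii=False keeps non-ASCII prompt characters readable in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/crawl/{graph}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_crawl_request(
    "https://your-domain",
    "search_graph",
    {
        "prompt": "List me all the traditional recipes from Chioggia",
        "model_provider": "openai",
        "model_name": "gpt-4o-mini",
        "temperature": 0,
        "model_instance": False,
    },
)

# Sending is commented out because "your-domain" is a placeholder:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Pass `"scraper_graph"` as `graph` and add a `"url"` field to the payload to mirror the scraper example.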



## Docker

The `Dockerfile` uses the `mcr.microsoft.com/playwright/python:v1.45.1-jammy` base image, which provides a `playwright` environment, so nothing more needs to be installed.

Or you can publish to [Render](https://render.com/)
2 changes: 1 addition & 1 deletion app/modules/scrapegraphai.py
@@ -15,7 +15,7 @@ async def run_blocking_code_in_thread(blocking_func, *args):

class ScrapeGraphAiEngine:
"""
Your can find the model_provider by langchain website:
You can find the model_provider by langchain website:
https://python.langchain.com/v0.2/docs/how_to/chat_models_universal_init/#inferring-model-provider
"""
def __init__(