chore: update graphrag integration docs
Menghuan1918 committed Jul 14, 2024
1 parent c22ad08 commit a5833af
Showing 2 changed files with 112 additions and 4 deletions.
15 changes: 11 additions & 4 deletions src/demo/README.md
@@ -1,11 +1,18 @@
---
title: Features demo
title: Demo demonstration
index: false
icon: laptop-code
category:
- Guide
- User Guide
---

to be supplemented
You can [view detailed usage instructions](../guide/README.md) here.

You can [view instructions for use here](../guide/README.md).
## graphrag integration

graphrag is a structured, hierarchical approach to Retrieval-Augmented Generation (RAG) developed by Microsoft.

- [GitHub](https://github.com/microsoft/graphrag)
- [How to integrate](graphrag.md)

If you have good integration methods or ideas, feel free to submit a PR!
101 changes: 101 additions & 0 deletions src/demo/graphrag.md
@@ -0,0 +1,101 @@
---
title: Integration of graphrag
category:
- Guide
icon: diagram-project
---

## Install and configure the corresponding libraries

To avoid dependency conflicts, install everything in a virtual environment. Either of the following tools works:
- [miniconda3](https://docs.anaconda.com/miniconda/), the minimal installer for conda. You can also use the full Anaconda distribution.
- [uv](https://github.com/astral-sh/uv), a very fast package installer and resolver written in Rust.

::: code-tabs#python

@tab conda

```bash
conda create -n rag python=3.12
conda activate rag
pip install --upgrade pdfdeal graphrag
```

@tab uv

```bash
uv venv
source .venv/bin/activate        # Linux/macOS
# source .venv/Scripts/activate  # Windows (Git Bash): use this line instead
uv pip install --upgrade graphrag pdfdeal
```

:::
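
Before moving on, it can help to confirm that the tools the later steps call are actually on your `PATH`. The following is a minimal sketch, not part of graphrag or pdfdeal: `check_tools` is a hypothetical helper name, and it only checks that each command can be found, not that it works.

```bash
# Hypothetical helper: report any command from the argument list
# that cannot be found on PATH. Returns non-zero if one is missing.
check_tools() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the virtual environment activated, `check_tools python doc2x` should print nothing and succeed if the installation above worked.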

## Step 1: Convert the PDFs

Create two folders to store the PDFs before processing and the txt files after processing:

```zsh
mkdir ./pdf
mkdir -p ./ragtest/input
```

Put the PDFs to be processed into the `pdf` folder. This example uses graphrag's [own paper](https://arxiv.org/pdf/2404.16130) and one of its [references](https://arxiv.org/pdf/2306.04136).

Go to [Doc2X](https://doc2x.com/), click on your identity information, and copy your identity token. This token is the key used below.

Use `doc2x`, the CLI tool shipped with `pdfdeal`, for batch processing. Add the long flag `--graphrag` to enable the output adaptation for graphrag:

```zsh
doc2x -k "Your Key Here" -o ./ragtest/input -p --graphrag ./pdf
```

![](../images/demo/graphrag/doc2x.png)

Wait for it to complete processing:

![](../images/demo/graphrag/tree.png)
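
A quick sanity check after conversion is to compare how many PDFs went in with how many txt files came out. The sketch below assumes the converted files end in `.txt`, as described above; `count_files` and `check_conversion` are hypothetical helper names, not part of `pdfdeal`.

```bash
# Count files with the given extension anywhere under a directory.
count_files() {
  local dir="$1" ext="$2"
  find "$dir" -type f -name "*.${ext}" | wc -l | tr -d ' '
}

# Print both counts and fail if no txt files were produced.
check_conversion() {
  local pdf_dir="$1" txt_dir="$2" pdfs txts
  pdfs="$(count_files "$pdf_dir" pdf)"
  txts="$(count_files "$txt_dir" txt)"
  echo "PDFs: ${pdfs}, txt files: ${txts}"
  [ "$txts" -gt 0 ]
}
```

For the layout used in this guide, the call would be `check_conversion ./pdf ./ragtest/input`.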

## Step 2: Build the knowledge graph

Initialize the graphrag workspace in `./ragtest`:

```zsh
python -m graphrag.index --init --root ./ragtest
```
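
The init command should leave a `settings.yaml` and a `.env` file under `./ragtest`. If you prefer to set the API key from the shell rather than in an editor, the following is a minimal sketch. `set_env_var` is a hypothetical helper, and `GRAPHRAG_API_KEY` is the variable name assumed here; check your generated `.env` for the actual name before relying on it.

```bash
# Hypothetical helper: set KEY=VALUE in a dotenv file.
# Any existing line that defines KEY is dropped, other lines are kept.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  if [ -f "$file" ]; then
    # grep exits non-zero when nothing is left to print; that is fine here.
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Example (variable name is an assumption, see the note above):
# set_env_var ./ragtest/.env GRAPHRAG_API_KEY "your-api-key"
```

The helper rewrites the file through a temporary copy instead of `sed -i`, because the in-place flag behaves differently on GNU and BSD systems.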

Edit the generated `settings.yaml` and `.env` files to match your model provider and API key, then build the index:

```zsh
python -m graphrag.index --root ./ragtest
```

![](../images/demo/graphrag/build.png)

After the build completes, you can query graphrag with either search method. Replace `"Q"` with your question:

::: code-tabs

@tab global

```bash
python -m graphrag.query \
--root ./ragtest \
--method global \
"Q"
```

@tab local

```bash
python -m graphrag.query \
--root ./ragtest \
--method local \
"Q"
```

:::
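
To compare the two methods on the same question, the two commands above can be wrapped in a small loop. This is only a convenience sketch: `ask_both` is a hypothetical function name, and it does nothing beyond calling the same `graphrag.query` command once per method.

```bash
# Hypothetical wrapper: run one question through both query methods.
ask_both() {
  local root="$1" question="$2" method
  for method in global local; do
    echo "=== ${method} ==="
    python -m graphrag.query --root "$root" --method "$method" "$question"
  done
}
```

For example, `ask_both ./ragtest "Q"` prints the global answer followed by the local one.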

## See Also

- [graphrag official website](https://microsoft.github.io/graphrag/)
- [Turning PDFs into a knowledge graph: graphrag + Doc2X + DeepSeek](https://blog.menghuan1918.com/posts/graphrag_doc2x_deepseek.html) (in Chinese)
