OCR goes into endless loop #170

Open
Garfonso opened this issue Jan 31, 2025 · 2 comments

Comments

@Garfonso

Hi,

I'm not sure whether this is an issue you can do anything about...
I often see automatic OCR get stuck in what looks like an endless loop. The call to ollama never seems to return, and the GPU on the ollama host stays busy the whole time.
If I forcefully restart ollama, paperless-gpt aborts the process, and in the log I can see that some seemingly random text, partly extracted from the document, is repeated indefinitely.

Like this:

paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |

The model used for OCR is minicpm-v.
I am not sure how much control you have over the OCR stuff... maybe you have some hints on how to investigate this further?
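
One way to investigate further might be to check captured model output for a repeating tail, so a runaway generation can be flagged automatically instead of being spotted by eye in the logs. The following is only a minimal sketch: the function name `find_repeating_block` and the `min_repeats` threshold are hypothetical, and it assumes the output has already been split into lines with the `paperless-gpt-1  | ` log prefix removed.

```python
def find_repeating_block(lines, min_repeats=3):
    """Return (block, repeats) if the end of `lines` is a single block
    repeated at least `min_repeats` times in a row, otherwise None.

    Hypothetical helper for illustration; not part of paperless-gpt or ollama.
    """
    n = len(lines)
    # Try the smallest block size first so the shortest cycle is reported.
    for size in range(1, n // min_repeats + 1):
        block = lines[n - size:]
        repeats = 1
        # Walk backwards in steps of `size` while the same block keeps appearing.
        while True:
            start = n - (repeats + 1) * size
            if start < 0 or lines[start:start + size] != block:
                break
            repeats += 1
        if repeats >= min_repeats:
            return block, repeats
    return None


# Usage with the block from the log above (prefix stripped), repeated 8 times.
sample_block = [
    "Bescheibung nach 1893 Absatz der Gewerbeordnung.",
    "Steuergestellstandteil: Brutto",
    "Gesetzliche Abzüge:",
    "    - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No",
    "",
]
result = find_repeating_block(sample_block * 8)
if result is not None:
    block, repeats = result
    print(f"{len(block)}-line block repeated {repeats} times")
```

A check like this could run on streamed output and abort the request once the repeat count passes a threshold, though whether that is practical depends on how the output is consumed.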

@icereed
Owner

icereed commented Feb 3, 2025

We are discussing this, and possible ways out, on our Discord channel. The thing is: it might be an issue with the combination of prompt and LLM. The paperless-gpt code has very little influence on that.
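
If the loop does come from the prompt and model combination, one possible mitigation is to bound the request itself. The sketch below calls ollama's `/api/generate` endpoint directly, with a cap on generated tokens (`num_predict`) and a client-side timeout. It is an illustration only: the helper names, the default limits, and the assumption that ollama listens on `http://localhost:11434` are not taken from this thread, and nothing here confirms whether paperless-gpt exposes these settings.

```python
import json
import urllib.request


def build_generate_payload(model, prompt, image_b64, max_tokens=2048):
    """Build a non-streaming /api/generate request body.

    `max_tokens` is a hypothetical default; `num_predict` is the ollama
    option that limits how many tokens the model may generate.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # base64-encoded page image
        "stream": False,
        "options": {"num_predict": max_tokens},
    }


def call_ollama(payload, base_url="http://localhost:11434", timeout_s=300):
    """POST the payload and return the generated text.

    Raises an exception if no response arrives within `timeout_s` seconds.
    The base URL and timeout are assumptions for this sketch.
    """
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]


# Building the payload needs no running server.
payload = build_generate_payload("minicpm-v", "Transcribe this page.", "<base64 image>")
```

With a token cap, a repeating generation would at least end once the limit is reached rather than running until ollama is restarted. It would not prevent the repetition itself.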

@KWottrich

KWottrich commented Feb 4, 2025

I am also having this same issue with ollama never responding and being stuck at high resource usage, with the same kind of setup (ollama docker, using minicpm-v as the OCR model).
I'm attaching my ollama logs in case they help, though I fully acknowledge @icereed's comment above that this seems like an ollama/model problem, not a paperless-gpt problem.

paperless-gpt-ollama.log
(no further output for 15+ minutes after the end of the logs in that file, though ollama is still maxing out 6 CPU threads)
