OCR goes into endless loop #170

Garfonso · 2025-01-31T07:47:23Z

Hi,

I'm not sure whether this is something you can do anything about.
I often see automatic OCR get stuck in what looks like an endless loop. The call to ollama apparently never returns, and the GPU on the ollama host stays busy the whole time.
If I forcefully restart ollama, paperless-gpt aborts the OCR job, and the log shows some seemingly random (?) text, partly extracted from the document, repeated indefinitely.

Like this:

paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |
paperless-gpt-1  | Bescheibung nach 1893 Absatz der Gewerbeordnung.
paperless-gpt-1  | Steuergestellstandteil: Brutto
paperless-gpt-1  | Gesetzliche Abzüge:
paperless-gpt-1  |     - Lohnsteuer, Ird.: 0.00 / Rentensteuer, Ird.: No
paperless-gpt-1  |

The model used for OCR is minicpm-v.
I am not sure how much control you have over the OCR step. Do you have any hints on how to investigate this further?
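
For investigating logs like the one above, a small check for a repeating tail can help confirm that the model output has degenerated into a loop. The following is only a sketch: the function name `has_repeating_tail` and its thresholds are hypothetical and not part of paperless-gpt or ollama.

```python
def has_repeating_tail(text: str, min_block_lines: int = 2, min_repeats: int = 3) -> bool:
    """Return True if `text` ends with the same block of lines repeated
    at least `min_repeats` times in a row (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Ignore trailing blank lines so a dangling separator does not hide the loop.
    while lines and not lines[-1]:
        lines.pop()
    n = len(lines)
    # Try every block size that could fit `min_repeats` times into the text.
    for block in range(min_block_lines, n // min_repeats + 1):
        tail = lines[n - block:]
        if not any(tail):
            continue  # a block consisting only of blank lines is not interesting
        if all(
            lines[n - (k + 1) * block: n - k * block] == tail
            for k in range(1, min_repeats)
        ):
            return True
    return False
```

If the OCR response were consumed as a stream, a check like this could be run on the accumulated text every few hundred tokens and the request cancelled once it returns `True`. Whether that is practical inside paperless-gpt is an open question.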


icereed · 2025-02-03T08:06:23Z

We are discussing this, and possible ways out, on our Discord channel. The issue may lie in the combination of prompt and LLM, and the paperless-gpt code has very little influence over that.
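
One possible mitigation on the request side is to cap the number of generated tokens and add a client timeout, so a looping model cannot run forever. The sketch below talks to ollama's `/api/generate` endpoint directly and uses its `num_predict` and `repeat_penalty` options. It is an assumption-laden illustration, not how paperless-gpt calls ollama: the function names, default values, and base URL are hypothetical.

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str, image_b64: str,
                           max_tokens: int = 2048,
                           repeat_penalty: float = 1.2) -> dict:
    """Build a non-streaming /api/generate request body (hypothetical helper).

    The default values are illustrative guesses, not tuned recommendations.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # base64-encoded page image
        "stream": False,
        "options": {
            "num_predict": max_tokens,         # hard cap on generated tokens
            "repeat_penalty": repeat_penalty,  # discourage repeated output
        },
    }


def generate(payload: dict, base_url: str = "http://localhost:11434",
             timeout_s: float = 300.0) -> str:
    """Send the request; the client stops waiting after `timeout_s` seconds
    without data from the server. `base_url` is an assumed local default."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

A token cap does not prevent the repetition itself. It only bounds how long a degenerate generation can keep the GPU busy, and a page with a lot of text may need a higher limit.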

KWottrich · 2025-02-04T15:17:18Z

I am having the same issue: ollama never responds and stays stuck at high resource usage, with the same kind of setup (ollama in Docker, minicpm-v as the OCR model).
I'm attaching my ollama logs in case they help, though I fully acknowledge @icereed's comment above that this seems like an ollama/model problem, not a paperless-gpt problem.

paperless-gpt-ollama.log
(no further output for 15+ minutes after the end of the logs in that file, though ollama is still maxing out 6 CPU threads)
