Bug: Error handling needs changes #109

joerg-hermanns · 2025-01-11T10:59:24Z

I just discovered this morning that paperless-gpt has more or less "stalled" on OCR.
It seems this is because it tried to process a document that is "too big":

time="2025-01-11T10:54:35Z" level=debug msg="Image dimensions: 12600x16800"
time="2025-01-11T10:54:35Z" level=debug msg="Image size: 15274 KB"
time="2025-01-11T10:54:43Z" level=error msg="Error in processAutoTagDocuments: error in processAutoOcrTagDocuments: error processing document OCR: error performing OCR: error getting response from LLM: API returned unexpected status code: 400: You uploaded an unsupported image. Please make sure your image is valid."

Now it tries to reprocess this document every time, but obviously the error message will not change.
I think we need at least two things here:

A configurable maximum size for images sent to OCR (perhaps based on file size instead of pixel dimensions?)
Error handling that, for example, retries about 5 times and then moves that document to an error queue, or simply tags it in paperless with a configurable tag (e.g. ai-ocr-failed)
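The "retry a few times, then tag and move on" idea could look roughly like this minimal Go sketch. `FailureTracker`, `RecordFailure`, `ShouldSkip` and the `ai-ocr-failed` tag are hypothetical and not part of paperless-gpt. The counter lives in memory only, so it would reset on restart; persisting the outcome as a paperless tag, as suggested above, is what would make it survive restarts.

```go
package main

import "fmt"

// FailureTracker counts OCR failures per document ID (in memory only).
type FailureTracker struct {
	MaxAttempts int
	FailureTag  string // hypothetical configurable tag, e.g. "ai-ocr-failed"
	attempts    map[int]int
}

func NewFailureTracker(maxAttempts int, tag string) *FailureTracker {
	return &FailureTracker{MaxAttempts: maxAttempts, FailureTag: tag, attempts: make(map[int]int)}
}

// RecordFailure registers one failed attempt and reports whether the document
// has now reached MaxAttempts and should be tagged with FailureTag.
func (t *FailureTracker) RecordFailure(docID int) (giveUp bool) {
	t.attempts[docID]++
	return t.attempts[docID] >= t.MaxAttempts
}

// ShouldSkip reports whether a document has already used up its attempts,
// so the processing loop can move on to the rest of the queue.
func (t *FailureTracker) ShouldSkip(docID int) bool {
	return t.attempts[docID] >= t.MaxAttempts
}

func main() {
	ft := NewFailureTracker(5, "ai-ocr-failed")
	queue := []int{41, 42, 43}
	for run := 1; run <= 6; run++ {
		for _, id := range queue {
			if ft.ShouldSkip(id) {
				continue // no longer blocks the other documents
			}
			if id == 42 { // pretend document 42 always fails OCR
				if ft.RecordFailure(id) {
					fmt.Printf("run %d: document %d failed %d times, tagging with %q\n",
						run, id, ft.MaxAttempts, ft.FailureTag)
				}
			}
		}
	}
}
```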

For now, my question is: how can I identify exactly which document this is?
I have 953 documents in the processing queue.
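The OCR error above does not name the document, while the later "Error updating document 2905" message does. A minimal Go sketch of the general fix is to wrap every per-document error with the document ID using `fmt.Errorf` and `%w`, and to keep going with the rest of the batch. `ocrDocument` and `processDocuments` are hypothetical stand-ins, not paperless-gpt functions.

```go
package main

import (
	"errors"
	"fmt"
)

// ocrDocument is a hypothetical stand-in for the real OCR step;
// here document 2 always fails.
func ocrDocument(id int) error {
	if id == 2 {
		return errors.New("API returned unexpected status code: 400")
	}
	return nil
}

// processDocuments wraps each failure with the document ID so the log line
// says which document failed, and collects errors instead of aborting.
func processDocuments(ids []int) []error {
	var errs []error
	for _, id := range ids {
		if err := ocrDocument(id); err != nil {
			// %w keeps the original error available to errors.Is / errors.As.
			errs = append(errs, fmt.Errorf("document %d: error performing OCR: %w", id, err))
		}
		// The loop then moves on to the next document either way.
	}
	return errs
}

func main() {
	for _, err := range processDocuments([]int{1, 2, 3}) {
		fmt.Println(err)
	}
}
```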

@icereed: which API query to paperless do you use to get the next document to process?
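Until the logs name the document, one way to see which documents still carry the OCR tag is to list them directly from the paperless-ngx REST API. The Go sketch below only builds such a URL. It assumes the `/api/documents/` endpoint with the `tags__name__iexact` filter and the `page_size` parameter; we have not checked which exact query paperless-gpt itself issues, and the request still has to be authenticated like any other paperless API call. `documentsByTagURL` is a hypothetical helper, and `ai-ocr` is the tag name taken from the log in this thread.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// documentsByTagURL builds a document-list URL filtered by tag name
// (assumed filter: tags__name__iexact).
func documentsByTagURL(baseURL, tag string, pageSize int) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/api/documents/"
	q := url.Values{}
	q.Set("tags__name__iexact", tag)
	q.Set("page_size", strconv.Itoa(pageSize))
	u.RawQuery = q.Encode() // Encode sorts keys alphabetically
	return u.String(), nil
}

func main() {
	s, err := documentsByTagURL("http://paperless:8000", "ai-ocr", 100)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```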

joerg-hermanns · 2025-01-11T11:00:00Z

Addition: I assume it might be one of the documents that already IS a JPEG, if that helps with debugging.

joerg-hermanns · 2025-01-11T12:43:06Z

Addition: I got this one on a normal document. It seems the database is overloaded at the moment.
Will paperless-gpt retry this one?

time="2025-01-11T12:41:38Z" level=error msg="Error updating document 2905: 500, \n<!doctype html>\n<html lang="en">\n\n <title>Server Error (500)</title>\n\n\n Server Error (500) \n\n\n"
time="2025-01-11T12:41:38Z" level=error msg="Error in processAutoTagDocuments: error in processAutoOcrTagDocuments: error updating documents: error updating document 2905: 500, \n<!doctype html>\n<html lang="en">\n\n <title>Server Error (500)</title>\n\n\n Server Error (500) \n\n\n"

joerg-hermanns · 2025-01-12T15:41:36Z

Got another one. Any ideas here?

time="2025-01-12T15:20:00Z" level=error msg="Error in processAutoTagDocuments: error in processAutoOcrTagDocuments: error processing document OCR: error downloading document images: fitz: cannot open document"
time="2025-01-12T15:22:51Z" level=debug msg="Found at least 25 remaining documents with tag ai-ocr"
format error: cannot recognize version marker
warning: trying to repair broken xref
warning: repairing PDF document
warning: name is too long
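MuPDF's "cannot recognize version marker" typically means the file does not start with a `%PDF-` header. A minimal Go sketch of sniffing downloaded bytes before handing them to the PDF renderer is shown below. It uses the standard library's `http.DetectContentType`, which inspects at most the first 512 bytes. `classifyDownload` is a hypothetical helper, not part of paperless-gpt. The sniffer requires `%PDF-` at offset 0, so a PDF with leading junk would be rejected, and a file that passes can still have a broken xref table, so an open failure from the renderer would still need to be treated as a permanent error.

```go
package main

import (
	"fmt"
	"net/http"
)

// classifyDownload decides how a downloaded file should be handled based on
// its sniffed MIME type, so non-PDF data never reaches the PDF renderer.
func classifyDownload(data []byte) (string, error) {
	ct := http.DetectContentType(data)
	switch ct {
	case "application/pdf":
		return "pdf", nil
	case "image/jpeg", "image/png":
		return "image", nil // could go to OCR directly, without page rendering
	default:
		return "", fmt.Errorf("unsupported content type %q", ct)
	}
}

func main() {
	samples := [][]byte{
		[]byte("%PDF-1.7\n"),
		{0xFF, 0xD8, 0xFF, 0xE0}, // start of a JPEG file
		[]byte("<!doctype html><html>"),
	}
	for _, s := range samples {
		kind, err := classifyDownload(s)
		fmt.Println(kind, err)
	}
}
```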

icereed · 2025-01-13T07:08:52Z

The first step of enhanced logging and error reporting is implemented in #114.
