Data Handling Between Paperless-ngx and Paperless AI #374
Replies: 3 comments 2 replies
-
Hey Stefan, it uses the existing OCR data provided by Paperless-ngx. As of now, vision models would only work if they are also capable of following LLM instructions and understanding them.
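To make this concrete: Paperless-ngx stores the OCR'd text of every document in the `content` field of its REST API, so a tool like Paperless AI can read that text instead of touching the original file. A minimal sketch of fetching it (the base URL, document ID, and token below are placeholders, not values from this thread):

```python
# Sketch: reading the OCR text Paperless-ngx already stores for a document,
# via its REST API (GET /api/documents/<id>/ returns JSON with a 'content' field).
import json
import urllib.request


def build_document_request(base_url: str, doc_id: int, token: str) -> urllib.request.Request:
    """Build an authenticated request for a single document's metadata."""
    url = f"{base_url.rstrip('/')}/api/documents/{doc_id}/"
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def extract_content(document_json: str) -> str:
    """Pull the OCR'd text out of the API response body."""
    return json.loads(document_json).get("content", "")


# Usage (requires a running Paperless-ngx instance and a real API token):
# req = build_document_request("http://localhost:8000", 42, "my-api-token")
# with urllib.request.urlopen(req) as resp:
#     ocr_text = extract_content(resp.read().decode())
```

The text that comes back is exactly what Tesseract produced at intake, which is why the quality concerns discussed below carry straight through to any downstream AI analysis.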
-
Thank you! I was just wondering, because this was not clear. I think it would be a great feature to enable this as an option. For physical receipts submitted as photographs (gas station receipts, restaurant receipts), I get much better results from a vision model (e.g. Gemini) than from Tesseract. A typical example is attached.
-
Jumping on this thread as well, since Stefan describes the same issues I have. Tesseract isn't that great, and third-party OCR plugins for Paperless-ngx still aren't materializing quickly. I've solved this with a custom intake script that pre-processes my documents with Azure's Document Intelligence API before they end up in Paperless.
But what about the documents that already exist in Paperless and have less-than-optimal OCR quality? Maybe one could take the OCR'd text within Paperless and send it to an LLM for a quality analysis (that may be something the LLM can judge from the text content alone, or so I hope). If the LLM decides the OCR quality isn't good, a full re-processing of the document could be triggered, with the goal of extracting a better full-text version. I'd suggest using dedicated OCR APIs for that, such as Azure AI Document Intelligence, Amazon Textract, or, as you mentioned, Gemini.
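Before spending an LLM call (or a paid re-OCR) on every existing document, a cheap local heuristic could pre-filter the obviously bad ones. This is a hedged sketch of that idea, not anything from Paperless AI itself; the threshold and the "normal prose" character set are illustrative assumptions:

```python
# Sketch: flag stored OCR text that looks poor, so only suspicious documents
# get escalated to an LLM quality check or a dedicated OCR API.
# Thresholds are illustrative assumptions, not tuned values.

def ocr_quality_score(text: str) -> float:
    """Fraction of characters that look like normal prose
    (letters, digits, whitespace, common punctuation)."""
    if not text:
        return 0.0
    ok = sum(ch.isalnum() or ch.isspace() or ch in ".,;:!?()-'\"" for ch in text)
    return ok / len(text)


def needs_reprocessing(text: str, threshold: float = 0.9) -> bool:
    """Flag a document for re-OCR when the score drops below the threshold
    or the text is suspiciously short."""
    return len(text.strip()) < 20 or ocr_quality_score(text) < threshold
```

Documents that pass this filter could still be sampled for the LLM-based check; the point is only to avoid sending every clean document through an expensive pipeline.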
-
Hi there,
sorry for asking something that might be obvious, but I couldn't find it.
Does Paperless AI process the original PDFs or images, performing its own OCR via the LLM APIs, or does it use the text already extracted by Paperless-ngx's OCR engine for its AI analysis?
In other words: are files or images transferred to the AI (which would require vision models), or does it only perform additional analysis on the already extracted text?
Thank you!