OCR contents filled with &nbsp; #148
Comments
Hi there, thanks for reporting. If you want to try custom prompt templates, you could add an instruction to the prompt telling the model not to use &nbsp; and to print regular spaces instead.
Ah, that could be... I had assumed it was a parsing error somewhere in the connection to the API... I suppose OpenAI could literally be returning '&nbsp;'... I'll try that and report back. Thanks!
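One way to check that hypothesis, and to clean up affected text in the meantime, is to count and normalize the filler in the OCR output. The following is a minimal sketch, not part of paperless-gpt: the function names `count_nbsp` and `normalize_ocr_text` are hypothetical, and it assumes the filler is either the literal HTML entity `&nbsp;` or the non-breaking space character U+00A0.

```python
import html
import re


def count_nbsp(text: str) -> int:
    """Count literal '&nbsp;' entities plus raw U+00A0 characters."""
    return text.count("&nbsp;") + text.count("\u00a0")


def normalize_ocr_text(text: str) -> str:
    """Turn non-breaking-space filler into single ordinary spaces."""
    # Decode HTML entities; '&nbsp;' becomes U+00A0.
    # Note: this also decodes other entities such as '&amp;'.
    decoded = html.unescape(text)
    # Replace non-breaking spaces with ordinary spaces.
    decoded = decoded.replace("\u00a0", " ")
    # Collapse runs of spaces and tabs per line, keeping line breaks.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in decoded.splitlines()]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Total&nbsp;&nbsp;&nbsp;12.50\nThank\u00a0you"
    print(count_nbsp(sample))
    print(normalize_ocr_text(sample))
```

A non-zero count on the raw model response would suggest the entities come from the model itself rather than from a parsing step further down the line.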
I deleted and re-added the image I had the most trouble with, and updated the template as you suggested. When I ran it again, it worked perfectly, so either the template change or re-uploading the image fixed the problem (I'd imagine the prompt did the trick, as it was literally the same image, downloaded from Paperless and then re-uploaded). I'll try more and see, but it seems to be fixed for now. Thanks for the quick response.
Awesome! Could you share the prompt that you used? Maybe it makes sense to include it in the default prompt.
For some of the documents I have (receipts from restaurants etc., not multi-page documents), the auto-OCR returns part of the text and then fills in the rest with repeated '&nbsp;'. For some documents, deleting the content and redoing the OCR fixes the problem; for others, it does not. I have changed OCR models (4o to 4o-mini) and, as I said, deleted the content, and the behaviour does not change.
Example, from a till tape from a restaurant called Bone Daddies (the original image is a jpg photo from a phone):
The native OCR from Paperless didn't have problems scanning the image (although it wasn't very accurate).
The Paperless-GPT OCR otherwise works very well (everything I have tried has been a jpg photo), even in rotated images.
Thanks for your great work on this - it is an awesome addition to Paperless!