This is part of the investigation of eikek/docspell#2504, an issue with a document management tool that uses ocrmypdf under the hood. There we initially thought the issue was caused by its specific container image, but as it turns out, if I modify the .docker/Dockerfile.alpine in this repo to be based on alpine:3.20.3, execution also fails. We are not quite sure what the root cause is, but it is probably some kind of packaging change in Alpine regarding tesseract and/or OpenCL. I'm creating this issue as a heads-up: it is not currently an issue, but it will probably become one.
Where are you installing/running from?
Docker container
OCRmyPDF version
master
What operating system are you working on?
Linux
Operating system details and version
alpine 3.20.3
Simple sanity checks
Operating system is currently supported by its vendor (not end of life)
Python version is compatible with OCRmyPDF
This issue is not about a specific input file
Relevant log output
$ podman run --rm -it -v $HOME/some.pdf:/input_pdf.pdf test -l deu /input_pdf.pdf /tmp/output.pdf --output-type pdf --force-ocr
Tesseract failed to report available languages. __main__.py:69
Output from Tesseract:
-----------
[DS] Profile file not available (tesseract_opencl_profile_devices.dat); performing profiling.
[DS] Device: "(null)" (Native) evaluation...
Error in pixCloseBrick: pixs not 1 bpp
Error in pixOpenBrick: pixs not defined
Error in pixSubtract: pixs1 not defined
Error in pixOpenBrick: pixs not defined
Error in pixOpenBrick: pixs not defined
[DS] Device: "(null)" (Native) evaluated
[DS] composeRGBPixel: 0.008948 (w=1.2)
[DS] HistogramRect: 0.038714 (w=2.4)
[DS] ThresholdRectToPix: 0.036480 (w=4.5)
[DS] getLineMasksMorph: 0.000026 (w=5.0)
[DS] Score: 0.267943
[DS] Scores written to file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 0.267943
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
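The language list at the end of the log is intact. What differs from a normal run is the OpenCL profiling output (`[DS] ...`) and the Leptonica `Error in pix...` lines printed ahead of it. One plausible explanation for the "Tesseract failed to report available languages" message, which we have not confirmed against OCRmyPDF's source, is that the wrapper rejects `tesseract --list-langs` output containing any line that begins with `Error`. The Python sketch below contrasts such a strict parser with a tolerant one that skips everything before the `List of available languages` header. Both function names are hypothetical, and the sketch assumes stdout and stderr were captured together, as in this log.

```python
"""Sketch: parsing `tesseract --list-langs` output with OpenCL debug noise mixed in.

Assumption: stdout and stderr were captured into one string, as in the log above.
Function names are hypothetical and not part of OCRmyPDF's API.
"""

HEADER_PREFIX = "List of available languages"

# Trimmed copy of the log output from this issue.
SAMPLE = """\
[DS] Profile file not available (tesseract_opencl_profile_devices.dat); performing profiling.
[DS] Device: "(null)" (Native) evaluation...
Error in pixCloseBrick: pixs not 1 bpp
Error in pixOpenBrick: pixs not defined
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
"""


def parse_langs_strict(output: str) -> set[str]:
    """Hypothetical strict parser: any line starting with 'Error' counts as failure,
    and the first line is assumed to be the header."""
    lines = output.splitlines()
    if any(line.startswith("Error") for line in lines):
        raise RuntimeError("Tesseract failed to report available languages.")
    _header, *rest = lines
    return {line.strip() for line in rest if line.strip()}


def parse_langs_tolerant(output: str) -> set[str]:
    """Hypothetical tolerant parser: ignore every line before the language header."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(HEADER_PREFIX):
            return {lang.strip() for lang in lines[i + 1:] if lang.strip()}
    raise RuntimeError("No language list header found in Tesseract output.")


if __name__ == "__main__":
    try:
        parse_langs_strict(SAMPLE)
    except RuntimeError as exc:
        print("strict:", exc)
    print("tolerant:", sorted(parse_langs_tolerant(SAMPLE)))
```

Tolerating the noise would only mask the symptom. The underlying problem lies in the Alpine 3.20 tesseract package, not in the parsing.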
It's not going to be possible to use ocrmypdf on Alpine 3.20.0 through 3.20.3 inclusive. Alpine edge has removed the offending patch, and Tesseract has since dropped all of that code from its repository.