[Bug]: Tesseract fails on Alpine 3.20.3 #1395

Closed
3 tasks done
pschichtel opened this issue Sep 15, 2024 · 2 comments
@pschichtel

What were you trying to do?

This is part of the investigation of eikek/docspell#2504, an issue with a document management tool that uses ocrmypdf under the hood. There we initially thought the issue was caused by its specific container image, but as it turns out, if I modify the .docker/Dockerfile.alpine in this repo to be based on alpine:3.20.3, then execution also fails. We are not quite sure what the root cause is, but it is probably some kind of packaging change in Alpine regarding tesseract and/or opencl. I'm creating this issue as a heads-up; it is not currently an issue, but it will probably become one.

Where are you installing/running from?

Docker container

OCRmyPDF version

master

What operating system are you working on?

Linux

Operating system details and version

alpine 3.20.3

Simple sanity checks

  • Operating system is currently supported by its vendor (not end of life)
  • Python version is compatible with OCRmyPDF
  • This issue is not about a specific input file

Relevant log output

$ podman run --rm -it -v $HOME/some.pdf:/input_pdf.pdf test -l deu /input_pdf.pdf /tmp/output.pdf --output-type pdf --force-ocr
Tesseract failed to report available languages.                                                                                               __main__.py:69
Output from Tesseract:                                                                                                                                      
-----------                                                                                                                                                 
[DS] Profile file not available (tesseract_opencl_profile_devices.dat); performing profiling.                                                               
                                                                                                                                                            
[DS] Device: "(null)" (Native) evaluation...                                                                                                                
Error in pixCloseBrick: pixs not 1 bpp                                                                                                                      
Error in pixOpenBrick: pixs not defined                                                                                                                     
Error in pixSubtract: pixs1 not defined                                                                                                                     
Error in pixOpenBrick: pixs not defined                                                                                                                     
Error in pixOpenBrick: pixs not defined                                                                                                                     
[DS] Device: "(null)" (Native) evaluated                                                                                                                    
[DS]          composeRGBPixel: 0.008948 (w=1.2)                                                                                                             
[DS]            HistogramRect: 0.038714 (w=2.4)                                                                                                             
[DS]       ThresholdRectToPix: 0.036480 (w=4.5)                                                                                                             
[DS]        getLineMasksMorph: 0.000026 (w=5.0)                                                                                                             
[DS]                    Score: 0.267943                                                                                                                     
[DS] Scores written to file (tesseract_opencl_profile_devices.dat).                                                                                         
[DS] Device[1] 0:(null) score is 0.267943                                                                                                                   
[DS] Selected Device[1]: "(null)" (Native)                                                                                                                  
List of available languages in "/usr/share/tessdata/" (7):                                                                                                  
chi_sim                                                                                                                                                     
deu                                                                                                                                                         
eng                                                                                                                                                         
fra                                                                                                                                                         
osd                                                                                                                                                         
por                                                                                                                                                         
spa
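
The output above still ends with a complete language list; it is only preceded by OpenCL profiling lines. As an illustration of how a caller could tolerate such a preamble, here is a minimal Python sketch. It assumes the text comes from something like `tesseract --list-langs` and that the "List of available languages ... (N):" header always appears. `parse_list_langs` is a hypothetical helper, not OCRmyPDF's actual parser.

```python
import re

# Header line as seen in the log above, e.g.
#   List of available languages in "/usr/share/tessdata/" (7):
# The path part is optional in this pattern (assumption).
HEADER = re.compile(r'^List of available languages.*\((\d+)\):\s*$')


def parse_list_langs(output: str) -> list[str]:
    """Return the language codes that follow the header line.

    Hypothetical helper: everything before the header (for example the
    "[DS] ..." OpenCL profiling lines) is ignored.
    """
    lines = [line.strip() for line in output.splitlines()]
    for index, line in enumerate(lines):
        match = HEADER.match(line)
        if match:
            count = int(match.group(1))
            # Keep non-empty lines after the header, capped at the
            # count announced in the header.
            langs = [entry for entry in lines[index + 1:] if entry]
            return langs[:count]
    raise ValueError("no language list header found in Tesseract output")
```

Fed the log above, this would skip the "[DS]" and "Error in pix..." lines and return the seven language codes. It only works around the symptom; the OpenCL-enabled build itself remains the underlying problem.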
@pschichtel pschichtel added the triage Issue needs triage label Sep 15, 2024
@jbarlow83
Collaborator

Alpine 3.20 is building Tesseract with --enable-opencl, which is an unstable and experimental feature, discussed here
https://gitlab.alpinelinux.org/alpine/aports/-/issues/16143

It's not going to be possible to use ocrmypdf on Alpine 3.20.0 through 3.20.3 inclusive. Alpine edge has removed the offending patch, and Tesseract has since dropped all of that code from its repository.
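
For anyone who wants to detect the affected range in a script, here is a minimal Python sketch. It assumes the version string comes from `/etc/alpine-release` (for example `3.20.3`). The function names are hypothetical, and the range is taken directly from the statement above rather than from any package metadata.

```python
import re
from pathlib import Path

# Affected range as stated above: Alpine 3.20.0 through 3.20.3 inclusive.
AFFECTED_MIN = (3, 20, 0)
AFFECTED_MAX = (3, 20, 3)


def parse_alpine_version(text: str) -> tuple[int, ...]:
    """Turn '3.20.3' (or an edge-style '3.21_alpha...') into a 3-tuple."""
    # Drop any '_alpha...' style suffix, then collect the numeric parts.
    core = text.strip().split("_")[0]
    numbers = [int(n) for n in re.findall(r"\d+", core)[:3]]
    # Pad missing components with zeros, e.g. '3.21' -> (3, 21, 0).
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_affected_alpine(version_text: str) -> bool:
    """True if the version falls inside the affected 3.20.x range."""
    return AFFECTED_MIN <= parse_alpine_version(version_text) <= AFFECTED_MAX


def running_on_affected_alpine(release_file: str = "/etc/alpine-release") -> bool:
    """Check the local system; returns False when the file is absent."""
    path = Path(release_file)
    if not path.is_file():
        return False
    return is_affected_alpine(path.read_text())
```

A version check like this is only a heuristic: it says nothing about whether a given image ships a rebuilt Tesseract package.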

@jbarlow83
Collaborator

ocrmypdf is changing to Alpine 3.21, which no longer has the issue.

@github-actions github-actions bot removed the triage Issue needs triage label Jan 5, 2025