[Bug]: produced pdf is empty #1453

Closed
leosenko opened this issue Jan 6, 2025 · 1 comment

leosenko commented Jan 6, 2025

Describe the bug

Empty pages are produced and no text is recognized. If I only rasterize, the output contains just the text that had previously been OCRed; the text that had not been OCRed is gone.
The file before OCR
Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets.pdf
The file after
ocrmypdf --force-ocr --output-type pdf "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets.pdf" "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf"
Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf

Steps to reproduce

1. ocrmypdf --force-ocr --output-type pdf "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets.pdf" "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf"
2. Open "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf"
3. Only the vertical text in the margin is left, which was already present as text before OCR.

Files

Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

16.8.0

Relevant log output

No response

leosenko added the triage label (Issue needs triage) Jan 6, 2025
jbarlow83 (Collaborator) commented

Resolved. We were mistakenly suppressing an error message from Ghostscript, and Ghostscript also unexpectedly generates an output image of the whole page minus the offending image instead of exiting with an error. Between the two, ocrmypdf didn't notice anything was wrong.

Your original command will now exit with an error that suggests using --continue-on-soft-render-error to proceed even though there is invalid/ambiguous content in the input PDF. That's expected behavior.
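
For anyone hitting the same error, here is a minimal sketch of re-running the original command with the flag named above. The function name `ocr_allow_soft_errors` and the `OCRMYPDF` override variable are hypothetical helpers added only so the command line can be dry-run with `echo`; the file names are placeholders. It assumes an ocrmypdf release that includes this fix, and the output may still be missing whatever content Ghostscript could not render.

```shell
# OCRMYPDF is a hypothetical override (not an ocrmypdf feature) that lets the
# command be previewed with `echo` instead of actually running OCR.
OCRMYPDF=${OCRMYPDF:-ocrmypdf}

# Hypothetical wrapper: same options as the command in the report, plus the
# flag suggested by the new error message.
ocr_allow_soft_errors() {
    "$OCRMYPDF" --force-ocr --output-type pdf \
        --continue-on-soft-render-error "$1" "$2"
}

# Dry run in a subshell: prints the arguments that would be passed.
(OCRMYPDF=echo; ocr_allow_soft_errors "input.pdf" "input-ocr.pdf")
```

To run it for real, call `ocr_allow_soft_errors` directly with the actual input and output paths, and compare the result against the original PDF page by page.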

github-actions bot removed the triage label (Issue needs triage) Jan 7, 2025