[Bug]: produced pdf is empty #1453

leosenko · 2025-01-06T21:23:02Z

Describe the bug

Empty pages are produced and no text is recognized. If I only rasterize, I get just the rasterized text that had previously been OCRed; the text that had not been OCRed is gone.
The file before OCR
Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets.pdf
The file after
ocrmypdf --force-ocr --output-type pdf "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets.pdf" "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf"
Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf

Steps to reproduce

1. ocrmypdf --force-ocr --output-type pdf "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets.pdf" "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf"
2. Open "Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf"
3. Only the vertical text in the margin is left, which was already present as text before OCR.

Files

Blandford-1982-Hydromagnetic flows from accretion discs and the production of radio jets-ocr.pdf

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

16.8.0

Relevant log output

No response


jbarlow83 · 2025-01-07T08:31:26Z

Resolved. We were mistakenly suppressing an error message from Ghostscript, and also Ghostscript unexpectedly generates an output image of the whole page minus the offending image instead of exiting with an error. Between the two, ocrmypdf didn't notice anything was wrong.

Your original command will now exit with an error and suggest using --continue-on-soft-render-error to proceed even though the input PDF contains invalid or ambiguous content. That's expected behavior.
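For anyone scripting around this, here is a minimal sketch of a retry wrapper, not part of ocrmypdf itself. It runs the command from the report and, if that exits with an error, runs it again with --continue-on-soft-render-error. The function names `build_command` and `ocr_with_fallback` are hypothetical, and treating any non-zero exit status as a reason to retry is an assumption made for brevity; a real script would probably want to inspect the error output first.

```python
import subprocess
from typing import Callable, List


def build_command(src: str, dst: str, tolerate_soft_errors: bool = False) -> List[str]:
    """Assemble the ocrmypdf invocation from the report as an argument list."""
    cmd = ["ocrmypdf", "--force-ocr", "--output-type", "pdf"]
    if tolerate_soft_errors:
        # Flag named in the maintainer's reply above.
        cmd.append("--continue-on-soft-render-error")
    cmd += [src, dst]
    return cmd


def ocr_with_fallback(src: str, dst: str, runner: Callable = subprocess.run) -> int:
    """Run ocrmypdf once; on failure, retry while tolerating soft render errors.

    `runner` is injectable so the logic can be exercised without ocrmypdf installed.
    Returns the exit status of the last attempt.
    """
    first = runner(build_command(src, dst))
    if first.returncode == 0:
        return 0
    # Assumption: any non-zero exit is treated as retryable. This sketch does
    # not distinguish a soft render error from other failures.
    second = runner(build_command(src, dst, tolerate_soft_errors=True))
    return second.returncode
```

Calling `ocr_with_fallback("input.pdf", "output.pdf")` would then try the strict run first and only fall back to the tolerant one if needed. Note that the tolerant run may still drop the offending image, so the output is worth checking by eye.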

leosenko added the triage (Issue needs triage) label Jan 6, 2025

leosenko assigned jbarlow83 Jan 6, 2025

jbarlow83 closed this as completed in 6edc749 Jan 7, 2025

github-actions bot removed the triage (Issue needs triage) label Jan 7, 2025
