Repair PDF before all processing
Some PDFs choke both pdfminer.six and Ghostscript, but the problems can be repaired first by opening the file with pikepdf and saving a fresh copy.

Fixes #1403
jbarlow83 committed Oct 28, 2024
1 parent 5e478a7 commit ee5acbe
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions src/ocrmypdf/_pipeline.py
@@ -159,8 +159,13 @@ def triage(
                     "Argument --image-dpi is being ignored because the "
                     "input file is a PDF, not an image."
                 )
-            # Origin file is a pdf create a symlink with pdf extension
-            safe_symlink(input_file, output_file)
+            try:
+                with pikepdf.open(input_file) as pdf:
+                    pdf.save(output_file)
+            except pikepdf.PdfError as e:
+                raise InputFileError() from e
+            except pikepdf.PasswordError as e:
+                raise EncryptedPdfError() from e
             return output_file
     except OSError as e:
         log.debug(f"Temporary file was at: {input_file}")
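
The repair step in this change amounts to opening the input with pikepdf and writing a fresh copy, translating any parse failure into the pipeline's own error type. A minimal sketch of that pattern follows, under stated assumptions: `repair_pdf`, its `opener` and `parse_errors` parameters, and the local `InputFileError` class are hypothetical stand-ins written for illustration and are not taken from the repository. Only `pikepdf.open`, `Pdf.save`, and `pikepdf.PdfError` are real pikepdf names. The password-protected case is left out here.

```python
from pathlib import Path


class InputFileError(Exception):
    """Hypothetical stand-in for the pipeline's 'unreadable input' error."""


def repair_pdf(input_file: Path, output_file: Path, opener=None, parse_errors=None) -> Path:
    """Re-save input_file to output_file so later tools read a rewritten copy.

    opener and parse_errors are illustrative hooks (assumptions, not part of
    ocrmypdf); when opener is omitted, pikepdf.open and pikepdf.PdfError are used.
    """
    if opener is None:
        import pikepdf  # third-party; imported lazily so the hooks work without it

        opener = pikepdf.open
        parse_errors = (pikepdf.PdfError,)
    elif parse_errors is None:
        parse_errors = (Exception,)
    try:
        with opener(input_file) as pdf:
            # Saving writes a newly generated file instead of reusing the input bytes.
            pdf.save(output_file)
    except parse_errors as e:
        # Chain the original exception so the root cause stays in the traceback.
        raise InputFileError(f"Could not repair {input_file}") from e
    return output_file
```

With pikepdf installed, `repair_pdf(Path("in.pdf"), Path("out.pdf"))` would go through the real library. Passing a fake `opener` makes it possible to exercise the error translation without a damaged PDF on hand.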