Commit

Work around pdfium bug
VikParuchuri committed May 6, 2024
1 parent 54a4218 commit ebace90
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion pdftext/pdf/chars.py
@@ -75,7 +75,7 @@ def get_pdfium_chars(pdf, fontname_sample_freq=settings.FONTNAME_SAMPLE_FREQ, pa

rotation = pdfium_c.FPDFText_GetCharAngle(text_page, i)
rotation = rotation * 180 / math.pi # convert from radians to degrees
- coords = text_page.get_charbox(i, loose=True)
+ coords = text_page.get_charbox(i, loose=rotation == 0) # Loose doesn't work properly when page is rotated

mara004 May 7, 2024

Contributor

Indeed, seems like this is an open issue upstream: https://crbug.com/pdfium/1637

mara004 Jan 22, 2025

Contributor

@VikParuchuri Not sure if you still need this, but I think that bug was fixed in pdfium very recently. You'd still have to wait for a new pdfium-binaries (and pypdfium2) release to come out, though.

VikParuchuri Jan 22, 2025

Author Owner

Thanks for the note about it!
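
For readers following the thread, here is a minimal sketch of the workaround in isolation. It assumes only what the diff shows: the angle arrives in radians, and `get_charbox(index, loose=...)` is called with `loose` disabled whenever the character is rotated. The helper name `charbox_with_workaround` and the `FakeTextPage` class are hypothetical, added here so the logic can be run without a PDF; they are not part of pdftext or pypdfium2.

```python
import math


def charbox_with_workaround(text_page, index, angle_radians):
    """Fetch a char box, requesting loose bounds only for unrotated characters.

    Hypothetical helper mirroring the changed line in the diff.
    """
    rotation = angle_radians * 180 / math.pi  # convert from radians to degrees
    use_loose = rotation == 0  # loose boxes are unreliable when rotated
    return text_page.get_charbox(index, loose=use_loose)


class FakeTextPage:
    """Hypothetical stand-in for a text page; records the `loose` flag it receives."""

    def __init__(self):
        self.calls = []

    def get_charbox(self, index, loose=False):
        self.calls.append((index, loose))
        return (0.0, 0.0, 1.0, 1.0)  # placeholder (left, bottom, right, top)


page = FakeTextPage()
charbox_with_workaround(page, 0, 0.0)     # unrotated character
charbox_with_workaround(page, 1, 1.5708)  # roughly a quarter turn
print(page.calls)
```

Once a pypdfium2 release ships with the upstream fix mentioned above, the conditional could presumably be dropped in favor of `loose=True` again, though that would need checking against rotated test pages.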

device_coords = page_bbox_to_device_bbox(page, coords, page_width, page_height, bl_origin, page_rotation, normalize=True)

char_info = {
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pdftext"
- version = "0.3.5"
+ version = "0.3.6"
description = "Extract structured text from pdfs quickly"
authors = ["Vik Paruchuri <vik.paruchuri@gmail.com>"]
license = "Apache-2.0"
