I want a function for delete image from Document.Page #874

shredderzwj · 2021-02-01T05:19:01Z

Is your feature request related to a problem? Please describe.
Some PDF files have image watermarks on every page, and I want to delete them.
I would like a function that deletes an image from a Document.Page.

Describe the solution you'd like
Add a deleteImage method that deletes an image from a Document.Page.

Describe alternatives you've considered
I'm a novice and don't know much about MuPDF yet.

Additional context
Some PDF files have image watermarks on every page, like these:

Answered by JorjMcKie

JorjMcKie (Maintainer) · 2021-02-01T12:41:36Z

A GUI script that lets you delete images already exists: look at the repo home page.

But apparently this is not what you actually want: you want to remove watermarks!
A watermark can be several different things: an image, normal text (as seems to be the case in your PDF), or a special annotation.

I don't intend to write a general function that covers all of these cases.
But PyMuPDF already gives you enough options to do this yourself:

1. You can use the GUI script mentioned above to remove an image from a page. That script also lets you insert new images or change the position / rotation of an existing image.
2. You can remove annotations from a page via the method page.delete_annot(annot) (spelled page.deleteAnnot(annot) in older PyMuPDF versions). Iterate over the page's annotations with page.annots(types=[fitz.PDF_ANNOT_WATERMARK]) to locate any watermark annotations on the page.
3. If the PDF creator used neither of these two, things are a bit more complicated. The following code locates the watermark code in the page's contents stream and removes it. It must be executed for each page of the document.

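import fitz  # PyMuPDF

doc = fitz.open("input.pdf")  # placeholder file name
# for each page do the following
for page in doc:
    page.clean_contents()  # normalize the page's contents into one stream
    xrefs = page.get_contents()
    if not xrefs:  # page has no contents stream at all
        continue
    xref = xrefs[0]
    cont = bytearray(doc.xref_stream(xref))
    pos1 = cont.find(b"/Artifact ")  # start of potential watermark code
    if pos1 < 0:  # then there exists no watermark on this page
        continue
    pos2 = cont.find(b"EMC", pos1)  # end of the marked-content section
    if pos2 < 0:  # no closing "EMC": leave the page untouched
        continue
    # make sure that b"/Watermark" is contained in the located substring
    if b"/Watermark" in cont[pos1:pos2]:
        cont[pos1 : pos2 + 3] = b""  # remove watermark code including "EMC"
        doc.update_stream(xref, cont)
doc.save("output.pdf")  # placeholder file name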
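
For the case where the watermark is an annotation, a minimal sketch could look like the one below. It assumes a PyMuPDF version that provides page.annots(types=...) and page.delete_annot(annot). The function names delete_annots_of_type and remove_watermark_annots, as well as the file paths, are made up for illustration and are not part of PyMuPDF.

```python
def delete_annots_of_type(page, annot_type):
    """Delete every annotation of the given type from a page.

    `page` is expected to behave like a PyMuPDF Page, i.e. to offer
    annots(types=[...]) and delete_annot(annot).
    Returns the number of deleted annotations.
    """
    count = 0
    for annot in page.annots(types=[annot_type]):
        page.delete_annot(annot)
        count += 1
    return count


def remove_watermark_annots(src_path, dst_path):
    """Hypothetical helper: strip watermark annotations from all pages of a PDF."""
    import fitz  # PyMuPDF; imported here so the helper above has no hard dependency

    doc = fitz.open(src_path)
    total = sum(
        delete_annots_of_type(page, fitz.PDF_ANNOT_WATERMARK) for page in doc
    )
    doc.save(dst_path)  # write the modified document to a new file
    doc.close()
    return total
```

Calling remove_watermark_annots("input.pdf", "output.pdf") would then write a copy of the file without watermark annotations. This only helps if the watermark really is an annotation; text or image watermarks are not touched.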
