Error Generating Preview Images for PDF/A Files in docspell 0.41.0 #2486

ElektroCoder · 2024-02-04T19:58:53Z

Hello,

I encountered an issue with processing PDF/A files in docspell version 0.41.0 on Debian 12. Attempting to generate preview images results in an error specifically for colored PDF/A files, whereas standard PDF files are processed without any issues. Here are the relevant log entries:

[...]
Sun, 4 February 2024, 19:45: Creating preview images for 1 files…
Sun, 4 February 2024, 19:45: Creating preview images failed, continuing without it.: LCMS error 13: Mismatched alpha channels
Sun, 4 February 2024, 19:45: Retrieving page count for 1 files…
[...]

I would greatly appreciate any assistance or suggestions on how to resolve this issue.

eikek · 2024-02-04T22:52:08Z

Hi @ElektroCoder, I probably need such a PDF to check it on my side. Do you perhaps have a test file without sensitive content? Do you know if the same file worked in a previous version?

ElektroCoder · 2024-02-04T23:36:17Z

Hi, thank you for getting back to me on this. It’s no problem at all; I can provide a test file without any sensitive information. Regarding your question, I’ve never encountered any issues with version 0.40. The files are copied over from a document scanner to opt/docs via a Samba share. I’ve noticed that when I save the files as PDF/A, no preview is generated. However, if I adjust the scanner settings to save them as standard PDFs, the preview works fine. I’ll follow up with more information and possibly a test file by tomorrow around 4:00 PM; I’m already in bed for the night. :)


eikek · 2024-02-05T07:50:56Z

Hi, oh sure, there is absolutely no rush. Just take your time - however long that may take.

ElektroCoder · 2024-02-11T10:53:29Z

Hi,

sorry for the delay. I took some time to retest things after double-checking my AMD GPU drivers on Debian and reinstalling Docker and Docspell. I've got two PDFs for you, both scanned with a Brother ADS 2400N scanner. One is in (not working) PDF/A format and the other in standard PDF format. They were saved via a Samba share, which has been working smoothly.

I've never had any issues with Docspell 0.40.0 before. However, I recently upgraded my hardware from an old A3000 CPU to an AMD 5600G CPU, and I'm running everything on a Debian 12 terminal server.
The import process log has this entry:

[...]
Sun, February 11th, 2024, 10:32: Updating SOLR index
Sun, February 11th, 2024, 10:32: Text extraction finished in 46630 ms.
Sun, February 11th, 2024, 10:32: Creating preview images for 1 files…
Sun, February 11th, 2024, 10:32: Creating preview images failed, continuing without it.: LCMS error 13: Mismatched alpha channels
Sun, February 11th, 2024, 10:32: Retrieving page count for 1 files…
Sun, February 11th, 2024, 10:32: Found number of pages: 2
[...]

I'll include the log files as text files. I'm not sure what's causing the problem; everything seems to be functioning fine, and Portainer isn't showing any entries in the container logs.

Thanks for your help in advance.

failed_Scan_20240211_113131_004873.pdf
log_004873_failedPreview_Brother_ADS-2400N_PDF-A.txt
log_004875_workingPreview_Brother_ADS-2400N_PDF.txt
ok_Scan_20240211_113214_004875.pdf

TheAnachronism · 2024-02-15T16:47:12Z

I just got the same error in generating the preview for a file.
I'm running docspell inside Kubernetes, but I don't think that's the issue.

TheAnachronism · 2024-02-15T16:49:25Z

I also get this a bit before the preview fails:

Thu, February 15th, 2024, 16:45: PDF conversion failed: Command result=3. No output file found.. Go without PDF file

tenpai-git · 2024-02-21T11:58:59Z

Hi @ElektroCoder @TheAnachronism

I read your output and also noticed this in the log:
Sun, February 11th, 2024, 11:28: PDF conversion failed: Command result=3. No output file found.. Go without PDF file

The filename of your working preview log has PDF in it, and that of the failed preview has PDF-A.

This tells me that PDF/A conversion is potentially the culprit here.

Could you try the following? For scanning this PDF, let's try editing your ocrmypdf configuration a bit. In the /etc/docspell-joex/docspell-joex.conf config, try adding "--output-type", "pdf", to the options (this should come after --skip-text), and then go ahead and restart docspell-joex.

    # The `--skip-text` option is necessary to not fail on "text" pdfs
    # (where ocr is not necessary). In this case, the pdf will be
    # converted to PDF/A.
    ocrmypdf = {
      enabled = true
      command = {
        program = "ocrmypdf"
        args = [
          "-l", "{{lang}}",
          "--skip-text",
          "--deskew",
          "--output-type", "pdf",
          "-j", "1",
          "{{infile}}",
          "{{outfile}}"
        ]

After editing so it appears similar to the excerpt above, restart docspell-joex.

sudo systemctl restart docspell-joex or use equivalent commands on docker.

Try reprocessing (delete the failed one, and any intermediary or cached files created from scanning in the original document) and send the log over?
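
To check whether the conversion step itself fails outside Docspell, the same ocrmypdf invocation can be run by hand. The following is only a sketch: the file names and the `eng` language code are placeholders standing in for `{{infile}}`, `{{outfile}}` and `{{lang}}`, and are not taken from this thread. As far as I know, ocrmypdf documents exit code 3 as a missing dependency, which would fit the "Command result=3" line, but treat that reading as an assumption.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with the failing scan,
# a scratch output path, and the OCR language you actually use.
infile="failed_scan.pdf"
outfile="/tmp/converted.pdf"
ocr_lang="eng"

# Same arguments as in the joex config excerpt, with plain PDF output.
cmd="ocrmypdf -l $ocr_lang --skip-text --deskew --output-type pdf -j 1 $infile $outfile"
echo "$cmd"

# Only run when ocrmypdf is installed and the input file exists,
# then report the exit code so it can be compared with the joex log.
if command -v ocrmypdf >/dev/null 2>&1 && [ -f "$infile" ]; then
  rc=0
  $cmd || rc=$?
  echo "ocrmypdf exit code: $rc"
fi
```

If this fails with the same exit code on the command line, the problem is more likely in the ocrmypdf/Tesseract installation than in Docspell itself.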

It would be good to know if using "--output-type", "pdf", was a better default than PDF/A. @eikek potentially similar to issue #2504 for affected PDFs.

PDF/A is meant to be archived as is, so even though it's counterintuitive since we want to manage documents, converting to raw PDF for processing may be better for Docspell.

tenpai-git · 2024-02-22T15:31:13Z

Hey guys, maybe try upgrading to the nightly 0.4.2 version? I don't use SOLR (I am using PostgreSQL), but my previews were also not generating for certain files.

I tried upgrading to nightly on a whim, and that resolved it for me. Perhaps there is a dependency issue of some kind.

Curious to see if the other test suggested works out for you as well. Adding "--output-type", "pdf", as previously described fixed things in a lot of pdfs I was working with, including previews.

eikek · 2024-03-02T22:04:50Z

Hi! I wonder if that issue is also related to #2504 (as mentioned already by @tenpai-git above). The docker images have been updated (sadly reusing the same tags as before). Maybe you could give them a try?

eikek · 2024-03-02T22:10:09Z

@ElektroCoder I tested your "failed scan" document quickly at my 0.39.0 installation. It was all good. I have preview and can select text in the converted pdf. I would assume for now some tooling problems, because I don't recall any changes in code from that version to 0.41.0 in that area. (I'm not using the docker images)

github-actions · 2024-04-11T02:06:46Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!

vs49688 · 2024-05-19T09:57:24Z

I just hit the same problem, and the workaround described in https://issues.apache.org/jira/browse/PDFBOX-5787 fixes it: simply add -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true
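
The `-D` option is a JVM system property, so it has to reach the Java process that renders the previews. A minimal sketch of one way to pass it, assuming the docspell-joex start script forwards the `JAVA_OPTS` environment variable to the JVM (this is not confirmed anywhere in this thread, so check your start script, systemd unit or container image first):

```shell
#!/bin/sh
# Assumption: the docspell-joex launcher passes JAVA_OPTS on to the JVM.
# The property name itself is the one quoted in the comment above.
PDFBOX_FLAG="-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true"

# Append the flag to any existing JAVA_OPTS instead of overwriting them.
JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }$PDFBOX_FLAG"
export JAVA_OPTS

echo "JAVA_OPTS=$JAVA_OPTS"
# Start joex from this environment (or set the same variable in the
# systemd unit / container environment), then reprocess the PDF/A file.
```

After restarting joex with the flag in place, reprocessing the failing file should show whether the "LCMS error 13" line disappears from the processing log.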

tenpai-git mentioned this issue on Feb 21, 2024: Ocrmypdf fails due to Tesseract failed to report available languages #2504 (Closed)

eikek added the question label on Mar 11, 2024

github-actions bot added the stale label on Apr 11, 2024

github-actions bot closed this as not planned on Apr 19, 2024
