retired-docs.md

Retired documentation

PDF/A Normalization

The following is retired documentation written for PDF/A normalization. See Key Decisions: Not normalizing to PDF/A to understand why this documentation was retired.

Software

PDF (access) to PDF/A (preservation)

Currently we are normalizing PDFs to PDF/A on a case-by-case basis. If normalizing to PDF/A, the PDFs should not have OCR or a complex colour profile. OCR'd PDFs interfere with Archivematica's PDF/A normalization script, and PDF/A has certain colour profile and text restrictions which could alter the look of PDFs that are heavy in coloured images. Text-heavy greyscale or black-and-white PDFs are safe to normalize to PDF/A for preservation. If you want your access copies to have OCR, you have to add it yourself using manual normalization.

PDF/A conformance (veraPDF)

If you have PDF/A files in the AIP, you need to check them for conformance using veraPDF, because Archivematica (using JHOVE) only validates whether a file is valid as a PDF, not as a PDF/A.

  1. cd into the verapdf folder and check the installation with verapdf --version
  2. If the command is not found, make verapdf available using alias verapdf='~/verapdf/verapdf'
  3. Move all the normalized files (files with an appended UUID) from the AIP into a separate folder called normalized
  4. cd to one level above the directory you want to scan and run verapdf normalized
  5. In the <validationReports> tag, nonCompliant="0" means all the files conform to PDF/A; any other value is the number of files that failed to conform
  6. Manually normalize the original of each failed file to PDF/A using either Adobe or ocrmypdf, then return to step 3
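
Steps 3 and 5 above could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of the original workflow: it assumes normalized files end in a hyphen-separated UUID just before the extension, and that the veraPDF report has been saved as XML text. The helper names (is_normalized, move_normalized, count_noncompliant) are hypothetical, and the attribute name is matched case-insensitively in case the report spells it differently.

```python
import re
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: a normalized file's stem ends with a UUID,
# e.g. "scan-3f2504e0-4f89-11d3-9a0c-0305e82c3301.pdf".
UUID_SUFFIX = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def is_normalized(path):
    """Return True if the file name (without extension) ends with a UUID."""
    return bool(UUID_SUFFIX.search(Path(path).stem))


def move_normalized(aip_dir, target):
    """Step 3: move every UUID-suffixed file under aip_dir into target."""
    aip_dir, target = Path(aip_dir), Path(target)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(aip_dir.rglob("*")):
        already_there = target in path.parents
        if path.is_file() and not already_there and is_normalized(path):
            destination = target / path.name
            shutil.move(str(path), str(destination))
            moved.append(destination)
    return moved


def count_noncompliant(report_xml):
    """Step 5: read the non-compliant count from <validationReports>."""
    root = ET.fromstring(report_xml)
    for element in root.iter():
        # Ignore any XML namespace prefix on the tag name.
        if element.tag.rsplit("}", 1)[-1] == "validationReports":
            for name, value in element.attrib.items():
                if name.lower() == "noncompliant":
                    return int(value)
    raise ValueError("no noncompliant attribute found in report")
```

For example, after saving the output of step 4 to a file (here a hypothetical report.xml), count_noncompliant(Path("report.xml").read_text()) returning 0 would mean every scanned file conforms; any other number means step 6 applies to that many files.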