pdfsorter

Sorts OCR'd PDF files to a new location by searching the text in them for keywords.

Uses a YAML file format similar to PyPDFOCR's, but only handles the file sorting and does no OCR itself, since some scanners come with OCR software that is superior to Tesseract.

Dependencies

PyYAML
PyPDF2

Usage

usage: pdfsorter.py [-h] [-d] yaml_fn

Sorts OCR'd PDF files to a new location by searching the text in them for
keywords.

positional arguments:
  yaml_fn       YAML file containing configuration.

optional arguments:
  -h, --help    show this help message and exit
  -d, --dryrun  Will not move files if set.
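
The help text above looks like standard argparse output. As a rough sketch only (the function name `build_parser` and the file name `config.yaml` are assumptions for illustration, not taken from pdfsorter.py), a parser that accepts the same arguments could be declared like this:

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors the options listed in the usage text.
    parser = argparse.ArgumentParser(
        prog="pdfsorter.py",
        description=(
            "Sorts OCR'd PDF files to a new location by searching "
            "the text in them for keywords."
        ),
    )
    parser.add_argument("yaml_fn", help="YAML file containing configuration.")
    parser.add_argument(
        "-d", "--dryrun", action="store_true",
        help="Will not move files if set.",
    )
    return parser


# Parse an example command line; config.yaml is a placeholder name.
args = build_parser().parse_args(["--dryrun", "config.yaml"])
print(args.yaml_fn, args.dryrun)
```

Running with `-d` first is a cheap way to check which folder each file would land in before anything is moved.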

YAML File Format:

watch_folder: "/Users/bubblegum/scaninbox"
target_folder: "/Users/bubblegum/scans"
default_folder: "/Users/bubblegum/scans/unfiled"

folders:
    water:
        - city of gotham water utilities
        - water utilities
    gas:
        - Monthly Energy Usage
        - save energy
        - gas leak
    cable:
        - cable
        - internet
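
To illustrate how a configuration like this could drive the sorting, here is a minimal sketch. It assumes case-insensitive substring matching and that the first folder with a matching keyword wins; the actual matching rules in pdfsorter.py may differ. The function name `choose_folder` is hypothetical, and the config is written as a plain dict of the shape that `yaml.safe_load` would return for the file above.

```python
import os


def choose_folder(text, config):
    """Return the destination directory for a document's extracted text.

    Assumption: keywords are matched as case-insensitive substrings and
    folders are checked in the order they appear in the config.
    """
    lowered = text.lower()
    for folder, keywords in config["folders"].items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return os.path.join(config["target_folder"], folder)
    # No keyword matched: fall back to the default folder.
    return config["default_folder"]


# Same structure as the YAML example, expressed as a dict.
config = {
    "watch_folder": "/Users/bubblegum/scaninbox",
    "target_folder": "/Users/bubblegum/scans",
    "default_folder": "/Users/bubblegum/scans/unfiled",
    "folders": {
        "water": ["city of gotham water utilities", "water utilities"],
        "gas": ["Monthly Energy Usage", "save energy", "gas leak"],
        "cable": ["cable", "internet"],
    },
}

print(choose_folder("Your Monthly Energy Usage statement", config))
print(choose_folder("Unrelated letter", config))
```

Under these assumptions, each subfolder name under `folders` is joined onto `target_folder`, and anything that matches no keyword goes to `default_folder`.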
