OCR-To-Get-Data-in-Structured-Format

Tesseract OCR + OpenCV + PIL + Python

Install the following:

sudo apt-get install tesseract-ocr
pip install pytesseract
pip install pdf2image
pip install opencv-python
pip install subprocess.run
pip install pandas

Instruction

-: This script extracts text from a PDF. Pass the path of the PDF file as an argument when you run the script.

-: As a result, you get one image and one .txt file for each page of the PDF.

For Example

-> If your PDF has 5 pages, you will get 5 images (they show which data the script is working from) and 5 .txt files.
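
The page-by-page flow described above can be sketched roughly as follows. This is a minimal illustration and not the contents of OCR.py: the helper names, the page_N file naming, and the dpi default are all assumptions made for the example. It also assumes that poppler is installed, which pdf2image needs in order to render PDF pages.

```python
from pathlib import Path


def page_output_paths(save_dir, page_number):
    """Return (image_path, text_path) for a 1-based page number.

    The page_N naming scheme is hypothetical, chosen for this sketch.
    """
    base = Path(save_dir) / f"page_{page_number}"
    return base.with_suffix(".png"), base.with_suffix(".txt")


def pdf_to_text(pdf_path, save_dir, dpi=300):
    """Save one image and one .txt file per PDF page; return the .txt paths."""
    # Imported here so the naming helper above works without the OCR stack.
    from pdf2image import convert_from_path
    import pytesseract

    Path(save_dir).mkdir(parents=True, exist_ok=True)
    text_paths = []
    # convert_from_path renders every page of the PDF to a PIL image.
    for number, page in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        image_path, text_path = page_output_paths(save_dir, number)
        page.save(image_path)  # the image that is fed to Tesseract
        text = pytesseract.image_to_string(page)
        text_path.write_text(text, encoding="utf-8")
        text_paths.append(text_path)
    return text_paths
```

With this sketch, a 5-page PDF would produce page_1.png to page_5.png and page_1.txt to page_5.txt in the save directory.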

Run

your_pdf_path = "../x.pdf"

path_to_save = "../My_project/"

python OCR.py "your_pdf_path" "path_to_save"

Add-On

sudo apt-get install imagemagick

python OCR.py "PathToYourImg/img.jpeg" "PathToSaveImg/img1.jpeg" 1

-: This removes all lines from the image.

chiragpandav/OCR-To-Get-Data-in-Structured-Format

Folders and files

Latest commit

History

Repository files navigation

OCR-To-Get-Data-in-Structured-Format

Download the following -:

Instruction

For Example

Run

Add-On

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages