Tesseract OCR + OpenCV + PIL + Python
- sudo apt-get install tesseract-ocr
- pip install pytesseract
- pip install pdf2image
- pip install opencv-python
- pip install subprocess.run
- pip install pandas
-: This code extracts text from a PDF. All you need to do is pass the path of the PDF file when running the script.
-: As a result, you will get an image of each PDF page and a .txt file for each page.
-> If your PDF has 5 pages, you will get 5 images (they show which data the script is using) and 5 .txt files.
your_pdf_path = "../x.pdf"
path_to_save = "../My_project/"
python OCR.py "your_pdf_path" "path_to_save"
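For orientation, here is a minimal sketch of how a PDF-to-text step like this could be written with pdf2image and pytesseract. It is not the actual contents of OCR.py: the function names `page_output_paths` and `pdf_to_text` and the `page_N.jpg` / `page_N.txt` file names are assumptions made for illustration. It also assumes that Tesseract and Poppler (which pdf2image needs) are installed on the system.

```python
from pathlib import Path


def page_output_paths(save_dir, page_count):
    """Return one (image_path, text_path) pair per PDF page, numbered from 1.

    The page_N naming scheme is hypothetical, not taken from OCR.py.
    """
    save_dir = Path(save_dir)
    return [
        (save_dir / f"page_{n}.jpg", save_dir / f"page_{n}.txt")
        for n in range(1, page_count + 1)
    ]


def pdf_to_text(pdf_path, save_dir):
    """Save each PDF page as an image and write its OCR text to a .txt file."""
    # Imported here so the path helper above can be used even when
    # pdf2image / pytesseract are not installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path)  # one PIL image per page
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    outputs = page_output_paths(save_dir, len(pages))
    for page, (image_path, text_path) in zip(pages, outputs):
        page.save(image_path, "JPEG")
        text = pytesseract.image_to_string(page)
        text_path.write_text(text, encoding="utf-8")
```

Calling `pdf_to_text("../x.pdf", "../My_project/")` would then write one image and one .txt file per page into the save folder, which matches the 5 pages, 5 images, 5 .txt files behaviour described above.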
sudo apt-get install imagemagick
python OCR.py "PathToYourImg/img.jpeg" "PathToSaveImg/img1.jpeg" 1
-: This script removes all lines from the image.