OCR and PDF Helper - SAKUNO

This is a graphical tool for performing Optical Character Recognition (OCR) on images and converting PDF files to images. It can also merge the text files in a selected folder. The tool is built with CustomTkinter for the GUI, EasyOCR for OCR, pypdfium2 for PDF rendering, and Pillow for image handling.

Features

PDF to Image Conversion: Convert PDF files into images, with adjustable DPI settings for image quality.
OCR on Images: Perform OCR on images in a selected folder to extract text and save it as .txt files.
Merge Text Files: Merge all text files in a folder into a single text file.
User-friendly GUI: Built with CustomTkinter, making it easy to navigate.

Installation

To run this project, you need to have Python installed. Follow these steps to set it up:

Clone the repository:

git clone https://github.com/yourusername/ocr-pdf-helper.git
cd ocr-pdf-helper

Install the required dependencies:
```
pip install customtkinter pypdfium2 Pillow easyocr
```
EasyOCR depends on PyTorch. Depending on your system, you may need to install PyTorch separately (for example, a build that matches your GPU).
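
To confirm that the dependencies are importable before launching the tool, a small check like the following may help. This is a minimal sketch, not part of the project: the `missing_packages` helper and the `REQUIRED_MODULES` table are hypothetical names, and the list assumes the four packages named above plus `torch`.

```python
import importlib.util

# Import names can differ from pip package names (Pillow is imported as PIL).
# Keys are import names, values are the names to pass to pip (assumed list).
REQUIRED_MODULES = {
    "customtkinter": "customtkinter",
    "pypdfium2": "pypdfium2",
    "PIL": "Pillow",
    "easyocr": "easyocr",
    "torch": "torch",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of modules that cannot be found."""
    return [pip_name for module, pip_name in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All dependencies are importable.")
```

The check only looks for the modules on the import path; it does not verify versions or GPU support.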

Usage

Once the dependencies are installed, run the program with Python (the repository's main script is Main.py). The interface provides buttons and options for the tasks described below.

PDF Conversion

File Selector: Choose a PDF file that you want to convert into images.
Set DPI: Adjust the DPI (dots per inch) for image quality (default is 100%).
Convert: Convert the PDF into images. The images will be saved in a new folder named after the PDF.
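
The conversion step can be sketched with pypdfium2 roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: it assumes pypdfium2 version 4 or later (where `page.render(...)` returns a bitmap with `to_pil()`), a default of 100 DPI, an output folder created next to the PDF, and a hypothetical `page_001.png` naming scheme. The function names are made up for this example.

```python
from pathlib import Path

PDF_POINTS_PER_INCH = 72  # PDF page sizes are measured in points (1/72 inch)


def dpi_to_scale(dpi):
    """Convert a DPI value to the scale factor used when rendering a page."""
    return dpi / PDF_POINTS_PER_INCH


def output_dir_for(pdf_path):
    """Folder next to the PDF, named after it without the .pdf suffix (assumed)."""
    pdf_path = Path(pdf_path)
    return pdf_path.parent / pdf_path.stem


def convert_pdf_to_images(pdf_path, dpi=100):
    """Render each page to a PNG and return the list of written paths."""
    import pypdfium2 as pdfium  # imported lazily so the helpers work without it

    out_dir = output_dir_for(pdf_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    pdf = pdfium.PdfDocument(str(pdf_path))
    written = []
    for index in range(len(pdf)):
        page = pdf[index]
        # A scale of 1.0 corresponds to 72 DPI, so higher DPI means larger images.
        image = page.render(scale=dpi_to_scale(dpi)).to_pil()
        target = out_dir / f"page_{index + 1:03d}.png"  # hypothetical naming
        image.save(target)
        written.append(target)
    return written
```

Zero-padded page numbers keep the images in page order when the folder is later sorted by name.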

OCR on Images

Folder Selector: Select a folder containing images on which OCR should be performed.
Set OCR Language: Enter the OCR languages as a comma-separated list of EasyOCR language codes (e.g., en,bn for English and Bengali).
Perform OCR: The tool will scan each image, extract text, and save it as a .txt file in the same folder.
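
A rough sketch of this folder-level OCR step with EasyOCR is shown below. It is an assumption-laden illustration rather than the tool's implementation: the function names, the set of image extensions, and the choice to name each output after its image are all hypothetical. EasyOCR itself expects its own language codes (for example `en` for English), and `readtext(..., detail=0)` returns only the recognized strings.

```python
from pathlib import Path

# Assumed set of image extensions; the tool may accept a different set.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def parse_languages(text):
    """Split a comma-separated language string into a clean list of codes."""
    return [code.strip() for code in text.split(",") if code.strip()]


def list_images(folder):
    """Return the image files in the folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def ocr_folder(folder, languages="en"):
    """Write one .txt file next to each image and return the written paths."""
    import easyocr  # imported lazily; loading the models is slow

    reader = easyocr.Reader(parse_languages(languages))
    written = []
    for image_path in list_images(folder):
        # detail=0 returns only the text, without bounding boxes or scores.
        lines = reader.readtext(str(image_path), detail=0)
        target = image_path.with_suffix(".txt")
        target.write_text("\n".join(lines), encoding="utf-8")
        written.append(target)
    return written
```

Creating the `Reader` once and reusing it for every image avoids reloading the recognition models for each file.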

Merging Text Files

Folder Selector: Select a folder that contains multiple .txt files.
Merge All Text Files: Click the "Merge All the Text Files" button to combine all the .txt files in the folder into one single file.
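
The merge step needs only the standard library. The sketch below shows one way it could work; the `merge_text_files` name, the `merged.txt` output name, and the name-sorted order are assumptions for illustration, not details confirmed by the project.

```python
from pathlib import Path


def merge_text_files(folder, output_name="merged.txt"):
    """Concatenate every .txt file in the folder into one file, in name order."""
    folder = Path(folder)
    output_path = folder / output_name
    # Skip the output file itself so a second run does not merge old results.
    sources = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    parts = [p.read_text(encoding="utf-8") for p in sources]
    output_path.write_text("\n".join(parts), encoding="utf-8")
    return output_path
```

Because the files are sorted by name, page_10.txt would come before page_2.txt; zero-padded file names avoid that.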

GUI Overview

PDF Path: Displays the selected PDF file path.
Image Preview: After a PDF is converted, a preview of the first image is displayed.
OCR and Merge Options: Available after selecting a folder for OCR and text merging.

Contributing

Contributions are welcome! Feel free to fork this repository, make changes, and submit a pull request.

Steps:

Fork the repository.
Create a new branch (git checkout -b feature/your-feature-name).
Commit your changes (git commit -m 'Add some feature').
Push to the branch (git push origin feature/your-feature-name).
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Developed by Shaon An Nafi.
Feel free to reach out for any questions or suggestions.

Repository: Nafisarkar/Pdf_Converter_OCR
