Document OCR

About the Project

This project is part of the SJTU ICE4309 - Image Processing & Content Analysis course. We implemented an OCR framework that converts photographs of in-the-wild documents into machine-readable text.

Features

The model architecture of Document OCR is shown below:

Input images are preprocessed with edge detection, contour detection, perspective transformation, and binarization to rectify and enhance the document before text detection.
The text detection module uses the DBNet model with MobileNetV3 as the backbone network.
The text recognition module uses the CRNN model with MobileNetV3 as the backbone network.
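CRNN models are commonly trained with a CTC loss, so their per-timestep scores have to be collapsed into a string. The sketch below shows greedy CTC decoding under that assumption. The README does not say which decoder the project uses, and the `ctc_greedy_decode` name, the blank index of 0, and the toy alphabet are illustrative only.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(logits: Sequence[Sequence[float]], alphabet: str) -> str:
    """Collapse per-timestep scores into text with greedy CTC decoding.

    `logits[t][k]` is the score of class k at timestep t. Class 0 is the
    blank and class k > 0 maps to alphabet[k - 1]. This label layout is an
    assumption for illustration, not the project's actual label map.
    """
    decoded: List[str] = []
    previous = BLANK
    for scores in logits:
        # Pick the highest-scoring class at this timestep.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit a character only when it is neither blank nor a repeat.
        if best != BLANK and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


scores = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.7, 0.2],    # 'a' again, collapsed as a repeat
    [0.9, 0.05, 0.05],  # blank separates the two 'a' characters
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.1, 0.8],    # 'b'
]
print(ctc_greedy_decode(scores, "ab"))  # aab
```

The blank class is what lets the decoder distinguish a repeated character from one character that spans several timesteps.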

Getting Started

To get started, follow the steps below to set up your environment and install the necessary dependencies.

Create and activate a new conda environment

conda create -n ocr python=3.9
conda activate ocr

Install pip requirements

pip install -r requirements.txt

Usage

Run the script

python run.py --img <IMG_DIR> --preprocess

Replace <IMG_DIR> with the path to a single image. Add --preprocess to preprocess the input image before text detection and recognition.

Example

python run.py --img input_img/receipt.jpg --preprocess
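The actual argument handling in run.py is not shown in this README. As a rough sketch of how the two flags above could be declared with `argparse`, assuming `--img` is required and `--preprocess` is a boolean switch (`build_parser` and `parse_args` are hypothetical names):

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the two flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Run Document OCR on one image.")
    # Assumption: the image path is mandatory.
    parser.add_argument("--img", required=True, help="Path to a single input image.")
    # store_true makes the flag False unless it is present on the command line.
    parser.add_argument(
        "--preprocess",
        action="store_true",
        help="Preprocess the image before text detection and recognition.",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv`, or sys.argv[1:] when `argv` is None."""
    return build_parser().parse_args(argv)


args = parse_args(["--img", "input_img/receipt.jpg", "--preprocess"])
print(args.img, args.preprocess)
```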

Demonstrations

Edge Detection

Panels (left to right): Input Image, Grayscale Conversion, Gaussian Blur, Closing, Canny
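Of the steps listed here, closing is probably the least familiar: it is a dilation followed by an erosion. On a grayscale scan this tends to erase thin dark strokes such as text while keeping large regions intact, which is one plausible reason to apply it before Canny. The dependency-free sketch below uses a 3x3 square neighbourhood on nested lists. The kernel size and the `closing` helper are assumptions, and the project itself most likely relies on an image library instead.

```python
from typing import Callable, List

Image = List[List[int]]


def _filter3x3(img: Image, reduce: Callable[[List[int]], int]) -> Image:
    """Apply `reduce` (max or min) over each pixel's 3x3 neighbourhood.

    Border pixels use only the neighbours that fall inside the image.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            out[y][x] = reduce(window)
    return out


def closing(img: Image) -> Image:
    """Grayscale closing: dilation (local max) followed by erosion (local min)."""
    return _filter3x3(_filter3x3(img, max), min)


# A bright page with one dark speck in the middle: closing removes the speck.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 0
print(closing(page)[2][2])  # 200
```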

Contour Detection

Panels (left to right): LSD, Horizontal Line Segments, Vertical Line Segments, Final Contour
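The panel titles suggest that detected line segments are split into horizontal and vertical groups before the final contour is assembled. One simple way to do that split is by segment angle, sketched below. The 20-degree tolerance, the `(x1, y1, x2, y2)` segment layout, and the `split_segments` name are assumptions rather than the project's actual settings.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def split_segments(
    segments: List[Segment], tolerance_deg: float = 20.0
) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into near-horizontal and near-vertical groups."""
    horizontal: List[Segment] = []
    vertical: List[Segment] = []
    for x1, y1, x2, y2 in segments:
        # Fold the orientation into [0, 180) so endpoint order is ignored.
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
        if angle <= tolerance_deg or angle >= 180.0 - tolerance_deg:
            horizontal.append((x1, y1, x2, y2))
        elif abs(angle - 90.0) <= tolerance_deg:
            vertical.append((x1, y1, x2, y2))
        # Segments outside both tolerance bands are discarded.
    return horizontal, vertical


segments = [(0, 0, 100, 5), (10, 0, 12, 80), (0, 0, 50, 50), (100, 10, 0, 8)]
horizontal, vertical = split_segments(segments)
print(len(horizontal), len(vertical))  # 2 1
```

The diagonal segment falls outside both bands and is dropped, which is the behaviour one would want for stray edges that cannot belong to a page border.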

Perspective Transformation & Binarization

Panels (left to right): Perspective Transformation, Binarization
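The README does not state which binarization method is used. As an illustration only, the sketch below swaps in Otsu's method, a common global-threshold choice that picks the gray level maximising the between-class variance. It is written in plain Python on nested lists of 8-bit values, and `otsu_threshold` and `binarize` are hypothetical helper names.

```python
from typing import List

Image = List[List[int]]


def otsu_threshold(img: Image) -> int:
    """Return the 8-bit threshold that maximises between-class variance."""
    histogram = [0] * 256
    for row in img:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level in range(256):
        # Pixels with value <= level form the background class.
        background_count += histogram[level]
        background_sum += level * histogram[level]
        foreground_count = total - background_count
        if background_count == 0 or foreground_count == 0:
            continue
        mean_bg = background_sum / background_count
        mean_fg = (weighted_total - background_sum) / foreground_count
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = background_count * foreground_count * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(img: Image) -> Image:
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    threshold = otsu_threshold(img)
    return [[255 if value > threshold else 0 for value in row] for row in img]


scan = [[30, 40, 200], [35, 210, 220]]
print(otsu_threshold(scan))  # 40
print(binarize(scan))        # [[0, 0, 255], [0, 255, 255]]
```

A global threshold like this works best after the perspective transformation has produced an evenly lit, rectified page; unevenly lit photos often need an adaptive threshold instead.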

Text Detection & Recognition

Panels (left to right): Text Detection, Text Recognition
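DBNet-style detectors output a per-pixel text probability map that is thresholded and grouped into regions at inference time. The sketch below shows only that grouping step: it thresholds a small map and returns the axis-aligned bounding box of each 4-connected region. The 0.3 threshold and the `boxes_from_probability_map` name are assumptions, and real implementations also expand each region and fit polygons, which is omitted here.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def boxes_from_probability_map(
    prob_map: List[List[float]], threshold: float = 0.3
) -> List[Box]:
    """Return one bounding box per connected region with probability >= threshold."""
    height, width = len(prob_map), len(prob_map[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or prob_map[y][x] < threshold:
                continue
            # Breadth-first search over the 4-connected text region.
            queue = deque([(x, y)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                neighbours = ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1))
                for nx, ny in neighbours:
                    if (
                        0 <= nx < width
                        and 0 <= ny < height
                        and not seen[ny][nx]
                        and prob_map[ny][nx] >= threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


prob_map = [
    [0.9, 0.8, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.9],
    [0.0, 0.0, 0.0, 0.6, 0.1],
]
print(boxes_from_probability_map(prob_map))  # [(0, 0, 1, 0), (3, 1, 4, 2)]
```

Each box would then be cropped from the image and passed to the recognition model.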
