Document OCR

About the Project

This project is part of the SJTU ICE4309 - Image Processing & Content Analysis course. We implemented an OCR framework that converts photographs of in-the-wild documents into machine-readable text.

Features

The model architecture of Document OCR is shown below:

Model Architecture
  • Input images are preprocessed with edge detection, contour detection, perspective transformation, and binarization to rectify and enhance the document.
  • The text detection module uses the DBNet model with MobileNetV3 as the backbone network.
  • The text recognition module uses the CRNN model with MobileNetV3 as the backbone network.

Getting Started

To get started with the project, follow the steps below to set up your environment and install the necessary dependencies.

Create and activate new conda environment

conda create -n ocr python=3.9
conda activate ocr

Install pip requirements

pip install -r requirements.txt

Usage

Run the script

python run.py --img <IMG_DIR> --preprocess 

Replace <IMG_DIR> with the path to a single image. Add --preprocess to preprocess the input image before OCR.

Example

python run.py --img input_img/receipt.jpg --preprocess
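For readers who want to see how the two flags fit together, here is a minimal sketch of an argument parser that accepts the same command line. It is an assumption about how `run.py` could be written, not a copy of it; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI shown above; the real run.py may differ.
    parser = argparse.ArgumentParser(description="Document OCR")
    # --img takes the path to one image file.
    parser.add_argument("--img", required=True, help="path to a single input image")
    # --preprocess is a boolean switch: present means True, absent means False.
    parser.add_argument(
        "--preprocess",
        action="store_true",
        help="run the preprocessing steps before OCR",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["--img", "input_img/receipt.jpg", "--preprocess"])` yields an object whose `img` is the given path and whose `preprocess` is `True`.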

Demonstrations

Edge Detection

Input Image | Grayscale Conversion | Gaussian Blur | Closing | Canny
image | image | image | image | image
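The first two stages in the table, grayscale conversion and Gaussian blur, can be illustrated in plain Python. This is only a sketch on nested lists: the luminance weights (0.299, 0.587, 0.114), the 3x3 kernel, and the replicate-border handling are assumptions chosen for brevity, and the project itself presumably relies on an image library for these steps. Closing and Canny are not covered here.

```python
# 3x3 binomial approximation of a Gaussian kernel; the weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def gaussian_blur3(gray):
    """Blur a grayscale image with the 3x3 kernel, replicating border pixels."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so out-of-range neighbours reuse the edge pixel.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16.0
    return out
```

Blurring before edge detection suppresses pixel-level noise that would otherwise show up as spurious edges.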

Contour Detection

LSD | Horizontal Line Segments | Vertical Line Segments | Final Contour
image | image | image | image
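The table shows the detected line segments being split into horizontal and vertical groups before the final contour is formed. One plausible way to make that split is by segment angle, sketched below. The function name `classify_segments` and the 15-degree tolerance are assumptions for illustration, not values taken from the project.

```python
import math


def classify_segments(segments, tol_deg=15.0):
    """Split (x1, y1, x2, y2) segments into near-horizontal and near-vertical lists.

    Segments whose angle is not within tol_deg of either axis are discarded.
    """
    horizontal, vertical = [], []
    for x1, y1, x2, y2 in segments:
        # atan2 gives an angle in (-180, 180]; abs() maps it to [0, 180].
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1)))
        # Fold to [0, 90] so that segment direction does not matter.
        if angle > 90.0:
            angle = 180.0 - angle
        if angle <= tol_deg:
            horizontal.append((x1, y1, x2, y2))
        elif angle >= 90.0 - tol_deg:
            vertical.append((x1, y1, x2, y2))
    return horizontal, vertical
```

A document outline could then be estimated from the outermost segments of each group, although how the project selects the final contour is not stated here.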

Perspective Transformation & Binarization

Perspective Transformation | Binarization
image | image
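A perspective transformation maps the four corners of the detected document contour onto an upright rectangle. The mapping is a 3x3 homography, which four point pairs determine uniquely when no three points are collinear. The sketch below solves for it in plain Python to show the underlying arithmetic; the function names are hypothetical, and in practice an image library would compute the matrix and warp the pixels.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 matrix mapping four src points to four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1), likewise.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, x, y):
    """Map one point through the homography, dividing by the projective term."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, passing the four contour corners as `src` and the corners of a 300x400 rectangle as `dst` gives a matrix that sends each detected corner to its rectangle corner.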

Text Detection & Recognition

Text Detection | Text Recognition
image | image