Graphical Object Detection in document images

This repository contains an end-to-end trainable deep learning framework, called Graphical Object Detection (GOD), for localizing graphical objects in document images.

This repository is built on jwyang/faster-rcnn.pytorch. This implementation has the following features:

It is pure PyTorch code, apart from some CUDA code.
It supports multi-image batch training.
It supports multi-GPU training.

The results of GOD on different datasets are listed in the paper.

Getting Started

Clone the repo:

    git clone https://github.com/rnjtsh/graphical-object-detector.git

Then, create a folder:

    cd graphical-object-detector && mkdir data

Prerequisites

Python 2.7 or 3.6
PyTorch 0.4.0
CUDA 8.0 or higher

Compilation

Compile the code by following the instructions in jwyang/faster-rcnn.pytorch.

Dataset

This repository expects the dataset in the same format as PASCAL VOC. Datasets in other formats can also be adapted, as is done in jwyang/faster-rcnn.pytorch. Prepare the dataset according to the following tree structure.

    GODdevkit2019
      ├── GOD2019
          ├── JPEGImages
          │   ├──  GOD001.jpg
          │   ├──  GOD002.jpg
          │   ├──  ...
          ├── ImageSets
          │   ├──  Main
          │   │    ├──  train.txt
          │   │    ├──  val.txt
          │   │    ├──  test.txt
          │   │    ├──  ...
          └── Annotations
              ├──  GOD001.xml
              ├──  GOD002.xml
              ├──  ...
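
Before training, it can help to confirm that the dataset folder matches the tree above. The following is a minimal sketch, not part of this repository: the function name `check_voc_layout` is hypothetical, and it assumes the usual PASCAL VOC convention that each split file in ImageSets/Main lists one image ID per line without a file extension.

```python
import os

def check_voc_layout(devkit_root, year_dir="GOD2019", split="train"):
    """Return a list of problems found in a PASCAL VOC-style dataset tree.

    Hypothetical helper for illustration; an empty list means the layout
    looks consistent for the given split.
    """
    base = os.path.join(devkit_root, year_dir)
    problems = []

    # The three directories shown in the tree must exist.
    for sub in ("JPEGImages", "Annotations", os.path.join("ImageSets", "Main")):
        if not os.path.isdir(os.path.join(base, sub)):
            problems.append("missing directory: " + sub)

    split_file = os.path.join(base, "ImageSets", "Main", split + ".txt")
    if not os.path.isfile(split_file):
        problems.append("missing split file: " + split + ".txt")
        return problems

    # Assumption: one image ID per line, no extension (VOC convention).
    with open(split_file) as f:
        ids = [line.strip() for line in f if line.strip()]

    # Every listed ID needs both an image and an annotation file.
    for image_id in ids:
        if not os.path.isfile(os.path.join(base, "JPEGImages", image_id + ".jpg")):
            problems.append("missing image: " + image_id + ".jpg")
        if not os.path.isfile(os.path.join(base, "Annotations", image_id + ".xml")):
            problems.append("missing annotation: " + image_id + ".xml")
    return problems
```

For example, `check_voc_layout("data/GODdevkit2019")` would report any listed training ID that lacks a .jpg or .xml file; the `data/` location here is also an assumption about where the devkit folder is placed.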

Pretrained Models

We used ImageNet pretrained weights (VGG16 and ResNets) from Caffe in our experiments. You can download these models from:

VGG16
ResNet50, ResNet101, ResNet152

Download them and put them into data/pretrained_model/.

If you want to use PyTorch pretrained models instead, remember to convert images from BGR to RGB and to apply the same data transform (mean subtraction and normalization) that was used to train the pretrained model.
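
The channel swap and normalization can be sketched with NumPy as follows. This is an illustrative example, not code from this repository: the function name `bgr_to_torchvision_input` is hypothetical, and the mean and standard deviation are the per-channel values documented for torchvision's ImageNet models, assumed here to be the PyTorch pretrained models in question.

```python
import numpy as np

# Per-channel statistics documented for torchvision's ImageNet models,
# given in RGB order for inputs scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def bgr_to_torchvision_input(image_bgr):
    """Convert an HxWx3 uint8 BGR image to a 3xHxW normalized float32 RGB array."""
    image_rgb = image_bgr[:, :, ::-1]                     # reverse channel axis: BGR -> RGB
    scaled = image_rgb.astype(np.float32) / 255.0         # scale pixel values to [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # subtract mean, divide by std
    # Reorder from height-width-channel to channel-height-width.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

Images loaded with OpenCV arrive in BGR order, which matches the Caffe weights; the swap is only needed when switching to weights trained on RGB input.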

Citation

If you find this work useful, please cite the following paper: Ranajit Saha, Ajoy Mondal and C V Jawahar, "Graphical Object Detection in Document Images", ICDAR 2019.
