Graphical Object Detection in document images

This repository contains Graphical Object Detection (GOD), an end-to-end trainable deep learning framework for localizing graphical objects in document images.

This repository is built on jwyang/faster-rcnn.pytorch. This implementation has the following features:

  • It is pure PyTorch code, apart from some CUDA code.

  • It supports multi-image batch training.

  • It supports multi-GPU training.

The results of GOD on different datasets are listed in the paper.

Getting Started

Clone the repo:

    git clone https://github.com/rnjtsh/graphical-object-detector.git

Then create a data folder inside the cloned repository:

    cd graphical-object-detector && mkdir data

Prerequisites

  • Python 2.7 or 3.6
  • PyTorch 0.4.0
  • CUDA 8.0 or higher

Compilation

Follow the compilation instructions in jwyang/faster-rcnn.pytorch.

Dataset

This repository expects the dataset in the same format as PASCAL VOC. Datasets in other formats can also be adapted, as is done in jwyang/faster-rcnn.pytorch. Prepare the dataset with the following directory structure.

    GODdevkit2019
      ├── GOD2019
          ├── JPEGImages
          │   ├──  GOD001.jpg
          │   ├──  GOD002.jpg
          │   ├──  ...
          ├── ImageSets
          │   ├──  Main
          │   │    ├──  train.txt
          │   │    ├──  val.txt
          │   │    ├──  test.txt
          │   │    ├──  ...
          └── Annotations
              ├──  GOD001.xml
              ├──  GOD002.xml
              ├──  ...
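Before training, it can help to check that a prepared dataset matches the tree above. The following is a minimal sketch, not part of this repository: the helper names (`read_split_ids`, `parse_annotation`, `find_missing_files`) are hypothetical, and it assumes the annotation files follow the usual PASCAL VOC XML schema (`object` elements with a `name` and a `bndbox`). It is written for Python 3.

```python
import os
import xml.etree.ElementTree as ET


def read_split_ids(devkit_root, split, year="2019"):
    """Return the image IDs listed in GOD<year>/ImageSets/Main/<split>.txt."""
    path = os.path.join(devkit_root, "GOD" + year, "ImageSets", "Main", split + ".txt")
    with open(path) as f:
        # One ID per line; keep only the first token in case a label column follows.
        return [line.split()[0] for line in f if line.strip()]


def parse_annotation(xml_path):
    """Return (class_name, xmin, ymin, xmax, ymax) tuples from one VOC-style XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((name,) + tuple(coords))
    return boxes


def find_missing_files(devkit_root, split, year="2019"):
    """List the image and annotation paths that a split references but that do not exist."""
    base = os.path.join(devkit_root, "GOD" + year)
    missing = []
    for image_id in read_split_ids(devkit_root, split, year):
        for subdir, ext in (("JPEGImages", ".jpg"), ("Annotations", ".xml")):
            path = os.path.join(base, subdir, image_id + ext)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

For example, `find_missing_files("GODdevkit2019", "train")` returns an empty list when every ID in train.txt has both a .jpg and an .xml file.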

Pretrained Models

We used ImageNet-pretrained weights (VGG16 and ResNets) from Caffe in our experiments.

Download the models and put them into data/pretrained_model/.

If you want to use PyTorch pretrained models instead, remember to convert images from BGR to RGB channel order, and apply the same data transform (mean subtraction and normalization) that was used to train the pretrained model.
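The BGR-to-RGB conversion and normalization described above could look roughly like the sketch below. It is an illustration rather than this repository's code: the function name `preprocess_for_torchvision` is hypothetical, and it assumes the pretrained model was trained with the standard torchvision ImageNet statistics on pixel values scaled to [0, 1]. Check the statistics against the model you actually load.

```python
import numpy as np

# Standard torchvision ImageNet statistics, in RGB order, for pixels scaled to [0, 1].
# Assumption: the PyTorch pretrained model was trained with these values.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_for_torchvision(image_bgr):
    """Convert an HxWx3 uint8 BGR image into a 3xHxW float32 normalized RGB array."""
    image_rgb = image_bgr[:, :, ::-1]                  # reverse the channel axis: BGR -> RGB
    scaled = image_rgb.astype(np.float32) / 255.0      # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-first layout as expected by PyTorch; copy to make the array contiguous.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

The result can be wrapped with `torch.from_numpy` and given a batch dimension before it is passed to a network.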

Citation

If you find this work useful, please cite the following paper: Ranajit Saha, Ajoy Mondal, and C V Jawahar, "Graphical Object Detection in Document Images," ICDAR 2019.
