Mathematical formula detection and LaTeX format extraction

Trojan0101/formula_ocr

Formula OCR

Stakeholder: Raja Chelladurai, Axis API [Freelance_External]

  • Detect and extract text in multiple languages [English, Traditional Chinese, Simplified Chinese, Korean, Japanese].
  • Detect and extract mathematical and chemical formulas.
  • Detect whether a diagram is present in the image.
  • Detect whether text is handwritten or printed.
  • Convert the extracted data to LaTeX format.

Steps to follow:

  1. Clone the repo into the formula_ocr_main directory and change into it:

     git clone https://github.com/Trojan0101/formula_ocr.git formula_ocr_main
     cd formula_ocr_main
  2. Install dependencies:

    pip install virtualenv
    virtualenv formula_ocr_env
    source formula_ocr_env/bin/activate
    pip install -r ./requirements.txt
    pip install uwsgi
  3. Replace torch's torch/nn/modules/pooling.py inside the virtualenv with the modified pooling.py:

     mv modified_site_packages/torch/pooling.py formula_ocr_env/lib/<python_version>/site-packages/torch/nn/modules/pooling.py
  4. Run:

    nohup uwsgi --http :8080 --module app:app > formula_ocr_main.log 2>&1 &
    disown
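
Steps 3 and 4 can be sketched as small shell helpers. This is an illustrative sketch, not part of the repository: the function names (site_packages_dir, install_patched_pooling, wait_for_http) are hypothetical, the virtualenv is assumed to be activated, and curl is assumed to be installed. The sketch copies the patched file instead of moving it, so the copy in the repo stays in place, and it keeps a backup of the stock pooling.py. Because the app's routes are not documented here, the last helper only checks that something answers HTTP on the port.

```shell
# Print the site-packages directory of the interpreter on PATH
# (the virtualenv's interpreter once the environment is activated).
site_packages_dir() {
    python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'
}

# Copy the patched file over torch's pooling.py, keeping a one-time backup.
# $1: path to the patched pooling.py, $2: site-packages directory
install_patched_pooling() {
    src="$1"
    target="$2/torch/nn/modules/pooling.py"
    [ -f "$src" ] || { echo "missing patched file: $src" >&2; return 1; }
    [ -f "$target" ] || { echo "torch not found under: $2" >&2; return 1; }
    # Back up the stock file only once, so re-running keeps the original.
    [ -f "$target.orig" ] || cp "$target" "$target.orig"
    cp "$src" "$target"
}

# Poll until something answers HTTP on the given port, or give up.
# $1: port (default 8080), $2: maximum attempts (default 10)
wait_for_http() {
    port="${1:-8080}"
    attempts="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Without --fail, curl exits 0 for any HTTP response, including 404.
        if curl --silent --output /dev/null --max-time 2 "http://localhost:$port/"; then
            echo "server is answering on port $port"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "no response on port $port; check formula_ocr_main.log" >&2
    return 1
}
```

From the repository root, step 3 would then be `install_patched_pooling modified_site_packages/torch/pooling.py "$(site_packages_dir)"`, and after step 4 `wait_for_http 8080` reports whether uwsgi came up.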