Architecture of transformer-text-recognition model
This project applies a transformer to recognize text in images. The input to the model is an image, and the output is the word read from that image. A convolutional network extracts features from the input image, and the extracted features are then treated as the input sentence for a transformer that is trained to translate the image into text.
# python3.7
pip install --upgrade pip
pip install -r requirements.txt
python run_demo_server.py --port PORT --model_folder FOLDER_PATH
PORT: port to run the server on (by default the server runs on http://localhost:9595)
model_folder: folder that stores the trained model
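The two flags above can be handled with the standard `argparse` module. The following is a minimal sketch, not the actual contents of `run_demo_server.py`: the `build_parser` helper and the decision to make `--model_folder` required are assumptions made for illustration, and only the flag names and the default port 9595 come from the description above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags of run_demo_server.py."""
    parser = argparse.ArgumentParser(description="Run the text recognition demo server")
    # Default port taken from this README (http://localhost:9595).
    parser.add_argument("--port", type=int, default=9595,
                        help="port to run the server on")
    # Assumption: no default folder is documented, so the flag is required here.
    parser.add_argument("--model_folder", required=True,
                        help="folder that stores the trained model")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--model_folder", "./checkpoints"])
print(f"http://localhost:{args.port}", args.model_folder)
```

Passing `--port 8000` in the list would override the default in the same way as on the command line.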
python training.py --model_type MODEL_TYPE
model_type:
1: transformer-random-trg
2: transformer-no-trg
3: transformer-no-decoder
4: transformer-trg-same-src
5: transformer
- The trained model will be saved to ./checkpoints/{model_type}.pt
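As a rough illustration of where to look for a checkpoint after training, the sketch below maps a numeric MODEL_TYPE to the expected file. The `checkpoint_path` helper is hypothetical and not part of the project, and it assumes that `{model_type}` in the file name is the model name rather than the number, which the text above does not state.

```python
from pathlib import Path

# Numeric MODEL_TYPE values and their names, as listed above.
MODEL_TYPES = {
    1: "transformer-random-trg",
    2: "transformer-no-trg",
    3: "transformer-no-decoder",
    4: "transformer-trg-same-src",
    5: "transformer",
}


def checkpoint_path(model_type, root="./checkpoints"):
    """Return the expected checkpoint file for a numeric model type.

    Assumption: the file is named after the model name, not the number.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    return Path(root) / f"{MODEL_TYPES[model_type]}.pt"


path = checkpoint_path(5)
print(path.name, path.exists())
```

If the project actually names files by number, only the f-string in `checkpoint_path` would need to change.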
python evaluate.py --model_type MODEL_TYPE
model_type:
1: transformer-random-trg
2: transformer-no-trg
3: transformer-no-decoder
4: transformer-trg-same-src
5: transformer
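To make the "image features as an input sentence" idea from the introduction more concrete, the sketch below turns a convolutional feature map into a sequence of token vectors. This README does not say how the features are arranged, so reading the map column by column (left to right) is an assumption, and `feature_map_to_sequence` is a hypothetical helper written in plain Python rather than code from this project.

```python
def feature_map_to_sequence(feature_map):
    """Flatten a [channels][height][width] feature map into `width` token vectors.

    Each token concatenates all channel and row values of one image column, so
    the sequence reads the image left to right like words in a sentence.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]


# Toy feature map: 2 channels, 1 row, 3 columns.
toy = [
    [[1, 2, 3]],
    [[10, 20, 30]],
]
tokens = feature_map_to_sequence(toy)
print(tokens)  # [[1, 10], [2, 20], [3, 30]]
```

Under this assumption, the sequence length equals the width of the feature map, and each token has `channels * height` values that the transformer would consume as one source position.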