This tool evaluates the morphological quality of machine translation output. Translate the automatically generated English test suite into one of the supported target languages (French, Czech, or German); the translation is then analyzed to provide three types of information:
- Adequacy: has the morphological information been well conveyed from the source?
- Fluency: do we have local agreement?
- Consistency: how confident is the system in its predictions?
Requirements:
- Python 3
- Download the test suite and sentence tags
- (French) Download the dictionary (taken from the Lefff)
- (Czech) Download and install MorphoDiTa version 1.3, as well as the dictionary
- (German) Download and install Smor version old2. With more recent versions, run ./smor-infl instead of ./smor below.
Translate the source file morpheval.limsi.v2.en.sents and run the Moses tokenizer on the translation (with arguments -no-escape and -l {fr|cs|de}). Then run the commands for your target language:
French:
python3 evaluate_fr.py -i output.tokenized -n morpheval.limsi.v2.en.info -d lefff.pkl
Czech:
cat output.tokenized | sed 's/$/\n/' | tr ' ' '\n' | morphodita/src/run_morpho_analyze dictionary --input=vertical --output=vertical > output.analysis
python3 evaluate_cs.py -i output.analysis -n morpheval.limsi.v2.en.info
German:
cat output.tokenized | tr ' ' '\n' | sort -u | ./smor > output.smored
python3 evaluate_de.py -i output.tokenized -n morpheval.limsi.v2.en.info -d output.smored
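The sed/tr stages of the Czech and German pipelines only reshape the tokenized text, so they can be mirrored in Python on systems where sed does not expand \n in a replacement (GNU sed does, some other implementations do not). The following is a minimal sketch, not part of the distributed scripts: the function names to_vertical and unique_tokens are hypothetical, and the sketch assumes tokens are separated by single spaces, as in the tokenizer output. Unlike sort -u, unique_tokens sorts by code point rather than by locale and drops empty tokens.

```python
from typing import Iterable, List


def to_vertical(sentences: Iterable[str]) -> str:
    """Mirror `sed 's/$/\\n/' | tr ' ' '\\n'`:
    one token per line, with an empty line after each sentence."""
    chunks = []
    for sentence in sentences:
        tokens = sentence.rstrip("\n").split(" ")
        chunks.append("\n".join(tokens) + "\n\n")
    return "".join(chunks)


def unique_tokens(sentences: Iterable[str]) -> List[str]:
    """Approximate `tr ' ' '\\n' | sort -u`:
    distinct tokens in code-point order, empty tokens dropped."""
    vocab = set()
    for sentence in sentences:
        vocab.update(tok for tok in sentence.rstrip("\n").split(" ") if tok)
    return sorted(vocab)


if __name__ == "__main__":
    # Illustrative sentences only; real input would be the lines of output.tokenized.
    demo = ["das Haus ist klein", "das Auto ist rot"]
    print(to_vertical(demo), end="")
    print(unique_tokens(demo))
```

The string returned by to_vertical is the form that the Czech command feeds to run_morpho_analyze with --input=vertical, and the list returned by unique_tokens, written one token per line, corresponds to what the German command feeds to ./smor.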
Franck Burlot and François Yvon, Evaluating the morphological competence of machine translation systems. In Proceedings of the Second Conference on Machine Translation (WMT’17). Association for Computational Linguistics, Copenhagen, Denmark, 2017.