This repository contains the code and data for our paper on creative idiom translation. The paper has been accepted by both EMNLP 2024 Findings and the Eleventh Workshop on Asian Translation (WAT 2024).
You can find the preprint version of the paper in this repository, on arXiv, or in ACL Anthology.
You can find the source code in the src folder. The workflow to reproduce our results is summarized in workflow.pdf.
All the data that we generated can be found in data/data.tgz. To reproduce our results, you will also need to download external data from the sources mentioned in our paper.
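A minimal sketch for unpacking the archive, assuming Python 3.9+ and that the script is run from the repository root. The function name extract_archive and the destination folder data/extracted are illustrative assumptions and are not taken from the source code; the layout of the archive contents is not described here, so the sketch only lists whatever it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a gzip-compressed tar archive and return its member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse any member that would be written outside dest.
        for member in members:
            target = (dest / member.name).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)
    return [member.name for member in members]


if __name__ == "__main__":
    # Assumed paths: data/data.tgz is from this README,
    # data/extracted is a hypothetical output folder.
    archive = Path("data/data.tgz")
    if archive.exists():
        for name in extract_archive(str(archive), "data/extracted"):
            print(name)
    else:
        print("data/data.tgz not found; run this from the repository root.")
```

The external data mentioned in the paper is not part of this archive and still has to be downloaded separately.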
[12/09/24] We used Llama-3.3-70B-Instruct to annotate the spans for 20,000 translations of 500 Chinese idioms. Qualitatively, the annotation quality is comparable to that of gpt-4-0125-preview and much better than that of gpt-3.5-turbo.
Translation examples can be found in our paper or our interactive demo.
If you find our work useful, please consider citing our paper:
@inproceedings{tang-etal-2024-creative,
title = "Creative and Context-Aware Translation of {E}ast {A}sian Idioms with {GPT}-4",
author = "Tang, Kenan and
Song, Peiyang and
Qin, Yao and
Yan, Xifeng",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.544",
doi = "10.18653/v1/2024.findings-emnlp.544",
pages = "9285--9305"
}