Visual Pivoting Unsupervised Multimodal Machine Translation in Low-Resource Distant Language Pairs
Published:
Please cite:
@inproceedings{turghun_emnlp24_lrlmt_vison_pivot,
title={Visual Pivoting Unsupervised Multimodal Machine Translation in Low-Resource Distant Language Pairs},
author={Turghun Tayir and Lin Li and Xiaohui Tao and Mieradilijiang Maimaiti and Ming Li and Jianquan Liu},
booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2024},
}
Abstract
Unsupervised multimodal machine translation (UMMT) aims to leverage visual information as a pivot between two languages to achieve better performance on low-resource language (LRL) pairs. However, a key challenge remains: how to handle alignment between low-resource distant language pairs (DLPs) in UMMT. To this end, this paper proposes a visual pivoting UMMT method for low-resource DLPs. Specifically, we first construct a dataset containing two DLPs, English-Uyghur and Chinese-Uyghur. We then apply the visual pivoting method to both a pre-trained language model and a UMMT model, and we observe that incorporating images into both the encoder and decoder of UMMT has a noticeable effect on DLPs. Finally, we introduce informative multi-granularity image features to facilitate further alignment of the latent space between the two languages. Experimental results show that the proposed method significantly outperforms several baselines for UMMT on both close language pairs (CLPs) and DLPs. Our dataset, Multi30k-Distant, will be available online for free access.
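
To make the visual-pivoting idea concrete, the sketch below illustrates one plausible way to align two languages through a shared image: a global image vector and several region vectors are fused, and sentence embeddings from both languages are pulled toward the same image pivot with a contrastive loss. This is a minimal illustration under assumed dimensions and module names (VisualPivotAligner, text_proj, img_proj are all hypothetical), not the authors' actual architecture or multi-granularity fusion scheme.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualPivotAligner(nn.Module):
    """Hypothetical sketch: project sentence embeddings of two languages and
    multi-granularity image features (one global vector plus region vectors)
    into a shared latent space, then pull each language toward the image pivot
    with a contrastive loss. Not the paper's implementation."""

    def __init__(self, text_dim=512, img_dim=2048, latent_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.img_proj = nn.Linear(img_dim, latent_dim)

    def fuse_image(self, global_feat, region_feats):
        # Average the region features and add the global feature:
        # a simple stand-in for richer multi-granularity fusion.
        fused = global_feat + region_feats.mean(dim=1)
        return F.normalize(self.img_proj(fused), dim=-1)

    def forward(self, src_emb, tgt_emb, global_feat, region_feats, temperature=0.07):
        img = self.fuse_image(global_feat, region_feats)    # (B, latent_dim)
        src = F.normalize(self.text_proj(src_emb), dim=-1)  # (B, latent_dim)
        tgt = F.normalize(self.text_proj(tgt_emb), dim=-1)  # (B, latent_dim)
        # The image acts as the pivot: both languages are aligned to it,
        # and hence indirectly to each other.
        logits_src = src @ img.t() / temperature
        logits_tgt = tgt @ img.t() / temperature
        labels = torch.arange(src.size(0), device=src.device)
        return F.cross_entropy(logits_src, labels) + F.cross_entropy(logits_tgt, labels)

# Toy usage with random tensors standing in for sentence encoders and CNN features.
aligner = VisualPivotAligner()
loss = aligner(torch.randn(4, 512), torch.randn(4, 512),
               torch.randn(4, 2048), torch.randn(4, 36, 2048))
loss.backward()

In this toy setup the alignment signal flows only through the image pivot, which mirrors the abstract's premise that images can bridge distant language pairs when no parallel text is available; the actual method also injects visual features into the UMMT encoder and decoder.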