Low-Resource Machine Translation with Different Granularity Image Features
Published:
Please cite:
@inproceedings{turghun24_lrlmt_cv,
  title={Low-Resource Machine Translation with Different Granularity Image Features},
  author={Turghun Tayir and Lin Li and Mieradilijiang Maimaiti and Yusnur Muhtar},
  booktitle={Chinese Conference on Pattern Recognition and Computer Vision (PRCV)},
  year={2024},
}
Abstract
Visual content improves alignment in the language latent spaces, since physical visual perception is similar for people who speak different languages. Building on this, researchers have recently proposed unsupervised multimodal machine translation (UMMT) methods for low-resource settings that leverage images as pseudo-pivots to facilitate latent space alignment. However, these methods consider only region or grid image features for high-resource close language pairs (CLP), e.g., English-German (En-De) and English-French (En-Fr), and ignore the effect of applying more informative features to UMMT for low-resource distant language pairs (DLP), e.g., Chinese-Uyghur (Zh-Uy) and English-Uyghur (En-Uy). In this paper, we exploit a pre-trained language model and a UMMT model with image features of different granularities and study the influence of these features on DLP and CLP translation. Experimental results on the CLP dataset Multi30K and the DLP dataset Multi30K-Zh-Uy show that the proposed approach significantly improves over state-of-the-art methods. The code is available at https://github.com/Turghuns/UMMT-DGIF.
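As a rough illustration of what "different granularity image features" means in practice, the sketch below combines coarse grid features (a CNN feature map) and fine-grained region features (pre-extracted object-level vectors) with source-text embeddings in a shared transformer encoder. This is not the authors' released implementation; the backbone choice (ResNet-50), the assumed region-feature shape (36 x 2048), and all dimensions are illustrative assumptions only.

```python
# Minimal sketch: fusing grid- and region-level image features with text.
# All model choices and shapes are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision


class MultiGranularityEncoder(nn.Module):
    def __init__(self, vocab_size=30000, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        # Grid features: 7x7x2048 feature map from a ResNet-50 backbone.
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        self.grid_extractor = nn.Sequential(*list(backbone.children())[:-2])
        self.grid_proj = nn.Linear(2048, d_model)
        # Region features: assumed pre-extracted ROI vectors
        # (e.g., 36 x 2048 from an object detector such as Faster R-CNN).
        self.region_proj = nn.Linear(2048, d_model)
        # Text embeddings and a shared encoder over the joint sequence.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids, image, region_feats):
        # token_ids:    (B, L) source-sentence token ids
        # image:        (B, 3, 224, 224) raw image for grid features
        # region_feats: (B, 36, 2048) pre-extracted region features
        grid = self.grid_extractor(image)              # (B, 2048, 7, 7)
        grid = grid.flatten(2).transpose(1, 2)         # (B, 49, 2048)
        visual = torch.cat(
            [self.grid_proj(grid), self.region_proj(region_feats)], dim=1
        )                                              # (B, 49 + 36, d_model)
        text = self.tok_emb(token_ids)                 # (B, L, d_model)
        # Joint text + visual sequence lets attention align language tokens
        # with both coarse grid cells and object-level regions.
        return self.encoder(torch.cat([text, visual], dim=1))
```

In such a setup, dropping either the grid or the region branch reduces the encoder to a single-granularity multimodal model, which is the kind of ablation the paper's comparison between feature granularities implies.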