Letz Translate: Low-Resource Machine Translation for Luxembourgish

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Letz Translate: Low-Resource Machine Translation for Luxembourgish. / Song, Yewei; Ezzini, Saad; Klein, Jacques et al.
2023 5th International Conference on Natural Language Processing. IEEE, 2023.


Harvard

Song, Y, Ezzini, S, Klein, J, Bissyande, T, Lefebvre, C & Goujon, A 2023, Letz Translate: Low-Resource Machine Translation for Luxembourgish. in 2023 5th International Conference on Natural Language Processing. IEEE. https://doi.org/10.1109/ICNLP58431.2023.00036

APA

Song, Y., Ezzini, S., Klein, J., Bissyande, T., Lefebvre, C., & Goujon, A. (2023). Letz Translate: Low-Resource Machine Translation for Luxembourgish. In 2023 5th International Conference on Natural Language Processing. IEEE. https://doi.org/10.1109/ICNLP58431.2023.00036

Vancouver

Song Y, Ezzini S, Klein J, Bissyande T, Lefebvre C, Goujon A. Letz Translate: Low-Resource Machine Translation for Luxembourgish. In: 2023 5th International Conference on Natural Language Processing. IEEE; 2023. Epub 2023 Mar 24. doi: 10.1109/ICNLP58431.2023.00036

Author

Song, Yewei ; Ezzini, Saad ; Klein, Jacques et al. / Letz Translate : Low-Resource Machine Translation for Luxembourgish. 2023 5th International Conference on Natural Language Processing. IEEE, 2023.

Bibtex

@inproceedings{656c9f1fbf4a4cef8ea9711800aed2c8,
title = "Letz Translate: Low-Resource Machine Translation for Luxembourgish",
abstract = "Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research on multilingual models has shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related or share the same linguistic root as the target LRL. For our evaluation, we consider Luxembourgish as the LRL that shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster and perform only 4% lower compared to the large state-of-the-art NLLB model.",
author = "Yewei Song and Saad Ezzini and Jacques Klein and Tegawende Bissyande and Cl{\'e}ment Lefebvre and Anne Goujon",
year = "2023",
month = sep,
day = "6",
doi = "10.1109/ICNLP58431.2023.00036",
language = "English",
isbn = "9798350302226",
booktitle = "2023 5th International Conference on Natural Language Processing",
publisher = "IEEE",

}
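
Knowledge distillation, as summarised in the abstract above, trains a small student translation model to imitate the output distribution of a large multilingual teacher such as NLLB. The sketch below is a minimal, illustrative word-level distillation loss in PyTorch; the function name, temperature, loss weighting, and padding handling are assumptions made for illustration and are not details taken from the paper.

# Illustrative sketch only: a word-level knowledge-distillation loss for MT,
# where a large teacher (e.g. NLLB) supervises a smaller student model.
# All names and hyperparameters below are assumptions, not from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, pad_id=0):
    """Blend cross-entropy on reference tokens with a KL term that pulls
    the student's per-token distribution toward the teacher's."""
    # Soft targets from the teacher, smoothed by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)

    # Per-token KL divergence between student and teacher distributions.
    kd = F.kl_div(log_student, soft_teacher, reduction="none").sum(-1)
    mask = labels.ne(pad_id).float()
    kd = (kd * mask).sum() / mask.sum() * (temperature ** 2)

    # Standard cross-entropy against the reference translation.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1), ignore_index=pad_id)

    return alpha * kd + (1.0 - alpha) * ce

In a setup like this, the student would be trained on parallel or pseudo-translated Luxembourgish data while the teacher's logits for the same target sequences provide the soft supervision.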

RIS

TY - GEN

T1 - Letz Translate

T2 - Low-Resource Machine Translation for Luxembourgish

AU - Song, Yewei

AU - Ezzini, Saad

AU - Klein, Jacques

AU - Bissyande, Tegawende

AU - Lefebvre, Clément

AU - Goujon, Anne

PY - 2023/9/6

Y1 - 2023/9/6

N2 - Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research on multilingual models has shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related or share the same linguistic root as the target LRL. For our evaluation, we consider Luxembourgish as the LRL that shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster and perform only 4% lower compared to the large state-of-the-art NLLB model.

AB - Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research on multilingual models has shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related or share the same linguistic root as the target LRL. For our evaluation, we consider Luxembourgish as the LRL that shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster and perform only 4% lower compared to the large state-of-the-art NLLB model.

U2 - 10.1109/ICNLP58431.2023.00036

DO - 10.1109/ICNLP58431.2023.00036

M3 - Conference contribution/Paper

SN - 9798350302226

BT - 2023 5th International Conference on Natural Language Processing

PB - IEEE

ER -