
What do Large Language Models Need for Machine Translation Evaluation?

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

What do Large Language Models Need for Machine Translation Evaluation? / Qian, Shenbin; Sindhujan, Archchana; Kabra, Minnie et al.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. ed. / Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen. Association for Computational Linguistics (ACL Anthology), 2024. p. 3660-3674.

Harvard

Qian, S, Sindhujan, A, Kabra, M, Kanojia, D, Orasan, C, Ranasinghe, T & Blain, F 2024, What do Large Language Models Need for Machine Translation Evaluation? in Y Al-Onaizan, M Bansal & Y-N Chen (eds), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL Anthology), pp. 3660-3674, The 2024 Conference on Empirical Methods in Natural Language Processing, Miami, United States, 12/11/24. <https://aclanthology.org/2024.emnlp-main.214/>

APA

Qian, S., Sindhujan, A., Kabra, M., Kanojia, D., Orasan, C., Ranasinghe, T., & Blain, F. (2024). What do Large Language Models Need for Machine Translation Evaluation? In Y. Al-Onaizan, M. Bansal, & Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 3660-3674). Association for Computational Linguistics (ACL Anthology). https://aclanthology.org/2024.emnlp-main.214/

Vancouver

Qian S, Sindhujan A, Kabra M, Kanojia D, Orasan C, Ranasinghe T et al. What do Large Language Models Need for Machine Translation Evaluation? In Al-Onaizan Y, Bansal M, Chen YN, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL Anthology). 2024. p. 3660-3674

Author

Qian, Shenbin; Sindhujan, Archchana; Kabra, Minnie et al. / What do Large Language Models Need for Machine Translation Evaluation? Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. editor / Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen. Association for Computational Linguistics (ACL Anthology), 2024. pp. 3660-3674

Bibtex

@inproceedings{afd055ef763c40359e5645ba2eaf5fc6,
title = "What do Large Language Models Need for Machine Translation Evaluation?",
abstract = "Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques such as zero-shot, Chain of Thought (CoT) and few-shot prompting for eight language pairs covering high-, medium- and low-resource languages, leveraging varying LLM variants. Our findings indicate the importance of reference translations for an LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT prompting, than smaller models. We also observe that LLMs do not always provide a numerical score when generating evaluations, which poses a question on their reliability for the task. Our work presents a comprehensive analysis for resource-constrained and training-less LLM-based evaluation of machine translation. We release the accrued prompt templates, code and data publicly for reproducibility.",
author = "Shenbin Qian and Archchana Sindhujan and Minnie Kabra and Diptesh Kanojia and Constantin Orasan and Tharindu Ranasinghe and Frederic Blain",
year = "2024",
month = nov,
day = "9",
language = "English",
pages = "3660--3674",
editor = "Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
publisher = "Association for Computational Linguistics (ACL Anthology)",
note = "The 2024 Conference on Empirical Methods in Natural Language Processing ; Conference date: 12-11-2024 Through 16-11-2024",
url = "https://2024.emnlp.org/",
}

RIS

TY - GEN

T1 - What do Large Language Models Need for Machine Translation Evaluation?

AU - Qian, Shenbin

AU - Sindhujan, Archchana

AU - Kabra, Minnie

AU - Kanojia, Diptesh

AU - Orasan, Constantin

AU - Ranasinghe, Tharindu

AU - Blain, Frederic

PY - 2024/11/9

Y1 - 2024/11/9

N2 - Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques such as zero-shot, Chain of Thought (CoT) and few-shot prompting for eight language pairs covering high-, medium- and low-resource languages, leveraging varying LLM variants. Our findings indicate the importance of reference translations for an LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT prompting, than smaller models. We also observe that LLMs do not always provide a numerical score when generating evaluations, which poses a question on their reliability for the task. Our work presents a comprehensive analysis for resource-constrained and training-less LLM-based evaluation of machine translation. We release the accrued prompt templates, code and data publicly for reproducibility.

AB - Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques such as zero-shot, Chain of Thought (CoT) and few-shot prompting for eight language pairs covering high-, medium- and low-resource languages, leveraging varying LLM variants. Our findings indicate the importance of reference translations for an LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT prompting, than smaller models. We also observe that LLMs do not always provide a numerical score when generating evaluations, which poses a question on their reliability for the task. Our work presents a comprehensive analysis for resource-constrained and training-less LLM-based evaluation of machine translation. We release the accrued prompt templates, code and data publicly for reproducibility.

M3 - Conference contribution/Paper

SP - 3660

EP - 3674

BT - Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

A2 - Al-Onaizan, Yaser

A2 - Bansal, Mohit

A2 - Chen, Yun-Nung

PB - Association for Computational Linguistics (ACL Anthology)

T2 - The 2024 Conference on Empirical Methods in Natural Language Processing

Y2 - 12 November 2024 through 16 November 2024

ER -
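
To make the setup described in the abstract more concrete, below is a minimal, illustrative Python sketch of a zero-shot, reference-based MT-evaluation prompt and a simple parser for the numeric score. The prompt wording, the 0-100 scale, and the function names are assumptions for illustration only; they are not the prompt templates released with the paper.

# Illustrative sketch only (not the paper's released templates): a zero-shot,
# reference-based MT-evaluation prompt, plus a parser for the numeric score,
# since the abstract notes that LLMs do not always return a number.
import re
from typing import Optional


def build_zero_shot_prompt(source: str, translation: str, reference: str) -> str:
    """Assemble a minimal zero-shot prompt asking an LLM to rate MT quality on 0-100.
    The wording and the 0-100 scale are assumptions made for this illustration."""
    return (
        "You are a translation quality evaluator.\n"
        f"Source text: {source}\n"
        f"Machine translation: {translation}\n"
        f"Reference translation: {reference}\n"
        "Rate the machine translation from 0 (worst) to 100 (best). "
        "Answer with a single number."
    )


def extract_score(model_output: str) -> Optional[float]:
    """Return the first number found in the model's reply, or None if no score was given."""
    match = re.search(r"\d+(?:\.\d+)?", model_output)
    return float(match.group()) if match else None


if __name__ == "__main__":
    prompt = build_zero_shot_prompt(
        source="Das Wetter ist heute schön.",
        translation="The weather is beautiful today.",
        reference="The weather is nice today.",
    )
    print(prompt)
    # A hypothetical model reply; real replies may omit a number entirely.
    print(extract_score("I would rate this translation 85 out of 100."))

The parser makes explicit the failure mode the abstract highlights: a reply that contains no numeric score at all returns None.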