A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers. / Zmandar, Nadhem; El-Haj, Mahmoud; Rayson, Paul.
Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings. ed. / Elisabeth Métais; Farid Meziane; Warren Manning; Stephan Reiff-Marganiec; Vijayan Sugumaran. Cham: Springer, 2023. p. 391-403 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13913 LNCS).

Harvard

Zmandar, N, El-Haj, M & Rayson, P 2023, A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers. in E Métais, F Meziane, W Manning, S Reiff-Marganiec & V Sugumaran (eds), Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13913 LNCS, Springer, Cham, pp. 391-403, 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Derby, United Kingdom, 21/06/23. https://doi.org/10.1007/978-3-031-35320-8_28

APA

Zmandar, N., El-Haj, M., & Rayson, P. (2023). A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers. In E. Métais, F. Meziane, W. Manning, S. Reiff-Marganiec, & V. Sugumaran (Eds.), Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings (pp. 391-403). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13913 LNCS). Springer. https://doi.org/10.1007/978-3-031-35320-8_28

Vancouver

Zmandar N, El-Haj M, Rayson P. A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers. In Métais E, Meziane F, Manning W, Reiff-Marganiec S, Sugumaran V, editors, Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings. Cham: Springer. 2023. p. 391-403. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-35320-8_28

Author

Zmandar, Nadhem ; El-Haj, Mahmoud ; Rayson, Paul. / A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers. Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings. editor / Elisabeth Métais ; Farid Meziane ; Warren Manning ; Stephan Reiff-Marganiec ; Vijayan Sugumaran. Cham : Springer, 2023. pp. 391-403 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

Bibtex

@inproceedings{5747c45ec7bb47dca8bc05e44cb629b8,
title = "A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers",
abstract = "There are more than 2,000 listed companies on the UK{\textquoteright}s London Stock Exchange, divided into 11 sectors who are required to communicate their financial results at least twice in a single financial year. UK annual reports are very lengthy documents with around 80 pages on average. In this study, we aim to benchmark a variety of summarisation methods on a set of different pre-trained transformers with different extraction techniques. In addition, we considered multiple evaluation metrics in order to investigate their differing behaviour and applicability on a dataset from the Financial Narrative Summarisation (FNS 2020) shared task, which is composed of annual reports published by firms listed on the London Stock Exchange and their corresponding summaries. We hypothesise that some evaluation metrics do not reflect true summarisation ability and propose a novel BRUGEscore metric, as the harmonic mean of ROUGE-2 and BERTscore. Finally, we perform a statistical significance test on our results to verify whether they are statistically robust, alongside an adversarial analysis task with three different corruption methods.",
keywords = "Benchmarking, Evaluation Metrics, Long Document summarization",
author = "Nadhem Zmandar and Mahmoud El-Haj and Paul Rayson",
year = "2023",
month = jun,
day = "21",
doi = "10.1007/978-3-031-35320-8_28",
language = "English",
isbn = "9783031353192",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
address = "Cham",
volume = "13913",
pages = "391--403",
editor = "Elisabeth M{\'e}tais and Farid Meziane and Warren Manning and Stephan Reiff-Marganiec and Vijayan Sugumaran",
booktitle = "Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings",
note = "28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023 ; Conference date: 21-06-2023 Through 23-06-2023",
}

RIS

TY - GEN

T1 - A Comparative Study of Evaluation Metrics for Long-Document Financial Narrative Summarization with Transformers

AU - Zmandar, Nadhem

AU - El-Haj, Mahmoud

AU - Rayson, Paul

PY - 2023/6/21

Y1 - 2023/6/21

N2 - There are more than 2,000 listed companies on the UK’s London Stock Exchange, divided into 11 sectors who are required to communicate their financial results at least twice in a single financial year. UK annual reports are very lengthy documents with around 80 pages on average. In this study, we aim to benchmark a variety of summarisation methods on a set of different pre-trained transformers with different extraction techniques. In addition, we considered multiple evaluation metrics in order to investigate their differing behaviour and applicability on a dataset from the Financial Narrative Summarisation (FNS 2020) shared task, which is composed of annual reports published by firms listed on the London Stock Exchange and their corresponding summaries. We hypothesise that some evaluation metrics do not reflect true summarisation ability and propose a novel BRUGEscore metric, as the harmonic mean of ROUGE-2 and BERTscore. Finally, we perform a statistical significance test on our results to verify whether they are statistically robust, alongside an adversarial analysis task with three different corruption methods.

AB - There are more than 2,000 listed companies on the UK’s London Stock Exchange, divided into 11 sectors who are required to communicate their financial results at least twice in a single financial year. UK annual reports are very lengthy documents with around 80 pages on average. In this study, we aim to benchmark a variety of summarisation methods on a set of different pre-trained transformers with different extraction techniques. In addition, we considered multiple evaluation metrics in order to investigate their differing behaviour and applicability on a dataset from the Financial Narrative Summarisation (FNS 2020) shared task, which is composed of annual reports published by firms listed on the London Stock Exchange and their corresponding summaries. We hypothesise that some evaluation metrics do not reflect true summarisation ability and propose a novel BRUGEscore metric, as the harmonic mean of ROUGE-2 and BERTscore. Finally, we perform a statistical significance test on our results to verify whether they are statistically robust, alongside an adversarial analysis task with three different corruption methods.

KW - Benchmarking

KW - Evaluation Metrics

KW - Long Document summarization

U2 - 10.1007/978-3-031-35320-8_28

DO - 10.1007/978-3-031-35320-8_28

M3 - Conference contribution/Paper

AN - SCOPUS:85164679667

SN - 9783031353192

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 391

EP - 403

BT - Natural Language Processing and Information Systems - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023, Proceedings

A2 - Métais, Elisabeth

A2 - Meziane, Farid

A2 - Manning, Warren

A2 - Reiff-Marganiec, Stephan

A2 - Sugumaran, Vijayan

PB - Springer

CY - Cham

T2 - 28th International Conference on Applications of Natural Language to Information Systems, NLDB 2023

Y2 - 21 June 2023 through 23 June 2023

ER -