Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - Towards a Multilingual Financial Narrative Processing System
AU - El Haj, Mahmoud
AU - Rayson, Paul Edward
AU - Alves, Paulo
AU - Young, Steven Eric
PY - 2018/5/7
Y1 - 2018/5/7
N2 - Large-scale financial narrative processing for UK annual reports has only become possible in the last few years with our prior work on automatically understanding and extracting the structure of unstructured PDF glossy reports. This has levelled the playing field somewhat relative to US research, where annual reports (10-K Forms) have a rigid structure imposed on them by legislation and are submitted in plain text format. The structure extraction is just the first step in a pipeline of analyses to examine disclosure quality and change over time relative to financial results. In this paper, we describe and evaluate the use of similar Information Extraction and Natural Language Processing methods for extraction and analysis of annual financial reports in a second language (Portuguese) in order to evaluate the applicability of our techniques in another national context (Portugal). Extraction accuracy varies between languages, with English exceeding 95%. To further examine the robustness of our techniques, we apply the extraction methods to a comprehensive sample of annual reports published by UK and Portuguese non-financial firms between 2003 and 2015.
AB - Large-scale financial narrative processing for UK annual reports has only become possible in the last few years with our prior work on automatically understanding and extracting the structure of unstructured PDF glossy reports. This has levelled the playing field somewhat relative to US research, where annual reports (10-K Forms) have a rigid structure imposed on them by legislation and are submitted in plain text format. The structure extraction is just the first step in a pipeline of analyses to examine disclosure quality and change over time relative to financial results. In this paper, we describe and evaluate the use of similar Information Extraction and Natural Language Processing methods for extraction and analysis of annual financial reports in a second language (Portuguese) in order to evaluate the applicability of our techniques in another national context (Portugal). Extraction accuracy varies between languages, with English exceeding 95%. To further examine the robustness of our techniques, we apply the extraction methods to a comprehensive sample of annual reports published by UK and Portuguese non-financial firms between 2003 and 2015.
KW - Financial Narrative Processing
KW - NLP
KW - annual reports
KW - Information Extraction
KW - Multilingual
M3 - Conference contribution/Paper
SN - 9791095546238
SP - 52
EP - 58
BT - The First Financial Narrative Processing Workshop
A2 - El-Haj, Mahmoud
A2 - Rayson, Paul
A2 - Moore, Andrew
T2 - The 1st Financial Narrative Processing Workshop in LREC 2018
Y2 - 7 May 2018
ER -