Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files

Electronic data

ElHaj_et_al_Oct18v6
Accepted author manuscript, 738 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1080/00014788.2019.1609346
Final published version
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Journal publication date	1/01/2020
Journal	Accounting and Business Research
Issue number	1
Volume	50
Number of pages	29
Pages (from-to)	6-34
Publication Status	Published
Early online date	25/07/19
Original language	English

Abstract

We provide a methodological contribution by developing, describing and evaluating a method for automatically retrieving and analysing text from digital PDF annual report files published by firms listed on the London Stock Exchange (LSE). The retrieval method retains information on document structure, enabling clear delineation between narrative and financial statement components of reports, and between individual sections within the narratives component. Retrieval accuracy exceeds 95% for manual validations using a random sample of 586 reports. Large-sample statistical validations using a comprehensive sample of reports published by non-financial LSE firms confirm that report length, narrative tone and (to a lesser degree) readability vary predictably with economic and regulatory factors. We demonstrate how the method is adaptable to non-English language documents and different regulatory regimes using a case study of Portuguese reports. We use the procedure to construct new research resources including corpora for commonly occurring annual report sections and a dataset of text properties for over 26,000 U.K. annual reports.
