Electronic data

  • ElHaj_et_al_Oct18v6

    Accepted author manuscript, 738 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files

Research output: Contribution to Journal/Magazine > Journal article > peer-review

Published

Standard

Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files. / El Haj, Mahmoud; Alves, Paulo; Rayson, Paul et al.
In: Accounting and Business Research, Vol. 50, No. 1, 01.01.2020, p. 6-34.

Vancouver

El Haj M, Alves P, Rayson P, Walker M, Young S. Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files. Accounting and Business Research. 2020 Jan 1;50(1):6-34. Epub 2019 Jul 25. doi: 10.1080/00014788.2019.1609346

Bibtex

@article{dc79047238d249f1a901489053a23246,
title = "Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files",
abstract = "We provide a methodological contribution by developing, describing and evaluating a method for automatically retrieving and analysing text from digital PDF annual report files published by firms listed on the London Stock Exchange (LSE). The retrieval method retains information on document structure, enabling clear delineation between narrative and financial statement components of reports, and between individual sections within the narratives component. Retrieval accuracy exceeds 95% for manual validations using a random sample of 586 reports. Large-sample statistical validations using a comprehensive sample of reports published by non-financial LSE firms confirm that report length, narrative tone and (to a lesser degree) readability vary predictably with economic and regulatory factors. We demonstrate how the method is adaptable to non-English language documents and different regulatory regimes using a case study of Portuguese reports. We use the procedure to construct new research resources including corpora for commonly occurring annual report sections and a dataset of text properties for over 26,000 U.K. annual reports.",
author = "{El Haj}, Mahmoud and Paulo Alves and Paul Rayson and Martin Walker and Steven Young",
year = "2020",
month = jan,
day = "1",
doi = "10.1080/00014788.2019.1609346",
language = "English",
volume = "50",
pages = "6--34",
journal = "Accounting and Business Research",
issn = "0001-4788",
publisher = "Routledge",
number = "1",

}

RIS

TY - JOUR

T1 - Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files

AU - El Haj, Mahmoud

AU - Alves, Paulo

AU - Rayson, Paul

AU - Walker, Martin

AU - Young, Steven

PY - 2020/1/1

Y1 - 2020/1/1

N2 - We provide a methodological contribution by developing, describing and evaluating a method for automatically retrieving and analysing text from digital PDF annual report files published by firms listed on the London Stock Exchange (LSE). The retrieval method retains information on document structure, enabling clear delineation between narrative and financial statement components of reports, and between individual sections within the narratives component. Retrieval accuracy exceeds 95% for manual validations using a random sample of 586 reports. Large-sample statistical validations using a comprehensive sample of reports published by non-financial LSE firms confirm that report length, narrative tone and (to a lesser degree) readability vary predictably with economic and regulatory factors. We demonstrate how the method is adaptable to non-English language documents and different regulatory regimes using a case study of Portuguese reports. We use the procedure to construct new research resources including corpora for commonly occurring annual report sections and a dataset of text properties for over 26,000 U.K. annual reports.

U2 - 10.1080/00014788.2019.1609346

DO - 10.1080/00014788.2019.1609346

M3 - Journal article

VL - 50

SP - 6

EP - 34

JO - Accounting and Business Research

JF - Accounting and Business Research

SN - 0001-4788

IS - 1

ER -