
Electronic data


Annual Reports Key Sections Corpora 2003 to 2017

Dataset

Description

UK Annual Reports Key Sections

Plain-text content extracted from an initial sample of 31,464 annual reports published between January 2002 and December 2017 by firms listed on the London Stock Exchange (LSE). Annual reports provided as PDF files are processed using the CFIE-FRSE tool, downloadable from https://github.com/drelhaj/CFIE-FRSE and described in the companion paper available at http://ssrn.com/abstract=2803275. The tool processed 26,284 reports (83.5% of the initial sample). The final sample includes reports published by financial and non-financial firms listed on either the LSE Main Market or the Alternative Investment Market (AIM).

The document table of contents (TOC) forms the basis of extraction for 15,883 reports (approximately 60% of the processed reports); pre-existing document bookmarks are used to process the remaining 10,401 reports. The CFIE-FRSE tool partitions each annual report into the “front-end” narratives component and the “back-end” financials component (including the auditor’s report, mandatory financial statements and associated footnotes, and miscellaneous disclosures). We further partition the narratives component into a set of commonly occurring annual report sections that feature prominently in prior research. These narrative sections, together with the auditor’s report, are numbered 1-12 and summarised in the following table.

For each core section 1-12, text extracts are provided by report calendar year in separate files of one million words. All extracted content is drawn from the pooled set of reports processed using the TOC (N = 15,883) to ensure consistent section classification across reports.
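As a quick consistency check on the sample sizes quoted in the description, the Python sketch below recomputes the two reported shares from the raw counts. The function name `sample_shares` is hypothetical and the counts are copied from the text; the sketch only restates the description's arithmetic and is not part of the CFIE-FRSE tool.

```python
def sample_shares(initial: int, processed: int, toc: int, bookmarks: int) -> tuple[float, float]:
    """Return (processed share of initial sample, TOC share of processed reports).

    Hypothetical helper: the counts are the ones quoted in the dataset description.
    """
    # The description says bookmarks were used for "the remaining" reports,
    # so the two extraction routes should account for every processed report.
    if toc + bookmarks != processed:
        raise ValueError("TOC and bookmark counts do not sum to the processed total")
    return processed / initial, toc / processed


if __name__ == "__main__":
    processed_share, toc_share = sample_shares(31_464, 26_284, 15_883, 10_401)
    print(f"Processed: {processed_share:.1%} of the initial sample")
    print(f"TOC-based: {toc_share:.1%} of processed reports")
```

With the published counts the shares come out at roughly 83.5% and 60.4%, in line with the "83.5%" and "approximately 60%" quoted above.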


| Header type | Annual report corpus | Number of reports | Number of firms | Number of words |
|---|---|---|---|---|
| 1 | Letter from board chair | 14,032 | 2,752 | 15,389,643 |
| 2 | CEO review | 7,160 | 1,640 | 13,947,211 |
| 3 | Governance statement | 12,766 | 2,500 | 43,695,127 |
| 4 | Remuneration report | 12,725 | 2,269 | 39,668,122 |
| 5 | Business review | 2,689 | 795 | 8,674,686 |
| 6 | Financial review | 8,460 | 1,686 | 20,013,680 |
| 7 | Operating review | 2,819 | 794 | 7,008,451 |
| 8 | Highlights | 11,099 | 2,082 | 3,750,407 |
| 9 | Auditor’s report | 15,038 | 2,884 | 19,036,357 |
| 10 | Risk management | 4,715 | 1,090 | 11,781,738 |
| 11 | Chairman’s governance introduction | 1,137 | 430 | 1,338,304 |
| 12 | CSR disclosures | 6,630 | 1,148 | 12,948,932 |
| 2+5+6+7 | Management commentary | 11,507 | 2,261 | 49,644,028 |
| 3+11 | Governance commentary | 12,844 | 2,513 | 45,033,431 |
| | Entire narratives component (excluding audit report) | 15,883 | 2,925 | 178,216,301 |
| | Entire narratives component (including audit report) | 15,883 | 2,936 | 197,252,658 |
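The combined rows of the table can be checked against their constituent sections, because word counts are additive across sections. The Python sketch below does this; the names `WORDS` and `combined_words` are illustrative, and every figure is copied from the table. By this arithmetic, the management and governance commentary rows equal the sums of sections 2+5+6+7 and 3+11, and the narratives total excluding the audit report equals the sum of every section except 9.

```python
from collections.abc import Iterable

# Word counts per header type, copied from the table (illustrative constant).
WORDS = {
    1: 15_389_643,   # Letter from board chair
    2: 13_947_211,   # CEO review
    3: 43_695_127,   # Governance statement
    4: 39_668_122,   # Remuneration report
    5: 8_674_686,    # Business review
    6: 20_013_680,   # Financial review
    7: 7_008_451,    # Operating review
    8: 3_750_407,    # Highlights
    9: 19_036_357,   # Auditor's report
    10: 11_781_738,  # Risk management
    11: 1_338_304,   # Chairman's governance introduction
    12: 12_948_932,  # CSR disclosures
}

# Combined rows as reported in the table.
MANAGEMENT_COMMENTARY = 49_644_028
GOVERNANCE_COMMENTARY = 45_033_431
NARRATIVES_EXCL_AUDIT = 178_216_301
NARRATIVES_INCL_AUDIT = 197_252_658


def combined_words(header_types: Iterable[int]) -> int:
    """Sum the word counts of the given header types."""
    return sum(WORDS[h] for h in header_types)


# Each entry pairs the recomputed sum with the figure reported in the table.
checks = {
    "Management commentary (2+5+6+7)": (combined_words([2, 5, 6, 7]), MANAGEMENT_COMMENTARY),
    "Governance commentary (3+11)": (combined_words([3, 11]), GOVERNANCE_COMMENTARY),
    "Narratives excluding audit (all but 9)": (
        combined_words(h for h in WORDS if h != 9),
        NARRATIVES_EXCL_AUDIT,
    ),
    "Narratives including audit (1-12)": (combined_words(WORDS), NARRATIVES_INCL_AUDIT),
}

if __name__ == "__main__":
    for label, (computed, reported) in checks.items():
        status = "OK" if computed == reported else "MISMATCH"
        print(f"{label}: {computed:,} vs {reported:,} -> {status}")
```

Report and firm counts are deliberately left out of the check: a single report can contain several of these sections, so, for example, the 11,507 reports with management commentary are far fewer than the 21,128 obtained by adding the report counts of sections 2, 5, 6 and 7.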
Date made available: 13/03/2019
Publisher: Lancaster University
Date of data production: 1/01/2002 - 1/01/2018
