This chapter describes and evaluates the use of Information Extraction and Natural Language Processing methods for extraction and analysis of financial annual reports in three languages: English, Spanish and Portuguese.
The work described retains information on document structure which is needed to enable a clear distinction between narrative and financial statement components of annual reports and between individual sections within the narratives component. Extraction accuracy varies between languages with English exceeding 95 %. We apply the extraction methods on a comprehensive sample of annual reports published by UK, Spanish and Portuguese non-financial firms between 2003 and 2014.