Who’s the Fairest of them All?: A Comparison of Methods for Classifying Tone and Attribution in Earnings-related Management Discourse

Research output: Working paper

Publication date: 2020
Number of pages: 49
Original language: English


We compare the relative and absolute performance of various machine learning algorithms and wordlists at replicating manual coding results for tone and attribution by domain experts in management performance commentary. Our suite of learning classifiers comprises Naïve Bayes, random forest, support vector machines, and a multilayer perceptron (an artificial neural network). We use wordlists proposed by Henry (2006, 2008) and Loughran and McDonald (2010) to classify tone. Wordlists for attribution are based on the causal reasoning list from Linguistic Inquiry and Word Count (LIWC), together with two self-constructed lists. We use a self-constructed wordlist to distinguish between internal and external attributions. We train learning classifiers using a large sample of manually annotated performance sentences. Results for all classifiers are assessed using a separate manually annotated holdout sample. Conclusions regarding the best classification method vary according to the classification task. None of the approaches is capable of reliably identifying the presence of an attribution. Even for more reliable classification tasks such as tone and attribution type, absolute measurement errors often exceed 20%. We conclude that while automated textual analysis methods offer important opportunities in certain settings, manual content analysis remains an essential tool for researchers interested in studying the properties and consequences of financial discourse.
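As a rough illustration of the wordlist approach described above, the following minimal Python sketch counts positive and negative words in a sentence and compares the resulting tone label with a manual label on a small holdout set. Everything in it is an assumption made for illustration: the two mini wordlists are hypothetical and are not the Henry or Loughran and McDonald lists, the four holdout sentences are invented, and the simple misclassification rate used here may differ from the error measure used in the paper.

```python
import re

# Hypothetical mini wordlists for illustration only.
# These are NOT the Henry or Loughran-McDonald wordlists.
POSITIVE = {"growth", "improved", "strong", "increase", "record"}
NEGATIVE = {"decline", "weak", "loss", "decreased", "impairment"}


def classify_tone(sentence):
    """Label a sentence by comparing counts of positive and negative words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def error_rate(sentences, manual_labels):
    """Share of sentences whose wordlist label differs from the manual label."""
    predictions = [classify_tone(s) for s in sentences]
    wrong = sum(p != m for p, m in zip(predictions, manual_labels))
    return wrong / len(manual_labels)


# Invented holdout sample: (sentence, manual tone label).
holdout = [
    ("Revenue growth was strong across all divisions.", "positive"),
    ("Margins decreased due to a weak housing market.", "negative"),
    ("The board met four times during the year.", "neutral"),
    ("Profit fell sharply despite record orders.", "negative"),
]

sentences, labels = zip(*holdout)
rate = error_rate(sentences, labels)
print(f"Misclassification rate on holdout: {rate:.0%}")
```

The last holdout sentence shows one way a word count can disagree with a human coder: "record" is counted as positive, while "fell sharply" is not on the negative list, so the sentence is labelled positive despite its negative manual label.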