Tracing verbal aggression over time, using the Historical Thesaurus of English

Linguistics and English Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper

Published

Beth Malory

Publication date	21/07/2015
Host publication	Corpus Linguistics 2015
Pages	27
Original language	English
Event	University Centre for Computer Corpus Research on Language. August 2014 - Lancaster University, Lancaster, United Kingdom Duration: 21/07/2015 → 24/07/2015

Conference

Conference	University Centre for Computer Corpus Research on Language. August 2014
Country/Territory	United Kingdom
City	Lancaster
Period	21/07/15 → 24/07/15

Abstract

The work reported here seeks to demonstrate that
automatic content analysis tools can be used
effectively to trace pragmatic phenomena –
including aggression – over time. In doing so, it
builds upon preliminary work by Archer (2014), who
used Wmatrix (Rayson 2008) and six semtags –
Q2.2 (speech acts),
A5.1+/- (‘good/bad’ evaluation), A5.2+/-
(‘true/false’ evaluation), E3- (‘angry/violent’),
S1.2.4+/- (‘im/politeness’), and S7.2+/-
(‘respect/lack of respect’) – to examine aggression
in 200 Old Bailey trial texts covering the decade
1783-93.
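The first step of this approach, singling out tokens that carry one of the six target semtags, can be illustrated with a minimal sketch. It assumes the texts have already been tagged and exported as hypothetical `(word, semtag)` pairs with exactly one semtag per token; real USAS output can carry compound tags and extra markers, which this sketch ignores. The names `Token`, `matches_target` and `select_candidates` are illustrative, not part of Wmatrix.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    semtag: str  # assumption: exactly one USAS-style semtag per token


# The six semtag families named in the abstract, with their +/- variants.
TARGET_TAGS = {
    "Q2.2",                # speech acts
    "A5.1+", "A5.1-",      # good/bad evaluation
    "A5.2+", "A5.2-",      # true/false evaluation
    "E3-",                 # angry/violent
    "S1.2.4+", "S1.2.4-",  # im/politeness
    "S7.2+", "S7.2-",      # respect/lack of respect
}


def matches_target(tag: str) -> bool:
    """True if `tag` belongs to a target family, ignoring repeated markers ('E3--' -> 'E3-')."""
    base = tag.rstrip("+-")
    first_marker = tag[len(base):len(base) + 1]  # '' when the tag has no +/- marker
    return (base + first_marker) in TARGET_TAGS


def select_candidates(tokens: list[Token]) -> list[int]:
    """Indices of tokens whose semtag falls in one of the target families."""
    return [i for i, tok in enumerate(tokens) if matches_target(tok.semtag)]


# Illustrative tags only, not real Wmatrix output.
sample = [
    Token("you", "Z8"),
    Token("villain", "A5.1---"),
    Token("said", "Q2.1"),
    Token("lie", "A5.2-"),
]
print(select_candidates(sample))  # [1, 3]
```

Every index returned is only a candidate: as the abstract goes on to explain, each hit still had to be checked in context by hand.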
Having annotated the Old Bailey dataset using
Wmatrix, Archer (2014) targeted the utterances
captured by the semtags listed above. Because
these tags provide multiple potential indicators of
verbal aggression, they afforded her a useful “way
in” to such aggression in the late eighteenth-century
English courtroom. Using the ‘expand context’
facility within Wmatrix, and consulting the original
trial transcripts, she then re-contextualised the
instances flagged as potentially aggressive, which
allowed her to disregard any that, on closer
inspection, did not point to aggression. The success
of this approach allowed her to conclude that
automatic content analysis tools like USAS (the
UCREL Semantic Analysis System, which performs
the semantic tagging in Wmatrix) can indeed be
used to trace pragmatic phenomena in historical as
well as modern texts.
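The re-contextualisation step is interactive in Wmatrix, but its underlying idea, showing each candidate hit in a window of surrounding text so that an analyst can keep or discard it, can be sketched as a simple keyword-in-context (KWIC) display. This is a hypothetical stand-in, not Wmatrix's 'expand context' facility. The window size `span` is an arbitrary assumption, and the judgement about aggression remains a manual one.

```python
def kwic(words: list[str], index: int, span: int = 5) -> str:
    """Keyword-in-context line: up to `span` words either side of words[index]."""
    left = words[max(0, index - span):index]
    right = words[index + 1:index + 1 + span]
    return " ".join(left + [f"[{words[index]}]"] + right)


def contexts_for_review(words: list[str], hits: list[int], span: int = 5) -> list[tuple[int, str]]:
    """Pair each candidate hit with its context so an analyst can accept or discard it."""
    return [(i, kwic(words, i, span)) for i in hits]


words = "the prisoner called him a villain and a liar".split()
for index, line in contexts_for_review(words, [5], span=2):
    print(index, line)  # 5 him a [villain] and a
```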
This approach was not without its teething
problems, however. First, except where semtags
were combined as portmanteau tags (e.g. Q2.2 with
E3- to capture aggressive speech acts), the approach
required individual semtags to be targeted within a
given text. Because each hit then demanded a
time-intensive manual examination of its wider
textual context, analysing large datasets in this way
was prohibitively costly. A closely related problem
concerned the tagset’s basis in The Longman Lexicon
of Contemporary English (McArthur, 1981) and its
consequent inability to take account of diachronic
meaning change. This occasionally led to the
mis-assignment of words that have undergone
significant semantic change over time, including
politely, insult and insulted. In one instance, for
example, politely was used to describe the deftness
with which a thief picked his victim’s pocket! The
need for manual checks to stop such mis-assignments
from skewing the results further restricted the scope
of Archer (2014).
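Portmanteau tags narrow the candidate set by requiring two semtags to occur together, for example Q2.2 alongside E3- for aggressive speech acts. The abstract does not say how co-occurrence was defined. This minimal sketch assumes, purely for illustration, that the two tags must fall within a fixed token window. The function names and the `window` parameter are hypothetical.

```python
def tag_family(tag: str) -> str:
    """Collapse repeated intensity markers, e.g. 'E3---' -> 'E3-'; 'Q2.2' is unchanged."""
    base = tag.rstrip("+-")
    return base + tag[len(base):len(base) + 1]


def portmanteau_hits(
    tags: list[str], first: str = "Q2.2", second: str = "E3-", window: int = 3
) -> list[tuple[int, int]]:
    """Index pairs (i, j) where `first` at i and `second` at j lie within `window` tokens."""
    firsts = [i for i, t in enumerate(tags) if tag_family(t) == first]
    seconds = [j for j, t in enumerate(tags) if tag_family(t) == second]
    return [(i, j) for i in firsts for j in seconds if abs(i - j) <= window]


# Illustrative tag sequence only, not real Wmatrix output.
tags = ["Z8", "Q2.2", "Z5", "E3--", "A1.1.1", "Z5", "Z5", "Z5", "E3-"]
print(portmanteau_hits(tags))  # [(1, 3)]
```

A wider window catches more pairs but admits looser associations, so any such window would itself need to be validated against manually checked examples.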
In the extension to this work reported here, the
authors present their solutions to these problems.
At the core of these solutions is an innovation
that allows historical datasets to be tagged
semantically, using themes derived from
the Historical Thesaurus of the Oxford English
Dictionary (henceforth HTOED). These themes have
been identified as part of an AHRC/ESRC-funded
project entitled “Semantic Annotation and Mark Up
for Enhancing Lexical Searches”, henceforth
SAMUELS (grant reference AH/L010062/1). The
SAMUELS project has also enabled researchers
from the Universities of Glasgow, Lancaster,
Huddersfield, Strathclyde and Central Lancashire to
work together to develop a semantic annotation tool
which, thanks to its advanced disambiguation
facility, can automatically annotate words, as well
as multi-word units, in historical texts with their
precise meanings. This means that pragmatic
phenomena such as aggression can be sought
automatically, and more productively, once the
authors have identified what they term a
‘meaning chain’, that is, a series of HTOED-derived
‘themes’ analogous to DNA strings.
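One way to picture a 'meaning chain' search is as pattern matching over the sequence of themes assigned to an utterance. The abstract does not specify whether a chain must be contiguous. This minimal sketch assumes the themes must appear in the given order but may be separated by other themes, much like a motif in a DNA string. The theme labels are hypothetical placeholders, not actual HTOED categories, and the function names are illustrative.

```python
def contains_chain(themes: list[str], chain: list[str]) -> bool:
    """True if every theme in `chain` occurs in `themes`, in the same order (gaps allowed)."""
    remaining = iter(themes)
    # `x in iterator` consumes the iterator up to the first match, so order is enforced.
    return all(theme in remaining for theme in chain)


def flag_utterances(utterances: dict[str, list[str]], chain: list[str]) -> list[str]:
    """IDs of utterances whose theme sequence contains the meaning chain."""
    return [uid for uid, themes in utterances.items() if contains_chain(themes, chain)]


# Hypothetical theme labels, not actual HTOED categories.
utterances = {
    "u1": ["SPEECH", "PERSON", "ANGER", "DISHONOUR"],
    "u2": ["ANGER", "SPEECH"],
}
print(flag_utterances(utterances, ["SPEECH", "ANGER"]))  # ['u1']
```

Requiring contiguity instead would be a stricter alternative, trading recall for precision.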
This paper reports, first, on the authors’
identification of 68 potentially pertinent HTOED
‘themes’ and, second, on their investigation of the
possible permutations of these themes, including
the process by which they assessed which themes,
in which combinations, best identified and
captured aggression in their four datasets.
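The number of candidate permutations grows quickly with 68 themes: there are 68 × 67 = 4,556 ordered pairs and 300,696 ordered triples. The abstract does not describe the assessment procedure. The following is therefore only a sketch of one possible approach, assuming a small hand-annotated set of aggressive utterances (`gold`) and using F1 as the ranking metric. Both assumptions, the theme labels and all function names are hypothetical.

```python
import math
from itertools import permutations


def contains_chain(themes, chain) -> bool:
    """True if `chain` occurs in `themes` in order, gaps allowed."""
    remaining = iter(themes)
    return all(theme in remaining for theme in chain)


def f1_for_chain(chain, utterances: dict[str, list[str]], gold: set[str]) -> float:
    """F1 of the utterances a chain flags, measured against hand-annotated aggressive IDs."""
    flagged = {uid for uid, themes in utterances.items() if contains_chain(themes, chain)}
    true_pos = len(flagged & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(flagged)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def rank_chains(inventory, utterances, gold, length=2):
    """Score every ordered chain of `length` themes, best first."""
    scored = [(f1_for_chain(c, utterances, gold), c) for c in permutations(inventory, length)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Ordered chains grow quickly with 68 themes.
print(math.perm(68, 2), math.perm(68, 3))  # 4556 300696

# Toy data with hypothetical theme labels and gold annotations.
inventory = ["SPEECH", "ANGER", "PERSON"]
utterances = {
    "u1": ["SPEECH", "ANGER"],
    "u2": ["PERSON", "SPEECH"],
    "u3": ["SPEECH", "PERSON", "ANGER"],
}
gold = {"u1", "u3"}
print(rank_chains(inventory, utterances, gold)[0])  # (1.0, ('SPEECH', 'ANGER'))
```

Exhaustive scoring is feasible for pairs but becomes expensive for longer chains, so in practice the search would likely need to be pruned, for example by extending only the best-scoring shorter chains.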
The datasets used for this research are drawn
from Hansard and from Historic Hansard, and are
taken from periods judged to be characterised, in
some way, by political or national unrest or
disquiet. The datasets represent the periods 1812-14
(i.e., “The War of 1812” between Great Britain and
the United States), 1879-81 (a period of complex
wrangling between two successive British
governments and their oppositions, led by the
fierce rivals Disraeli and Gladstone), 1913-19 (the
First World War, including its immediate build-up
and aftermath), and 1978-9 (“The Winter of
Discontent”).