Twenty-eight years of semantic tagging

In this talk for the Atlas lecture series, I will explain the motivation for the Natural Language Processing (NLP) task of semantic annotation, and show how it can be applied widely, together with Corpus Linguistics (CL) methods, to content analysis, market research, political discourse analysis, metaphor analysis and topic modelling, as well as in a number of forensic, legal and policing scenarios. I will provide an overview of my 28 years of work on the UCREL Semantic Analysis System (USAS), which illustrates how challenging it is to teach a computer to understand natural language semantics, and which also reflects the shift in NLP methods from knowledge-based approaches to empirical approaches that exploit the vast corpora and web resources now available. USAS was originally designed for British English, and I will highlight experiments we have undertaken, working with multiple teams of researchers around the world, to extend its methods and linguistic resources to more than 12 languages. Finally, I will describe our recent research to expand coverage over time with a much larger and historically sensitive taxonomy in the Historical Thesaurus Semantic Tagger (HTST), to employ crowdsourcing, to extend the tagger to Welsh, and to adapt USAS to specific biomedical domains in the Gene Ontology Semantic Tagger (GOST).

