An experiment in automatic indexing using the HASSET thesaurus

Computing and Communications

Associated organisational unit

UCREL - University Centre for Computer Corpus Research on Language

Text available via DOI:

https://doi.org/10.1109/CEEC.2013.6659437
Final published version

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

An experiment in automatic indexing using the HASSET thesaurus. / El-Haj, M.; Balkan, L.; Barbalet, S. et al.
Computer Science and Electronic Engineering Conference (CEEC), 2013 5th. IEEE, 2013. p. 13-18.

Harvard

El-Haj, M, Balkan, L, Barbalet, S, Bell, L & Shepherdson, J 2013, An experiment in automatic indexing using the HASSET thesaurus. in Computer Science and Electronic Engineering Conference (CEEC), 2013 5th. IEEE, pp. 13-18. https://doi.org/10.1109/CEEC.2013.6659437

APA

El-Haj, M., Balkan, L., Barbalet, S., Bell, L., & Shepherdson, J. (2013). An experiment in automatic indexing using the HASSET thesaurus. In Computer Science and Electronic Engineering Conference (CEEC), 2013 5th (pp. 13-18). IEEE. https://doi.org/10.1109/CEEC.2013.6659437

Vancouver

El-Haj M, Balkan L, Barbalet S, Bell L, Shepherdson J. An experiment in automatic indexing using the HASSET thesaurus. In Computer Science and Electronic Engineering Conference (CEEC), 2013 5th. IEEE. 2013. p. 13-18 doi: 10.1109/CEEC.2013.6659437

Author

El-Haj, M. ; Balkan, L. ; Barbalet, S. et al. / An experiment in automatic indexing using the HASSET thesaurus. Computer Science and Electronic Engineering Conference (CEEC), 2013 5th. IEEE, 2013. pp. 13-18

Bibtex

@inproceedings{2a567ef2c415429d8924405ab3903a27,

title = "An experiment in automatic indexing using the HASSET thesaurus",

abstract = "In this paper we present the tools, techniques and evaluation results of an automatic indexing experiment we conducted on the UK Data Archive/UK Data Service data-related document collection, as part of the Jisc-funded SKOS-HASSET project. We examined the quality of an automatic indexer based on a controlled vocabulary called the Humanities and Social Science Electronic Thesaurus (HASSET). We used the Keyphrase Extraction Algorithm (KEA), a text mining and a machine learning tool. KEA builds a classifier model using training documents with known keywords which is then applied to help assign keywords to new documents. We performed extensive manual and automatic evaluation on the results using recall, precision and F1 scores. The quality of the KEA indexing was measured a) automatically by the degree of overlap between the automated indexing decisions and those originally made by the human indexer and b) manually by comparing KEA's output with the source text. This paper explains how and why we applied the chosen technical solutions, and how we intend to take forward any lessons learned from this work in the future.",

keywords = "document handling, indexing, learning (artificial intelligence), thesauri, HASSET, HASSET thesaurus, Jisc-funded SKOS-HASSET project, KEA, UK data archive-UK data service data related document collection, automated indexing decisions, automatic indexer, automatic indexing, controlled vocabulary, human indexer, humanities and social science electronic thesaurus, keyphrase extraction algorithm, machine learning tool, text mining, Gold, Indexing, Machine assisted indexing, Manuals, Standards, Thesauri, Training",

author = "M. El-Haj and L. Balkan and S. Barbalet and L. Bell and J. Shepherdson",

year = "2013",

month = sep,

day = "17",

doi = "10.1109/CEEC.2013.6659437",

language = "English",

pages = "13--18",

booktitle = "Computer Science and Electronic Engineering Conference (CEEC), 2013 5th",

publisher = "IEEE",

}

RIS

TY  - GEN
T1  - An experiment in automatic indexing using the HASSET thesaurus
AU  - El-Haj, M.
AU  - Balkan, L.
AU  - Barbalet, S.
AU  - Bell, L.
AU  - Shepherdson, J.
PY  - 2013/9/17
Y1  - 2013/9/17
N2  - In this paper we present the tools, techniques and evaluation results of an automatic indexing experiment we conducted on the UK Data Archive/UK Data Service data-related document collection, as part of the Jisc-funded SKOS-HASSET project. We examined the quality of an automatic indexer based on a controlled vocabulary called the Humanities and Social Science Electronic Thesaurus (HASSET). We used the Keyphrase Extraction Algorithm (KEA), a text mining and a machine learning tool. KEA builds a classifier model using training documents with known keywords which is then applied to help assign keywords to new documents. We performed extensive manual and automatic evaluation on the results using recall, precision and F1 scores. The quality of the KEA indexing was measured a) automatically by the degree of overlap between the automated indexing decisions and those originally made by the human indexer and b) manually by comparing KEA's output with the source text. This paper explains how and why we applied the chosen technical solutions, and how we intend to take forward any lessons learned from this work in the future.
AB  - In this paper we present the tools, techniques and evaluation results of an automatic indexing experiment we conducted on the UK Data Archive/UK Data Service data-related document collection, as part of the Jisc-funded SKOS-HASSET project. We examined the quality of an automatic indexer based on a controlled vocabulary called the Humanities and Social Science Electronic Thesaurus (HASSET). We used the Keyphrase Extraction Algorithm (KEA), a text mining and a machine learning tool. KEA builds a classifier model using training documents with known keywords which is then applied to help assign keywords to new documents. We performed extensive manual and automatic evaluation on the results using recall, precision and F1 scores. The quality of the KEA indexing was measured a) automatically by the degree of overlap between the automated indexing decisions and those originally made by the human indexer and b) manually by comparing KEA's output with the source text. This paper explains how and why we applied the chosen technical solutions, and how we intend to take forward any lessons learned from this work in the future.
KW  - document handling
KW  - indexing
KW  - learning (artificial intelligence)
KW  - thesauri
KW  - HASSET
KW  - HASSET thesaurus
KW  - Jisc-funded SKOS-HASSET project
KW  - KEA
KW  - UK data archive-UK data service data related document collection
KW  - automated indexing decisions
KW  - automatic indexer
KW  - automatic indexing
KW  - controlled vocabulary
KW  - human indexer
KW  - humanities and social science electronic thesaurus
KW  - keyphrase extraction algorithm
KW  - machine learning tool
KW  - text mining
KW  - Gold
KW  - Indexing
KW  - Machine assisted indexing
KW  - Manuals
KW  - Standards
KW  - Thesauri
KW  - Training
U2  - 10.1109/CEEC.2013.6659437
DO  - 10.1109/CEEC.2013.6659437
M3  - Conference contribution/Paper
SP  - 13
EP  - 18
BT  - Computer Science and Electronic Engineering Conference (CEEC), 2013 5th
PB  - IEEE
ER  - 
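The abstract describes an automatic evaluation that scores KEA by how far its keywords overlap with those chosen by the human indexer, reported as precision, recall and F1. The following is a minimal sketch of those scores, not the authors' evaluation code. It assumes that terms match only when they are identical after case-folding and whitespace trimming; the paper may apply a different matching rule. The helper names `overlap_scores` and `macro_average` and the sample terms are hypothetical illustrations, not taken from the paper or from HASSET.

```python
from typing import Iterable, Sequence, Tuple

Scores = Tuple[float, float, float]


def overlap_scores(predicted: Iterable[str], gold: Iterable[str]) -> Scores:
    """Hypothetical helper: (precision, recall, F1) for one document.

    Assumption: a KEA term counts as correct only if it exactly matches a
    human-assigned term after case-folding and trimming whitespace.
    """
    pred = {t.strip().casefold() for t in predicted}
    ref = {t.strip().casefold() for t in gold}
    matched = len(pred & ref)  # terms chosen by both KEA and the human indexer
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


def macro_average(docs: Sequence[Tuple[Iterable[str], Iterable[str]]]) -> Scores:
    """Hypothetical helper: average per-document scores over a collection."""
    scores = [overlap_scores(pred, gold) for pred, gold in docs]
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    p = sum(s[0] for s in scores) / n
    r = sum(s[1] for s in scores) / n
    f = sum(s[2] for s in scores) / n
    return p, r, f


if __name__ == "__main__":
    # Invented example terms, for illustration only.
    human = ["ELECTIONS", "VOTING BEHAVIOUR", "POLITICAL PARTIES"]
    kea = ["elections", "Political parties", "OPINION POLLS", "VOTING"]
    p, r, f = overlap_scores(kea, human)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.67 F1=0.57
```

Here two of KEA's four terms match, which gives a precision of 2/4. Those two matches cover two of the three human terms, which gives a recall of 2/3. Exact matching is strict: a related but broader or narrower thesaurus term, such as "VOTING" versus "VOTING BEHAVIOUR", scores as a miss. This is one reason the abstract also mentions a manual comparison against the source text.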
