Electronic data

  • 706_Paper

    Accepted author manuscript, 243 KB, PDF-document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

  • biotm-lrec2018-proc

    Final published version, 254 KB, PDF-document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper

Published
Publication date: 11/05/2018
Host publication: LREC 2018, Eleventh International Conference on Language Resources and Evaluation
Editors: Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga
Publisher: European Language Resources Association (ELRA)
Pages: 4593-4597
Number of pages: 5
ISBN (Print): 9791095546009
Original language: English
Event: The 11th Edition of the Language Resources and Evaluation Conference (LREC2018) - Miyazaki, Japan
Duration: 7/05/2018 to 12/05/2018
http://lrec2018.lrec-conf.org/

Conference

Conference: The 11th Edition of the Language Resources and Evaluation Conference (LREC2018)
Country: Japan
City: Miyazaki
Period: 7/05/18 to 12/05/18
Internet address: http://lrec2018.lrec-conf.org/

Abstract

In many areas of academic publishing, there is an explosion of literature, and sub-division of fields into subfields, leading to stove-piping where sub-communities of expertise become disconnected from each other. This is especially true in the genetics literature over the last 10 years where researchers are no longer able to maintain knowledge of previously related areas. This paper extends several approaches based on natural language processing and corpus linguistics which allow us to examine corpora derived from bodies of genetics literature and will help to make comparisons and improve retrieval methods using domain knowledge via an existing gene ontology. We derived two open access medical journal corpora from PubMed related to psychiatric genetics and immune disorder genetics. We created a novel Gene Ontology Semantic Tagger (GOST) and lexicon to annotate the corpora and are then able to compare subsets of literature to understand the relative distributions of genetic terminology, thereby enabling researchers to make improved connections between them.
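
To make the idea of lexicon-based tagging and distribution comparison concrete, here is a minimal Python sketch. It is not the GOST implementation described in the paper: the lexicon entries, the category labels (IMMUNE, SYNAPTIC), the function names, and the two sample texts are all hypothetical and chosen only for illustration. The sketch tags lexicon terms in a text, counts them per category, and normalises the counts by text length so two corpora of different sizes could be compared.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface term -> category label.
# These entries are illustrative and are NOT taken from the GOST lexicon.
TOY_LEXICON = {
    "immune response": "IMMUNE",
    "cytokine": "IMMUNE",
    "synaptic transmission": "SYNAPTIC",
}


def tag_text(text, lexicon):
    """Count occurrences of lexicon terms in text, grouped by category label."""
    counts = Counter()
    lowered = text.lower()
    for term, label in lexicon.items():
        # Whole-word match, case-insensitive via the lowered text.
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[label] += len(re.findall(pattern, lowered))
    return counts


def relative_profile(text, lexicon):
    """Return category counts divided by the number of word tokens in text."""
    n_tokens = len(re.findall(r"\w+", text))
    counts = tag_text(text, lexicon)
    return {label: count / n_tokens for label, count in counts.items()}


# Two invented one-line "corpora" standing in for the two literature subsets.
corpus_a = "The immune response involves cytokine signalling. Cytokine levels rose."
corpus_b = "Synaptic transmission was impaired; the immune response was normal."

profile_a = relative_profile(corpus_a, TOY_LEXICON)
profile_b = relative_profile(corpus_b, TOY_LEXICON)

for label in sorted(set(profile_a) | set(profile_b)):
    print(label, round(profile_a[label], 3), round(profile_b[label], 3))
```

Plain relative frequency is used here only because it is the simplest way to compare distributions across corpora of unequal size; the abstract does not say which comparison statistic the authors applied.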