Home > Research > Publications & Outputs > Creating large semantic lexical resources for t...

Electronic data

  • 2017lofbergphd

    Final published version, 4.95 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

View graph of relations

Creating large semantic lexical resources for the Finnish language

Research output: ThesisDoctoral Thesis

Published

Standard

Creating large semantic lexical resources for the Finnish language. / Lofberg, Laura.
Lancaster University, 2017. 422 p.

Research output: ThesisDoctoral Thesis

Harvard

APA

Lofberg, L. (2017). Creating large semantic lexical resources for the Finnish language. [Doctoral Thesis, Lancaster University]. Lancaster University. https://doi.org/10.17635/lancaster/thesis/3

Vancouver

Lofberg L. Creating large semantic lexical resources for the Finnish language. Lancaster University, 2017. 422 p. doi: 10.17635/lancaster/thesis/3

Author

Lofberg, Laura. / Creating large semantic lexical resources for the Finnish language. Lancaster University, 2017. 422 p.

Bibtex

@phdthesis{cc08322cf6a44c2b8c43e447f3d1201a,
title = "Creating large semantic lexical resources for the Finnish language",
abstract = "Finnish belongs into the Finno-Ugric language family, and it is spoken by the vast majority of the people living in Finland. The motivation for this thesis is to contribute to the development of a semantic tagger for Finnish. This tool is a parallel of the English Semantic Tagger which has been developed at the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University since the beginning of the 1990s and which has over the years proven to be a very powerful tool in automatic semantic analysis of English spoken and written data. The English Semantic Tagger has various successful applications in the fields of natural language processing and corpus linguistics, and new application areas emerge all the time. The semantic lexical resources that I have created in this thesis provide the knowledge base for the Finnish Semantic Tagger. My main contributions are the lexical resources themselves, along with a set of methods and guidelines for their creation and expansion as a general language resource and as tailored for domain-specific applications. Furthermore, I propose and carry out several methods for evaluating semantic lexical resources. In addition to the English Semantic Tagger, which was developed first, and the Finnish Semantic Tagger second, equivalent semantic taggers have now been developed for Czech, Chinese, Dutch, French, Italian, Malay, Portuguese, Russian, Spanish, Urdu, and Welsh. All these semantictaggers taken together form a program framework called the UCREL Semantic Analysis System (USAS) which enables the development of not only monolingual but also various types of multilingual applications.Large-scale semantic lexical resources designed for Finnish using semantic fields as the organizing principle have not been attempted previously. Thus, the Finnish semantic lexicons created in this thesis are a unique and novel resource. The lexical coverage on the test corpora containing general modern standard Finnish, which has been the focus of the lexicon development, ranges from 94.58% to 97.91%. However, the results are also very promising in the analysis of domain-specific text (95.36%), older Finnish text (92.11–93.05%), and Internet discussions (91.97–94.14%). The results of the evaluation of lexical coverage arecomparable to the results obtained with the English equivalents and thus indicate that the Finnish semantic lexical resources indeed cover the majority of core Finnish vocabulary.",
keywords = "finnish semantic tagger, semantic tagging, semantics, finnish language, UCREL",
author = "Laura Lofberg",
year = "2017",
doi = "10.17635/lancaster/thesis/3",
language = "English",
publisher = "Lancaster University",
school = "Lancaster University",

}

RIS

TY - BOOK

T1 - Creating large semantic lexical resources for the Finnish language

AU - Lofberg, Laura

PY - 2017

Y1 - 2017

N2 - Finnish belongs into the Finno-Ugric language family, and it is spoken by the vast majority of the people living in Finland. The motivation for this thesis is to contribute to the development of a semantic tagger for Finnish. This tool is a parallel of the English Semantic Tagger which has been developed at the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University since the beginning of the 1990s and which has over the years proven to be a very powerful tool in automatic semantic analysis of English spoken and written data. The English Semantic Tagger has various successful applications in the fields of natural language processing and corpus linguistics, and new application areas emerge all the time. The semantic lexical resources that I have created in this thesis provide the knowledge base for the Finnish Semantic Tagger. My main contributions are the lexical resources themselves, along with a set of methods and guidelines for their creation and expansion as a general language resource and as tailored for domain-specific applications. Furthermore, I propose and carry out several methods for evaluating semantic lexical resources. In addition to the English Semantic Tagger, which was developed first, and the Finnish Semantic Tagger second, equivalent semantic taggers have now been developed for Czech, Chinese, Dutch, French, Italian, Malay, Portuguese, Russian, Spanish, Urdu, and Welsh. All these semantictaggers taken together form a program framework called the UCREL Semantic Analysis System (USAS) which enables the development of not only monolingual but also various types of multilingual applications.Large-scale semantic lexical resources designed for Finnish using semantic fields as the organizing principle have not been attempted previously. Thus, the Finnish semantic lexicons created in this thesis are a unique and novel resource. The lexical coverage on the test corpora containing general modern standard Finnish, which has been the focus of the lexicon development, ranges from 94.58% to 97.91%. However, the results are also very promising in the analysis of domain-specific text (95.36%), older Finnish text (92.11–93.05%), and Internet discussions (91.97–94.14%). The results of the evaluation of lexical coverage arecomparable to the results obtained with the English equivalents and thus indicate that the Finnish semantic lexical resources indeed cover the majority of core Finnish vocabulary.

AB - Finnish belongs into the Finno-Ugric language family, and it is spoken by the vast majority of the people living in Finland. The motivation for this thesis is to contribute to the development of a semantic tagger for Finnish. This tool is a parallel of the English Semantic Tagger which has been developed at the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University since the beginning of the 1990s and which has over the years proven to be a very powerful tool in automatic semantic analysis of English spoken and written data. The English Semantic Tagger has various successful applications in the fields of natural language processing and corpus linguistics, and new application areas emerge all the time. The semantic lexical resources that I have created in this thesis provide the knowledge base for the Finnish Semantic Tagger. My main contributions are the lexical resources themselves, along with a set of methods and guidelines for their creation and expansion as a general language resource and as tailored for domain-specific applications. Furthermore, I propose and carry out several methods for evaluating semantic lexical resources. In addition to the English Semantic Tagger, which was developed first, and the Finnish Semantic Tagger second, equivalent semantic taggers have now been developed for Czech, Chinese, Dutch, French, Italian, Malay, Portuguese, Russian, Spanish, Urdu, and Welsh. All these semantictaggers taken together form a program framework called the UCREL Semantic Analysis System (USAS) which enables the development of not only monolingual but also various types of multilingual applications.Large-scale semantic lexical resources designed for Finnish using semantic fields as the organizing principle have not been attempted previously. Thus, the Finnish semantic lexicons created in this thesis are a unique and novel resource. The lexical coverage on the test corpora containing general modern standard Finnish, which has been the focus of the lexicon development, ranges from 94.58% to 97.91%. However, the results are also very promising in the analysis of domain-specific text (95.36%), older Finnish text (92.11–93.05%), and Internet discussions (91.97–94.14%). The results of the evaluation of lexical coverage arecomparable to the results obtained with the English equivalents and thus indicate that the Finnish semantic lexical resources indeed cover the majority of core Finnish vocabulary.

KW - finnish semantic tagger

KW - semantic tagging

KW - semantics

KW - finnish language

KW - UCREL

U2 - 10.17635/lancaster/thesis/3

DO - 10.17635/lancaster/thesis/3

M3 - Doctoral Thesis

PB - Lancaster University

ER -