Links

https://www.sciencedirect.com/science/article/pii/S0169023X22000891
Final published version

Text available via DOI:

https://doi.org/10.1016/j.datak.2022.102098
Final published version

Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. / Brenon, Alice; Moncla, Ludovic; Mcdonough, Katherine.
In: Data and Knowledge Engineering, Vol. 142, 102098, 30.11.2022.

Harvard

Brenon, A, Moncla, L & Mcdonough, K 2022, 'Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions', Data and Knowledge Engineering, vol. 142, 102098. https://doi.org/10.1016/j.datak.2022.102098

APA

Brenon, A., Moncla, L., & Mcdonough, K. (2022). Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. Data and Knowledge Engineering, 142, Article 102098. https://doi.org/10.1016/j.datak.2022.102098

Vancouver

Brenon A, Moncla L, Mcdonough K. Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions. Data and Knowledge Engineering. 2022 Nov 30;142:102098. Epub 2022 Nov 14. doi: 10.1016/j.datak.2022.102098

Author

Brenon, Alice ; Moncla, Ludovic ; Mcdonough, Katherine. / Classifying encyclopedia articles : Comparing machine and deep learning methods and exploring their predictions. In: Data and Knowledge Engineering. 2022 ; Vol. 142.

Bibtex

@article{e20e983d73d64fc287f59ae9c2fbfaee,

title = "Classifying encyclopedia articles: Comparing machine and deep learning methods and exploring their predictions",

abstract = "This article presents a comparative study of supervised classification approaches applied to the automatic classification of encyclopedia articles written in French. Our dataset is composed of 17 volumes of text from the Encyclop{\'e}die by Diderot and d'Alembert (1751-72) including about 70,000 articles. We combine text vectorization (bag-of-words and word embeddings) with machine learning methods, deep learning, and transformer architectures. In addition to evaluating these approaches, we review the classification predictions using a variety of quantitative and qualitative methods. The best model obtains 86% as an average f-score for 38 classes. Using network analysis we highlight the difficulty of classifying semantically close classes. We also introduce examples of opportunities for qualitative evaluation of {"}misclassifications{"} in order to understand the relationship between content and different ways of ordering knowledge. We openly release all code and results obtained during this research.",

author = "Alice Brenon and Ludovic Moncla and Katherine Mcdonough",

year = "2022",

month = nov,

day = "30",

doi = "10.1016/j.datak.2022.102098",

language = "English",

volume = "142",

journal = "Data and Knowledge Engineering",

issn = "0169-023X",

publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - Classifying encyclopedia articles

T2 - Comparing machine and deep learning methods and exploring their predictions

AU - Brenon, Alice

AU - Moncla, Ludovic

AU - Mcdonough, Katherine

PY - 2022/11/30

Y1 - 2022/11/30

N2 - This article presents a comparative study of supervised classification approaches applied to the automatic classification of encyclopedia articles written in French. Our dataset is composed of 17 volumes of text from the Encyclopédie by Diderot and d'Alembert (1751-72) including about 70,000 articles. We combine text vectorization (bag-of-words and word embeddings) with machine learning methods, deep learning, and transformer architectures. In addition to evaluating these approaches, we review the classification predictions using a variety of quantitative and qualitative methods. The best model obtains 86% as an average f-score for 38 classes. Using network analysis we highlight the difficulty of classifying semantically close classes. We also introduce examples of opportunities for qualitative evaluation of "misclassifications" in order to understand the relationship between content and different ways of ordering knowledge. We openly release all code and results obtained during this research.

AB - This article presents a comparative study of supervised classification approaches applied to the automatic classification of encyclopedia articles written in French. Our dataset is composed of 17 volumes of text from the Encyclopédie by Diderot and d'Alembert (1751-72) including about 70,000 articles. We combine text vectorization (bag-of-words and word embeddings) with machine learning methods, deep learning, and transformer architectures. In addition to evaluating these approaches, we review the classification predictions using a variety of quantitative and qualitative methods. The best model obtains 86% as an average f-score for 38 classes. Using network analysis we highlight the difficulty of classifying semantically close classes. We also introduce examples of opportunities for qualitative evaluation of "misclassifications" in order to understand the relationship between content and different ways of ordering knowledge. We openly release all code and results obtained during this research.

UR - https://hal.science/hal-03821073

U2 - 10.1016/j.datak.2022.102098

DO - 10.1016/j.datak.2022.102098

M3 - Journal article

VL - 142

JO - Data and Knowledge Engineering

JF - Data and Knowledge Engineering

SN - 0169-023X

M1 - 102098

ER -
