Final published version
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Chapter (peer-reviewed) › peer-review
TY - CHAP
T1 - Gender Classification of Web Authors Using Feature Selection and Language Models
AU - Aravantinou, Christina
AU - Simaki, Vasiliki
AU - Mporas, Iosif
AU - Megalooikonomou, Vasileios
PY - 2015
Y1 - 2015
AB - In the present article, we address the problem of automatic gender classification of web blog authors. More specifically, we employ eight widely used machine learning algorithms, in order to study the effectiveness of feature selection on improving the accuracy of gender classification. The feature ranking is performed over a set of statistical, part-of-speech tagging and language model features. In the experiments, we employed classification models based on decision trees, support vector machines and lazy-learning algorithms. The experimental evaluation performed on blog author gender classification data demonstrated the importance of language model features for this task and that feature selection significantly improves the accuracy of gender classification, regardless of the type of the machine learning algorithm used.
KW - Text classification
KW - Gender identification
KW - Feature selection
U2 - 10.1007/978-3-319-23132-7_28
DO - 10.1007/978-3-319-23132-7_28
M3 - Chapter (peer-reviewed)
SN - 9783319231310
T3 - Lecture Notes in Computer Science
SP - 226
EP - 233
BT - SPECOM 2015
A2 - Ronzhin, A.
A2 - Potapova, R.
A2 - Fakotakis, N.
PB - Springer
CY - Cham
ER -