Rights statement: This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Quantitative Linguistics on 07/10/2016, available online: http://www.tandfonline.com/10.1080/09296174.2016.1226430
Accepted author manuscript, 659 KB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Sociolinguistic Features for Author Gender Identification
T2 - From Qualitative Evidence to Quantitative Analysis.
AU - Simaki, Vasiliki
AU - Aravantinou, Christina
AU - Mporas, Iosif
AU - Kondyli, Marianna
AU - Megalooikonomou, Vasileios
N1 - This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Quantitative Linguistics on 07/10/2016, available online: http://www.tandfonline.com/10.1080/09296174.2016.1226430
PY - 2017
Y1 - 2017
AB - Theoretical and empirical studies prove the strong relationship between social factors and the individual linguistic attitudes. Different social categories, such as gender, age, education, profession and social status, are strongly related with the linguistic diversity of people’s everyday spoken and written interaction. In this paper, sociolinguistic studies addressed to gender differentiation are overviewed in order to identify how various linguistic characteristics differ between women and men. Thereafter, it is examined if and how these qualitative features can become quantitative metrics for the task of gender identification from texts on web blogs. The evaluation results showed that the “syntactic complexity”, the “tag questions”, the “period length”, the “adjectives” and the “vocabulary richness” characteristics seem to be significantly distinctive with respect to the author’s gender.
U2 - 10.1080/09296174.2016.1226430
DO - 10.1080/09296174.2016.1226430
M3 - Journal article
VL - 24
SP - 65
EP - 84
JO - Journal of Quantitative Linguistics
JF - Journal of Quantitative Linguistics
SN - 0929-6174
IS - 1
ER -