Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language. / Akef, Soroosh; Meurers, Detmar; Mendes, Amalia et al.
Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning. University of Tartu Library, 2025. p. 50-62.

Harvard

Akef, S, Meurers, D, Mendes, A & Rebuschat, P 2025, Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language. in Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning. University of Tartu Library, pp. 50-62. <https://aclanthology.org/2025.nlp4call-1.4.pdf>

APA

Akef, S., Meurers, D., Mendes, A., & Rebuschat, P. (2025). Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language. In Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning (pp. 50-62). University of Tartu Library. https://aclanthology.org/2025.nlp4call-1.4.pdf

Vancouver

Akef S, Meurers D, Mendes A, Rebuschat P. Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language. In Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning. University of Tartu Library. 2025. p. 50-62. Epub 2025 Mar 5.

Author

Akef, Soroosh ; Meurers, Detmar ; Mendes, Amalia et al. / Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language. Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning. University of Tartu Library, 2025. pp. 50-62

Bibtex

@inproceedings{a033c83d7c37412eb9d9ec7a497c1c01,
title = "Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language",
abstract = "This study leverages interpretable machine learning to investigate how different societal languages (SLs) influence the written production of Portuguese heritage language (HL) learners. Using a corpus of learner texts from adolescents in Germany and the UK, we systematically control for topic and proficiency level to isolate the cross-linguistic effects that each SL may exert on the HL. We automatically extract a wide range of linguistic complexity measures, including lexical, morphological, syntactic, discursive, and grammatical measures, and apply clustering-based undersampling to ensure balanced and representative data. Utilizing an explainable boosting machine, a class of inherently interpretable machine learning models, our approach identifies predictive patterns that discriminate between English- and German-influenced HL texts. The findings highlight distinct lexical and morphosyntactic patterns associated with each SL, with some patterns in the HL mirroring the structures of the SL. These results support the role of the SL in characterizing HL output. Beyond offering empirical evidence of cross-linguistic influence, this work demonstrates how interpretable machine learning can serve as an empirical test bed for language acquisition research.",
author = "Soroosh Akef and Detmar Meurers and Amalia Mendes and Patrick Rebuschat",
year = "2025",
month = may,
day = "1",
language = "English",
pages = "50--62",
booktitle = "Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning",
publisher = "University of Tartu Library",
}

RIS

TY  - GEN
T1  - Interpretable Machine Learning for Societal Language Identification
T2  - Modeling English and German Influences on Portuguese Heritage Language
AU  - Akef, Soroosh
AU  - Meurers, Detmar
AU  - Mendes, Amalia
AU  - Rebuschat, Patrick
PY  - 2025/5/1
Y1  - 2025/5/1
N2  - This study leverages interpretable machine learning to investigate how different societal languages (SLs) influence the written production of Portuguese heritage language (HL) learners. Using a corpus of learner texts from adolescents in Germany and the UK, we systematically control for topic and proficiency level to isolate the cross-linguistic effects that each SL may exert on the HL. We automatically extract a wide range of linguistic complexity measures, including lexical, morphological, syntactic, discursive, and grammatical measures, and apply clustering-based undersampling to ensure balanced and representative data. Utilizing an explainable boosting machine, a class of inherently interpretable machine learning models, our approach identifies predictive patterns that discriminate between English- and German-influenced HL texts. The findings highlight distinct lexical and morphosyntactic patterns associated with each SL, with some patterns in the HL mirroring the structures of the SL. These results support the role of the SL in characterizing HL output. Beyond offering empirical evidence of cross-linguistic influence, this work demonstrates how interpretable machine learning can serve as an empirical test bed for language acquisition research.
AB  - This study leverages interpretable machine learning to investigate how different societal languages (SLs) influence the written production of Portuguese heritage language (HL) learners. Using a corpus of learner texts from adolescents in Germany and the UK, we systematically control for topic and proficiency level to isolate the cross-linguistic effects that each SL may exert on the HL. We automatically extract a wide range of linguistic complexity measures, including lexical, morphological, syntactic, discursive, and grammatical measures, and apply clustering-based undersampling to ensure balanced and representative data. Utilizing an explainable boosting machine, a class of inherently interpretable machine learning models, our approach identifies predictive patterns that discriminate between English- and German-influenced HL texts. The findings highlight distinct lexical and morphosyntactic patterns associated with each SL, with some patterns in the HL mirroring the structures of the SL. These results support the role of the SL in characterizing HL output. Beyond offering empirical evidence of cross-linguistic influence, this work demonstrates how interpretable machine learning can serve as an empirical test bed for language acquisition research.
M3  - Conference contribution/Paper
SP  - 50
EP  - 62
BT  - Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning
PB  - University of Tartu Library
ER  - 
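The abstract states that clustering-based undersampling was applied to obtain balanced and representative data, but this record does not describe the procedure (number of clusters, distance metric, or how a representative text is chosen). The Python sketch below shows one common variant, assuming each text has already been reduced to a numeric vector of complexity measures. It clusters the larger class with plain k-means into as many clusters as the smaller class has texts, then keeps the real text nearest each centroid. The function names, the toy data, and the "EN"/"DE" labels are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means with randomly sampled initial centroids; returns k centroids."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (ties go to the lower index).
            nearest = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        updated = []
        for c, members in zip(centroids, clusters):
            if members:
                dim = len(members[0])
                updated.append(tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                updated.append(c)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # converged: the centroids no longer move
        centroids = updated
    return centroids


def cluster_undersample(majority: List[Vector], n_keep: int, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: shrink a class to n_keep real samples by clustering it
    into n_keep groups and keeping the sample nearest each centroid."""
    if not 0 < n_keep <= len(majority):
        raise ValueError("n_keep must be between 1 and len(majority)")
    centroids = kmeans(majority, n_keep, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        best = min(remaining, key=lambda p: _dist2(p, c))
        kept.append(best)
        remaining.remove(best)  # never select the same text twice
    return kept


def balance_classes(
    features: Sequence[Sequence[float]], labels: Sequence[str], seed: int = 0
) -> Tuple[List[Vector], List[str]]:
    """Hypothetical helper: undersample every class down to the size of the smallest class."""
    by_label: Dict[str, List[Vector]] = {}
    for x, y in zip(features, labels):
        by_label.setdefault(y, []).append(tuple(x))
    target = min(len(rows) for rows in by_label.values())
    out_x: List[Vector] = []
    out_y: List[str] = []
    for label, rows in by_label.items():
        kept = rows if len(rows) == target else cluster_undersample(rows, target, seed)
        out_x.extend(kept)
        out_y.extend([label] * len(kept))
    return out_x, out_y


if __name__ == "__main__":
    # Toy 2-D vectors standing in for complexity measures (hypothetical data).
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
         (5.0, 5.0), (6.0, 6.0)]
    y = ["EN"] * 6 + ["DE"] * 2
    bx, by = balance_classes(X, y)
    print(sorted(zip(by, bx)))
    # prints [('DE', (5.0, 5.0)), ('DE', (6.0, 6.0)), ('EN', (0.0, 0.0)), ('EN', (10.0, 10.0))]
```

Keeping the nearest real text, rather than the centroid itself, means the downstream classifier (an explainable boosting machine in the paper) would see only genuine learner texts. In practice the measures would also need to be scaled first so that no single measure dominates the Euclidean distance.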