Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - Rhetorical move detection in English abstracts
T2 - multi-label sentence classifiers and their annotated corpora
AU - Dayrell, Carmen
AU - Candido Jr, Arnaldo
AU - Lima, Gabriel
AU - Machado Jr, Danilo
AU - Copestake, Ann
AU - Feltrim, Valéria
AU - Tagnin, Stella
AU - Aluísio, Sandra
PY - 2012
Y1 - 2012
N2 - The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can only be assigned one single label. However, such approach does not adequately reflect actual language use since a move can be realized by a clause, a sentence, or even several sentences. Here, we present MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier which automatically identifies rhetorical moves in abstracts but allows for a given sentence to be assigned as many labels as appropriate. We have resorted to various other NLP tools and used two large training corpora: (i) one corpus consists of 645 abstracts from physical sciences and engineering (PE) and (ii) the other corpus is made up of 690 from life and health sciences (LH). This paper presents our preliminary results and also discusses the various challenges involved in multi-label tagging and works towards satisfactory solutions. In addition, we also make our two training corpora publicly available so that they may serve as benchmark for this new task.
AB - The relevance of automatically identifying rhetorical moves in scientific texts has been widely acknowledged in the literature. This study focuses on abstracts of standard research papers written in English and aims to tackle a fundamental limitation of current machine-learning classifiers: they are mono-labeled, that is, a sentence can only be assigned one single label. However, such approach does not adequately reflect actual language use since a move can be realized by a clause, a sentence, or even several sentences. Here, we present MAZEA (Multi-label Argumentative Zoning for English Abstracts), a multi-label classifier which automatically identifies rhetorical moves in abstracts but allows for a given sentence to be assigned as many labels as appropriate. We have resorted to various other NLP tools and used two large training corpora: (i) one corpus consists of 645 abstracts from physical sciences and engineering (PE) and (ii) the other corpus is made up of 690 from life and health sciences (LH). This paper presents our preliminary results and also discusses the various challenges involved in multi-label tagging and works towards satisfactory solutions. In addition, we also make our two training corpora publicly available so that they may serve as benchmark for this new task.
KW - corpus linguistics
KW - English Abstract
KW - rhetorical moves
KW - multi-label sentence classifier
M3 - Conference contribution/Paper
BT - Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012)
ER -