The Causal News Corpus: Annotating Causal Relations in Event Sentences from News

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

The Causal News Corpus: Annotating Causal Relations in Event Sentences from News. / Tan, Fiona Anting; Hürriyetoğlu, Ali; Caselli, Tommaso et al.
Proceedings of the Thirteenth Language Resources and Evaluation Conference. Paris: European Language Resources Association (ELRA), 2022. p. 2298-2310.

Harvard

Tan, FA, Hürriyetoğlu, A, Caselli, T, Oostdijk, N, Nomoto, T, Hettiarachchi, H, Ameer, I, Uca, O, Liza, FF & Hu, T 2022, The Causal News Corpus: Annotating Causal Relations in Event Sentences from News. in Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association (ELRA), Paris, pp. 2298-2310. <https://aclanthology.org/2022.lrec-1.246>

APA

Tan, F. A., Hürriyetoğlu, A., Caselli, T., Oostdijk, N., Nomoto, T., Hettiarachchi, H., Ameer, I., Uca, O., Liza, F. F., & Hu, T. (2022). The Causal News Corpus: Annotating Causal Relations in Event Sentences from News. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 2298-2310). European Language Resources Association (ELRA). https://aclanthology.org/2022.lrec-1.246

Vancouver

Tan FA, Hürriyetoğlu A, Caselli T, Oostdijk N, Nomoto T, Hettiarachchi H et al. The Causal News Corpus: Annotating Causal Relations in Event Sentences from News. In Proceedings of the Thirteenth Language Resources and Evaluation Conference. Paris: European Language Resources Association (ELRA). 2022. p. 2298-2310

Author

Tan, Fiona Anting ; Hürriyetoğlu, Ali ; Caselli, Tommaso et al. / The Causal News Corpus : Annotating Causal Relations in Event Sentences from News. Proceedings of the Thirteenth Language Resources and Evaluation Conference. Paris : European Language Resources Association (ELRA), 2022. pp. 2298-2310

Bibtex

@inproceedings{e37f439274614a91b249c2c0a74a910a,
title = "The Causal News Corpus: Annotating Causal Relations in Event Sentences from News",
abstract = "Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether it contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on test set, and 83.46% in 5-folds cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.",
author = "Tan, {Fiona Anting} and Ali H{\"u}rriyetoğlu and Tommaso Caselli and Nelleke Oostdijk and Tadashi Nomoto and Hansi Hettiarachchi and Iqra Ameer and Onur Uca and Liza, {Farhana Ferdousi} and Tiancheng Hu",
year = "2022",
month = jun,
day = "1",
language = "English",
pages = "2298--2310",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
publisher = "European Language Resources Association (ELRA)",
}

RIS

TY - GEN

T1 - The Causal News Corpus

T2 - Annotating Causal Relations in Event Sentences from News

AU - Tan, Fiona Anting

AU - Hürriyetoğlu, Ali

AU - Caselli, Tommaso

AU - Oostdijk, Nelleke

AU - Nomoto, Tadashi

AU - Hettiarachchi, Hansi

AU - Ameer, Iqra

AU - Uca, Onur

AU - Liza, Farhana Ferdousi

AU - Hu, Tiancheng

PY - 2022/6/1

Y1 - 2022/6/1

N2 - Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to include only explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels on whether it contains causal relations or not. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well with 81.20% F1 score on test set, and 83.46% in 5-folds cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task to the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.

M3 - Conference contribution/Paper

SP - 2298

EP - 2310

BT - Proceedings of the Thirteenth Language Resources and Evaluation Conference

PB - European Language Resources Association (ELRA)

CY - Paris

ER -