Accepted author manuscript, 3.28 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - Learning with noise
T2 - Annual Meeting of the Association for Computational Linguistics
AU - Luo, Binfeng
AU - Feng, Yansong
AU - Wang, Zheng
AU - Zhu, Zhanxing
AU - Huang, Songfang
AU - Yan, Rui
AU - Zhao, Dongyan
PY - 2017/7/30
Y1 - 2017/7/30
N2 - Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction. We show that the dynamic transition matrix can effectively characterize the noise in the training data built by distant supervision. The transition matrix can be effectively trained using a novel curriculum learning based method without any direct supervision about the noise. We thoroughly evaluate our approach under a wide range of extraction scenarios. Experimental results show that our approach consistently improves the extraction results and outperforms the state-of-the-art in various evaluation scenarios.
AB - Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction. We show that the dynamic transition matrix can effectively characterize the noise in the training data built by distant supervision. The transition matrix can be effectively trained using a novel curriculum learning based method without any direct supervision about the noise. We thoroughly evaluate our approach under a wide range of extraction scenarios. Experimental results show that our approach consistently improves the extraction results and outperforms the state-of-the-art in various evaluation scenarios.
U2 - 10.18653/v1/P17-1040
DO - 10.18653/v1/P17-1040
M3 - Conference contribution/Paper
SN - 9781945626753
BT - The 55th Annual Meeting of the Association for Computational Linguistics
PB - Association for Computational Linguistics
CY - Stroudsburg, Pa.
Y2 - 30 July 2017 through 4 August 2017
ER -