Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - Related or duplicate
T2 - 41st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018
AU - Zhang, Wei Emma
AU - Sheng, Quan Z.
AU - Tang, Zhejun
AU - Ruan, Wenjie
PY - 2018/6/27
Y1 - 2018/6/27
N2 - Plenty of research attempts target the automatic duplicate detection in Community Question Answering (CQA) systems and frame the task as a supervised learning problem on the question pairs. However, these methods rely on handcrafted features, leading to the difficulty of distinguishing related and duplicate questions as they are often textually similar. To tackle this issue, we propose to leverage neural network architecture to extract "deep" features to identify whether a question pair is duplicate or related. In particular, we construct question correlation matrices, which capture the word-wise similarities between questions. The constructed matrices are input to our proposed convolutional neural network (CNN), in which the convolutional operation moves through the two dimensions of the matrices. Empirical studies on a range of real-world CQA datasets confirm the effectiveness of our proposed correlation matrices and the CNN. Our method outperforms the state-of-the-art methods and achieves better classification performance.
AB - Plenty of research attempts target the automatic duplicate detection in Community Question Answering (CQA) systems and frame the task as a supervised learning problem on the question pairs. However, these methods rely on handcrafted features, leading to the difficulty of distinguishing related and duplicate questions as they are often textually similar. To tackle this issue, we propose to leverage neural network architecture to extract "deep" features to identify whether a question pair is duplicate or related. In particular, we construct question correlation matrices, which capture the word-wise similarities between questions. The constructed matrices are input to our proposed convolutional neural network (CNN), in which the convolutional operation moves through the two dimensions of the matrices. Empirical studies on a range of real-world CQA datasets confirm the effectiveness of our proposed correlation matrices and the CNN. Our method outperforms the state-of-the-art methods and achieves better classification performance.
KW - Convolutional neural networks
KW - Question answering
KW - Search quality
U2 - 10.1145/3209978.3210110
DO - 10.1145/3209978.3210110
M3 - Conference contribution/Paper
AN - SCOPUS:85051505258
SP - 1153
EP - 1156
BT - SIGIR '18 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
PB - Association for Computing Machinery, Inc
CY - New York
Y2 - 8 July 2018 through 12 July 2018
ER -