UPPC - Urdu Paraphrase Plagiarism Corpus - Research Portal

Electronic data

uppc-urdu-paraphrase
Accepted author manuscript, 333 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Keywords

Paraphrase Plagiarism, Corpus Generation, Urdu Plagiarism Detection, Natural Language Processing

UPPC - Urdu Paraphrase Plagiarism Corpus

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Sharjeel Muhammad
Paul Edward Rayson
Rao Muhammad Adeel Nawab

Publication date	23/05/2016
Host publication	Proceedings of LREC 2016, Tenth International Conference on Language Resources and Evaluation
Editors	Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Publisher	European Language Resources Association (ELRA)
Pages	1832-1836
Number of pages	5
ISBN (print)	9782951740891
Original language	English

Abstract

Paraphrase plagiarism is a significant and widespread problem, and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluation and comparison of such solutions are not possible because of the unavailability of benchmark corpora with manual examples of paraphrase plagiarism. To deal with this issue, we present the novel development of a paraphrase plagiarism corpus containing simulated (manually created) examples in the Urdu language, a language widely spoken around the world. This resource is the first of its kind developed for the Urdu language, and we believe that it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.
