Electronic data

  • UPPC.zip

    211 KB, multipart/x-zip

    Text

    Available under license: CC BY-NC-SA

Urdu Paraphrase Plagiarism Corpus (UPPC)

Dataset

  • Sharjeel Muhammad (Creator)
  • Paul Rayson (Creator)
  • Rao Muhammad Adeel Nawab (Creator)

Description

This corpus contains 160 Urdu text documents in total. Of these, 20 are original Wikipedia articles on well-known people; the remaining 140, manually created by volunteers, are paraphrased plagiarised and non-plagiarised versions of those articles. Of the 140, 75 were paraphrased by 5 university students using different paraphrasing techniques, and 65 were written independently, without consulting the source article.
Date made available: 2016
Publisher: Lancaster University

Contact person