Electronic data

  • USTRC.zip

    410 KB, multipart/x-zip


    Available under license: CC BY-NC-SA

Urdu Short Text Reuse Corpus (USTRC)


  • Sara Sameen (Creator)
  • Sharjeel Muhammad (Creator)
  • Rao Muhammad Adeel Nawab (Creator)
  • Paul Rayson (Creator)
  • Iqra Muneer (Creator)


USTRC is a gold-standard benchmark corpus for measuring short text reuse in the Urdu language. It contains a total of 2,684 source-reused short text pairs.
Date made available: 2017
Publisher: Lancaster University
Date of data production: 2017
