
Electronic data

  • CLEU.zip

    723 KB, multipart/x-zip


    Available under license: CC BY-NC-SA


Cross-Language English-Urdu Corpus (CLEU)


  • Iqra Muneer (Creator)
  • Sharjeel Muhammad (Creator)
  • Muntaha Iqbal (Creator)
  • Rao Muhammad Adeel Nawab (Creator)
  • Paul Rayson (Creator)


The Cross-Language English-Urdu Corpus (CLEU) pairs source texts in English with derived texts in Urdu. It contains 3,235 sentence/passage pairs in total, each manually tagged with one of three categories: near copy, paraphrased copy, or independently written.
Date made available: 2017
Publisher: Lancaster University

Contact person