Professor Paul Rayson

Professor of Natural Language Processing

  1. Urdu Paraphrase Plagiarism Corpus (UPPC)

    Muhammad, S. (Creator), Rayson, P. (Creator), Nawab, R. M. A. (Creator), Lancaster University, 2016, 10.17635/lancaster/researchdata/67

    Dataset

  2. COrpus of Urdu News TExt Reuse (COUNTER)

    Muhammad, S. (Creator), Nawab, R. M. A. (Creator), Rayson, P. (Creator), Lancaster University, 2016, 10.17635/lancaster/researchdata/96

    Dataset

  3. Cross-Language English-Urdu Corpus (CLEU)

    Muneer, I. (Creator), Muhammad, S. (Creator), Iqbal, M. (Creator), Nawab, R. M. A. (Creator), Rayson, P. (Creator), Lancaster University, 2017, 10.17635/lancaster/researchdata/176

    Dataset

  4. Urdu Short Text Reuse Corpus (USTRC)

    Sameen, S. (Creator), Muhammad, S. (Creator), Nawab, R. M. A. (Creator), Rayson, P. (Creator), Muneer, I. (Creator), Lancaster University, 2017, 10.17635/lancaster/researchdata/192

    Dataset

  5. N-gram list for the StratScore metric

    Athanasakou, V. (Creator), El-Haj, M. (Creator), Rayson, P. (Creator), Walker, M. (Creator), Young, S. (Creator), Lancaster University, 2018, 10.17635/lancaster/researchdata/232

    Dataset

  6. UK Annual Reports Key Sections

    El-Haj, M. (Creator), Young, S. (Creator), Rayson, P. (Creator), Lancaster University, 28/02/2019, 10.17635/lancaster/researchdata/262

    Dataset

  7. Annual Reports Key Sections Corpora 2003 to 2017

    El-Haj, M. (Creator), Young, S. (Creator), Rayson, P. (Creator), Lancaster University, 13/03/2019, 10.17635/lancaster/researchdata/271

    Dataset

  8. Arabic tweets about infectious diseases

    Alsudias, L. (Creator), Rayson, P. (Creator), Lancaster University, 21/06/2019, 10.17635/lancaster/researchdata/303

    Dataset

  9. Arabic Infectious Disease Ontology

    Alsudias, L. (Creator), Rayson, P. (Creator), Lancaster University, 25/02/2020, 10.17635/lancaster/researchdata/350

    Dataset

  10. Igbo-English Machine Translation: An Evaluation Benchmark

    Ezeani, I. (Creator), Onyenwe, I. E. (Creator), Chinedu, U. (Creator), Rayson, P. (Creator), Hepple, M. (Creator), GitHub, 01/04/2020

    Dataset

  11. Human Judgements of Sentiment Values

    Pak, I. (Creator), Teh, P. L. (Creator), Rayson, P. (Creator), Piao, S. (Creator), Ho, J. S. Y. (Creator), Moore, A. (Creator), Cheah, Y. (Creator), Lancaster University, 2020, 10.17635/lancaster/researchdata/368

    Dataset

  12. COVID-19 Arabic tweets

    Alsudias, L. (Creator), Rayson, P. (Creator), Lancaster University, 07/07/2020, 10.17635/lancaster/researchdata/375

    Dataset

  13. Data and scripts for extracting plant names and collocates from historical texts

    Smail, R. (Creator), Donaldson, C. (Creator), Stevens, C. (Creator), Rayson, P. (Creator), Govaerts, R. (Creator), Lancaster University, 2020, 10.17635/lancaster/researchdata/385

    Dataset

  14. COVID-19 Arabic tweets

    Alsudias, L. (Creator), Rayson, P. (Creator), Lancaster University, 2020, 10.17635/lancaster/researchdata/394

    Dataset

  15. UNLT: Urdu Natural Language Toolkit

    Shafi, J. (Creator), Nawab, R. M. A. (Creator), Rayson, P. (Creator), Iqbal, R. (Creator), Lancaster University, 2021, 10.17635/lancaster/researchdata/494

    Dataset

  16. CorCenCC: Corpws Cenedlaethol Cymraeg Cyfoes – the National Corpus of Contemporary Welsh

    Knight, D. (Creator), Morris, S. (Creator), Fitzpatrick, T. (Creator), Rayson, P. (Creator), Spasić, I. (Creator), Thomas, E. M. (Creator), Lovell, A. (Creator), Morris, J. (Creator), Evas, J. (Creator), Stonelake, M. (Creator), Arman, L. (Creator), Davies, J. (Creator), Ezeani, I. (Creator), Neale, S. (Creator), Needs, J. (Creator), Piao, S. (Creator), Rees, M. (Creator), Watkins, G. (Creator), Williams, L. (Creator), Muralidaran, V. (Creator), Tovey-Walsh, B. (Creator), Anthony, L. (Creator), Cobb, T. M. (Creator), Deuchar, M. (Creator), Donnelly, K. (Creator), McCarthy, M. (Creator), Scannell, K. (Creator), Cardiff University, 2020, 10.17035/d.2020.0119878310

    Dataset

  17. E-commerce dataset

    Pak, I. (Creator), Mendeley Data, 27/11/2019, 10.17632/vwdmctbkr9.2

    Dataset