UNLT: Urdu Natural Language Toolkit
Electronic data
UNLT.zip
25.8 MB, multipart/x-zip
Dataset
Available under license:
CC BY
Date added:
11/11/21
DOI
https://doi.org/10.17635/lancaster/researchdata/494
Jawad Shafi
(Creator)
Rao Muhammad Adeel Nawab
(Creator)
Paul Rayson
(Creator)
Rizwan Iqbal
(Creator)
Data Science Institute
Computing and Communications
UCREL - University Centre for Computer Corpus Research on Language
DSI - Foundations
Data Science
Description
The zip file contains the first version of UNLT (Urdu Natural Language Toolkit), which includes three key text processing tools required for an Urdu NLP pipeline: a word tokenizer, a sentence tokenizer, and a part-of-speech (POS) tagger.
Date made available
2021
Publisher
Lancaster University
Date of data production
2021
Contact person
rdm@lancaster.ac.uk
Links
GitHub repository