With natural language processing (NLP), researchers aim to get computers to identify and understand the patterns in human languages. This is often difficult because a language embeds many dynamic and varied properties in its syntax, pragmatics, and phonology, all of which need to be captured and processed. Over 95% of the world’s 7000 languages are low-resourced for NLP, i.e., they have little or no data, tools, and techniques for NLP work.
This project contributes to efforts to bridge the digital divide between the well-resourced (and well-researched) languages and the rest by building a translation benchmark dataset and a baseline model for the Igbo language.