
Electronic data

  • 2023.icon-1.58

    Final published version, 1.15 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting. / Lal, Daisy; Rayson, Paul; Singh, Krishna Pratap et al.
Proceedings of the 20th International Conference on Natural Language Processing (ICON). ed. / Jyoti D. Pawar; Sobha Lalitha Devi. NLP Association of India (NLPAI), 2023. p. 603-612.

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Harvard

Lal, D, Rayson, P, Singh, KP & Tiwary, US 2023, Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting. in JD Pawar & SL Devi (eds), Proceedings of the 20th International Conference on Natural Language Processing (ICON). NLP Association of India (NLPAI), pp. 603-612. <https://aclanthology.org/2023.icon-1.58/>

APA

Lal, D., Rayson, P., Singh, K. P., & Tiwary, U. S. (2023). Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting. In J. D. Pawar, & S. L. Devi (Eds.), Proceedings of the 20th International Conference on Natural Language Processing (ICON) (pp. 603-612). NLP Association of India (NLPAI). https://aclanthology.org/2023.icon-1.58/

Vancouver

Lal D, Rayson P, Singh KP, Tiwary US. Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting. In Pawar JD, Devi SL, editors, Proceedings of the 20th International Conference on Natural Language Processing (ICON). NLP Association of India (NLPAI). 2023. p. 603-612

Author

Lal, Daisy ; Rayson, Paul ; Singh, Krishna Pratap et al. / Abstractive Hindi Text Summarization : A Challenge in a Low-Resource Setting. Proceedings of the 20th International Conference on Natural Language Processing (ICON). editor / Jyoti D. Pawar ; Sobha Lalitha Devi. NLP Association of India (NLPAI), 2023. pp. 603-612

Bibtex

@inproceedings{6323f2543a4a41e3a258005ef14fa763,
title = "Abstractive Hindi Text Summarization: A Challenge in a Low-Resource Setting",
abstract = "The Internet has led to a surge in text data in Indian languages; hence, text summarization tools have become essential for information retrieval. Due to a lack of data resources, prevailing summarizing systems in Indian languages have been primarily dependent on and derived from English text summarization approaches. Despite Hindi being the most widely spoken language in India, progress in Hindi summarization is being delayed due to the lack of proper labeled datasets. In this preliminary work we address two major challenges in abstractive Hindi text summarization: creating Hindi language summaries and assessing the efficacy of the produced summaries. Since transfer learning (TL) has shown to be effective in low-resource settings, in order to assess the effectiveness of TL-based approach for summarizing Hindi text, we perform a comparative analysis using three encoder-decoder models: attention-based (BASE), multi-level (MED), and TL-based model (RETRAIN). In relation to the second challenge, we introduce the ICE-H evaluation metric based on the ICE metric for assessing English language summaries. The Rouge and ICE-H metrics are used for evaluating the BASE, MED, and RETRAIN models. According to the Rouge results, the RETRAIN model produces slightly better abstracts than the BASE and MED models for 20k and 100k training samples. The ICE-H metric, on the other hand, produces inconclusive results, which may be attributed to the limitations of existing Hindi NLP resources, such as word embeddings and POS taggers.",
author = "Daisy Lal and Paul Rayson and Singh, {Krishna Patap} and Tiwary, {Uma Shanker}",
year = "2023",
month = dec,
day = "17",
language = "English",
pages = "603--612",
editor = "Pawar, {Jyoti D.} and Devi, {Spbha Lalitha}",
booktitle = "Proceedings of the 20th International Conference on Natural Language Processing (ICON)",
publisher = "NLP Association of India (NLPAI)",

}

RIS

TY - GEN

T1 - Abstractive Hindi Text Summarization

T2 - A Challenge in a Low-Resource Setting

AU - Lal, Daisy

AU - Rayson, Paul

AU - Singh, Krishna Pratap

AU - Tiwary, Uma Shanker

PY - 2023/12/17

Y1 - 2023/12/17

N2 - The Internet has led to a surge in text data in Indian languages; hence, text summarization tools have become essential for information retrieval. Due to a lack of data resources, prevailing summarization systems in Indian languages have been primarily dependent on and derived from English text summarization approaches. Despite Hindi being the most widely spoken language in India, progress in Hindi summarization is being delayed by the lack of properly labeled datasets. In this preliminary work, we address two major challenges in abstractive Hindi text summarization: creating Hindi language summaries and assessing the efficacy of the produced summaries. Since transfer learning (TL) has been shown to be effective in low-resource settings, we assess the effectiveness of a TL-based approach for summarizing Hindi text through a comparative analysis of three encoder-decoder models: an attention-based model (BASE), a multi-level model (MED), and a TL-based model (RETRAIN). In relation to the second challenge, we introduce the ICE-H evaluation metric, based on the ICE metric for assessing English language summaries. The Rouge and ICE-H metrics are used for evaluating the BASE, MED, and RETRAIN models. According to the Rouge results, the RETRAIN model produces slightly better abstracts than the BASE and MED models for 20k and 100k training samples. The ICE-H metric, on the other hand, produces inconclusive results, which may be attributed to the limitations of existing Hindi NLP resources, such as word embeddings and POS taggers.

AB - The Internet has led to a surge in text data in Indian languages; hence, text summarization tools have become essential for information retrieval. Due to a lack of data resources, prevailing summarization systems in Indian languages have been primarily dependent on and derived from English text summarization approaches. Despite Hindi being the most widely spoken language in India, progress in Hindi summarization is being delayed by the lack of properly labeled datasets. In this preliminary work, we address two major challenges in abstractive Hindi text summarization: creating Hindi language summaries and assessing the efficacy of the produced summaries. Since transfer learning (TL) has been shown to be effective in low-resource settings, we assess the effectiveness of a TL-based approach for summarizing Hindi text through a comparative analysis of three encoder-decoder models: an attention-based model (BASE), a multi-level model (MED), and a TL-based model (RETRAIN). In relation to the second challenge, we introduce the ICE-H evaluation metric, based on the ICE metric for assessing English language summaries. The Rouge and ICE-H metrics are used for evaluating the BASE, MED, and RETRAIN models. According to the Rouge results, the RETRAIN model produces slightly better abstracts than the BASE and MED models for 20k and 100k training samples. The ICE-H metric, on the other hand, produces inconclusive results, which may be attributed to the limitations of existing Hindi NLP resources, such as word embeddings and POS taggers.

M3 - Conference contribution/Paper

SP - 603

EP - 612

BT - Proceedings of the 20th International Conference on Natural Language Processing (ICON)

A2 - Pawar, Jyoti D.

A2 - Devi, Sobha Lalitha

PB - NLP Association of India (NLPAI)

UR - https://aclanthology.org/2023.icon-1.58/

ER -
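
Illustrative example

For readers who want a feel for the Rouge-style comparison described in the abstract, the sketch below shows a minimal, self-contained ROUGE-N computation over whitespace-tokenized Hindi strings. It is not the authors' evaluation pipeline: the whitespace tokenization, the toy sentences, and the function names are illustrative assumptions only.

from collections import Counter

# Minimal illustrative ROUGE-N sketch for Hindi summaries.
# Assumption: plain whitespace tokenization; the paper's actual
# evaluation setup may tokenize and normalize Hindi text differently.

def ngrams(tokens, n):
    # Multiset of n-grams from a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, candidate, n=1):
    # Clipped n-gram overlap between a reference summary and a system summary,
    # returned as (precision, recall, F1).
    ref_counts = ngrams(reference.split(), n)
    cand_counts = ngrams(candidate.split(), n)
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # Toy (hypothetical) reference and system summary in Hindi.
    reference = "भारत में हिंदी सबसे अधिक बोली जाने वाली भाषा है"
    candidate = "हिंदी भारत में सबसे अधिक बोली जाने वाली भाषा है"
    print("ROUGE-1 (P, R, F1):", rouge_n(reference, candidate, n=1))
    print("ROUGE-2 (P, R, F1):", rouge_n(reference, candidate, n=2))

In practice, scores like these would be averaged over a test set and reported per model (BASE, MED, RETRAIN); the ICE-H metric introduced in the paper additionally draws on Hindi word embeddings and POS taggers, which this sketch does not attempt to reproduce.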