Welsh Summary Creator Dataset

Dataset

  • Jonathan Morris (Creator)
  • Mo El-Haj (Creator)
  • Dawn Knight (Creator)
  • Ignatius Ezeani (Creator)
  • Dawn Knight (Contributor)
  • Mo El-Haj (Contributor)
  • Ignatius Ezeani (Contributor)
  • Jonathan Morris (Contributor)

Description

This is a collection of 513 Welsh texts (Wikipedia articles) and their summaries. Each article, at least 500 tokens long, was extracted together with its Wikipedia summary using the WikipediaAPI. The raw files, which contain the extracted articles and summaries exactly as they appear on Wikipedia, are available in data.zip in HTML and plain-text formats and are licensed under a Creative Commons Attribution 4.0 International License. The Python scripts for accessing the extracted and processed files can be viewed and used with the accompanying Colab file, following the usage instructions described below.
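As a rough illustration of working with the raw files, the sketch below reads plain-text members from a zip archive and checks them against the 500-token minimum mentioned in the description. It is not the dataset's own script: the `.txt` suffix, the UTF-8 encoding, the whitespace tokenization, and the helper names `read_text_files` and `long_enough` are all assumptions made for this example, and the internal layout of data.zip may differ.

```python
import zipfile

MIN_TOKENS = 500  # minimum article length stated in the description


def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens (an assumed tokenization)."""
    return len(text.split())


def long_enough(text: str, min_tokens: int = MIN_TOKENS) -> bool:
    """Return True if the text has at least min_tokens tokens."""
    return count_tokens(text) >= min_tokens


def read_text_files(archive, suffix: str = ".txt") -> dict:
    """Map archive member names to decoded text.

    `archive` may be a path or a file-like object. Only members whose
    names end with `suffix` are read (assumed plain-text naming), and
    UTF-8 encoding is assumed.
    """
    with zipfile.ZipFile(archive) as zf:
        return {
            name: zf.read(name).decode("utf-8")
            for name in zf.namelist()
            if name.endswith(suffix)
        }
```

Under those assumptions, `read_text_files("data.zip")` would return the plain-text files, and `long_enough(text)` could be used to confirm that a given article meets the length threshold.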
Date made available: 2022
Publisher: European Language Grid

Contact person