Name | Description | Annotation | Items | Format | Availability | Link |
---|---|---|---|---|---|---|
Disco | Diachronic Spanish Sonnet Corpus | Metric and enjambment | 4087 sonnets | Plain text, TEI | Public | https://github.com/linhd-postdata/disco |
Carvajal | Corpus of Spanish poems written by Antonio Carvajal Milena | Metric and rhyme | 300 poems | Plain text | Private | – |
Poesi.as | Corpus of Spanish poetry | – | ~80,000 poems | Plain text | Public | https://github.com/linhd-postdata/poesi.as |
Fandom lyrics | Corpus of Spanish songs | – | ~100,000 songs | Plain text | Private | – |
Gongocorpus | Corpus of Góngora’s poetry | Metric | 241 poems | Plain text | Public | https://github.com/linhd-postdata/gongocorpus |
Biblioteca Italiana | Corpus of Italian poems | – | 18,000 poems | Plain text, TEI | Public | https://github.com/linhd-postdata/biblioteca_italiana |
Middle High German Conceptual Database | Subset of the poems available in the MHDBDB at the University of Salzburg | Metric | ~500,000 verses | Plain text | Public | https://github.com/linhd-postdata/MHDBDB |
Poeti d’Italia | Corpus of Italian poems | – | 823 poems | Plain text, TEI, HTML | Public | https://github.com/linhd-postdata/poetiditalia |
Pedecerto | Corpus of Latin poems | Metric | 457 poems | XML | Private | – |
TextGrid Poetry Corpus | Corpus of German poems | – | 100,000 poems | Plain text | Public | https://github.com/linhd-postdata/textgrid-poetry |
Project Gutenberg Poetry Corpora | Corpus of verses in Dutch, English, French, German, Italian, Portuguese, and Spanish | – | 3 million verses | Plain text | Public | https://github.com/linhd-postdata/projectgutenberg-poetry-corpora |
Miladinovci | Corpus of Macedonian songs | – | 678 songs | Plain text | Private | – |
Métrique en Ligne | Corpus of French poems | Rhyme | 5,000 poems | Plain text | Public | https://github.com/linhd-postdata/metrique-en-ligne |
Hismetag corpus | Corpus of texts in medieval Spanish | Named entities | 10 texts | Plain text, XML | Public | https://github.com/linhd-postdata/hismetag-corpus-manually-tag |
EDFU | Corpus of syllabified Spanish words | Syllables | ~100,000 words | Plain text | Public | https://github.com/linhd-postdata/edfu |
PULPO | Corpus of verses in Spanish, English, French, Italian, Czech, Portuguese, Arabic, Chinese, Finnish, Hungarian, Russian, and German | Language | ~95 million words | Plain text | Public | https://huggingface.co/datasets/linhd-postdata/pulpo |
Spanish Stanzas | Corpus of Spanish stanzas | Stanzas | 5005 stanzas | Plain text | Public+Private | https://github.com/linhd-postdata/stanzas-evalution-public |

Corpora and datasets. Salva Ros, 2022-06-29.