TOWARDS A NETWORK OF ONTOLOGIES FOR EUROPEAN POETRY
The main goal of the POSTDATA project is to achieve the standardisation of poetry and to publish the data on poetic works and their analysis as linked open data (LOD). This objective has been approached from two points of view: the philological and the technological.
From the philological point of view, we aimed to develop an abstract model for the representation of poetry. This model was based on existing philological concepts that were present in a representative set of the research projects, manuals, and corpora.
From the technological point of view, the goal of the project was to formalize this philological standardization into Digital Humanities standards. For this purpose, we built an ontological model using Semantic Web technologies and W3C standards (such as OWL). Thus, we have enabled the publication of the metadata extracted from the philological conceptualization as LOD, ready to be shared, linked and improved by the community of practice.
In order to achieve these objectives, we implemented the following steps:
- A comparative analysis of projects and digital repertoires of different poetic traditions. This was the starting point to retrieve the main conceptual elements and properties in order to build a domain model.
- The construction of a domain model that captures the concepts and relationships of the domain of knowledge, that is, European poetry.
- The construction of a network of ontologies that becomes an interoperable standard in the field of the Semantic Web to represent the domain of European poetry.
- The storage and publication of data in a format that makes it accessible as LOD.
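As a rough illustration of the last step, the sketch below serialises two statements about a poetic work as N-Triples, one of the W3C RDF serialisations that can be used for LOD. The `http://example.org/poetry/` namespace, the `PoeticWork` class and the sample title are assumptions made for this example only; they are not the POSTDATA vocabulary, and the sketch does not describe the project's actual publication pipeline.

```python
from typing import Optional

# Hypothetical namespace and class name, for illustration only;
# they are NOT the POSTDATA vocabulary.
EX = "http://example.org/poetry/"
# Real, widely used property IRIs (RDF and Dublin Core Terms).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Serialise a string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples) -> str:
    """Join (subject, predicate, object) terms into one N-Triples line each."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Two statements about one invented resource.
triples = [
    (iri(EX + "work/1"), iri(RDF_TYPE), iri(EX + "PoeticWork")),
    (iri(EX + "work/1"), iri(DCT_TITLE), literal("An example poem", "en")),
]
print(to_ntriples(triples))
```

In practice an RDF library would normally handle serialisation; the hand-written version is used here only to keep the example free of dependencies.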
Starting point: Analysis of repertoires
The starting point for the definition of the Domain Model for the European Poetry (DM-EP) is a set of twenty-five repertoires, most of them available on the Web of Documents. These twenty-five repertoires are served by databases, where the data models are relational or hierarchical.
The repertoires represent different poetry traditions, languages and cultures. The criterion that weighed most in their selection was availability, both in terms of access to their internal structure and in terms of our research team's ability to understand and analyse their contents. Nevertheless, great efforts were made to gather a representative sample, for which language, period of composition, and prosodic system (metre) were the defining criteria.
POSTDATA contacted delegates from the different repertoires, inviting them to participate as stakeholders of the project. The delegates were asked to send the structure of the databases and any additional documentation in order for the researchers to be able to analyse and study the data models. Thus, the starting point for the analysis was a set of MySQL dumps, XSD and XML files, Perl scripts, and spreadsheets.
Some of the repositories analyzed are:
- Corpus of Czech Verse
- Eighteenth-Century Poetry Archive
- Estonian Runic Songs
- MedDB – Base de datos da Lírica Profana Galego-Portuguesa
- Nederlandse liederenbank
- Répertoire de la poésie hongroise ancienne
- Skaldic Poetry of the Scandinavian Middle Ages
* For more information, see the publication:
Curado Malta, Mariana, Helena Bermúdez-Sabel, Ana Alice Baptista, and Elena Gonzalez-Blanco. 2018. ‘Validation of a metadata application profile domain model’. International Conference on Dublin Core and Metadata Applications, (18), 65–75. Retrieved from http://dcevents.dublincore.org/IntConf/dc-2018/paper/view/555/675
The map below, also available in this link, locates each repertoire in one of the countries where its poetic tradition originated, and groups the repertoires according to chronological criteria.
Illustration 1. Map of repertoires
The definition of the domain model, a common conceptual model that should represent the informational needs of the European poetry (EP) community of practice, integrates the data requirements that result from defining the functional requirements, together with the results of the following sub-activities:
- Analysis of the data model of a representative sample of EP databases.
- Analysis of a survey addressed to the final users of the repertoires in order to understand the data needs of the users of poetry databases.
We applied a reverse engineering approach using software engineering techniques. To extract and compare all the concepts in each data model and to construct a common model out of them, the work team decided to build conceptual models for each one of the data models analysed. This process is described in detail in Curado Malta, Centenera, and Gonzalez-Blanco (2017)1 and Bermúdez-Sabel, Curado Malta, and Gonzalez-Blanco (2017)2.
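To give a flavour of what reverse engineering a relational data model can involve, the sketch below reads a small MySQL-style schema and lists candidate entities, attributes and relations. It is only an illustration under strong assumptions: the `poem` and `author` tables are invented, the regular expressions cover just this simple layout, and the procedure actually followed by the team is the one described in the cited publications, not this script.

```python
import re

# Hypothetical schema fragment, invented for illustration; it is NOT taken
# from any of the repertoires analysed.
DUMP = """
CREATE TABLE `poem` (
  `id` int(11) NOT NULL,
  `title` varchar(255) DEFAULT NULL,
  `author_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `fk_author` FOREIGN KEY (`author_id`) REFERENCES `author` (`id`)
);
CREATE TABLE `author` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
);
"""

# Each CREATE TABLE statement becomes a candidate entity.
TABLE_RE = re.compile(r"CREATE TABLE `(\w+)` \((.*?)\n\);", re.S)
# A line that starts with a back-quoted name is a column (candidate attribute).
COLUMN_RE = re.compile(r"^[ \t]*`(\w+)` ", re.M)
# A foreign key points at another table (candidate relation).
FK_RE = re.compile(r"FOREIGN KEY \(`(\w+)`\) REFERENCES `(\w+)`")


def extract_model(dump):
    """Return ({entity: [attributes]}, [(source, column, target)])."""
    entities, relations = {}, []
    for table, body in TABLE_RE.findall(dump):
        entities[table] = COLUMN_RE.findall(body)
        for column, target in FK_RE.findall(body):
            relations.append((table, column, target))
    return entities, relations


entities, relations = extract_model(DUMP)
for name, attributes in entities.items():
    print(name, attributes)
for source, column, target in relations:
    print(f"{source}.{column} -> {target}")
```

A listing of this kind is only raw material: deciding which tables and columns correspond to genuine philological concepts still requires manual analysis.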
The DM-EP was developed iteratively: over time, information from different sources was collected and incorporated. The diagram below shows the workflow of the Domain Model development process.
Illustration 2. Workflow of the Domain Model development process.
* For more information, see the deliverable Poetry Standardization and Linked Open Data
The domain model
The DM-EP is a model with 40 entities, 494 attributes and 409 relations.
The complete model is available at http://postdata-prototype.linhd.uned.es/domain-model.php
This model is very complex due mainly to its comprehensiveness. Therefore, to facilitate its understanding and visualization, we identified certain areas of knowledge to conduct a simpler study and analysis. Each area includes the concepts and properties related to the theme of the area. In addition, there is a division that contains a miscellany of concepts that are related to the works but do not refer to a specific domain of knowledge. These areas are shown in the image below that corresponds to the main page of the model visualization tool.
Illustration 3. Visualization tool page
1 Curado Malta, Mariana, Paloma Centenera, and Elena Gonzalez-Blanco. 2017. ‘Using Reverse Engineering to Define a Domain Model: The Case of the Development of a Metadata Application Profile for European Poetry’. In Developing Metadata Application Profiles, edited by Mariana Curado Malta, Ana Alice Baptista, and Paul Walk, 146–80. IGI Global. https://doi.org/10.4018/978-1-5225-2221-8.
2 Bermúdez-Sabel, Helena, Mariana Curado Malta, and Elena Gonzalez-Blanco. 2017. ‘Towards Interoperability in the European Poetry Community: The Standardization of Philological Concepts’. In Language, Data, and Knowledge: First International Conference, LDK 2017, Galway, Ireland, June 19-20, 2017, Proceedings, edited by Jorge Gracia, Francis Bond, John P. McCrae, Paul Buitelaar, Christian Chiarcos, and Sebastian Hellmann, 156–65. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-59888-8_14.
As mentioned, one of the aims of the POSTDATA project is to create a platform where poetry researchers can publish their semantically enriched data about European poetry as LOD.
Consequently, the development of an ontology will assist researchers with the organization of their information and with the application of computational methods to their data. Our intention is to build a platform that facilitates communication within the European poetry research community by making their data interoperable. Moreover, this platform will enable the creation of numerous applications to retrieve, query and edit the data.
Due to the complexity of the domain, we took the areas of knowledge identified in the model as the basis for a modular design of the ontological model. These areas were further divided into subdomains so as to manage the complexity and facilitate the transition towards the ontology.
The semantic scope of these areas and subdomains was analysed thoroughly, which helped us to determine which subdomains correspond to a complete and independent ontology and which should be merged into a larger ontology.
For this task, we applied the following criteria to decide which ontologies had to be created:
- Strong cohesion within every ontology: the classes in an ontology are related to one another, and no external class is needed to complete their coherence and definition. The ontologies are therefore self-contained.
- Weak coupling: there is a small number of relations between ontologies. These relations must enrich the knowledge provided by the ontology and take into account whether additional features need to be covered.
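The two criteria can be made concrete with a simple count: relations whose two classes fall in the same module speak for cohesion, while relations that cross module boundaries measure coupling. The following sketch shows that count under stated assumptions; the class names, module names and relations are invented for the example and do not reproduce the POSTDATA network of ontologies, and the text does not say that the team computed such figures.

```python
from collections import Counter

# Hypothetical assignment of classes to candidate modules, for illustration.
MODULE_OF = {
    "Poem": "core",
    "Author": "core",
    "Stanza": "structure",
    "Line": "structure",
    "Metre": "prosody",
}

# Hypothetical relations between classes (source class, target class).
RELATIONS = [
    ("Poem", "Author"),
    ("Stanza", "Line"),
    ("Poem", "Stanza"),
    ("Line", "Metre"),
]


def cohesion_and_coupling(module_of, relations):
    """Count relations inside each module and between each pair of modules."""
    internal, external = Counter(), Counter()
    for source, target in relations:
        a, b = module_of[source], module_of[target]
        if a == b:
            internal[a] += 1  # both ends in one module: supports cohesion
        else:
            external[frozenset((a, b))] += 1  # crosses modules: coupling
    return internal, external


internal, external = cohesion_and_coupling(MODULE_OF, RELATIONS)
for module, count in internal.items():
    print(f"{module}: {count} internal relation(s)")
for pair, count in external.items():
    print(f"{' <-> '.join(sorted(pair))}: {count} cross-module relation(s)")
```

A candidate partition with many cross-module relations between two subdomains would be a hint that they belong in a single, larger ontology.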
In addition, we refined the concepts and properties identified in the domain for each of the subdomains, and we identified existing ontologies and ontology design patterns in order to increase reuse and alignment.
This process resulted in a network of ontologies, which is presented in the following illustration and can be consulted at the link below.