After the OpenAIRE team set up the Research Community Gateway (henceforth: gateway) for DARIAH, we started populating it with DARIAH-relevant content from the Research Infrastructure’s primary information hub, the French HAL repository. All HAL records that contain the term ‘DARIAH’ in any of their metadata fields were added to the gateway. The DARIAH-relevant Zenodo groups were also identified and integrated with the gateway. This allows the growth of DARIAH’s self-archiving and self-publishing practices to be monitored from a single discovery hub.
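The selection rule described above (a record qualifies if the term appears in any of its metadata fields) can be sketched as follows. This is only an illustration: the record layout, the field names and the case-insensitive matching are assumptions, not a description of how HAL or OpenAIRE actually implement the selection.

```python
from typing import Any, Dict, Iterable, List


def mentions_term(record: Dict[str, Any], term: str = "DARIAH") -> bool:
    """Return True if any metadata value of the record contains the term."""
    needle = term.lower()  # assumption: matching is case-insensitive
    for value in record.values():
        # A metadata field may hold a single value or a list of values.
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(needle in str(v).lower() for v in values):
            return True
    return False


def select_records(records: Iterable[Dict[str, Any]], term: str = "DARIAH") -> List[Dict[str, Any]]:
    """Keep only the records that mention the term in some field."""
    return [r for r in records if mentions_term(r, term)]


# Hypothetical sample records, for illustration only.
SAMPLE_RECORDS = [
    {"title": "Report on a DARIAH workshop", "keywords": ["digital humanities"]},
    {"title": "An unrelated article", "keywords": ["physics"]},
    {"title": "Annotated corpus", "keywords": ["dariah-eu", "corpus"]},
]

selected_titles = [r["title"] for r in select_records(SAMPLE_RECORDS)]
```

With the sample data above, the first and third records are selected, the third because the term appears in a keyword rather than in the title.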
DARIAH collection on HAL
The DARIAH collection on the French national HAL repository (https://halshs.archives-ouvertes.fr/DARIAH) serves as a primary hub of information for DARIAH-affiliated publications. The repository guarantees their sustainable, long-term availability, findability and maximum potential for further reuse. The semi-automated curation workflow of the repository ensures high-quality metadata. It is now officially open for all the scholarly networks around DARIAH to make their outputs available in a publicly maintained environment that is independent from the publishing sector.
DARIAH collection on Zenodo
DARIAH research products can also be found on Zenodo, probably the most widely used European self-archiving infrastructure (https://zenodo.org/search?page=3&size=20&q=dariah#). In 2019, we launched a DARIAH Zenodo community to systematically collect DARIAH-affiliated outputs that their authors choose to publish on Zenodo. Because records can be assigned to multiple communities, we can also curate outputs that result from collaborations or projects shared with sister infrastructures (e.g. PARTHENOS, SSHOC, OPERAS-P) and therefore belong to multiple communities. In addition, following the emerging community practice of hosting online events, materials from our virtual workshops and conferences will also be shared via Zenodo.
The DARIAH Open Access policy (halshs-02106332) encourages self-archiving of all sorts of research outputs (publications, data, software) across the DARIAH community and gives a practical implementation plan which is complemented by a step-by-step guide of how to add relevant works to the above collections.
A research infrastructure is by its very nature a meta-organization: it has a bird’s eye view and it can incorporate perspectives from different disciplines and different national communities. The OpenAIRE research community dashboard amplifies this perspective as it enables exploring and bringing together content from both generic repositories and national repository services. This possibility was especially worth exploring for DARIAH, whose service portfolio, reflecting the heterogeneity of research contexts in the arts and humanities, is highly distributed in character. That is, most of the DARIAH-affiliated services are not offered through a centralised/federated architecture, but instead through in-kind contributions of individual partners (see Kálmán et al. 2019). To bring together DARIAH-relevant outputs from national contexts, we selected two flagship DARIAH data services that operate at the national level: the French Social Sciences and Humanities data repository NAKALA and TextGrid (from DARIAH-DE), a virtual research environment for the creation, curation and publication of digital scholarly editions. We then developed interoperability frameworks between them and OpenAIRE. In practice, this meant developing crosswalks between the Dublin Core (TextGrid) and DC terms (NAKALA) metadata schemas and the DataCite standards.
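To make the idea of a metadata crosswalk more concrete, the sketch below maps a few Dublin Core elements onto DataCite-style properties. The mapping table, the function name and the rule for deriving the publication year are illustrative assumptions; they do not reproduce the actual crosswalks built for NAKALA or TextGrid.

```python
from typing import Dict, List

# Hypothetical mapping from Dublin Core elements to DataCite-style properties.
DC_TO_DATACITE = {
    "title": "titles",
    "creator": "creators",
    "subject": "subjects",
    "publisher": "publisher",
    "type": "resourceType",
    "identifier": "identifier",
}

# Target properties that hold a single value in this sketch.
SINGLE_VALUED = {"publisher", "resourceType", "identifier"}


def crosswalk(dc_record: Dict[str, List[str]]) -> Dict[str, object]:
    """Translate a Dublin Core record (element -> list of values)."""
    datacite: Dict[str, object] = {}
    for dc_field, values in dc_record.items():
        target = DC_TO_DATACITE.get(dc_field)
        if target is None or not values:
            continue  # elements without a mapping are dropped
        datacite[target] = values[0] if target in SINGLE_VALUED else list(values)
    # Assumption: the year is taken from the first four characters of dc:date.
    dates = dc_record.get("date", [])
    if dates and dates[0][:4].isdigit():
        datacite["publicationYear"] = int(dates[0][:4])
    return datacite


# Hypothetical input record, for illustration only.
example = crosswalk({
    "title": ["An example digital edition"],
    "creator": ["Example, Author"],
    "date": ["2019-05-01"],
    "rights": ["CC BY 4.0"],  # no mapping in this sketch, so it is dropped
})
```

Note that the unmapped `rights` element is silently lost in this sketch, which is the kind of trade-off a common target schema imposes.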
As a result, content types that are important for the arts and humanities communities, such as the digital critical edition, became visible on a European horizon and beyond disciplinary silos as parts of the OpenAIRE research graph. We included content from these services that is directly affiliated with DARIAH (i.e. digital editions that had been produced in collaboration with DARIAH-DE or datasets from Huma-Num, the institution forming the backbone of DARIAH-FR) and added the rest of their content to the bigger OpenAIRE Digital Humanities and Cultural Heritage collection.
NAKALA (https://www.nakala.fr/) is a French repository service dedicated to sharing data in the social sciences and humanities. It allows research teams to deposit their digital data (text files, sound, image, video) in a secure repository that ensures both the long-term availability and persistent citability (using handle PIDs) of their resources. It is powered by Huma-Num, the French infrastructure for digital humanities. As such, NAKALA is part of a coherent chain of complementary data tools and services (https://www.huma-num.fr/services-et-outils) that accompany the research data life cycle and thus enable the FAIRification of research data: NAKALA implements the “AIR” of the FAIR principles, while the “F” part is handled by the discovery platform ISIDORE (https://isidore.science/).
NAKALA offers three main types of services: permanent data access, assignment of a PID so that data can be cited, and metadata presentation services. The latter are achieved through semantic web standards and technologies and an OAI-PMH endpoint. First, NAKALA is built on a triple store based on the Resource Description Framework (RDF), which allows for the rich and networked description of content by keeping information graphs such as data-creator-publisher-concept-collection information together. It also makes standardized information sharing procedures possible and thus enables building applications, for instance interactive maps, for the valorization of data. Second, NAKALA enables other repositories (such as DBPedia (http://en.dbpedia.org/)) or discovery services (such as ISIDORE) to connect with and harvest the database via the standard OAI-PMH protocol.
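As an illustration of what harvesting over OAI-PMH involves, the sketch below builds a `ListRecords` request URL and extracts Dublin Core titles from a response. The verb, the `metadataPrefix` parameter and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL and the sample response are placeholders, not the actual NAKALA endpoint or its data.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urlencode

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_xml: str) -> List[str]:
    """Collect every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(response_xml.strip())
    return [el.text for el in root.iter(f"{DC_NS}title")]


# Placeholder endpoint and hand-written response, for illustration only.
request_url = list_records_url("https://repository.example.org/oai")

SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

titles = extract_titles(SAMPLE_RESPONSE)
```

A real harvester would additionally fetch the URL and follow the `resumptionToken` that the protocol uses for paging through large result sets; both are left out here to keep the sketch self-contained.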
NAKALA does not offer a search and discovery layer (but can easily achieve interoperability with such services) and therefore the OpenAIRE Research Community Gateway could serve as an especially suitable complementary service.
The TextGrid Repository
TextGrid provides a Virtual Research Environment (VRE) to support the creation, enrichment and publication of digital textual or musicological editions. This environment allows researchers worldwide to explore and collaborate on digitized records of cultural heritage (in TEI- or MEI-encoded XML formats) regardless of their physical location. The environment offers solutions for the complete scientific workflow, from collecting and generating primary data through to editions and publications. As such, TextGrid embraces the idea of bridging the sphere where science is performed with the sphere where science is published. The TextGrid Publish tool allows users to permanently publish the files that they have used in the TextGrid Lab in the TextGrid Repository. On publication, the metadata are validated automatically. The published objects are then freely accessible and searchable on www.textgridrep.de and can be reached via their URI in a link such as https://textgridrep.org/textgrid:vqn2.0. All published objects receive a persistent identifier (such as hdl:11858/00-1734-0000-0005-1424-B) and all data are migrated to static storage with backup, duplication and long-term archiving services.
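Handle identifiers written with the `hdl:` prefix are commonly turned into links through the public Handle.Net proxy. The helper below is a small sketch of that conversion; the function name is hypothetical, and whether a given handle actually resolves is not checked here.

```python
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(pid: str) -> str:
    """Turn an 'hdl:prefix/suffix' identifier into a proxy URL."""
    prefix = "hdl:"
    if not pid.startswith(prefix):
        raise ValueError(f"not a handle identifier: {pid!r}")
    return HANDLE_PROXY + pid[len(prefix):]


# The handle quoted in the text above, used as sample input.
example_url = handle_to_url("hdl:11858/00-1734-0000-0005-1424-B")
```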
The repository is hosted and sustainably operated by the Humanities Data Centre (HDC), a joint venture of Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG) and Göttingen State and University Library (SUB).
Finally, the Research Community Gateway can also be considered a tool that helps to monitor the dynamism of the research infrastructure and allows us to develop a better understanding of how the DARIAH research objects shape and impact the arts and humanities research landscape. To this end, we needed to check its compliance (or complementarity) with DARIAH’s current publication monitoring workflow. Currently, a DARIAH Zotero library is used for this purpose. The curation policy behind the collection is that every time a record containing the term 'DARIAH' is indexed by Google Scholar, it is (semi-)automatically added to the collection. Considering the diversity of the arts and humanities publication landscape, also in terms of metadata quality, this seemed to be the most inclusive large-scale solution: it is neither restricted to scholarly databases that are selective with SSH content (such as Scopus or Web of Science) nor does it impose significant metadata quality requirements (such as PIDs) as inclusion criteria.
For the time being, a manual workflow is applied to decide whether a record is associated with DARIAH loosely (e.g. it only mentions DARIAH or DARIAH services or events) or strongly (coverage of DARIAH events, works resulting from the use of DARIAH services, works published by DARIAH Working Groups or DARIAH-affiliated authors etc.). Records in the latter group are being integrated into the OpenAIRE DARIAH Gateway retroactively, starting from 2020 and going backwards. One entry barrier is that publications without PIDs cannot be added to the gateway.
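The two conditions described above (a strong association with DARIAH, decided manually, and the presence of a PID) can be expressed as a simple filter. The sketch assumes that the manual decision has been recorded in a hypothetical `association` field and that the identifier lives in a hypothetical `pid` field; neither reflects the actual structure of the Zotero library.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_for_gateway(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate strongly associated records into eligible and blocked ones.

    A record is eligible if it has a non-empty PID; strongly associated
    records without a PID are returned separately so they can be reviewed.
    Loosely associated records are ignored.
    """
    eligible: List[Record] = []
    blocked: List[Record] = []
    for record in records:
        if record.get("association") != "strong":
            continue  # loose association: not a candidate for the gateway
        pid = (record.get("pid") or "").strip()
        (eligible if pid else blocked).append(record)
    return eligible, blocked


# Hypothetical records with made-up identifiers, for illustration only.
SAMPLE = [
    {"title": "Working group report", "association": "strong", "pid": "10.1234/example.1"},
    {"title": "Conference chapter", "association": "strong", "pid": ""},
    {"title": "Passing mention", "association": "loose", "pid": "10.1234/example.2"},
]

eligible, blocked = split_for_gateway(SAMPLE)
```

Keeping the blocked records in a separate list, rather than discarding them, makes it easy to revisit them once a PID becomes available.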
The inclusion of relevant items from the DARIAH Zotero library in the DARIAH gateway dashboard makes it possible:
Outputs of large-scale, strongly collaborative European projects can easily become scattered across different institutional, national or thematic repositories reflecting the affiliation of contributor groups. Publications in the traditional sense (i.e. research articles or book chapters published at academic presses) add further silos to this picture. Harvesting a great number of repositories and scholarly databases, the OpenAIRE Research Graph and the Research Community Gateways bring together and make searchable outputs belonging to the same project through keyword search (project name or grant agreement number) and faceted search functionalities.
The example below shows outputs of DARIAH’s CENDARI project (https://www.dariah.eu/activities/projects-and-affiliations/cendari/) collected from 20 different content providers.
Enabling discovery paths like this significantly contributes to the sustainability of research projects and the reusability of their results while nurturing the diversity of their hosting infrastructures and publication formats.
Via the gateway, interested communities (researchers, service developers, members of the broader DARIAH network) can get an overview of DARIAH-affiliated digital scholarly objects using its search and browsing facilities. Furthermore, everyone, especially the researcher communities around DARIAH, is invited to:
The content of the DARIAH community gateway forms a part of the OpenAIRE research graph, which means that it is also searchable in broader discovery contexts, such as OpenAIRE Explore or its disciplinary branch, the Digital Humanities and Cultural Heritage OpenAIRE Community Gateway.
Building interoperability with central, European discovery frameworks is key to making thematic services and knowledge production in their associated disciplines visible and valued on a shared European horizon. This paves the way for realizing the only desirable vision of Open Science, one that is inclusive of all disciplines, regardless of epistemic traditions or level of consensus on metadata standards. Developing and sustaining information management systems that remain in public hands, go beyond research outputs in the traditional sense and also include datasets, software etc. is an important step towards the (formal) recognition of such digital scholarly objects as genuine forms of publication. Building metadata crosswalks from local services to the OpenAIRE guidelines requires both insider knowledge about the service and a broader perspective on the landscape to which the service in question becomes connected as a result of interoperability. It is an iterative process, and active support from the OpenAIRE team (enabled by the OpenAIRE Advance project framework) was essential.
On the other hand, since such interoperability frameworks operate on the basis of a common denominator, they necessarily entail losing some of the richness of the integrated material. For instance, when integrating digital editions published in the TextGrid Repository, we had to decide whether to include the publication date of the original work or that of the critical edition in the metadata crosswalk; including both was not possible. Users should therefore be made aware of this common denominator nature of discovery frameworks when they interact with the discovery surface. At the same time, keeping provenance information (e.g. where the record in question is coming from and under which conditions users can access it, information about the last updates etc.) clear and rich by design is an especially valuable asset of the OpenAIRE Research Community Gateways.
A similarly important inclusivity criterion is finding a delicate balance between keeping metadata quality high and a technical entry threshold as low as possible. In the context of adding DARIAH affiliated publications (book chapters, research articles etc.) to the gateway, we had to leave out records that did not come with a Persistent Identifier. Although publisher metadata quality is out of our control, considering the growing importance of PIDs as basic units in any digital scholarly ecosystem, we can expect this to change for the better in the near future.
We owe special thanks to Yoann Moranville and Nicolas Larrousse, who created the metadata crosswalks between NAKALA and OpenAIRE, and Maximillian Behnert-Brodhun and Stefan Funk, who created the crosswalks between the TextGrid Repository and OpenAIRE. From the OpenAIRE team, the generous support and contributions of Alessia Bardi, Mirjiam Baglioni, Amelie Bäcker and Harry Dimitropoulos were instrumental in setting up the gateway, populating it and helping the development of the interoperability frameworks with the DARIAH services.
This use case was written by Erzsébet Tóth-Czifra (Open Science Officer at DARIAH).