Providing Finnish national Publication Data to OpenAIRE – Case VIRTA
This text was written by Joonas Nikkanen (CSC – IT Center for Science, Finland) and Jochen Schirrwagen (Bielefeld University Library, Germany)
Efficient dissemination and visibility of research results across scientific communication infrastructure boundaries is closely linked to the definition of standards for the description of scientific information and communication protocols. Metadata should be as complete and consistent as possible, because its quality carries over into the services built on it and is therefore a prerequisite for its use and acceptance by researchers and the public. At the same time, authors should not be burdened with additional effort and redundant input of bibliographic data.
To achieve this goal, collaborations, resources and active contributions from different infrastructures and their organizations are required. One example of such a collaborative effort is the integration of the Finnish VIRTA Publication Information Service into OpenAIRE, which has been a work in progress since mid-2018 and aims to be in production in the first half of this year. The goal of the integration is to provide Finnish publication metadata from the national aggregator, VIRTA, to OpenAIRE and thus improve the quality of metadata and the visibility of Finnish research results at a European and international level. Integrating the aggregator, which serves as a national Current Research Information System (CRIS), creates a single point of entry. The effort needed to be OpenAIRE compliant is thus shifted from many institutional publication infrastructures to one central content provider.
The integration of VIRTA in OpenAIRE will also address the following issues. In Finland, none of the commercial CRIS platforms is currently compatible with OpenAIRE aggregation requirements. Moreover, Finnish repositories do not cover the complete research output available from academic institutions in Finland. VIRTA will make it possible to answer questions such as what portion of the total publication output is Open Access and what the share of native-language publications is. Integrating (national) CRIS with OpenAIRE would provide answers to such questions and enable comparison across national borders. In parallel, the integration of institutional CRIS is important, as it will greatly improve the coverage and quality of metadata in OpenAIRE and will expand the monitoring capabilities provided by the OpenAIRE portal and dashboards.
VIRTA Publication Information Service
VIRTA Publication Information Service is a data warehouse solution that integrates institutional data at the national level in Finland. VIRTA was launched in spring 2016. The service is developed by CSC – IT Center for Science and owned by the Finnish Ministry of Education and Culture. As a data hub, VIRTA holds up-to-date bibliographic information on all scientific publications from 54 Finnish organizations that use different local solutions for publication data collection, such as commercial CRISes, self-made publication registers and institutional publication repositories (Figure 1). About 60,000 scientific, professional and non-scholarly publications are transferred per year, with all scientific fields covered. Publication metadata in VIRTA is based on a national data model that fulfills the requirements of the national funding model for higher education institutions and other needs of monitoring research and development activities.
Two (or three) steps to OpenAIRE integration
1. Mapping the data model to CERIF
The first step of integration is to map the data model in your CRIS system to the CERIF data model as described in the Guidelines. The work needed may vary considerably between the different source systems and their data models. Investing adequate time and resources at this point is nevertheless highly recommended, as it improves the interoperability and quality of the metadata and makes the later validation phase run more smoothly.
Fortunately, there were many similarities between the VIRTA and CERIF data models to start with. However, some key differences had to be addressed. These included, for example, the vocabulary of publication types and the handling of identifiers, both persistent identifiers and person IDs. Moreover, open access classifications needed to be harmonized, and Finnish national classifications, e.g. scientific fields, needed to be taken into consideration when representing metadata in both the human-readable and machine-readable formats required in CERIF. The mapping resulted in a rather long table, which lists each source VIRTA element, the equivalent CERIF element and examples for both. This up-to-date mapping is available at: https://wiki.eduuni.fi/x/lRLTB
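To make the idea of an element-by-element mapping more concrete, here is a minimal sketch in Python. It is not the VIRTA implementation: the field names, the two publication type codes and their target vocabulary URIs are assumptions chosen for illustration only, and the authoritative correspondences are the ones in the mapping table linked above. The sketch shows one design choice that tends to help in the validation phase: anything that cannot be mapped is reported rather than silently dropped.

```python
# Illustrative lookup tables. The codes, URIs and field names below are
# assumptions for this sketch, not copied from the published mapping table.
PUBLICATION_TYPE_TO_URI = {
    "A1": "http://purl.org/coar/resource_type/c_6501",  # assumed: journal article
    "C1": "http://purl.org/coar/resource_type/c_2f33",  # assumed: book
}

# Hypothetical source field name -> hypothetical target element name.
FIELD_MAP = {
    "publication_name": "Title",
    "publication_year": "PublicationDate",
    "doi": "DOI",
}


def map_record(source):
    """Map one source record (a dict) to a target dict.

    Returns (target, unmapped), where unmapped lists every field or
    vocabulary value for which no mapping rule exists.
    """
    target = {}
    unmapped = []
    for key, value in source.items():
        if key == "publication_type":
            uri = PUBLICATION_TYPE_TO_URI.get(value)
            if uri is None:
                unmapped.append(f"publication_type={value}")
            else:
                target["Type"] = uri
        elif key in FIELD_MAP:
            target[FIELD_MAP[key]] = value
        else:
            unmapped.append(key)
    return target, unmapped


if __name__ == "__main__":
    record = {
        "publication_type": "A1",
        "publication_name": "An example article",
        "publication_year": "2018",
        "internal_note": "not mapped",
    }
    mapped, missing = map_record(record)
    print(mapped)
    print(missing)
```

Collecting the unmapped items per record gives a simple checklist of the gaps that still need a mapping decision before the data is exposed for harvesting.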
2. Providing the data in CERIF-XML via OAI-PMH endpoint
As stated in the Guidelines, OpenAIRE harvests metadata using the OAI-PMH protocol through an endpoint provided by the source system. This endpoint should deliver the metadata in CERIF-XML, which is generated by applying the mapping from the source system's data model.
OAI-PMH was already implemented in VIRTA in order to provide metadata in both Dublin Core and VIRTA-XML formats (Fig. 2). This implementation was used as the basis for implementing the OpenAIRE specifications. It was extended and now supports the additional metadata prefix oai_cerif_openaire and the following sets: openaire_cris_publications, openaire_cris_persons, openaire_cris_events, openaire_cris_orgUnits.
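As a rough illustration of how a harvester could address such an endpoint, the following Python sketch builds an OAI-PMH ListRecords request for the metadata prefix and sets named above and reads the resumption token from a response. The base URL is a placeholder, since the address of the VIRTA endpoint is not given in this post, and the function names are hypothetical. Only the standard OAI-PMH request arguments and response namespace are assumed.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Placeholder address: the real endpoint URL is not stated in this post.
BASE_URL = "https://example.org/oai"

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(set_spec, resumption_token=None):
    """Build a ListRecords request URL for one set.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix and set must be left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": "oai_cerif_openaire",
            "set": set_spec,
        }
    return BASE_URL + "?" + urlencode(params)


def next_token(response_xml):
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    print(list_records_url("openaire_cris_publications"))
```

A harvester would fetch the first URL, pass the response body to next_token, and repeat with the returned token until it gets None, once for each of the four sets.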
3. Any extra steps?
Source systems aiming for OpenAIRE integration may require additional effort before they can be harvested by OpenAIRE. This might be due to metadata ownership and GDPR-related issues, technological or infrastructure solutions not being able to support endpoints, or other issues that are not directly related to OpenAIRE but rather have to be solved at the source system level.
Summary
As the VIRTA-OpenAIRE integration goes into production in the coming months, the metadata of more than 350 000 scientific, professional and non-scholarly publications can be added to OpenAIRE's database and explored via the OpenAIRE portal. By using VIRTA's OpenAIRE integration, Finnish research organizations do not need to invest in their own solutions for OpenAIRE compliance. This is cost-efficient, greatly enhances the interoperability of Finnish publication metadata at the European level, and expands OpenAIRE's coverage of national metadata aggregators.
• About VIRTA Publication Information Service https://wiki.eduuni.fi/display/cscvirtajtp/VIRTA+in+English
• OpenAIRE Guidelines for CRIS Managers 1.1.1 https://openaire-guidelines-for-cris-managers.readthedocs.io/en/latest/index.html
• Up-to-date mapping table between VIRTA and CERIF data models https://wiki.eduuni.fi/pages/viewpage.action?pageId=80941717