Implementing OpenAIRE Guidelines for CRIS Managers: “Lessons Learned” by the Dutch Aggregator NARCIS Portal

In the context of the Horizon 2020 project OpenAIRE Advance, the institute DANS (Data Archiving and Networked Services) in the Netherlands implemented the OpenAIRE CRIS/CERIF Guidelines in a national-level CRIS: the NARCIS Portal.
This blog post describes the "lessons learned" during the technical implementation of the OpenAIRE (OA) CERIF-XML profile for CRIS managers in the NARCIS domain. The implementation was part of Subtask 6.1.2, Interoperability - Connecting with CRIS-CERIF. In general, the CERIF entities are created by XSLT transformations of the native NARCIS internal format where possible.

NARCIS is the Dutch national aggregator of scholarly research output, maintained by DANS-KNAW. NARCIS provides access to scientific information, including (open access) publications from the repositories of all the Dutch universities, the Academy, the national research funder NWO, and a number of research institutes (around 40 institutes). NARCIS also harvests the metadata of datasets from around 25 Dutch data archives. In addition, NARCIS provides descriptions of research projects, researchers and research institutes. OpenAIRE harvests NARCIS as the focal point for scholarly output in the Netherlands.

Because of its role as an aggregator, the NARCIS domain has to deal with different (input) models. Although all of these are merged into one model, the metadata is spread across different formats and components, such as relational databases, indexes and file storage.

Lessons learned

Below we briefly describe the major issues NARCIS ran into during the conversion of patents, persons, embedded pmh-dereferenceable person objects, funding, equipment & event, PMH (Protocol for Metadata Harvesting) identifiers, PMH provenance, and output validation.

Patents
In the NARCIS model, a 'Patent' is a sub-type of 'Publication'. Within OA-CERIF, Patent is a top-level XML element of its own. NARCIS therefore needed to change the ingest flow of the NARCIS Publication types so that patents are split off and emitted separately.
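The routing decision described here can be sketched in a few lines of Python. This is only an illustration: the record layout (`type`, `subtype`) and the mapping table are hypothetical and do not reflect the actual NARCIS internal format or its XSLT-based pipeline.

```python
# Hypothetical mapping from an internal record type to an OA-CERIF
# top-level element name. The keys are invented for this sketch.
TOP_LEVEL_ELEMENT = {
    "publication": "Publication",
    "dataset": "Product",
    "person": "Person",
}


def cerif_element_for(record):
    """Return the OA-CERIF top-level element name for an internal record."""
    # A patent is stored as a publication sub-type internally, but must be
    # emitted as its own top-level element.
    if record.get("type") == "publication" and record.get("subtype") == "patent":
        return "Patent"
    return TOP_LEVEL_ELEMENT[record["type"]]
```

The point of the sketch is that the sub-type check has to happen before the generic type mapping; otherwise patents would end up as ordinary Publication elements.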

Persons
Although deprecated, the Person name identifier 'Digital Author Identifier' (DAI) is still commonly used within the Dutch domain. This type was initially not available in the OA-CERIF application profile. To be able to use this identifier, NARCIS submitted a pull request (PR) to the guidelines-cris-managers repository to add it to the CERIF-XML profile. Now that the PR has been merged, NARCIS can use the DAI type within the profile.
Because the profile restricts which name identifiers are allowed, NARCIS has to filter the Person name identifiers. As a result, the output may lack some identifier types that may or may not be useful to OpenAIRE.
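A minimal Python sketch of such a filter is shown below. The allow-list and the dictionary layout of an identifier are assumptions made for the example; the authoritative list of identifier types is the one defined by the CERIF-XML profile itself.

```python
# Illustrative allow-list only; consult the OA-CERIF profile for the real one.
ALLOWED_ID_TYPES = {"ORCID", "DAI", "ISNI", "ScopusAuthorID", "ResearcherID"}


def filter_name_identifiers(identifiers, allowed=ALLOWED_ID_TYPES):
    """Split identifiers into (kept, dropped) according to the allow-list.

    Each identifier is assumed to be a dict with 'type' and 'value' keys.
    """
    kept = [i for i in identifiers if i["type"] in allowed]
    dropped = [i for i in identifiers if i["type"] not in allowed]
    return kept, dropped
```

Returning the dropped identifiers as well makes it easy to log which types are lost in the output, which is exactly the information needed to judge whether another addition to the profile would be worthwhile.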

Embedded pmh-dereferenceable Person objects
Embedding pmh-dereferenceable Person elements within the Authors of Publications, Products, Patents, etc. was initially not possible. The NARCIS internal/local identifier, which is needed to build the PMH identifier to dereference to, is unknown in the source repository.
In that case NARCIS cannot map a given 'author' name to the corresponding Person object and make it dereferenceable. All it can do is supply a 'DisplayName' field, which cannot be dereferenced.
To solve this problem, NARCIS created a service that resolves an incoming Person to its internal/local identifier in the NARCIS domain, using the (federated) identifiers supplied with it. If the Person resolves, NARCIS can create a fully dereferenceable embedded Person by supplying the PMH identifier instead of a 'DisplayName' only. This solution only works if the source repository supplies one or more federated identifiers and the NARCIS domain holds at least one of them.
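The resolution logic can be sketched as follows. This is not the NARCIS service: the in-memory index, the local identifier `PRS0000001`, the repository identifier `example.org` and the output dictionary are all hypothetical, and the ORCID value is only a sample.

```python
from typing import Optional

# Hypothetical index: (identifier type, value) -> local person identifier.
KNOWN_IDENTIFIERS = {
    ("ORCID", "0000-0002-1825-0097"): "PRS0000001",
}


def resolve_local_id(federated_ids, index=KNOWN_IDENTIFIERS) -> Optional[str]:
    """Return the local identifier for the first federated identifier we hold."""
    for id_type, value in federated_ids:
        local_id = index.get((id_type, value))
        if local_id is not None:
            return local_id
    return None


def embed_person(display_name, federated_ids, repository_id="example.org"):
    """Build an embedded author: dereferenceable if resolved, name-only if not."""
    local_id = resolve_local_id(federated_ids)
    if local_id is None:
        # No shared identifier: fall back to a non-dereferenceable name.
        return {"DisplayName": display_name}
    return {
        "DisplayName": display_name,
        "pmh_identifier": f"oai:{repository_id}:Persons/{local_id}",
    }
```

An author without any federated identifier, or with identifiers unknown to the index, falls through to the name-only branch, which mirrors the limitation described above.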

Funding
Funding information is only scarcely available in the NARCIS domain, and implementing it turned out to be too much effort for the result. Funders (persons or organisations) within NARCIS are usually recorded as part of a research Project, and funding amounts are not available.
Implementing this type would have yielded only a few Funders, such as the Dutch national funder NWO and the EC, without any amounts. That did not justify the effort of creating a mapping.

Equipment & Event
These entities are not part of the NARCIS model and therefore could not be implemented.

PMH identifiers
The format of the OAI-PMH identifiers described in the OA-CERIF profile forced NARCIS to change its PMH software to adhere to this format, even though the default identifier format produced by the PMH software was already valid.
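As an illustration, the helpers below build and parse identifiers of the form `oai:<repository id>:<entity set>/<local id>`. That pattern is our reading of the guidelines and should be checked against the profile text; the function names, the regular expression and the `example.org` repository identifier are assumptions for this sketch.

```python
import re

# Assumed shape: oai:<repository id>:<entity set>/<local id>
OAI_ID_PATTERN = re.compile(
    r"^oai:(?P<repo>[A-Za-z0-9.-]+):(?P<entity>[A-Za-z]+)/(?P<local>\S+)$"
)


def make_oai_identifier(repository_id, entity_set, local_id):
    """Compose an identifier such as oai:example.org:Publications/123."""
    return f"oai:{repository_id}:{entity_set}/{local_id}"


def parse_oai_identifier(identifier):
    """Return the parts of a conforming identifier, or None if it does not match."""
    match = OAI_ID_PATTERN.match(identifier)
    return match.groupdict() if match else None
```

An identifier such as `oai:example.org:123` is a perfectly valid OAI identifier, yet `parse_oai_identifier` rejects it because the entity segment is missing. That is the kind of mismatch that required the software change.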

PMH provenance
Using PMH provenance according to the OA-CERIF specification was trivial, since NARCIS had already implemented this feature because of its aggregator role.

Output validation
During implementation, NARCIS preferred to validate the (PMH) output with a local instance of the openaire-cris-validator.

More information:

Authors: Wilko Steinhoff (DANS/OpenAIRE) and Elly Dijk (DANS/OpenAIRE)


