“Knowledge is the engine of our economy. And data is its fuel.”
Sharing links between the published literature and datasets is crucial to achieving the full potential of research data publishing. This article presents the coordination and implementation efforts of the ICSU-WDS/RDA Data Publishing Services Working Group (DPS-WG) and the OpenAIRE infrastructure towards realizing and operating the DLI Service, an open, one-for-all data-literature interlinking service. The service is the result of a collaboration, open to further participants, between major stakeholders in the field of data publishing. It populates and provides access to a graph of dataset-literature and dataset-dataset links collected from a variety of major data centers, publishers, and research organizations. Based on feedback from content providers and consumers, the service will also enable the incremental refinement of an interlinking data model and exchange format, with the aim of shaping a universal, cross-platform, cross-discipline solution for sharing dataset-literature links.
Introduction and vision
Challenges to realizing the full potential of research data exist at different levels, from cultural aspects, such as proper rewards and incentives, to policy and funding, and to technology. The challenges are interconnected and affect a diversity of stakeholders in the research data landscape, including researchers, research organizations, funding bodies, data centers, and publishers. To make progress in overcoming barriers and building a stronger research data infrastructure, it is essential that the different stakeholders work together to address common issues and move forward on a common path. Alongside other organizations, the ICSU World Data System (ICSU-WDS), the Research Data Alliance (RDA), and OpenAIRE2020 provide useful forums for such collaborations. In particular, they are currently working together on an initiative that brings together different parties in the research data landscape with the objective of creating the Data Literature Interlinking Service (DLI Service), i.e. “an open, freely accessible, web based service that enables its users to identify datasets that are associated with a given article, and vice versa”. At the time of writing, members of the initiative include: the RDA Publishing Data Services Working Group, the OpenAIRE infrastructure, the Research Data Alliance, the ICSU World Data System, STM, CrossRef, DataCite, ORCID, the National Data Service, and the RMap project. The vision is to move away from the large set of bilateral arrangements that characterizes the research ecosystem today, towards common standards and tools that sit in the middle and interact with all parties (see Figure 1). Such a transition would facilitate interoperability between the platforms and systems operated by the different parties, reduce systemic inefficiencies in the ecosystem, and ultimately enable new tools and functionalities for the benefit of researchers.
The DLI Service populates and provides access to a graph of “authoritative” dataset-literature links collected and aggregated from a variety of major data centers, publishers, and research organizations. It is intended to offer facilities for the following classes of actors:
- End-users: searching and browsing the graph of links via the portal
- Third-party service developers: accessing publications and datasets in the graph via programmatic APIs
- Content providers: feeding high-quality, authoritative links between publications and datasets, or between datasets, to the service (complete list of content providers).
Note: formal data acquisition policies, SLAs, and data provider registration procedures will be produced at a later stage; currently each “application” is processed independently through bilateral agreements.
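As an illustration of the kind of graph described above, the following minimal Python sketch indexes links in both directions, so that datasets can be found from an article and vice versa. It is not the DLI Service's actual data model or API: the `Link` and `LinkGraph` names, the `relation` and `provider` fields, and the identifiers used are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link contributed by a content provider (hypothetical fields)."""
    source_pid: str   # persistent identifier of the source object
    target_pid: str   # persistent identifier of the target object
    relation: str     # free-text relation label, e.g. "references"
    provider: str     # who contributed the link


class LinkGraph:
    """In-memory graph of links, indexed by every PID a link touches."""

    def __init__(self):
        self._by_pid = defaultdict(set)

    def add(self, link):
        # Index the link under both endpoints so lookups work either way.
        self._by_pid[link.source_pid].add(link)
        self._by_pid[link.target_pid].add(link)

    def linked_to(self, pid):
        """Return the sorted PIDs linked to `pid`, in either direction."""
        neighbours = set()
        for link in self._by_pid.get(pid, ()):
            other = link.target_pid if link.source_pid == pid else link.source_pid
            neighbours.add(other)
        return sorted(neighbours)


# Made-up identifiers, for illustration only.
graph = LinkGraph()
graph.add(Link("doi:10.1234/article-1", "doi:10.5678/dataset-a", "references", "publisher-x"))
graph.add(Link("doi:10.5678/dataset-b", "doi:10.1234/article-1", "isReferencedBy", "datacenter-y"))
print(graph.linked_to("doi:10.1234/article-1"))
```

Indexing each link under both of its endpoints is what makes the lookup symmetric, regardless of which provider contributed the link or in which direction it was expressed.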
Based on feedback from content providers and consumers, the DLI Service will refine its underlying interlinking data model and exchange format to make it a universal, cross-platform, cross-discipline solution for collecting and sharing dataset-literature links, balancing the information that can be shared across content providers against the information needed by its consumers.
In the forthcoming months further work will be carried out towards the delivery of a production service that is fully reliable in terms of QoS and quality of content. The following actions will be undertaken:
- Definition of a content acquisition policy: minimal quality requirements to be met by content providers in order for their publications, datasets, and the relationships between them to be aggregated by the system;
- Definition of SLAs for content providers: making sure content providers are aware of, and agree to, how their content (metadata) will be made openly accessible via the service;
- Technical enhancements: data harmonization (e.g. cross-PID deduplication), programmatic data access (e.g. high-throughput resolver), data scalability (e.g. moving away from open source databases);
- Deployment as an OpenAIRE infrastructure operational service: deploying the service on the OpenAIRE hardware infrastructure.
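To make the "cross-PID deduplication" item above more concrete, here is one possible reading of it as a minimal Python sketch: if the same object is known under several persistent identifiers, links that differ only in which identifier they use can be collapsed into one. This is an assumption about what such harmonization could look like, not a description of the service's implementation. The `PidAliases` class, the `deduplicate` function, and all identifiers are hypothetical, and the sketch assumes that identifier equivalences are already known.

```python
class PidAliases:
    """Union-find over identifiers known to denote the same object."""

    def __init__(self):
        self._parent = {}

    def find(self, pid):
        """Return the canonical identifier for `pid`."""
        self._parent.setdefault(pid, pid)
        root = pid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[pid] != root:
            next_pid = self._parent[pid]
            self._parent[pid] = root
            pid = next_pid
        return root

    def union(self, a, b):
        """Record that identifiers `a` and `b` denote the same object."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root, for determinism.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep


def deduplicate(links, aliases):
    """Collapse (source, target) pairs that differ only by PID alias."""
    seen = set()
    unique = []
    for source, target in links:
        key = (aliases.find(source), aliases.find(target))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Made-up identifiers: the same article known by a DOI and a PubMed ID.
aliases = PidAliases()
aliases.union("doi:10.1234/article-1", "pmid:00000001")
links = [
    ("doi:10.1234/article-1", "doi:10.5678/dataset-a"),
    ("pmid:00000001", "doi:10.5678/dataset-a"),
]
print(deduplicate(links, aliases))
```

A union-find structure is used here only because it keeps alias resolution cheap as equivalences accumulate; any mapping from identifier to canonical identifier would serve the same purpose.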
 RDA Data Publishing Services Working Group (DPS-WG) Case Statement