OpenAIRE Graph: steadily riding the wild wave of Open Science
by Thanasis Vergoulis, Paolo Manghi, Giulia Malaguarnera, Anastasia Aritzi, Athina Papadopoulou, Claudio Atzori, Alessia Bardi, Miriam Baglioni
An open & global map of science
For our ancestors, the globe was considered an immense space that concealed many interesting and unknown places waiting to be explored. However, soon enough, humans invented maps, a tool to facilitate how they explore and navigate the world. Nowadays, scientists, funders, publishers, and other stakeholders in the research ecosystem need a similar tool to help them navigate through the huge and heterogeneous information space created by the continuously growing amount of open and interconnected scholarly metadata records so that they can reveal the valuable latent knowledge within.
With the vision of creating and delivering an open, up-to-date, and global “map of science” across disciplines and countries, in December 2019 we began providing the OpenAIRE Graph (until recently called the OpenAIRE Research Graph), one of the largest and most heterogeneous collections of scholarly metadata for research products (e.g., articles, datasets, software), other research entities (e.g., projects, institutions, communities), and the links between them. This initiative is the culmination of over ten years of work by OpenAIRE in the domain of scholarly communication to facilitate and advocate for the free flow and sharing of research products and related metadata across researchers, communities, institutions, companies, and policymakers. As a result of this community-driven and technological effort, today our Graph aggregates and interlinks hundreds of millions of metadata records from tens of thousands of data sources trusted by scientists.
The Graph is updated bi-weekly, and its contents are available for download and re-use under a CC-BY licence via an API, while an open snapshot is released every six months on Zenodo.org. In addition, the principles, data, and vision of the Graph are community-governed: OpenAIRE AMKE, which implements and delivers the Graph, is a non-profit legal entity connecting 49 members that represent research and academic organisations committed to Open Science and steer activities in their countries (read our Strategy 2023-2025). OpenAIRE AMKE’s participatory governance structure ensures the Graph's endorsement, adoption, operation, and sustainability among its members, countries, and research communities. Finally, the underlying infrastructure has recently committed to the POSI principles.
The Graph APIs today receive more than 500 million requests per year, both through OpenAIRE portals and from third-party services. Elsevier’s Scopus and SciVal rely on the APIs, as do European and worldwide institutional repositories, the European Commission (EC’s Participants Portal SYGMA), ORCID, other funders around the globe, researchers, companies, and scholarly services. Furthermore, the Graph will be a key EOSC resource, providing the EOSC with: (i) a catalogue of all research products, which is central to fostering Open Science and establishing its practices in daily research activities, and (ii) Open Science monitoring tools to measure the trends and impact of Open Science and funding across communities and nations. Conceived as a public and transparent good, populated from data sources trusted by scientists, the Graph aims to bring the discovery, monitoring, and assessment of science back into the hands of the scientific community.
Planning for the future: a roadmap
Our vision for the next few years is to work intensively on implementing and delivering significantly improved versions of the Graph. To this end, we are drafting a roadmap for the next two years geared towards achieving the following goals:
Goal 1: Make the Graph well-documented and developer-friendly
One of the main objectives of the Graph is to expose scholarly metadata records that third-party developers can use to build added-value services for researchers and other relevant stakeholders. This mission can only be served by offering well-structured and comprehensive documentation, accompanied by related training activities and material. To this end, we have already released a new documentation website that elaborates on the Graph data model and provides details on the data sources and technologies used. In addition, to support new developers, we have released a Beginner’s Kit that consists of a subset of the Graph that is easy to load on a personal computer and a notebook that contains indicative example queries. Finally, we plan various training events over the coming months, including a hands-on tutorial at the ISSI 2023 conference.
Goal 2: Make the Graph more interoperable
As mentioned above, OpenAIRE works in close contact with the Open Science community and takes part in co-creation activities. OpenAIRE co-leads an RDA Interest Group on Open Science Graphs interoperability, which aims to define frameworks that enable data exchange across Knowledge Graphs. The group includes other key players in the domain, such as FREYA’s PID Graph, ResearchGraph.org, OpenCitations.net, Crossref, DataCite, and the Open Research Knowledge Graph. The Graph will adopt the outputs of this group, making it even more interoperable.
Goal 3: Further improve the Graph coverage
Our Graph already collects metadata records from thousands of scholarly communication sources from all over the world and covers a large variety of research entities while capturing the most important relationships among them. Still, there is considerable room for improvement in data source coverage and data model completeness. To tackle this challenge, we are establishing collaborations to incorporate important missing sources (e.g., DBLP, which collects Computer Science literature) or to exploit additional metadata from already onboarded partners (e.g., include citations from Crossref). Regarding data model completeness, we plan to include new node/entity types in the Graph (e.g., researchers, venues, research activities) and to improve or extend the metadata we collect for each entity or relationship. For the latter, we will leverage significant external resources (e.g., Fields of Science & Sustainable Development Goals from SciNoBo, impact indicators from BIP! Toolbox) or the output of improved data mining and machine learning approaches applied to an extended corpus of full texts (e.g., the outputs of the SciLake project). Our overarching goal is to make the Graph an optimal collection for monitoring Open Science impact and trends, such as the FAIRness, openness, and reproducibility of science, from the institutional, research community, and policy-maker perspectives.
Goal 4: Further improve the Graph content quality and transparency
Since the Graph acts as a means to track and analyse Open Science scholarly records, it attempts to be as inclusive as possible regarding its input sources. On the other hand, the urgency and momentum of Open Science have led to publishing workflows that, depending on the maturity of the context, may lose their traditional assumptions of trust, quality assessment, and control. To address this, we plan to extend the Graph to include, at the level of the PIDs, annotations about the measurable quality of the publishing effort. These annotations will enable consumers (e.g., researchers, web portals, applications/services) to filter content, and will help the authors of research products to improve their records and therefore their reusability. Such annotations may derive from the inherent quality of the publishing venue (e.g., whether it ensures peer review or metadata curation) or from the completeness and coherence of the metadata record and its relationships to other entities in the Graph (a sort of measurable commitment by the author to publishing quality outcomes).
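To give a flavour of what a completeness-based annotation could look like in practice, here is a minimal Python sketch. It is only an illustration under our own assumptions: the field names, the equal weighting of fields, and the threshold are hypothetical and do not reflect the Graph data model or any planned scoring method.

```python
from typing import List, Mapping, Sequence

# Hypothetical field names, chosen for illustration only (not the Graph schema).
EXPECTED_FIELDS: Sequence[str] = (
    "title", "authors", "publication_date", "pid", "license", "abstract",
)


def is_filled(value: object) -> bool:
    """Treat None and empty strings/collections as missing."""
    if value is None:
        return False
    if hasattr(value, "__len__"):
        return len(value) > 0
    return True


def completeness(record: Mapping[str, object],
                 fields: Sequence[str] = EXPECTED_FIELDS) -> float:
    """Fraction of expected fields that are present and non-empty (0.0 to 1.0)."""
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return filled / len(fields)


def filter_by_quality(records: Sequence[Mapping[str, object]],
                      threshold: float = 0.5) -> List[Mapping[str, object]]:
    """Keep only the records whose completeness reaches the (assumed) threshold."""
    return [r for r in records if completeness(r) >= threshold]
```

A consumer could then call `filter_by_quality(records, threshold=0.8)` to keep only well-described records, while an author could use the same score to spot which fields of a record are still missing.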
Goal 5: Make the Graph easy to query and maintain
We plan to expand the ways the Graph can be queried, testing cutting-edge open-source technologies for navigational queries (in the context of the SciLake project). In addition, it is important for the Graph to offer high-quality, frequently updated, interconnected scholarly metadata, since this is crucial for various applications. To this end, we plan to apply various optimisations to the Graph creation workflow that are expected to reduce the time required for updates, and to redesign indexes and intermediate files to reduce their footprint and the storage space they require.
New era, new name!
It is clear that we are embarking on a rapid evolution of the Graph into a significantly improved and mature resource, paving the way for the provision of added-value services for the research community at large. To mark this new era of the Graph, its official name has also been adjusted so that it is as minimal as possible and easier to remember. From the OpenAIRE Research Graph, welcome to the OpenAIRE Graph!
Unlock the power of Open Science data with #OpenAIRE_Graph and enter a world full of opportunities.
We look forward to having you with us on this exciting journey!