
The OpenAIRE Research Graph

​ Bringing scholarly communication back into the hands of scientists 

Tracking and connecting all links between research results, institutions that produced and financed them, and where to find them for consultation or further use.

We are collecting feedback! Find out more here.

The backdrop: Open Science is gradually becoming the modus operandi in research practices, affecting the way researchers collaborate, publish, discover, and access scientific knowledge. Scientists are increasingly publishing research results beyond the article, sharing all scientific products (metadata and files) generated during an experiment, such as datasets, software, and experiments. They publish in scholarly communication data sources (e.g. institutional repositories, data archives, software repositories), rely where possible on persistent identifiers (e.g. DOI, ORCID, Grid.ac, PDBs), and specify semantic links to other research products (e.g. supplementedBy, citedBy, versionOf) and possibly to projects and/or related funders. By following such practices, scientists are implicitly constructing the Global Open Science Graph, where by "graph" we mean a collection of objects interlinked by semantic relationships.
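To make the notion of "a collection of objects interlinked by semantic relationships" concrete, here is a minimal sketch in Python. It is only an illustration: the class names, the method names, and the identifiers are hypothetical and are not taken from the OpenAIRE data model; only the relationship names (supplementedBy, citedBy, versionOf) come from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str    # identifier of the source object (made-up values below)
    relation: str  # semantic relationship, e.g. "supplementedBy"
    target: str    # identifier of the target object


class ScienceGraph:
    """Toy graph: objects keyed by identifier, joined by typed links."""

    def __init__(self):
        self._outgoing = defaultdict(list)

    def add_link(self, source, relation, target):
        self._outgoing[source].append(Link(source, relation, target))

    def related(self, source, relation):
        """Return the targets reachable from `source` via `relation`."""
        links = self._outgoing.get(source, [])
        return [link.target for link in links if link.relation == relation]


graph = ScienceGraph()
# Hypothetical identifiers, used only for illustration.
graph.add_link("doi:10.1234/article", "supplementedBy", "doi:10.1234/dataset")
graph.add_link("doi:10.1234/article", "citedBy", "doi:10.1234/other-article")
graph.add_link("doi:10.1234/dataset", "versionOf", "doi:10.1234/dataset-v1")

print(graph.related("doi:10.1234/article", "supplementedBy"))
```

Following typed links from one object to the next is what makes research "traversable" in the sense used in this post.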

If the Global Open Science Graph were thoroughly and consistently populated, encompassing all scholarly entities and relying on rich metadata, PIDs, and links, the possibilities would be endless. Research would be completely contextualized and traversable, open and free-of-charge, thereby enabling:

  • Funders and organizations to monitor compliance with Open Science mandates, scientific trends, and research impact;
  • Researchers to build a complete individual scientific record, track the provenance of scientific results, and reproduce experiments by reaching all products;
  • Research communities, by viewing and populating the subset of the Open Science Graph relevant to the community, to monitor community research trends and to identify and calculate community-specific indicators of quality of science;
  • Publishers to launch and promote their journals relying on equal and verifiable quality indicators, chosen by research communities in their specific domains, and calculated over the Global Open Science Graph taking into account the whole citation graph, open access, reproducibility degrees, etc.

The OpenAIRE graph maps the knowledge produced by public research and the actors involved in it, enabling explorations of content like never before.

What we are developing: the OpenAIRE Research Graph has been conceived as a resource for researchers, funders, organizations, research communities, SMEs, and publishers to achieve the aforementioned objectives by jointly populating and maintaining the Global Open Science Graph as a complete, trusted, open, public good. It is conceived to be:

Open: it is available for download and re-use under CC-BY (because some input sources are licensed CC-BY); parts of the graph can be re-used under CC-0;

Transparent: provenance is tracked at the level of the records and, when these are the result of full-text mining, of the properties (provenance also includes an indicator of trust, in the range [0..1]);
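As a rough illustration of provenance carried at the level of a property together with a trust indicator in the range [0..1], consider the sketch below. The class names, field names, source labels, and the threshold value are all assumptions made for this example; they do not reflect the actual graph schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    source: str   # where the value came from (hypothetical labels below)
    trust: float  # indicator of trust, expected in the range [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.trust <= 1.0:
            raise ValueError("trust must be in the range [0, 1]")


@dataclass(frozen=True)
class Property:
    name: str
    value: str
    provenance: Provenance


def trusted(properties, threshold):
    """Keep only the properties whose trust is at least `threshold`."""
    return [p for p in properties if p.provenance.trust >= threshold]


properties = [
    Property("funder", "Example Funder", Provenance("harvested-metadata", 0.9)),
    Property("project", "Example Project", Provenance("full-text-mining", 0.6)),
]

# The threshold is arbitrary and chosen only for the example.
print([p.name for p in trusted(properties, 0.8)])
```

A consumer of the graph could use such an indicator to decide how much weight to give to inferred properties compared with harvested ones.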

Decentralized: metadata and links are collected from data sources, such as institutional/data/software repositories, publishers, registries, and re-distributed to such sources via brokering services.

How we build it: the OpenAIRE Research Graph includes metadata and links between scientific products (e.g. literature, datasets, software, and "other research products"), organizations, funders, funding streams, projects, communities, and (provenance) data sources; the details of the graph data model can be found on Zenodo.org.

The Graph is available today as a BETA release, obtained by aggregating the metadata and links collected from ~10,000 trusted sources, further enriched with metadata and links provided by:

  • OpenAIRE end-users, e.g. researchers, project administrators, data curators providing links from scientific products to projects, funders, communities, or other products;
  • OpenAIRE full-text mining algorithms, run over ~12Mi Open Access article full-texts;
  • Research infrastructure scholarly services, bridged to the graph via OpenAIRE, exposing metadata of products such as research workflows, experiments, research objects, software, etc.


Today: As of November 2019, the Graph aggregates around 450Mi metadata records with links, which after deduplication, cleaning, and classification narrow down to ~110Mi publications, ~10Mi datasets, ~180K software research products, and ~7Mi other products, with 480Mi (bi-directional) semantic links between them. Such products are in turn linked to 7 research communities, organizations, and projects from ~29 funders worldwide.
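This post does not describe how deduplication is performed. Purely as an illustration of why the record count shrinks after that step, the sketch below merges records that share the same normalized identifier. It is a simplified stand-in, not OpenAIRE's method, and the record fields (`pid`, `source`) and the sample values are hypothetical.

```python
def deduplicate(records):
    """Merge records that share a normalized identifier.

    Each merged entry remembers every source the record was collected from.
    """
    merged = {}
    for record in records:
        key = record["pid"].strip().lower()  # naive normalization
        if key not in merged:
            merged[key] = {"pid": key, "sources": []}
        merged[key]["sources"].append(record["source"])
    return list(merged.values())


# Hypothetical records collected from two sources.
records = [
    {"pid": "10.1/ABC", "source": "repo-a"},
    {"pid": " 10.1/abc ", "source": "repo-b"},
    {"pid": "10.1/xyz", "source": "repo-a"},
]

unique = deduplicate(records)
print(len(records), "records ->", len(unique), "after merging")
```

Keeping the list of contributing sources on each merged record is one simple way to preserve provenance through such a step.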

Access: Access to the BETA release graph is today possible via the Beta EXPLORE portal or via "data dumps" made available via Zenodo.org, among which:

DOIBoost: CrossRef enriched with ORCID, Microsoft Academic, and Unpaywall; the data dump is supplemented with the software and with the data paper to allow for reproducibility;

Scholexplorer: the data dump of the aggregation of 480Mi article-dataset and dataset-dataset links collected from data centres and publishers, provided in JSON Scholix.org format.
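For readers who plan to work with a dump of links, the sketch below shows one way to tally links by the kinds of objects they connect. The JSON field names (`source_type`, `target_type`, `relation`) and the sample records are simplified placeholders invented for this example; the actual Scholix format is richer and its field names may differ, so consult the Scholix documentation before adapting it.

```python
import json
from collections import Counter

# Hypothetical, simplified link records; NOT the real Scholix schema.
SAMPLE = """
[
  {"source_type": "publication", "target_type": "dataset", "relation": "references"},
  {"source_type": "dataset", "target_type": "dataset", "relation": "isVersionOf"},
  {"source_type": "publication", "target_type": "dataset", "relation": "isSupplementedBy"}
]
"""


def count_link_kinds(raw):
    """Count links per (source type)-(target type) pair."""
    links = json.loads(raw)
    return Counter(
        f'{link["source_type"]}-{link["target_type"]}' for link in links
    )


counts = count_link_kinds(SAMPLE)
for kind, total in counts.items():
    print(kind, total)
```

For a dump of the size mentioned above, the whole file would not fit comfortably in memory, so a streaming reader would be a more realistic choice than `json.loads` on the full content.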

Feedback: the graph is currently under a pre-release consultation process which will last two months. Any feedback to improve its quality, in this period and in the future, is welcome. You can send it via Trello or by opening a ticket with the OpenAIRE Helpdesk under the category OpenAIRE Services with the subject "OpenAIRE Research Graph: <your feedback to improve>".

The OpenAIRE Research Graph and the EOSC

The OpenAIRE Research Graph represents the EOSC catalogue of all resources: data, publications, software, methods, etc., all linked together.

The Graph will be a key resource of the European Open Science Cloud, providing the EOSC catalogue of all scientific products to support the joint construction of a common scientific communication ecosystem in line with Open Science research life-cycle principles. Its aim is to contribute to the overall Open Access and Open Science mission, offering fertile ground for the realisation of value-added services such as discovery of content and monitoring of Open Science within and across disciplines, and ultimately returning scientific knowledge to citizens as a public good.

We are not working alone: several other initiatives around the world are trying to materialise the Global Open Science Graph, bearing in mind different customers and target use-cases. Making such graphs interoperable is key to enable the synergic contribution to the Global Open Science Graph. 

The recently established RDA Open Science Graphs for FAIR Data IG addresses such topics and involves major scientists, practitioners, and initiatives in the field. More specifically, the following graph initiatives are already exchanging metadata and links with the OpenAIRE Research Graph, or are in the process of doing so:

  • The FREYA PID graph built by the European Commission project FREYA to materialize entities with PIDs and links between them;
  • The ResearchGraph built by the ResearchGraph Foundation (Australia) whose focus is linking research projects and research outcomes on the basis of co-authorship or other collaboration models such as joint funding and grants;
  • The Open Research Knowledge Graph built by the research team of TIB (Germany), which focuses only on communicated content, e.g. the semantics of science, rather than on the context, e.g. people and institutions;
  • The OpenCitations graph, which focuses on citations and related semantics between publications.

