
OpenAIRE Research Graph: An intelligent gateway to scholarly communication


The OpenAIRE Research Graph is gradually becoming a key asset of OpenAIRE. We are investing a lot of effort in its development and improvement. Quality, inclusiveness, transparency and open governance are key in our thinking, but underneath it all there is a huge ongoing technical effort to make it robust and usable for discovery and monitoring. Many of our users are asking to know more: what is the graph about? Why do we need one, and what does the OpenAIRE Graph bring? We listened, and we compiled a set of key points that will guide the uninitiated and the experts alike into our world. Into our future.

1. A knowledge graph for research: a dynamically growing database of interconnected scholarly communication entities

The basics: what is the role of graph databases?

Well, given the exponential growth of data, the tremendous number of interconnections among data, and industry's need to answer complex questions, graph databases are ideal. Just imagine: over the last two decades you have trained yourself to search the internet using multiple criteria, and you have got used to getting results back no matter what you are looking for. The OpenAIRE Research Graph is a graph database that interlinks information from various resources by using nodes (also called points or vertices) that are connected by edges (links, lines). Nodes can be connected in any way possible, with no constraint on the number of connections. Graphs can have two types of edges: those that have a direction (directed edges) and those that do not (undirected edges). That characteristic distinguishes a directed graph from an undirected one. You may read more here.

Credits: https://medium.com/basecs/a-gentle-introduction-to-graph-theory-77969829ead8

Example of an undirected graph: Facebook! A node in Facebook is you, connected with other friends (nodes) in a bidirectional way.

Example of a directed graph: Twitter! In this case, you are still a node that is connected (by following) with other nodes, except that following someone does not automatically mean they follow you back (see an example here).
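To make the difference concrete, here is a minimal sketch in Python. It is only an illustration of the two edge types, not how OpenAIRE or any social network stores its data; the function name build_graph and the users "ana" and "ben" are made up for the example.

```python
from collections import defaultdict


def build_graph(edges, directed):
    """Return an adjacency mapping: node -> set of nodes it points to."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touch the key so the target exists as a node
        if not directed:
            # An undirected edge is stored in both directions.
            adjacency[target].add(source)
    return dict(adjacency)


# Hypothetical users, for illustration only.
friendships = build_graph([("ana", "ben")], directed=False)  # Facebook-style
follows = build_graph([("ana", "ben")], directed=True)       # Twitter-style

print("ana" in friendships["ben"])  # True: friendship is mutual
print("ana" in follows["ben"])      # False: ben does not follow ana back
```

The same single edge gives a two-way link in the undirected graph and a one-way link in the directed one.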

What are the ontologies used on OpenAIRE Research Graph?

On the OpenAIRE Research Graph, the nodes are scholarly objects: publications, datasets, metadata information, authors, journals, etc. Nodes have labels that tell you what type of thing they are; an author's node has labels such as id, first_name, last_name. As you may have guessed, the graph is a directed one. Why? Because a research article has specific authors, a DOI, research products, organizations, funders, communities and so on. The authors, though, write not just one paper but many, and these are linked with different journals, authors, DOIs, etc.

As you can imagine, knowing that a node has a directed connection might not be enough. That is why we need a so-called ontology specification, which "includes descriptions of concepts and properties in a domain, relationships between concepts, constraints on how the relationships can be used and individuals as members of concepts." For example, we need to know that a publication has an author named X, has a DOI with number XX, and is published in Journal XXX. That enriches the directed edges and helps us have a clear structure.
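The idea of typed nodes joined by named, directed relationships can be sketched in a few lines of Python. This is a toy model under our own assumptions: the class PropertyGraph, the identifiers "pub:1", "author:1" and "journal:1", and the relation names "hasAuthor" and "publishedIn" are hypothetical and are not taken from the actual OpenAIRE data model.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory graph with typed nodes and labelled, directed edges."""
    nodes: dict = field(default_factory=dict)  # node_id -> {"type": ..., properties}
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, node_type, **properties):
        self.nodes[node_id] = {"type": node_type, **properties}

    def relate(self, source_id, relation, target_id):
        # Refuse edges that point at nodes we do not know about.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both ends of an edge must be existing nodes")
        self.edges.append((source_id, relation, target_id))

    def targets(self, source_id, relation):
        """Ids of nodes reached from source_id through the given relation."""
        return [t for s, r, t in self.edges if s == source_id and r == relation]


graph = PropertyGraph()
graph.add_node("pub:1", "publication", doi="XX")
graph.add_node("author:1", "author", first_name="X", last_name="X")
graph.add_node("journal:1", "journal", name="Journal XXX")

# Hypothetical relation names, chosen only for this example.
graph.relate("pub:1", "hasAuthor", "author:1")
graph.relate("pub:1", "publishedIn", "journal:1")

print(graph.targets("pub:1", "hasAuthor"))     # ['author:1']
print(graph.targets("author:1", "hasAuthor"))  # [] because the edge is directed
```

Naming each relationship is what lets a query ask "who wrote this?" rather than just "what is connected to this?".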

What is a scholarly communication graph good for?

  • Discovery of information across a big data space, with fast results, accuracy and answers to complex questions
  • Monitoring of all interconnections and interrelated objects, and tracking of information

How can one experience the OpenAIRE Research Graph?

By querying

Visit the OpenAIRE EXPLORE service and search for a term. EXPLORE will display all the related results directly from the OpenAIRE Research Graph.

Visually

Thanks to the OpenAIRE-Advance open innovation call, Opscidia created an ontology generator tool that allows users to search a term in the OpenAIRE Graph and view a simple visualization of nodes and their edges. A detailed presentation is available here: https://www.openaire.eu/opscidia-ontology-generator

To try out the Opscidia ontology generator, please visit: https://openaire.opscidia.com/ 

2. The OpenAIRE Graph is formed by 124 million publications and 14 million research datasets, searchable via EXPLORE, in your browser, in less than 5 seconds with accuracy and trust

If you want to travel along the Nexus data trek, you need to use a fast vehicle! Speed is another characteristic of the OpenAIRE Research Graph, as it handles big data in a very efficient manner. According to a 2017 survey by TechValidate and IBM, 57% of participants recognized speed as the top technology benefit of a graph database.

How does it do that?

The answer lies in the OpenAIRE Research Graph database model: "a native graph" maintains the information related to "its neighbor nodes" without having to "load or touch unrelated data for a given query". In simple words, when you search on OpenAIRE EXPLORE, only the necessary nodes are queried, without touching interconnected nodes whose information has no value to the query. The OpenAIRE Research Graph can also combine multiple dimensions to manage big data, including time series, spatial attributes, data types, languages and more, all under one search query!
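A rough way to picture "only the necessary nodes are queried" is a walk that starts at one node and follows its edges, never scanning the rest of the data. The Python sketch below is a generic breadth-first traversal over a made-up adjacency map; it illustrates the principle only and is not the OpenAIRE implementation. All identifiers in it are hypothetical.

```python
def neighbours_within(adjacency, start, max_hops):
    """Breadth-first walk that only visits nodes reachable from `start`."""
    visited = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return visited


# Hypothetical mini-graph: two unrelated clusters of records.
adjacency = {
    "pub:1": ["author:1", "journal:1"],
    "author:1": ["pub:1", "pub:2"],
    "journal:1": ["pub:1"],
    "pub:2": ["author:1"],
    "pub:99": ["author:99"],
    "author:99": ["pub:99"],
}

touched = neighbours_within(adjacency, "pub:1", max_hops=1)
print(sorted(touched))  # ['author:1', 'journal:1', 'pub:1']
```

However large the second cluster grows, a query that starts at "pub:1" never reads it, which is the intuition behind the speed of neighbour-based lookups.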

How do I know that the information is trustworthy?

The operation of the OpenAIRE Research Graph is transparent and open to all. The https://graph.openaire.eu/about#architecture page presents its architecture. More specifically, the section that explains "How we built it" shows every stage that data passes through inside the OpenAIRE Research Graph. The deduplication, enrichment and post-cleaning processes ensure that the information is valid and trustworthy.

3. OpenAIRE Research Graph is based on technologies with tremendous growth

Gartner has predicted that graph processing and graph databases "will grow at 100 percent annually over the next few years to accelerate data preparation and enable more complex and adaptive data science." It looks like knowledge graphs have reached a peak on Gartner's expectations-over-time graph, advancing from the early innovation stages of 2018 and 2019.

And this is not the only evidence: more than 28K peer-reviewed scientific publications about graph data science have appeared in recent years, of which 11K are open access research outcomes, along with 23 projects and 195 organizations.

The OpenAIRE Research Graph rides the slope of data oceans

How does the Graph handle all the data available? It is built and structured on classification standards. Specifically, the Graph model respects the revised Fields of Science and Technology (FOS) classification in the Frascati Manual. That gives Graph users R&D outcomes that are comparable (statistics), valuable and evidence-based. Open Citations, with the rich collection of metadata and links that it offers, drives the Graph into new, uncharted waters. An example follows below.

OpenAIRE Research Graph as the Data Space of IntelComp, an advanced policy making platform in Europe

As of January 2021, the OpenAIRE Graph provides its data collection to the IntelComp H2020 project. IntelComp is a Competitive Intelligence Cloud/High Performance Computing Platform for Artificial Intelligence-based Science, Technology and Innovation (STI) Policy Making. IntelComp will therefore benefit from the well-defined, trustworthy and FAIR data available through the OpenAIRE Graph to pursue its mission and vision of advanced, data- and science-driven STI policy making in Europe. The OpenAIRE Graph will benefit from this collaboration too, by incorporating information that focuses on policy making and by learning which data drive and motivate policy makers, what matters most, and how blended information from various sources can add high value to future policies in Europe.

4. You can download the latest OpenAIRE Graph dump on Zenodo or access it through an API

The OpenAIRE Research Graph is open, accessible and available to all! The records produced by the graph are free and can be used under a CC-BY license. Information on how to access the graph or download a dump is available here: https://graph.openaire.eu/resources

5. OpenAIRE Research Graph is the basis for many OpenAIRE services that provide, correlate, extract and produce research outcomes for EOSC communities

The main OpenAIRE services that use the OpenAIRE Research Graph are available through the OpenAIRE services catalogue here: http://catalogue.openaire.eu/search;quantity=10

Just to give you an overview:

  • OpenAIRE EXPLORE: The OpenAIRE Graph 'frontend' service, which allows users to run simple or complex searches for Open Access research objects available on the OpenAIRE Research Graph.
  • OpenAIRE CONNECT: The place where research communities set up, manage, share, link, disseminate and monitor information about their research that is available on the OpenAIRE Research Graph.
  • OpenAIRE PROVIDE: The service that offers a variety of tools to enable content providers to connect scholarly works with the OpenAIRE Research Graph.
  • OpenAIRE MONITOR: An on-demand, personalized research monitoring tool for funders, research institutions and policy makers that helps them track and view the impact or open access research outcomes derived from the OpenAIRE Research Graph.
  • OpenAIRE OpenScienceObservatory: A more generic visualization interface to the OpenAIRE Research Graph, which allows users to view the European research landscape, track trends for open access products, and reveal hidden potential in existing resources and collaboration patterns.