The OpenAIRE Research Graph is gradually becoming a key asset of OpenAIRE. We are investing a lot of effort in its development and improvement. Quality, inclusiveness, transparency and open governance are key in our thinking, but underneath it all there is a huge ongoing technical effort to make it robust and usable for discovery and monitoring. Many of our users ask us what the graph is about: why do we need one, and what does the OpenAIRE Graph bring? We listened and compiled a set of key points that will guide the uninitiated and the experts alike into our world. Into our future.
The basics: what is the role of graph databases?
Well, with the exponential growth of data and big data, the tremendous number of interconnections among data, and industry's need to answer complex questions, graph databases are ideal. Just imagine: over the last two decades you have trained yourself to search the internet using multiple criteria, and you have got used to getting results back no matter what you are looking for. The OpenAIRE Research Graph is a graph database that interlinks information from various resources by using nodes (also called points or vertices), which are connected by edges (links, lines). Nodes can be connected in any way possible, with no constraint on the number of connections. Graphs can have two types of edges: ones that have a direction (directed edges) and ones that do not (undirected edges). That characteristic distinguishes a directed graph from an undirected one. You may read more here.
Example of an undirected graph: Facebook! A node in Facebook is you, connected with other friends (nodes) in a bidirectional way.
Example of a directed graph: Twitter! In this case, you are still a node that is connected (by following) with other nodes, except that following someone does not automatically mean they follow you back! (see an example here).
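The difference between the two kinds of edges can be sketched in a few lines of Python. This is a minimal illustration only: the Graph class and the node names "you" and "alice" are invented for this example and have nothing to do with how Facebook, Twitter or OpenAIRE store their data.

```python
from collections import defaultdict


class Graph:
    """A tiny adjacency-set graph; `directed` decides how edges are stored."""

    def __init__(self, directed):
        self.directed = directed
        self.neighbors = defaultdict(set)  # node -> set of nodes it points to

    def add_edge(self, source, target):
        self.neighbors[source].add(target)
        if not self.directed:
            # An undirected edge is stored in both directions.
            self.neighbors[target].add(source)

    def has_edge(self, source, target):
        return target in self.neighbors.get(source, set())


# Friendship (undirected): connecting you to alice also connects alice to you.
friends = Graph(directed=False)
friends.add_edge("you", "alice")

# Following (directed): you follow alice, but alice does not follow you back.
follows = Graph(directed=True)
follows.add_edge("you", "alice")

print(friends.has_edge("alice", "you"))  # True
print(follows.has_edge("alice", "you"))  # False
```

The only difference between the two graphs is whether add_edge also stores the reverse connection.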
What are the ontologies used on OpenAIRE Research Graph?
On the OpenAIRE Research Graph, the nodes are scholarly objects: publications, datasets, metadata information, authors, journals, etc. Nodes have labels that tell you what type of thing they are, and properties that describe them; an author's node has properties such as id, first_name, last_name. As you may have guessed, the graph is a directed one. Why? Because a research article has specific authors, a DOI, research products, organizations, funders, communities and so on. The authors, though, do not write only one paper but many, which are in turn linked with different journals, authors, DOIs, etc.
As you can imagine, knowing that a node has a directed connection might not be enough. That is why we need a so-called ontology specification, which "includes descriptions of concepts and properties in a domain, relationships between concepts, constraints on how the relationships can be used and individuals as members of concepts." For example, we need to know that a publication has an author named X, has a DOI with number XX, and is published in Journal XXX. That enriches the directed edges and helps us have a clear structure.
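To make the idea concrete, here is a minimal sketch of typed, directed edges that are checked against a tiny ontology. Everything in it is an assumption made for illustration: the relation names (hasAuthor, hasDOI, publishedIn), the node ids and the placeholder values are invented here and are not the actual OpenAIRE data model.

```python
# Hypothetical mini-ontology: relation -> (allowed source type, allowed target type).
ONTOLOGY = {
    "hasAuthor": ("publication", "author"),
    "hasDOI": ("publication", "doi"),
    "publishedIn": ("publication", "journal"),
}

# Nodes: id -> type plus descriptive properties. All values are placeholders.
nodes = {
    "pub1": {"type": "publication", "title": "An example paper"},
    "auth1": {"type": "author", "first_name": "X", "last_name": "Y"},
    "doi1": {"type": "doi", "value": "XX"},
    "jour1": {"type": "journal", "name": "Journal XXX"},
}

edges = []  # directed, typed edges stored as (source id, relation, target id)


def add_edge(source, relation, target):
    """Add a directed, typed edge only if the ontology allows it."""
    if relation not in ONTOLOGY:
        raise ValueError(f"unknown relation: {relation}")
    expected = ONTOLOGY[relation]
    actual = (nodes[source]["type"], nodes[target]["type"])
    if actual != expected:
        raise ValueError(f"{relation} expects {expected}, got {actual}")
    edges.append((source, relation, target))


def describe(source):
    """Return every (relation, target id) pair that starts at `source`."""
    return [(rel, tgt) for src, rel, tgt in edges if src == source]


add_edge("pub1", "hasAuthor", "auth1")
add_edge("pub1", "hasDOI", "doi1")
add_edge("pub1", "publishedIn", "jour1")

print(describe("pub1"))
```

An edge that breaks a constraint, such as an author "having" a journal as its DOI, is rejected with a ValueError instead of silently entering the graph.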
What is a scholarly communication graph good for?
How can one experience the OpenAIRE Research Graph?
Visit the OpenAIRE EXPLORE service and search for a term. EXPLORE will display all the related results directly from the OpenAIRE Research Graph.
Thanks to the OpenAIRE-Advance open innovation call, Opscidia created an ontology generator tool that allows users to search a term in the OpenAIRE Graph and view a simple visualization of nodes and their edges. A detailed presentation is available here: https://www.openaire.eu/opscidia-ontology-generator
To try out the Opscidia ontology generator, please visit: https://openaire.opscidia.com/
If you want to travel along the Nexus data trek, you need to use a fast vehicle! Speed is another characteristic of the OpenAIRE Research Graph, as it handles big data in a very efficient manner. According to a 2017 survey by TechValidate and IBM, 57% of participants recognized speed as the top technology benefit of a graph database.
How does it do that?
It comes down to the database model of the OpenAIRE Research Graph: "a native graph" maintains the information related to "its neighbor nodes" without having to "load or touch unrelated data for a given query". In simple words, when you search on OpenAIRE EXPLORE, only the necessary nodes are queried, without touching interconnected nodes that hold information of no value to the query. Another ability of the OpenAIRE Research Graph is to combine multiple dimensions to manage big data, including time series, spatial attributes, data types, languages and more, all under one search query!
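The "only touch the neighbors" idea can be illustrated with a toy adjacency list and a breadth-first walk. This is a sketch under assumptions: the node ids are invented and the code says nothing about how the OpenAIRE Research Graph is actually stored or queried. It only shows that a query starting at one node never has to load parts of the graph it is not connected to.

```python
# Toy adjacency list; node ids are invented for illustration.
adjacency = {
    "pub1": ["auth1", "jour1"],
    "pub2": ["auth2", "jour1"],
    "auth1": ["pub1"],
    "auth2": ["pub2"],
    "jour1": ["pub1", "pub2"],
    "pub3": ["auth3"],  # pub3 and auth3 are not connected to the rest
    "auth3": ["pub3"],
}


def neighbors_within(start, max_hops):
    """Breadth-first walk that visits only nodes within max_hops of start."""
    visited = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return visited


# One hop from the author reaches only that author's publication.
print(sorted(neighbors_within("auth1", max_hops=1)))  # ['auth1', 'pub1']
```

However many hops are allowed, a walk that starts at auth1 never reads the entries for pub3 or auth3, because no edge leads there.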
How do I know that the information is trustworthy?
The operation of the OpenAIRE Research Graph is transparent and open to all. A visit to https://graph.openaire.eu/about#architecture shows its architecture. More specifically, the section "How we built it" shows all the stages that data passes through inside the OpenAIRE Research Graph. The Deduplication, Enrichment and Post-Cleaning processes ensure that the information is valid and trustworthy.
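To give a feeling for what a deduplication step means in general, here is a deliberately simple sketch that merges records sharing the same DOI. It is not the algorithm OpenAIRE uses: the record fields, the function names and the example DOIs are all assumptions made for this illustration.

```python
def normalize_doi(doi):
    """Lower-case a DOI and strip a leading resolver or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def deduplicate(records):
    """Merge records that share a normalized DOI, remembering every source."""
    merged = {}
    for record in records:
        key = normalize_doi(record["doi"])
        entry = merged.setdefault(
            key, {"doi": key, "title": record["title"], "sources": []}
        )
        entry["sources"].append(record["source"])
    return list(merged.values())


# Hypothetical records: the first two describe the same paper.
records = [
    {"doi": "https://doi.org/10.1234/ABC", "title": "An example paper", "source": "repository A"},
    {"doi": "10.1234/abc", "title": "An example paper", "source": "repository B"},
    {"doi": "10.1234/xyz", "title": "Another paper", "source": "repository A"},
]

result = deduplicate(records)
print(len(result))  # 2
```

The merged record keeps track of both repositories it came from, so nothing about its provenance is lost when the duplicates are collapsed.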
Gartner has predicted that graph processing and graph databases "will grow at 100 percent annually over the next few years to accelerate data preparation and enable more complex and adaptive data science." It looks like knowledge graphs have reached a peak on Gartner's expectations-over-time graph, advancing from the early innovation stages of 2018 and 2019.
And it's not just this evidence: in recent years there have been more than 28K peer-reviewed scientific publications about graph data science, of which 11K are open access research outcomes, along with 23 projects and 195 organizations.
The OpenAIRE Research Graph rides the slope of data oceans
How does the Graph handle all the data available? It is built and structured on classification standards. Specifically, the Graph model respects the revised Field of Science and Technology (FOS) classification in the Frascati Manual. That helps Graph users enjoy R&D outcomes that are comparable (statistics), valuable and evidence-based. Open Citations, with the rich collection of metadata and links that they offer, drive the Graph into new uncharted waters. An example follows below.
OpenAIRE Research Graph as the Data Space of advanced policy making platform - IntelComp - in Europe
As of January 2021, the OpenAIRE Graph provides its data collection to the IntelComp H2020 project. IntelComp is a Competitive Intelligence Cloud/High Performance Computing Platform for Artificial Intelligence-based Science, Technology and Innovation (STI) Policy Making. IntelComp will therefore benefit from the well-defined, trustworthy and FAIR data available through the OpenAIRE Graph in pursuing its mission and vision of advanced, data- and science-driven STI policy making in Europe. The OpenAIRE Graph will benefit from this collaboration too, by incorporating information that focuses on policy making and by learning which data drive and motivate policy makers, what matters most, and how blended information from various sources can add high value to future policies in Europe.
OpenAIRE Research Graph is open, accessible and available to all! The records produced by the graph are free and can be used under a CC-BY license. The information on how to access or even download a dump is available here: https://graph.openaire.eu/resources
The main OpenAIRE services that use the OpenAIRE Research Graph are available through the OpenAIRE services catalogue here: http://catalogue.openaire.eu/search;quantity=10
Just to give you an overview: