Case studies

Supporting intelligent policy making with OpenAIRE Graph

IntelComp STI Data Lake builds upon the OpenAIRE Graph

Overview

IntelComp’s STI Data Space interconnects data that is currently fragmented and dispersed across various ecosystems, from both the private and public sectors. It thereby supports the EU priorities for the 2019-2024 period set by both the Council and the Commission. The OpenAIRE Graph is the source of scholarly works for the IntelComp STI Data Space.

Challenge & Scenario

The challenge was to meet the need for high-quality, scientifically trustworthy data, together with third-party data, to support the project's goal of a holistic approach to STI policy making.

Solution & Implementation

This need led to the creation of the IntelComp STI Data Space, which leverages the structured big data provided by the OpenAIRE Graph (given its responsive metadata schema format), and, building on that, to the definition of an IntelComp schema for mapping unstructured resources. This makes IntelComp a sustainable platform that is interoperable with most of the data types and formats relevant to its users’ needs.

According to a 2019 Deloitte survey, only 18% of organisations reported being able to take advantage of unstructured data, while 64% rely on structured data from internal systems and resources. Another survey, by Treehouse Technology Group, found that unstructured data makes up 80% of enterprise data and is growing at a rate of 55% to 65% per year.

The IntelComp STI Data Space offers all kinds of data related to the project, and processes and analyses them using HPC technologies.

Impact

The impact of IntelComp and the OpenAIRE Graph is that more than 160 million scholarly works are now processed, analysed, and categorised to provide insights and trends to policy makers, offering a holistic view for policy making in domains such as Climate (agriculture, agrifood, energy), AI, Health, and the Public sector (governments, policy agencies).

Two new tools have been created to support IntelComp users: i) the IntelComp STI Data Space catalogue, where all data sources and tools are registered, and ii) the IntelComp STI Viewer, which draws easy-to-comprehend visualisations from the information available in the IntelComp STI Data Space: data from the OpenAIRE Graph, patent information from PATSTAT-EPO, and citation information from OpenCitations (included within the OpenAIRE Graph).

In depth description

Details

Once the information in the IntelComp catalogue is updated, the IntelComp STI Viewer combines and presents the data through user-friendly visualisations. The following images show, as examples, how many AI publications have appeared over recent years and how they are categorised by topic of interest.


Description/Source: The graph shows the evolution in the share of publications in different topics in the AI domain in the EU, over time. The ontology of topics has been inferred from the selected publications using Natural Language Processing techniques: Latent Dirichlet Allocation is used to detect and categorise the topics, and ChatGPT is used to label them. Data Source: OpenAIRE Graph
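
The share plotted in such a graph can be derived from per-publication topic assignments. The following is a minimal sketch, not IntelComp's implementation: it assumes each publication has already been assigned a single dominant topic (for example, its highest-weight Latent Dirichlet Allocation topic), and the record layout, the sample data, and the function name `topic_shares_by_year` are hypothetical.

```python
from collections import Counter, defaultdict


def topic_shares_by_year(publications):
    """Return {year: {topic: share}}; shares within one year sum to 1.

    `publications` is an iterable of (year, topic) pairs. This layout is an
    assumption made for illustration, not the IntelComp schema.
    """
    counts = defaultdict(Counter)
    for year, topic in publications:
        counts[year][topic] += 1

    shares = {}
    for year, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[year] = {topic: n / total for topic, n in topic_counts.items()}
    return shares


# Made-up sample data, only to show the call.
sample = [
    (2020, "computer vision"),
    (2020, "computer vision"),
    (2020, "robotics"),
    (2021, "robotics"),
]
shares = topic_shares_by_year(sample)
# In 2020, "computer vision" holds 2 of 3 publications; in 2021, "robotics" holds all.
```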

A logical question then arises: what about AI publications per country and per organisation? Such a breakdown could give a policy maker a clear overview of research outcomes, as shown in the following image.

Description/Source: The graph shows the number of publications in AI by the country of the affiliated organisation of an author (with at least one author affiliated to an EU organisation). In the case of multiple authors, each organisation and corresponding country is counted as a separate publication (e.g., one publication with three authors, where two are from Greece and one is from Spain, is counted as two publications in Greece and one in Spain). Data Source: OpenAIRE Graph
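
The counting rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the STI Viewer's actual code: each publication is modelled as a list of (organisation, country) affiliations, the EU country set is a deliberately incomplete placeholder, and the function name and organisation names are hypothetical. The description does not say whether non-EU countries of qualifying publications appear in the graph; this sketch counts them.

```python
from collections import Counter

# Illustrative subset of EU country codes; a real analysis would use the full list.
EU_COUNTRIES = {"GR", "ES", "DE", "FR", "IT"}


def count_publications_by_country(publications):
    """Count one publication per affiliation, keyed by the affiliation's country.

    `publications` is a list of publications, each a list of
    (organisation, country_code) pairs (an assumed layout).
    Publications with no EU affiliation are skipped.
    """
    counts = Counter()
    for affiliations in publications:
        countries = [country for _organisation, country in affiliations]
        if not any(country in EU_COUNTRIES for country in countries):
            continue  # requires at least one author affiliated to an EU organisation
        counts.update(countries)
    return counts


# The example from the description: two affiliations in Greece, one in Spain.
# A second publication with no EU affiliation is ignored.
counts = count_publications_by_country([
    [("Org A", "GR"), ("Org B", "GR"), ("Org C", "ES")],
    [("Org D", "US")],
])
# counts: GR -> 2, ES -> 1
```

Because one publication contributes to several countries, the per-country totals add up to more than the number of distinct publications.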

Description/Source: The graph shows the top 100 organisations in terms of the number of publications in AI with at least one author from an EU organisation. Data Source: OpenAIRE Graph

Another good indicator of research impact is how often publications are cited in registered patents, together with how publications have enabled international collaborations on various AI-related topics.

Description/Source: The graph shows the total number of publications, superimposed with the number of those that are cited in patents, over time using data on publications and patents that are in the AI domain in the EU. Data Sources: OpenAIRE Graph, PATSTAT-EPO dataset.
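
The two series in this graph amount to a yearly count of all publications and a yearly count of the subset cited by at least one patent. The following is a minimal sketch, assuming the link between publications and patents has already been resolved into a set of cited publication identifiers; the identifiers, the data layout, and the function name are hypothetical.

```python
from collections import Counter


def yearly_totals_and_patent_cited(publication_years, patent_cited_ids):
    """Return {year: (total_publications, publications_cited_in_patents)}.

    `publication_years` maps a publication id to its publication year;
    `patent_cited_ids` is the set of publication ids cited by at least one
    patent. Both layouts are assumptions for illustration.
    """
    total = Counter(publication_years.values())
    cited = Counter(
        year
        for pub_id, year in publication_years.items()
        if pub_id in patent_cited_ids
    )
    return {year: (total[year], cited[year]) for year in sorted(total)}


# Made-up sample data: "p9" is cited by a patent but is not in the selection.
series = yearly_totals_and_patent_cited(
    {"p1": 2019, "p2": 2019, "p3": 2020},
    {"p2", "p9"},
)
# series: 2019 -> (2, 1), 2020 -> (1, 0)
```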

Description/Source: The graph shows the number of publications with authors affiliated to organisations in at least two different countries, one of which is in the EU, by topic in the AI domain. The ontology of topics has been inferred from the selected publications using Natural Language Processing techniques: Latent Dirichlet Allocation is used to detect and categorise the topics, and ChatGPT is used to label them. Data Source: OpenAIRE Graph
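
The selection rule for international collaborations (at least two different countries, one of which is in the EU) can be sketched as follows. This is an illustration only: the EU country set is an incomplete placeholder, each publication is assumed to carry one topic label and a list of affiliation country codes, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative subset of EU country codes; a real analysis would use the full list.
EU_COUNTRIES = {"GR", "ES", "DE", "FR", "IT"}


def international_collaborations_by_topic(publications):
    """Count publications per topic that span >= 2 countries, at least one in the EU.

    `publications` is an iterable of (topic, country_codes) pairs
    (an assumed layout, not the IntelComp schema).
    """
    counts = Counter()
    for topic, country_codes in publications:
        distinct = set(country_codes)
        if len(distinct) >= 2 and distinct & EU_COUNTRIES:
            counts[topic] += 1
    return counts


# Made-up sample data, only to show the call.
collaborations = international_collaborations_by_topic([
    ("nlp", ["GR", "ES"]),     # two EU countries: counted
    ("nlp", ["GR", "GR"]),     # a single country: not counted
    ("vision", ["US", "JP"]),  # two countries, none in the EU: not counted
    ("vision", ["DE", "US"]),  # one EU and one non-EU country: counted
])
# collaborations: nlp -> 1, vision -> 1
```

Unlike the per-country graph, each qualifying publication is counted once here, regardless of how many countries it involves.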

This case study shows how the OpenAIRE Graph can empower the policy making process and assist this difficult and multidisciplinary activity. The automation and constructive methodology of the OpenAIRE Graph help policy makers enrich their information and navigate easily through various topics to spot important insights. After all, the OpenAIRE Graph is shaped by human knowledge and offers outcomes that develop that knowledge further.

Service in focus

OpenAIRE Graph

The OpenAIRE Research Graph is a service that populates a graph of research metadata and makes it accessible (via APIs or a downloadable dump on Zenodo) to the research community, SMEs, and Research Infrastructures. The Graph includes metadata and links between scientific products (e.g., literature, datasets, software, "other research products"), organisations, funders, funding streams, projects, communities, and (provenance) data sources. In IntelComp, the Graph is the core element of the IntelComp Data Lake, which, along with other types of datasets (internal from partners, from social media, from websites, from third parties like the EC, EPO, etc.), forms a rich pool of data to use, combine, compare, analyse, and view.

We want to hear from you

If you find the case study useful, contact us so we can guide you through the process.