Skip to main content


Jun 19, 2019

Follow Us

Research Policy Monitoring in the Era of Open Science and Big Data

Jun 19, 2019

What we learned: The what (indicators) and the how (infrastructures)

Research funders across Europe are increasingly mandating Open Science practices for funded research outputs to support open and free access to valuable elements of the scholarly communication life-cycle. From OA to publications and the recent PlanS developments, to the promotion and uptake of coordinated RDM practices, to the more advanced research assessment exercises to understand innovation and societal impact, there is a need for monitoring of research output.


National and EU e-Infrastructures respond to these needs by embedding and developing monitoring tools to provide evidence-based data on policy uptake, costs, and research impact, while at the same time promoting interoperability of information outputs, shareable across networks.

This two-day workshop, co-organised by OpenAIRE and Data4Impact, with support of Science Europe, explored mechanisms for research policy monitoring and indicators, and how to link these to infrastructure and services. The first day was focused on open science indicators as these emerge from national and EU initiatives, while the second day explored more advanced aspects of indicators for innovation and societal impact.

Other elements addressed in this two-day workshop were:

  • Existing ways of monitoring for Open Science – the what and the how
  • Collaboration aspects to achieve a seamless monitoring landscape via open infrastructures
  • Data driven techniques for research assessment and their links to open data

Takeaways included:

  • A set of priorities of what elements in the research cycle need to be monitored
  • How infrastructures can work together to collectively provide monitoring elements and where to start
  • A deeper understanding of the benefits monitoring tools can bring to the OS lifecycle
  • A glimpse into emerging and future trends for research impact assessment using big (open) data and AI
Programme and presentations can be found here.

Briefing from the two days

  • Monitoring and Infrastructure for Open Science

    Ghent3Transparency, talking to one another and quality. These were just some of the themes concluded at the OpenAIRE / Data4Impact workshop, attended by over 80 people in Gent. OpenAIRE has been working in the area of monitoring and tracking research output, especially that of funders for nearly ten years now.

    The workshop explored different ways of monitoring in this new landscape of open science.

    So what is the priority?

    Quality must come first. The forward-looking keynote speech, given by Marc Vanholsbeeck, explored the meaning of impact. What do we actually refer to by using the term? There are, in fact, so many different kinds of impact in research, over 3000+ pathways according to the available research. Therefore, it is hard to pin down what we want to measure. The overarching theme was: quality comes first, impact comes next. It goes without saying that policies, such as those coming from the EC or national funders can determine impact, but how do we measure it?

    What is certainly clear is that OS can both communicate and popularize research to the benefit of society – as well as drive forward new ways of measuring impact. The remaining question is who are the players and providers who can do this – as an offer to society?

    We need to monitor repositories. And get trained up. A presentation of the OS monitor by David Osimo from Lisbon Council furthered the discussion – research by the initiative has looked into incentives for researchers to share. Very few researchers are willing to share their research data beyond their research groups. One interesting point made was the need to standardize usage data coming from repositories. A clear way forward, certainly in the European landscape, is to train data stewards, the number of whom is still very few.

    The RDA-FAIR Data Maturity Model WG - outlined in a presentation by Brecht Wyns and Christophe Bahim - has set out a rich plan looking at how we standardize FAIR - at present there are no benchmarks. This WG will go far to set these criteria and interpret the implementation of FAIR. The landscape study will also be crucial. We look forward to that FAIR checklist.

    CoalitionS, and what monitoring is needed, was then presented by Stephan Kuster. Clearly the Coalition S members have had their hands busy taking on board all the many responses. There will be some changes, altering the existing key principles. This includes: no pay-walls, more flexibility on the CC-BY licensing and a bit of leeway on allowing hybrid, so long as the journal can demonstrate it is moving towards a full open access model. And…the green road is also a very important route to open access reflecting that CoalitionS is part of a global movement. Good to see! There are few sanctions, the ultimate goal, here, is to change the publishing system, not to punish the researchers.

    Important point from Science Europe – remember that not all funders are so well resourced to deal with all these monitoring issues.

    Indicators should be transparent. Dietmar Lampert’s presentation stressed that - above all - indicators should be transparent. The results presented from his study (ZSI Research Policy and Development) gave some interesting insights: what indicators should be developed such as:

    • % of Pubs of OA journals with no impact factor
    • Availability of means to easily publish negative results

    Researchers are all too aware of standards within research. It is therefore a natural progression that we can build one for OS. Other factors in the research workflow can give us monitoring insights, and we need to explore them. Research evaluation should also harness the the unseized opportunities of the open science era and potentially new technological and methodological approaches.

    The unseized opportunities of the open science era and the possible technology and methodology approaches.

    The potential of the semantic web and using linked data to measure OS was the key message from Diego-Valerio Chialva (European Research Council). Diego set out an interesting use case for how to monitor outputs from the repositories, and these outputs need to be also collected but often are not, leading to ineffective evaluation. However semantic data can assist here and adds huge advantages to identifying resources. He also made the important point that OS needs to be monitored in and of itself.

    Based on open source sofware for transparency the OPERA project (Karen Hytteballe Ibanez, Technical University of Denmark): Exploring Open Research Analytics using VIVO used Vivo to describe open science metrics along with collaborations international and national, funding national and international, importantly using VOSviewer, and NEO4J for visualisations.

    It’s all about good data. Paolo Manghi reiterated the need for reliable data to contribute to the OpenAIRE graph. Much of this graph looks beyond the article itself such as: funder data – projects – research data – software which leads to a huge potential of added value services. And it needs to be kept curated and clean. Ultimately, we are building this graph together as decentralised open science community from repositories, aggregators, individuals, institutions, service providers. Ultimately all can contribute, and this would be for the common good. However, the data needs to be reliable.

    An inspiring tale from Portugal by Vasco Vaz, Foundation for Science and Technology. RCAPP started in 2008 providing access to all HEI material nation-wide. Involving pretty much everyone from the scholarly communication chain, it serves up a number of services. Using OpenAIRE’s services, project information is automatically added to output. It works as a successful case study whereby deposit is a natural part of the life-cycle within the institutional setting and since standards are implemented early on, the researcher only has to deposit once and reap the benefits.

    Panel session

    Panelists included Bregt Saenen (EUA), Helena Cousijn (FREYA), Paolo Manghi (OpenAIRE), James Hardcastle (Clarivate Analytics), Diego- Valerio Chialva (ERC).

    Questions and discussion were around the following:
    • Important question: who pays: this - in itself - is a negative question. It should be framed as ‘what is the cost going to be in comparison to the existing system’.
    • How can we work together?
    • We have to be interoperable to work together. The systems and their outputs have to interact. Interaction will make thing easier. Two panel members stressed that if we find similar ways to organize and discuss ideas then this sort of discussion can continue.
    Some of the main conclusions:
    • Don’t collect numbers just for the sake of it.
    • We need to carefully construct the parameters. This can lead to perverse results.
    • We need a quality trust seal for research outcomes and data.
    • We need benchmarking for open science
    • It has to be transparent
    • We need to be able to easily compare policies

    Some panel members argued for a consortia approach. The single institution cannot pay alone.

    Against the backdrop of the audience mentimeter, it was clear that ‘openness has to somehow be monitored. However, it was clear that lack of trusted sources is the main barriers.

    menti1 menti2
  • Open Science and Big data in support of measuring R&I Indicators

    The second workshop day focused on the use of big data technologies for advanced research assessment. The workshop was led by Data4Impact, a Horizon 2020 project funded by the European Commmission. Data4Impact pioneers big data techniques and develop pilot approaches which track elgacy and impact of research activities after the end of public funding. In this workshop, the project consortium presented a series of indicators developed on the performance and societal impact of 40+ research programmes in the health domain for the first time.

    During the morning session 

    Data4Impact coordinator Vilius Stanciauskas (PPMI) introduced the project. The key message was that building on data harvested from PubMed, OpenAIRE,, PATSTAT, clinical guidelines repositories, company websites, social media and media platforms, EC monitoring data and other databases, Data4Impact enables policymakers, funders, experts, researchers and the public in general to ‘Ask less & know more’ in the context of advanced research assessment.

    Following that, Alexander Feidenheimer (Fraunhofer ISI) provided an overview of the Data4Impact Analytical Model Of Societal Impact Assessment (AMOSIA) which has been developed over the course of the project. The analytical model is structured around four distinct phases of the research lifecycle, including input, throughput, output, and impact. Relying on novel big data techniques such as web scraping, crawling and mining as well as text analysis methods such as Natural Language Processing and deep learning, Data4Impact has gathered data for each analytical phase.

    The primary purpose of the following presentations by Data4Impact consortium was to present the indicators which have been developed over the course of the project for each of these phases. Ioanna Grypari (ATHENA RC) discussed the approach used for tracking of research outputs.

    Then, the consortium including Ioanna Grypari (ATHENA RC), Iason Demiros (Qualia), Vilius Stanciauskas (PPMI) and Gustaf Nelhans (University of Borås) have put emphasis on three categories of impact, including academic, economic and societal impact. .

    Please refer to the Data4Impact booklet which demonstrates linkages between data collected across the three dimensions of impact

    d4I booklet page

    The presentations of the second workshop day may be accessed here

    In the afternoon session 

    Data4Impact invited the participants to two parallel sessions, focusing on the Data4Impact methodology and indicators in the areas of:

    • Academic Impact & Societal Relevance of Research and
    • Economic Impact & Societal & Health Impact.
    A summary of key Data4Impact indicators presented to the discussion groups, accompanied by their assessment on utility and credibility in each of the focus areas.
    LevelIndicatorDescriptionRelevance & CredibilityComments
    1. Input level indicators 1.1. Funding volume Monetary expenditure on R&I activities N/A N/A
    2. Throughput and output level indicators 2.1. Publications Number of publications produced by programme/funder cited in patents N/A N/A
    Number of highly cited publications in patents N/A N/A
    2.2. Patents Number of patents produced N/A N/A
    2.3. Innovation outputs produced by EU FPs projects Number of health-specific innovation outputs produced in projects funded by the EU Framework Programmes N/A N/A
    2.4. New companies Number of new companies/start-ups created in the EU Framework Programmes N/A N/A
    2.5. Innovation outputs produced by companies participating in R&I activities Number of product innovations; or process, service or other innovations announced by companies N/A N/A
    2.6. Innovation activities carried out by companies participating in R&I activities Number of licensing agreements; or cases of acquisitions; or cases of private funding attracted; or cases of public funding attracted by companies; or cases of newly CE-marked medical devices or technologies N/A N/A
    3. Academic impact 3.1. Funding priorities Topic size in PubMed (absolute and normalised) High N/A
    Distribution of topics per funder (normalised) High N/A
    Distribution of funders per topic (absolute) High N/A
    3.2. Timeliness of research performed Rate of topic growth between 2012-2018 compared to 2005-2011 Low The length of the time intervals used to estimate trends, growth and other factors would be derived by some well-established and/or relevant criterion
    Share of funding allocated to top-10% fastest growing topics High N/A
    3.3. Funding exclusivity Share of funding allocated to top-10% smallest research topics by size (i.e. investment in small-niche topics) Medium relevance but high credibility N/A
    Number of funders per topic whose output share exceeds 3% globally Medium To be clarified
    Share of funding allocated to research topics with less than 5 funders whose share exceeds the 3% mark (i.e. investment in topics where few other funders invest) Medium To be clarified
    3.4. Technological value/significance of patents Analysis of the extent to which commonly patent forward citations (i.e. citations a patent receives from subsequent patent filings) are used High N/A
    4. Economic impact 4.1. Economic and innovation performance of companies Estimated share of enterprises with evidence of innovation activities High Might fit better if presented as input level indicator
    Estimated share of highly innovative enterprises High
    Estimated share of enterprises with evidence of licensing activities (incl. patent/ trademark license agreements) High
    Estimated share of enterprises involved in activities related to acquisitions High
    Estimated share of enterprises with evidence of private investment/capital attracted High
    4.2. Continuity of innovation activities Estimated overlap between project activities in FP7 & identified company innovations Medium To be clarified
    Number of newly CE-marked devices and medical technologies that could be directly linked to R&I activities in the EU Framework Programmes Medium
    5. Societal/ health impact 5.1. Impact on public health Citations of publications in clinical guidelines Medium Might be relevant to consider policy impact (e.g. consider health technology assessment)
    5.2. Societal awareness/relevance of research Rank of research topic based on number of news articles, blogs, posts, tweets, etc. discussing a given topic Low N/A
    5.3. Congruence of research funding with societal priorities Rank similarity of most discussed research topics versus actual spending in the topics Low N/A
    5.4. Newly launched medicines and medicinal products Number of human medicinal products or orphan medicines that could be directly linked to R&I activities in the EU Framework Programmes High Most positively assessed indicator
    Strength of link based on the number of mentions of product names and their active substances in EC monitoring data High N/A

    Following the discussions on the indicators developed by Data4Impact, the discussion leads from the consortium invited the discussion groups to consider whether any other indicators could be relevant in the context of research assessment. The Academic Impact & Societal Relevance of Research discussion group suggested to explore the possibility to compare the topic of the call with the topics of the publications which came out of the project. This could help determine whether the research published matches the research promised in terms of topics. It was also suggested to consider an indicator which would normalise the numbers produced to enable cross-disciplinary comparisons.

    The workshop programme concluded with a panel discussion involving the representatives of the funding organisations analyses in Data4Impact as well as service providers including:
    • Amit Prasad (World Health Organisation)
    • Danil Mikhailov (Wellcome Trust)
    • Silvia Pozzi (Fondazione Telethon)
    • Frank Manista (JISC)
    • Natalia Manola (OpenAIRE)
    The panel discussion focused on the following themes:
    • Possibilities and limitations of big data
    • Lessons learned from the use case
    • Transfer of knowledge and methodology to more environments and programmes
    Key conclusions of the panel discussion were:
    • Big data shows huge long-term potential, although limitations must be considered (e.g. time lag, data availability, etc.).
    • Some ways to mitigate the limitations could be with the use of multiple approaches (e.g. quantitative and qualitative) and additional documentation on how the indicators are derived and what specific data is used to construct them. With increased transparency, policymakers and funders could reach a consensus on the definition of each indicator as well as potential solutions to key data issues.
    • Reproducibility of the results considering particularly the usability and ease of use which are the driving force behind the adoption of new technologies.
    • Key lessons learnt is that Data4Impact shows that we now have a way to discover and use the data that is out there but could not be collected and used before for research assessment
    • Consider the impact of the fact that some information cannot be accessed openly on the project results.
    • Next step is to extend the Data4Impact knowledge and methodology to other domains and programmes.

    If you are interested in learning more about Data4Impact methodology and results, we would like to invite you to attend our upcoming Workshop on Data4Impact Methodology and Indicators which will take place on 24 June 2019 at the premises of Research Executive Agency (Brussels, Belgium). In this hands-on, interactive workshop we aim to gather feedback on the chosen methodology, coverage and latency/timeliness of the developed indicators, to maximise their relevance for all the stakeholders involved. You may find the programme and registration page for this workshop in our website. If you have any questions about the event, please contact Sonata Brokeviciute at .