From Metrics to Meaning: Rethinking Research Assessment with Dr. Mathijs Vleugel
The landscape of research evaluation is undergoing a profound transformation as the community moves away from a narrow focus on quantitative indicators toward a more comprehensive appreciation of scientific contribution. To better understand this transition, we sat down with Dr. Mathijs Vleugel, the Head of the Helmholtz Open Science Office and Chair of the CoARA National Chapter Germany, who has been closely observing these shifts within one of Europe’s largest scientific organisations.
Central to his recent work is the survey "Valuing what matters?", conducted between June and September 2025 across the Helmholtz Association, Germany’s largest research-performing organisation with over 47,000 employees. Collecting the perspectives of 1,145 researchers, the survey provides valuable insights into the current state of research assessment and informs the next stages of reform. By investigating the gap between the evaluation criteria researchers perceive to be used in hiring and promotion decisions and those they would like to see used, Dr. Vleugel is able to provide data-driven suggestions for more qualitative and inclusive assessment systems.
Here, he shares insights into the key findings of the survey, which demonstrate a broad consensus on the need to reform towards more value-based evaluation criteria, as well as important disciplinary and generational differences. In addition, he offers his perspective on how these findings might inform the evaluation criteria used by hiring and promotion committees in the future.
To set the scene, what prompted you to run this survey now, and what decision or change did you hope it would inform in research assessment practice?
In March 2025, the Helmholtz Open Science Office and the Helmholtz Working Group Open Science jointly launched the Helmholtz Task Group Research Assessment, which brings together experts from our 18 research centres, including representatives from libraries, data centres, strategy divisions, human resources and, most importantly, researchers themselves. Its overarching goal is to provide a platform for identifying and developing quality-oriented approaches to research assessment, while promoting awareness of reform initiatives such as DORA and CoARA across the Helmholtz Association.
As its first concrete activity, the group is currently focusing on the assessment criteria used in individual evaluations, particularly in hiring and promotion procedures. Given that institutes and evaluation committees enjoy a considerable degree of autonomy in defining these criteria, we see the potential to make a direct impact in this area.
To gain a clearer understanding of which contributions researchers across different fields consider most relevant to their work, we first conducted a fairly basic, but informative, survey: Helmholtz researchers were asked to select and rank their personal top ten from a list of 40 possible assessment criteria for hiring and promotion decisions. In total, 1,145 researchers responded, providing insights into both the perceived status quo (Which criteria do you think are currently being used?) and their preferred future criteria (Which criteria do you think should be used?). To enable differentiated analysis, respondents were also asked to share their career stage, research field and institute.
A detailed description of the methodology, as well as the original data, can be found at https://doi.org/10.5281/zenodo.18944700.
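The full methodology and data are documented in the Zenodo record above. Purely as an illustration of how ranked-choice responses of this kind can be aggregated, the following Python sketch tallies how often each criterion is selected and computes a simple Borda-style score (rank 1 earns 10 points, rank 10 earns 1); the response data and scoring rule here are invented for the example, not taken from the survey.

```python
# Illustrative sketch only: one simple way to aggregate ranked top-ten
# survey responses. Not the study's actual analysis code.
from collections import Counter, defaultdict

# Hypothetical responses: each list is one respondent's ranked selection
# (truncated to three entries here for brevity).
responses = [
    ["non-metric publication quality", "collaboration", "research integrity"],
    ["funding acquired", "non-metric publication quality", "mentoring"],
]

selection_counts = Counter()   # how often each criterion appears at all
borda_scores = defaultdict(int)  # rank-weighted score per criterion

for ranking in responses:
    for rank, criterion in enumerate(ranking, start=1):
        selection_counts[criterion] += 1
        borda_scores[criterion] += 11 - rank  # rank 1 -> 10 pts, rank 10 -> 1 pt

# Report criteria from highest to lowest rank-weighted score.
for criterion, score in sorted(borda_scores.items(), key=lambda kv: -kv[1]):
    print(f"{criterion}: selected {selection_counts[criterion]}x, score {score}")
```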
For a Responsible Research Assessment (RRA) audience, how would you summarise the main message of the findings in one or two key points?
Perhaps unsurprisingly, the perceived and the desired criteria differed substantially. Researchers predominantly expect to be evaluated on output-based criteria, such as (metrics related to) publications and funding. In contrast, when researchers were asked to list their desired evaluation criteria, this focus shifted to more value-based criteria, such as those related to collaborative research, good research practice, institutional culture, and impact.

Besides relevant, though more subtle, differences across the six Helmholtz research areas, we observe pronounced differences in the desired evaluation criteria across career stages. Compared to leading researchers (e.g., group leaders or principal investigators), early-career researchers (up to the PhD stage) report a particularly strong preference for value-based criteria, with “non-metric quality of publications” being the only high-ranking output-related criterion in their selections. Whether this pattern reflects a genuine generational shift, the selective effects of a system that favours more output-oriented researchers, or differing expectations associated with specific career stages remains an open question (most probably it is a combination of all of these). In any case, these data give us the opportunity to take a step back and consider which skills and values we should promote to support high-quality and collaborative research cultures.

Based on your findings, what is the single policy or procedural change that would have the greatest impact on aligning assessment decisions with Responsible Research Assessment?
In many respects, our survey provides an evidence base for what many would have intuitively expected. At the same time, we find that having the actual data to support this really helps us move beyond more ideological discussions on research assessment, and provides us with a level of disciplinary and institutional detail that allows us to act in concrete and targeted ways.
For example, the Helmholtz Task Group Research Assessment is now using these results to design modular (narrative) CV templates that can be flexibly adapted for different research areas and different job profiles. In order to do this, we have taken the 15 most frequently mentioned desired assessment criteria and are currently collecting and defining ways in which applicants can report on these aspects in their CV. Fortunately, we do not have to start from scratch, as we can build on existing work, for example from CoARA working groups, DORA’s Reformscape, and previous initiatives from within the Helmholtz Association. In addition to our modular CV templates, we will prepare guiding information for evaluators, covering best practices, potential risks such as biases or gamification, and expectations that vary by career stage. We aim to present these materials to our research centres in the second half of this year and hopefully run some targeted pilot implementations. The CV templates, along with any insights or outcomes from the pilots, will of course all be made available to the wider community.
Which survey finding most clearly demonstrates the current implementation gap, the difference between what institutions claim to value and what is actually rewarded, and what does this imply for institutional leaders?
Some of the differences that we observe between career stages can be intuitively explained by differences between the job profiles. For instance, the ability to secure funding is widely considered to be essential for established researchers, whereas it is an unrealistic assessment criterion for PhD candidates. However, when we consider assessment criteria that operate at a more systemic level, this line of argumentation does not always hold.
For example, researchers at later career stages (e.g., institute or group leaders) are in a particularly strong position to make substantial contributions to shaping their institutional culture and to upholding standards of ethics and integrity. Yet we find that it is early-career researchers who seem to place a higher priority on including these as evaluation criteria. This mismatch suggests that we need to continue actively engaging with institute and group leaders in order to improve awareness and provide them with the tools to shape their immediate research environments in ways that attract talent and foster excellent, impactful research. Institutions and their leadership naturally play an instrumental role here, for example by establishing enabling frameworks, providing resources, and recognising individual contributions to research culture in hiring and promotion decisions.
What are the most feasible ‘first steps’ institutions can implement within 6-12 months to reduce dependence on proxy metrics in the assessment of researchers, while keeping decisions timely and manageable for committees?
Most researchers and evaluation committees will agree that the bibliometric indicators still dominant in evaluations are a poor proxy for research quality. However, we have to go beyond acknowledging these limitations and provide committees with suitable alternatives. We hope that our CV templates achieve precisely this: offering ways to broaden the narrow set of criteria currently in use, so that we assess what we value and recruit the best-fitting candidates for a given position.
Our data highlight relevant differences between disciplines. We therefore recommend that other institutions wishing to develop tailored CV templates first establish a similar baseline of the priorities most relevant to their own research communities. Institutions are most welcome to use our survey design and, in the future, to adapt our CV templates to their institutional context.
An initial and minimally disruptive change to current procedures would be to make often-overlooked contributions visible by asking applicants to report on them. For example, in addition to listing their most influential publications, candidates could be asked to report a selection of their most significant other research outputs (e.g., data, software, methods, patents), together with a short explanation of why they consider these important. Applicants can also be asked to provide examples of key contributions to peer review (where possible, referencing open reports), along with a brief narrative describing how their feedback improved the work or even helped to shape the standards of a field.
Although asking for such narrative descriptions may require some additional preparation from applicants, I would argue that they ultimately facilitate the work of evaluation committees. Imagine serving on a hiring committee outside one’s immediate area of expertise (which is often the case): how are evaluators supposed to judge whether someone is the right candidate based on a list of publications alone? I think it is unrealistic to expect evaluators to be able (and to have the time) to contextualise every single publication and the applicant’s contributions to it without any guidance. The other, undesirable, approach is to continue taking shortcuts by relying on flawed metrics like publication counts, citation counts, and journal impact factors. Instead, providing a selection of publications together with a short narrative explanation of the candidate’s contributions and the significance of the work is manageable for applicants and gives committees the essential context to make informed decisions.
As Chair of the German CoARA National Chapter, what are the most effective levers at national chapter level to accelerate RRA implementation?
The German research landscape is characterised by a broad range of organisational types and profiles, a high degree of academic freedom, and a federated political system. While these features form the foundation of a robust and diverse research ecosystem, they can also slow down the coordination and implementation of systemic reforms. My impression is that this (at least partially) explains the low uptake of reform initiatives like DORA and CoARA at German higher education institutions compared with other European countries (Janne Pölönen, 2025, https://zenodo.org/records/15727196).
However, with support from major funding organisations like the German Research Foundation (DFG) and the European Commission, research institutions increasingly recognise that assessment reforms are already taking place in Europe and in Germany, and that it is always better to participate in shaping these changes. At the same time, we do still have a lot of convincing to do.
For this, it will be crucial to demonstrate what can be accomplished with research assessment reforms and how they can structurally contribute to research excellence and impact, rather than counteract them. We need to provide a strong evidence base for what works and be transparent about what doesn’t. We are starting to see really good examples of this, and I think our local community should be as inclusive as possible towards the many reform initiatives taking place outside of formalised structures like CoARA.
I know you have also collaborated with OpenAIRE as part of the CoARA Working Group on Open Infrastructures for Responsible Research Assessment. For an organisation like OpenAIRE, what should open infrastructures provide to support these reforms, especially to reduce administrative burden and increase transparency, while avoiding a return to simplistic metric-based decision-making?
We should find a balance between nurturing disciplinary and institutional differences on the one hand and ensuring time-efficiency and ease of use on the other, and I think open infrastructures can play a central role here. Let’s stay with our modular CV templates to make this more concrete: ultimately, we wish for such CV templates to be integrated into online recruiting portals, while ensuring sufficient flexibility to accommodate the different expectations of advertised roles. This means that there will never be a uniform solution. While these structured CVs are of great help to evaluation committees, I appreciate that it can be frustrating and time-consuming for applicants when every institution uses a different format that needs to be filled out manually for every application. Digital infrastructures, such as researcher profiles, should be able to bridge this gap.
Firstly, they can support researchers in collecting their data (professional positions, research outputs, funding, activities, etc.) in an automated or semi-automated fashion, and allow them to curate and complement it with any other information they consider important. Secondly, by making these researcher profiles machine-readable and interoperable with institutional recruiting platforms, applicants should be able to import much of this data directly into the application form. This should reduce time-consuming manual labour and allow researchers to focus on the parts of their application that need their personal and intellectual contribution.
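To make this idea concrete, here is a minimal sketch, assuming a purely hypothetical JSON profile format: the field names below are invented for illustration, not an OpenAIRE, ORCID, or Helmholtz schema. It shows how a machine-readable researcher profile could pre-fill the structured parts of an application form while leaving the narrative sections for the applicant to write.

```python
# Hypothetical illustration of profile-to-application interoperability.
# The profile schema here is invented; real systems would agree on a
# shared, documented format.
import json

profile_json = """
{
  "name": "Jane Doe",
  "positions": [{"role": "Postdoc", "institution": "Example Centre", "years": "2021-2024"}],
  "outputs": [
    {"type": "dataset", "title": "Ocean salinity measurements", "doi": "10.xxxx/example"}
  ],
  "peer_review": [{"venue": "Example Journal", "report_url": null}]
}
"""

profile = json.loads(profile_json)

# Pre-fill the structured sections of the application form; the narrative
# fields stay empty so the applicant supplies the context themselves.
application = {
    "applicant": profile["name"],
    "career_history": profile["positions"],
    "selected_outputs": profile["outputs"],
    "peer_review_contributions": profile["peer_review"],
    "output_significance_narrative": "",  # to be written by the applicant
    "research_culture_narrative": "",     # to be written by the applicant
}

print(json.dumps(application, indent=2))
```

The design point the sketch tries to capture is the division of labour Dr. Vleugel describes: automation handles the factual record, while the personal and intellectual contribution remains with the researcher.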
How can open infrastructures support responsible use of indicators, so they inform judgement rather than replacing it?
We have to acknowledge that metrics will continue to play a role in research assessment. Even as reform initiatives encourage moving beyond narrow quantitative indicators in individual evaluations, metrics are likely to remain relevant in other contexts, for example in institutional monitoring, strategic planning, and reporting against key performance indicators. The challenge, therefore, is not to eliminate indicators entirely, but to ensure they are used responsibly and in ways that are contextualised. Crucially, research communities themselves should define what constitutes responsible use of indicators in their specific evaluation contexts, as different disciplines require different forms of evidence.
However, any responsible use of indicators is impossible without access to high-quality and trustworthy information sources. To make assessments fair and verifiable, both the data sources and the methodologies used to derive indicators must be transparent. Open infrastructures should additionally be based on governance structures that enable community oversight and improvement in the interest of research. This transparency allows the research community to scrutinise how indicators are produced, what their limitations are, and where they may introduce bias.
Taken together, these reflections point to a clear direction: responsible research assessment is not about replacing one set of indicators with another, but about rethinking how contributions are recognised across the research lifecycle. The Helmholtz survey highlights the need to better align what institutions reward with what researchers themselves value, while ensuring evaluation processes remain fair, transparent, and context-aware.
As Dr. Vleugel emphasises, this shift depends on access to trustworthy, open infrastructures that make data and methodologies visible and open to scrutiny. Approaches such as narrative CVs offer a practical way forward, helping to capture a broader range of contributions while providing evaluators with essential context.
Reflecting this wider shift, OpenAIRE is developing researcher profiles and flexible narrative CV templates as part of MyResearchFolio, a service currently in testing. By combining community-driven standards with interoperable infrastructures, these efforts aim to reduce administrative burden and support more inclusive and meaningful assessment practices.