The Mutable Truth: Why Freezing and Versioning Metadata is Essential for Open Science Research Assessment
In the dynamic landscape of Open Science, maintaining the integrity and reliability of scholarly metadata is more important than ever. As Open Science broadens access to research, from traditional journal articles to datasets and software, the need for robust metadata practices becomes clear. Among these, metadata freezing and versioning play a crucial role in supporting trustworthy and transparent research assessment [1].
For infrastructures like the OpenAIRE Graph, ensuring metadata accuracy after publication is not just a technical challenge: it is essential for enabling researchers, institutions, and policymakers to rely on the data for comprehensive and fair evaluation.
The Challenge of Mutable Metadata
Today, research outputs are published across a wide variety of platforms: institutional repositories, thematic data archives, and catch-all repositories like Zenodo or Figshare. While this diversity improves accessibility, it also introduces variability in metadata curation.
Often, the scientists themselves handle metadata entry: filling out attribution and citation fields, uploading files, and finalising the deposition. This process can lead to errors due to misinterpretation or lack of oversight, but a deeper problem lies in metadata mutability.
Even when a DOI is assigned and the files are frozen, the metadata can still be edited. While this flexibility is generally advantageous, allowing post-publication corrections, for example, it can also be exploited for malicious purposes.
The Risks of Uncontrolled Metadata Updates
If not properly controlled, mutable metadata undermines the reliability of citation indexes and other metrics used for research assessment.
Imagine a scenario where, after publication, a record owner:
- Adds an author who wasn’t originally associated with a highly cited dataset;
- Adds fabricated citation links to the metadata to boost impact indicators.
These actions distort citation networks, misattribute credit, and compromise the indicators that assessment systems rely on. Fortunately, such practices can be detected or discouraged when metadata is cross-checked against authoritative sources, but this only works when verification is possible.
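Such a cross-check can be as simple as diffing the frozen (as-published) metadata snapshot against the current record. The following Python sketch illustrates the idea; the records, field names, and values are hypothetical, not drawn from any repository's actual API:

```python
# Sketch: detect post-publication changes by diffing a frozen metadata
# snapshot against the current record. All records here are hypothetical.

def diff_metadata(frozen: dict, current: dict) -> dict:
    """Return the fields whose values changed after publication."""
    changes = {}
    for field in frozen.keys() | current.keys():
        if frozen.get(field) != current.get(field):
            changes[field] = {"frozen": frozen.get(field),
                              "current": current.get(field)}
    return changes

# Snapshot taken when the DOI was minted (illustrative).
frozen = {
    "title": "Ocean temperature dataset",
    "authors": ["A. Rossi", "B. Chen"],
    "references": ["10.1000/xyz123"],
}
# The record as it reads today: an author and a citation were added.
current = {
    "title": "Ocean temperature dataset",
    "authors": ["A. Rossi", "B. Chen", "C. Unrelated"],
    "references": ["10.1000/xyz123", "10.1000/fake999"],
}

changed = diff_metadata(frozen, current)
print(sorted(changed))  # prints ['authors', 'references']
```

A frozen snapshot is exactly what makes this diff possible; without it, the current record is the only version that exists, and there is nothing to compare against.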
Verification Practices Across Outputs
Let’s look at different types of research outputs:
Scientific articles
Metadata such as the title, abstract, author list, and citations (references) is typically embedded and frozen in the article file (PDF). This allows for direct validation in most cases, either manually or via tools that extract and compare metadata.
Research software
Scientists who follow best practices often include codemeta or CFF metadata files in versioning platforms such as GitHub. When these projects are then published in repositories like Zenodo to obtain citable DOIs and metadata, the metadata files become frozen and versioned together with the code. These files can be used to validate the repository metadata declared by other scientists at the time of deposition. While the practice is not yet widespread (only around 2% of GitHub projects adopt it), it provides a solid foundation for verifying trustworthy attribution.
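The validation step described above can be sketched in a few lines of Python: parse the codemeta.json that was frozen and versioned with the code, and compare its author list against the authors declared in the repository record. The file contents and deposited names below are illustrative placeholders:

```python
import json

# Sketch: validate repository-declared authors against the codemeta.json
# frozen and versioned with the code. All contents here are illustrative.

codemeta_json = """
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "example-tool",
  "author": [
    {"@type": "Person", "givenName": "Ada", "familyName": "Rossi"},
    {"@type": "Person", "givenName": "Ben", "familyName": "Chen"}
  ]
}
"""

def codemeta_authors(doc: dict) -> set:
    """Extract 'Given Family' names from a codemeta author list."""
    return {f"{a['givenName']} {a['familyName']}" for a in doc["author"]}

# Authors declared in the repository record at deposition (hypothetical).
deposited = {"Ada Rossi", "Ben Chen", "Extra Author"}

frozen = codemeta_authors(json.loads(codemeta_json))
unverified = deposited - frozen  # names with no backing in the frozen file
print(sorted(unverified))  # prints ['Extra Author']
```

Any name left in `unverified` has no backing in the metadata frozen with the code, and can be flagged for review rather than silently counted in attribution indicators.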
Research data
This remains the most vulnerable category. Unlike articles or software, data files rarely include embedded bibliographic or citation metadata. As a result, repository metadata is often the sole source of attribution (authors, organisations, funding) and citation. Without embedded or linked records to cross-check against, validation becomes nearly impossible.
Establishing Best Practices: Freezing and Versioning
As the Open Science publishing model continues to grow, and researchers take on more responsibility for metadata creation, the need for structured metadata management becomes critical. We must move toward standardising metadata freezing and versioning practices for all types of research outputs. In other words, for scholarly communication to be “FAIR”, research products should include bibliographic and citation metadata files as part of their uploads, alongside the research product files themselves, so that their integrity can be upheld through freezing and versioning. Ideally, these metadata files should adhere to common standards (such as codemeta and CFF), ensuring uniform representation across diverse repository platforms and allowing the trustworthiness of repository metadata to be validated.
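As a concrete illustration of this practice, a minimal CITATION.cff file deposited alongside a dataset's files would freeze attribution and citation information together with the data itself. All values below are placeholders, not a real record:

```yaml
# Minimal CITATION.cff (illustrative placeholder values), deposited
# alongside the data files so attribution is frozen and versioned with them.
cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
type: dataset
title: "Ocean temperature dataset"
authors:
  - family-names: Rossi
    given-names: Ada
  - family-names: Chen
    given-names: Ben
doi: 10.5281/zenodo.0000000
version: 1.0.0
date-released: 2024-01-15
```

Because the file travels with the deposit, every new version of the dataset carries its own frozen copy, giving downstream services a fixed record to validate repository metadata against.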
Toward Transparent and Responsible Research Assessment
Metadata freezing and versioning are not just technical preferences; they are strategic enablers of responsible and reproducible research assessment. By embedding these practices into Open Science workflows, we can create a system that is not only open, but also verifiable, transparent, and fair.
This shift benefits the entire research community, from scientists seeking proper recognition, to institutions aiming for reliable indicators, to policymakers demanding trustworthy metrics. And for infrastructures like the OpenAIRE Graph, it lays the groundwork for assessments that truly reflect the richness and complexity of modern science.
[1] Paolo Manghi. Challenges in building scholarly knowledge graphs for research assessment in open science. Quantitative Science Studies 2024; 5(4): 991–1021. https://doi.org/10.1162/qss_a_00322