How to comply with Horizon Europe mandate
Proper Research Data Management (RDM) is mandatory for any Horizon Europe project generating or reusing research data. It is a key part of Horizon Europe's open science requirements.
In Horizon Europe, beneficiaries must manage the digital research data generated in the action (‘data’) responsibly, in line with the FAIR principles, and should at least do the following:
Keep in mind that ‘research data’ is a very broad concept and certainly not limited to numerical/tabular data.
Horizon Europe emphasizes the management of data and other outputs in accordance with the FAIR principles, which means making data Findable, Accessible, Interoperable, and Reusable. The principles provide a set of attributes that enable and enhance the reuse of data by both humans and machines.
Making data available via a trusted repository already goes a long way in increasing their FAIRness.
There is no single, one-size-fits-all way to manage research data and make them FAIR. What is appropriate and feasible largely depends on the research domain and data type(s) involved, as well as on the specificities of the project.
The first step to comply with RDM requirements in Horizon Europe is to develop a Data Management Plan (DMP).
A DMP is a document that outlines, from the start of the project, how research data will be handled both during and after a research project. It identifies key actions and strategies to ensure that research data are of high quality, secure, sustainable, and – to the extent possible – accessible and reusable.
When should the DMP be ready?
Whenever possible, beneficiaries are encouraged to make their DMPs non-restricted, public deliverables, under open access and a CC BY license to allow a broad re-use. For example, many European projects from H2020 made their DMPs available via the Zenodo repository. In addition, the DMP can be made available as part of the deliverables on the EU CORDIS website.
To prepare the DMP of your project, Horizon Europe makes a DMP template available. The use of this template is recommended but not mandatory.
To help you draft both your Horizon Europe proposal and full DMP, use the online planning tool Argos, OpenAIRE’s open source service for writing and publishing DMPs. It offers templates aligned with the Horizon Europe requirements and custom guidance for each of the questions in the templates.
Trusted repositories are infrastructures that provide reliable and long-term access to digital resources such as data, publications, etc. Usually, these repositories go through assessment or certification processes to guarantee that certain quality criteria are met.
In Horizon Europe, the following are considered trusted repositories:
In Horizon Europe, data should be deposited as soon as possible after its generation and, at the latest, by the end of the project.
There are, however, some additional requirements:
In Horizon Europe, research data should be made open access by default and licensed under the latest version of CC BY (attribution required) or CC 0 (public domain), or equivalent.
However, it is recognized that data should be ‘as open as possible, as closed as necessary’, and exceptions can be made when providing open access to data:
In such cases, data can be kept restricted, closed or under embargo, but beneficiaries must explain in the DMP the legitimate exception(s) under which they choose to restrict access to (some of the) research data. For more information, see ‘How do I know if my research data is protected?’.
Horizon Europe requires that, when you deposit data in a trusted repository, the data should be described with rich metadata in line with the FAIR principles.
Most (trusted) repositories will require that you fill in a metadata form about the data or files that you will publish, which should cover these requirements.
Horizon Europe requires that metadata follows the FAIR principles. In practice, this means that you should choose a repository where metadata follows standards and includes Persistent Identifiers (PIDs): for the dataset (e.g. DOI), the author(s) (e.g. ORCID or ResearcherID) and, if possible, the organization(s) (e.g. ROR) and grant (e.g. grant DOI).
For additional information, you can check the page on metadata.
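As an illustration of the PID requirement above, the following minimal Python sketch checks whether a metadata record carries a dataset DOI and author ORCID iDs before deposit. The field names (`dataset_doi`, `author_orcids`), the function `missing_pids` and the sample values are hypothetical and do not correspond to any particular repository's schema. The patterns only test the general shape of the identifiers, not whether they actually resolve.

```python
import re

# Shape checks only (assumption): a DOI starts with "10.", a registrant
# prefix and a slash; an ORCID iD is four groups of four characters.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def missing_pids(record):
    """Return the names of PID fields that are absent or malformed."""
    problems = []
    if not DOI_PATTERN.match(record.get("dataset_doi", "")):
        problems.append("dataset_doi")
    orcids = record.get("author_orcids", [])
    if not orcids or not all(ORCID_PATTERN.match(o) for o in orcids):
        problems.append("author_orcids")
    return problems


# Placeholder record with made-up identifiers, for illustration only.
record = {
    "title": "Example dataset",
    "dataset_doi": "10.1234/example.5678",
    "author_orcids": ["0000-0000-0000-0000"],
}
print(missing_pids(record))
```

In practice, most trusted repositories assign the dataset DOI themselves and validate these fields in their deposit form, so a check like this is mainly useful when preparing metadata in bulk.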
Horizon Europe requires that, at the time of depositing research data in a trusted repository, you must also provide access (via the repository) to information about any research output or any other tools or instruments needed to re-use or validate the research data.
Typically, the repository will allow you to provide links to relevant information (e.g. links to related publications, to a code repository, to related datasets, etc.).
Data management and sharing activities need to be costed into research, in terms of the time and resources needed. By planning early, costs can be significantly reduced. Costs associated with open access to research data can be claimed as eligible costs of any Horizon Europe grant under the conditions defined in the Grant Agreement: they must already be budgeted and accepted in the grant proposal, and they must be incurred during the duration of the project.
To estimate costs for RDM, you can check the online RDM-costing tool and the infographic on ‘What will it cost to manage and share my data?’.
The following additional support materials can help you with the RDM requirements in Horizon Europe projects:
OpenAIRE also offers tools for research data management: ARGOS is an OpenAIRE service that simplifies the management, validation, monitoring and maintenance of DMPs.
The following sources were used and contain more extensive information on how to address open science in Horizon Europe proposals:
This Glossary provides definitions of, and practical advice on, the key terms mentioned in the context of RDM requirements in Horizon Europe. Other terms relating to RDM can be found in the Science Europe Data Glossary.
Anonymisation is the process of removing personally identifiable information (information that directly or indirectly relates to an identified or identifiable person) from datasets containing sensitive data. As a result, the data subject is no longer identifiable. As opposed to pseudonymisation, anonymisation is not reversible, which means that the re-identification of the data subject is not possible.
Data backup is a process of creating a copy of data in a digital format and storing it on another device to ensure that data are saved and to prevent data loss.
Backups can be full (all files are backed up whenever a backup is made) or partial (only a part of the files, e.g. new files, are backed up).
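The difference between a full and a partial backup can be sketched in a few lines of Python. This is a simplified illustration, not a backup tool: the function `select_for_backup` and the sample file names are hypothetical, and modification times are passed in as plain numbers so the selection logic stays easy to follow.

```python
def select_for_backup(modified_times, last_backup=None):
    """Choose which files to copy.

    modified_times: dict mapping file name -> last modification time
                    (for example, seconds since the epoch).
    last_backup:    time of the previous backup; None requests a full backup.
    """
    if last_backup is None:
        # Full backup: every file is copied, whether or not it changed.
        return sorted(modified_times)
    # Partial backup: only files changed since the previous backup.
    return sorted(
        name for name, changed in modified_times.items() if changed > last_backup
    )


files = {"raw.csv": 100, "clean.csv": 200, "notes.txt": 300}
print(select_for_backup(files))                   # full backup
print(select_for_backup(files, last_backup=150))  # partial backup
```

A partial backup is smaller and faster, but restoring from it requires the last full backup plus every partial backup made since.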
Controlled vocabulary is an organised and standardised arrangement of predefined terms (words and phrases) that are used to index content in an information system with the aim of facilitating information retrieval. Controlled vocabularies connect variant terms and synonyms for concepts, link concepts in a logical order and organise them into categories, so as to provide a consistent way to describe data. They can be general and discipline-specific, and can take the form of subject heading lists, thesauri, authority files, taxonomies and alphanumeric classification schemes.
Ontologies are not controlled vocabularies, but they use controlled vocabularies to establish a formal specification of a conceptual model in which concepts and categories of concepts, properties, relationships among concepts and categories, functions, constraints, and axioms are defined.
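To make the idea of connecting variant terms and synonyms concrete, here is a toy Python sketch of a controlled vocabulary lookup. The three-entry vocabulary and the function `preferred_term` are invented for illustration; a real project would use an established, published vocabulary from its discipline.

```python
# Toy vocabulary (assumption): variant terms mapped to one preferred term.
VOCABULARY = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}


def preferred_term(term):
    """Return the preferred term for a variant, or None if it is not listed."""
    # Normalise case and whitespace so trivially different spellings match.
    key = " ".join(term.lower().split())
    return VOCABULARY.get(key)


print(preferred_term("Heart  Attack"))
print(preferred_term("stroke"))
```

Returning `None` for unlisted terms, rather than passing them through, makes it easy to spot descriptions that fall outside the vocabulary.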
Data Management Plan (DMP) is a formal document that outlines how data will be handled throughout the research data lifecycle – from planning, through collecting, analysing, publishing, preserving, to sharing and reusing.
In Horizon Europe, DMPs are mandatory and a template to guide the preparation of DMPs is provided. The list of points to be addressed includes data types and formats, compliance with the FAIR principles (metadata, repositories, controlled vocabularies, licences, etc.), legal requirements (intellectual property rights, GDPR), costs of preservation, data security and ethics, and retention periods.
Online tools that can facilitate the preparation of DMPs are available:
Data documentation includes various types of information that can help find, assess, understand/interpret, and (re)use research data – e.g. information about methods, protocols, datasets to be used and data files, preliminary findings, etc. Documentation helps understand the context in which data were created, as well as the structure and the content of data. Data should be documented through all stages of the research data lifecycle. Detailed and rich documentation ensures reproducibility and upholds research integrity. Documentation also includes metadata.
File format is a standard way of encoding information so that it can be stored in a computer file. Digital research data may be stored in a wide variety of file formats, depending on the devices and tools used in data collection and processing.
File formats may be proprietary (the encoding scheme is designed and owned by a company or organisation and is not published, so files can be opened only by those who have particular software or hardware tools) and/or prone to obsolescence (legacy formats, bit rot).
To ensure that users can access and understand data and that data can be preserved in the long term, use open (defined by an openly published specification that anyone can use) and lossless formats (ensuring that no data or quality loss will occur during file manipulation).
Check also this OpenAIRE Guide.
File naming convention is a framework for generating file names that have a consistent structure, while describing the content of files and their relations to other files.
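As an example, a convention can be enforced by generating names programmatically instead of typing them by hand. The Python sketch below assumes one possible pattern, `project_description_YYYYMMDD_vNN.ext`. Both the pattern and the function `build_filename` are illustrative choices, not a prescribed standard.

```python
import re
from datetime import date


def build_filename(project, description, day, version, extension):
    """Build a file name following the assumed pattern
    project_description_YYYYMMDD_vNN.ext."""

    def slug(text):
        # Lower-case, and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(description)}_"
        f"{day:%Y%m%d}_v{version:02d}.{extension}"
    )


name = build_filename("SoilMap", "pH measurements", date(2024, 3, 5), 2, "csv")
print(name)  # soilmap_ph-measurements_20240305_v02.csv
```

Putting the date in year-month-day order and zero-padding the version number means that files sort chronologically in an ordinary directory listing.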
Licence is a written agreement by means of which the copyright holder defines the rights granted to the users. In a digital environment, standardised licences based on a set of predefined reuse conditions, such as Creative Commons, are used. Licences for digital objects are machine readable.
Metadata are data that provide information about other data, e.g. a description of the content of the data, the date when the data were produced or collected, tools and devices used to obtain data, file formats and sizes, the names of the people who created or collected data, relevant persistent identifiers, etc. Metadata should be created and provided in accordance with commonly used metadata standards, which may be general or discipline specific. This ensures that metadata can be understood by humans and processed and exchanged by machines.
Persistent identifier (PID) is a long-lasting reference to a resource that provides the information required to reliably identify, verify and locate the resource. In a digital environment, PIDs have the form of URLs. When pasted in a browser, they take users to the resource.
Apart from digital resources, PIDs can also relate to researchers (e.g. ORCID, ISNI), institutions (e.g. ROR), grants, instruments and devices, etc. In this case, a PID leads to the record describing a researcher, an institution, etc. in the relevant registry.
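The URL form of a PID is typically the identifier appended to the address of a resolver or registry. The short Python sketch below shows this for DOI, ORCID and ROR. The function `pid_to_url` and the sample identifiers are hypothetical placeholders, and the identifiers are not checked for validity.

```python
# Resolver or registry base addresses for three common PID schemes.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "ror": "https://ror.org/",
}


def pid_to_url(scheme, identifier):
    """Turn a bare identifier into its URL form for a known scheme."""
    return RESOLVERS[scheme.lower()] + identifier.strip()


# Placeholder identifiers, for illustration only.
print(pid_to_url("DOI", "10.1234/example.5678"))
print(pid_to_url("orcid", "0000-0000-0000-0000"))
```

Citing the full URL rather than the bare identifier lets both readers and machines follow the reference directly.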
Pseudonymisation (pseudo anonymisation) is the processing of personal data in such a way that the data can no longer be related to the data subject without the use of additional information. However, the additional information must be kept separately and subject to technical and organisational measures to ensure that data subjects remain unidentifiable. As opposed to anonymisation, pseudonymisation is a reversible process, which means that data subjects can be re-identified if access to the additional information is enabled.
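One common way to implement this is to replace direct identifiers with keyed hashes and keep the lookup table apart from the data. The Python sketch below assumes that approach. The function `pseudonymise`, the field names and the sample records are hypothetical. Replacing a single field does not by itself make a dataset safe, since other fields may still identify a person indirectly.

```python
import hashlib
import hmac


def pseudonymise(records, id_field, secret_key):
    """Replace the value of id_field with a keyed hash.

    Returns the pseudonymised records and a key table mapping each
    pseudonym back to the original value. The key table and secret_key
    are the 'additional information' and must be stored separately.
    """
    key_table = {}
    output = []
    for record in records:
        original = record[id_field]
        digest = hmac.new(secret_key, original.encode("utf-8"), hashlib.sha256)
        pseudonym = digest.hexdigest()[:16]
        key_table[pseudonym] = original
        # Copy the record so the input data are left unchanged.
        output.append({**record, id_field: pseudonym})
    return output, key_table


records = [{"name": "Alice Example", "score": 7}]
pseudo, key_table = pseudonymise(records, "name", b"keep-this-key-secret")
print(pseudo)
```

Anyone holding only `pseudo` cannot recover the names, while a person with access to `key_table` can reverse the mapping, which is what distinguishes this from anonymisation.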
Repository is a digital platform that ingests, stores, manages, preserves, and provides access to digital content. A repository should support a commonly accepted metadata standard and have a protocol enabling metadata exchange.
Repositories are usually classified into:
In Horizon Europe, research data should be deposited in a trusted repository, i.e. in a repository that operates in accordance with relevant standards and best practice, provides long-term access and preservation, and ensures compliance with the FAIR principles. Trusted repositories include certified or community-recognised repositories, stable institutional repositories and generalist repositories (such as Zenodo).
Sensitive data is information that should be protected against unauthorised disclosure because unauthorised access may negatively affect the privacy of an individual, trade and business secrets or even security. In the context of research, sensitive data usually include personally identifiable information (names, date and place of birth, place of living, employment information, etc.), health information, and other private or confidential data.
Data storage is a computing technology that enables saving data in a digital format on computer components and recording media, including cloud services.
In the context of Research Data Management, it is necessary to ensure that data are stored securely until the end of the project and throughout the minimum retention period. Storage options may include: