Guides for Researchers
How to comply with the Horizon Europe mandate for Research Data Management
What are the requirements?
Proper Research Data Management (RDM) is mandatory for any Horizon Europe project generating or reusing research data. It is a key part of Horizon Europe's open science requirements.
In Horizon Europe, beneficiaries must manage the digital research data generated in the action (‘data’) responsibly, in line with the FAIR principles, and should at least do the following:
- Prepare a Data Management Plan (DMP) and keep it updated throughout the course of the project
- Deposit data in a trusted repository and provide open access to it (‘as open as possible, as closed as necessary’)
- Provide information (via the same repository) about any research output or any other tools and instruments needed to re-use or validate the data
Keep in mind that ‘research data’ is a very broad concept and certainly not limited to numerical/tabular data.
The FAIR principles
Horizon Europe emphasizes the management of data and other outputs in accordance with the FAIR principles, which means making data Findable, Accessible, Interoperable, and Reusable. The principles provide a set of attributes that enable and enhance the reuse of data by both humans and machines.
Making data available via a trusted repository already goes a long way in increasing their FAIRness.
There is no single, one-size-fits-all way to manage research data and make them FAIR. What is appropriate and feasible largely depends on the research domain and data type(s) involved, as well as on the specificities of the project.
Developing a Data Management Plan (DMP)
The first step to comply with RDM requirements in Horizon Europe is to develop a Data Management Plan (DMP).
A DMP is a document that outlines, from the start of the project, how research data will be handled both during and after a research project. It identifies key actions and strategies to ensure that research data are of high quality, secure, sustainable and, to the extent possible, accessible and reusable.
DMP timeline
When should the DMP be ready?
- A short (1-page) DMP is required at the proposal stage.
- A full, initial version of the DMP is required as a deliverable, normally by month 6.
- Note: by exception, in cases of a public emergency and if the work programme requires so, you should submit a full DMP already with submission of proposals or at the latest by the signature of the grant agreement.
- The DMP is considered ‘a living document’ and has to be regularly updated to reflect changes that may arise or decisions that are implemented. For projects longer than 12 months, an updated version of the DMP has to be submitted as a deliverable.
- A final version of the DMP that describes how the data is managed and shared has to be delivered at the end of the project.
Whenever possible, beneficiaries are encouraged to make their DMPs non-restricted, public deliverables, under open access and a CC BY license to allow a broad re-use. For example, many European projects from H2020 made their DMPs available via the Zenodo repository. In addition, the DMP can be made available as part of the deliverables on the EU CORDIS website.
What to include in a DMP?
To prepare the DMP of your project, Horizon Europe makes a DMP template available. The use of this template is recommended but not mandatory.
To help you draft both your Horizon Europe proposal and full DMP, use the online planning tool Argos, OpenAIRE’s open source service for writing and publishing DMPs. It offers templates aligned with the Horizon Europe requirements and custom guidance for each of the questions in the templates.
Providing access to research data in trusted repositories
Trusted repositories
Trusted repositories are infrastructures that provide reliable and long-term access to digital resources such as data, publications, etc. Usually, these repositories go through assessment or certification processes to guarantee that certain quality criteria are met.
In Horizon Europe, the following are considered trusted repositories:
- Certified repositories, for example those with CoreTrustSeal, Nestor Seal DIN31644, or ISO16363 certifications.
- Domain-specific repositories that are internationally recognized, commonly used and endorsed by the research communities relevant to your project.
- General-purpose repositories or institutional repositories that, without official certification, present the essential characteristics of trusted repositories (e.g. security provisions, services for the creation of machine actionable metadata, long-term preservation of data, etc.).
- You can find more information on how to find a trustworthy repository for your data.
- For calls with a condition relating to the European Open Science Cloud (EOSC), data must be deposited in repositories that are EOSC-federated (discoverable via the EOSC Portal).
When should data be deposited?
In Horizon Europe, data should be deposited as soon as possible after its generation and, at the latest, by the end of the project.
There are, however, some additional requirements:
- Data underpinning a scientific publication should be deposited at the latest at the time of publication, and in line with standard community practices.
- In cases of public emergency, if requested by the granting authority, immediate open access should be provided. If exceptions to open access to data apply, data should be made available at least to the legal entities that need the data to address the public emergency.
- In exceptional cases, data can be deposited after the project has finished.
'As open as possible, as closed as necessary'
In Horizon Europe, research data should be made open access by default and licensed under the latest version of CC BY (attribution required) or CC 0 (public domain), or equivalent.
However, it is recognized that data should be ‘as open as possible, as closed as necessary’, and exceptions can be made when providing open access to data:
- Is against the beneficiary’s legitimate interests, including regarding commercial exploitation;
- Is contrary to any other constraints, such as data protection rules, privacy, confidentiality, trade secrets, Union competitive interests, security rules or intellectual property rights; or
- Would be against other obligations under the Grant Agreement.
In such cases, data can be kept restricted, closed or under embargo, but beneficiaries must explain in the DMP the legitimate exception(s) under which they choose to restrict access to (some of the) research data. For more information, see ‘How do I know if my research data is protected?’.
Metadata
Horizon Europe requires that data deposited in a trusted repository be described with rich metadata in line with the FAIR principles.
Metadata should:
- At least include the following fields: author(s), dataset description or abstract, date of dataset deposit or publication date, dataset deposit venue, dataset license (CC 0 or CC BY by default) and dataset embargo period (if any).
- Also include information about Horizon Europe or Euratom funding: grant project name, acronym and number. Ideally, the repository will have dedicated fields for this information. If not, you can include them in other appropriate fields, such as the abstract.
- Be open access under a CC 0 license or equivalent. This is also recommended in cases where data must be closed or restricted but there are no compelling reasons for metadata not to be findable and accessible.
Most (trusted) repositories will require that you fill in a metadata form about the data or files that you will publish, which should cover these requirements.
Horizon Europe requires that metadata follows the FAIR principles. In practice, this means that you should choose a repository where metadata follows standards and includes Persistent Identifiers (PIDs): for the dataset (e.g. DOI), the author(s) (e.g. ORCID or ResearcherID) and, if possible, the organization(s) (e.g. ROR) and grant (e.g. grant DOI).
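As a rough illustration of the minimum fields listed above, the following Python sketch checks a metadata record for missing entries before deposit. The field names, the `missing_fields` helper and the sample record are hypothetical and made up for this example; each repository defines its own metadata schema and input form. The embargo period is left out of the check because it only applies if an embargo exists.

```python
# Hypothetical field names; a real repository defines its own metadata schema.
REQUIRED_FIELDS = [
    "authors",
    "description",
    "date",
    "deposit_venue",
    "license",
    "grant_name",
    "grant_acronym",
    "grant_number",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Made-up sample record, for illustration only.
record = {
    "authors": ["Doe, Jane"],
    "description": "Survey responses collected for an example project.",
    "date": "2022-08-08",
    "deposit_venue": "Example Repository",
    "license": "CC-BY-4.0",
    "grant_name": "Example Project",
    "grant_acronym": "EXAMPLE",
    # "grant_number" is deliberately left out to show the check failing.
}

print(missing_fields(record))
```

A check like this is only a convenience before filling in the repository form; the repository's own validation remains the reference.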
For additional information, you can check the page on metadata.
Validation (and re-use) requirements
Horizon Europe requires that, at the time of depositing research data in a trusted repository, you must also provide access (via the repository) to information about any research output or any other tools or instruments needed to re-use or validate the research data.
- “Research outputs, tools and instruments” may include data, software, algorithms, protocols, models, workflows, electronic notebooks and others.
- What type of information should be provided? A detailed description of the research output/tool/instrument, how to access it, any dependencies on commercial products (e.g. software), potential version/type, potential parameters, etc.
- Besides information about these outputs, beneficiaries are encouraged to provide open (digital) access to the research outputs, tools and instruments themselves unless legitimate interests or constraints apply.
Typically, the repository will allow you to provide links to relevant information (e.g. links to related publications, to a code repository, to related datasets, etc.).
Costs of Research Data Management
Data management and sharing activities need to be costed into research, in terms of the time and resources needed. By planning early, costs can be significantly reduced. Costs associated with open access to research data can be claimed as eligible costs of any Horizon Europe grant under the conditions defined in the Grant Agreement: they must already be budgeted and accepted in the grant proposal, and they are eligible only during the duration of the project.
To estimate costs for RDM, you can check the online RDM-costing tool and the infographic on ‘What will it cost to manage and share my data?’.
How can OpenAIRE help?
The following additional support materials can help you with the RDM requirements in Horizon Europe projects:
- Open Science in Horizon Europe proposal [guide]
- Horizon Europe OA to publications [guide]
- What will it cost to manage and share my data? [infographic]
- How to make your data FAIR [guide]
- Data formats for preservation [guide]
- How to deal with non-digital data [guide]
- How to deal with sensitive data [guide]
- Raw data, backup and versioning [guide]
- How to find a trustworthy repository for your data [guide]
OpenAIRE also offers tools for research data management: ARGOS is an OpenAIRE service that simplifies the management, validation, monitoring and maintenance of DMPs. [tool]
Links and further information
The following sources were used and contain more extensive information on how to address open science in Horizon Europe proposals:
- European Commission, Horizon Europe Programme Guide (contains a dedicated section on open science on page 38)
- European Commission, Horizon Europe Programme Standard Application Form (HE RIA, IA)
- European Commission, EU Grants: AGA - Annotated Model Grant Agreement
Glossary
This Glossary provides definitions of, and practical advice on, the key terms used in the context of RDM requirements in Horizon Europe. Other terms relating to RDM can be found in the Science Europe Data Glossary.
Anonymisation
Anonymisation is the process of removing personally identifiable information (information that directly or indirectly relates to an identified or identifiable person) from datasets containing sensitive data. As a result, the data subject is no longer identifiable. As opposed to pseudonymisation, anonymisation is not reversible, which means that the re-identification of the data subject is not possible.
Practical advice:
- OpenAIRE has developed a tool that can be used for anonymisation: Amnesia.
Backup
Data backup is a process of creating a copy of data in a digital format and storing it on another device to ensure that data are saved and to prevent data loss.
Backups can be full (all files are backed up whenever a backup is made) or partial (only a part of the files, e.g. new files, are backed up).
Practical advice:
- One backup should be at a physically separate location.
- Backups should be made from the master copy.
- The backup location should be as secure as the master copy location.
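The advice above can be sketched in Python as a copy from the master file followed by a checksum comparison, so that a corrupted copy is detected immediately. This is a minimal sketch, not a full backup strategy: the function names `sha256_of` and `backup_file` are hypothetical, and it assumes that the backup directory sits on a separate, equally secure device or location.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(master, backup_dir):
    """Copy the master file into backup_dir and verify the copy by checksum."""
    master = Path(master)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / master.name
    shutil.copy2(master, target)  # copy2 also tries to preserve timestamps
    if sha256_of(master) != sha256_of(target):
        raise IOError(f"Backup of {master} does not match the master copy")
    return target
```

Always copying from the master, rather than from an earlier backup, avoids propagating errors from one copy to the next.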
Controlled vocabularies and ontologies
Controlled vocabulary is an organised and standardised arrangement of predefined terms (words and phrases) that are used to index content in an information system with the aim of facilitating information retrieval. Controlled vocabularies connect variant terms and synonyms for concepts, link concepts in a logical order and organise them into categories, so as to provide a consistent way to describe data. They can be general and discipline-specific, and can take the form of subject heading lists, thesauri, authority files, taxonomies and alphanumeric classification schemes.
Ontologies are not controlled vocabularies, but they use controlled vocabularies to establish a formal specification of a conceptual model in which concepts and categories of concepts, properties, relationships among concepts and categories, functions, constraints, and axioms are defined.
Practical advice:
- Use standard domain specific controlled vocabularies and/or ontologies wherever possible to better align your outputs with similar data in your field of research.
- Use well-documented vocabularies in which terms are assigned persistent identifiers.
- Use repositories that enable you to add terms from controlled vocabularies.
- A useful domain agnostic resource for finding controlled vocabularies and ontologies can be found here.
- Check also other resources.
Data Management Plan
Data Management Plan (DMP) is a formal document that outlines how data will be handled throughout the research data lifecycle – from planning, through collecting, analysing, publishing, preserving, to sharing and reusing.
In Horizon Europe, DMPs are mandatory and a template to guide the preparation of DMPs is provided. The list of points to be addressed includes data types and formats, compliance with the FAIR principles (metadata, repositories, controlled vocabularies, licences, etc.), legal requirements (intellectual property rights, GDPR), costs of preservation, data security and ethics, and retention periods.
Online tools that can facilitate the preparation of DMPs are available, for example Argos.
Practical advice:
- A DMP is a living document, which should be updated as the project develops. Any deviations from the original proposal can be documented and explained.
- If possible, make the DMP publicly available.
Documentation
Data documentation includes various types of information that can help find, assess, understand/interpret, and (re)use research data – e.g. information about methods, protocols, datasets to be used and data files, preliminary findings, etc. Documentation helps understand the context in which data were created, as well as the structure and the content of data. Data should be documented through all stages of the research data lifecycle. Detailed and rich documentation ensures reproducibility and upholds research integrity. Documentation also includes metadata.
Practical advice:
- Various tools, such as e-lab notebooks, are available to support you in the process of creating documentation.
File format
File format is a standard way of encoding information so that it can be stored in a computer file. Digital research data may be stored in a wide variety of file formats, depending on the devices and tools used in data collection and processing.
File formats may be proprietary (the encoding-scheme is designed and owned by a company or organisation, and is not published, due to which files can be opened only by those who have particular software or hardware tools) and/or prone to obsolescence (legacy formats, bit rot).
To ensure that users can access and understand data and that data can be preserved in the long term, use open (defined by an openly published specification that anyone can use) and lossless formats (ensuring that no data or quality loss will occur during file manipulation).
Practical advice:
- If you are using proprietary formats, produce a copy of data in an open format.
- Some file formats, though proprietary, have become the de facto standard due to their ubiquity (e.g. TIFF). In these cases, it is still recommended to produce a copy of the data in an open format so that both can be stored together.
- When preparing a DMP, check whether the formats commonly used by your research community are suitable for long-term preservation. You may use lists of recommended (suitable for long-term preservation) file formats as a reference point:
- File Formats, DANS, https://dans.knaw.nl/en/file-formats/
- Recommended Formats, UK Data Service, https://ukdataservice.ac.uk/learning-hub/research-data-management/format-your-data/recommended-formats/
- Choosing a File Format, Swedish National Data Service (SND), https://snd.gu.se/en/manage-data/guides/choosing-file-format
Check also this OpenAIRE Guide.
File naming convention
File naming convention is a framework for generating file names that have a consistent structure, while describing the content of files and their relations to other files.
Practical advice:
- Use a standardised syntax for file naming to improve searchability and to make batch processing possible. File names can take the form YYMMDD_filename, and it is recommended to add version suffixes whenever you create versions of a file from a master copy, for example after cleaning or analysis steps.
- Define the file naming convention in an early stage of your research and apply it consistently throughout the research data lifecycle.
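A convention like this is easiest to apply consistently when file names are generated rather than typed. The Python sketch below builds names of the form YYMMDD_description_vNN.ext; the `build_filename` helper, the lowercase hyphenated description and the two-digit `_vNN` version suffix are assumptions chosen for illustration, not a format prescribed by Horizon Europe.

```python
import re
from datetime import date


def build_filename(day, description, version, extension):
    """Build a file name such as 220808_survey-responses_v02.csv."""
    # Replace every run of characters other than letters and digits with a hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", description).strip("-").lower()
    return f"{day:%y%m%d}_{slug}_v{version:02d}.{extension}"


print(build_filename(date(2022, 8, 8), "Survey responses", 2, "csv"))
# prints 220808_survey-responses_v02.csv
```

Putting the date first means that files sort chronologically within each year in any file browser.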
Licence
Licence is a written agreement by means of which the copyright holder defines the rights granted to the users. In a digital environment, standardised licences based on a set of predefined reuse conditions, such as Creative Commons, are used. Licences for digital objects are machine readable.
In Horizon Europe, a CC BY or CC0 (or equivalent) open licence is required for data in open access, while the metadata of deposited data must be open under CC0 or an equivalent licence.
Practical advice:
- When depositing your data in a repository, consult the repository’s licence policy to check whether it is compliant with the Horizon Europe requirements.
- If you are combining already published data into a new dataset, check the compatibility of data licences.
Metadata
Metadata are data that provide information about other data, e.g. a description of the content of the data, the date when the data were produced or collected, tools and devices used to obtain data, file formats and sizes, the names of the people who created or collected data, relevant persistent identifiers, etc. Metadata should be created and provided in accordance with commonly used metadata standards, which may be general or discipline specific. This ensures that metadata can be understood by humans and processed and exchanged by machines.
Practical advice:
- When preparing a DMP, you will be required to mention the metadata standards you will “follow to make your data interoperable”. A useful resource for finding available metadata standards can be found here. Choose a standard that is commonly used in your discipline. Do not invent your own!
- Some metadata are automatically generated (by devices used to create and capture data) and embedded in data files, e.g. in digital audio and video recordings, while some have to be produced manually.
- When depositing your data in a repository, you will be guided by the user interface through the process of providing metadata. In the input form, some metadata fields are mandatory, which means that the procedure cannot be completed if this information is not provided. It is highly recommended to provide as detailed information as possible even in non-mandatory fields.
Persistent identifier (PID)
Persistent identifier (PID) is a long-lasting reference to a resource that provides the information required to reliably identify, verify and locate the resource. In a digital environment, PIDs have the form of URLs. When pasted in a browser, they take users to the resource.
Apart from digital resources, PIDs can also relate to researchers (e.g. ORCID, ISNI), institutions (e.g. ROR), grants, instruments and devices, etc. In this case, a PID leads to the record describing a researcher, an institution, etc. in the relevant registry.
Examples of PIDs include DOIs, ORCIDs, ISBN, Handles, etc.
Practical advice:
- Apply PIDs to your research outputs. PIDs are typically automatically assigned by trustworthy repositories once your data are deposited there.
- It is also recommended to register yourself with ORCID to get a PID for you as an individual.
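To illustrate the URL form of PIDs, the Python sketch below turns a bare DOI into its resolver URL and validates the check character of an ORCID iD, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The function names are hypothetical, and the sketch checks only the format and checksum of an identifier, not whether it is actually registered.

```python
def doi_url(doi):
    """Turn a bare DOI into its resolvable URL form."""
    return f"https://doi.org/{doi}"


def orcid_check_digit(base_digits):
    """Compute the ORCID check character (ISO 7064 MOD 11-2) for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the format and check character of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
```

A checksum test like this catches typing errors in metadata forms before an incorrect identifier is published with the dataset.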
Pseudonymisation
Pseudonymisation (pseudo anonymisation) is the processing of personal data in such a way that the data can no longer be related to the data subject without the use of additional information. However, the additional information must be kept separately and subject to technical and organisational measures to ensure that data subjects remain unidentifiable. As opposed to anonymisation, pseudonymisation is a reversible process, which means that data subjects can be re-identified if access to the additional information is enabled.
Practical advice:
- OpenAIRE has developed a tool that can be used for pseudonymisation: Amnesia.
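The principle behind pseudonymisation can be sketched in a few lines of Python: direct identifiers are replaced with random pseudonyms, and the key table that links pseudonyms back to individuals is returned separately so that it can be stored apart from the data. This is a simplified illustration under stated assumptions, not how Amnesia works: the `pseudonymise` function and the `P-` prefix are hypothetical, and only one direct identifier field is handled, whereas real datasets often also contain indirect identifiers.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace direct identifiers with random pseudonyms.

    Returns the pseudonymised records and the key table, which must be
    stored separately from the data and under access control.
    """
    key_table = {}  # original identifier -> pseudonym
    output = []
    for record in records:
        original = record[id_field]
        if original not in key_table:
            key_table[original] = "P-" + secrets.token_hex(8)
        # Build a new record so that the input data are left unchanged.
        output.append({**record, id_field: key_table[original]})
    return output, key_table
```

Because the same person always receives the same pseudonym, records can still be linked for analysis, and re-identification remains possible only for those with access to the key table.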
Repository
Repository is a digital platform that ingests, stores, manages, preserves, and provides access to digital content. A repository should support a commonly accepted metadata standard and have a protocol enabling metadata exchange.
Repositories are usually classified into:
- subject/disciplinary,
- institutional and
- generalist repositories.
In Horizon Europe, research data should be deposited in a trusted repository, i.e. in a repository that operates in accordance with relevant standards and best practice, provides long-term access and preservation, and ensures compliance with the FAIR principles. Trusted repositories include certified or community-recognised repositories, stable institutional repositories and generalist repositories (such as Zenodo).
Practical advice:
- Use the re3data repository registry to identify appropriate repositories.
- The use of domain specific repositories is the most desirable. Such repositories will employ domain specific metadata standards and controlled vocabularies and/or ontologies which will enhance the ability to do analyses across similar datasets and even across domains.
- Institutional and generalist repositories should only be considered if no domain specific repository can be found or if there is a mandate to deposit in a specific repository. If it is necessary to deposit data in multiple repositories, this should be done so as to maintain the same persistent identifier across those multiple copies.
Sensitive data
Sensitive data is information that should be protected against unauthorised disclosure because unauthorised access may negatively affect the privacy of an individual, trade and business secrets or even security. In the context of research, sensitive data usually include personally identifiable information (names, date and place of birth, place of living, employment information, etc.), health information, and other private or confidential data.
Practical advice:
- Even in the case of sensitive data, it is still possible to adhere to the FAIR principles and open science by making the metadata freely available while not enabling public access to the underlying data. These data can be managed through access control mechanisms, anonymisation and pseudonymisation.
Storage
Data storage is a computing technology that enables saving data in a digital format on computer components and recording media, including cloud services.
In the context of Research Data Management, it is necessary to ensure that data are stored securely until the end of the project and throughout the minimum retention period. Storage options may include:
- Portable devices,
- University network drives,
- Cloud services.
Practical advice:
- Storage on local storage spaces on hard drives, pen drives, etc. is discouraged because these devices are vulnerable and data loss may occur. If portable devices are used, it should be ensured that there are copies on networked drives and backup storage.
- Using in-house or institutionally approved spaces is recommended, especially if regular backups are enabled.
- Cloud services are suitable for collaboration with partners from other institutions. However, it should be checked whether the selected cloud service makes regular backups and whether it falls under European jurisdiction.
Publication date: August 8, 2022