Research Data Management Handbook
A Quick Guide to Research Data Management and the Open Research Data Pilot
Contents:
- What is Open Research Data?
- What is the Open Research Data pilot?
- Is my project part of the Open Research Data Pilot?
- How do I comply with the Open Research Data Pilot?
- What is a Data Management Plan and how do I write a DMP?
- When do I have to create a Data Management Plan?
- How do I make my research data open?
- What do I do when I have sensitive data or privacy issues?
- What is metadata?
- What about RDM costs?
- What about my publications?
What is Open Research Data?
Open Research Data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute the source and share under the same licence.
Open access to research data fits within the Open Science paradigm, situated within a context of ever greater transparency, accessibility and accountability. The main goals of these developments are to lower access barriers to research outputs, to speed up the research process and to increase the quality, integrity and longevity of the scholarly record.
What is the Open Research Data pilot?
The Open Research Data Pilot of the European Commission enables open access and reuse of research data generated by Horizon 2020 projects. There are two main pillars to the Pilot: developing a Data Management Plan (DMP) and providing open access to research data, if possible.
The conditions you have to adhere to are:
- Develop (and keep up-to-date) a Data Management Plan (DMP).
- Deposit your data in a research data repository.
- Ensure third parties can freely access, mine, exploit, reproduce and disseminate your data.
- Provide related information and identify (or provide) the tools needed to use the raw data to validate your research.
The pilot applies to:
- The data (and metadata) needed to validate results in scientific publications.
- Other curated and/or raw data (and metadata) that you specify in the DMP.
Data management costs are eligible for reimbursement for the duration of the project, and can be claimed under the conditions defined in the grant agreement. You can find more detailed information in the EC's Guidelines on FAIR Data Management in Horizon 2020.
Is my project part of the Open Research Data Pilot?
As of 2017, participating in the pilot is the default option for all Horizon 2020 projects, though opting out is accepted. If your project started before 2017, check Article 29.3 in your grant agreement.
Projects started in 2014-2016: Limited ORD Pilot | From 2017: Extended ORD Pilot |
Some areas participate: check Article 29.3 | Participating is the default option for all projects |
Possibility to opt out, but also to opt in on a voluntary basis for other areas | Possibility to opt out |
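The comparison above can be read as a simple decision rule. The sketch below is only an illustration of that rule: the function name `ord_pilot_status` and the `in_pilot_area` flag are hypothetical, and whether a 2014-2016 project falls in a participating area still has to be checked against Article 29.3 of the actual grant agreement.

```python
def ord_pilot_status(start_year: int, in_pilot_area: bool = False) -> str:
    """Return a rough description of a project's default ORD Pilot status.

    start_year    -- year the Horizon 2020 project started
    in_pilot_area -- assumption: True if Article 29.3 of the grant agreement
                     shows the project is in a participating area (only
                     relevant for projects started in 2014-2016)
    """
    if start_year < 2014:
        raise ValueError("Horizon 2020 projects start in 2014 or later")
    if start_year >= 2017:
        # Extended pilot: participation is the default for all projects.
        return "participates by default; opt-out possible"
    if in_pilot_area:
        # Limited pilot: only some areas participate.
        return "participates (limited pilot area); opt-out possible"
    return "not covered by default; voluntary opt-in possible"


if __name__ == "__main__":
    print(ord_pilot_status(2018))
    print(ord_pilot_status(2015, in_pilot_area=True))
    print(ord_pilot_status(2015))
```

The function only encodes the defaults; an actual opt-out or opt-in decision is recorded in the grant agreement and the DMP, not derived from the start year.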
How do I comply with the Open Research Data Pilot?
There are three steps to comply with the Open Research Data Pilot:
Step 1: Set up a DMP
The first version of your DMP has to be submitted within the first six months of the project. You should update your DMP whenever significant changes occur, and at a minimum in time for the periodic evaluation and the final review.
Step 2: Find a research data repository
Find a data repository that matches your data needs and discipline. An overview of repositories can be found at Re3data. If there is no subject-specific data repository available, catch-all repositories such as Zenodo provide a good alternative.
Step 3: Deposit your data
Deposit the data and the information necessary to access and use it, i.e. metadata and tools/instruments, in the data repository. Attach an open licence, such as a Creative Commons licence, to the datasets that can be made openly available.
(Partially) Opting-out
Each project proposal will need to consider taking part in the Pilot, but it remains possible to opt out at any stage: during the application phase, during the grant agreement preparation (GAP) phase, or after signing the grant agreement.
Moreover, projects individually define which data the Pilot covers for their specific context. A project can choose to only make a subset of the data available, or they can initially plan to make certain data available but then change their decision mid-project, for example if they discover there's a commercial application and plan to file for a patent.
The key principle to bear in mind is to be "as open as possible, as closed as necessary." If you plan to keep some datasets closed, you need to justify these decisions in your Data Management Plan.
For more information also check the EC's Guidelines on FAIR Data Management in Horizon 2020 and the OpenAIRE Open Data Pilot webpage.
What is a Data Management Plan and how do I write a DMP?
A Data Management Plan (DMP) is a formal document that specifies how research data will be handled both during and after a research project. It identifies key actions and strategies to ensure that research data are of a high quality, safe, sustainable and, where possible, accessible and reusable. More and more research funders require a DMP as part of the grant proposal process, or after funding has been approved. A DMP should be considered a 'living' document: it is ideally created before or at the start of a research project, but updated when necessary as the project progresses. Planning for data management is therefore not a one-off event, but a process.
DMP Templates
The DMP template for H2020 projects provided by the EC includes:
- A summary of your data
- How to make your data FAIR
- Information about costs and resources
- Information about data security
- Ethical aspects
DMPonline is an online tool that provides a number of templates representing the requirements of different funders and institutions, such as Horizon 2020. It also provides further guidance to understand and answer template-specific questions. Plans created with DMPonline can be easily shared with collaborators and exported in various formats.
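As a rough illustration of how a draft DMP could be checked against the template topics listed above, here is a minimal sketch. The section labels are shortened from that list, and the dictionary layout and the `missing_sections` helper are assumptions made for this example; they are not part of the EC template or of DMPonline.

```python
# Section labels paraphrased from the H2020 template topics listed above.
H2020_DMP_SECTIONS = [
    "Data summary",
    "FAIR data",
    "Costs and resources",
    "Data security",
    "Ethical aspects",
]


def missing_sections(dmp: dict) -> list:
    """Return the template sections that are absent or empty in a draft DMP.

    `dmp` is a hypothetical mapping from section label to free-text content.
    """
    return [
        section
        for section in H2020_DMP_SECTIONS
        if not str(dmp.get(section, "")).strip()
    ]


if __name__ == "__main__":
    draft = {
        "Data summary": "Survey responses collected in CSV format.",
        "FAIR data": "Deposited in a repository with a persistent identifier.",
        "Data security": "   ",  # whitespace only, so it counts as missing
    }
    print(missing_sections(draft))
```

A check like this only confirms that each topic has been addressed at all; it says nothing about whether the answers are adequate.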
When do I have to create a Data Management Plan?
The first version of the DMP is expected to be delivered within the first 6 months of the project.
It should be updated as a minimum in time with the periodic evaluation/assessment of the project. If there are no other periodic reviews foreseen within the grant agreement, then such an update needs to be made in time for the final review at the latest.
Furthermore, the project consortium can define a timetable for review in the DMP itself.
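To make the timing concrete, the sketch below computes a latest date for the first DMP version from a project start date. It assumes that "the first 6 months" means six calendar months after the start date; the binding deadline is the one set in the grant agreement. The helper names `add_months` and `first_dmp_deadline` are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months to a date, clamping to the last day of the month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def first_dmp_deadline(project_start: date) -> date:
    """Assumed deadline: six calendar months after the project start."""
    return add_months(project_start, 6)


if __name__ == "__main__":
    print(first_dmp_deadline(date(2017, 1, 1)))   # 2017-07-01
    print(first_dmp_deadline(date(2017, 8, 31)))  # 2018-02-28 (clamped)
```

Later updates depend on the review schedule in the grant agreement or on the timetable the consortium defines in the DMP, so they are not computed here.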
How do I make my research data open?
To make your research data open, you can deposit it in dedicated data repositories. Some repositories, such as Zenodo, accept both publications and datasets. Data repositories allow you to provide persistent links to your datasets, so that they can be cited, linked and tracked. Just like other publications, you can license your data to define what level of reuse you allow. OpenAIRE recommends using the Creative Commons CC0 Waiver or CC-BY licence for open access to data.
What do I do when I have sensitive data or privacy issues?
The concept of the free use of research data within the Pilot may conflict with data protection rules if such data contain personal data. Data protection rules are always applicable when personal data is processed. The term ‘Processed’ is all embracing and any operation with personal data not qualifying as processing is almost unthinkable. Anonymisation of personal research data is the only effective solution to comply with both the data protection legislation and the requirements of the Open Research Data Pilot.
Obtaining the consent of the data subjects to use and exchange their data may seem an alternative. However, the Pilot demands that data be made openly available to the broad public and for all forms of reuse, whereas Directive 95/46/EC demands that the purpose for processing be defined as precisely as possible.
Remember that opting out of the Pilot is possible, provided that you justify in the Data Management Plan why the data, or part of the data, can't be shared openly.
What is metadata?
Metadata is simply information about data that makes the data meaningful to both people and machines. Properly describing and documenting data allows users (yourself included) to understand and track important details of the work. In addition to describing data, metadata also facilitates search and retrieval of the data once it is deposited in a data repository.
Metadata can include content such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more.
Metadata formats and standards
Metadata can take many different forms, from free text to standardized, structured, machine-readable, extensible content. It is recommended to use a standard metadata format that is used in your field. Specific disciplines, repositories or data centers may guide or even dictate the content and format of metadata, possibly using a formal standard. Because creating standardized metadata can be difficult and time consuming, another consideration when selecting a standard is the availability of tools that can help generate the metadata (e.g. Morpho allows for easy creation of EML, Nesstar for DDI data, etc.).
The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards.
Some specific examples of metadata standards, both general and domain specific, are:
- Dublin Core - domain agnostic, basic and widely used metadata standard
- DDI (Data Documentation Initiative) - common standard for social, behavioral and economic sciences, including survey data
- EML (Ecological Metadata Language) - specific for ecology disciplines
- ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata) - for describing geospatial information
- MINSEQE (MINimal information about high throughput SEQuencing Experiments) - Genomics standard
- FITS (Flexible Image Transport System) - Astronomy digital file standard that includes structured, embedded metadata
- MIBBI - Minimum Information for Biological and Biomedical Investigations
Where no appropriate, formal metadata standard exists, for internal use, writing “readme” style metadata is an appropriate strategy.
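To illustrate what structured, machine-readable metadata looks like, the sketch below serializes a few Dublin Core elements to XML with Python's standard library. The namespace URI and the fifteen element names are those of the Dublin Core Metadata Element Set; the wrapping `metadata` root element, the function name and the sample values are assumptions made for this example, and a given repository may expect a different wrapper or schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dublin_core_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to an XML string.

    The <metadata> root element is an arbitrary wrapper chosen for this
    sketch, not something mandated by Dublin Core.
    """
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    print(dublin_core_record({
        "title": "Example dataset",
        "creator": "Example Researcher",
        "rights": "CC-BY 4.0",
    }))
```

For internal "readme" style metadata, the same fields could simply be written as labelled lines in a plain text file kept alongside the data.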
What about RDM costs?
Under Horizon 2020, costs for data management are eligible for reimbursement for the duration of the grant agreement. To help estimate the costs associated with data management, the University of Utrecht (The Netherlands) compiled a Data Management Cost Guide.
What about my publications?
We have a whole page dedicated to making your publications openly accessible and how to comply with the Open Access Mandate of the EC over here.
How can OpenAIRE help me?
We are here to help and inform you. We make sure your datasets are picked up by our infrastructure and link them to your project and publications. We also have lots of supporting material that can help you comply with the Open Research Data Pilot:
- Fact sheet
- Helpdesk
- FAQ
- ….