Horizon 2020 (H2020) introduced an Open Research Data pilot that encourages good data management and aims to make research data open. If your Horizon 2020 project is part of the pilot and your data meets certain conditions, you must deposit your data in a research data repository where it will be findable and accessible to others.
Data management and sharing activities need to be costed into research, in terms of the time and resources needed. Planning early can reduce these costs significantly. Costs associated with open access to research data can be claimed as eligible costs of any Horizon 2020 grant under the conditions defined in the H2020 Grant Agreement: they must already be budgeted and accepted in the grant proposal. Note that such costs are only eligible during the duration of the project.
Do you want a quick, visual overview of this issue? Take a look at this infographic produced by DCC in the context of OpenAIRE's RDM Task Force, dedicated to DMP resources.
Step 1: Check the data management activities in the table and tick those that may apply to your proposed research.
Step 2: For each selected activity, estimate the additional time and/or other resources needed and cost this, e.g. people’s time or physical resources needed such as hardware or software. Find out which resources, e.g. for data storage and backup, are available to you from your institution. Consider whether you need a dedicated data manager.
Step 3: Add these data management costs to your research application. Coordinate resourcing and costing with your institution, research office and institutional IT services.
Step 4: Plan the data management activities in advance so that they do not compete with your focus on research excellence.
Remember that when your research project nears the end you do not want these additional data management activities to compete with delivery of your planned outputs, writing of publications and the timely delivery of your project. At this later stage the costs of preparing data for sharing may be significantly higher.
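The costing in Steps 1 to 3 can be sketched as a small calculation. This is only an illustration: the activity names, staff days, day rate and other costs below are invented placeholders, not figures from this guide or from any H2020 rule, and your institution may cost staff time differently.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One data management activity ticked in Step 1 (hypothetical structure)."""
    name: str
    staff_days: float   # estimated additional staff time, in days
    other_costs: float  # hardware, software or storage fees, in EUR


def total_cost(activities, day_rate):
    """Step 2: cost each activity as staff time plus other resources, then sum."""
    return sum(a.staff_days * day_rate + a.other_costs for a in activities)


# Placeholder estimates; replace them with your own project's figures.
activities = [
    Activity("Data documentation", staff_days=5, other_costs=0),
    Activity("Storage and backup", staff_days=1, other_costs=600),
    Activity("Preparing data for deposit", staff_days=3, other_costs=0),
]

# Step 3: this total is the amount to add to the research application.
print(total_cost(activities, day_rate=300))
```

Keeping one entry per activity makes it easier to review the estimate with your research office and IT services.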
We cannot predict your costs for you. The costs of data management and storage vary with your project and with the volume, domain, level of documentation and preservation of your data.
But we can help you get started on costing your curation activities. This guide includes a tool that lists, explains and estimates the possible expenses of data management. The estimated amounts only indicate the order of magnitude.
Findable
Make your data findable by ensuring it:
Has a persistent identifier
Has rich metadata
Is searchable and discoverable online
Persistent identifiers (PIDs) are important because they unambiguously identify your data and facilitate data citation. An example of a PID is a Digital Object Identifier (DOI). When depositing your data in a repository, make sure you select a repository that assigns a persistent identifier (for example Zenodo).
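To illustrate why a DOI supports citation, the sketch below turns a DOI written in a few common ways into the single resolvable link form https://doi.org/<DOI>. The function name and the example DOI are hypothetical, and the check is deliberately minimal: it only confirms that the identifier starts with "10.", as DOIs do.

```python
def doi_to_url(doi: str) -> str:
    """Normalise a DOI string to its https://doi.org/ link form."""
    doi = doi.strip()
    # Strip the prefixes people commonly put in front of a DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Hypothetical DOI used purely for illustration.
print(doi_to_url("doi:10.1234/example.5678"))
```

Citing the link form means readers reach the dataset's landing page even if the repository later changes its own URLs.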
The metadata describing your data supports findability, citation and reuse. Rich metadata provides important context for the interpretation of your data and makes it easier for machines to conduct automated analysis. Follow a standard metadata scheme, either a general one such as Dublin Core or a discipline-specific one. To find a suitable scheme, consult the DCC metadata directory, the RDA Metadata Directory and the portal of data standards at FAIRsharing.
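As an illustration of a general metadata scheme, the following sketch writes a few Dublin Core elements as XML using only the Python standard library. The wrapper element name "metadata", the function name and all field values are assumptions made for this example. A real repository will usually collect these fields through its own deposit form or API.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple XML record; a list value repeats the element."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = item
    return ET.tostring(root, encoding="unicode")


# Invented values for illustration only.
record = dublin_core_record({
    "title": "Example survey dataset",
    "creator": ["Researcher, A.", "Researcher, B."],
    "identifier": "https://doi.org/10.1234/example.5678",
    "rights": "CC-BY",
    "description": "Placeholder description of how the data were collected.",
})
print(record)
```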
Accessible
Make your data accessible by ensuring it:
Is retrievable online using standardised protocols
Has restrictions in place if necessary
Remember that not all data has to be made open. Data can be restricted and still be FAIR. However, if access is allowed, data should be retrievable without the need for specialised protocols. In addition, even if the full content is not made openly available, the data must be as findable as possible.
As Open as Possible, As Closed as Necessary
Where can I keep my data? Not necessarily opening it up, but keeping it somewhere safe for the long-term. You should look for a repository that does the following:
Interoperable
Interoperable data can be integrated with other data, applications and workflows. Where you can, avoid creating data with proprietary software, and make your data available in open formats. Remember to use community-agreed schemas, controlled vocabularies, keywords, thesauri or ontologies where possible.
Reusable
Make your data reusable by ensuring it:
Is well-documented
Has clear licence and provenance information
Create documentation, e.g. a README file, to help ensure that your data can be correctly interpreted and reanalyzed by others. A plain-text README file should contain the following information:
for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication;
for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units;
any data processing steps, especially if not described in the publication, that may affect interpretation of results;
a description of what associated datasets are stored elsewhere, if applicable;
whom to contact with questions.
If text formatting is important for your README, PDF format could also be used.
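The README contents listed above can be drafted from a simple template. The sketch below is one possible layout, not a required format: the section titles, function name and example values are all assumptions, and you should adapt them to your discipline's conventions.

```python
def build_readme(title, files, columns, processing_steps, related_datasets, contact):
    """Return plain-text README content covering the items listed above."""
    lines = [title, "=" * len(title), ""]

    lines.append("FILES")
    for filename, description in files.items():
        lines.append(f"  {filename}: {description}")
    lines.append("")

    lines.append("COLUMNS (name, meaning, unit, missing-data code)")
    for name, meaning, unit, missing in columns:
        lines.append(f"  {name}: {meaning}; unit: {unit}; missing: {missing}")
    lines.append("")

    lines.append("PROCESSING STEPS")
    for number, step in enumerate(processing_steps, start=1):
        lines.append(f"  {number}. {step}")
    lines.append("")

    lines.append("RELATED DATASETS STORED ELSEWHERE")
    for item in related_datasets or ["none"]:
        lines.append(f"  {item}")
    lines.append("")

    lines.append("CONTACT")
    lines.append(f"  {contact}")
    return "\n".join(lines) + "\n"


# Invented example values; replace them with your own.
readme_text = build_readme(
    title="Example dataset",
    files={"measurements.csv": "raw measurements behind Figure 2"},
    columns=[("temp", "air temperature", "degrees Celsius", "NA")],
    processing_steps=["Removed duplicate rows"],
    related_datasets=[],
    contact="data.contact@example.org",
)
print(readme_text)
```

Saving the returned text as README.txt next to the data files keeps the documentation with the dataset when it is deposited.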
Data should have a clear licence to govern the terms of its reuse. Guidance from the DCC can help you to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some Creative Commons options. The OA guidelines under Horizon 2020 recommend CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p. 11 of this document. EUDAT also provides a wizard to help you choose an appropriate licence.