Make your data findable by ensuring it:
- Has a persistent identifier
- Has rich metadata
- Is searchable and discoverable online
Persistent identifiers (PIDs) are important because they unambiguously identify your data and facilitate data citation. An example of a PID is a Digital Object Identifier (DOI). When depositing your data, select a repository that assigns a persistent identifier (for example, Zenodo).
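As a small illustration of why a PID supports citation, a DOI can be turned into a stable link through the https://doi.org/ resolver. The sketch below is a minimal example, not a full validator: the pattern is deliberately loose, the function name `doi_to_url` is made up for illustration, and the DOI used is hypothetical.

```python
import re

# Loose, illustrative pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI-like string."""
    cleaned = doi.strip()
    # Accept a few common prefixed forms and strip them off.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a DOI-like string: {doi!r}")
    return "https://doi.org/" + cleaned


# Hypothetical DOI, used only to show the shape of the result.
example_url = doi_to_url("doi:10.1234/example.5678")
print(example_url)
```

Citing the resolver URL rather than a repository landing page means the reference keeps working if the repository reorganises its site.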
Metadata describing your data supports findability, citation and reuse. Rich metadata provides important context for the interpretation of your data and makes it easier for machines to conduct automated analysis. Follow standard metadata schemes, either general ones such as Dublin Core or discipline-specific ones. Consult the DCC metadata directory, the RDA Metadata Directory and the portal of data standards at FAIRsharing.
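To make the idea of a general metadata scheme concrete, here is a minimal sketch of a Dublin Core record built with the Python standard library. The element names (title, creator, date, identifier, rights, subject) come from the Dublin Core element set; the wrapper element `metadata`, the helper `build_dc_record` and all field values are assumptions chosen for illustration. A repository will normally have its own required fields and export format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields: dict) -> str:
    """Serialise a dict of Dublin Core elements to an XML string.

    A value may be a single string or a list of strings (repeated element).
    """
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = item
    return ET.tostring(root, encoding="unicode")


# All values below are invented placeholders.
record = build_dc_record({
    "title": "Example survey responses",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2020-01-31",
    "identifier": "https://doi.org/10.1234/example.5678",
    "rights": "CC-BY 4.0",
    "subject": ["survey", "example"],
})
print(record)
```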
Make your data accessible by ensuring it:
- Is retrievable online using standardised protocols
- Has restrictions in place if necessary
Remember that not all data has to be made open. Data can be restricted and still be FAIR. However, if access is allowed, data should be retrievable without the need for specialised protocols. In addition, even if the full content is not made openly available, the data must be as findable as possible.
As Open as Possible, As Closed as Necessary
Where can I keep my data? Not necessarily opening it up, but keeping it somewhere safe for the long term. You should look for a repository that does the following:
- Stores the data safely
- Makes sure the data is findable
- Describes the data appropriately (metadata)
- Adds license information
Make your data interoperable by using:
- Common formats and standards
- Controlled vocabularies
Interoperable data can be integrated with other data, applications and workflows. Where you can, avoid creating data with proprietary software, and make it available in open formats. Remember to use community-agreed schemas, controlled vocabularies, keywords, thesauri or ontologies where possible.
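The two points above, controlled vocabularies and open formats, can be sketched in a few lines of Python. This is an illustrative example under stated assumptions: the vocabulary, the column names and the helper functions `find_vocabulary_errors` and `to_csv` are all hypothetical, and a real project would take its vocabulary from a community standard rather than a hard-coded set.

```python
import csv
import io

# Hypothetical controlled vocabulary for a "species" column.
ALLOWED_SPECIES = {"Parus major", "Cyanistes caeruleus"}


def find_vocabulary_errors(rows, column, allowed):
    """Return (row_number, value) pairs whose value is not in the vocabulary."""
    return [
        (number, row[column])
        for number, row in enumerate(rows, start=1)
        if row[column] not in allowed
    ]


def to_csv(rows, fieldnames):
    """Serialise rows to CSV text, an open and widely readable format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented observations; the second row uses a term outside the vocabulary.
observations = [
    {"species": "Parus major", "count": 3},
    {"species": "great tit", "count": 1},
]

errors = find_vocabulary_errors(observations, "species", ALLOWED_SPECIES)
print(errors)
print(to_csv(observations, ["species", "count"]))
```

Checking values before export catches free-text variants early, while they are still cheap to correct.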
Make your data reusable by ensuring it:
- Is well-documented
- Has clear licence and provenance information
Create documentation, e.g. a README file, to help ensure that your data can be correctly interpreted and reanalyzed by others. A plain-text README file should contain the following information:
- for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication;
- for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units;
- any data processing steps, especially if not described in the publication, that may affect interpretation of results;
- a description of what associated datasets are stored elsewhere, if applicable;
- whom to contact with questions.
If text formatting is important for your README, PDF format could also be used.
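The README items listed above can be assembled from structured inputs so that no item is forgotten. The following is a minimal sketch only: the function `build_readme`, the section headings and every example value are assumptions made for illustration, not a prescribed README layout.

```python
def build_readme(files, columns, missing_code, processing_steps,
                 related_datasets, contact):
    """Assemble a plain-text README covering the items listed above."""
    lines = ["FILES"]
    for name, description in files.items():
        lines.append(f"  {name}: {description}")

    lines += ["", "COLUMNS (name: definition [unit])"]
    for name, (definition, unit) in columns.items():
        lines.append(f"  {name}: {definition} [{unit}]")
    lines.append(f"  Missing data code: {missing_code}")

    lines += ["", "PROCESSING STEPS"]
    for number, step in enumerate(processing_steps, start=1):
        lines.append(f"  {number}. {step}")

    lines += ["", "RELATED DATASETS"]
    lines += [f"  {item}" for item in related_datasets] or ["  none"]

    lines += ["", "CONTACT", f"  {contact}"]
    return "\n".join(lines) + "\n"


# All values below are invented placeholders.
readme_text = build_readme(
    files={"counts.csv": "bird counts per site, used for Table 2"},
    columns={"count": ("number of individuals observed", "individuals")},
    missing_code="NA",
    processing_steps=["Removed duplicate site visits"],
    related_datasets=[],
    contact="data-steward@example.org",
)
print(readme_text)
```

The resulting text can be saved as README.txt alongside the data files.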
Data should have a clear license to govern the terms of its reuse. Guidance from the DCC can help you to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some Creative Commons options. The OA guidelines under Horizon 2020 recommend CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p. 11 of this document.
EUDAT provides a wizard to help you choose an appropriate license.