Guides for Researchers

How to make your data FAIR

Basic information with links to resources

Introduction

Are you at the start of your project and planning to create research data? Read on to find out how to make it more findable, accessible, interoperable and reusable via the FAIR principles.

Why are the FAIR principles needed? The increasing availability of online resources means that data need to be created with longevity in mind. Providing other researchers with access to your data facilitates knowledge discovery and improves research transparency.

In this context, participants at the 2014 Lorentz Workshop "Jointly Designing a Data FAIRport" formulated the FAIR data vision to optimise data sharing and reuse by humans and machines. This work resulted in The FAIR Guiding Principles for scientific data management and stewardship, published in "Scientific Data".

The FAIR principles describe how research outputs should be organised so they can be more easily accessed, understood, exchanged and reused. Major funding bodies, including the European Commission, promote FAIR data to maximise the integrity and impact of their research investment.

EC FAIR data

What does the EC require from project grantees on FAIR data?

The EC supports FAIR data not as a standard but as a framework to follow when designing a Data Management Plan. It has produced a set of Guidelines for FAIR data management.

 

What is FAIR data?

The Four Basics of FAIR:

'Findable' i.e. discoverable with metadata, identifiable and locatable by means of a standard identification mechanism
'Accessible' i.e. always available and obtainable; even if the data is restricted, the metadata is open
'Interoperable' i.e. both syntactically parseable and semantically understandable, allowing data exchange and reuse between researchers, institutions, organisations or countries; and
'Reusable' i.e. sufficiently described and shared with the least restrictive licences, allowing the widest reuse possible and the least cumbersome integration with other data sources.
 

Figure: FAIR data principles (FOSTER)

Image: https://book.fosteropenscience.eu/

Things to remember

  • FAIR is a set of principles, not a standard.
  • Does following the FAIR principles mean that your data has to be shared openly with everyone? No.
    • Data can be FAIR but not open. For example, data could meet the FAIR principles but be private or shared only under certain restrictions.
    • Open data may not be FAIR. For example, publicly available data may lack the documentation needed to meet the FAIR principles, such as a licence that makes the terms of reuse clear.
  • If you receive H2020 funding, the EC requires a Data Management Plan (DMP) as part of the H2020 data pilot. The FAIR principles can help you describe, in practical terms, how you will create, store, share, manage and preserve your data in your DMP.

FAIR - in depth

Findable

Make your data findable by ensuring it:
  • Has a persistent identifier
  • Has rich metadata
  • Is searchable and discoverable online
Persistent identifiers (PIDs) are important because they unambiguously identify your data and facilitate data citation. An example of a PID is a Digital Object Identifier (DOI). When depositing your data, select a repository that assigns a persistent identifier (for example Zenodo).
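To illustrate how a DOI works as a citable link, here is a minimal Python sketch that turns a DOI, written in any of several common forms, into a resolvable https://doi.org/ URL. The function names and the list of accepted prefixes are assumptions made for this example and are not part of any official tool. The DOI used is the one for the checklist cited later in this guide.

```python
DOI_RESOLVER = "https://doi.org/"

# Prefixes often seen in front of a bare DOI (an assumed, non-exhaustive list).
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Return the bare DOI (for example '10.5281/zenodo.1065991')."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Simple plausibility check: DOIs start with '10.' and contain a '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"Not a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Return the resolver URL to use when citing or linking the dataset."""
    return DOI_RESOLVER + normalise_doi(raw)


print(doi_to_url("http://doi.org/10.5281/zenodo.1065991"))
```

Storing the bare DOI and building the link on demand keeps citations consistent even if the DOI was copied in different forms.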
The metadata describing your data supports findability, citation and reuse. Rich metadata provides important context for the interpretation of your data and makes it easier for machines to conduct automated analysis. Follow a standard metadata scheme, either a general one such as Dublin Core or a discipline-specific one. Consult the DCC metadata directory, the RDA Metadata Directory and the portal of data standards at FAIRsharing.
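As a rough illustration of a machine-readable metadata record, the Python sketch below builds a record from Dublin Core element names and reports which elements are still empty. The "dc:" key prefix, the helper names and the RECOMMENDED list are assumptions made for this example. Dublin Core itself makes every element optional, and your repository or discipline may expect a different set. The sample values come from the checklist citation later in this guide.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Hypothetical minimum for this sketch only; not a Dublin Core requirement.
RECOMMENDED = ("title", "creator", "identifier", "date", "description", "rights")


def build_record(**fields):
    """Build a metadata record, rejecting names that are not Dublin Core elements."""
    unknown = sorted(set(fields) - DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {unknown}")
    return {f"dc:{name}": value for name, value in fields.items()}


def missing_elements(record):
    """List the recommended elements that are absent or empty."""
    return [name for name in RECOMMENDED if not record.get(f"dc:{name}")]


record = build_record(
    title="How FAIR are your data?",
    creator=["Jones, S.", "Grootveld, M."],
    date="2017-11",
    publisher="Zenodo",
    identifier="https://doi.org/10.5281/zenodo.1065991",
)

print(json.dumps(record, indent=2))
print("Still missing:", missing_elements(record))
```

Running a check like this before deposit is one way to spot gaps, here the description and the rights statement, while they are still easy to fill.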

Accessible

Make your data accessible by ensuring it:
  • Is retrievable online using standardised protocols
  • Has restrictions in place if necessary
Remember that not all data has to be made open. Data can be restricted and still be FAIR. However, if access is allowed, data should be retrievable without the need for specialised protocols. In addition, even if the full content is not made openly available, the data must be as findable as possible.
As Open as Possible, As Closed as Necessary
Where can I keep my data? The aim is not necessarily to open it up, but to keep it somewhere safe for the long term. Look for a repository that does the following:
  1. Stores the data safely
  2. Makes sure the data is findable
  3. Describes the data appropriately (metadata)
  4. Adds licence information
You can deposit data to a general repository (e.g. Zenodo, Harvard Dataverse) or a subject-specific repository (e.g. Dryad). Looking for your discipline? Search re3data or FAIRsharing for more suitable data repositories. See a demonstration of searching for research data repositories using the re3data directory.

Interoperable

Make your data interoperable by using:
  • Common formats and standards
  • Controlled vocabularies
Interoperable data can be integrated with other data, applications and workflows. Where possible, avoid creating data with proprietary software, and make your data available in open formats. Remember to use community-agreed schemas, controlled vocabularies, keywords, thesauri or ontologies where possible.
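The following Python sketch shows one way to apply these two ideas together: it checks a column against a controlled vocabulary and then writes the table as CSV, an open plain-text format. The vocabulary, column names and sample rows are invented for illustration. In practice you would take the terms from a vocabulary or ontology agreed in your community.

```python
import csv
import io

# Hypothetical controlled vocabulary for the "species" column.
SPECIES_VOCAB = {"Quercus robur", "Fagus sylvatica"}


def invalid_terms(rows, column, vocabulary):
    """Return the sorted set of values in `column` that are not in the vocabulary."""
    return sorted({row[column] for row in rows if row[column] not in vocabulary})


def to_csv(rows, fieldnames):
    """Serialise rows (a list of dicts) as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"site": "A", "species": "Quercus robur", "height_m": 12.5},
    {"site": "B", "species": "oak", "height_m": 9.0},  # free-text term, not in the vocabulary
]

print("Terms to fix:", invalid_terms(rows, "species", SPECIES_VOCAB))
print(to_csv(rows, ["site", "species", "height_m"]))
```

Putting the unit in the column name (height_m) is a small convention that helps others interpret the file without extra software.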

Reusable

Make your data reusable by ensuring it:
  • Is well-documented
  • Has clear licence and provenance information
Create documentation, e.g. a README file, to help ensure that your data can be correctly interpreted and reanalysed by others. A plain text README file should contain the following information:
  • for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication;
  • for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units;
  • any data processing steps, especially if not described in the publication, that may affect interpretation of results;
  • a description of what associated datasets are stored elsewhere, if applicable;
  • whom to contact with questions.
If text formatting is important for your README, PDF format could also be used.
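The README items listed above can be sketched as a small Python helper that assembles a plain text file from structured input. The function name, section headings and sample values are assumptions made for this example. Adapt them to whatever layout your repository or discipline recommends.

```python
from typing import Dict, List


def build_readme(
    title: str,
    files: Dict[str, str],
    columns: Dict[str, str],
    processing_steps: List[str],
    related_datasets: List[str],
    contact: str,
) -> str:
    """Assemble a plain text README covering files, columns, processing, related data and contact."""
    lines = [title, "=" * len(title), ""]

    lines.append("FILES")
    lines += [f"  {name}: {description}" for name, description in files.items()]
    lines.append("")

    lines.append("COLUMNS (definition, units, missing-data code)")
    lines += [f"  {name}: {definition}" for name, definition in columns.items()]
    lines.append("")

    lines.append("DATA PROCESSING")
    lines += [f"  {i}. {step}" for i, step in enumerate(processing_steps, start=1)]
    lines.append("")

    lines.append("RELATED DATASETS")
    lines += [f"  {item}" for item in related_datasets] or ["  None"]
    lines.append("")

    lines.append("CONTACT")
    lines.append(f"  {contact}")
    return "\n".join(lines) + "\n"


# Hypothetical example values.
readme = build_readme(
    title="Tree survey dataset",
    files={"trees.csv": "One row per measured tree; source data for Table 2."},
    columns={"height_m": "Tree height in metres; missing values coded as NA."},
    processing_steps=["Removed duplicate measurements."],
    related_datasets=[],
    contact="data-contact@example.org",
)
print(readme)
```

Generating the README from the same structures you use to describe your files makes it easier to keep the documentation in step with the data.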
Data should have a clear licence to govern the terms of its reuse. Guidance from the DCC can help you to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some Creative Commons options. The OA guidelines under Horizon 2020 recommend CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 of this document.
Check out: EUDAT provides a wizard to help you choose an appropriate licence.

How FAIR are your data?

Use this checklist to evaluate your data against the FAIR principles:

Jones, S. & Grootveld, M. (2017, November). How FAIR are your data? Zenodo. http://doi.org/10.5281/zenodo.1065991

Still have questions?

Contact us via our Helpdesk.
We try to respond within 48 hours.