
A Research Data Management Handbook

A primer on managing your research data

Open Data

“Open data and content can be freely used, modified and shared by anyone for any purpose”

Source: http://opendefinition.org

Tim Berners-Lee’s proposal for five-star open data: http://5stardata.info

One star: make your stuff available on the Web (whatever format) under an open licence
Two stars: make it available as structured data (e.g. Excel instead of a scan of a table)
Three stars: use non-proprietary formats (e.g. CSV instead of Excel)
Four stars: use URIs to denote things, so that people can point at your stuff
Five stars: link your data to other data to provide context
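
The five levels build on one another, so a dataset's rating can be read as the number of consecutive criteria it meets, starting from the first. The Python sketch below illustrates that reading. The Dataset class and its field names are hypothetical and are not part of the five-star proposal itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative field names)."""
    on_web_with_open_licence: bool = False   # one star
    structured: bool = False                 # two stars
    non_proprietary_format: bool = False     # three stars
    uses_uris: bool = False                  # four stars
    linked_to_other_data: bool = False       # five stars


def star_rating(ds: Dataset) -> int:
    """Count consecutive criteria met, starting from the first level."""
    criteria = [
        ds.on_web_with_open_licence,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing lower level caps the rating
        stars += 1
    return stars


# A CSV file published on the web under an open licence, without URIs or links.
csv_on_web = Dataset(
    on_web_with_open_licence=True,
    structured=True,
    non_proprietary_format=True,
)
print(star_rating(csv_on_web))
```

Under this reading, a dataset that uses URIs but has no open licence still scores zero, because the first level is not met.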

Why manage data?

NON PECUNIAE INVESTIGATIONIS CURATORE
SED VITAE FACIMUS PROGRAMMAS DATORUM PROCURATIONIS

(Not for the research funder, but for life we make data management plans)

  • Make your research easier
  • Stop yourself drowning in irrelevant stuff
  • Save data for later
  • Avoid accusations of fraud or bad science
  • Write a data paper
  • Share your data for re-use
  • Get credit for it

 

How to make data open?

Four key steps, in very approximate order; many of them can be done simultaneously.

  1. Choose your dataset(s) — Choose the dataset(s) you plan to make open. Keep in mind that you can (and may need to) return to this step if you encounter problems at a later stage.
  2. Apply an open license
    • Determine what intellectual property rights exist in the data.
    • Apply a suitable ‘open’ license that licenses all of these rights and supports the definition of openness.
    • NB: if you can’t do this go back to step 1 and try a different dataset.
  3. Make the data available — in bulk and in a useful format. You may also wish to consider alternative ways of making it available such as via an API.
  4. Make it discoverable — post on the web and perhaps organize a central catalog to list your open datasets.

Source: https://okfn.org/opendata/how-to-open-data/

data cartoon

Image courtesy of http://aukeherrema.nl CC-BY

Why share data?

It's part of good data practice


Validate results

"It was a mistake in a spreadsheet that could have been easily overlooked: a few rows left out of an equation to average the values in a column. The spreadsheet was used to draw the conclusion of an influential 2010 economics paper: that public debt of more than 90% of GDP slows down growth. This conclusion was later cited by the International Monetary Fund and the UK Treasury to justify programmes of austerity that have arguably led to riots, poverty and lost jobs."
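
An error of this kind, an average computed over an incomplete range, is hard to spot in a spreadsheet but easy to check once the data and the calculation are shared. The Python sketch below shows how someone with access to the data could compare a reported mean with a recomputed one. The numbers are made up for illustration and are not the figures from the paper quoted above, and the function names are hypothetical.

```python
from statistics import mean

# Made-up illustrative values; NOT the data from the paper quoted above.
growth_rates = [2.1, -0.3, 1.4, 3.0, 0.8, 2.6, 1.9]


def mean_of_range(values, start, stop):
    """Average values[start:stop], mimicking a spreadsheet range over some rows."""
    return mean(values[start:stop])


def check_reported_mean(values, reported, tolerance=1e-9):
    """Return True if the reported mean matches the mean of all rows."""
    return abs(mean(values) - reported) <= tolerance


# A range that silently stops two rows early, like the error described above.
reported = mean_of_range(growth_rates, 0, 5)

print(f"reported mean:   {reported:.2f}")
print(f"recomputed mean: {mean(growth_rates):.2f}")
print("matches:", check_reported_mean(growth_rates, reported))
```

The check only works because the full column is available: without the shared data, the reported figure could not be recomputed at all.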


More scientific breakthroughs

Data sharing has enabled scientific breakthroughs in studies of the human brain, as well as in research on Alzheimer’s disease, type 2 diabetes, rheumatoid arthritis, lupus and many other conditions.

A citation advantage

A study analysing the citation counts of 10,555 papers on gene expression studies that created microarray data showed that “studies that made data available in a public repository received 9% more citations than similar studies for which the data was not made available” (Source: Data reuse and the open data citation advantage, Piwowar, H. & Vision, T.).

Which data should be preserved and shared?

  • The data needed to validate results in scientific publications (minimally!).
  • The associated metadata: the dataset’s creator, title, year of publication, repository, identifier etc.
    • Follow a metadata standard in your line of work, or a generic standard, e.g. Dublin Core or DataCite, and be FAIR.
    • The repository will assign a persistent ID to the dataset: important for discovering and citing the data.
  • Documentation: code books, lab journals, informed consent forms – domain-dependent, and important for understanding the data and combining them with other data sources.
  • Software, hardware, tools, syntax queries, machine configurations – domain-dependent, and important for using the data. (Alternative: information about the software etc.)

Basically, everything that is needed to replicate a study should be available. Plus everything that is potentially useful for others.
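
The core metadata fields named above can be captured in a small machine-readable record. The Python sketch below is a minimal illustration, assuming a simplified record. The class, its field names, the sample values and the citation layout are all hypothetical and do not reproduce the Dublin Core or DataCite schemas, which define many more properties.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Simplified, hypothetical record holding the fields listed above."""
    creator: str
    title: str
    publication_year: int
    repository: str
    identifier: str  # persistent ID assigned by the repository, e.g. a DOI


def missing_fields(record: DatasetMetadata) -> list[str]:
    """Names of fields that are empty and should be filled in before deposit."""
    return [name for name, value in asdict(record).items() if value in ("", None)]


def format_citation(record: DatasetMetadata) -> str:
    """Build a human-readable data citation (layout chosen for illustration)."""
    return (
        f"{record.creator} ({record.publication_year}). {record.title}. "
        f"{record.repository}. {record.identifier}"
    )


# Placeholder values only; the DOI below is not a real identifier.
record = DatasetMetadata(
    creator="Doe, J.",
    title="Example survey responses",
    publication_year=2020,
    repository="Example Repository",
    identifier="https://doi.org/10.1234/example",
)

print(format_citation(record))
print(json.dumps(asdict(record), indent=2))
```

Keeping the record in a structured form means the same information can feed both a citation for readers and a metadata file for the repository.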

Responsibilities in Research Data Management

  • The principal investigator – ultimately responsible for the data and for data management
  • Researchers, research assistants and/or data managers – involved in day-to-day data management
  • The institution’s management – draft and enforce data policies; raise data awareness
  • The institution’s research office consisting of library, IT and legal services – provide external data, tools, secure storage and access; expertise on rights management and ethics, data citation, metadata, access and licenses, funder requirements; raise data awareness
  • Research funders – encourage good data practices; invest in data infrastructure; raise data awareness
  • Project partners in academic and other research institutions as well as commercial partners
  • Academic publishers – impose requirements on the availability of data underlying submitted and/or published papers; provide identifiers to cite papers and link to related data
  • Research data repositories – preserve data for the long term; provide persistent identifiers and data discovery services

Explore the indicators related to open research data

Check out case studies and overviews of research data repositories, funder policies on data sharing and researchers’ attitudes towards data sharing in the Open Research Data section of the Open Science Monitor, commissioned by the European Commission Directorate-General for Research and Innovation.


 

datacitation

Image courtesy of http://aukeherrema.nl CC-BY

What is a data management plan (DMP)?

A Data Management Plan (DMP) is a brief plan to define:

  • how the data will be created
  • how it will be documented
  • who will be able to access it
  • where it will be stored
  • who will back it up
  • whether (and how) it will be shared and preserved.

DMPs are often submitted as part of grant applications, but are useful whenever researchers are creating data.
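
Because a DMP answers a fixed set of questions, it can be kept as a simple structured record and checked for gaps as the project evolves. The Python sketch below is one hypothetical way to do that. The key names and sample answers are invented for illustration and do not follow any funder's template.

```python
# The six questions from the definition above, keyed by hypothetical short names.
DMP_QUESTIONS = {
    "creation": "How will the data be created?",
    "documentation": "How will it be documented?",
    "access": "Who will be able to access it?",
    "storage": "Where will it be stored?",
    "backup": "Who will back it up?",
    "sharing": "Will it be shared and preserved, and how?",
}


def unanswered(plan: dict[str, str]) -> list[str]:
    """Return the questions that the plan leaves blank or omits entirely."""
    return [
        question
        for key, question in DMP_QUESTIONS.items()
        if not plan.get(key, "").strip()
    ]


# Invented sample answers for a draft plan.
draft_plan = {
    "creation": "Online survey exported as CSV",
    "documentation": "Code book stored alongside the data",
    "storage": "Institutional network drive",
    "backup": "",
}

for question in unanswered(draft_plan):
    print("Still to answer:", question)
```

Rerunning such a check whenever the plan is revised is one lightweight way to keep it up to date.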

Common themes in DMPs

  • Description of data to be collected / created (e.g. content, type, format, volume)
  • Standards / methodologies for data collection & management
  • Ethics and Intellectual Property (highlight restrictions on data sharing e.g. embargoes, confidentiality)
  • Plans for data sharing and access (i.e. how, when, to whom)
  • Strategy for long-term preservation

More elaborate DMP

Scientific research data should be easily:

  1. Discoverable - Are the data discoverable and identifiable by a standard mechanism e.g. DOIs?
  2. Accessible - Are the data accessible and under what conditions e.g. licenses, embargoes?
  3. Assessable and intelligible - Are the data and software assessable and intelligible to third parties for peer-review? E.g. can judgements be made about their reliability and the competence of those who created them?
  4. Usable beyond the original purpose for which they were collected - Are the data properly curated and stored together with the minimum software and documentation needed to be useful to third parties in the long term?
  5. Interoperable to specific quality standards - Are the data and software interoperable, allowing data exchange? E.g. were common formats and standards for metadata used?
 dataocean
Image courtesy of http://aukeherrema.nl CC-BY

 

Key messages

  • The principles of good research conduct hold for all of us, across disciplinary boundaries.
  • Data management is all in a day’s work.
  • Planning and reflection are more important than the plan – but write the DMP and keep it up to date.
  • Planning data management is teamwork.
  • Think about the desired end result and plan for this.
  • Decisions made early affect what you can do later.

Still have questions?

Contact us via our Helpdesk.
We try to respond within 48 hours.