Guides for Researchers
How do I license my research data?
Learn more about licences for research data and how to apply them
This guide, an FAQ for researchers on how to license research data, is part of the "User guide on copyright, open science and data". The user guide aims to offer a state-of-the-art, legally advanced, but still manageable set of rules, guidelines, and resources that enable the full potential of Open Science (OS) in the EU research field, with a view to addressing copyright and related rights issues.
Licenses for Research Data
What licence should be applied to the research data?
It depends on what rights protect your research data, if at all. In the light of what is explained in the guide "How do I know if my research data is protected?":
- If your research data qualifies as a work (a literary work such as a journal article, or software), then CC BY 4.0 is usually the best choice. The Share Alike (SA) condition is also compatible with the Open Access definition, and this is reinforced in the Plan S licensing guidance for publications. The Non-Commercial (NC) condition should be avoided, as it is not Open Access compliant. The No Derivatives (ND) condition is a tricky issue and should be avoided, especially if you are unsure of its consequences, although it may not always be incompatible with the Open Access definition.
- If your research data is a database or a dataset (unstructured data that does not meet the database definition), the best option is usually CC0, which waives all your rights in the database or dataset.
Keep in mind that CC licences deal only with copyright and related rights. Personal data are not covered by CC licences and must be analysed separately.
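The rule of thumb above can be sketched as a small lookup. This is only an illustration of the guide's recommendation, not legal advice: the `DataKind` categories and the `suggest_licence` function are hypothetical names introduced here, and the returned strings are the SPDX short identifiers for CC BY 4.0 and CC0 1.0.

```python
from enum import Enum


class DataKind(Enum):
    """Hypothetical categories mirroring the two bullets above."""
    WORK = "work"          # e.g. a journal article or software
    DATABASE = "database"  # structured collection of data
    DATASET = "dataset"    # unstructured data outside the database definition


def suggest_licence(kind: DataKind) -> str:
    """Return the SPDX identifier of the licence the guide usually recommends."""
    if kind is DataKind.WORK:
        return "CC-BY-4.0"
    # Databases and datasets: the guide usually recommends the CC0 waiver.
    return "CC0-1.0"


if __name__ == "__main__":
    for kind in DataKind:
        print(kind.value, "->", suggest_licence(kind))
```

The sketch deliberately ignores personal data, confidentiality, and third-party rights, which have to be assessed separately before any licence is chosen.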
What is a Creative Commons licence?
Creative Commons, a global not-for-profit organisation that provides legal tools to promote the sharing and reuse of works of authorship, has produced a number of licences, some of which meet the criteria for Open Access. These offer different levels of permission.
Creative Commons offers licences readable at three different levels: legal, machine (the metadata) and human (non-legal descriptions). Creative Commons has a useful tool to help you determine the licence best for you. More restrictive CC licences are unlikely to meet Open Access requirements (e.g. because they impose restrictions on commercial use).
How to apply licenses for Research Data
How are licences applied to research data?
Licences are not applied automatically. The rights owner of a protected dataset must make it clear that a licence applies. Repositories may help you select the licence for data deposited with them. You can apply a licence by:
- Choosing a licence when uploading your data to a repository;
- Referring to the licence on the landing page or host site for the digital research data;
- Attaching the licence to the metadata of the research data;
- Setting up a Read Me file for the data.
If you use a standard Creative Commons licence, Creative Commons provides tools to help you attach the licence effectively. For more information on how to use those tools, see the accompanying OS repository checklist.
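As an illustration of the metadata and Read Me options listed above, the following minimal sketch builds a metadata record and a Read Me text that both state the licence. The field names, function names, and the example title and citation are assumptions made for this sketch; your repository will have its own metadata schema. The two URLs are the licence deed pages published by Creative Commons.

```python
import json

# Licence deed URLs published by Creative Commons, keyed by SPDX identifier.
LICENCE_URLS = {
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
}


def build_metadata(title: str, licence_id: str) -> dict:
    """Return a minimal metadata record (hypothetical field names)."""
    return {
        "title": title,
        "license": {"id": licence_id, "url": LICENCE_URLS[licence_id]},
    }


def build_readme(title: str, licence_id: str, preferred_citation: str) -> str:
    """Return the text of a simple Read Me file stating the licence."""
    lines = [
        title,
        "",
        f"Licence: {licence_id} ({LICENCE_URLS[licence_id]})",
        f"Preferred citation: {preferred_citation}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    title = "Example survey data"
    print(json.dumps(build_metadata(title, "CC0-1.0"), indent=2))
    print(build_readme(title, "CC0-1.0", "Doe, J. (2024). Example survey data."))
```

The returned Read Me text can then be saved next to the data files, and the metadata record adapted to whatever format the chosen repository accepts.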
I’m really concerned with attribution. How can I make sure others cite me as the source for my research?
Attribution is a genuine concern. To help others cite your research, include a citation in your research that users can copy and paste to give you credit for your hard work. If you license your data under CC BY, you legally require attribution, but we recommend doing this only if you are authoring a work such as a journal article, a photograph, or a song. If you are producing protected databases (as explained above), your best choice is probably CC0. You can still ask for attribution, not as a legal requirement but as a request ("please attribute my data") in line with scientific norms.
But I would like attribution when others use my dataset. In that case, shouldn’t I use a CC BY licence?
We recommend that you avoid using a CC BY licence for data. While attribution is a genuine, recognisable concern, a CC BY licence may be legally unenforceable when no underlying copyright or sui generis database right (SGDR) protects the data. It may also send the wrong message to the world, because you would be requiring attribution for something to which the law attaches no right of attribution (e.g. the SGDR does not confer moral rights).
A better solution is to use CC0 and simply ask for credit (rather than require attribution), and provide a citation for the dataset that others can copy and paste with ease. Such requests are consistent with scholarly norms for citing source materials.
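A copy-and-paste citation can be generated from a few fields. The sketch below is one possible layout, not a prescribed citation style; the function name, its parameters, and all example values (including the identifier) are placeholders introduced for illustration.

```python
def format_citation(creators: list[str], year: int, title: str,
                    repository: str, identifier: str) -> str:
    """Build a one-line dataset citation that users can copy and paste."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title} [Data set]. {repository}. {identifier}"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(format_citation(
        ["Doe, J.", "Roe, R."],
        2024,
        "Example survey data",
        "Example Repository",
        "https://example.org/dataset/123",
    ))
```

Placing the resulting line on the landing page and in the Read Me file makes the request for credit easy to follow without turning it into a legal requirement.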
I’m uncomfortable with others using my research for commercial purposes. Should I use a non-commercial licence for my dataset?
We recommend you avoid using a non-commercial licence. For legal purposes, drawing a line between what is and is not 'commercial' can be tricky; it is not as black and white as you might think. For example, if you release a dataset under a non-commercial licence, it would clearly prohibit an organisation from selling your dataset to others for a profit. However, it might also prohibit someone from using the dataset in their research if they intend to eventually publish that research. This is because most academic journals are commercial businesses that charge some sort of fee for access to their content, so such use could qualify as 'commercial'. Consequently, using a non-commercial licence may prevent researchers from using your data in work destined for publication. This can in turn affect the dissemination, recognition, and impact of your dataset. It is also definitely NOT Open Access (see the Berlin Declaration, the Bethesda Statement on Open Access Publishing, and the Budapest Open Access Initiative).
I’m uncomfortable permitting use of my research for any and all purposes. Should I use a ‘No Derivatives’ (ND) licence for my dataset?
We recommend you avoid using a 'No Derivatives' licence. Just as a non-commercial licence might restrict meaningful reuse of your dataset, an ND licence may prevent someone from recombining and reusing your data for new research. For data to be truly Open Access, it must permit these important types of reuse. Whether ND is Open Access (OA) compliant is less clear than for non-commercial licences. The best view is that it depends on what kind of modifications the licence prohibits; since there are probably cases where ND is incompatible with OA, you should not use it.
Specifications of licensing Research Data
Is there any part of the research data that cannot be made available?
Consider redacting research data to remove personal data, confidential information or third party intellectual property.
I want to CC licence my work, but I’m concerned because it contains copyright protected material made available by others that I cited or quoted. Will this affect their copyright?
Your CC licence applies only to your original contributions and does not supersede any rights retained by authors whose works you have cited or have permission to use.
How should I license my data for the purposes of Open Science?
We recommend you use the CC0 Public Domain Dedication, which is first and foremost a waiver, but can act as a licence when a waiver is not possible. By applying CC0 to your data you enable everyone to freely reuse your data as they see fit by waiving (giving up) your copyright and related rights in that data.
How should I license my work for the purposes of Open Access?
CC BY 4.0.
If you work for an educational institution, it is good practice to first check with your research director and library. Your institution may already have an Open Access publishing policy for you to consult, and your library will be able to help you decide how to best proceed.
Is data always subject to copyright?
You should keep in mind that there are many situations in which data is not protected by copyright or related rights. Such data can include facts, names, and numbers: things that are considered 'non-original' and part of the public domain, and thus not subject to copyright protection. Similarly, your database (which is a structured collection of data) might be considered 'non-original' and thus ineligible for copyright, and it might additionally be excluded from other forms of protection (like the EU sui generis database right, also known as the 'SGDR', for non-original databases).
In these cases, using a Creative Commons licence such as CC BY could signal to users that you claim copyright in the non-original data despite the law, and perhaps despite your real intention. Finally, if your data is in the public domain worldwide, you might state simply and clearly on the material that no restrictions attach to the reuse of your data, and apply a Public Domain Mark.
Does the researcher owe any obligations of confidentiality or ethics in respect of the data?
Obligations of confidentiality may be imposed by contract or implication. Most researchers are expected to abide by ethical codes of conduct.