This Glossary provides definitions of, and practical advice regarding, the key terms mentioned in the context of RDM requirements in Horizon Europe. Other terms relating to RDM can be found in the Science Europe Data Glossary.


 

 

Anonymisation

Anonymisation is the process of removing personally identifiable information (information that directly or indirectly relates to an identified or identifiable person) from datasets containing sensitive data. As a result, the data subject is no longer identifiable. As opposed to pseudonymisation, anonymisation is not reversible, which means that the re-identification of the data subject is not possible.

Practical advice:

  • OpenAIRE has developed a tool that can be used for anonymisation: Amnesia.

 

 

Backup

Data backup is a process of creating a copy of data in a digital format and storing it on another device to ensure that data are saved and to prevent data loss.
Backups can be full (all files are backed up whenever a backup is made) or partial (only a part of the files, e.g. new files, are backed up).

Practical advice:

  • One backup should be at a physically separate location.
  • Backups should be made from the master copy.
  • The backup location should be as secure as the master copy location.
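
The advice above can be illustrated with a minimal Python sketch that copies a master file to a backup location and checks the copy against a checksum. The helper names `sha256_of` and `backup_file` and the directory layout are assumptions made for illustration only; institutionally supported backup tools should be preferred where they exist.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(master: Path, backup_dir: Path) -> Path:
    """Copy the master file into backup_dir and verify the copy.

    Hypothetical helper: backup_dir stands for a physically separate
    location that is as secure as the master copy location.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / master.name
    shutil.copy2(master, target)  # copy2 also preserves file timestamps
    if sha256_of(master) != sha256_of(target):
        raise IOError(f"Backup of {master} failed verification")
    return target
```

Always copying from the master file, rather than from an earlier backup, keeps a damaged copy from propagating.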

 

 

Controlled vocabularies and ontologies

Controlled vocabulary is an organised and standardised arrangement of predefined terms (words and phrases) that are used to index content in an information system with the aim of facilitating information retrieval. Controlled vocabularies connect variant terms and synonyms for concepts, link concepts in a logical order and organise them into categories, so as to provide a consistent way to describe data. They can be general and discipline-specific, and can take the form of subject heading lists, thesauri, authority files, taxonomies and alphanumeric classification schemes.

Ontologies are not controlled vocabularies, but they use controlled vocabularies to establish a formal specification of a conceptual model in which concepts and categories of concepts, properties, relationships among concepts and categories, functions, constraints, and axioms are defined.

Practical advice:

  • Use standard domain-specific controlled vocabularies and/or ontologies wherever possible to better align your outputs with similar data in your field of research.
  • Use well-documented vocabularies in which terms are assigned persistent identifiers.
  • Use repositories that enable you to add terms from controlled vocabularies.
  • A useful domain-agnostic resource for finding controlled vocabularies and ontologies can be found here.
  • Check also other resources.
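
To illustrate how a controlled vocabulary connects variant terms and synonyms to one preferred term, here is a minimal Python sketch. The vocabulary entries are invented for illustration, and the `https://example.org/...` identifiers are placeholders rather than real persistent identifiers; a real project would load terms from a published, well-documented vocabulary.

```python
# Hypothetical mini-vocabulary: preferred term -> identifier and variants.
VOCABULARY = {
    "precipitation": {
        "id": "https://example.org/vocab/0001",  # placeholder identifier
        "variants": {"rainfall", "rain", "precip"},
    },
    "air temperature": {
        "id": "https://example.org/vocab/0002",  # placeholder identifier
        "variants": {"temperature of air", "air temp"},
    },
}


def build_lookup(vocabulary):
    """Map every preferred term and variant to its preferred term."""
    lookup = {}
    for preferred, entry in vocabulary.items():
        lookup[preferred] = preferred
        for variant in entry["variants"]:
            lookup[variant] = preferred
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def normalise(term):
    """Return the preferred term for a free-text keyword, or None."""
    return LOOKUP.get(term.strip().lower())
```

Keywords that return `None` are not in the vocabulary and would need manual review rather than silent acceptance.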

 

 

Data Management Plan

Data Management Plan (DMP) is a formal document that outlines how data will be handled throughout the research data lifecycle – from planning, through collecting, analysing, publishing, preserving, to sharing and reusing.

In Horizon Europe, DMPs are mandatory and a template to guide the preparation of DMPs is provided. The list of points to be addressed includes data types and formats, compliance with the FAIR principles (metadata, repositories, controlled vocabularies, licences, etc.), legal requirements (intellectual property rights, GDPR), costs of preservation, data security and ethics, and retention periods.

Online tools that can facilitate the preparation of DMPs are available.

Practical advice:

  • A DMP is a living document, which should be updated as the project develops. Any deviations from the original proposal can be documented and explained.
  • If possible, make the DMP publicly available.

 

 

Documentation

Data documentation includes various types of information that can help find, assess, understand/interpret, and (re)use research data – e.g. information about methods, protocols, datasets to be used and data files, preliminary findings, etc. Documentation helps understand the context in which data were created, as well as the structure and the content of data. Data should be documented through all stages of the research data lifecycle. Detailed and rich documentation ensures reproducibility and upholds research integrity. Documentation also includes metadata.

Practical advice:

  • Various tools, such as e-lab notebooks, are available to support you in the process of creating documentation.

 

 

File format

File format is a standard way of encoding information so that it can be stored in a computer file. Digital research data may be stored in a wide variety of file formats, depending on the devices and tools used in data collection and processing.

File formats may be proprietary (the encoding-scheme is designed and owned by a company or organisation, and is not published, due to which files can be opened only by those who have particular software or hardware tools) and/or prone to obsolescence (legacy formats, bit rot).

To ensure that users can access and understand data and that data can be preserved in the long term, use open (defined by an openly published specification that anyone can use) and lossless formats (ensuring that no data or quality loss will occur during file manipulation).

Practical advice:

Check also this OpenAIRE Guide.


 

 

File naming convention

File naming convention is a framework for generating file names that have a consistent structure, while describing the content of files and their relations to other files.

Practical advice:

  • Use standardised syntax for file naming to aid better searchability and the ability to perform batch processes. These can take the form of YYMMDD_filename, and it is recommended to use version suffixes wherever possible when creating versions of a file from a master copy. These could, for example, be generated after cleaning or analysis steps.
  • Define the file naming convention in an early stage of your research and apply it consistently throughout the research data lifecycle.
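
A file naming convention of this kind can be sketched in a few lines of Python. The exact pattern used here (`YYMMDD_description_vNN.ext`, lowercase, hyphens inside the description) is only one possible convention assumed for illustration, and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed convention: YYMMDD_description_vNN.ext
NAME_PATTERN = re.compile(r"^\d{6}_[a-z0-9-]+_v\d{2,}\.[a-z0-9]+$")


def make_file_name(created: date, description: str, version: int, extension: str) -> str:
    """Build a file name that follows the assumed convention."""
    # Lowercase the description and replace anything else with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{created:%y%m%d}_{slug}_v{version:02d}.{extension.lstrip('.')}"


def follows_convention(name: str) -> bool:
    """Check whether an existing file name matches the convention."""
    return NAME_PATTERN.match(name) is not None


print(make_file_name(date(2022, 8, 8), "Survey responses", 2, "csv"))
# prints 220808_survey-responses_v02.csv
```

Generating names with a function, and checking existing names against the same pattern, makes it easier to apply the convention consistently and to run batch processes on the files.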

 

 

Licence

Licence is a written agreement by means of which the copyright holder defines the rights granted to the users. In a digital environment, standardised licences based on a set of predefined reuse conditions, such as Creative Commons, are used. Licences for digital objects are machine readable. 

In Horizon Europe, a CC BY or CC0 (or equivalent) open licence is required for data in open access, while the metadata of deposited data must be open under the CC0 or an equivalent licence.

Practical advice:

  • When depositing your data in a repository, consult the repository’s licence policy to check whether it is compliant with the Horizon Europe requirements.
  • If you are combining already published data into a new dataset, check the compatibility of data licences.

 

 

Metadata

Metadata are data that provide information about other data, e.g. a description of the content of the data, the date when the data were produced or collected, tools and devices used to obtain data, file formats and sizes, the names of the people who created or collected data, relevant persistent identifiers, etc. Metadata should be created and provided in accordance with commonly used metadata standards, which may be general or discipline specific. This ensures that metadata can be understood by humans and processed and exchanged by machines. 

Practical advice:

  • When preparing a DMP, you will be required to mention the metadata standards you will “follow to make your data interoperable”. A useful resource for finding available metadata standards can be found here. Choose a standard that is commonly used in your discipline. Do not invent your own!
  • Some metadata are automatically generated (by devices used to create and capture data) and embedded in data files, e.g. in digital audio and video recordings, while some have to be produced manually.
  • When depositing your data in a repository, you will be guided by the user interface through the process of providing metadata. In the input form, some metadata fields are mandatory, which means that the procedure cannot be completed if this information is not provided. It is highly recommended to provide as detailed information as possible even in non-mandatory fields.

 

 

Persistent identifier (PID)

Persistent identifier (PID) is a long-lasting reference to a resource that provides the information required to reliably identify, verify and locate the resource. In a digital environment, PIDs have the form of URLs. When pasted in a browser, they take users to the resource.

Apart from digital resources, PIDs can also relate to researchers (e.g. ORCID, ISNI), institutions (e.g. ROR), grants, instruments and devices, etc. In this case, a PID leads to the record describing a researcher, an institution, etc. in the relevant registry.

Examples of PIDs include DOIs, ORCIDs, ISBN, Handles, etc. 

Practical advice:

  • Apply PIDs to your research outputs. PIDs are typically automatically assigned by trustworthy repositories once your data are deposited there.  
  • It is also recommended to register yourself with ORCID to get a PID for you as an individual.
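
As a small illustration of how PIDs work in practice, the Python sketch below turns an identifier into its resolvable URL and validates the check character of an ORCID iD, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The function names are hypothetical, and the resolver table covers only three of the PID types mentioned above.

```python
# Resolver prefixes for a few PID types mentioned in this glossary.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "ror": "https://ror.org/",
}


def pid_to_url(scheme: str, identifier: str) -> str:
    """Return the URL form of a PID, e.g. a DOI as https://doi.org/..."""
    return RESOLVERS[scheme] + identifier


def orcid_check_character(base_digits: str) -> str:
    """Compute the final ORCID character from the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of an ORCID iD."""
    compact = orcid.replace("-", "")
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15].upper()
```

For example, `is_valid_orcid("0000-0002-1825-0097")` returns `True` for the sample iD used in ORCID's own documentation, while changing the last character makes the check fail. A valid check character only shows that the iD is well formed, not that it is registered.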

 

 

Pseudonymisation

Pseudonymisation (pseudo anonymisation) is the processing of personal data in such a way that the data can no longer be related to the data subject without the use of additional information. However, the additional information must be kept separately and subject to technical and organisational measures to ensure that data subjects remain unidentifiable. As opposed to anonymisation, pseudonymisation is a reversible process, which means that data subjects can be re-identified if access to the additional information is enabled.

Practical advice:

  • OpenAIRE has developed a tool that can be used for pseudonymisation - Amnesia.
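
The reversibility described above can be shown with a minimal Python sketch. It replaces a direct identifier with a random pseudonym and returns the key table (the "additional information") separately, so that it can be stored apart from the data. The function names and the `P-` prefix are assumptions for illustration; this is not how Amnesia works internally, and a real dataset would also need indirect identifiers to be handled.

```python
import secrets


def pseudonymise(records, id_field):
    """Replace id_field in each record with a random pseudonym.

    Returns (pseudonymised_records, key_table). The key table maps
    pseudonyms back to the original values and must be kept separately
    under appropriate technical and organisational measures.
    """
    key_table = {}  # pseudonym -> original identifier
    assigned = {}   # original identifier -> pseudonym
    output = []
    for record in records:
        original = record[id_field]
        if original not in assigned:
            pseudonym = "P-" + secrets.token_hex(4)
            while pseudonym in key_table:  # avoid rare collisions
                pseudonym = "P-" + secrets.token_hex(4)
            assigned[original] = pseudonym
            key_table[pseudonym] = original
        output.append({**record, id_field: assigned[original]})
    return output, key_table


def reidentify(records, id_field, key_table):
    """Reverse the pseudonymisation; possible only with the key table."""
    return [{**r, id_field: key_table[r[id_field]]} for r in records]
```

If the key table is destroyed, this particular mapping can no longer be reversed, although that alone does not make the data anonymous if other fields still allow re-identification.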

 

 

Repository

Repository is a digital platform that ingests, stores, manages, preserves, and provides access to digital content. A repository should support a commonly accepted metadata standard and have a protocol enabling metadata exchange. 

Repositories are usually classified into: 

  • subject/disciplinary, 
  • institutional and 
  • generalist repositories. 

In Horizon Europe, research data should be deposited in a trusted repository, i.e. in a repository that operates in accordance with relevant standards and best practice, provides long-term access and preservation, and ensures compliance with the FAIR principles. Trusted repositories include certified or community-recognised repositories, stable institutional repositories and generalist repositories (such as Zenodo).

Practical advice:

  • Use the re3data repository registry to identify appropriate repositories.
  • The use of domain specific repositories is the most desirable. Such repositories will employ domain specific metadata standards and controlled vocabularies and/or ontologies which will enhance the ability to do analyses across similar datasets and even across domains.
  • Institutional and generalist repositories should only be considered if no domain specific repository can be found or if there is a mandate to deposit in a specific repository. If it is necessary to deposit data in multiple repositories, this should be done so as to maintain the same persistent identifier across those multiple copies.

 

 

Sensitive data

Sensitive data is information that should be protected against unauthorised disclosure because unauthorised access may negatively affect the privacy of an individual, trade and business secrets or even security. In the context of research, sensitive data usually include personally identifiable information (names, date and place of birth, place of residence, employment information, etc.), health information, and other private or confidential data.

Practical advice:

  • Even in case of sensitive data, it is still possible to adhere to FAIR principles and open science by making the metadata freely available, while not enabling public access to the underlying data. These data can be managed through access control mechanisms and anonymisation and pseudonymisation.

 

 

Storage

Data storage is a computing technology that enables saving data in a digital format on computer components and recording media, including cloud services.

In the context of Research Data Management, it is necessary to ensure that data are stored securely until the end of the project and throughout the minimum retention period. Storage options may include:

  • Portable devices,
  • University network drives,
  • Cloud services.

Practical advice:

  • Storage on local storage spaces on hard drives, pen drives, etc. is discouraged because these devices are vulnerable and data loss may occur. If portable devices are used, it should be ensured that there are copies on networked drives and backup storage.
  • Using in-house or institutionally approved spaces is recommended, especially if regular backups are enabled. 
  • Cloud services are suitable for collaboration with partners from partner institutions. However, it should be checked whether the selected cloud service makes regular backups and whether it falls under European jurisdiction.

Guides for Researchers

How to comply with Horizon Europe mandate for Research Data Management

  • What are the requirements?

    Proper Research Data Management (RDM) is mandatory for any Horizon Europe project generating or reusing research data. It is a key part of Horizon Europe's open science requirements.

    In Horizon Europe, beneficiaries must manage the digital research data generated in the action (‘data’) responsibly, in line with the FAIR principles, and should at least do the following:

    • Prepare a Data Management Plan (DMP) and keep it updated throughout the course of the project
    • Deposit data in a trusted repository and provide open access to it (‘as open as possible, as closed as necessary’)
    • Provide information (via the same repository) about any research output or any other tools and instruments needed to re-use or validate the data

    Keep in mind that ‘research data’ is a very broad concept and certainly not limited to numerical/tabular data.

  • The FAIR principles

    Horizon Europe emphasizes the management of data and other outputs in accordance with the FAIR principles, which means making data Findable, Accessible, Interoperable, and Reusable. The principles provide a set of attributes that enable and enhance the reuse of data by both humans and machines.

    Making data available via a trusted repository already goes a long way in increasing their FAIRness.

    There is no single, one-size-fits-all way to manage research data and make them FAIR. What is appropriate and feasible largely depends on the research domain and data type(s) involved, as well as on the specificities of the project.

    Figure: the FAIR data principles. Image: https://book.fosteropenscience.eu/

  • Developing a Data Management Plan (DMP)

    The first step to comply with RDM requirements in Horizon Europe is to develop a Data Management Plan (DMP).

    A DMP is a document that outlines, from the start of the project, how research data will be handled both during and after a research project. It identifies key actions and strategies to ensure that research data are of high quality, secure, sustainable, and – to the extent possible – accessible and reusable.

    DMP timeline

    When should the DMP be ready?   

    • A short (1-page) DMP is required at the proposal stage.
    • A full, initial version of the DMP is required as a deliverable, normally by month 6.
      • Note: by exception, in cases of a public emergency and if the work programme requires so, you should submit a full DMP already with submission of proposals or at the latest by the signature of the grant agreement.
    • The DMP is considered ‘a living document’ and has to be regularly updated to reflect changes that may arise or decisions that are implemented. For projects longer than 12 months, an updated version of the DMP has to be submitted as a deliverable.
    • A final version of the DMP that describes how the data is managed and shared has to be delivered at the end of the project.   

    Whenever possible, beneficiaries are encouraged to make their DMPs non-restricted, public deliverables, under open access and a CC BY license to allow a broad re-use. For example, many European projects from H2020 made their DMPs available via the Zenodo repository. In addition, the DMP can be made available as part of the deliverables on the EU CORDIS website.

    What to include in a DMP?

    To prepare the DMP of your project, Horizon Europe makes a DMP template available. The use of this template is recommended but not mandatory.

    To help you draft both your Horizon Europe proposal and full DMP, use the online planning tool Argos, OpenAIRE’s open source service for writing and publishing DMPs. It offers templates aligned with the Horizon Europe requirements and custom guidance for each of the questions in the templates.

  • Providing access to research data in trusted repositories

    Trusted repositories

    Trusted repositories are infrastructures that provide reliable and long-term access to digital resources such as data, publications, etc. Usually, these repositories go through assessment or certification processes to guarantee that certain quality criteria are met.

    In Horizon Europe, the following are considered trusted repositories:

    • Certified repositories, for example those with CoreTrustSeal, Nestor Seal DIN31644, or ISO16363 certifications.  
    • Domain-specific repositories that are internationally recognized, commonly used and endorsed by the research communities relevant to your project.
    • General-purpose repositories or institutional repositories that, without official certification, present the essential characteristics of trusted repositories (e.g. security provisions, services for the creation of machine actionable metadata, long-term preservation of data, etc.).

    When should data be deposited?

    In Horizon Europe, data should be deposited as soon as possible after its generation and, at the latest, by the end of the project.

    There are, however, some additional requirements:

    • Data underpinning a scientific publication should be deposited at the latest at the time of publication, and in line with standard community practices.  
    • In cases of public emergency, if requested by the granting authority, immediate open access should be provided. If exceptions to open access to data apply, data should be made available at least to the legal entities that need the data to address the public emergency.
    • In exceptional cases, data can be deposited after the project has finished.

    'As open as possible, as closed as necessary'

    In Horizon Europe, research data should be made open access by default and licensed under the latest version of CC BY (attribution required) or CC 0 (public domain), or equivalent.

    However, it is recognized that data should be ‘as open as possible, as closed as necessary’, and exceptions can be made when providing open access to data:

    • Is against the beneficiary’s legitimate interests, including regarding commercial exploitation;
    • Is contrary to any other constraints, such as data protection rules, privacy, confidentiality, trade secrets, Union competitive interests, security rules or intellectual property rights; or
    • Would be against other obligations under the Grant Agreement. 

    In such cases, data can be kept restricted, closed or under embargo, but beneficiaries must explain in the DMP the legitimate exception(s) under which they choose to restrict access to (some of the) research data. Find more information in ‘How do I know if my research data is protected?’

    Metadata

    Horizon Europe requires that, when you deposit data in a trusted repository, the data should be described with rich metadata in line with the FAIR principles.

    Metadata should:

    • At least include the following fields: author(s), dataset description or abstract, date of dataset deposit or publication date, dataset deposit venue, dataset license (CC 0 or CC BY by default) and dataset embargo period (if any).
    • Also include information about Horizon Europe or Euratom funding: grant project name, acronym and number. Ideally, the repository will have dedicated fields for this information. If not, you can include them in other appropriate fields, such as the abstract.
    • Be open access under a CC 0 license or equivalent. This is also recommended in cases where data must be closed or restricted but there are no compelling reasons for metadata not to be findable and accessible.
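
    As a rough illustration, the minimum metadata fields listed above can be checked before deposit with a short Python sketch. The field names (`authors`, `deposit_venue`, `funding.grant_number`, etc.) are hypothetical and do not correspond to any particular metadata standard or repository schema; the embargo period is left optional because it applies only where an embargo exists.

```python
# Hypothetical field names for the minimum metadata listed above.
REQUIRED_FIELDS = ("authors", "description", "date", "deposit_venue", "licence")
FUNDING_FIELDS = ("project_name", "acronym", "grant_number")


def missing_metadata(record: dict) -> list:
    """Return the names of required fields that are absent or empty."""
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    funding = record.get("funding") or {}
    missing += [f"funding.{field}" for field in FUNDING_FIELDS if not funding.get(field)]
    return missing


# Placeholder record; all values are invented for illustration.
example = {
    "authors": ["A. Researcher"],
    "description": "Example dataset",
    "date": "2022-08-08",
    "deposit_venue": "Example Repository",
    "licence": "CC BY 4.0",
    "funding": {"project_name": "Example Project", "acronym": "EXPRO"},
}
print(missing_metadata(example))  # prints ['funding.grant_number']
```

    A check like this only confirms that the fields are present; it does not assess whether the descriptions are rich enough to make the data findable and reusable.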

    Most (trusted) repositories will require that you fill in a metadata form about the data or files that you will publish, which should cover these requirements.

    Horizon Europe requires that metadata follows the FAIR principles. In practice, this means that you should choose a repository where metadata follows standards and includes Persistent Identifiers (PIDs): for the dataset (e.g. DOI), the author(s) (e.g. ORCID or ResearcherID) and, if possible, the organization(s) (e.g. ROR) and grant (e.g. grant DOI).

    For additional information, you can check the page on metadata.

  • Validation (and re-use) requirements

    Horizon Europe requires that, at the time of depositing research data in a trusted repository, you must also provide access (via the repository) to information about any research output or any other tools or instruments needed to re-use or validate the research data.

    • “Research outputs, tools and instruments” may include data, software, algorithms, protocols, models, workflows, electronic notebooks and others.
    • What type of information should be provided? A detailed description of the research output/tool/instrument, how to access it, any dependencies on commercial products (e.g. software), potential version/type, potential parameters, etc.
    • Besides information about these outputs, beneficiaries are encouraged to provide open (digital) access to the research outputs, tools and instruments themselves unless legitimate interests or constraints apply.

    Typically, the repository will allow you to provide links to relevant information (e.g. links to related publications, to a code repository, to related datasets, etc.).

  • Costs of Research Data Management

    Data management and sharing activities need to be costed into research, in terms of the time and resources needed. By planning early, costs can be significantly reduced. Costs associated with open access to research data can be claimed as eligible costs of any Horizon Europe grant under the conditions defined in the Grant Agreement: they must already be budgeted and accepted in the grant proposal, and they are eligible only during the duration of the project.

    To estimate costs for RDM, you can check the online RDM-costing tool and the infographic on ‘What will it cost to manage and share my data?’.

  • View our webinar recordings

    Horizon Europe Open Science requirements in practice. June 14th, 2022

  • How can OpenAIRE help?

    The following additional support materials can help you with the RDM requirements in Horizon Europe projects: 

     

    OpenAIRE also offers tools for research data management: ARGOS is an OpenAIRE service that simplifies the management, validation, monitoring and maintenance of DMPs. [tool]

    Links and further information

    The following sources were used and contain more extensive information on how to address open science in Horizon Europe proposals:

     

  • Glossary

Publication date: August 8, 2022

Guides for Researchers

RDM service development checklist

An overview of the different capabilities required to develop effective RDM support across several different levels

  • Introduction

    Based on the RISE framework, this checklist gives an overview of the different capabilities required to develop effective research data management (RDM) support across several different levels. The rubric indicates whether the different capabilities are recommended when developing RDM services. It takes the capabilities originally included in RISE and adds two, covering EOSC participation and support for FAIR data and services. The capabilities assessed for each level are:

    RDM policy
    Covering the development and maintenance of RDM policy and associated documents and processes that enable its implementation.

    Business plans and sustainability
    Focusing on the approach to securing the sustainability of RDM services, including staff investment, technological investment, and cost modelling.

    Data Management Planning
    Concerning support for researchers to effectively plan the data component of their research and produce associated data management plan (DMP) documentation.

    Active data management
    To do with services that enable data management, including scalability and synchronisation of services, collaboration support, and security.

    Access and Publishing
    Covering the support for depositing and publishing open access data.

    Appraisal and risk assessment
    Including processes to identify valuable data and research outputs and mitigate any associated risks.

    Preservation
    Addressing the need to ensure data integrity and access to data.

    Training
    Covering both the development and delivery of training for researchers and research support staff, in online and in-person formats.

    Advisory services
    Concerning the provision of online and in-person advice for researchers and/or support staff who need support with aspects of RDM.

    EOSC participation
    Whether as service provider or user of European Open Science Cloud services.

    FAIR assessment
    Including assessments of both datasets and FAIR enabling services.

    More detail on each of these capabilities is available in the RISE framework.

    Note that this resource, following on from RISE, does not explicitly address the maturity of services. It is hoped that the resources and secondary items it points to, and the references throughout to the areas where capabilities are addressed at different levels, can assist users in addressing service maturity and ensure that there is buy-in and input from different actors to work toward a more holistic service. Fundamental to this is the question, ‘What do we need to provide?’, which ensures that any discussions around service development centre on the needs of the researchers and service users, and on the ability of the group, organisation, or institution to meet these requirements. This also gives the group, organisation, or institution a basis on which to formulate any plans for future service provision and developments.

  • Research Group

    RDM policy

    Possibly: When dealing directly with data, either data gathered in experimental settings or data re-used from other sources, a research group or standalone project may have a policy in place detailing its approach to research data management (RDM). This could be a fairly informal set of guidelines and local practices rather than an official policy. More often, a research group will be required to comply with institutional and/or funder policy requirements. OpenAIRE provides a useful primer for researchers on complying with Horizon 2020 mandates, while JISC has produced a guide more specifically focused on compliance with funder requirements.

    Business plans and sustainability

    Possibly: Plans around the sustainability of RDM services are often addressed at a higher position in the host organisation, most likely at Institutional or Research infrastructure level. However, a lot of research groups are set up as part of project activities and are dependent on external funding to continue. In these cases, this could be a very important aspect for some groups. The key thing is that the Research Group needs to know their funding source and, where necessary, plan for sustainability.

    Data Management Planning

    Yes: Data Management Planning in general and the use of a data management plan (DMP) document enable a research group to set out what they will do with their data during a project, and what plans are in place for the data beyond the project’s end. A DMP is a living document, meaning that it can be updated throughout the project’s lifespan, and typically includes information on data description, data collection methods, metadata, licensing and long-term preservation among other criteria. Research groups can provide a great deal of knowledge on data management planning and workflows for specific disciplines, and this is an invaluable resource that could be tapped into by the host institution’s central RDM services. Communications between the two levels should be established to exploit this local knowledge and also to ensure that local RDM support aligns with institutional policies and expectations. For more information, a report from Science Europe outlines the Science Europe core requirements for DMPs, while this OpenAIRE RDM starter kit contains further practical resources on data management planning for beginners and experts.

    Active data management

    Yes: Active data management at the Research Group or project level involves the use of services such as cloud computing, file storage, and data back-up. Though these services themselves are often provided at institutional level, the onus is on the Research Group to ensure that active data management procedures are in place at all stages of the research data lifecycle. Similar to the point regarding DMPs, local Research Group knowledge about internal and external infrastructures and RDM workflows could be useful for central services to tap into; in these cases, an effort must also be made to align with institutional policies on data protection and integrity. This OpenAIRE primer gives advice on the handling of raw data, storage, versioning and data back-up, while this Data Management Expert Guide from CESSDA is designed to help those working in social sciences to manage their data.

    Access and Publishing

    No: Publishing data and providing long-term access to it is an important aspect to consider at Research Group level, where Research Groups can offer valuable insights into publishing routes and associated costs, as well as into issues to be considered for longer-term data access, especially for sensitive data (e.g. data access committees). However, the type of infrastructure needed for this means that it is more appropriate that support for data access and publishing is provided in-house at an institutional level, or via an external infrastructure (i.e. repository or subject-specific data centre).

    Appraisal and risk assessment

    Yes: This involves determining whether the data that the Research Group holds is of potential value to their organisation or to the wider research community, and identifying appropriate preservation strategies based on this. Attention can also be given at this stage to integrating ethics approval processes with data appraisal and risk assessment. The DCC has produced an in-depth guide on how to appraise research data for long-term storage that can assist with this task at a Research Group level.

    Preservation

    Possibly: Similar to Access and Publishing, it is important to think about preservation and to include planning for this in a DMP. However, the provision of preservation infrastructure will depend on whether the group has their own data repository or databases that are used to support storage and access beyond the life of the project. For the Research Group level, there is an OpenAIRE guide that looks at appropriate data formats for preservation.

    Training

    Yes: Training at Research Group level is often dependent on the current competencies of Group members and how these competencies apply in the context of the work undertaken by the Group. Support from a higher level, such as the host institution, from experienced Research Group members or from other external bodies can help identify whether RDM training is needed and what the most appropriate types are. It is often the case at this level that researchers can learn from their colleagues, with this peer-to-peer knowledge exchange providing a practical alternative to more formal training courses.

    Advisory services

    No: Similar to training, advisory services will usually be available from higher levels, such as the institution at which the research group is based. These advisory services will aim to provide support to researchers on various aspects of RDM, so it is practical for this service to be made available at an institute’s professional services level, where it can offer advice to multiple groups or projects.

    EOSC participation

    Possibly: The manner of a Research Group’s participation in the European Open Science Cloud depends on a number of factors. A Research Group, depending on the work it does, may interact with the EOSC as a data/service provider or a data/service user (or a mixture of both). In any case, the EOSC Rules of Participation Working Group’s latest set of guidelines has set out the expectations and obligations of those participating in EOSC in all forms.

    FAIR assessment

    Yes: Making sure that the data produced by your Research Group is Findable, Accessible, Interoperable, and Reusable will make your data potentially more valuable to other researchers and will make it easier to select an appropriate strategy for long-term storage and preservation. There are a number of resources available to aid researchers and Research Groups in making their data more FAIR, including the FAIR-Aware tool, which is designed to help researchers and data managers learn about the requirements for data to be FAIR.

  • Institution

    RDM policy

    Yes: Having an RDM policy in place will allow an institution to align its research activities with its overall strategic objectives and direction. An RDM policy is one of the most important documents that an institution will have in place, enabling it to explicitly define the value of its research activities to the overall organisation, align with funder directives, and engage all levels in the organisation in good RDM practices. The DCC has produced a guide for developing RDM services at institutional level, while OpenAIRE has a checklist for those organisations looking to focus on their Open Science policy.

    Business plans and sustainability

    Yes: Accounting for the sustainability of RDM services at institutional level is essential. Part of this process includes identifying the costs of maintaining RDM services and forecasting where possible which aspects of the service will need to be developed in line with the institution’s future priorities (this includes alignment with funder mandates). It is also crucial to link institutional business plans and sustainability to efforts to maximise the value of research group outputs in the long-term. SPARC Europe has designed a tool to evaluate an institute’s RDM service offering, which can help in focusing on which areas need to be improved in light of the institute’s strategic planning.

    Data Management Planning

    Yes: At the institutional level, data management planning involves actively supporting researchers in documenting the plans for their research outputs and enabling the institution to take advantage of the information it gathers for the purposes of future RDM service provision and long-term strategic planning. This process increasingly involves producing data management plan (DMP) documentation; there are a number of data management plan platforms that allow for institutional accounts to be set up, with institute- and/or funder-specific requirements and guidance to be included in institutional DMP templates. One such is OpenAIRE’s Argos platform, which is part of the OpenAIRE Research Graph, while another is the DCC’s DMPonline tool.

    Active data management

    Yes: Rather than engaging directly with research outputs, an institution’s responsibility when it comes to active data management is generally in the form of service provision. These services include file storage and synchronisation, data back-up, security, and networked/linked storage. Other aspects to consider from the institution’s perspective will be whether researchers at the institution are working with data that they are generating themselves or with acquired data (which may require differing capabilities in terms of storage capacity and security), and the extent to which the institution’s services compete with or complement any third-party services used by researchers. On the topic of data back-up, OpenAIRE has made available this useful 10-point checklist.

    Access and Publishing

    Yes: From an institutional perspective, this refers to the facilitation of data deposit and making this data as openly available as possible. Depending on the type of institution (whether it is a research-intensive organisation or otherwise) this may mean that setting up an institutional repository is the best approach in ensuring that researchers are supported in publishing their data. In any case, information on data produced by local research projects, often in the form of metadata, should be recorded by the host institution. It is also important to consider here that an institution’s approach to the oversight of data publishing and access can add value to its data collections. OpenAIRE Explore provides a platform for searching and linking data and projects funded by Horizon2020 (view the OpenAIRE guide).

    Appraisal and risk assessment

    Yes: Having procedures in place to appraise and assess research outputs and identify any potential risks is an important step in maximising the value of the data produced at an institution. This process is linked to the Access and Publishing and Preservation steps: having a robust policy in place to appraise an institute’s research outputs can inform later decisions on data access, publishing, and the institute’s preservation strategies. To this end, the DCC’s ‘How to Appraise & Select Research Data for Curation’ guide sets out the steps to be taken in developing an appraisal policy, as well as the specific roles and responsibilities of those involved.

    Preservation

    Yes: From an institutional perspective, planning for preservation involves safeguarding not only the institution’s research outputs themselves, but also the technologies that ensure their future reuse is possible. This involves a consideration of the specific software that is used to read any research data and how feasible it is to maintain this. The institution’s appraisal and risk assessment policy is also linked here in deciding which data is of value to the organisation (and wider community) and thus worth preserving. The Digital Preservation Coalition Preservation Handbook contains a useful methodology for the development of institutional policies and strategies on preservation.

    Training

    Yes: Depending on its available resources, and the degree to which good RDM practices are embedded in the institution, policy makers will decide on providing online or face-to-face training (or a mixture), from in-house or external sources (or a mixture), to researchers and/or research support staff. In the long term, the institution’s role descriptions for professional and research support staff can be adapted to include the competencies required to deliver RDM training to the institution’s research community (and potentially to those outside the institution). For institutions looking to provide external training there is a wide range of resources available, including OpenAIRE’s RDM ‘Train the Trainer’ resources and a course designed by DCC and Research Data Netherlands on ‘Delivering Research Data Management Services’, while the DMT Clearing House provides a registry for online learning resources about RDM.

    Advisory services

    Yes: Providing advice on RDM at an institutional level is an effective way to guide researchers through the array of RDM tools and services that are widely available, and aid researchers in selecting the most relevant service for their work. This service can also help to direct researchers toward any appropriate training resources that might benefit their data management practices. Institutions will need to position any advisory services such that they complement the overall strategic aims of the institution; ideally, this service will be able to provide researchers with support throughout the lifecycle of their projects, from grant identification and submission, through to the storage and preservation of research outputs.

    EOSC participation

    Yes: Institutions themselves will typically engage with the EOSC from the perspective of being a research performing organisation holding important data collections and perhaps as a service provider. These services can be in the form of technical resources (such as repositories or cloud storage) or ‘human’ services (such as training and consultancy). To provide a service to EOSC, an institution must adhere to the EOSC Rules of Participation along with the more specific requirements to be met by service providers. Depending on an institution’s capacity, engagement with EOSC services should be encouraged where possible, especially where research receives European Commission funding; this can be done by advocating for EOSC engagement in the institution’s RDM strategic policy.

    FAIR assessment

    Yes: Assessment of FAIRness at institutional level involves both the services that the institution provides and the levels of awareness of FAIR amongst students, researchers, and policy makers. In terms of services, an article resulting from a collaboration between the OpenAIRE, FAIRsFAIR, RDA Europe, EOSC-hub, and FREYA projects sets out a series of recommendations which provide a framework for aligning policies with the FAIR principles. Amongst the recommendations for institutions is the establishment of data stewardship programmes and defining roles for institutional data stewards, who take responsibility for providing training and advice on FAIR. Linked to this is the idea of increasing FAIR awareness at all levels of the institution; the FAIRsFAIR-developed FAIR-Aware tool can be of use to those looking to gauge the level of engagement with FAIR at their institution.

  • Repository

    RDM policy

    Yes: An RDM policy document for a repository will focus on the types of service the repository provides to potential users and the procedures and protocols for maintaining the services. For example, if a repository only accepts datasets under a certain size, then its policy will outline the reasons for this and whether there are any exceptions. The policy will include information on acceptable formats, ownership rights, licensing, metadata formats, PIDs, versioning, and access procedures. This policy may also be supplemented by a ‘terms of use’ document or something similar, which outlines the requirements for users of any services. The NI4OS Europe project has developed a Repository Policy Generator tool that allows for the creation of customised policies based on provided information.

    Business plans and sustainability

    Yes: Repositories often hold large amounts of data and other research related outputs which are potentially of commercial and scientific value to both contributing researchers and the wider research community. In addition, funding for researchers and research projects often requires that data and outputs produced be deposited in a relevant repository to ensure that their value is maximised. As such, plans for the long term sustainability and the business case for services of a repository should be in place to ensure that there is confidence in the service and that resources for the maintenance and development of services are accounted for. The Curation Costs Exchange ‘Digital Curation Sustainability Model’ (DCSM) can assist in this regard (its ‘Example questions for organisations’ section on this page is especially relevant for repositories).

    Data Management Planning

    No: As the data and research outputs held in repositories are submitted by users themselves, a large part of the responsibility for planning rests with them. However, on the repository’s side, long-term planning for its capacity to continue to provide its services is necessary; this can indirectly involve data management planning (for example, planning for adequate storage over the long term, or for how changes to any policies will impact data currently held).

    Active data management

    No: Similar to the Data Management Planning aspect above, responsibility for active data management also rests largely with the researchers and principal investigators who generate the data and research outputs. However, review of data is carried out by some repositories to ensure that submissions meet the standards required. The extent to which this is carried out depends on the repository’s selectiveness and the resources it has to carry out detailed checks on submissions.

    Access and Publishing

    Yes: One of the key functions of a repository is to maintain access to data and other research outputs deposited for re-use by others. In order to facilitate greater access, repositories can consider listing themselves in a registry such as re3data.org, which allows users to locate appropriate repositories for accessing relevant data and sharing data with the most interested communities. A report from the APARSEN project outlines the responsibilities for continuing to uphold digital rights within the context of data access.

    Appraisal and risk assessment

    Yes: Appraisal and risk assessment at repository level will focus on two levels: on the services the repository provides and on the research outputs deposited by users. For the former, important aspects to consider are whether the services provided are those that are most effective at maximising the value of the deposited research outputs and whether the changing needs of users will continue to be met by the services; for the latter, appraisal and risk assessment will tend to centre on the legal and ethical guidelines which the producer adhered to in creating any research outputs, and whether these criteria impact access and re-use across different disciplinary and regional boundaries.

    Preservation

    Yes: The preservation of research outputs is at the core of the services that repositories offer. This preservation refers to the integrity of the deposited data objects, research outputs and attendant metadata, as well as the continued access to these. Another report from the APARSEN project outlines the high-level services that can enable repositories and other similar organisations in reinforcing their preservation processes and ensuring the sustainability of the repositories’ services.
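    One common way to monitor the integrity of deposited data objects is a periodic fixity check, in which stored checksums are compared against freshly computed ones. The following is a minimal sketch in Python under stated assumptions: the function names, the manifest layout and the choice of SHA-256 are illustrative only and are not taken from the APARSEN report or from any particular repository platform.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of root against a stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }


def demo() -> dict[str, list[str]]:
    """Simulate a deposit, then damage it and run the check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.csv").write_text("id,value\n1,42\n")
        (root / "readme.txt").write_text("Deposit 001\n")
        manifest = build_manifest(root)  # recorded at ingest time

        (root / "data.csv").write_text("id,value\n1,43\n")  # silent change
        (root / "readme.txt").unlink()  # lost file
        return check_fixity(root, manifest)
```

    In this sketch the manifest would be recorded at ingest and stored separately from the data, so that a later check can flag files that have gone missing or changed.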

    Training

    Possibly: Repositories may provide training materials or support to users; for example, DANS, the Dutch national centre of expertise and repository for research data, provides training on Open and FAIR data, open science, RDM, and long-term preservation with a view to improving the data that will be deposited in its archive. Even where a repository cannot provide extensive training and support materials, it will generally offer support in some form, usually guides for users on submitting and accessing the data and other research outputs that it holds.

    Advisory services

    Yes: Some repositories offer depositors the chance to contact staff prior to submission about whether the repository is suitable for their data. Staff can offer advice on things like standards that the repository recommends, appropriate formats and ontologies, and queries to do with costs, making it more straightforward for depositors to ensure their data is in the right state for submission. Similar to training, advice for users who are not depositing data (i.e. those looking to access and re-use data) can be provided in the form of guide documents or multimedia, or in a Frequently Asked Questions webpage.

    EOSC participation/readiness

    Yes: A repository can join the EOSC as a provider, where its services can potentially be accessed by users beyond its original community. The criteria that the EOSC requires of providers can also be of benefit to the repository itself in encouraging it to focus on its FAIR enabling services and making available user statistics and feedback. These requirements can also feed into the strategic planning around the sustainability of the organisation and its services.

    FAIR assessment

    Yes: From the perspective of repositories, incorporating the FAIR principles will mean focusing on FAIR-enabling services. There are many resources available to assist repositories in this respect, including two papers from the FAIRsFAIR project: one on a framework for FAIR services, and another on FAIRsFAIR’s support to help organisations meet the CoreTrustSeal Requirements with an assessment of repositories' ability to enable FAIR data. The FAIRsFAIR project has also developed the F-UJI tool to assess the FAIRness of datasets. A repository may also consider implementing a FAIR Data Point, which allows metadata to be stored, searched, and accessed in a FAIR manner by users.

  • Research Infrastructure

    RDM policy

    Possibly: An overarching policy is not necessarily needed at this level since some services may not be suitable for all researchers and a level of user discretion should be allowed. For example, those researchers that are producing or dealing with sensitive data cannot and should not use the same active storage areas that are used in other fields of research. In these cases, where possible, services such as data safe havens are preferred but may not be available in all institutions.

    Business plans and sustainability

    Yes: Creating and maintaining services can be very expensive. Indeed, a first step in deciding whether to build bespoke services should be to determine the level and amount of research being conducted in a host institution and the income from this through grants and other channels. This should be used to weigh up the value of the data being created and used in that institution, which connects to appraisal and risk assessment. Research-intensive institutions, where financially feasible, should build their own services for their researchers and students. This will require careful planning: a business case needs to be drawn up, and the sustainability and scalability of any proposed services need to be outlined. For example, see the EC context.

    Data Management Planning

    Possibly: Although not always necessary, DMPs are important for service development. They allow institutions and funders to monitor the usage of their research infrastructures and act accordingly where necessary. For example, if there are several projects running concurrently that are using large amounts of storage space or compute power, then there may be bottlenecks in the future which should be accounted for. DMPs, which by their very nature outline what should happen in the future, should allow host institutions and funders to foresee such hurdles when taken together at scale.
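    To illustrate how DMPs taken together at scale can reveal a future bottleneck, the following is a minimal sketch in Python that sums the storage requested by concurrent projects for each year and flags the years in which demand would exceed capacity. The `DmpEntry` fields, the project names, the figures and the capacity threshold are all hypothetical; real DMP platforms will structure this information differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmpEntry:
    """Hypothetical storage summary extracted from one DMP."""
    project: str
    start_year: int
    end_year: int  # inclusive
    storage_tb: float


def storage_demand_by_year(entries: list[DmpEntry]) -> dict[int, float]:
    """Sum the storage requested by all projects active in each year."""
    demand: dict[int, float] = {}
    for entry in entries:
        for year in range(entry.start_year, entry.end_year + 1):
            demand[year] = demand.get(year, 0.0) + entry.storage_tb
    return dict(sorted(demand.items()))


def years_over_capacity(entries: list[DmpEntry], capacity_tb: float) -> list[int]:
    """Return the years in which requested storage exceeds capacity."""
    return [
        year
        for year, total in storage_demand_by_year(entries).items()
        if total > capacity_tb
    ]


# Illustrative figures only.
EXAMPLE = [
    DmpEntry("Project A", 2024, 2026, 40.0),
    DmpEntry("Project B", 2025, 2027, 50.0),
    DmpEntry("Project C", 2025, 2025, 30.0),
]

if __name__ == "__main__":
    print(storage_demand_by_year(EXAMPLE))
    print(years_over_capacity(EXAMPLE, capacity_tb=100.0))
```

    With these example figures, all three projects overlap in 2025, so that is the only year in which the requested 120 TB would exceed a 100 TB capacity.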

    Active data management

    Yes: Safeguarding data through the active phase of the curation lifecycle is extremely important. Data loss and breaches of security are risk factors that should be mitigated and the ability to backup data and retrieve it where necessary should be factored into the design of any infrastructure. This includes networked storage and cloud services, and, for sensitive data, secure spaces such as data safe havens. See for example EUDAT services such as B2DROP for syncing and sharing data.

    Access and Publishing

    Yes: The ability to access data for third parties as well as data owners can pose difficult questions, especially for sensitive data. However, in all cases maintaining data in formats that are easily read, i.e. open file formats, should be given priority. This is also true for eventual publishing of data. For help on this, see the UK Data Services’ recommendations for file formats.

    Appraisal and risk assessment

    Yes: When channelling any given data through the various tools and spaces that a research infrastructure provides, it will be necessary to determine the value and sensitivity of that data and act accordingly with respect to safeguarding it. Appraisal is also necessary from the viewpoint of determining which data should be kept for the long-term and which can or should be destroyed. Many times this may be a subjective call, but there should be policies and/or guides in place that will allow researchers to determine this themselves. For example, the DCC has produced this guide on how to appraise and select research data.

    Preservation

    Yes: Essentially, this relates to repositories, whether institutional, domain-specific or generalised. The choice of which to use lies with the researcher, but it is highly recommended that the researcher chooses a domain-specific repository where possible, which will increase the value of their data. See for example B2SHARE or Zenodo.

    Training

    Yes: For those services that are new or more complex in nature, there should be training available. Many research infrastructures offer training, such as ESFRI clusters and the NERC Data Tree course.

    Advisory services

    Yes: Choosing the appropriate service, knowing what services are available, and ultimately knowing how to use any given service (training) will require advice. For example, at the University of Edinburgh, there is a large and wide-ranging catalogue of services available, some of which are not exclusive to the University. This can be overwhelming to researchers, especially those that are new to the host institution, and guidance must be provided to create workflows for their research based on the tools available to them, as is the case with the University of Edinburgh’s Digital Research Services.

    EOSC participation

    Yes: The EOSC has several services that have been built or that are in development or being planned, and that are available through a federated system to any users. Integration with these services and using them where possible, especially for EC funded projects, should be mandated. A catalogue of services can be found here.

    FAIR assessment

    Yes: To amplify FAIR principles compliance, mechanisms should be set in place to provide FAIR metrics. These will allow evaluation of how well any given workflow built of various services offers FAIR compliance and can be a useful indicator for researchers of how best to deal with their data. See the FAIRsFAIR project’s F-UJI tool.

  • Funder

    RDM policy

    Yes: Since they are providing financial backing, usually through public money, funders need to make their grantees aware of their obligations in relation to the work they conduct and how the data generated should be handled. Since data volume is growing at a pace that is ever increasing, data needs to be managed appropriately to allow research integrity to be upheld. Examples of funder mandated policy requirements can be found at the DCC which mainly shows those in the UK but also for Horizon 2020. OpenAIRE provides a useful checklist for research funding organisations to assess their readiness in adopting the Horizon2020 Open Science requirements as part of their RDM policy. 

    Business plans and sustainability

    Yes: A common strategy employed by institutions is to assess the gaps in their service provision. Filling in any gaps will require substantial financial investment in many cases and therefore requires proper planning and must exhibit awareness of future challenges that will also be encountered and that need to be addressed. Funders can identify key gaps in provision and prioritise effort there e.g. via open calls (see NWO or Wellcome Open Science grants) or supporting data centres like UKDA or NERC.

    Data Management Planning

    Yes: As well as a tool that aids researchers themselves, DMPs are also very useful to funders who can monitor how their money is being spent and thus potentially identify problems in the future. This is related to the business plans and sustainability and is again something that funders have the power to implement in terms of building infrastructure. See for example the Horizon 2020 DMP template.

    Active data management

    No: Unlike data at the final version stage of the curation lifecycle, which will typically be deposited in a repository and which requires funder mandates, the active phase is not and should not be subject to funder requirements. This is in large part due to the heterogeneous nature of service provision at institutions. See Institution.

    Access and Publishing

    Yes: Publishing of data has become a very prominent topic and one that has been made more so due to increasing awareness by lay persons. The public nature of research in most cases warrants public access to these data and, to facilitate this, most funders will provide guidance on data sharing and data access statements. Many of the major publishers have aligned with funder policies by putting data sharing policies in place with some commitment to mandatory sharing of data, where applicable. See Springer Nature and PLOS for two publisher examples.

    Appraisal and risk assessment

    No: These considerations are likely specific to individual institutions and therefore should not be overseen by funders themselves. Instead, the RDM policy should provide the necessary failsafe in these circumstances. See Institution.

    Preservation

    Yes: The preservation of research data has clear benefits for researchers themselves, but also for funders; if the preservation of research data is carried out correctly, this can reduce the need for studies to be repeated and allow support and resources to be directed toward more innovative and original research. Funders have a key role in incentivising the preservation of research data, and can do so through their policy requirements. For example, the Dutch funder NWO stipulates that it expects researchers to aim to preserve their data for ten years beyond the project end (taking into account discipline-specific and legal limitations of this).

    Training

    Possibly: How to use the services that are developed and built can seem daunting to many researchers and this must be addressed by providing adequate training that will provide the basics at the very least. Training content should not be directly dictated by funders since it is likely that there will be unique situations per institution but financial backing should be provided.

    Advisory services

    Possibly: Similar to training, this should also be financially backed but not necessarily driven by funder requirements.

    EOSC participation

    Yes: As part of pan-European efforts to integrate services and data, funder involvement should be encouraged in order to coordinate these efforts, while many are already members of the EOSC association.

    FAIR assessment

    Possibly: FAIR metrics are becoming increasingly important and prevalent as more researchers embrace the principles. It may be a requirement of funders, and in their own best interests, to adequately appraise the research they fund for FAIR compliance.

  • National

    RDM policy

    Yes: As of the end of 2020, a significant minority of European countries have national level policies in place focussing on open science and research, and this is projected to grow. Those that have devised policies aim to provide researchers in their countries with unambiguous guidelines on improving open science and research. The Netherlands provides a good example of national level ambitions where they have developed a roadmap, while a landscape analysis of policies in Europe can also be referred to.

    Business plans and sustainability

    Yes: As with funder backing, national level investment in services is also important, especially those that are publicly funded. They will likely provide more trustworthy services and ones that will be sustainable and long lasting. See for example the eInfrastructures Austria project, the German National Research Data Initiative (NFDI) and the Swiss Data Lifecycle Management project (DLCM).

    Data Management Planning

    Yes: As with funders, national level DMPs will allow a better understanding of the resource requirements of researchers and will be the basis of service development planning. These DMPs will also be a useful resource cataloguing the research that has been conducted within these countries. See, for example, Sweden, where the Swedish Research Council has introduced a DMP requirement, and the Netherlands, where the Dutch Research Council has done the same.

    Active data management

    No: These services should be provided ideally at an institutional or research group level.

    Access and Publishing

    Possibly: Where possible, and as required, national repositories should be established that provide a last resort for data publication for the long-term. It is in the best interests of countries, especially those that provide substantial public funds, to safeguard their research data if institutional and domain repositories do not exist. Indeed, it may also be an option to deposit the data in more than one repository as a backup.

    Appraisal and risk assessment

    No: Though this should be addressed at National policy level, the responsibility for this should rest with the research group, institution, or infrastructure, who will be better-placed to carry out the necessary checks on data appraisal and risk assessment.

    Preservation

    Yes: Many countries have decided to implement a national level approach to supporting the preservation of data generated by their researchers. This has ensured that there is always a safety net for collecting all data and that researchers do not need to rely on third parties. Pooling of resources has allowed costs to be reduced and therefore the economic case for national repositories has been very attractive. See for example the Dutch national repository, DANS.

    Training

    Yes: Especially when considering local rules and regulations, it is essential that there is training available at a national level to provide the basic foundations for researchers to conduct their work. This is even more important when dealing with data of a sensitive nature, which has to be handled in accordance with laws at national and supranational levels. See for example training from EUDAT.

    Advisory services

    Yes: Similarly to training, there should be guidance available to help navigate local and national mandates and other regulations. The exchange of data and services between countries, and the ability to be open, can be impacted if these issues are not addressed.

    EOSC participation

    Yes: The European Research Area (ERA) is an initiative to allow free exchange of data and services across the continent, and the EOSC is central to achieving those goals. Participation by nations is encouraged, and uptake and implementation of services developed on a federated model is leading to better integration. See this short overview document from the European Commission on the new European Research Area.

    FAIR assessment

    Yes: Related to the EOSC, FAIR assessment and metrics will help in better integration of services and data. As a set of guiding principles, FAIR does not set out to dictate the technology but more how data should be managed, and this will have direct implications on how the technology should be used. At a national level, there should be an ability to gauge this, thereby increasing the value of the data produced in their jurisdiction. See F-UJI.

  • Overview

    checklist for RDM service development


Guides for Researchers on RDM

  • Data formats for preservation

  • How to comply with H2020 mandate - for research data

  • How to create a Data Management Plan

  • How to deal with non-digital data

  • How to deal with sensitive data

  • How to find a trustworthy repository for your data

  • How to identify and assess RDM costs

  • How to make your data FAIR

  • Raw data, backup and versioning

Still have questions?

Contact us via our Helpdesk.
We try to respond within 48 hours.


The 2019 PSI – Open Data Directive

A checklist for IP, Research Data and Open Science

The 2019 PSI – Open Data Directive represents the latest major upgrade of the PSI legislation in the EU. It amends previous PSI directives which first introduced a regulatory framework for Public Sector Information establishing the principle of reuse by default. With the 2019 revision Public Sector Bodies covered by the Directive can now benefit from a more comprehensive set of rules that regulate the reuse of data and information that they hold. The PSI – Open Data Directive poses an even stronger requirement on reuse by default, expands the type of PSB covered and, very importantly for Open Science, establishes the principle that research data resulting from publicly funded research must be Open Access by default.

What is the PSI?

PSI stands for Public Sector Information. An EU directive on PSI has been in place since 2003 to regulate the reuse of public sector information and to encourage as much of it as possible to be openly available. In 2019 it was revised as the Open Data Directive, which provides a common legal framework for government-held data.



  • I. Overview

    The PSI – Open Data Directive (Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information) entered into force on 16 July 2019 and gives Member States 24 months to transpose it into domestic legislation. It constitutes the latest major upgrade of the PSI legislation in the EU. It amends Directives 2003/98/EC and 2013/37/EU, which represented respectively the first substantial intervention by the EU in the field of Public Sector Information and its significant 10-year update, which further reinforced the principle of reuse by default. With the 2019 revision, which is based on principles such as transparency and fair competition in the internal market, Public Sector Bodies (PSB) covered by the Directive can now benefit from a more comprehensive legal framework that regulates the reuse of data and information that they hold. The PSI – Open Data Directive poses an even stronger requirement of reuse by default, expands the type of PSB covered and, very importantly for Open Science, establishes the principle that research data resulting from publicly funded research must be open access by default.

    There are a number of specific provisions that may impact, in certain cases significantly, Open Science practices in the EU once transposed into domestic law by Member States (MS). The following is a list of the most relevant.

    1. The scope of the directive covers documents held by Public Sector Bodies (PSB) in Member States at national, regional and local levels

    This includes national governments, ministries, state agencies and municipalities, as well as organisations funded mostly by or under the control of public authorities (e.g. meteorological institutes). Since the 2013 update, museums, libraries (including university libraries) and archives are likewise included, although special rules apply. With the 2019 reform, the scope has been further expanded to certain public undertakings under specific rules.

    2. The Directive regulates the reuse of documents held by PSB

    Documents are defined as any representation of acts, facts or information, and any compilation of such acts, facts or information, whatever its medium (paper, or electronic form or as a sound, visual or audiovisual recording). The definition of ‘document’ is not intended to cover computer programmes. However, Member States (MS) may extend the application of this Directive to computer programmes.

    3. PSI legislation regulates the reuse of documents held by PSB and establishes the rule that such documents shall be reusable for commercial and non-commercial purposes (Art. 1)

    Access to information, i.e. the principal way in which the covered type of information is made available, is not directly regulated by PSI. The Directive nevertheless identifies the Charter of Fundamental Rights of the European Union (see Recital 5 PSI-OD) as the legal basis for national access to information rules, and establishes that MS shall encourage public sector bodies and public undertakings to produce and make available documents falling within the scope of the Directive in accordance with the principle of ‘open by design and by default’.

    Access to information legislation is regulated at the MS level. EU institutions have their own procedures to access their documents. Interestingly, Art. 1(6) 2019 PSI – Open Data Directive establishes that the Sui Generis Database Right (SGDR) of Directive 96/9/EC shall not be exercised by public sector bodies in order to prevent the re-use of documents or to restrict re-use beyond the limits set by the 2019 Directive.

    4. Research data resulting from public funding

    Member States will be asked to develop policies for open access to publicly funded research data. This is perhaps the most important innovation brought by the 2019 PSI-OD amendment for Open Science. It implies a mandatory open access status for all research data produced with public funding (Recital 27).

    5. Provision on “high value datasets”

    Another important aspect is the provision on “high value datasets”, defined as documents the re-use of which is associated with important benefits for society and the economy, which will be governed by a separate set of rules. Thematically, the Directive identifies geospatial, earth observation and environment, meteorological, statistics, companies and company ownership, and mobility as high value areas. The EC will identify the relevant datasets in 2021.

    6. Transparency requirements

    The Directive introduces stronger transparency requirements for public–private agreements involving public sector information, avoiding exclusive arrangements.

  • II. Focus on Research Data and Open Science

    Regarding point 4 above, the 2019 Directive clarifies that, under the national open access policies, publicly funded research data should be made open as the default option. However, this “open by default” rule should apply only to research data that have already been made publicly available by researchers, research performing organisations or research funding organisations through an institutional or subject-based repository. The rule should not impose extra costs for the retrieval of the datasets or require additional curation of data.

    The Directive also clarifies that MS may extend the application of the Directive to research data made publicly available through other data infrastructures than repositories, through open access publications, as an attached file to an article, a data paper or a paper in a data journal. Documents other than research data (e.g. scholarly articles, publications, etc) should continue to be exempt from the scope of this Directive.

    Additionally, concerns in relation to privacy, protection of personal data, confidentiality, national security, legitimate commercial interests, such as trade secrets, and to intellectual property rights of third parties should be duly taken into account, according to the principle ‘as open as possible, as closed as necessary’. Moreover, research data which are excluded from access on grounds of national security, defence or public security should not be covered by this Directive (see Recital 28).

  • III. The importance of national implementations and the development of national open access policies

    These provisions are very important for achieving one of the most important goals of Open Science, namely the accessibility, reusability and verifiability of scientific results. Access to the data that led to a given result is a key element of Open Science. The Directive is quite clear in terms of the basic rules; however, these rules will need to be implemented by MS in what the Directive calls national open access policies.

    It is fundamental that these national open access policies follow a common and coordinated approach in order to avoid potential fragmentation at the MS level. Space for intervention at the MS level could be found in areas such as:

    • The inclusion of software in the definition of documents of PSBs covered by the Directive;
    • The extension to other data infrastructures such as open access publications, as an attached file to an article, a data paper or a paper in a data journal as the type of first publication that will trigger the open by default rule;
    • National OA policies should contain a clear identification of a licence or licence type in order to avoid potential issues connected with licence identification, compatibility and maintenance;
    • In this sense EC Decision of 22/02/2019 “adopting Creative Commons as an open licence under the European Commission’s reuse policy” C(2019) 1655 final, which establishes CC BY 4.0 as the European Commission default licence for reuse policy and CC0 for raw data, metadata or other documents of comparable nature, while not binding for MS, indicates a clear and strong orientation for the convergence towards a common standard licence as the default option. MS should be encouraged to follow this common pattern for the generality of cases and only diverge when special circumstances justify a different approach.



RDM in Horizon Europe Proposals

  • Introduction

    Research data management (RDM) is the active management and appraisal of data over the lifecycle of scholarly and scientific interest, and should be an integral part of best practice in research. It sets out the practical requisites for performing good research by defining rules that should be followed, and touches upon open science and the FAIR principles in doing so. RDM includes many elements, such as licences, repositories and metadata, which together uphold research integrity and reproducibility.

    For Horizon Europe proposals, RDM is explicitly referenced and consequently needs to be addressed by authors to show that contingencies are in place to safeguard data produced by the research being proposed. It will require authors to show evidence of practical measures that are to be put in place, from computing and storage infrastructure to what licences will be used through to the long-term preservation of data. The figure below (adapted from DCC) shows a simplified research curation lifecycle that provides a visual guide to the most important aspects to be considered in RDM and will be a visual cue to the itemised descriptions contained in this guide.

    [Figure: research curation lifecycle]

  • What to include in your proposal?

    The following tables provide a visual guide showing mandatory and recommended actions to be taken when writing your proposal.

    • Mandatory

      What

      How

      Documentation

      Detailed and rich documentation is fundamental to any good research, to provide reproducibility and to uphold research integrity. Lab notebooks, whether on paper or, increasingly, electronic (e-lab notebooks), will aid this. Documentation is typically an umbrella term for what is required to be recorded, whether it is the type of file to be created, the protocols used in an experiment, justifications and reasoning for actions taken, or many other factors. Rich and detailed descriptions will aid future interrogation of the research conducted, making the documentation a valuable resource.

      Metadata are a subset of documentation and are described below.

      Metadata

      Machine- and human-readable information, at both a technical and a descriptive level, is the foundation of good RDM. Metadata encompass file formats, documentation, controlled vocabularies and ontologies, licences, and persistent identifiers.

      Much of this metadata will be generated automatically when, for example, a digital data object is captured. These metadata form a technical layer and are essential in providing provenance for the underlying data. Descriptive metadata, which are typically produced by manual curation but increasingly also through automated methods thanks to advances such as AI, allow annotation of digital data objects and provide a further layer of information that is crucial to comparative analyses.

      A useful resource for finding available metadata standards can be found here, but there are several others that can be found through web searches.

      File formats

      For long-term preservation, it is essential that a version of your data exists in open and lossless file formats, which retain all their data and are accessible across platforms. This will ensure that the data can be accessed through both proprietary and non-proprietary software and that they contain the full complement of data prior to any manipulation. These data are typically those that are initially captured and form the basis for any downstream analyses. Examples of such file formats can be found here.

      Controlled vocabularies

      n/a (but highly recommended where possible)

      Ontologies

      n/a (but highly recommended where possible)

      Licences

      The ability to reuse data can be hindered when the data owner has not stated clearly what rights reusers have. A licence makes data reusers aware of their rights; the most common licences in research are Creative Commons licences. Licences for digital objects are machine readable and can enhance searches for data, where filters can be used for different reuse rights. When ultimately depositing your data in a repository (see below), you should consult the repository’s licence policy, which will determine what licence will then be placed on your data.

      Persistent identifiers (PIDs)

      Provide a PID for the different outputs of your research. These will provide a permanent means by which your data can be retrieved and disambiguate them from other outputs. PIDs can also relate to non-research outputs such as the researchers themselves, the institution in which the research will be carried out, or the grant. PIDs will typically be assigned automatically by trustworthy repositories (see below) once your data are deposited there, which is a valuable service and an incentive to use these repositories.

      Examples of PIDs include DOIs, ORCID iDs and ISBNs.

      Storage and backup

      Through the active phase of the research curation lifecycle, before final deposition in a repository (see below), data need to be stored on networked and backed-up storage, so that data can be recovered in the event of data loss. Storage on local devices such as hard drives or pen drives is discouraged, but if they are used, copies must also be kept on networked and backed-up storage.

      Repositories

      To ensure long-term sustainability, and so that managing your data does not remain your responsibility alone, third party repositories need to be used. This step is data preservation, and publishing your data in a repository allows reusers to find and access your data. For sensitive data, special attention must be given to making sure that the data are safeguarded properly: it might not be possible to make the data fully accessible, but the metadata can still be made discoverable.

      Data Management Plans

      In Horizon Europe, DMPs have become mandatory and will provide documentary evidence of the steps that have been taken to ensure the long-term safety of your research outputs. DMPs follow a template, which now has a revised version for Horizon Europe. The list of points to be addressed crystallises the other points raised in this guidance document regarding RDM and will show that the authors have considered all the necessary measures to uphold best practices.

      Additional factors to be considered in a DMP are ethics, legal requirements and costs. IPR, GDPR compliance, compliance with local legal requirements, and conflicts of interest need to be declared, while the cost of RDM activities needs to be estimated as part of the full grant amount, whether for capital expenditure or for the time required by individuals to perform curation duties. A DMP should also state data retention periods, after which data will be deleted or destroyed.
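The storage and backup row above asks for copies of active data on networked and backed-up storage. One possible way to check that a backup copy still matches the master copy is to compare checksums. The sketch below is only an illustration under stated assumptions: the function names `sha256_of` and `backup_matches` are hypothetical, and SHA-256 is one reasonable choice of checksum rather than anything the guide prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(master: Path, backup: Path) -> bool:
    """True when the backup has the same SHA-256 digest as the master copy."""
    return sha256_of(master) == sha256_of(backup)
```

A call such as `backup_matches(Path("master/data.csv"), Path("backup/data.csv"))` (both paths are placeholders) could be run after each backup, and the digest itself could be recorded alongside the dataset as technical metadata.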

    • Recommended and acceptable

      What

      How

      File formats

      Use of some file formats that are not open or lossless can be acceptable if there is widespread consensus on the use of those particular formats. Some file formats have become the de facto standard due to their ubiquity, but may still be proprietary. In these cases, it is still recommended to produce a copy of these data in an open format and that both be stored together.

      Use standardised syntax for file naming to aid better searchability and the ability to perform batch processes. These can take the form of YYMMDD_filename, and it is recommended to use version suffixes wherever possible when creating versions of a file from a master copy. These could, for example, be generated after cleaning or analysis steps.

      Controlled vocabularies and/or ontologies

      Use standard domain specific controlled vocabularies and/or ontologies wherever possible to better align your outputs with similar data in your field of research. Using standardised vocabularies and ontologies will minimise free text which in turn has significant benefits for comparative analyses and searches and consequently increases the value of your data.

      A useful domain agnostic resource for finding ontologies can be found here which also links to other resources, but many can also be found through web searches.

      Licences

      Open licences, such as CC0 or CC BY, are the preferred choice wherever possible. However, this may not always be possible for data such as clinical or other sensitive data. For these types of data it is still possible to adhere to FAIR principles and open science by making the metadata freely available, which will alert potential data reusers to the existence of the underlying data without their being able to see the data itself. These data can then be managed through access control mechanisms, anonymisation and pseudonymisation; one tool that can be used is Amnesia.

      PIDs

      Apply DOIs to your research outputs, which can typically be generated through deposition in a trustworthy repository. It is also recommended to register yourself with ORCID to get a PID for you as an individual.

      Storage and backup

      Use in-house or institutionally approved spaces wherever possible, and non-commercial spaces in preference to commercial ones. This will better guarantee ownership of data than use of commercial third party spaces, which may sometimes have physical storage in a geographical location beyond the jurisdiction of the data creator.

      Repositories

      The use of domain specific repositories is the most desirable and will give your data the most value. Such repositories will employ domain specific metadata standards and controlled vocabularies and/or ontologies which will enhance the ability to do analyses across similar datasets and perhaps even across domains.

      Institutional and domain agnostic repositories should only be considered if no domain specific repository can be found and should be used as a last resort. However, there may be occasions when there will be an institutional mandate to deposit in their own repository and this should be fulfilled. Deposition of data in multiple repositories, although technically possible, should be avoided where possible, but if this is done it should be done in such a way so as to maintain the same persistent identifier across those multiple copies.

      Finally, it is recommended to use a trustworthy repository which will provide extra peace of mind since these repositories have been evaluated for their robustness and long-term sustainability. Such repositories may carry a CoreTrustSeal (CTS) or ISO standards approval and can be identified in re3data.org searches.

      DMPs

      Regularly updating your DMPs after a grant has been awarded is recommended, and they should be considered living documents. Any deviations from the original proposal can be documented here, with justifications.
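The file-naming recommendation in the table above (a YYMMDD date prefix plus a version suffix) can be sketched in a few lines of Python. The guide only gives the YYMMDD_filename pattern; the `_vNN` suffix format, the lower-casing and hyphenation of the name, and the function name `build_filename` are assumptions made for illustration.

```python
import re
from datetime import date


def build_filename(stem: str, ext: str, version: int, on: date) -> str:
    """Build a name of the form YYMMDD_stem_vNN.ext (suffix format is an assumption)."""
    # Replace runs of non-alphanumeric characters with a hyphen so names stay
    # predictable for searching and batch processing.
    safe_stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-").lower()
    return f"{on:%y%m%d}_{safe_stem}_v{version:02d}.{ext.lstrip('.')}"


# prints 220614_survey-results_v02.csv
print(build_filename("Survey results", "csv", 2, date(2022, 6, 14)))
```

Keeping the date first means that files sort chronologically in a directory listing, and a zero-padded version number keeps successive versions of a file derived from the master copy in order.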
  • View our webinar recordings

    Horizon Europe Open Science requirements in practice. June 14th, 2022

  • How can OpenAIRE help?

    The following additional support materials can help you with the RDM requirements in Horizon Europe projects: 

     

    OpenAIRE also offers tools for research data management: Argos is an OpenAIRE service that simplifies the management, validation, monitoring and maintenance of DMPs.

     

    Links and further information

    The following sources were used and contain more extensive information on how to address open science in Horizon Europe proposals:

     

Publication date: June 13, 2022
