The research life-cycle made easy using OpenAIRE and EOSC-hub services: making data open yet anonymous
Becky is an early career social scientist. She is excited to have been offered a competitive post in a well-established department to work on an international, five-year EC Horizon 2020 research project with many partners. The project started a year before she arrived, and her institution leads the research on the language used to describe immigration in the national press.
Data Everywhere
Becky knows that a significant amount of research data has already been gathered during interviews with desk editors. This data is stored in a closed cloud computing area on TSD (Service for Sensitive Data), a cloud computing platform designed to comply with the security regulations for handling sensitive data. The service is provided through EOSC-hub. Storing the data on TSD means that only authorised researchers have access to it. As a result, the data has limited accessibility, is not discoverable and cannot easily be shared with other researchers, so its 'FAIRness' is rather poor.
In her second week, Becky receives a reminder from the project’s coordinator that their H2020 Data Management Plan (DMP) is due for an update. She needs to find out two things:
How to update the data management plan, a task in which she has little experience; and
How to comply with the EC’s open research data mandate.
The European Office at Becky’s university had previously given her information about the OpenAIRE Helpdesk. Through the helpdesk, all researchers participating in EC Horizon 2020 funded projects can get personal assistance from a range of experts in making their project data discoverable and accessible to others. This includes recommendations for making the data FAIR (Findable, Accessible, Interoperable and Reusable) and for drafting the DMP. The DMP is important because it allows the funder to check whether the project’s research is managed according to the funding agreement. Becky is directed to a suite of materials about mandate compliance that help her understand the benefits of open data and why sharing the project’s data outcomes far and wide is good for society, the project itself and her research field.
But Is the Data Shareable?
According to the DMP, every time a dataset is included in a publication, it needs to be publicly available to comply with the FAIR principles. Since the dataset contains sensitive information, Becky first needs to anonymise the personal information of the people interviewed. Her concern about the sensitivity of the data is reinforced by an email from her university’s ethics department reminding her to check the status of personal data and the need to conform to GDPR principles. Realising that much of the project data is sensitive, Becky goes back to the OpenAIRE Helpdesk for guidance on the different options.
An Anonymous Solution
The helpdesk proposes AMNESIA, a tool developed by OpenAIRE to help researchers anonymise their research data. AMNESIA is a flexible data anonymisation tool that allows users to remove identifying information from data. It removes direct identifiers such as names and social security numbers (SSNs), and also transforms secondary identifiers such as birth date and zip code so that individuals cannot be identified in the data.
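To make the two steps described above concrete, the following is a minimal Python sketch of the general idea: dropping direct identifiers and coarsening secondary identifiers, then checking how many records share each combination of coarsened values. It is not AMNESIA's code or API. The column names, the sample records and the generalisation rules (birth date to decade, zip code to its first two digits) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical column names for direct identifiers (assumption, not AMNESIA's schema).
DIRECT_IDENTIFIERS = {"name", "ssn"}


def generalise(record):
    """Drop direct identifiers and coarsen secondary identifiers in one record."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Birth date "YYYY-MM-DD" becomes a decade, e.g. "1984-03-12" becomes "1980s".
    year = int(out.pop("birth_date")[:4])
    out["birth_decade"] = f"{year // 10 * 10}s"
    # Zip code keeps its first two characters; the rest is masked.
    zip_code = out.pop("zip_code")
    out["zip_area"] = zip_code[:2] + "*" * (len(zip_code) - 2)
    return out


def smallest_group(records, secondary_identifiers):
    """Size of the smallest group of records sharing the same secondary identifier values."""
    groups = Counter(tuple(r[q] for q in secondary_identifiers) for r in records)
    return min(groups.values())


# Made-up sample records, for illustration only.
interviews = [
    {"name": "Person A", "ssn": "000-00-0001", "birth_date": "1984-03-12",
     "zip_code": "10115", "role": "desk editor"},
    {"name": "Person B", "ssn": "000-00-0002", "birth_date": "1987-11-02",
     "zip_code": "10999", "role": "desk editor"},
    {"name": "Person C", "ssn": "000-00-0003", "birth_date": "1991-06-30",
     "zip_code": "20095", "role": "desk editor"},
    {"name": "Person D", "ssn": "000-00-0004", "birth_date": "1995-01-17",
     "zip_code": "20457", "role": "desk editor"},
]

anonymised = [generalise(r) for r in interviews]
k = smallest_group(anonymised, ["birth_decade", "zip_area"])
print(anonymised[0])
print("smallest group size:", k)
```

The smallest group size gives a rough indication of how well individuals are hidden: the larger it is, the more people share each combination of secondary identifiers. A dedicated tool is still the appropriate choice for real project data, since free-text interview content can identify people in ways this sketch does not address.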
Private Enough - Open to All
Now that the dataset is anonymised, Becky makes it available and discoverable by uploading it to B2SHARE, a data repository provided via EOSC-hub. With B2SHARE, researchers and communities can publish datasets and obtain a Digital Object Identifier (DOI) to use in publications. All datasets published in B2SHARE are automatically made discoverable via B2FIND, an EUDAT metadata discovery portal that allows users to find data collections within an international and inter-disciplinary scope.
After a few months, Becky discovers that the dataset has been cited several times by other researchers. The download statistics maintained within the B2SHARE service also show that objects within the dataset have been downloaded frequently.
The Outcome: Thanks to OpenAIRE and EOSC-hub services, Becky was able to get on-the-spot support to make her research open for the benefit of all, while ensuring that her data is well managed under a data management plan, safely stored and published, and compliant with GDPR requirements.