Embedding Amnesia, OpenAIRE’s free and open-source software, in COVID-19 research workflows to enable secure processing and sharing of sensitive data
Background: Since the beginning of the pandemic, 'ΑΤΗΕΝΑ' Research Center has been active in communicating best practices and achievements from the national and European R&I ecosystems to the Greek academic and research communities. To maximize the impact of this effort for the benefit of researchers and health practitioners, the OpenAIRE Greek NOAD has been collaborating with national stakeholders and initiatives from different domains (e.g. SSH, Bioinformatics, Computer Science, Public Management), both to support diverse research endeavors and to offer opportunities for discussion about the strengths and weaknesses observed in the approaches followed.
Practical workshops: In continuation of the COVID-19 series, more practical workshops are foreseen to enhance community understanding of the use of available tools as well as to build capacity on specific aspects in the COVID-19 research data lifecycle.
The workshop held in June 2021 was run jointly with the OpenAIRE NOAD in Cyprus and concentrated on the processing of data that carry sensitive and personal information, taking as a use case real data from the medical records of a hospital in the United Kingdom. OpenAIRE's anonymization tool, Amnesia, was also demonstrated as a tool that can be embedded in COVID-19 research workflows to remove personal identifiers with guarantees.
Presentation and demo: The guest speaker and trainer for the workshop was Manolis Terrovitis, the creator of Amnesia. Manolis started with a presentation that laid out the legal framework for sensitive information (GDPR) and Open Access restrictions in Europe. The presentation went on to cover data security and privacy methods, including encryption, and highlighted the two most commonly used anonymization techniques, noting that researchers usually misuse them:
- Anonymization, which transforms personal data into usable non-personal data so that they can be re-used, published, or given to third parties without the limitations that apply to personal data, e.g. under GDPR. The anonymization process offers guarantees that the original data cannot be retrieved.
- Pseudo-anonymization (pseudonymization), which removes direct identifiers while retaining secondary information. Pseudo-anonymized data are partially protected, but they remain personal data and can lead to the identification of a person when reverse engineering is applied.
To comply with Open Access principles, the preferred method is anonymization with guarantees, such as those offered by the k-anonymity and k^m-anonymity techniques.
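The difference between the two techniques can be illustrated with a minimal Python sketch. It is not Amnesia's implementation: the records, the column layout, and the helper names (`pseudonymize`, `generalize`, `k_of`) are all invented for illustration, and the generalization rules (ten-year age bands, three-digit postcode prefixes) are assumptions. The sketch measures k as the size of the smallest group of records that share the same quasi-identifier values.

```python
from collections import Counter

# Hypothetical records: (name, age, postcode, diagnosis). All values are invented.
records = [
    ("Alice", 34, "10431", "flu"),
    ("Bob", 36, "10437", "covid-19"),
    ("Carol", 52, "11522", "covid-19"),
    ("Dave", 58, "11527", "asthma"),
]


def pseudonymize(rows):
    """Replace the direct identifier (name) with an opaque ID; keep everything else."""
    return [(f"P{i}", age, postcode, dx)
            for i, (_, age, postcode, dx) in enumerate(rows)]


def generalize(rows):
    """Drop the identifier and coarsen the quasi-identifiers (age, postcode)."""
    out = []
    for _, age, postcode, dx in rows:
        low = (age // 10) * 10  # assumed rule: ten-year age bands
        out.append((f"{low}-{low + 9}", postcode[:3] + "**", dx))
    return out


def k_of(rows, quasi_identifier_columns):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(row[i] for i in quasi_identifier_columns) for row in rows
    )
    return min(groups.values())


# Age and postcode are treated as the quasi-identifiers in both tables.
k_pseudonymized = k_of(pseudonymize(records), (1, 2))
k_generalized = k_of(generalize(records), (0, 1))
print(k_pseudonymized, k_generalized)
```

With these invented records, the pseudonymized table still has k = 1, because each age and postcode pair is unique and could single out a person, while the generalized table reaches k = 2.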
Amnesia is free and open-source software that can be downloaded and installed locally on all operating systems (Windows, Linux, etc.).
It is a unique tool for anonymizing set-valued data. Despite its complexity and the different algorithms it runs, its interface is simple and user-friendly.
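For set-valued data, where each record is a set of items rather than a fixed row of columns, the k^m-anonymity guarantee can be checked as in the sketch below. It assumes the common definition that every combination of up to m items occurring in the data must occur in at least k records. The function name `is_km_anonymous`, the symptom records, and the generalization mapping are hypothetical, and the code does not reflect how Amnesia itself is implemented.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """True if every combination of up to m items that occurs in some record
    occurs in at least k records."""
    support = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, m + 1):
            for combo in combinations(unique, size):
                support[combo] += 1
    return all(count >= k for count in support.values())


# Hypothetical set-valued records: the symptoms reported by each patient.
raw = [
    {"fever", "cough"},
    {"fever", "cough"},
    {"fever", "anosmia"},
    {"cough", "anosmia"},
]

# Assumed generalization: merge two specific symptoms into one broader category.
hierarchy = {"cough": "other symptom", "anosmia": "other symptom"}
generalized = [{hierarchy.get(item, item) for item in record} for record in raw]

print(is_km_anonymous(raw, k=2, m=1))          # single items only
print(is_km_anonymous(raw, k=2, m=2))          # pairs of items as well
print(is_km_anonymous(generalized, k=2, m=2))  # after generalization
```

In this invented example the raw data pass the check for single items but fail for pairs, since the pair of fever and anosmia appears in only one record. After generalization every pair appears in at least two records.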
The way Amnesia works is as follows:
[Figure: how Amnesia works]
Hands-on and discussion: During the workshop, Manolis answered all the questions posed by participants and held one-on-one breakout sessions with those who needed more assistance. The discussions focused mainly on the proper use of Amnesia in the different contexts and domains represented by participants and on how anonymization is integrated into third-party systems. Clarifications and tips covered specific steps of the anonymization process in Amnesia, such as the rounding of continuous variables, where Manolis suggested selecting the largest range in the data to achieve the greatest accuracy.
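The rounding of a continuous variable into ranges, mentioned above, can be sketched in a few lines of Python. This is only an illustration of how the chosen range width affects group sizes, not Amnesia's procedure and not a restatement of the tip given in the workshop. The helper `to_range`, the ages, and the two widths are invented.

```python
from collections import Counter


def to_range(value, width, origin=0):
    """Map a number to the half-open interval [low, low + width) that contains it."""
    low = origin + ((value - origin) // width) * width
    return (low, low + width)


ages = [23, 27, 31, 38, 45, 49]  # invented values

narrow = [to_range(age, 5) for age in ages]
wide = [to_range(age, 10) for age in ages]

# Size of the smallest group of people who end up in the same range.
smallest_narrow = min(Counter(narrow).values())
smallest_wide = min(Counter(wide).values())
print(smallest_narrow, smallest_wide)
```

With these invented ages, five-year ranges leave some people alone in their range, while ten-year ranges put at least two people in every range.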
Find out more
Amnesia documentation https://amnesia.openaire.eu/about-documentation.html
Amnesia guide https://www.openaire.eu/amnesia-guide