
Embedding Amnesia, OpenAIRE’s free and open-source software, in COVID-19 research workflows to enable secure processing and sharing of sensitive data

Amnesia-COVID

Background: Since the beginning of the pandemic, the 'ΑΤΗΕΝΑ' Research Center has been active in communicating best practices and achievements from the national and European R&I ecosystems to the Greek academic and research communities. To maximize the impact of this effort for the benefit of researchers and health practitioners, the OpenAIRE Greek NOAD has been collaborating with national stakeholders and initiatives from different domains (e.g. SSH, Bioinformatics, Computer Science, Public Management), both to support diverse research endeavors and to offer opportunities for discussion about the strengths and weaknesses observed in the approaches followed.

Practical workshops: As a continuation of the COVID-19 series, more practical workshops are planned to improve community understanding of the available tools and to build capacity on specific aspects of the COVID-19 research data lifecycle.

The workshop held in June 2021 was run jointly with the OpenAIRE NOAD in Cyprus and concentrated on the processing of data that carry sensitive and personal information, taking as a use case real data from the medical records of a hospital in the United Kingdom. In addition, OpenAIRE's anonymization tool "Amnesia" was demonstrated as a tool that can be embedded in COVID-19 research workflows to remove personal identifiers with guarantees.

Presentation and demo: The guest speaker and trainer for the workshop was Manolis Terrovitis, the creator of the Amnesia tool. Manolis started with a presentation that laid out the legal framework pertaining to sensitive information (GDPR) and Open Access restrictions in Europe. The presentation went on to cover data security and privacy methods, including encryption, and highlighted the two most commonly used anonymization techniques, noting that researchers often misuse them:

  • Anonymization, which transforms personal data into exploitable non-personal data so that they can be re-used, exposed or given to third parties without limitations such as those imposed by the GDPR. The anonymization process offers guarantees that the original data cannot be retrieved.
  • Pseudo-anonymization (pseudonymization), which removes direct identifiers while retaining secondary information. The pseudo-anonymized data are partially protected, but they remain personal data and can lead to the recognition of a person when reverse engineering is applied.
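
To make the difference concrete, here is a minimal Python sketch of pseudo-anonymization. It is not taken from the workshop or from Amnesia: the record, the field names and the key are made up, and a keyed hash (HMAC-SHA256) is just one common way to replace a direct identifier.

```python
import hashlib
import hmac

# Hypothetical secret key; whoever holds it can recompute the mapping.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(record, direct_identifiers, key=SECRET_KEY):
    """Replace direct identifiers with a keyed hash; leave every other field untouched."""
    out = dict(record)
    for field in direct_identifiers:
        digest = hmac.new(key, str(record[field]).encode("utf-8"), hashlib.sha256)
        out[field] = digest.hexdigest()[:16]  # shortened token, for readability only
    return out


# Made-up record, for illustration only.
patient = {
    "name": "Jane Doe",
    "birth_year": 1957,
    "postcode": "AB1 2CD",
    "diagnosis": "COVID-19",
}
pseudo = pseudonymize(patient, ["name"])
print(pseudo)
```

The name is replaced by a token, but the birth year and postcode are still there unchanged. Anyone who can link those secondary values to another source may still recognize the person, which is why such data remain personal data.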

In order to comply with Open Access principles, the preferred method is anonymization with guarantees, such as those offered by the k-anonymity and km-anonymity techniques.
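
As a rough illustration of what the k-anonymity guarantee means, the sketch below checks whether every combination of quasi-identifier values in a table is shared by at least k records. It is plain Python, not Amnesia code, and the records and column names are invented for the example.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each record is indistinguishable from at least k - 1 others."""
    return smallest_group(records, quasi_identifiers) >= k


# Made-up, already generalized records.
records = [
    {"age": "50-59", "postcode": "AB1", "diagnosis": "COVID-19"},
    {"age": "50-59", "postcode": "AB1", "diagnosis": "Asthma"},
    {"age": "60-69", "postcode": "AB1", "diagnosis": "COVID-19"},
    {"age": "60-69", "postcode": "AB1", "diagnosis": "Diabetes"},
]

print(is_k_anonymous(records, ["age", "postcode"], k=2))
print(is_k_anonymous(records, ["age", "postcode"], k=3))
```

With these four records each age/postcode combination occurs twice, so the table is 2-anonymous but not 3-anonymous on those two attributes.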

Amnesia is free and open-source software that can be downloaded and installed locally on all major operating systems (Windows, Linux, etc.).

It is a unique tool for applying anonymization to set-valued data. Despite its complexity and the different algorithms it runs, its interface is simple and user-friendly.
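
In set-valued data each record is a set of items (for example, the symptoms recorded for one patient) rather than a fixed row of attributes. The sketch below checks the km-anonymity property for such data, assuming the usual definition: every combination of up to m items that occurs in the data must occur in at least k records. It is an illustrative check with made-up data, not the algorithm Amnesia runs.

```python
from collections import Counter
from itertools import combinations


def is_km_anonymous(transactions, k, m):
    """True if every itemset of size <= m that appears at all appears in >= k records."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # stable order so equal itemsets share one key
        for size in range(1, m + 1):
            counts.update(combinations(unique, size))
    return all(count >= k for count in counts.values())


# Made-up records: each one is a set of symptoms.
symptoms = [
    {"fever", "cough"},
    {"fever", "cough"},
    {"fever", "fatigue"},
    {"fever", "fatigue"},
]
# Every single item appears twice here, but every pair appears only once.
pairs_unique = [{"a", "b"}, {"a", "c"}, {"b", "c"}]

print(is_km_anonymous(symptoms, k=2, m=2))
print(is_km_anonymous(pairs_unique, k=2, m=1))
print(is_km_anonymous(pairs_unique, k=2, m=2))
```

The second dataset shows why m matters: an attacker who knows one item cannot narrow the data down to fewer than two records, but one who knows two items can single out a record.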


The way Amnesia works is as follows:

  1. Researchers import their data in tabular formats (e.g. .csv, .xls).
  2. Researchers select existing generalization hierarchies or create new ones. Hierarchies are compiled semi-automatically, following rules for generalizing values; they can be saved and re-used, or imported from other sources.
  3. Researchers choose the most suitable method for their situation (e.g. k-anonymity or km-anonymity) and link the hierarchies with the respective attributes of the records.
  4. From the results of the data generalization, researchers can choose the solution(s) that fit their needs.
  5. The anonymized data can be saved locally or directly deposited to Zenodo!
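
The steps above can be mimicked in a few lines of Python to show the idea of generalizing values along hierarchies until k-anonymity holds. This is only a sketch under stated assumptions: the CSV content, the two hierarchies and the brute-force search are all invented for illustration, and none of it reflects Amnesia's actual algorithms or interface.

```python
import csv
import io
from collections import Counter
from itertools import product

# Step 1: made-up tabular data, read from an in-memory CSV.
CSV_TEXT = """age,postcode,diagnosis
52,AB1 2CD,COVID-19
57,AB1 3EF,Asthma
63,AB2 4GH,COVID-19
68,AB2 5IJ,Diabetes
"""


def age_range(width):
    """Build a function that maps an age to a range of the given width."""
    def to_range(value):
        low = (int(value) // width) * width
        return f"{low}-{low + width - 1}"
    return to_range


# Step 2: hypothetical hierarchies; level 0 keeps the value, the last level hides it.
HIERARCHIES = {
    "age": [str, age_range(10), age_range(20), lambda v: "*"],
    "postcode": [str, lambda v: v.split()[0], lambda v: v[:2] + "*", lambda v: "*"],
}


def generalize(rows, levels):
    """Apply the chosen hierarchy level to each linked attribute."""
    result = []
    for row in rows:
        new_row = dict(row)
        for attr, level in levels.items():
            new_row[attr] = HIERARCHIES[attr][level](row[attr])
        result.append(new_row)
    return result


def smallest_group(rows, attrs):
    groups = Counter(tuple(r[a] for a in attrs) for r in rows)
    return min(groups.values()) if groups else 0


def anonymize(rows, k):
    """Steps 3-4: try level combinations, least generalized first, until k-anonymity holds."""
    attrs = list(HIERARCHIES)
    combos = product(*(range(len(HIERARCHIES[a])) for a in attrs))
    for combo in sorted(combos, key=sum):
        levels = dict(zip(attrs, combo))
        candidate = generalize(rows, levels)
        if smallest_group(candidate, attrs) >= k:
            return levels, candidate
    return None, []  # fewer than k rows: no level can satisfy k


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
levels, anonymized = anonymize(rows, k=2)

# Step 5: write the result as CSV (to a string here; a local file works the same way).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(anonymized)
print(levels)
print(buffer.getvalue())
```

For this toy table and k=2, the search stops at ten-year age ranges and the first part of the postcode. A real tool has to weigh information loss more carefully than this simple "lowest total level" ordering does.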

Hands-on and discussion: During the workshop, Manolis answered all the questions posed by the participants and held one-on-one breakout sessions with those who needed more assistance. The discussions mainly focused on understanding the proper use of Amnesia in the different contexts and domains represented by the participants, and on how anonymization is integrated into third-party systems. Clarifications and tips covered specific steps of the anonymization process in Amnesia, such as the rounding of continuous variables, where Manolis suggested selecting the largest range in the data to achieve the greatest accuracy.

27 Jul 2021


OpenAIRE-Advance receives funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No. 777541.


Unless otherwise indicated, all materials created by OpenAIRE are licensed under the CC Attribution 4.0 International License.