Embedding Amnesia in Public Authorities Data Workflows for immediate anonymization with guarantees

Services uptake - what’s new: DECIDO integrates Amnesia!

The collaboration between the DECIDO project and the Amnesia tool leads to an integration that ensures immediate anonymization of the data produced by the project.

About

DECIDO is a Horizon 2020 project that aims to boost the use of the European Open Science Cloud (EOSC) by Public Authorities. It seeks to enable innovation in the policy-making sector, remove European fragmentation, and allow cross-support and cross-collaboration through the use of secure compute- and data-intensive services.

Where data are involved

In this effort, data from the public sector and citizens (via outsourced actions) are collected and analyzed in an ICT infrastructure built for the needs of the project. The data follow standard handling and processing procedures in the four main pilot areas, which are addressed in different countries across the EU.

DECIDO use case 1

Each partner responsible for a pilot stores personal data in its own storage facilities. Each personal data store satisfies a set of requirements that protect the data from being identified and/or leaked. Anonymization and pseudonymization techniques are applied as follows:

  • Anonymization with Amnesia, to share data outside the pilot/organization
  • Pseudonymization with other techniques (encryption, hashing, tokenization, etc.) through a component called “Anonymizer”, which handles the pseudonymization tasks
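As an illustration of the hashing technique mentioned above, the following is a minimal sketch of keyed-hash pseudonymization. It is not the code of the DECIDO “Anonymizer” component, whose internals are not described here; the function names, the field names, and the choice of HMAC-SHA256 are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA256.

    The same input and key always give the same pseudonym, so records
    can still be linked, but the original value cannot be read back
    without the key (hypothetical design, not the DECIDO component).
    """
    # Normalize so that "Alice " and "alice" map to the same pseudonym.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, fields: list, secret_key: bytes) -> dict:
    """Return a copy of `record` with the listed fields pseudonymized."""
    result = dict(record)
    for field in fields:
        if field in result and result[field] is not None:
            result[field] = pseudonymize(str(result[field]), secret_key)
    return result
```

A keyed hash is used in this sketch rather than a plain hash because low-entropy identifiers (names, phone numbers) can otherwise be recovered by hashing candidate values; the key would have to be stored separately from the data.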

DECIDO use case 2

Amnesia integrations

The Turin pilot, which addresses flood disaster management, shows how Amnesia is embedded in the data management workflows. Currently, there are two options for using Amnesia in the pilot:

A. Leveraging the Amnesia UI (with all functionalities)
B. Using Amnesia on the NAS, potentially in two different modalities:
B.1. Leveraging the Amnesia UI (with all functionalities)
B.2. Using Amnesia as a black box, invoked through an SSH script, to anonymize data in a “semi-automatic” way. The process is only semi-automatic because hierarchy files, which describe how the data will be anonymized, must be provided to the tool.

Invoking Amnesia through an SSH script is completed in three steps:

Step 1. A script is invoked to anonymize the data
Step 2. Amnesia returns an answer and writes the anonymized dataset to the location provided as input
Step 3. The generated file with the anonymized data, “anonymizedData.txt”, is found at that location
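The three steps above could be driven from another machine roughly as sketched below. This is only an illustration: the remote script name, its argument order, the host, and all paths are hypothetical and are not taken from the DECIDO or Amnesia documentation. Only the output file name “anonymizedData.txt” comes from the description above.

```python
import posixpath
import shlex
import subprocess

# File name reported in Step 3 of the workflow description.
OUTPUT_FILE_NAME = "anonymizedData.txt"


def build_ssh_command(host, script, dataset, hierarchy, output_dir):
    """Build the ssh argument list for the (hypothetical) wrapper script.

    ssh joins its trailing arguments into one string for the remote
    shell, so each remote argument is quoted explicitly.
    """
    remote = " ".join(
        shlex.quote(part) for part in (script, dataset, hierarchy, output_dir)
    )
    return ["ssh", host, remote]


def expected_output_path(output_dir):
    """Path where the anonymized file is expected after the run."""
    return posixpath.join(output_dir, OUTPUT_FILE_NAME)


def anonymize_remote(host, script, dataset, hierarchy, output_dir, timeout=600):
    """Step 1: invoke the script. Step 2: return whatever it prints.

    Raises subprocess.CalledProcessError if the remote command fails.
    """
    command = build_ssh_command(host, script, dataset, hierarchy, output_dir)
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout, check=True
    )
    return completed.stdout
```

After `anonymize_remote(...)` returns, `expected_output_path(output_dir)` gives the location to check for the result (Step 3). The hierarchy file still has to be prepared beforehand, which is why the process is described as semi-automatic.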

Observations and suggestions

Amnesia was ported into a Docker environment to fix an installation issue on the Synology NAS, which has a proprietary operating system. The differences observed between the two options are:

Option A. Use Amnesia from local PC on LAN

  • Advantage (Performance): Amnesia runs on a PC located on the municipality LAN, so the NAS, which provides only the DBMS, is not stressed by Amnesia operations.
  • Disadvantage (IDM): Two separate systems with two different identity management (IDM) setups need to be managed.

When Amnesia is installed as-is, it can consume the hardware capacity of the machine that hosts it, but performance is higher. However, authorization and authentication issues need to be overcome: the security layer and the IDM must manage access from another device.

Option B. Use Amnesia on NAS

  • Advantage (IDM): Amnesia can use the same IDM as the web app that will provide CRUD APIs for the external world to interact with the data stored on the NAS device.
  • Disadvantage (Performance): A NAS is not a server, so attention needs to be paid to the balance between its core functionality, which is data storage, and additional functionality such as anonymization with Amnesia.

On the other hand, when Amnesia is used on the NAS, a single IDM and security layer for authorization and authentication covers both Amnesia and the CRUD APIs used to interact with the data. In this case, however, performance is lower because two services run on the NAS device, the database (MongoDB) for data storage and Amnesia for data anonymization, and because the device's hardware specifications are limited and not scalable.

What to look forward to

The collaboration between the DECIDO project and the Amnesia tool led to an integration that ensures immediate anonymization of the data produced by the project. Currently, two integration options are used by the Turin pilot. The next steps concentrate on expanding the dockerization work so that Amnesia can be used both with and without its user interface. It remains to be agreed how hierarchy files will be retrieved or built automatically when needed.
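To make the open question about hierarchy files more concrete, the sketch below shows one way a generalization hierarchy for an integer attribute (for example, age) might be derived automatically. It is an assumption-laden illustration only: it does not produce Amnesia's hierarchy file format, and the function names and range widths are hypothetical.

```python
def build_range_hierarchy(values, widths):
    """Map each distinct integer to progressively wider ranges.

    `widths` lists the range width per level, narrowest first. Each
    width should be a multiple of the previous one so that the ranges
    nest. The last level is always "*", meaning fully suppressed.
    """
    hierarchy = {}
    for value in sorted(set(values)):
        levels = [str(value)]  # level 0: the original value
        for width in widths:
            low = (value // width) * width
            levels.append(f"{low}-{low + width - 1}")
        levels.append("*")
        hierarchy[value] = levels
    return hierarchy


def generalize(value, hierarchy, level):
    """Return `value` generalized to the requested hierarchy level."""
    return hierarchy[value][level]
```

For instance, `build_range_hierarchy([23, 37], [10, 50])` maps 23 to the levels `"23"`, `"20-29"`, `"0-49"`, and `"*"`. Categorical attributes, such as place names, would need a hierarchy supplied from domain knowledge instead.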

 

Published: July 26, 2021