Amnesia implements data anonymization techniques from the field of Privacy Preserving Data Publishing (PPDP). The key idea in anonymization is that identifying information is removed from the published data, so that sensitive information cannot be attributed to a person.
Anonymization techniques present descriptive information in an obscured or generalized form, so that such associations cannot be made. Anonymization algorithms transform the data into a form that provides a privacy guarantee with the minimum possible distortion of the original data. A significant challenge for every anonymization method is to provide the best trade-off between the strength of the privacy guarantee and the quality of the anonymized data.
Amnesia supports k-anonymity and k^m-anonymity, two privacy guarantees that make each record indistinguishable from at least k-1 other records. The original data are anonymized using generalization and suppression. Generalization is the substitution of a value in the original file, e.g., "Athens", with a more abstract one, e.g., "Greece". Substitutions follow a predefined hierarchy of values, e.g., "Athens" < "Greece" < "Europe", which can be defined by the user or created automatically by the tool.
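As a rough illustration of generalization over a value hierarchy, the following Python sketch raises every value in a single-attribute dataset by the same number of hierarchy levels until each value occurs at least k times. It is not Amnesia's implementation or API: the `PARENT` mapping, the function names, and the sample city list are all assumptions made for this example, and real algorithms consider several attributes and search for the solution with the least information loss.

```python
from collections import Counter

# Hypothetical hierarchy: each value maps to its more general parent.
PARENT = {
    "Athens": "Greece",
    "Thessaloniki": "Greece",
    "Rome": "Italy",
    "Milan": "Italy",
    "Greece": "Europe",
    "Italy": "Europe",
}


def generalize(value, levels):
    """Climb `levels` steps up the hierarchy, stopping at the root."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value


def is_k_anonymous(records, k):
    """True if every distinct value occurs in at least k records."""
    return all(count >= k for count in Counter(records).values())


def anonymize(records, k, max_levels=2):
    """Return (level, generalized records) for the lowest level that is k-anonymous."""
    for levels in range(max_levels + 1):
        candidate = [generalize(value, levels) for value in records]
        if is_k_anonymous(candidate, k):
            return levels, candidate
    return None  # no level of this hierarchy satisfies k


cities = ["Athens", "Athens", "Thessaloniki", "Rome", "Rome", "Milan"]
level, anonymized = anonymize(cities, k=3)
print(level, anonymized)
```

With k=3, one level of generalization (cities to countries) is enough for this sample; with k=4, every value has to be generalized to "Europe", which shows how a stronger guarantee costs more data quality.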
Amnesia offers algorithms that take advantage of modern multi-core hardware architectures. Moreover, Amnesia lets the user guide the anonymization process: it visualizes the candidate solutions and allows the user to choose and customize the most convenient one.
Original and anonymized dataset with k=4
The user can explore the quality of the data with ad hoc queries and a visual representation of the value distribution. The user can also choose to suppress outliers in order to avoid unnecessary generalization of the original data and information loss. Amnesia focuses on usability and flexibility to allow the user to understand and guide the anonymization process. Since anonymization methods have not been extensively used in practice, it is essential that users be able to tailor the anonymization process, and especially the information loss in the anonymized data, to their needs. For more details, you can check our documentation.
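The effect of suppressing outliers can be sketched in a few lines of Python. This is only an illustration under assumed names (`suppress_outliers`, `max_suppressed`) and made-up sample data, not Amnesia's own procedure: records whose value occurs fewer than k times are dropped, provided that no more than a chosen number of records would be lost.

```python
from collections import Counter


def suppress_outliers(records, k, max_suppressed):
    """Drop records whose value occurs fewer than k times.

    Returns None when more than `max_suppressed` records would be dropped,
    signalling that generalization is the better option.
    """
    counts = Counter(records)
    outliers = [r for r in records if counts[r] < k]
    if len(outliers) > max_suppressed:
        return None
    return [r for r in records if counts[r] >= k]


ages = ["30-39", "30-39", "30-39", "40-49", "40-49", "40-49", "90-99"]
kept = suppress_outliers(ages, k=3, max_suppressed=1)
print(kept)
```

Here the single "90-99" record is removed, so the remaining six records already satisfy k=3 without any further generalization of the age ranges.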