This blog post is authored by S. Venkataraman (DCC) in the framework of the RDM task force's "Data Reuse" working group.
Not all data can be easily reused, and some cannot be reused at all. Such data usually arise from sources that must maintain confidentiality because of factors that could lead to the identification of individuals or the disclosure of other sensitive information. In these cases, additional measures need to be taken to protect identities.
Meanwhile, the FAIR principles were developed from a starting point that did not fully appreciate these restrictions, but they nonetheless do not preclude sensitive data from adopting them to at least some degree. Openness in research is becoming increasingly commonplace, but it should not be confused with the FAIR principles. Indeed, there is a commonly used phrase in the RDM community when comparing the two: as open as possible, as closed as necessary. Moreover, the recent introduction of the GDPR in Europe has caused a seismic shift in the responsibilities of the different actors managing sensitive data: the onus is now on data controllers to ensure that data are suitably safeguarded, where a controller may be the repository holding such data or the body that defines the rules governing it. Failing in these responsibilities has major financial, legal and ethical implications for the data controller.
The balance to be struck for data that is sensitive in nature will ultimately be case-dependent, but there are common mechanisms that can be adopted; this will be illustrated later using FAIR4Health as an example.
Before starting any new research project that involves sensitive data, it is essential that test subjects are given the opportunity to grant permission for their data to be used in future for predefined research purposes, i.e. to make their personal details, whether directly or indirectly identifying, visible to the community. The obligation to obtain this permission lies with the researcher and can be fulfilled simply through consent forms that survey test subjects on these matters. Failure to provide these forms can have serious ethical and legal implications, which should be deterrent enough. In the UK, for example, the NHS and the Medical Research Council provide guidance on these matters as well as a template that can be used by researchers embarking on studies involving patients.
Among the most commonly used tools for protecting the identity of individuals are data anonymisation (for example, Amnesia, developed by OpenAIRE) and pseudonymisation. Stripping key elements that can directly or indirectly identify an individual can be a robust method for upholding privacy. Numerous mathematical models have been developed that address this to varying degrees, and each will have knock-on effects on the quality of the data analyses: removing more identifiers degrades the quality of the analysis, and consequently the results, so striking the right balance between the two will always be somewhat subjective.
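To make the distinction concrete, here is a minimal sketch of pseudonymisation: direct identifiers are stripped, and the patient identifier is replaced with a keyed hash held by the data controller. The field names and key are hypothetical; this does not reflect the internals of Amnesia or any other specific tool.

```python
import hmac
import hashlib

# Hypothetical secret held only by the data controller, never published
# alongside the dataset; with the key, pseudonyms can be re-linked if a
# lawful basis exists, without it they cannot.
SECRET_KEY = b"keep-this-out-of-the-dataset"

# Illustrative set of fields treated as direct identifiers
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def pseudonymise(record: dict) -> dict:
    """Strip direct identifiers and replace the ID with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = hmac.new(
        SECRET_KEY, record["patient_id"].encode(), hashlib.sha256
    ).hexdigest()[:16]
    return cleaned

record = {"patient_id": "P-123", "name": "A. Person", "diagnosis": "I10"}
print(pseudonymise(record))  # identifiers removed, ID replaced by a stable pseudonym
```

Because the same input always yields the same pseudonym, records for one individual can still be linked across datasets, which is exactly the residual re-identification risk the mathematical models mentioned above try to quantify.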
A method that can complement or stand independently of anonymisation is to monitor, and where necessary restrict, access to sensitive data. This may take the form of vetting potential data reusers by requesting information about them and assessing their trustworthiness, and of providing different tiers of access. Typically, this is done through established ethics committees and institutional review boards, which assess requests on a case-by-case basis.
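The tiered-access idea above can be sketched as a simple ordering between a requester's vetted clearance and a dataset's sensitivity tier. The tier names and the rule are illustrative assumptions, not the policy of any particular review board.

```python
from enum import IntEnum

# Hypothetical access tiers, ordered from least to most restricted
class Tier(IntEnum):
    OPEN = 0          # fully anonymised, publicly available
    SAFEGUARDED = 1   # registration and an end-user licence required
    CONTROLLED = 2    # case-by-case ethical/institutional review required

def may_access(requester_clearance: Tier, dataset_tier: Tier) -> bool:
    """A requester may see any dataset at or below their vetted clearance."""
    return requester_clearance >= dataset_tier

print(may_access(Tier.SAFEGUARDED, Tier.OPEN))        # True
print(may_access(Tier.SAFEGUARDED, Tier.CONTROLLED))  # False
```

In practice the vetting step that assigns a requester's clearance is the hard, human part; the comparison itself is trivial once that decision is recorded.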
The 5 Safes is a specific framework developed at the UK Data Service, but it incorporates many of the core considerations that should be followed when handling sensitive data. The 5 Safes are Safe People, Safe Projects, Safe Settings, Safe Outputs and Safe Data: data should only be shared with trusted people and projects, such research should only be conducted in settings that have been pre-approved, and only outputs that do not divulge sensitive information should be released into the wider world. Under these conditions, sensitive data, both raw and processed, can be the subject of experiments.
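One way to read the framework is that all five conditions must hold before a release is permitted. The toy checklist below mirrors the five dimensions by name; the all-must-hold rule is our simplified reading, not an official UK Data Service implementation.

```python
from dataclasses import dataclass, fields

@dataclass
class FiveSafes:
    """Toy checklist for the UK Data Service '5 Safes' framework."""
    safe_people: bool    # are the researchers trusted and vetted?
    safe_projects: bool  # is the research purpose approved?
    safe_settings: bool  # is the access environment pre-approved?
    safe_outputs: bool   # are released results non-disclosive?
    safe_data: bool      # is the data itself appropriately treated?

    def release_allowed(self) -> bool:
        # Simplified rule: every dimension must be satisfied
        return all(getattr(self, f.name) for f in fields(self))

print(FiveSafes(True, True, True, True, True).release_allowed())   # True
print(FiveSafes(True, True, False, True, True).release_allowed())  # False
```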
A case in point for sensitive data is health research. In our current example, this can include clinical data as well as social care data. Such data will frequently be linked to individuals and could consequently reveal identities that should be protected at all costs. If so, how can we realistically allow reuse of these health data?
The FAIR4Health consortium is a three-year project funded by the European Commission that brings together 17 partners from 11 European countries to address these questions. At the heart of the issue is the need to harmonise the way in which different countries across the EU, and beyond, are tackling data privacy, and to ensure that they manage their health research data in a consistent and uniform manner. This requires solutions at both the technical and the policy level.
For the former, FAIR4Health is building a technical platform that can be used by institutions across different countries to allow the safe exchange, and therefore reuse, of data. To accomplish this, several members of the consortium are software developers who have been brought in to build a bespoke platform that could itself be reused, or emulated by others for their own circumstances. The project has also re-evaluated the typical workflow employed by researchers to FAIRify their data, based on a process outlined by GO FAIR: what changes need to be made for the workflow to be useful and practicable for sensitive data? A draft of the revised workflow has been produced and will now be tested.
In terms of policies, including ethical and legal aspects, FAIR4Health has carried out an analysis across the field to sample the different approaches taken in the different EU member states. This forms the basis for bringing together common elements and addressing disparities in the way sensitive health data are handled.
A common problem among researchers of all fields is that many do not yet know about the FAIR principles and must therefore be trained. Training will encourage uptake of FAIR as well as making it better known, and therefore more widely accepted, in the research community.
Similar to other areas of research, FAIR4Health also identified the need to provide researchers and data owners with an easy-to-use method for inputting data. Typically this takes the form of data curation, and so a tool (data-curation-tool) has been developed which can be used as a standalone application, completely free to use. In conjunction with it, the data-privacy-tool has also been developed to provide a means of data anonymisation. Curation should meet minimum requirements to allow reproducibility of experiments and be as rich as possible beyond this minimum set of descriptors; this, of course, means being FAIR compliant. To this end, the FHIR standard has been adopted to provide the framework for metadata and ontologies, and throughout, the concept of privacy-preserving distributed data mining (PPDDM) is upheld. As well as these standalone applications, which are being made freely available for reuse, the FAIR4Health project is also creating a platform whose members can harness these and other tools in a "safe" environment designed for sensitive data. The platform will be launched in November 2020 and will be available to anyone in the world, not just within Europe.
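To illustrate why a standard like FHIR matters for curation, here is a minimal FHIR-style "Patient" resource expressed as a plain Python dict, together with a toy check that a curated record meets a bare-minimum shape. The values are invented and the check is our own illustration; it is not the validation logic of the FAIR4Health tools.

```python
# Minimal, illustrative FHIR "Patient" resource as a plain dict.
# The identifier is a pseudonym and the birth date is generalised to the
# year, reflecting the privacy measures discussed earlier.
patient = {
    "resourceType": "Patient",
    "id": "pseudonym-001",  # pseudonymised identifier, not a real ID
    "gender": "female",
    "birthDate": "1970",    # FHIR allows year-only precision
}

def meets_minimum(resource: dict) -> bool:
    """Toy curation check: correct resource type plus an identifier."""
    return resource.get("resourceType") == "Patient" and "id" in resource

print(meets_minimum(patient))  # True
```

Because every partner describes records with the same resource types and fields, analyses can be run against each site's data locally and only aggregate results shared, which is the essence of the PPDDM approach mentioned above.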
Finally, to reach a wider audience than the EU alone, FAIR4Health has been drafting guidelines that will allow HRPOs and researchers across the world to make their health data FAIR compliant. Initial reviews of the current global barriers have revealed many local obstacles, in many cases unique to particular territories, that need to be overcome. These usually relate to legal requirements that must be adhered to. Nevertheless, it is the ambition of FAIR4Health to provide unifying guidelines that can transcend these local barriers.
Building on the work already done at the European level and published as a deliverable, the next step is to address the same challenges at a global level. A new RDA working group has been established to tackle this; it aims to survey the global landscape of similarities and disparities in FAIR adoption and application, and to distil these into a set of unifying guidelines that can be published as RDA outputs.