Since the GDPR came into force, transferring personal data to third countries has been a blocking point for many projects. Although necessary to protect each individual's privacy, this restriction affects the data processing market and prevents the use of data that could serve the common good. In 2020, exchanges with the United States became stricter still when the Court of Justice of the European Union invalidated the Privacy Shield on July 16 in the Schrems II case.
Octopize's Avatar anonymization method creates a synthetic dataset that protects the individuals behind the data while preserving its statistical potential and original granularity. The anonymized data produced is no longer considered personal data and can be freely transferred to providers or partners outside the EU.
For example, a European health care institution runs a cohort study on a heart condition. This institution wishes to send the cohort data to a company specialized in artificial intelligence in the United States so that it can build a model predicting the presence of a cardiac pathology in a patient. This model will provide diagnostic support to the institution's clinicians.
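In this use case, we identify two objectives: protecting the privacy of the individuals in the cohort, and preserving the usefulness of the data for building the predictive model.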
The transformation of data by the Avatar method is systematically accompanied by an evaluation of the security of the synthetic data generated, using unique metrics. These metrics were developed to verify compliance with the 3 criteria identified by the European Data Protection Board (EDPB) (formerly WP29) to qualify data as anonymous under the GDPR, namely resistance to singling out, linkability, and inference.
From our example we obtain the following results:
The results obtained indicate that it is impossible in practice for an attacker to re-identify the individuals in the cohort.
We seek to verify whether the Avatar-anonymized dataset transmitted to the non-EU partner will allow the latter to build a prediction model that performs as well as the one that would have been built from the original data.
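The comparison described above can be sketched as a "train on anonymized, test on original" check. The snippet below is a minimal illustration under stated assumptions, not Octopize's evaluation: the cohort is simulated, the classifier is a simple nearest-centroid model written with NumPy, and `X_synth` is a noise-perturbed stand-in that does not implement the Avatar method. All function and variable names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    """Assign each row of X to the class with the closest centroid."""
    classes, centroids = model
    # Squared Euclidean distance from every row to every centroid.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def accuracy(model, X, y):
    """Fraction of rows whose predicted class matches the true label."""
    return float((predict(model, X) == y).mean())

# Simulated cohort (assumption): 5 numeric features, binary disease label.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5)) + 1.5 * y[:, None]

# Hold out original records that neither model sees during training.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Placeholder for the anonymized dataset. This is NOT the Avatar method,
# only a noisy copy used to make the example runnable.
X_synth = X_train + rng.normal(scale=0.3, size=X_train.shape)

acc_original = accuracy(fit_centroids(X_train, y_train), X_test, y_test)
acc_synthetic = accuracy(fit_centroids(X_synth, y_train), X_test, y_test)
print(f"trained on original:   {acc_original:.3f}")
print(f"trained on anonymized: {acc_synthetic:.3f}")
```

Both models are scored on the same held-out original records, so a small gap between the two accuracies suggests that the anonymized data carries a comparable predictive signal.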
To check that the data structure is preserved after processing, a dimension reduction step is performed. The original (left) and avatarized (right) data are projected into the space determined by the original data, using disease as an illustrative variable. A similar distribution of individuals in this space for the original and avatarized data suggests that the signal is conserved after processing.
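The projection step can be illustrated with a short sketch. It assumes the dimension reduction is a principal component analysis (the source does not name the technique), fits the axes on the original data only, and then projects both datasets onto those axes. The data is simulated, `avatars` is a noisy placeholder rather than real Avatar output, and the helper names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components=2):
    """Return the mean and the top principal axes of X (via SVD)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project X into the space defined by a previously fitted PCA."""
    return (X - mean) @ components.T

# Simulated data (assumption): 200 individuals, 6 correlated variables.
rng = np.random.default_rng(0)
original = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

# Placeholder for the avatarized data. This is NOT the Avatar method.
avatars = original + rng.normal(scale=0.1, size=original.shape)

# The space is determined by the original data only.
mean, components = fit_pca(original)
orig_2d = project(original, mean, components)
avat_2d = project(avatars, mean, components)

print(orig_2d.shape, avat_2d.shape)
```

Plotting `orig_2d` and `avat_2d` side by side, colored by the disease variable, gives the kind of visual comparison described above. Fitting the axes on the original data alone keeps both point clouds in the same coordinate system.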
The transformation of data into avatars makes it possible to accelerate and facilitate data transfers outside the EU while respecting the privacy of individuals and ensuring a strong preservation of the statistical qualities of the original data.