
Data transfer outside the EU

This use case illustrates the problem of transferring personal data to a partner outside the European Union.


Since the GDPR came into force, the transfer of personal data to third countries has been a blocking point for many projects. Although necessary to protect the privacy of each individual, this restriction affects the data processing market and prevents the use of data that could serve the common good. In 2020, exchanges with the United States became harder still when the Schrems II ruling invalidated the Privacy Shield on July 16.


Octopize's Avatar anonymization method creates a synthetic dataset that protects the individuals behind the data while preserving its statistical potential and original granularity. The anonymized data produced is no longer considered personal data and can be freely transferred to providers or partners outside the EU.


For example, a European health care institution holds a cohort used to study a heart condition. The institution wishes to send the cohort data to a company in the United States specializing in artificial intelligence, so that the company can build a model predicting the presence of a cardiac pathology in a patient. This model will provide diagnostic support to the institution's clinicians.

Data type

The personal health data associated with this cohort are both qualitative and quantitative and specify the presence or absence of cardiac pathology in a patient based on different variables. This is sensitive data since re-identification can lead to the leakage of a patient's health status.
  • 303 individuals
  • 14 variables
This dataset is anonymized with the Avatar method, which produces a new dataset with the same structure as the initial one (same number of individuals, same number of variables, same format).

Objectives of anonymization

In this use case, we identify two objectives.

  1. Make it impossible to re-identify individuals in the dataset.

  2. Retain the predictive capacity of the data.

How does Octopize ensure the anonymity of individuals?

Every transformation of data by the Avatar method is accompanied by an evaluation of the security of the generated synthetic data using dedicated metrics. These metrics were developed to verify compliance with the three criteria identified by the European Data Protection Board (EDPB, formerly WP29) to qualify data as anonymous under the GDPR, namely:

  • Singling Out
  • Linkability
  • Inference.

From our example we obtain the following results:

  • Hidden rate: 93.63%
  • Local cloaking: 12
  • Correlation protection rate: 100% (reference variables: age, sex)
  • Inference rate: 56.43% (reference variables: age, sex; target: disease)

The results obtained indicate that it is impossible in practice for an attacker to re-identify the individuals in the cohort.
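To give some intuition for distance-based privacy checks of this kind, here is a minimal sketch in Python. It does not reproduce Octopize's metrics, whose definitions are not given here. It assumes that each synthetic record is derived from one original record, and it counts how many other synthetic records lie closer to an original than the record derived from it. The data, the noise-based generator, and the names `rank_of_own_synthetic` and `missed_rate` are hypothetical.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank_of_own_synthetic(original, synthetic, i):
    """Count synthetic records strictly closer to original[i] than synthetic[i].

    A rank of 0 means an attacker matching by distance would link the
    original record to the synthetic record derived from it.
    """
    own = sqdist(original[i], synthetic[i])
    return sum(sqdist(original[i], s) < own for s in synthetic)

rng = random.Random(0)

# Toy data (assumption): 100 records with two numeric variables.
original = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]

# Placeholder generator (assumption): noisy copies, NOT the Avatar method.
synthetic = [[x + rng.gauss(0, 0.5) for x in row] for row in original]

ranks = [rank_of_own_synthetic(original, synthetic, i) for i in range(len(original))]

# Share of records for which a nearest-neighbour match picks the wrong record.
missed_rate = sum(r > 0 for r in ranks) / len(ranks)
median_rank = sorted(ranks)[len(ranks) // 2]

print(f"nearest-neighbour match fails for {missed_rate:.0%} of records")
print(f"median number of closer synthetic records: {median_rank}")
```

In this toy setting, a higher failure rate and a higher median rank both mean that a distance-based attacker is less likely to pick the right record.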

Do the synthetic data generated provide the same results as the original data?

We seek to verify whether the Avatar-anonymized dataset transmitted to the non-EU partner will allow the latter to build a prediction model that performs as well as the one that would have been built from the original data.

To check that the data structure is preserved after processing, a dimension reduction step is performed. The original (left) and avatarized (right) data are projected into the space determined by the original data, using disease as an illustrative variable. The similar distribution of individuals in this space for the original and avatarized data suggests that the signal is preserved after processing.
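The idea of projecting both datasets into a space determined by the original data alone can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used for the figure: it keeps only the first principal axis, finds it by power iteration on the covariance matrix, and uses toy numeric data with noisy copies standing in for avatars. All function and variable names are hypothetical.

```python
import random

def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, means):
    """Sample covariance matrix of the rows, centered on the given means."""
    n, d = len(rows), len(means)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        centered = [x - m for x, m in zip(row, means)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov

def first_axis(cov, iterations=200):
    """Leading eigenvector of the covariance matrix, by power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

def project(rows, means, axis):
    """Coordinates of the rows on the axis, centered on the given means."""
    return [sum((x - m) * a for x, m, a in zip(row, means, axis)) for row in rows]

def spread(coords):
    """Standard deviation of the coordinates around their own mean."""
    mean = sum(coords) / len(coords)
    return (sum((c - mean) ** 2 for c in coords) / len(coords)) ** 0.5

rng = random.Random(0)

# Toy data (assumption): 303 records, three variables, two of them correlated.
original = []
for _ in range(303):
    t = rng.gauss(0.0, 2.0)
    original.append([t + rng.gauss(0.0, 0.5), t + rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)])

# Placeholder for avatarized records (assumption): noisy copies, NOT the Avatar method.
avatars = [[x + rng.gauss(0.0, 0.3) for x in row] for row in original]

# The means and the axis come from the ORIGINAL data only;
# both datasets are then projected with the same parameters.
means = column_means(original)
axis = first_axis(covariance(original, means))
orig_coords = project(original, means, axis)
avatar_coords = project(avatars, means, axis)

print(f"spread of original data on the axis:   {spread(orig_coords):.3f}")
print(f"spread of avatarized data on the axis: {spread(avatar_coords):.3f}")
```

Fitting the projection on the original data only matters here: if each dataset defined its own axes, two similar-looking plots would not be directly comparable.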

The training protocol trains several pairs of identical machine learning models: in each pair, one model is trained on the original data and the other on its avatarized equivalent. Both models are then tested on the remaining original data.
The overall accuracy (the percentage of correct predictions) of the models is then compared. Models trained on avatars perform similarly to models trained on the original data, regardless of the model used.
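The comparison protocol (train one model on original data, train an identical model on its synthetic equivalent, then test both on held-out original data) can be sketched as below. This is a minimal illustration under stated assumptions: the cohort is simulated, the synthetic training set is produced by a noise-adding placeholder rather than the Avatar method, and the classifier is a simple nearest-centroid model chosen to keep the example self-contained. All names are hypothetical.

```python
import random

def make_cohort(n, seed):
    """Toy cohort (assumption): two numeric features and a binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        center = 2.0 if label else 0.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows

def perturb(rows, seed, scale=0.3):
    """Placeholder synthetic-data generator: adds Gaussian noise to features.

    This is NOT the Avatar method; it only stands in for it.
    """
    rng = random.Random(seed)
    return [([x + rng.gauss(0.0, scale) for x in feats], y) for feats, y in rows]

def fit_centroids(rows):
    """Nearest-centroid classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for feats, y in rows:
        acc = sums.setdefault(y, [0.0] * len(feats))
        for i, x in enumerate(feats):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, feats):
    """Return the class whose centroid is closest to the features."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(feats, center))
    return min(centroids, key=lambda y: sqdist(centroids[y]))

def accuracy(centroids, rows):
    """Share of rows whose predicted class matches the true label."""
    hits = sum(predict(centroids, feats) == y for feats, y in rows)
    return hits / len(rows)

cohort = make_cohort(303, seed=0)
train, test = cohort[:200], cohort[200:]   # held-out ORIGINAL records for testing
synthetic_train = perturb(train, seed=1)   # synthetic equivalent of the training set

acc_original = accuracy(fit_centroids(train), test)
acc_synthetic = accuracy(fit_centroids(synthetic_train), test)

print(f"trained on original data:  {acc_original:.3f}")
print(f"trained on synthetic data: {acc_synthetic:.3f}")
```

Testing both models on the same held-out original records is what makes the two accuracy figures comparable: the only difference between the two runs is the training data.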


Transforming data into avatars speeds up and simplifies data transfers outside the EU while respecting the privacy of individuals and strongly preserving the statistical qualities of the original data.

Other use cases

Revaluation of a cohort for a new use

This use case illustrates the problem of reusing personal data for a new purpose that was not covered by the original consent.

Risk limitation in internal use

This use case deals with precautionary measures in the use and governance of personal data. How can the risks linked to internal use of this data be limited?

Storage of spatio-temporal data

The "New York Taxi" use case covers the anonymization of spatio-temporal data. The difficulty lies in the particular nature of this data, where the combination of spatial and temporal dimensions increases the risk of re-identification.
© Octopize 2022