To be lawful, the processing of personal data must rest on one of the legal bases defined in Article 6 of the GDPR (most often consent). As a result, any further processing not covered by the initial purpose requires new consent, which is difficult to obtain in practice (deceased individuals, changed contact details, and so on). Many datasets with significant information potential are thus blocked because the new processing purposes could not be anticipated at the time of collection.
Octopize, thanks to its Avatar anonymization method, makes it possible to create a synthetic dataset that protects the individuals behind the data while preserving its statistical potential and original granularity. Because the Avatar method is certified as a true anonymization solution within the meaning of the GDPR, the resulting synthetic dataset is no longer considered personal data and can be reused for any other purpose without constraint.
Let's take the example of a pharmaceutical company that has set up a cohort to evaluate the influence of a dietary supplement on body fat percentage. A start-up wants to use this data to develop a diagnostic application that predicts individuals' fat mass more accurately than current measurement tools. However, transferring personal data to the start-up is not compatible with the initial purpose of the collection. In addition, since the cohort was created, many individuals have changed their contact information, making it difficult to obtain new consent.
This dataset is anonymized with the Avatar method, which produces a new dataset with the same structure as the original (same number of individuals, same number of variables, same format).
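The "same structure" property is easy to check in code. The snippet below is a minimal sketch, not part of the Avatar tooling: the helper names (`describe_structure`, `same_structure`), the column names, and the toy rows are all hypothetical and serve only to illustrate the comparison of row count, variables, and value types.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def describe_structure(rows: List[Row]) -> Dict[str, Any]:
    """Summarize a dataset: number of rows, column names, value types per column."""
    columns = sorted({key for row in rows for key in row})
    types = {
        col: sorted({type(row.get(col)).__name__ for row in rows})
        for col in columns
    }
    return {"n_rows": len(rows), "columns": columns, "types": types}


def same_structure(original: List[Row], synthetic: List[Row]) -> bool:
    """True when both datasets share row count, variables, and value types."""
    return describe_structure(original) == describe_structure(synthetic)


# Toy rows with hypothetical variables; values are made up for illustration.
original = [
    {"age": 34, "weight_kg": 81.5, "sex": "F"},
    {"age": 51, "weight_kg": 92.0, "sex": "M"},
]
avatars = [
    {"age": 36, "weight_kg": 79.8, "sex": "F"},
    {"age": 49, "weight_kg": 93.4, "sex": "M"},
]

print(same_structure(original, avatars))
```

A check of this kind only confirms that the shape matches; it says nothing about privacy or statistical quality, which are evaluated separately.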
The transformation of data by the Avatar method is systematically accompanied by an evaluation of the security of the generated synthetic data through unique metrics. These metrics were developed to verify compliance with the three criteria identified by the European Data Protection Board (EDPB, formerly WP29) to qualify data as anonymous under the GDPR, namely singling out, linkability, and inference.
From our example we obtain the following results:
The results obtained indicate that it is impossible in practice for an attacker to re-identify the individuals in the cohort.
We want to verify whether the dataset anonymized by the Avatar method, and transferred by the pharmaceutical company to the start-up, has the same potential for predicting fat mass percentage as the original dataset.
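One common way to run this kind of check is to train the same model once on the original data and once on the anonymized data, then score both on held-out original records. The sketch below illustrates that idea under loud assumptions: the cohort is simulated, the single predictor (`waist`) is hypothetical, and the "synthetic" training set is produced by adding light noise, which is only a stand-in and not the Avatar method.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(model, xs, ys):
    """Coefficient of determination of a fitted line on the given data."""
    intercept, slope = model
    my = fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


rng = random.Random(0)

# Simulated cohort (assumption): body fat % depends linearly on waist size.
waist = [rng.uniform(70, 110) for _ in range(200)]
fat = [0.3 * w - 5 + rng.gauss(0, 2) for w in waist]

train_x, test_x = waist[:150], waist[150:]
train_y, test_y = fat[:150], fat[150:]

# Stand-in for the anonymized training set: NOT the Avatar method,
# just the original training rows perturbed with light Gaussian noise.
synth_x = [x + rng.gauss(0, 1) for x in train_x]
synth_y = [y + rng.gauss(0, 1) for y in train_y]

# Both models are scored on the same held-out original records.
r2_original = r_squared(fit_line(train_x, train_y), test_x, test_y)
r2_synthetic = r_squared(fit_line(synth_x, synth_y), test_x, test_y)

print(f"R2, trained on original data:  {r2_original:.3f}")
print(f"R2, trained on synthetic data: {r2_synthetic:.3f}")
```

If the two scores are close, the anonymized data carries a comparable predictive signal for this model. Scoring on original held-out records matters here: evaluating the synthetic model on synthetic data would not show how it behaves on real individuals.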
Transforming data into avatars unlocks the dormant potential of certain datasets by enabling or facilitating their transfer, while respecting the privacy of individuals and strongly preserving the statistical qualities of the original data.