Spearheading Industry 4.0, digital twins are now spreading to the healthcare sector. Boosted by the Covid-19 pandemic, their market is booming, and so are the risks to the privacy of the individuals who provide the data. How can we unleash the potential of digital twins without compromising on ethics? We have the solution: avatars, a unique data anonymisation method that has been successfully evaluated by the CNIL. Because re-identification is impossible in practice, avatarised data fall outside the scope of the GDPR. They become usable, shareable (even outside the European Union) and retainable without time limits, while preserving the quality of the original dataset. How do we differ from the competition? We prove all these points with our metrics. A real revolution in the current Health Data Hub context. What if avatars became the norm tomorrow?
"Houston, we've had a problem", said the Apollo 13 crew on April 17, 1970.
On the way to the Moon, an explosion has just occurred on board the spacecraft. Hundreds of thousands of kilometres away, on Earth, NASA teams diagnose and solve the problem remotely using several simulators, a kind of "digital double" kept in sync by the data streaming from the spacecraft. The crew returns safely. The ancestor of the digital twin is born. NASA was the first to develop such systems, but it was not until some 30 years later that the concept of the "digital twin" emerged.
What is a "digital twin"?
In 2002, Michael Grieves, then a PLM (Product Lifecycle Management) researcher at the University of Michigan, presented the notion of a "digital twin" to industry representatives for the first time, at the launch of a centre dedicated to product lifecycle management. A digital twin is a digital replica of a physical object or system. It is not a static model but a dynamic one, reproducing the object's behaviour and its evolution over time. As with Apollo 13, the physical entity and its digital twin are tightly linked by the flow of data from one to the other.
Since then, the concept of the digital twin has changed little. It involves replicating an object (a piston or a car engine), a system (a nuclear power plant or a city) or an abstract process (a production schedule). The concept also applies to living things and to health: a molecule, a cell, an organ or a patient, as well as a drug, a virus, a disease or an epidemic, can each have a digital twin.
A product of the growth of new technologies (IoT, big data, AI, cloud, etc.) and of computing power, digital twins are an evolution rather than a revolution, combining mathematical modelling and digital simulation. Incoming data, whatever its origin (real or synthetic, collected in real time by sensors or drawn from existing databases), feeds a mathematical model and fine-tunes it. The model then becomes a digital guinea pig on which different scenarios can be tested through simulation, in order to predict how the real system will evolve.
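To make this calibrate-then-simulate loop concrete, here is a deliberately tiny sketch. The model (exponential decay with a single rate parameter `k`, such as a drug being eliminated from the body), the function names `calibrate_decay_rate` and `simulate`, and the measurements are all hypothetical. Real digital twins rely on far richer models, but the two steps are the same: fit the model to observed data, then run it on scenarios that have not been observed.

```python
import numpy as np


def calibrate_decay_rate(times, observed, initial):
    """Fit k in y(t) = initial * exp(-k * t) to observations.

    Taking logs gives log(y / initial) = -k * t, a line through the origin,
    so k has a closed-form least-squares estimate.
    """
    t = np.asarray(times, dtype=float)
    log_ratio = np.log(np.asarray(observed, dtype=float) / initial)
    return -float(t @ log_ratio / (t @ t))


def simulate(initial, k, times):
    """Run the calibrated model forward for a (possibly hypothetical) scenario."""
    return initial * np.exp(-k * np.asarray(times, dtype=float))


if __name__ == "__main__":
    # Hypothetical measurements streamed from the real system (e.g. sensors)
    t_obs = np.array([1.0, 2.0, 4.0, 8.0])
    y_obs = np.array([7.5, 5.4, 3.1, 0.9])
    k = calibrate_decay_rate(t_obs, y_obs, initial=10.0)
    print(f"fitted k = {k:.3f}")
    # What-if scenario: same system, but starting from twice the initial level
    print(simulate(20.0, k, [0.0, 6.0, 12.0]))
```

Each time new measurements arrive, `calibrate_decay_rate` can simply be re-run, which is the "fine-tuning" role incoming data plays in a digital twin.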
From product design and life cycle management to automotive and aeronautics, energy production and distribution, transport, smart buildings and urban planning, digital twins are now one of the pillars of Industry 4.0. They have recently spread to other sectors, such as logistics and, above all, healthcare. According to a study by MarketsandMarkets, the digital twin market could grow from $3.1 billion in 2020 to $48.2 billion in 2026, a spectacular compound annual growth rate of 58%, driven partly by the Covid-19 pandemic.
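The 58% figure only makes sense as an annual, compounded rate: a 58% total increase would take $3.1 billion to about $4.9 billion, not $48.2 billion. The short check below recomputes the rate from the two market sizes quoted above; the function name `cagr` is ours.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: constant yearly rate turning start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# MarketsandMarkets figures quoted in the text, in billions of dollars
rate = cagr(3.1, 48.2, 2026 - 2020)
print(f"{rate:.1%}")  # prints 58.0%
```

In other words, the market would be multiplied by roughly 15.5 over six years.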
The promise of digital twins in healthcare, myth or reality?
Last January, at CES (Consumer Electronics Show) in Las Vegas, Dassault Systèmes presented its latest feat: the digital twin of a human heart, the result of 7 years of development. Built on data collected from hundreds of doctors, researchers and industrial partners around the world, it replicates not only the anatomy of the heart but also how it works: the propagation of electrical impulses through the heart, the behaviour of muscle fibres, the response to various drugs, etc. Thanks to advances in medical imaging, this digital twin is easy to customise: it takes less than a day to replicate the morphology and pathologies of a patient's heart.
Dassault Systèmes and its competitors are already working on other organs, including the lungs, the liver and, of course, the brain, but exact replication is currently out of reach. And for good reason: neurobiologists have yet to unravel all of the brain's mysteries. The perfect clone of the human body, modelling anatomy, genetics, metabolism, bodily functions and pathologies, is therefore not yet within reach. However, there is no need to wait for complete digital twins to make great strides. Partial digital twins of certain organs, diseases or patient/drug combinations, such as those developed by the start-up ExactCure, are already enough to address specific problems.
Simulating the anatomy and functioning of our body at the molecular, cellular, tissue and organ levels; modelling tailor-made implants; simulating ageing or a disease; testing a drug or a vaccine on a virtual patient or cohort; rehearsing and assisting complex surgical procedures; monitoring patient flows in hospitals to optimise human and technical resources: if digital twins fulfil all their promises, they will ultimately herald the advent of personalised medicine.
A study published in July 2021 in the journal Life Sciences, Society and Policy reviews the socio-ethical benefits of digital twins in health services. Top of the list are the prevention and treatment of disease, followed by cost savings for some healthcare institutions and, finally, greater patient autonomy: better informed, patients are better able to make decisions about their care.
Risks commensurate with the hopes raised
Nevertheless, many hurdles remain before we reach this public health Eldorado. The fundamental problem lies in the lifeblood of digital twins: health data. This highly sensitive personal data includes genetic, biological, physical and lifestyle information. The same study identifies the number one socio-ethical risk of digital twins, cited by all participants: the violation of privacy.
If digital twins are owned or hosted by private organisations, this information can be used without patients' knowledge, or even turned against them. The simplest example: a bank or insurance company with access to it could refuse a loan to a sick person or raise their premiums.
Add to this the security holes. As digital twins multiply, so does the risk of data being lost or stolen. And once data has leaked, it is too late: anyone can use it, in any way. This disaster scenario is becoming increasingly common in France, where cyberattacks on healthcare organisations doubled in 2021. The theft of health insurance data belonging to half a million French people in early 2022 is a striking example.
Then there is another risk: poor data quality. AI algorithms are trained on the biomedical data available, but this data is often heterogeneous, incomplete and not always reliable, for several reasons: lack of standardisation, pressure to publish, bias, the tradition of not publishing failures, etc. Bad data means bad models and bad simulations.
All the benefits of digital twins therefore depend on the availability and quality of health data. Yet it is extremely difficult for researchers to access and use this data, particularly in France, where its use is strictly regulated by the GDPR (General Data Protection Regulation) and the Loi Informatique et Libertés. In particular, transferring it outside the European Union is heavily restricted, a particularly sensitive issue in the current public debate. Cases follow one another at a frantic pace, from Google Analytics to Meta. The government has even chosen to postpone its application to the CNIL for authorisation of the Health Data Hub while this health data centralisation project is being overhauled.
Avatars to unlock the growth potential of digital twins
To unleash the growth potential of digital twins, a solution already exists, proposed by Octopize - Mimethik Data, our deeptech start-up. We have developed a unique, patented data anonymisation method: avatars. Data anonymisation is not new, and methods keep multiplying. However, most of them come nowhere near proving that patients cannot be re-identified. Our disruptive innovation, based on a new artificial intelligence technique, allows personal data to be exploited and shared with full respect for privacy. Unlike our competitors, we can prove with our metrics how effective our avatars are, in terms of both privacy and data quality. Our secret? An AI algorithm focused on each patient, not on the whole dataset.
For each patient (i.e. each row in the database), we use a k-nearest neighbours (KNN) algorithm to identify a number of neighbouring records, from which we build our model. At this stage, the real patient and their data have "disappeared": it is impossible to know whether they are in the model or not; only their nearest neighbours are. We then generate an avatar using a local pseudo-stochastic model, i.e. we introduce random, and therefore non-reversible, noise for each attribute (i.e. each column in the database). There is no going backwards: each time we run the model again for the same patient, we create a different avatar. This guarantees anonymisation while preserving the granularity of the dataset, the correlations between variables and the distribution of each variable: same distribution shapes, same means and same standard deviations, to within a small margin.
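The per-patient idea can be illustrated with a minimal sketch. It is not Octopize's patented method, whose details are not given here. The column standardisation, the exclusion of each patient from their own neighbourhood, the value of `k` and the random convex weighting of neighbours (exponential draws normalised to sum to 1) are all our assumptions, and `knn_avatars` is a hypothetical name. The sketch only handles numeric columns.

```python
import numpy as np


def knn_avatars(data, k=5, seed=0):
    """Illustrative KNN-based synthetic rows: one avatar per real row (numeric data only).

    Each avatar is a random convex combination of the row's k nearest
    neighbours; the row itself is excluded from its own neighbourhood.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    n = x.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of rows")

    # Standardise columns so no single variable dominates the distances
    mean, std = x.mean(axis=0), x.std(axis=0)
    std[std == 0] = 1.0
    z = (x - mean) / std

    # Pairwise Euclidean distances: O(n^2) memory, fine for a small example
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a patient is never its own neighbour

    avatars = np.empty_like(z)
    for i in range(n):
        neighbours = np.argsort(dist[i])[:k]
        # Random positive weights summing to 1: a fresh random point inside
        # the neighbours' convex hull each time the function runs
        w = rng.exponential(scale=1.0, size=k)
        avatars[i] = (w / w.sum()) @ z[neighbours]

    # Map back to the original units
    return avatars * std + mean


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical cohort: 200 patients x 3 numeric variables
    cohort = rng.normal(loc=[50.0, 120.0, 5.0], scale=[10.0, 15.0, 1.0], size=(200, 3))
    av = knn_avatars(cohort, k=10, seed=1)
    print("means:", cohort.mean(axis=0).round(2), av.mean(axis=0).round(2))
    print("stds: ", cohort.std(axis=0).round(2), av.std(axis=0).round(2))
```

Running it with different seeds yields different avatars for the same patients, which mirrors the non-reversibility point above. Averaging neighbours also tends to shrink each column's spread slightly, so a production method would need to measure or correct that before claiming identical standard deviations.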
Once avatarised, the data become synthetic data, with no risk of re-identification for the patients. They then fall outside the scope of the GDPR and can be exploited without limit: stored, processed, shared and reused without geographical or time constraints. The CNIL successfully evaluated our method in 2020, attesting to its compliance with the three anonymisation criteria (singling out, linkability and inference) described in the G29 opinion. Thanks to avatars, the privacy risk inherent in digital twins is eliminated.
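Below are two generic indicators, common in the synthetic-data literature, that show how such claims can be checked in code. The first is the distance from each synthetic row to the closest real row: a distance of zero flags a copied record, which is a singling-out red flag. The second is the gap in per-column means and standard deviations, a crude utility check. Both are our illustrative assumptions, not Octopize's metrics nor the CNIL's evaluation procedure, and passing them does not by itself establish anonymity under the G29 criteria.

```python
import numpy as np


def distance_to_closest_record(real, synthetic):
    """For each synthetic row, Euclidean distance to the nearest real row.

    A distance of 0 means a synthetic row copies a real one exactly.
    """
    r = np.asarray(real, dtype=float)
    s = np.asarray(synthetic, dtype=float)
    return np.linalg.norm(s[:, None, :] - r[None, :, :], axis=-1).min(axis=1)


def moment_gaps(real, synthetic):
    """Per-column absolute differences in means and in standard deviations."""
    r = np.asarray(real, dtype=float)
    s = np.asarray(synthetic, dtype=float)
    return np.abs(r.mean(axis=0) - s.mean(axis=0)), np.abs(r.std(axis=0) - s.std(axis=0))


if __name__ == "__main__":
    real = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
    # Hypothetical synthetic rows; the third is an exact copy of a real row
    synthetic = np.array([[0.5, 0.5], [2.5, 3.5], [1.0, 1.0]])
    print(distance_to_closest_record(real, synthetic))
    print(moment_gaps(real, synthetic))
```

Both functions compare every pair of rows, so they are only suitable for small illustrative datasets.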
Avatars are also easy to deploy and scale. They can be configured to suit every need, from internal use to open data. Avatars also address the availability and bias problems of health data: from a real dataset, we can generate synthetic datasets larger than the original database, since each individual can give rise to several avatars. In this way, we can amplify a cohort. In the end, we deliver labelled, "clean" health datasets, ready for any use.
By addressing privacy, data availability and data quality, avatarisation is a great opportunity to unleash the growth potential of digital twins. But beyond that, avatars are a revolution in themselves, and not just in the health sector. Banking, insurance, telecoms, industry, energy: every sector handling sensitive data now has a turnkey solution. With its avatars, Octopize - Mimethik Data champions an ethical approach that serves value creation. We are firmly convinced that data avatarisation, a disruptive innovation today, will be the new European standard tomorrow.