Since the GDPR came into force four years ago, the way companies deal with personal data has changed drastically. Synthetic data promises a new approach to data usage. Recent developments, such as the steadily increasing GDPR fines, show that the issue of data protection is far from settled. And that won’t change anytime soon.
However, the handling of data and the strategy behind it remain an integral part of successful companies. Whether in finance and insurance, the healthcare industry, or other areas, large amounts of data have become indispensable. So how can these diverging developments be reconciled? The answer could lie in synthetic data.
The Balance Between Meaningfully Collected And Usable Data
The challenge lies in finding the right balance between meaningfully collected and usable data. If the data is personal, its use is strictly regulated; if it is anonymous, it is often of little use for analysis. The discussion is driven mainly by the ever-increasing GDPR fines: from just one fine (400,000 euros) in July 2018, to 332 fines (more than 130 million euros) in July 2020, to 1,030 fines (more than 1.6 billion euros) in the latest figures from March 2022. In the future, EU-level and national bodies will regulate data-driven business models even more tightly through their digital strategies and the planned ePrivacy Regulation.
After the Court of Justice of the European Union invalidated the Privacy Shield mechanism in the Schrems II case, even US data giants will have to reconsider their data processing models, as the ban on Google Analytics by the French data protection regulator CNIL and a similar decision in Austria demonstrate. Following the latest agreements between the EU and the US in March 2022, it remains uncertain how things will play out under the new transatlantic data transfer agreement. But European companies have also been fined large sums, most of them in industry and commerce (233 fines totaling more than 796 million euros), followed by media, telecommunications, and broadcasting (177 fines totaling more than 613 million euros).
Development Of Advanced AI And Deep Learning Models
Of course, it is no surprise that data continues to be the focus of legislation. It is also no secret that companies are increasingly dependent on large amounts of data, and the untapped potential in the “gold of the 21st century” is immense. Data is not only relevant for direct processing. It can also support the development of advanced AI and deep learning models, which can determine a company’s success in the mid-term rather than only in the long term. However, this data is usually personal, and personal data, or its disclosure, makes it possible to identify an individual, which is an invasion of privacy. And even correctly collected and managed data can cause harm in the hands of the wrong people.
Synthetic Data Is A Possible Solution
But how can a company make the added value of all its data usable without endangering customers’ trust and privacy on the one hand, and without fearing severe penalties under the GDPR on the other? One possible solution is synthetic data. Synthetic data is generated from the sensitive data set: artificially generated values take the place of the original records, and links that would allow conclusions to be drawn about individuals are irreversibly broken. However, the data’s basic structure and statistical properties are preserved, so that the new data can be used in the same way as the original data set.
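To make the idea concrete, here is a deliberately simplified sketch in Python. It is not how any particular product works: it assumes a made-up table of (age, income) records, fits only the means, standard deviations, and correlation of the two columns, and then samples new records from a Gaussian with those properties. All names and values are hypothetical, and a toy model like this offers no formal privacy guarantee. Real tools typically rely on far richer generative models.

```python
import math
import random
import statistics

def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)

def sample_synthetic(params, n, rng):
    """Draw n new records from a bivariate Gaussian with the fitted statistics."""
    mx, my, sx, sy, rho = params
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 into y reproduces the correlation between the columns.
        y = my + sy * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows

# Hypothetical "sensitive" records: (age, yearly income in euros).
original = [
    (23, 28000), (35, 42000), (41, 51000), (29, 33000),
    (52, 61000), (47, 58000), (38, 45000), (60, 64000),
]

params = fit_gaussian(original)
synthetic = sample_synthetic(params, 5000, random.Random(42))

print("original  mean age / income:", params[0], params[1])
print("synthetic mean age / income:",
      statistics.mean(r[0] for r in synthetic),
      statistics.mean(r[1] for r in synthetic))
```

No synthetic record is copied from the original table, yet averages and the relationship between age and income stay close to the source data, which is the property the paragraph above describes.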
It is not for nothing that Gartner attributes enormous relevance to the technology: the company predicts that in two years, around 60 percent of all data companies work with will be of synthetic origin. MIT also speaks of a breakthrough technology for the year 2022. Large companies like American Express are already using the technology.
Synthetic Data: How To Implement It
Even though the technology for transforming data into synthetic data is complex and novel, companies should be able to integrate it into their data processing workflows as efficiently as possible. This is both necessary and useful because the integration has to take place in parallel with day-to-day business, and employees must familiarize themselves with the new component quickly alongside their regular tasks. The simpler the technical integration, the less additional effort falls on the shoulders of the employees.
For this reason, an on-premise solution that can be integrated into the local systems or the company cloud is a good idea. As part of a local system, on-prem solutions can quickly meet all data protection requirements. While the data is safe as a result, the approach itself is simple and easy to understand, which pleases the data protection officers of every company. The challenge is that personal data is so sensitive that it is almost impossible to use.
Personal Data In A Secure System
Anything other than an on-premise solution would therefore be complicated to implement. With an on-premise solution, companies can keep personal data in a highly secure system and make approval by DPOs easier, because it is the software that gets delivered, not the data that goes out. Once implemented, the technology can rework, obfuscate, and “detach” existing datasets from their originals to the extent that the new datasets are just as usable as real data, but without the privacy concerns. Synthetic data can thus become a matter of course in companies.