PONT Data&Privacy

The risks of anonymizing

Of the two techniques, pseudonymization and anonymization, anonymization is often seen as the best way to protect personal data. After all, anonymized data can no longer be linked to an individual and are therefore no longer personal data. In practice, however, this is not always the case: even after anonymization, so-called residual risks remain. Partly because of these residual risks, organizations are increasingly choosing to work with synthetic data instead.

February 15, 2024

Pseudonymization, synthetic data and anonymization

In pseudonymization, directly identifying characteristics are replaced by artificial identifiers, also called pseudonyms. These pseudonyms can only be linked to a person's true identity by using additional information, which is usually stored and secured separately. Pseudonymized data are better protected than directly identifiable personal data, but they remain traceable. Pseudonymized data are therefore still personal data, and the GDPR (in Dutch: AVG) fully applies to them.
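A minimal sketch of what pseudonymization can look like in practice, assuming a keyed hash (HMAC-SHA256) is used to generate the pseudonyms. The field name `citizen_id` and the sample records are hypothetical. The secret key plays the role of the separately stored "additional information": whoever holds it can recompute the pseudonym for a known identifier and thus re-link the data.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key: this is the "additional information" that must be
# stored and secured separately from the pseudonymized data set.
PSEUDONYM_KEY = secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a stable pseudonym (keyed HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters (64 bits) only to keep the output readable.
    return digest.hexdigest()[:16]


# Hypothetical source records with a direct identifier.
records = [
    {"citizen_id": "123456782", "diagnosis": "asthma"},
    {"citizen_id": "987654321", "diagnosis": "diabetes"},
    {"citizen_id": "123456782", "diagnosis": "eczema"},
]

# The direct identifier is replaced; all other fields are kept as they are.
pseudonymized = [
    {"pseudonym": pseudonymize(r["citizen_id"], PSEUDONYM_KEY), "diagnosis": r["diagnosis"]}
    for r in records
]

for row in pseudonymized:
    print(row)
```

Note that the same person always receives the same pseudonym, so the first and third records remain linkable to each other. That linkability is exactly why pseudonymized data still count as personal data.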

Synthetic data are newly generated data that reflect the characteristics and patterns of real data but do not come directly from individuals. Analyses can be performed and models can be trained on them as if they were personal data, without using any real personal data.
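To make the idea concrete, here is a deliberately simple sketch, assuming synthetic rows are generated by sampling each column independently from the value frequencies observed in the real data. The column names and records are hypothetical, and real synthetic-data generators typically model the joint distribution rather than independent columns.

```python
import random
from collections import Counter

# Hypothetical "real" records; in practice these would be the source personal data.
real_rows = [
    {"age_band": "30-39", "city": "Utrecht", "smoker": False},
    {"age_band": "30-39", "city": "Amsterdam", "smoker": True},
    {"age_band": "40-49", "city": "Utrecht", "smoker": False},
    {"age_band": "50-59", "city": "Rotterdam", "smoker": False},
]


def fit_marginals(rows):
    """Count how often each value occurs, separately per column."""
    return {col: Counter(row[col] for row in rows) for col in rows[0]}


def sample_synthetic(marginals, n, rng):
    """Draw n new rows, sampling every column independently from its value counts."""
    rows = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts)
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


rng = random.Random(42)  # fixed seed so the run is reproducible
synthetic_rows = sample_synthetic(fit_marginals(real_rows), n=5, rng=rng)
for row in synthetic_rows:
    print(row)
```

This approach preserves the distribution of each individual column but loses the relationships between columns (for example between age and smoking), and a synthetic row can still coincide with a real one. Production generators try to preserve those patterns while limiting such overlaps.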

In anonymization, the characteristics of personal data are removed or altered in such a way that the data can no longer be traced back to a specific individual. All directly identifying characteristics, such as names, addresses, and identification numbers, are removed or altered. After anonymization, the data set can no longer be traced to a specific individual, not even by the organization managing the data. The data are then no longer personal data, and the GDPR (AVG) does not apply.
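A minimal sketch of the two basic operations described here, assuming a hypothetical record layout: direct identifiers are removed, and remaining indirect characteristics are altered by generalization (an exact age becomes an age band, a full postcode becomes a postcode area). Generalization on its own does not guarantee that individuals can no longer be singled out, which is precisely the residual risk discussed below.

```python
# Fields treated as directly identifying in this hypothetical schema.
DIRECT_IDENTIFIERS = {"name", "address", "citizen_id"}


def generalize_age(age: int) -> str:
    """Replace an exact age with a ten-year band, e.g. 37 -> '30-39'."""
    decade = (age // 10) * 10
    return f"{decade}-{decade + 9}"


def anonymize(record: dict) -> dict:
    """Drop direct identifiers and coarsen the remaining indirect ones."""
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    kept["age"] = generalize_age(kept["age"])
    # Keep only the first two digits of a Dutch-style postcode ("3511 AB" -> "35").
    kept["postcode"] = kept["postcode"][:2]
    return kept


patient = {
    "name": "J. Jansen",
    "address": "Voorbeeldstraat 1",
    "citizen_id": "123456782",
    "age": 37,
    "postcode": "3511 AB",
    "diagnosis": "asthma",
}
print(anonymize(patient))  # {'age': '30-39', 'postcode': '35', 'diagnosis': 'asthma'}
```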

Residual risks when anonymizing

In practice, anonymized personal data are often assumed to be fully protected. Yet even anonymized data can still be at risk; we call these risks residual risks. When starting an anonymization process, it is wise to identify these residual risks and determine whether they can be prevented. The residual risks differ from case to case, but in general terms the following four can occur.

The biggest risk is re-identification: malicious parties may try to de-anonymize the data and recover the identities behind them. Using advanced techniques, or by combining the anonymized data with other data sets, they can make the data traceable to individuals again.
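The "combining with other data sets" route is often called a linkage attack. Below is a minimal sketch, assuming an anonymized health data set that still contains birth year, postcode, and sex, and a hypothetical public register that contains the same fields plus names. All people and values are fictional. Wherever the combination of shared fields matches exactly one person in the register, the anonymized record is re-identified.

```python
from collections import defaultdict

# Fields present in both data sets (hypothetical choice).
LINK_FIELDS = ("birth_year", "postcode", "sex")

# Names removed, but indirect characteristics retained.
anonymized_health = [
    {"birth_year": 1984, "postcode": "3511", "sex": "F", "diagnosis": "asthma"},
    {"birth_year": 1990, "postcode": "3512", "sex": "M", "diagnosis": "diabetes"},
    {"birth_year": 1990, "postcode": "3512", "sex": "M", "diagnosis": "eczema"},
]

# Hypothetical external source that does contain names.
public_register = [
    {"name": "Anna de Vries", "birth_year": 1984, "postcode": "3511", "sex": "F"},
    {"name": "Bram Jansen", "birth_year": 1990, "postcode": "3512", "sex": "M"},
    {"name": "Cas Bakker", "birth_year": 1990, "postcode": "3512", "sex": "M"},
]


def link(anonymized, external, keys):
    """Return (name, diagnosis) pairs for records that match exactly one person."""
    index = defaultdict(list)
    for person in external:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], row["diagnosis"]))
    return matches


print(link(anonymized_health, public_register, LINK_FIELDS))
# [('Anna de Vries', 'asthma')]
```

The two 1990 records are not re-identified here only because two people in the register share the same combination; an attacker with one extra data source could still tell them apart.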

The second residual risk is de-anonymization through contextual clues. The directly identifying characteristics have been removed from the data set, but so-called contextual clues can still make it possible to trace the data back to an individual. For example, combinations of demographic data or location data can still identify individuals.
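One established way to measure this risk is k-anonymity: group the records by their combination of indirect characteristics and look at the smallest group. If that group has size 1, at least one person is unique on those characteristics and can potentially be singled out. The sketch below assumes hypothetical columns; note that k-anonymity is only one measure and has known limitations (for example, when everyone in a group shares the same sensitive value).

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many records share each combination of indirect characteristics."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """The smallest group size: every record is indistinguishable from k-1 others."""
    return min(group_sizes(rows, quasi_identifiers).values())


def unique_records(rows, quasi_identifiers):
    """Records whose combination of characteristics occurs only once."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [row for row in rows if sizes[tuple(row[q] for q in quasi_identifiers)] == 1]


# Hypothetical anonymized records: no names, but demographic and location data remain.
data = [
    {"age_band": "30-39", "municipality": "Utrecht", "sex": "F", "income": 41000},
    {"age_band": "30-39", "municipality": "Utrecht", "sex": "F", "income": 52000},
    {"age_band": "40-49", "municipality": "Utrecht", "sex": "M", "income": 47000},
    {"age_band": "80-89", "municipality": "Vlieland", "sex": "M", "income": 23000},
]
qi = ("age_band", "municipality", "sex")

print(k_anonymity(data, qi))          # 1
print(len(unique_records(data, qi)))  # 2
```

An elderly man in a very small municipality is a typical example of a contextual clue: without any name in the data, that combination alone may point to a single person.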

Another common risk is "persistence". Even when data have been anonymized, a non-anonymized copy may still exist within the organization: for example on backup systems, in log files, or in other locations that the user or administrator may not be fully aware of. From there, the original data can still be recovered. The term also covers the risk that data which are well anonymized today can be made traceable again by future techniques.

A final risk is that even after the data have been anonymized and the identifying data removed, traces of those data may remain in systems, databases, log files, or elsewhere. These so-called residuals could potentially be used to obtain information about individuals or their activities.

Take measures to control or eliminate residual risks

It is therefore important to be aware that residual risks can remain even after anonymization. Map them carefully before and during the anonymization process, and take measures to address them. Using the most advanced, up-to-date anonymization techniques helps, but even these should be reassessed periodically: do they still offer sufficient protection, or have they been overtaken by new technological developments?

If the organization performs the anonymization itself, it is important to check for the persistence and residual-trace risks described above. Have the original data been "left behind" somewhere, or are there traces that would still allow identification? If the data have been anonymized for a particular use, such as research, while the data are still present in the organization in traceable form, strict access authorization and access control are necessary.
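Part of that check can be automated. Below is a minimal sketch, assuming you have a list of the original identifiers that were removed (hypothetical citizen numbers here) and a hypothetical application log. It flags lines that still contain one of those identifiers or anything that looks like an e-mail address. The regular expression is a rough approximation, not a complete e-mail validator.

```python
import re

# Rough pattern for e-mail-like strings (an approximation, not RFC-complete).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_residual_identifiers(lines, known_identifiers):
    """Return (line_number, value) pairs for identifiers that still appear in the text."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        # Exact occurrences of identifiers that should have been removed.
        hits = {identifier for identifier in known_identifiers if identifier in line}
        # Identifiers nobody listed, but which look like personal data.
        hits.update(m.group() for m in EMAIL_PATTERN.finditer(line))
        for value in sorted(hits):
            findings.append((line_no, value))
    return findings


# Hypothetical log left behind by an export job.
log_lines = [
    "2024-02-01 10:02:11 INFO export started",
    "2024-02-01 10:02:12 DEBUG processing citizen_id=123456782",
    "2024-02-01 10:02:13 WARN bounce for j.jansen@example.nl",
    "2024-02-01 10:02:14 INFO export finished",
]
removed_ids = {"123456782", "987654321"}

print(find_residual_identifiers(log_lines, removed_ids))
# [(2, '123456782'), (3, 'j.jansen@example.nl')]
```

Simple substring matching can produce false positives for short identifiers and will miss identifiers that were transformed (for example hashed or reformatted), and a real check would also have to cover backups and databases, not just log files.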

In short, anonymization offers greater protection for personal data, but it is not foolproof.

Want to know more about pseudonymizing and anonymizing personal data? Check out the course: Pseudonymizing and Anonymizing Personal Data.
