PONT Data&Privacy
The AVG is an opportunity! Time to clean out the attic

One of the key principles of the General Data Protection Regulation (AVG) is data minimization.(1) Personal data must not be kept longer than is strictly necessary to accomplish the purpose for which it was collected.(2) That is quite a demand. Just look at your own attic or garage: throwing things away is not in our nature.

25 November 2019

"Who knows, maybe I'll need it again" and "cleaning up is boring." You can already hear yourself and your children saying it. The same is true for businesses, and certainly for IT professionals. Companies are focused on accountability: for reports, an audit or a lawsuit. With systems it is no different. IT people are focused on being able to reproduce a situation and, above all, on not losing any data.

The AVG therefore calls for a movement in the opposite direction: systematically discarding data. Throwing away is one thing; doing it systematically is another. The latter is vital, because throwing away too much personal data qualifies as a "data breach", which is itself an offense under the AVG.

Designing a "system" for discarding data consists of three steps:

  • Retention periods. Determine how long which personal data must be kept.

  • Data inventory. Inventory what personal data is kept where and for what purpose.

  • IT and processes. Implement functionality, tools, processes and procedures to erase or anonymize personal data in a timely manner.

Step 1: Retention periods

No longer than strictly necessary. That is the criterion. But what does this mean concretely? There are a few basic principles that can help in drawing up a so-called "retention schedule" (see Figure 1). The starting point is that the necessity criterion can largely be traced back to the legal basis for collecting data. The main bases are:

  • meeting legal obligations,

  • entering into and performing a contract, and

  • consent of the data subject (employee or customer).

Figure 1: sample retention schedule

Meeting legal obligations
By setting the retention period for certain categories of data to the legal minimum or maximum retention period (for example, the seven-year tax retention period for financial data), an important part of the retention schedule can be completed.

Entering into and performing a contract
The retention period is related to the role of the data subject (the natural person whose data is processed). As a rule, every company deals with two categories of data subjects: employees and customers. Customers come in two forms: consumers and companies. The latter category does not involve personal data, unless the legal form is a sole proprietorship.

Employees and customers can be further subdivided by role. Different retention periods apply to the personal data of job applicants, employees and former employees. The same applies to prospects, customers and former customers. As long as there is an employment or service agreement (or a subscription or membership), there are grounds for retaining (almost all) data, such as old orders, changes in customer data, etc. After all, it is also in the customer's interest to have a complete 'customer picture'.

Data subject's consent
As long as you have the data subject's consent, you may process, and thus retain, the data. This is and remains a shaky basis: consent can be withdrawn, and then the ground for processing falls away. In addition, under the AVG data subjects have a number of rights (such as the right to be forgotten) that can shorten the retention period.
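The three bases above can be captured in a machine-readable retention schedule along the lines of Figure 1. Below is a minimal sketch in Python; the categories and most periods are hypothetical examples, with only the seven-year fiscal retention period for financial data taken from the text.

```python
from datetime import timedelta

# Hypothetical retention schedule: category -> (legal basis, retention period).
# Only the seven-year fiscal period for financial records comes from the text;
# the other categories and periods are illustrative, not legal advice.
RETENTION_SCHEDULE = {
    "financial_records":   ("legal obligation", timedelta(days=7 * 365)),
    "employment_contract": ("contract",         timedelta(days=2 * 365)),
    "marketing_profile":   ("consent",          timedelta(days=365)),
}

def retention_for(category: str) -> timedelta:
    """Look up the retention period for a category of personal data."""
    basis, period = RETENTION_SCHEDULE[category]
    return period
```

A schedule in this form can later drive the automatic cleaning described in step 3, instead of living only in a policy document.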

Step 2: Data inventory

What's in the attic? This is an enormous task, especially given the variety of data carriers. These can be classified along two dimensions (see Figure 2):

  • Structured versus unstructured.

  • Digital versus physical.

Figure 2: Types of data carriers containing personal data

Digitally structured
In the upper left quadrant are the applications that support core business processes: front-, mid- and back-office systems, CRM, output management and more. Most of these applications have a database that stores data in structured, searchable records. A point of attention is the presence of user databases or log files containing personal data. Another concern is data exchange between applications through "intermediate files" that often remain permanently on the file share.

Digital unstructured
In the upper right quadrant are applications and systems with unstructured data: the Document Management system, file share, intranet and email. A side note: although the Document Management system contains unstructured data (scans, PDFs, Word documents, etc.), it is indexed and therefore searchable. Unstructured data is difficult to clean; there is no magic button in Outlook or File Manager that cleans all emails or documents relating to client X.

Physically structured
In the lower left quadrant are the physical archives (in boxes or binders) with, if you are lucky, an index of the contents. Cleaning individual (customer) files in these archives is very laborious: periodically retrieving and going through all the boxes or binders, filtering and discarding what needs to go, putting back what needs to be retained, etc. It often also turns out that boxes in an (external) archive lack sufficient identifying features to retrieve specific records at all. This makes it impossible to meet retention deadlines.

Physically unstructured
In the lower right quadrant is an amalgam of documents in desk drawers, stacks in closets, etc. Perhaps the easiest category: either do nothing with them or discard them wholesale.

Step 3: IT and processes

There are several ways to clean up the attic, some more systematic and structural than others. The approach below is a fundamental one, in which cleaning or anonymization is automated as much as possible.

The approach consists of the following components:

  • Development of cleaning or anonymization functionality.

  • Digitization and destruction of physical archives.

  • Reorganization and automatic scanning of the file share and intranet.

  • Automatic cleaning of emails containing personal data.

Developing cleaning or anonymization functionality
Developing solid functionality is easier said than done. Most organizations have multiple (core) applications that together form "chains" in which (personal) data are processed.

Applications must be considered in context. It makes no sense to anonymize personal data if it is reset overnight thanks to synchronization with another application. Or if anonymization results in a failure because the data no longer meet the validation rules set by the technical interface with another application.

When developing cleaning or anonymization functionality, three concerns are important:

  • Data selection verification and validation.

  • Role and/or status-dependent retention periods.

  • Uniform anonymization strategy.

Data selection verification and validation
Gone is gone! De-identification of personal data is irreversible. That makes it a tense step, especially when business-critical customer data is involved. How do you know you are not throwing away too much (or too little)? The answer is a procedure with checks and balances. Records to be de-identified are selected automatically, and the selection functionality is included in the regression test of future releases.

The selection is then checked manually and visually according to the four-eyes principle, and individual records can be excluded (e.g., in the event of a legal dispute). The check uses a plausibility report that indicates how many records should normally be cleaned in this run. Because this report is produced via a different route, it "triangulates" the outcome.
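The select-then-triangulate procedure can be sketched as follows. All names are hypothetical, and the 10% tolerance on the plausibility check is an assumption for illustration, not a figure from the text.

```python
from datetime import date, timedelta

def select_for_anonymization(records, today, retention, legal_holds):
    """Select records whose retention period has expired, excluding
    records under a legal hold (e.g., an ongoing legal dispute)."""
    return [r for r in records
            if today - r["ended"] > retention
            and r["id"] not in legal_holds]

def plausibility_check(selected, expected_count, tolerance=0.1):
    """Triangulate: an independently produced plausibility report says
    roughly how many records should be cleaned in this run."""
    return abs(len(selected) - expected_count) <= tolerance * expected_count

# Illustrative data: "ended" is the date the customer relationship ended.
records = [
    {"id": 1, "ended": date(2010, 1, 1)},
    {"id": 2, "ended": date(2019, 6, 1)},
    {"id": 3, "ended": date(2011, 3, 1)},
]
selected = select_for_anonymization(
    records, today=date(2019, 11, 25),
    retention=timedelta(days=7 * 365),
    legal_holds={3})  # record 3 manually excluded: legal dispute
```

Here only record 1 survives the selection: record 2 is too recent and record 3 is excluded by a hold, after which the count is compared against the plausibility report before anything irreversible happens.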

Role or status-dependent retention periods
The retention period of personal data in many cases depends on the role of the data subject, or more precisely, on when that role changed (e.g., when an employee left employment).

This means that applications that contain personal data, but do not know when the role change occurred and are not synchronized with an application that does "know," cannot independently clean or anonymize this data. These applications should receive a signal from the core system that does know the status.
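Such a signal could be as simple as a role-change event from the core system, from which the receiving application derives its own erasure date. A minimal sketch, assuming a core HR system as the sender; the field names and the two-year period are illustrative.

```python
from datetime import date, timedelta

# Hypothetical role-based retention periods held by the satellite application.
ROLE_RETENTION = {"former_employee": timedelta(days=2 * 365)}  # illustrative

def handle_role_change(event, schedule):
    """Compute the erasure date from a role-change event emitted by the
    core system, which is the only system that knows the subject's status."""
    retention = schedule[event["new_role"]]
    return event["changed_on"] + retention

# Example event: an employee left employment on 25 November 2019.
event = {"subject_id": 42, "new_role": "former_employee",
         "changed_on": date(2019, 11, 25)}
erase_on = handle_role_change(event, ROLE_RETENTION)
```

The design point is that the satellite application never decides on its own when a role changed; it only applies a retention period to a date it is told.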

Uniform anonymization strategy
Discarding or anonymizing relationship records has the same effect: data are irreversibly de-identified. In general, anonymization is preferable. Anonymized data can still be used for analysis purposes and deletion of relationship records can lead to a corrupt database.

However, anonymization only works if all applications use the same strategy. If one application retains zip code and the other retains date of birth, the individual can still be identified by linking the two records together via a key (e.g., relationship or contract number). This is essentially pseudonymization, rather than anonymization.

Thus, the choice of a unified anonymization strategy is important. It is recommended to choose the strategy in such a way that it can also be used to pseudonymize development and test environments. IT employees will then no longer have insight into personal data (another requirement under the AVG).
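A uniform strategy can be centralized in one shared module that every application in the chain applies identically. The sketch below uses hypothetical field names; note that the record key is deliberately kept intact so that synchronization interfaces and referential integrity keep working.

```python
# Shared anonymization strategy, applied identically by every application.
# Keeping even one quasi-identifier (zip code, birth date) per application
# would only pseudonymize: records could be re-linked via the contract number.
ANONYMIZE = {
    "name":        lambda v: "ANONYMIZED",
    "zip_code":    lambda v: None,
    "birth_date":  lambda v: None,
    "contract_no": lambda v: v,  # keep the key: referential integrity holds
}

def anonymize(record: dict) -> dict:
    """Apply the shared strategy; unknown fields pass through unchanged."""
    return {field: ANONYMIZE.get(field, lambda v: v)(value)
            for field, value in record.items()}

customer = {"name": "J. Jansen", "zip_code": "1234 AB",
            "birth_date": "1970-01-01", "contract_no": "C-001"}
anonymized = anonymize(customer)
```

The same module could, as the text suggests, be reused to pseudonymize development and test environments, so that IT staff no longer see real personal data.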

Digitization and destruction of physical archives
It sounds contradictory: in order to clean up personal data, it must first be digitized and centralized. Digitization is needed to dispose of documents automatically, and centralization makes it possible to set up the cleaning efficiently, driven by a signal from the back-office or CRM application.

Digitization and centralization require the implementation of a Document Management system combined with a "scanning street". This is a project in itself, and it would go too far to describe here everything it involves. What is certain is that there are many choices to be made. Scan the current physical archives, or let them age out? Shield 'dead archives' via Compliance Manager? All choices that must be made and documented carefully.

Digitization of work processes and archives is the big opportunity the AVG presents. Frankly, there is no other solution either: as already mentioned, manually cleaning physical archives is nearly impossible. The exception is perhaps financial archives, whose retention period depends not on the role or status of the data subject but on the age of the data.

Reorganization and automatic scanning of the file share and intranet
The file share and intranet are, almost inevitably, hotbeds of files containing personal data, ranging from letters, mailing files and payment runs to all kinds of reports. The solution is a combination of migrating correspondence (e.g., letters and mailings) to the Document Management system and reorganizing other files (e.g., payment runs, reporting files) into annual folders. The latter can be cleaned manually once they reach a certain age.

A useful tool for reorganizing the file share and intranet is a scanning tool that continuously scans these systems for files containing personal data. The tool searches files for "strings" that match a certain pattern (e.g., BSN, IBAN, zip code, etc.). Such tools can then provide directory owners with periodic reports of the 'hits' found. In this way the cleaning is anchored in the organization.
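At its core, such a scanning tool boils down to pattern matching. A simplified sketch follows: the patterns are deliberately crude (the BSN "11-test" checksum is real, but a production tool would cover many more formats and suppress far more false positives).

```python
import re

# Simplified patterns: a 9-digit BSN candidate and a Dutch-shaped IBAN.
BSN_RE  = re.compile(r"\b\d{9}\b")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z]{4}\d{10}\b")

def is_valid_bsn(digits: str) -> bool:
    """Dutch BSN '11-test': weighted sum (weights 9..2, then -1)
    must be divisible by 11."""
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    return sum(w * int(d) for w, d in zip(weights, digits)) % 11 == 0

def scan(text: str) -> list:
    """Return the hits found, to be reported to the directory owner."""
    hits = [("BSN", m) for m in BSN_RE.findall(text) if is_valid_bsn(m)]
    hits += [("IBAN", m) for m in IBAN_RE.findall(text)]
    return hits
```

Running such a scan per directory and mailing the hits to the directory owner is what turns a one-off cleanup into a recurring, anchored process.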

Automatic cleaning of emails containing personal data
For many employees, email is more than a communication tool; it is also a personal archive and knowledge system. Automatically throwing away all emails older than X years feels like a painful loss. But manually cleaning thousands of emails is not a solution either: it is labor-intensive, unverifiable and therefore not future-proof. So what then?

To begin with, it is important to archive all email correspondence with customers in the Document Management system. In addition, it is possible to implement an email solution that, similar to the scanning tool for the file share and intranet, recognizes emails containing personal data and stores them in a separate repository, which can then be cleaned periodically. An added advantage is that emails containing personal data can be sent securely (thus preventing data breaches).

Conclusion

Of all the challenges the AVG poses, cleaning is (by far) the biggest. It is a lot of work, and there is no "magic bullet". The trick is to do it in such a way that the attic stays tidy even after the implementation project ends.

Herein also lies the great advantage. Addressing the problem fundamentally and in depth forces the organization to digitize and optimize its processes. Simplify! Simplify! Gone is the duplicate storage of correspondence on the file share and in the Document Management system. Gone is the back-and-forth emailing of files containing personal data. No more loose binders and piles of paper in cabinets and drawer units.

The understanding of information housekeeping and system architecture required for this is of great value beyond the AVG. It can serve as a springboard for innovation, efficiency gains and improved customer service.

Footnotes

(1) See Article 5(1)(c) AVG.
(2) See Article 5(1)(e) AVG.
