When setting retention periods for applications and databases, the focus is often on content data. However, log data is also personal data; after all, it records user actions. Many organizations are unaware of the nature or even the existence of this data. As a result, privacy safeguards and retention periods for log data tend to be overlooked. In this article, I highlight a number of concerns from a privacy perspective, including the retention period of log data.
In computer systems, events that are important for security or for analyzing disruptions are recorded in a log file. The information recorded for such an event includes the date and time of the event, the user name and the type of event. Examples of logged events are a successful or failed login to an application, error messages generated by the system, and user actions such as accessing a medical record.
The Dutch Data Protection Authority (Autoriteit Persoonsgegevens, AP) regards logging as an important way to check whether users exceed their authorization when accessing personal data. We use the example of the HagaZiekenhuis here, but this applies to all organizations where sensitive personal data are processed, such as banks, web stores, municipalities, educational and healthcare institutions.
In the AP's reasoning for the 2019 fine imposed on the HagaZiekenhuis, logging played an important role. For practical reasons, many employees had access to patient data. The Electronic Patient Record logged access to medical records. In April 2018, a whistleblower reported that medical records of a Dutch celebrity were circulating in the corridors of the hospital. Using the log data, it was possible to determine which employees had accessed the records and in which cases that access was unauthorized.
The AP expects organizations not only to log access to data, but also to regularly analyze the log data so that wrongdoing comes to light quickly and not only after a whistleblower has reported it. A sample of six patient records annually, as at the HagaZiekenhuis, is not enough in this regard. An "intelligent" analysis is needed, selecting for situations that deviate from the desired use of the system. This could include, for example, files that are accessed by more employees than usual. In this way, the incident at the HagaZiekenhuis might have been detected even without a whistleblower.
Not every alert will turn out to involve unauthorized access, but discussing notable situations with employees also contributes to awareness.
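The kind of "intelligent" selection described above, records accessed by more employees than usual, can be sketched in a few lines of Python. This is a minimal illustration and not the method of the AP or of any hospital: the function name, the input format (pairs of user and record ID) and the thresholds `factor` and `min_users` are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median


def flag_unusual_records(events, factor=3.0, min_users=5):
    """Return {record_id: distinct_user_count} for records that stand out.

    events: iterable of (user, record_id) pairs taken from an access log.
    A record is flagged when it was opened by at least `min_users` distinct
    users AND by more than `factor` times the median number of distinct
    users per record. Both thresholds are illustrative assumptions.
    """
    users_per_record = defaultdict(set)
    for user, record_id in events:
        users_per_record[record_id].add(user)

    counts = {rec: len(users) for rec, users in users_per_record.items()}
    if not counts:
        return {}

    typical = median(counts.values())
    return {
        rec: n
        for rec, n in counts.items()
        if n >= min_users and n > factor * typical
    }
```

A flagged record is only a reason to look more closely; whether the access was actually unauthorized still has to be established afterwards.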
Log data is also personal data. The people whose actions are logged have a right to privacy as well. It is therefore important that log data is not freely accessible. This can be achieved by limiting access to the log data to those who need to analyze it. In addition, automated reports can be used to analyze the log data, showing only the information necessary for verification. Such reports contain little or no information that can be traced back to individuals, or reveal it only for notable situations. Furthermore, it is important not to keep log data longer than necessary. I discuss this further in the remainder of this article.
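One way to build a report that shows names only for notable situations is to replace user names with keyed pseudonyms by default. The sketch below assumes log events are dictionaries with the hypothetical fields `user`, `timestamp` and `event_type`, and that a secret key is held only by the people authorized to re-identify users; none of these names come from a specific logging product.

```python
import hashlib
import hmac


def pseudonym(user_name: str, key: bytes) -> str:
    """Derive a stable pseudonym from a user name with HMAC-SHA256.

    Without the key, the pseudonym cannot be recomputed from a list of
    known user names.
    """
    digest = hmac.new(key, user_name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user-" + digest[:12]


def build_report(events, key, flagged_users=frozenset()):
    """Return report rows in which only flagged users appear by name.

    events: iterable of dicts with 'user', 'timestamp' and 'event_type'
    (hypothetical field names). All other users are shown under a pseudonym.
    """
    report = []
    for event in events:
        name = event["user"]
        shown = name if name in flagged_users else pseudonym(name, key)
        report.append(
            {
                "user": shown,
                "timestamp": event["timestamp"],
                "event_type": event["event_type"],
            }
        )
    return report
```

Pseudonymized log data remains personal data, so access restrictions and retention periods still apply to such a report.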
In information security, log data is used to detect breaches and attempted breaches of information systems, and to analyze afterwards what happened. In the investigation of the successful ransomware attack on Maastricht University in late 2019, log data was an important source of information. Log data used for this purpose also includes technical events that are not the direct result of actions by users.
Personal data should not be kept longer than necessary for its purpose. This also applies to log data. That requirement is at odds with the interests of information security, where one would prefer to keep log data as long as possible, even though information security is itself an important safeguard of privacy.
Let me explain this in more detail. From a privacy perspective, the data could be deleted after the periodic, say monthly, analysis. From an information security perspective, it is important to keep log data for as long as possible. This was evident in the ransomware attack on Maastricht University. Afterwards, it turned out that the hackers had been active on the Maastricht University network for several months. This could only be determined because log data had been saved. According to an IBM research report, it takes an average of 206 days, or nearly seven months, for a data breach to be noticed. That figure is an average, and it covers only data breaches that were actually detected. At a U.S. health insurer, it took nine years for the data breach to be discovered. Against that background, even a 10-year retention period for log data is plausible.
In practice, the retention period of log data is often not considered. In the past, limited storage capacity was a reason to delete log data regularly, but today storage is cheap and, because it sits in the cloud, largely out of sight.
The AP does not specify how long log data may be kept. Each organization will have to make its own assessment. What reference points do we have for a balanced retention period for log data? I'll mention a few:
Under Article 33 paragraph 5 of the WBP Exemption Decree (no longer in force), the storage of log data had to be reported if it was kept longer than six months.
According to the Basel II Accord, which applies to banks, log data must be kept for three to seven years.
The Government Information Security Baseline (BIO) only mentions a three-year period in the event of an incident. The BIO's predecessor for municipalities, the BIG, assumes a minimum of three months.
The Information Security Service (IBD) for municipalities makes the retention period for logging dependent on the confidentiality classification of the system in question, from six months for business confidential information to seven years for classified information.
The Minister of Health, Welfare and Sport has established by ministerial decree that log data from healthcare records must be kept for at least five years.
The default setting for log file retention is one year for Office 365.
The default setting for log file retention is six months for Google's G Suite office automation environment and one to 13 months for IaaS cloud services, depending on the type of log file.
These reference points show that it may make sense not to set a single retention period, but to make it dependent on the type of log data. In general, six months can be seen as a minimum. If we want to keep log data longer than the detection time of most data breaches, we arrive at one year. This is also a period we often encounter in practice. Longer retention periods require a more detailed justification, with a careful balancing of security needs and user privacy. That consideration will include the sensitivity of the data and the need for longer-term forensic investigations. In addition, long-term storage of log data must be matched by efforts to detect a data breach as quickly as possible. This can be done by analyzing log data not only monthly, but also daily or in real time. Dedicated Security Operations Centers (SOCs) exist for this purpose; they monitor systems 24 hours a day, 365 days a year, and intervene when necessary.
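A retention period that depends on the type of log data can be expressed as a small lookup table plus a purge step. The sketch below is only an illustration: the category names, the field names `log_type` and `logged_at`, and the chosen periods are assumptions, and each organization would substitute the outcome of its own assessment.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical log categories and retention periods, for illustration only.
RETENTION = {
    "application_access": timedelta(days=183),         # roughly six months
    "technical": timedelta(days=365),                  # one year
    "medical_record_access": timedelta(days=5 * 365),  # roughly five years
}
DEFAULT_RETENTION = timedelta(days=183)  # fallback for unlisted log types


def is_expired(log_type, logged_at, now):
    """Return True when an entry is older than the period for its log type."""
    return now - logged_at > RETENTION.get(log_type, DEFAULT_RETENTION)


def purge(entries, now=None):
    """Return only the entries that are still within their retention period.

    entries: iterable of dicts with 'log_type' and a timezone-aware
    'logged_at' datetime (hypothetical field names).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        entry
        for entry in entries
        if not is_expired(entry["log_type"], entry["logged_at"], now)
    ]
```

Keeping the periods in one table also makes the choices easy to document and to review together with the rest of the logging policy.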
This article focuses on the dilemma of retention periods. These are part of the logging policy. From a privacy perspective, the following additional points also deserve attention in that policy:
How is logging set up: what events and what information are logged? Here it is important to establish in outline, per event type, why logging is necessary and how it is evaluated. Logging user activities without evaluating them will generally not be proportionate.
How is logging evaluated in a privacy-friendly way? In periodic log analysis, it is not desirable to analyze the behavior of individual employees or to have a log analyst browse through the log data looking for anomalous behavior. It is more privacy-friendly to create reports that filter for undesirable behavior, so that only the names of employees showing this behavior come up. Conversations can then be held with them.
When fraud is suspected, for example, it may nevertheless be justified to analyze the activities of an individual user. This would require appropriate safeguards.
The entire logging policy, including the above points, constitutes a regulation regarding the processing of employees' personal data. This means that the Works Council has a right of consent under Article 27, parts k and l, of the Works Councils Act.
In summary, it is important to examine where log data is recorded, how the logging is set up, what is done with the log data and how long it is kept. Then make deliberate choices and document them in policies, coordinated with the Works Council.
Logging is a complex subject because technology and privacy laws and regulations come together here. It is therefore important to have the right expertise at the table when developing and implementing logging policies.