
Big data in surveillance: great care needed

From the self-scanning checkout at the supermarket to the checking of your tax return, algorithms are at work behind the scenes everywhere. The more data you feed such a set of mathematical instructions, the smarter it gets. And that is exactly where the shoe starts to pinch for regulators: how do you keep a grip on the use of that data? Lisette van der Hel, professor of Effectiveness of Government Supervision at Nyenrode Business University, and big data expert Nart Wielaard identify pitfalls and points to watch.

October 11, 2019

authors: Hel, Lisette van der & Wielaard, Nart

'Data technology is still in its infancy among Dutch regulators. They would like to make more use of the many new possibilities, but that cannot be unlimited,' says Lisette van der Hel. 'If a customer of commercial companies such as Bol.com and Coolblue finds the "Others also bought" suggestion irritating, no real harm is done. But regulators cannot take a scattergun approach and have to be particularly careful in how they use data.'
The use of data and algorithms can take supervision to a higher level, Nart Wielaard believes. 'Provided you ensure that both the data you put in and the algorithms that work with them are sound. In practice, that is still quite a challenge. But it can be done. We tend to think of algorithms as a black box, yet people's heads are black boxes too, and we cannot look inside them. The workings of algorithms, on the other hand, can be made transparent.'

Van der Hel questions this. 'Algorithms can also be rather opaque. At a certain point people can no longer follow the technology entirely, and the question arises how the results came about. For a regulator it is important to be able to substantiate its choices, for example in court. Errors can also creep into the models used, causing citizens or companies to be wrongly selected for supervision, or wrongly left out of it. Can an algorithm take sufficient account of citizens' personal circumstances? That is an important question for supervisors whose strategy centers on influencing behavior. Think of the Tax and Customs Administration or the Dutch Food and Consumer Product Safety Authority (NVWA). In principle, an algorithm can make risk selection more efficient on the one hand and enable better personal service on the other.'

Validation

Humans themselves pose a great risk of errors in data use, Wielaard emphasizes. 'We make the systems. A fool with a tool is still a fool. A data scientist can only do his job well if, in close contact with the client, he learns to understand properly which conclusions from the data are relevant and which are not.' Models learn from past data. 'When those data are not used properly, that can lead to wrong predictions,' Van der Hel adds. 'Suppose that in the past the Tax Administration carried out many checks on donations to charities. The subject of donations then appears disproportionately often in the data collected. Based on those data, a model might conclude that the Tax Administration should pay a lot of attention to donations, while other subjects for which less data has been collected might in fact be riskier. Data quality and model validation are therefore essential in supervision.'
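To make that pitfall concrete, here is a minimal Python sketch with invented figures (the topics and counts are hypothetical, not taken from any real Tax Administration data). It shows how the raw share of past findings points at the heavily audited subject, even though the less audited subject has the higher error rate per audit:

# Hypothetical audit history; topics and counts are invented for illustration.
past_audits = {
    # topic: (audits carried out in the past, errors found)
    "donations":         (8000, 1200),  # heavily audited, so many findings
    "business_expenses": (500, 150),    # rarely audited, so few findings
}

total_findings = sum(errors for _, errors in past_audits.values())

for topic, (audits, errors) in past_audits.items():
    share_of_findings = errors / total_findings
    error_rate = errors / audits
    print(f"{topic}: {share_of_findings:.0%} of all findings, "
          f"error rate per audit {error_rate:.0%}")

# Donations account for roughly 89% of all findings but only a 15% error rate,
# while business expenses show a 30% error rate. A model trained on the raw
# findings would keep steering attention towards donations.

Model validation of the kind Van der Hel argues for is what catches exactly this sort of self-reinforcing loop.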

Just as important as data quality is the context in which the data were collected; it strongly influences how they should be interpreted. This is where things often go wrong, according to Wielaard. 'A study of boating accidents showed that people wearing life jackets were more likely to drown. That seems very strange at first, but the conclusion that a life jacket is not safer is of course incorrect. People often do not put on a life jacket until conditions are already very bad. Seen that way, the data do make sense.'
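The same point can be illustrated with a small Python sketch using invented counts (all numbers are hypothetical): the raw comparison makes life jackets look dangerous, but splitting the same data by the conditions in which people were sailing reverses the picture:

# Hypothetical boating-accident counts, invented for illustration.
incidents = [
    # (conditions, wore_life_jacket, people_involved, drownings)
    ("calm",  False, 800, 8),
    ("calm",  True,  100, 0),
    ("rough", False,  50, 15),
    ("rough", True,  250, 50),
]

def drowning_rate(rows):
    people = sum(r[2] for r in rows)
    drowned = sum(r[3] for r in rows)
    return drowned / people

with_jacket = [r for r in incidents if r[1]]
without_jacket = [r for r in incidents if not r[1]]
print(f"overall: with jacket {drowning_rate(with_jacket):.1%}, "
      f"without {drowning_rate(without_jacket):.1%}")

for cond in ("calm", "rough"):
    w = [r for r in incidents if r[0] == cond and r[1]]
    wo = [r for r in incidents if r[0] == cond and not r[1]]
    print(f"{cond}: with jacket {drowning_rate(w):.1%}, "
          f"without {drowning_rate(wo):.1%}")

# Overall the jacket wearers drown far more often (about 14% versus under 3%),
# simply because jackets are mostly worn in rough conditions; within each
# condition the jacket wearers do better (0% vs 1% calm, 20% vs 30% rough).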

Drones

Van der Hel endorses the importance of context for supervision and gives an example. 'The Tax Administration used existing CCTV footage from highway cameras for an investigation into the private use of company cars. The courts did not allow that. Those highway cameras are meant to measure speed, and the Supreme Court ruled that using the footage violated the private life of company-car drivers without an adequate legal basis. A regulator may therefore not simply use data for a purpose other than the one for which it was collected. An example in which new technology does work well is the NVWA's use of drones to check on professional fishermen.'

Privacy quickly comes into play when you collect data and combine it with algorithms, Van der Hel stresses. 'That requires care. It is therefore important that supervisors set up their processes at the central level in a well-considered way, so that they can guarantee data quality and the sound use of algorithms: are they programmed correctly, are privacy rules observed, and do they avoid discrimination? Clear communication about this is very important. There is also a task for the government: to ensure through legislation that supervisors handle the many possibilities of data technology properly.'


source: Toezine
