
Prof. Piet Daas: 'Harnessing opportunities of big data for official statistics'

Big data can help produce reliable and timely statistics, which is CBS' core business. Prof. Piet Daas, professor at Eindhoven University of Technology (TU/e) and senior researcher at CBS, is openly enthusiastic about the possibilities and ideas arising from cooperation within CBS' Big Data research program. 'We can do much more with big data than we are doing now,' he argued during his oration (inaugural lecture) at TU/e on May 26 of this year.

CBS June 22, 2023

News press release

New thinking

Every hour of every day, huge amounts of data are produced by electronic devices such as phones, and online by people themselves: big data. Upon his appointment in 2019 as associate professor of Big Data in Official Statistics at TU/e, Daas already stated that big data offers enormous opportunities for official statistics, provided some fundamental questions can be answered. Official statistics once started with simply counting people, jobs, companies and so on. Then came sampling, where the principle became: collect only the data you really need. 'When using big data, that approach is different,' Daas explains. 'To turn big data into official statistics, you actually need as much data as possible.' It is an example of a point where the 'old thinking' and the 'new thinking' clash. It is also one of the fundamental questions Daas deals with every day.

44 Pilots and prototypes

In traditional statistics, you work from a very firmly established theory. With big data-based statistics, it's the other way around: you start with the data, which you have in abundance. But there is often no theory (yet). In his oration, Daas mentions the 44 pilots and prototypes that have been developed in recent years by various organizations, but especially by CBS. Six of those projects have now gone into production. 'By no means everything we try turns out to be suitable or stable enough. For consumer price statistics, you can easily get information from the Internet about the development of the price of a carton of milk over a given period. That is direct observation; it is almost problem-free. But with derived forms of observation, it is more complicated to arrive at a stable production method.'

Penguins

Derived observation can go a long way in big data. To explain how that works, Daas uses the example of a study of penguins in his oration. 'If you want to map how many penguin colonies live in Antarctica, you can go on an expedition and count them. But you can also use satellite photos and zoom in on the poop the colonies leave behind. This indirect way of observing turns out to be faster and more accurate.' CBS, meanwhile, is doing something similar: for a study of online platforms, company websites are automatically searched for words that indicate such a service. 'So, based on big data - the Internet - we indirectly make a selection of companies that we think are online platforms. Then those companies receive a questionnaire to confirm our suspicion.' This approach is successful but also brings a challenge: the relationship between words on the website and the types of companies can change. 'So we need a method to check this and correct it if necessary.'
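The website screening described above can be illustrated with a minimal sketch. It is not CBS' actual method: the indicator words, the threshold, and the function names below are all hypothetical, chosen only to show the idea of flagging candidate companies by the words on their websites and then checking how many of those candidates a questionnaire confirms.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical indicator words; the article does not list the terms CBS uses.
INDICATOR_WORDS = {"platform", "marketplace", "sellers", "hosts", "bookings"}


def keyword_score(text: str, words: set[str] = INDICATOR_WORDS) -> float:
    """Share of the tokens in a website text that are indicator words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in words)
    return hits / len(tokens)


def flag_candidates(sites: dict[str, str], threshold: float = 0.05) -> list[str]:
    """Names of companies whose website text scores at or above the threshold.

    The threshold is an assumed value, not one taken from the article.
    """
    return sorted(
        name for name, text in sites.items() if keyword_score(text) >= threshold
    )


def confirmation_rate(
    flagged: list[str], survey_answers: dict[str, bool]
) -> Optional[float]:
    """Fraction of flagged companies that the questionnaire confirmed.

    Returns None when none of the flagged companies has answered yet.
    """
    answered = [survey_answers[name] for name in flagged if name in survey_answers]
    if not answered:
        return None
    return sum(answered) / len(answered)
```

Tracking the confirmation rate across survey rounds is one simple way to notice that the relationship between website words and company type is shifting: a falling rate suggests the word list no longer picks out the intended companies.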

Examples as inspiration

In his oration, Daas addresses, among other things, the quality of big data ('The wrong answers of ChatGPT show that big data sources do not always provide reliable information'), the combination of data sources and data science, and the search for the methods needed to use big data properly for statistics. 'Almost everything about using big data is still new. There is no handbook; we all still have to learn. That's why my oration contains so many examples. The theory-driven world of the university and the data-driven world of CBS come together nicely in my chair. But my oration is also an invitation to others: pick this up, get started and use the examples I give as inspiration. There is still much to discover.'

Proud of progress

That the use of big data raises many questions is interesting, according to Daas, but only if there are also answers. That, fortunately, is happening. Meanwhile, a method has been developed to correct for unstable observation in statistics about online platforms. 'There we have built in enough control by combining the big data source with a traditional way of collecting data, and we developed a model that can make the necessary corrections. That's a big step forward because that model is widely applicable. I'm very proud of that.' The use of satellite photos as a big data source has also proven applicable for official statistics, and its use can still be expanded. Daas is now thinking about cause-of-death statistics: 'That is a statistic that involves a lot of manual work, because it is often difficult to arrive at the correct results in a fully automated way. Data science could contribute there by meticulously analyzing the words used on the forms doctors fill out.' Progress is also being made in combining big data with other data sources. That will make the use of big data much more widely applicable in the future.
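The article does not describe the correction model itself. As a rough illustration of the general idea of combining a big data source with traditionally collected data, the sketch below uses a simple stratified estimate: survey answers from a sample of flagged and unflagged companies are used to correct the raw big-data count. The function name, its parameters, and the estimator are assumptions made for this example, not the model Daas refers to.

```python
def corrected_count(
    n_flagged: int,
    n_unflagged: int,
    sample_flagged: list[bool],
    sample_unflagged: list[bool],
) -> float:
    """Survey-corrected estimate of the number of true online platforms.

    n_flagged / n_unflagged: population counts from the big data source.
    sample_flagged / sample_unflagged: survey outcomes (True = the company
    really is an online platform) for a sample drawn from each group.
    """
    if not sample_flagged or not sample_unflagged:
        raise ValueError("both survey samples must be non-empty")
    # Share of flagged companies that the survey confirms.
    precision = sum(sample_flagged) / len(sample_flagged)
    # Share of unflagged companies that turn out to be platforms after all.
    miss_rate = sum(sample_unflagged) / len(sample_unflagged)
    return n_flagged * precision + n_unflagged * miss_rate
```

For example, with 1,000 flagged and 9,000 unflagged companies, a flagged sample in which 3 of 4 are confirmed, and an unflagged sample in which 1 of 10 turns out to be a platform, the estimate is 1,000 × 0.75 + 9,000 × 0.1 = 1,650. Re-estimating both rates in each survey round lets the correction follow changes in how well the website words identify platforms.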

Moving forward together in the data world

Daas delivers his oration before a room full of family (including his parents), friends, and colleagues from both CBS and TU/e. 'This subject is so broad that you can't do it alone. I do this together with my colleagues in the Big Data research program. We often start from scratch, from a question from a CBS department about the use of data science. Together we come up with many different ideas, precisely because our group is so mixed, with mathematicians, psychologists and data scientists. That is great and necessary to move forward in this data world.'
