Big Data: the end of causality and the beginning of correlation | Pirelli


To Viktor Mayer-Schönberger, Big Data is not just a new technology but, in his own words, “a shift in mindset”: humans, he says, are gradually shifting from thinking in terms of causality to thinking in terms of correlation, and from analysing small amounts of ordered data to processing huge amounts of messy data. A professor of Internet Governance and Regulation at Oxford University, Mayer-Schönberger published the visionary book Big Data: A Revolution That Will Transform How We Live, Work, and Think in 2013, together with Kenneth Neil Cukier, the Data Editor of The Economist. The book was a bestseller on both the New York Times and the Wall Street Journal lists. Since then, he tells Pirelli World, many things have changed and the role of Big Data has proved even more important, especially for business and services.

Two years have passed since you published your book. Has the impact of Big Data increased further since then?
Two years ago, not many people had heard about Big Data. That has certainly changed. But people are even more uncertain about what Big Data actually is. Countless companies, consultants and pundits have hyped technical tools and promised Big Data heaven. Quite a number of them are out to make money fast, but pushing for ever more Big Data tools obscures the core of Big Data: it is not a new technology but a shift in mindset, appreciating and embracing a new perspective on reality that will yield innovative insights generating value for companies and society.

Which business and research fields have been most affected by the rise of Big Data?
So far, Internet services have been the main beneficiaries of Big Data. Amazon's system of product recommendation is driven by Big Data analysis and is said to account for 30 percent of Amazon's revenues. That is massive. Similarly, Internet search (whether through Google or Bing), language translation (either of written text, as with Google Translate, or of spoken words, as with Skype Translator), and automatic correction of typing errors have been used by hundreds of millions of people. And we have all watched self-driving cars stop in time to avert accidents – that, too, is Big Data. But the really big areas of Big Data disruption lie elsewhere: health care, education and learning, as well as mobility and transportation. These areas will change beyond recognition thanks to Big Data.

How is the rise of Big Data affecting urban life?
Cities grow relatively slowly. Big Data moves much faster. So the impact of Big Data on shaping cities is not yet prominent. But it will be in a few years: Big Data is already being used to predict public transport needs and thus drives public transport investments in major cities; it is used in law enforcement and policing, and it will soon arrive in schools and education. The hope is that Big Data in the urban context will lead to better public policy decisions that enable more people to reach their potential and lead better lives. We are just at the beginning – but within five years changes will already be obvious. Just take road traffic and parking in cities: currently individual cars are utilised only 4 percent of the time. That is a tremendous waste of resources. If cars were self-driving, there would be much less need for people in cities to own their own cars. They could order one whenever they needed one, and unlike taxis such self-driving cars would be much cheaper to use, and could be utilised day and night, seven days a week. Some predict that this might decrease car traffic in cities by 30 percent or more, and reduce the need for parking space by at least the same amount. So cities could get greener, airier and more people-friendly as a result of Big Data changing urban transport and mobility.
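
To put the 4 percent figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not a calculation from the interview: the 4 percent utilisation comes from the answer above, while the 40 percent utilisation for shared self-driving cars is a purely hypothetical value chosen for the example, and the function names are made up. The sketch also ignores peak-hour demand, so the resulting ratio is an optimistic lower bound rather than a forecast.

```python
HOURS_PER_DAY = 24


def hours_in_use(utilisation: float) -> float:
    """Average hours per day a car is in use, given its utilisation share."""
    return utilisation * HOURS_PER_DAY


def fleet_ratio(private_utilisation: float, shared_utilisation: float) -> float:
    """Share of today's cars needed to cover the same total driving hours
    if each car were used at the (hypothetical) shared utilisation rate."""
    if not 0 < shared_utilisation <= 1:
        raise ValueError("shared_utilisation must be in (0, 1]")
    return private_utilisation / shared_utilisation


PRIVATE_UTILISATION = 0.04  # figure quoted in the interview
SHARED_UTILISATION = 0.40   # ASSUMPTION: hypothetical value, not from the interview

daily_hours = hours_in_use(PRIVATE_UTILISATION)
ratio = fleet_ratio(PRIVATE_UTILISATION, SHARED_UTILISATION)

print(f"A privately owned car is in use about {daily_hours * 60:.0f} minutes a day.")
print(f"Under the assumed shared rate, roughly {ratio:.0%} of the cars would cover the same hours.")
```

At 4 percent utilisation a car is driven for a little under an hour a day and stands parked for the remaining 23 hours.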

What's the relationship between the rise of Big Data and the Internet of Things? 
The Internet of Things is an important infrastructure for Big Data, because through the IoT we can gather data at an unprecedented scale and level of detail. So the IoT provides Big Data with the inputs, the data resource it needs to “do its magic”. By the same token, the IoT is just a technology – a technical infrastructure. The actual value-add, the new innovative insight, does not come out of the IoT; it comes out of Big Data, the analysis of the comprehensive data streams that will be available to us.

You argue that the core idea of Big Data is substituting a large amount of messy data for a small amount of pristine data. Does it mean the very foundations of statistics have been revolutionised?
We humans have always used data to understand the world. Even thousands of years ago we observed the world in the hope of comprehending it. But we also realised that collecting and analysing data was hard, costly, and time-consuming. And so we came up with methods, processes and institutions for making sense of the world that were premised on having small amounts of data available for analysis. But what if that underlying premise of the expensiveness of using data changes? What if collecting and analysing data comprehensively and at scale gets much, much cheaper and easier? Then we have to rethink how we make sense of the world. That's what we are experiencing right now. It is not only that many of the tools developed for small data statistical analysis don't work well in the context of Big Data (and thus may need to be replaced), it is that we have to rethink how we make sense of the world: are we really using data only to answer questions we already have? What if the questions are wrong? Can we turn things around and use data to come up with better questions? What is the role of causality – and in light of more data are we willing to concede that many of the supposed “causes” of things are in fact little more than intuitions based on statistical correlations? The goal, the hope, is to improve human decision making by using data comprehensively and at scale to understand the world we live in.

You also argue that we are shifting from looking for the causes of things to looking for mere correlation. This seems quite a big philosophical shift! Could you explain it?
We humans understand the world as sequences of causes and effects. That's how we make sense of it. But as Nobel laureate Daniel Kahneman has shown, such “fast thinking” in identifying causes is often wrong. Actually identifying causes, in contrast, is hard and time-consuming. So in some cases we don't have the time to identify the cause before we have to act. In such situations, knowing “what” may enable us to make important decisions even if we don't yet know “why”. And knowing “what” also helps us focus costly causal analysis on the most promising statistical connections we identify. For instance, researchers using Big Data identified patterns in the vital signs (heart rate, blood pressure, etc.) of prematurely born babies to predict likely future infections. That gives doctors many more hours to intervene and treat infections before they get out of control, even though researchers do not yet know exactly why such a pattern in vital signs is a good predictor for this future illness. But even without knowing “why” we can use knowing “what” to save lives. Yes, it is a shift in mindset, and it emphasises pragmatism, but it improves decision making, and – in this case – even saves lives.
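
The idea of acting on “what” without knowing “why” can be illustrated with a minimal Python sketch. It is not the method used by the researchers mentioned above: the numbers are made up for illustration, and the names `variability_score`, `later_infection` and `flag_for_review`, as well as the threshold, are hypothetical. The sketch measures how strongly a single score moves together with a later outcome, then uses that score as an early warning without modelling any causal mechanism.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# ASSUMPTION: made-up illustrative values, not real clinical data.
variability_score = [1, 2, 3, 4, 5, 6]  # hypothetical summary of a vital-sign pattern
later_infection = [0, 0, 0, 1, 1, 1]    # 1 = infection developed later

r = pearson(variability_score, later_infection)
print(f"correlation between score and later infection: {r:.2f}")


def flag_for_review(score, threshold=4):
    """Flag a patient for early attention; the threshold is hypothetical.

    The rule relies only on the observed association ("what"),
    not on any explanation of the mechanism ("why").
    """
    return score >= threshold
```

A strong correlation like this says nothing about why the pattern precedes the illness. It only justifies looking more closely at flagged cases and, as the answer notes, directing costly causal analysis to the most promising connections.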

You have written a lot about the concept of “datafication”. We now see as “quantifiable” things that we previously thought were not, including friendship and, some argue, love. Are we de-humanising ourselves?
Datafication – that is, rendering ever more aspects of our existence in data form – is crucial for Big Data analysis. I don't see it as de-humanising. Yes, it may reveal that we humans are more predictable and less ingenious than we perhaps thought. But thinking that this reveals Big Data's dark side would be shooting the messenger for a message that is accurate but that we do not like to hear. Rather than accusing Big Data of de-humanising us, it would behove us to be more humble in understanding that humans are less special than we thought. Of course, Big Data has dark sides (and the biggest dark side is that Big Data results tempt us to see answers to “why” things are happening when the data only reveals “what” is happening – and that in fact may lead to a de-humanisation), but datafication is not one of them.