Understanding human behaviour with online data

Investigating whether large online datasets from sources such as Google, Twitter, and photo-sharing sites can help measure and predict human behaviour

Introduction

Our daily interactions with the Internet generate huge amounts of data. This stream of research at the Turing investigates whether these volumes of data can help us gain new insights into human behaviour, or even inform predictions of future events. Applications span a wide range of domains, including linking online behaviour to stock market moves, measuring disease spread, estimating crowd sizes, and evaluating whether the beauty of the environment we live in might affect our health.

Explaining the science

Making good decisions in policy and business depends on having the best possible understanding of what’s happening in our society right now, as well as what might happen next. For example, up-to-date measurements of the current economic wellbeing of society could inform timely interventions to support economic growth. Similarly, rapid measurements and predictions of disease spread could help authorities take action to reduce the impact of a potential epidemic.

However, gathering such measurements of human behaviour at large scale has traditionally been time-consuming and expensive. As a result, measurements are often delayed, such that the most recent data relates to what was happening weeks or months ago, rather than right now. In some cases, no measurements are available at all.

Society’s increasing use of online services such as Google and Twitter, as well as a range of photo-sharing sites such as Flickr and Instagram, is now generating vast volumes of data. People often post information to these services relating to activities they are carrying out right now, or search for information relating to plans they have for the future. Data on what people are posting or searching for is often available at low cost and high speed. These datasets might therefore offer the opportunity to gain rapid, cheap measurements of what is happening in society at the moment, as well as improved insight into what might happen next.

Project aims

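Turing Fellows Professor Suzy Moat and Professor Tobias Preis, together with their team, have developed a widely recognised research programme on measuring and predicting human behaviour from publicly available data sources. Their work is focused on three main strands: reducing cost and delays in measuring human behaviour, predicting human behaviour, and measuring human behaviour and experience that was previously difficult to measure.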

This social data science research programme, which brings together disciplines as diverse as psychology, physics, computer science and finance, is centred at the Data Science Lab at Warwick Business School. The work aims to investigate whether data from Google, Wikipedia, Instagram and other online services can provide new insights into human behaviour and better forecast future real-world actions.

The team works with a range of stakeholders who are interested in the opportunities that such data sources can offer, including governmental partners and health authorities. See the Applications section below for specific examples of where this work is being applied.

Applications


Reducing cost and delays in measuring human behaviour


Google and the flu – Analyses by the team have provided evidence that Google search data can be used to improve estimates of the number of people suffering from the flu, as long as the model is configured to adapt across time so that changes in people’s search behaviour can be accounted for. The resulting model reduced the errors seen in a model using Centers for Disease Control and Prevention (CDC) data by up to 53%.

Twitter and crowd size – The team analysed Twitter and mobile phone data from Milan, Italy, and found that accurate estimates could be made of attendance numbers at the San Siro football stadium. Such data could enable more timely estimates of crowd size in a range of emergency situations, such as evacuations and crowd disasters.
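To make the idea of a model that "adapts across time" in the flu example more concrete, here is a minimal sketch in Python. It is an illustration only and not the team's model. It assumes that official case counts arrive with a delay of a few weeks, and that each week a simple straight-line fit between search volume and official counts is re-estimated on the most recent weeks for which official figures exist. The function names, the window length and the delay are hypothetical choices made for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # No variation in x: fall back to predicting the mean of y.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def adaptive_nowcast(search, official, window=8, delay=2):
    """Estimate the official figure for each week from that week's search volume.

    Assumption for this sketch: the official figure for week s only becomes
    available at week s + delay. For week t, the line is therefore refitted on
    the `window` most recent weeks ending at t - delay, so the relationship
    between searches and cases is allowed to drift over time.
    """
    if len(search) != len(official):
        raise ValueError("search and official must have the same length")
    estimates = {}
    for t in range(window + delay - 1, len(search)):
        start = t - delay - window + 1
        stop = t - delay + 1  # exclusive end of the training slice
        a, b = fit_line(search[start:stop], official[start:stop])
        estimates[t] = a + b * search[t]
    return estimates
```

Refitting on a sliding window is one simple way to let a model follow changes in how people search. A fixed model fitted once on early data would keep applying an outdated relationship.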

 


Predicting human behaviour


Wikipedia and stock markets – Historical analysis shows that views of financially related Wikipedia pages increased before falls in the Dow Jones Industrial Average. Online data may allow us to gain a new understanding of the early stages of decision making, giving us an insight into how people gather information before they decide to take action in the real world.

Google and market crashes – Using an automated method applied to data between 2004 and 2012, the team provided evidence that increases in Google searches for financially and politically related terms historically preceded falls in the stock market. Similar approaches could be applied to help identify warning signs in search data before a range of real-world events.
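As a rough illustration of what "an increase in searches" could mean in practice, the Python sketch below compares each week's search volume with the average of the preceding few weeks and flags the weeks where volume has risen. This is a simplified assumption made for this example, not a description of the team's automated method. The function names and the three-week lookback are hypothetical.

```python
def search_volume_change(volumes, lookback=3):
    """Return {week: change}, where change is that week's volume minus the
    mean volume of the `lookback` weeks immediately before it."""
    changes = {}
    for t in range(lookback, len(volumes)):
        baseline = sum(volumes[t - lookback:t]) / lookback
        changes[t] = volumes[t] - baseline
    return changes


def flag_increases(volumes, lookback=3):
    """List the weeks in which search volume rose above its recent average."""
    changes = search_volume_change(volumes, lookback)
    return [t for t, change in changes.items() if change > 0]
```

A historical analysis could then ask whether the flagged weeks tended to be followed by market falls more often than other weeks.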

 


Measuring human behaviour and experience that was previously difficult to measure


Scenic environments and health – Analysis of over 1.5 million ratings of over 200,000 photos taken across Great Britain allowed the team to provide quantitative evidence that people feel healthier when they live in a more scenic area. This link holds across urban, suburban, and rural areas of England, and is not explained by the income of local residents, or how green an area is.

Identifying scenic locations with AI – The team developed a neural network to automatically rate photos for scenic beauty. When presented with pictures of London, the network placed images of Hampstead Heath, Big Ben and the Tower of London in the top 5% of London’s scenic locations. The research also provided evidence that bridges, historical architecture and trees increase the perceived beauty of a scene, but large areas of grass, despite being green and natural, are not necessarily linked to such boosts.

Recent updates

The team’s work has received considerable press coverage.


Contact info

For more information, please contact The Alan Turing Institute
