Blog

Democratising Data Science: A framework for data-driven storytelling

In this blog we interview Tomas Petricek, Visiting Researcher at The Alan Turing Institute and creator of The Gamma project, a new tool for aggregating and visualising data.

What does data science mean to you? 

Data science is such a new field that everyone still has their own definition of what it really is. For me, data science is about learning useful things from data. Sometimes, this can be hard computational work, but – I think – more often, it is about being able to find the right data and turn it into a visualisation that clearly tells a story about the data.
Programming is the tool for that, but I believe there is more to it than that. Just as many philosophers say that “language limits our thought”, I think that programming tools and programming languages limit what we can do with data.

Tomas Petricek working at The Alan Turing Institute

In your work on data-driven storytelling, you talk about democratisation of data science. What does this mean and how could it affect people’s lives?

The UK government is extremely good at making data available through the Open Government Data initiative, but many people are not able to benefit from it, because learning something from this data requires expert programming skills. The idea behind “democratising data science” is to give everyone the tools and skills to use data.
As a step towards this goal, I recently built an interactive web page that lets you explore data on Olympic medals. This is a neat small example that illustrates the idea, but we are quite far from being able to do something like this easily for all the public data that the government collects about the country.

Figure 1. The visualisation shows Olympic medals awarded for wrestling over the history of Olympic games. As a user, you can change the discipline and see medals awarded for cycling. See this visualisation live at The Gamma.

The expression ‘post-facts’ has become common over the past year. What is your interpretation of the ‘post-facts’ concept in relation to your work in supporting data-driven storytelling in journalism? 

The way I understand it, “post-fact” means that a message which is somehow interesting or exciting spreads more widely than a less sensational one, regardless of whether it is factual.

I don’t think we can exactly “counteract” the trend, but I think that we can make factual messages a lot more interesting. If you post a carefully fact-checked message on Facebook today, it will look the same as someone sharing a fake news article, except that the fake news article will probably have a catchier title.

But what if a story based on data allowed anyone to interact with it in ways that are not possible when the backing data is not there? With the backing data, we can automatically provide multiple different perspectives or adapt the message to your local context – it can be tailored for your city or for people working in similar jobs, so that it tells a story that matters to you. I think people would again become more interested in factual claims, just because they would be more fun!

Figure 2. This example shows how you can use The Gamma to create your own visualisations. Here, we create a chart showing the top 8 teams from Rio 2016. You can explore this visualisation in The Gamma gallery.
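The Gamma has its own lightweight scripting language for expressing these queries, but the aggregation behind a “top 8 teams” chart can be sketched in plain Python. The medal counts below are illustrative approximations, not the authoritative Rio 2016 results, and the pipeline is an assumption about the general “group, sum, sort, take” pattern rather than The Gamma’s actual API:

```python
from collections import Counter

# Illustrative (team, total medals) pairs for Rio 2016 -- approximate
# figures used only to demonstrate the aggregation, not official data.
medals = [
    ("USA", 121), ("China", 70), ("GB", 67), ("Russia", 55),
    ("Germany", 42), ("France", 42), ("Japan", 41), ("Australia", 29),
    ("Italy", 28), ("Canada", 22),
]

# Sum per team and keep the eight teams with the most medals --
# the same "group, sum, sort, take" pipeline a chart like Figure 2 needs.
totals = Counter(dict(medals))
top8 = totals.most_common(8)

for team, count in top8:
    print(f"{team}: {count}")
```

The resulting `top8` list is exactly the data series a bar chart library would be handed; in The Gamma the equivalent query is built interactively, with each step (group, sum, take) chosen from auto-completed options.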

You encourage journalists, alongside interested citizens, to use The Gamma to understand how claims are justified, explore data on their own and make their own transparent factual claims. In ten years’ time, what would newspapers look like if data-driven storytelling were a widespread practice?

To be clear, I do not think that newspapers should be just about data, and I also do not want to replace journalists with a nose for a good story and an understanding of the human context with some sort of magic artificial intelligence black box! What I imagine is tools that let journalists use data – when they find that data is relevant to the story – in an easier, more transparent and more engaging way.
Let’s say that we are reading about the recently announced budget for 2027/2028. It predicts 1.6% growth – this does not tell me much because I’m not an economist, but if the article were linked to the data source, it could automatically tell me that the average predicted growth in the EU is 1.8% and that 1.6% for 2027/2028 is more than the 1.4% for 2026/2027. If you then want to share it on Facebook, the app might figure out that you have family in Costa Rica and friends in the USA, so it will automatically adapt the article to compare the UK, Costa Rica and the USA.

The budget might say that all spending on fighting air pollution has been cut, because the situation has improved over the last 10 years and there is no longer a need for it. The article might give you a map of the UK confirming this. This might sound suspicious, because you feel that last week the air quality in your local area in Camden was quite bad. Because the map is linked to the data, you will be able to zoom in on Camden, change the date and, indeed, find out that the level of pollution was above the limits. You will be able to share this in a comment (with all the supporting evidence), or even send it to your MP and make sure it gets appropriately discussed.

What is your next project at the Institute?

We have a very exciting internship project happening over the summer in collaboration with The Bureau of Investigative Journalism, where we want to turn some of the research ideas about more transparent data-driven storytelling into actual newsworthy reports. Interns at the Institute will be using real data from journalistic sources to visualise complex relationships – for example, can you guess which factors contribute to air pollution?

As a Visiting Researcher at the Turing, what have you valued most about your experience at the Institute?

The mission of The Alan Turing Institute is pretty much perfectly aligned with what I care about – doing research in data science that has both academic value and practical impact on how we live in the modern world.

On a more personal side, I also found a number of great collaborators who care about similar things. I got to work with James Geddes, who was previously involved with the 2050 Energy Calculator, which is a great early example of the kind of open and transparent data-driven storytelling that I would like to become more common.

I also enjoy the interdisciplinary collaborations at the Turing – for our internship project, for example, our supervisor team includes Brooks Paige who is an expert on probabilistic programming and Maria Wolters who adds an extremely important human-computer interaction perspective to the project. I think this is exactly the mix that innovative work on data science needs and I wish there were more places like this!

You can find more information about Tomas’ work on his Turing research project webpage.

The aim of the Visiting Researcher Programme is to generate collaborations, facilitate knowledge exchange, and explore new or emerging research topics in data science. We run regular calls for Visiting Researchers; please sign up to our newsletter for more updates.