U-shaped emotional rollercoaster

Using data science to understand box office success and movie viewers’ emotions

Wednesday 25 Jul 2018

Many people regard motion pictures as an inherent part of their lifelong cultural journey. Whatever you call them (‘films’, ‘movies’ or ‘pictures’), we all have favourites that we remember from childhood or quote on a regular basis. But why do some movies become almost immediate successes, going viral around the globe, while others are quickly forgotten? 

The entertainment industry is not only a multi-billion-dollar market; it is also a great storytelling enterprise. The stories it tells help us connect with characters, relive our own experiences and even escape our daily lives. Why is that? In a recent paper, we explored the emotional journeys we undergo while watching movies and correlated these journeys with the success of those movies. Three main lessons emerged from this research.

1. Emotions are the key

For many years, scriptwriters have searched for a magic formula for success, but only very recently has it emerged that even the best plot in the world will not go far if the story around it fails to make an emotional connection with the viewer. Indeed, we find that emotions can determine a film’s success. As award-winning scriptwriter Frank Cottrell-Boyce puts it:

‘All the manuals insist on a three-act structure. I think this is a useless model. It's static. All it really means is that your screenplay should have a beginning, middle and end. When you're shaping things, it's more useful to think about suspense. Suspense is the hidden energy that holds a story together. It connects two points and sends a charge between them. But it doesn't have to be all action. Emotions create their own suspense.’

2. Data science opens new horizons for understanding what viewers want

Recent advances in data science allow us to better understand emotions and to use this knowledge to predict viewers’ preferences more accurately. In our study, we use natural language processing to explore whether, and to what extent, emotions shape consumer preferences for media and entertainment content. In the paper, ‘The Data Science of Hollywood: Using Emotional Arcs of Movies to Drive Business Model Innovation in Entertainment Industries’, we conducted a sentiment analysis of thousands of movie scripts and showed that, much like novels, movie stories fit six major emotional clusters.


Together with my co-authors Marco Del Vecchio (University of Cambridge), Alexander Kharlamov and Glenn Parry (both from Bristol Business School at the University of the West of England), we obtained over 156,000 scripts from opensubtitles.org and complemented them with data on the movies from the IMDb website, as well as revenue data from the-numbers.com. 

After a complex filtering procedure, our final dataset consisted of 6,147 movies with complete scripts, together with the following information on each movie: gross domestic revenue in the country of first release; IMDb motion picture ID number; date of release; average IMDb user satisfaction rating from 1 (very bad) to 10 (excellent); critic satisfaction meta score from 0 (very bad) to 100 (excellent); all IMDb genres of the movie (multiple genres were usually listed for each movie on the IMDb website); rating count (number of individual assessments contributing to the IMDb rating); number of user reviews; number of critic reviews; number of awards (Oscars and other awards); name of the director; runtime in minutes; and age appropriateness rating. For a subset of 3,051 movies, we also had data on worldwide gross revenue and production budgets. 

We split each script into sentences and calculated the valence of each sentence by assigning every word a sentiment value from -1 (emotionally negative terms) to 1 (emotionally positive terms). The sentiment of each movie was then accumulated and plotted against the movie’s running time, from 0% (beginning of the movie) to 100% (end of the movie). 
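The arc construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study’s code: the tiny lexicon `TOY_LEXICON` is hypothetical and stands in for whatever sentiment dictionary was actually used, a sentence’s valence is assumed to be the plain sum of its word scores, and the regex sentence splitter is a naive placeholder for a proper NLP tokenizer. The hypothetical function `emotional_arc` returns the running (cumulative) sentiment sampled at evenly spaced points of narrative time, from 0% to 100%.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score in [-1, 1].
# A real analysis would use a full sentiment dictionary; these values are illustrative only.
TOY_LEXICON = {
    "love": 0.9, "happy": 0.8, "win": 0.6, "hope": 0.5,
    "lost": -0.6, "fear": -0.7, "hate": -0.8, "die": -0.9,
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_valence(sentence, lexicon):
    """Assumed aggregation: sum of word scores; words not in the lexicon score 0."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0.0) for w in words)


def emotional_arc(script, lexicon, points=11):
    """Cumulative sentiment sampled at `points` evenly spaced positions (0%..100%)."""
    if points < 2:
        raise ValueError("points must be at least 2")
    sentences = split_sentences(script)
    if not sentences:
        return []
    # Running total of sentence valences, one entry per sentence.
    cumulative, running = [], 0.0
    for s in sentences:
        running += sentence_valence(s, lexicon)
        cumulative.append(running)
    n = len(cumulative)
    arc = []
    for k in range(points):
        pct = 100 * k // (points - 1)              # position in narrative time, in percent
        idx = round(k * (n - 1) / (points - 1))    # nearest sentence at that position
        arc.append((pct, cumulative[idx]))
    return arc


if __name__ == "__main__":
    script = "I love you. We are lost. We fear we will die. But hope wins. Happy ending!"
    for pct, value in emotional_arc(script, TOY_LEXICON, points=5):
        print(f"{pct:3d}%  {value:+.2f}")
```

For this toy script the sampled arc starts positive, falls to its lowest point at 50% and climbs back towards zero by the end. Summing word scores (rather than averaging them) is a design choice here: it lets emotionally dense sentences move the arc further, but the post does not say which aggregation the study used.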

3. Viewers go for U-shaped emotional roller-coasters

We find that viewers tend to prefer emotional trajectories that could be described as U-shaped emotional roller-coasters. The analysis revealed that the highest box office revenues are associated with the Man in a Hole shape, which is characterized by an emotional fall followed by an emotional rise (resembling a U-shape). The figure below compares it with the other five major arcs:

Figure: Six major emotional trajectories of movies
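The post does not spell out how each movie is assigned to one of the six arcs; the paper describes them as clusters. As a rough illustration only, the sketch below swaps in a simpler technique, template matching: each shape is approximated by an idealized curve, and a movie’s sampled arc is assigned to the template it correlates with most strongly (Pearson correlation). The template curves, including Cinderella (rise, fall, rise), the one arc not named in this post, and the helper names `TEMPLATES`, `pearson` and `classify_arc` are hypothetical choices for illustration, not the study’s cluster centroids.

```python
import math

# Idealized template curves on normalized time t in [0, 1].
# These shapes are illustrative assumptions, not the study's cluster centroids.
TEMPLATES = {
    "Rags to Riches": lambda t: t,                        # steady rise
    "Riches to Rags": lambda t: -t,                       # steady fall
    "Man in a Hole": lambda t: abs(t - 0.5),              # fall, then rise (U-shape)
    "Icarus": lambda t: -abs(t - 0.5),                    # rise, then fall
    "Cinderella": lambda t: -math.cos(3 * math.pi * t),   # rise, fall, rise
    "Oedipus": lambda t: math.cos(3 * math.pi * t),       # fall, rise, fall
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def classify_arc(values):
    """Return the template name whose curve correlates best with the sampled arc."""
    if len(values) < 2:
        raise ValueError("need at least two sampled points")
    n = len(values)
    ts = [i / (n - 1) for i in range(n)]  # evenly spaced narrative time in [0, 1]
    scores = {name: pearson(values, [f(t) for t in ts]) for name, f in TEMPLATES.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify_arc([0.0, -1.0, -2.0, -1.0, 0.0]))  # prints: Man in a Hole
```

Because Pearson correlation ignores the overall level and scale of an arc, only its shape matters, which is what the six-arc taxonomy describes. A data-driven alternative closer to the clustering mentioned in the post would be to cluster many resampled arcs (for example with k-means) and then name the resulting centroids.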

This shape results in financially successful movies (in terms of both gross worldwide and gross domestic revenue) irrespective of genre and production budget. However, Man in a Hole succeeds not because it produces the most ‘liked’ movies but because it generates the most ‘talked about’ movies. Specifically, while Man in a Hole movies earn high revenue, they are not associated with high IMDb ratings. Instead, the number of ratings, as well as the numbers of user and critic reviews, are much higher for Man in a Hole movies than for movies in any other emotional arc category.

Still, it would be an oversimplification to say that the motion picture industry should concentrate only on producing Man in a Hole movies. Interestingly, a carefully chosen combination of production budget and genre may produce a financially successful movie with any emotional shape. For example, the Icarus shape works well for low-budget movies, whereas if you want to shoot a successful tragedy (the Riches to Rags shape), you should make it epic, with a budget of over 100 million dollars. Other surprising results show that sci-fi, mystery and thriller movies with happy endings (the Rags to Riches shape) do not do well at the box office. Equally, it is not a good idea to shoot a comedy with a bad ending (the Riches to Rags shape). Finally, Oedipus-shaped movies on average do not seem to do well at award ceremonies and festivals (other than the Oscars).

Our goal is to shift decision-making in the industry towards the viewers. By using sentiment analysis, movie makers can potentially engineer content that consumers really want to see. We would like to expand the research and, ideally, bring industry partners on board to provide us with better data. To date, our analysis only covers full-length movies, which lasted 108 minutes on average. In the future, we would like to develop robust methods to analyse sentiment in all media, including non-fiction such as documentaries and much shorter videos, such as those on YouTube. Once we have optimised the tool, it would be good to spin out a company that can commercialise the work and put it into the hands of industry colleagues.

In this blog, Ganna Pogrebna, ESRC Fellow at The Alan Turing Institute and Professor at the University of Birmingham, presents her recent research from the University of Birmingham, published by Cornell University Library. Blending behavioural science, computer science, data analytics, engineering and business model innovation, Ganna helps cities, businesses, charities and individuals better understand why they make the decisions they do and how they can optimise their behaviour to achieve higher profits and better social outcomes, and to flourish and bolster their well-being.