Introduction

Data visualisation is a tool for presenting data in a visual context. It has been described as a domain where art and science meet; a compelling visualisation requires solid data analysis skills coupled with a creative flair for visual design.

Visualisations deliver persuasive arguments because they increase message clarity by presenting complex data in a visual context. However, the visual aspect means that even subtle imperfections can disproportionately weaken the argument being made and the credibility of the data analysis work. Therefore, in research data science project proposals, visualisations should be considered carefully and resourced accordingly.

As a research software engineer, I have often been involved in research projects where deliverables include "a visualisation". I recognise the excitement that the promise of a slick and informative graphic brings to a project proposal.

The types of visualisation I have personally worked on range from simple line graphs indicating trends over time, to geospatial maps with multiple layers for filters, to dashboards containing a mix of all the above. Interactions within visualisations range from basic filtering on a single dataset to cohort selection where entirely diverse datasets are integrated into a coherent picture. What I have discovered is that development work for visualisations is a complex beast, and that the basic requirement for anything of this sort is a set of clearly defined objectives.

Unfortunately, in my experience, objectives for a data visualisation are too often defined in the vaguest terms. This makes my heart sink because it suggests that 1) little thought and effort have been put into this aspect of the project, 2) the challenges of developing a visualisation are underestimated, and 3) the team cannot define what they want from this deliverable. For me as the developer, this means that I will start a project blindly, without knowing what message the visualisation is meant to impart.

Starting the project clock on a data visualisation project without having defined the visualisation's message always leads to an inefficient use of resources. This seems so obvious, but it took me time to learn this, and even more time to build the confidence to insist upon it. These are the lessons I have learnt.

Open questions are a waste of time

I had always assumed that the client would know what message they want their visualisation to impart. Therefore, when kicking off a visualisation project, I asked open questions along the lines of "What is this visualisation about?". The purpose of this type of question is to start a discussion with collaborators and, as a software engineer, to get a feel for the project (in a finger-in-the-wind way). Will there be a single aspect of the dataset to display, or will we require several plots to show the entirety of the picture? How much of the project is also about user interface design? Does this UI consist of a row of buttons discreetly tucked at the top of the page, or an entire dashboard taking up a third of the space?

This is a mistake because this question lands me nowhere close to what the data visualisation actually needs, which is a message to impart.

It gets worse because many times, the answers to this question contain phrases such as "enabling users" to "explore the data" or to "discover new knowledge". Unless the project is specifically about enabling data exploration, my heart sinks a little bit more. Let me tell you why.

In research projects, when the visualisation is not the sole deliverable, it can be very difficult to get people to define the story that they want to tell. Visualisations are requested because they can provide a beautifully succinct display of data analysis or integration results. However, while a story or a takeaway message is essential for designing a visualisation, the reality is that research measures and conclusions are fluid while the project is being scoped, and remain so during much of the project work.

This places the delivery of the visualisation in danger. Firstly, there is a resource penalty to be paid every time the message of the visualisation is changed. Many people do not realise that adding 'an extra input/variable' changes the visualisation in terms of clarity, planned visual flow and consistency. 'Just' adding extra text can reduce the impact of a message and, as a result, waste all the effort that has previously gone into it.

Secondly, when analysis results are produced, we may find that the data is not interesting, and the whole edifice on which the visualisation was planned falls down. As a result, many projects safeguard their proposal by requiring the visualisation to be 'flexible', so that an interesting story may be 'discovered on the fly'. This is simply a way of kicking the can down the road.

Prioritise defining the visualisation's message

Our research engineering team at the Turing has had many discussions about data science projects. One important message on which we all agree is the need to carefully consider the feasibility of the questions to be asked of the data, and to define measures of success that show the extent to which the analysis has answered those questions.

It took me a while to extend this same type of consideration to data visualisation projects. My mistake was to believe that it was acceptable to define what message a visualisation should deliver after the project clock had started. When that happens, the process of defining the message continues to the very end of the project timeline. Without clear objectives, data visualisation becomes a process of finding the best story or angle. This consumes the resources that should have been allocated to actual development.

In contrast, Turing data science projects are not considered ready to start until the questions, measures of success and data are ready. It seems to me that any visualisation included in a project proposal should be treated in the same way and be subject to the same rigorous definition, to minimise any 'fudging'.

Of course, just as exploratory data work is necessary in data science projects, some form of charting and prototyping is part of the visualisation pipeline. However, the impact of changing the message of a visualisation mid-project is as significant as changing a data analysis question, and should be recognised as such.

In March 2019, I attended David McCandless's Workshops are Beautiful training in London. David spoke of defining a concept and a goal. A concept, he says, is an idea you can convey to someone else. A goal is the message that you want to convey. Examples of goals are 1) a comparison of ideologies, 2) a display of trends over time, or 3) a summary of the most commonly used terms.

Example 1

Concept: There are differences between the way perpetrators of domestic violence are described in the media and the way their victims are described.

Goal: To compare the most commonly used descriptors of perpetrators and their victims. Highlight the difference by colour coding the descriptors depending on whether the terms are positive (e.g. gentle, kind), neutral (e.g. mother, sister) or negative (e.g. aggressive, rude).
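To show how concrete a goal like this already is, here is a minimal Python sketch of the counting and colour coding it describes. Everything in it is an assumption made for illustration: the function name `top_descriptors`, the tiny lexicon (built only from the example terms in the goal above), the colour choices and the placeholder input lists, which are invented to exercise the code and do not reflect any real finding.

```python
from collections import Counter

# Hypothetical sentiment lexicon, built only from the example terms in the goal.
SENTIMENT = {
    "gentle": "positive", "kind": "positive",
    "mother": "neutral", "sister": "neutral",
    "aggressive": "negative", "rude": "negative",
}

# Assumed colour scheme; terms missing from the lexicon fall back to "unknown".
COLOURS = {
    "positive": "green",
    "neutral": "grey",
    "negative": "red",
    "unknown": "black",
}


def top_descriptors(descriptors, n=3):
    """Return the n most common descriptors as (term, count, colour) tuples."""
    counts = Counter(term.lower() for term in descriptors)
    return [
        (term, count, COLOURS[SENTIMENT.get(term, "unknown")])
        for term, count in counts.most_common(n)
    ]


# Placeholder input, invented purely to exercise the function; not real data.
perpetrator_terms = ["kind", "kind", "kind", "aggressive", "aggressive", "gentle"]
victim_terms = ["mother", "mother", "mother", "rude", "rude", "sister"]

for label, terms in [("perpetrators", perpetrator_terms), ("victims", victim_terms)]:
    print(label, top_descriptors(terms, n=2))
```

The point of the sketch is that a well-defined goal translates almost directly into a specification: two groups, a ranked list of terms for each, and one colour per sentiment category. A vague objective such as "explore the data" offers no such starting point.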

Example 2

Concept: Are people being assigned to the projects that best fit their existing skill sets, while also being given the maximum opportunity to learn and apply new skills of their choice?

Goal: To show the links that exist between people (listing their existing expertise and the areas they want to develop) and projects (listing the expertise required).

I think that the use of concepts and goals during project scoping can help in defining visualisation objectives. More importantly, they can serve as a measure of how far development work has drifted from the original intentions, so that either steps can be taken to mitigate the drift or timelines can be adjusted to account for it.

Conclusion

A single message to take from this post: we should understand the message we want our visualisation to impart before starting on anything else in the project.

If you have a slightly longer memory span: A data visualisation project specification should contain clear objectives for the data visualisation; the most important of which describes the message the visualisation should impart. Working on a data visualisation project without knowing the message to impart will lead to inefficient use of resources. If we have a clear idea of the visualisation's message, we can tell when work deviates and respond accordingly.