Analysts and researchers face major hurdles in understanding the quality of their data and how the choices they make at one stage of data processing affect the stages that follow. Data visualisation offers many benefits that could help analysts and researchers to overcome those hurdles. This project will identify how visualisation techniques should be exploited, and develop a novel visualisation tool for key aspects of data profiling and pipeline design.
This project has three aims. First, it will characterise the ways in which analysts and researchers profile data and design data processing pipelines. This is important for understanding the limitations of current profiling and pipeline design methods, the barriers that analysts and researchers face, and the ways in which visualisation techniques could be transformative.
Second, the project will engage with public and private sector analysts and researchers to identify quick wins, share best practice and develop a research agenda for the adoption of visualisation techniques in data profiling and pipeline design. The primary measure of success will be organisations beginning to adopt the proposed techniques to make their profiling and pipeline design more rigorous and efficient. Such adoption will be a catalyst for more scalable and higher-quality data science.
Third, the project will develop a novel visualisation tool for key aspects of data profiling and pipeline design. Success will be defined by uptake of the tool by analysts and researchers during field evaluation, and by the benefits that the tool brings to their work. The tool will be a launch pad for further research that acknowledges visualisation as an equal partner to computational modelling in data science.
This project can benefit any application area, because data profiling and processing are universal. Existing partners are drawn from domains as diverse as health, urban analytics, retail and government.