There is a data science gap: the quantity of data collected increases every day while human perceptual abilities stay unchanged, yet we are asked to reach judgements and make decisions using ever larger data sets. This project is researching how the power of cloud computing can be used to automatically lay out data for digital twins, making it easier to see the message in the data. It is also looking at how best to represent important attributes of data, such as uncertainty, that enhance the truthfulness of data visualisations.
The algorithms created are being tested in an urban-scale 3D atlas, but they will be equally applicable in manufacturing, defence, healthcare, risk planning, and anywhere else that 3D/4D data needs to be presented in its physical context.
Explaining the science
There are three interlinked science themes in this project. These three themes all involve novel research, which will be demonstrated and evaluated as part of a pilot software tool or tools.
Novel computer architectures at scale
To process, lay out and visualise huge amounts of data, new ways to use supercomputers for visualisation are needed. The project has researched, designed and tested a cloud-based visualisation architecture that allows for scaling beyond 14 petaFLOPS of compute performance on demand. This would otherwise require dedicated time on one of the world's fastest supercomputers, a level of resource not currently available in the UK.
Novel visual representations of data including uncertainty
While data is often represented as having a single value, or as not varying significantly, in reality measurements can only be made to a certain tolerance level and sensor readings vary randomly over time. For example, in air quality monitoring, less expensive sensors are attractive but may be less accurate. This track of the project is investigating how to visually represent a range of uncertainty alongside a measured value, so that it is clear which sensors are more accurate (have less variance).
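As a rough illustration of the quantities such a representation would need, the sketch below summarises repeated readings from each sensor as a mean, a sample variance and a simple uncertainty band, then orders the sensors by variance. The sensor names, the readings and the choice of a band of two standard deviations are assumptions made for this example only; they are not taken from the project.

```python
from statistics import mean, stdev, variance


def summarise(readings):
    """Return (mean, sample variance) for repeated readings from one sensor."""
    return mean(readings), variance(readings)


def uncertainty_band(readings, k=2.0):
    """Return (low, high) as mean -/+ k sample standard deviations.

    k = 2.0 is an illustrative choice, not a value from the project.
    """
    centre = mean(readings)
    spread = k * stdev(readings)
    return centre - spread, centre + spread


def rank_by_variance(sensors):
    """Order sensor ids from lowest to highest variance of their readings."""
    return sorted(sensors, key=lambda sensor_id: variance(sensors[sensor_id]))


# Hypothetical air quality readings (arbitrary units) for two sensors.
sensors = {
    "reference": [40.0, 41.0, 39.0, 40.0],
    "low_cost": [35.0, 46.0, 38.0, 43.0],
}

for sensor_id in rank_by_variance(sensors):
    low, high = uncertainty_band(sensors[sensor_id])
    print(f"{sensor_id}: mean={mean(sensors[sensor_id]):.1f}, band=({low:.1f}, {high:.1f})")
```

A visualisation could then draw the band alongside the mean, for instance as an error bar or a blurred glyph, so that the wider band of the less accurate sensor is visible at a glance.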
Novel visual layout algorithms for data at scale
In any digital twin, large amounts of data can obscure both the digital twin and other data items. While traditionally a graphic designer might lay out the data to minimise these problems, in large-scale data visualisation this becomes infeasible, especially when the data is being used interactively or presented over many frames in a film. This theme of the project is looking to use models of human vision and cloud supercomputing to optimise the visual layouts of data. Multiple versions of a visualisation will be produced, then scored and ranked, and only the best in terms of layout will be presented to the end user.
This project is researching novel methods for data layout and representation in the visualisation of 3D/4D digital twins that exploit the power of cloud computing. It will be investigating novel psychophysically informed algorithms to automate the visualisation process. This will address a growing gap in data science: the link from data and analytics results to human cognition and decision making.
It is evident, despite advances in AI, that without better human comprehension of data-derived knowledge and its related uncertainty we stand little hope of improving the collective sum of human wisdom. Currently, the scale of big data, such as that illustrated above, is challenging the capacity of visualisation designers to bridge this gap.
The second aim is to exploit the power of cloud computing to automate the process of laying out and annotating a 3D/4D visualisation to best fit a specification. The project will take an agile approach that first investigates visualisation optimisation using fixed models of human vision, then later seeks to develop self-learning models.
For example, it will start by investigating the use of known contrast sensitivity metrics, crowding metrics and gradient limits as objective functions. In systems terms, this requires creating a potentially stochastic optimisation loop that uses supercomputer-scale cloud computing power to generate a set of tentative visualisations, from which the highest ranked are selected and presented to the designer or end user.
In this project the primary output will be published novel systems architectures and algorithms that are rigorously tested in the context of a 3D/4D urban atlas representing data from an urban area within its 3D context.
The project is being researched in the context of digital twins of urban areas; however, it should be applicable to any digital twin that uses 3D/4D (spatio-temporal) data. Examples of areas of application include urban monitoring and planning (e.g. air quality), civil defence planning, digital factory twins for manufacturing, and healthcare design, planning and monitoring.
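The generate, score and rank loop described above can be sketched in a few lines. This is a toy version under stated assumptions, not the project's algorithm: layouts are random 2D label positions in a unit square, and the objective is a simple distance-based crowding penalty standing in for the contrast sensitivity, crowding and gradient metrics named in the text. All function names and the `min_gap` parameter are hypothetical.

```python
import random
from itertools import combinations


def random_layout(n_labels, rng):
    """Generate one tentative layout: an (x, y) position per label."""
    return [(rng.random(), rng.random()) for _ in range(n_labels)]


def crowding_penalty(layout, min_gap=0.2):
    """Toy objective: sum of how far each pair of labels falls inside min_gap.

    A stand-in for a perceptual crowding metric; lower is better.
    """
    penalty = 0.0
    for (x1, y1), (x2, y2) in combinations(layout, 2):
        distance = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        penalty += max(0.0, min_gap - distance)
    return penalty


def best_layouts(n_labels, n_candidates, top_k, seed=0):
    """Generate candidates at random, score them, and keep the top_k."""
    rng = random.Random(seed)
    candidates = [random_layout(n_labels, rng) for _ in range(n_candidates)]
    return sorted(candidates, key=crowding_penalty)[:top_k]


top = best_layouts(n_labels=5, n_candidates=200, top_k=3)
for layout in top:
    print(round(crowding_penalty(layout), 3))
```

Each candidate is scored independently of the others, which is the property that would let the scoring step be spread across many cloud workers; here it runs serially for simplicity.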
The project has successfully demonstrated that it's possible to design, deploy and use over 14 petaFLOPS of compute performance in the cloud to produce the world's first terapixel visualisation of urban data in a digital twin. An academic article on the Terascope pilot was published on 13th February 2019 at arxiv.org/abs/1902.0482