Big data is characterised by high volume and high velocity: new datasets of increasing size are generated rapidly each year, and new forms of processing are required to analyse them effectively. This research explores methods of spatial data visualization for better perception of flow, and applies these methods to big data from the UK in order to propose a workflow for effective spatial flow data visualization in the programming language R. A 'spatial interaction model' was used to predict migration flows within the UK, the resulting flow data was visualized using several techniques in R, and the strengths and weaknesses of the generated outputs were assessed.
Explaining the science
This project utilised a number of different packages available in the open source programming language R.
The ggplot2 package was first used to visualize the data, with transparency indicating the amount of flow.
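As a minimal sketch of this idea, the flows can be drawn as line segments whose transparency (alpha) is mapped to flow size. The data frame, its column names, and the flow counts below are invented for illustration and are not the project's data; the coordinates are approximate city centres.

```r
library(ggplot2)

# Hypothetical origin-destination flows (illustrative values only).
flows <- data.frame(
  origin = c("London", "London", "Manchester", "Birmingham"),
  dest   = c("Manchester", "Birmingham", "Leeds", "Leeds"),
  o_lon  = c(-0.13, -0.13, -2.24, -1.89),
  o_lat  = c(51.51, 51.51, 53.48, 52.49),
  d_lon  = c(-2.24, -1.89, -1.55, -1.55),
  d_lat  = c(53.48, 52.49, 53.80, 53.80),
  flow   = c(1200, 900, 400, 150)
)

# One segment per flow; larger flows are drawn more opaque.
p <- ggplot(flows) +
  geom_segment(
    aes(x = o_lon, y = o_lat, xend = d_lon, yend = d_lat, alpha = flow),
    colour = "steelblue"
  ) +
  scale_alpha_continuous(range = c(0.2, 1), name = "Flow") +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude") +
  theme_minimal()

print(p)
```

With many overlapping flows, mapping alpha to flow size lets the small flows fade into the background so that the largest flows remain visible.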
Mapdeck is an R package that allows the visualization of flow data on an interactive map, building on the Mapbox GL and Deck.gl JavaScript libraries.
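A rough sketch of an interactive flow map, assuming the mapdeck package is installed and a Mapbox access token is available, might look like the following. The token string is a placeholder, and the flow data and the `width` and `label` columns are hypothetical.

```r
library(mapdeck)

# Hypothetical origin-destination flows (illustrative values only).
flows <- data.frame(
  origin = c("London", "London", "Manchester", "Birmingham"),
  dest   = c("Manchester", "Birmingham", "Leeds", "Leeds"),
  o_lon  = c(-0.13, -0.13, -2.24, -1.89),
  o_lat  = c(51.51, 51.51, 53.48, 52.49),
  d_lon  = c(-2.24, -1.89, -1.55, -1.55),
  d_lat  = c(53.48, 52.49, 53.80, 53.80),
  flow   = c(1200, 900, 400, 150)
)

# Scale flow size to a line width, and build a tooltip label per arc.
flows$width <- 1 + 5 * flows$flow / max(flows$flow)
flows$label <- paste0(flows$origin, " to ", flows$dest, ": ", flows$flow)

# "YOUR_MAPBOX_TOKEN" is a placeholder; the basemap needs a real token to render.
m <- mapdeck(token = "YOUR_MAPBOX_TOKEN", style = mapdeck_style("dark"), pitch = 45)
m <- add_arc(
  m,
  data = flows,
  origin = c("o_lon", "o_lat"),
  destination = c("d_lon", "d_lat"),
  stroke_width = "width",
  tooltip = "label"
)

m  # displays the widget in an interactive session
```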
Circos plots are a method of visualizing data in a circular layout, originally designed for comparative genomics. Chord diagrams, a type of circos plot, are well suited to exploring relationships between objects or positions, and as such they translate readily to spatial flow data. The compactness of the circular layout allows data to be layered coherently, which makes it useful for collating multiple datasets. Chord diagrams are also highly customisable, making this visualization method appealing both to researchers and to wider audiences such as the media or businesses. The circlize R package was used to produce a chord diagram of the flows.
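As an illustration, a directional chord diagram can be drawn from a square origin-destination matrix with the `chordDiagram()` function in circlize. The matrix values and colours below are hypothetical, not the project's results.

```r
library(circlize)

regions <- c("London", "Manchester", "Birmingham", "Leeds")

# Hypothetical flow matrix: rows are origins, columns are destinations.
od <- matrix(
  c(   0, 1200,  900,  300,
     800,    0,  350,  400,
     600,  300,    0,  150,
     250,  450,  120,    0),
  nrow = 4, byrow = TRUE,
  dimnames = list(regions, regions)
)

# One colour per region (arbitrary choices).
grid_col <- c(London = "firebrick", Manchester = "steelblue",
              Birmingham = "darkorange", Leeds = "seagreen")

circos.clear()  # reset any previous circular layout
chordDiagram(
  od,
  grid.col = grid_col,
  directional = 1,                           # links run from rows to columns
  direction.type = c("diffHeight", "arrows"),
  link.arr.type = "big.arrow"
)
circos.clear()
```

Each sector represents a region, and the width of a link at its origin end reflects the size of the flow leaving that region.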
The chorddiag package was then used to generate interactive chord diagrams. Chord diagrams are a promising method of visualizing spatial data. However, issues may arise when presenting results to an audience unfamiliar with how chord diagrams work, as they do not provide the same geographical context as flows drawn on a traditional map. It is therefore proposed that chord diagrams be used in tandem with traditional mapping techniques, such as Mapdeck visualization: the map conveys the geographical context and general trends of the spatial data, whilst the chord diagram supports more detailed analysis of specific flows.
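A minimal sketch of an interactive chord diagram, assuming the chorddiag package is installed (it is commonly installed from its GitHub repository, mattflor/chorddiag), could look like this. The flow matrix and colours are hypothetical.

```r
library(chorddiag)

regions <- c("London", "Manchester", "Birmingham", "Leeds")

# Hypothetical flow matrix: rows are origins, columns are destinations.
od <- matrix(
  c(   0, 1200,  900,  300,
     800,    0,  350,  400,
     600,  300,    0,  150,
     250,  450,  120,    0),
  nrow = 4, byrow = TRUE,
  dimnames = list(origin = regions, destination = regions)
)

# One colour per region, given as hex codes (arbitrary choices).
group_colours <- c("#B22222", "#4682B4", "#FF8C00", "#2E8B57")

# Produces an htmlwidget; hovering over a chord shows the flow it represents.
widget <- chorddiag(
  od,
  type = "directional",
  groupColors = group_colours,
  groupnamePadding = 20,
  showTicks = FALSE
)

widget  # displays the widget in an interactive session
```

Because the output is an HTML widget, it can be embedded alongside an interactive map so that readers can move between the geographical overview and the detailed flow values.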
Traditional spatial flow data visualization techniques struggle to represent the increased complexity of more recent datasets. This research explores methods of spatial data visualization for better perception of flow.
A spatial interaction model was used to predict migration flows within the UK, taking census population data as input, to provide a small dataset for quick proof-of-concept visualization. Data was visualized using the R packages ggplot2, Mapdeck, circlize and chorddiag, the latter two being multidisciplinary R packages originally used in the visualization of genetic data.
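The exact form of the spatial interaction model is not specified here. As a hedged illustration only, one common form is the unconstrained gravity model, in which the predicted flow between two places grows with the product of their populations and decays with the distance between them. The populations, the constant `k`, and the distance-decay exponent `beta` below are all assumptions chosen for illustration, not census figures or calibrated parameters.

```r
# Illustrative populations (not census figures) and approximate coordinates.
places <- data.frame(
  name = c("London", "Manchester", "Birmingham", "Leeds"),
  lon  = c(-0.13, -2.24, -1.89, -1.55),
  lat  = c(51.51, 53.48, 52.49, 53.80),
  pop  = c(9.0e6, 5.5e5, 1.1e6, 8.0e5)
)

# Great-circle distance in kilometres (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2) {
  r <- 6371
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

n <- nrow(places)
dist_km <- outer(
  seq_len(n), seq_len(n),
  function(i, j) haversine_km(places$lon[i], places$lat[i],
                              places$lon[j], places$lat[j])
)

# Assumed parameters: scaling constant and distance-decay exponent.
k <- 1e-7
beta <- 2

# Gravity model: T_ij = k * P_i * P_j / d_ij^beta
flows <- k * outer(places$pop, places$pop) / dist_km^beta
diag(flows) <- 0  # no flow from a place to itself
dimnames(flows) <- list(places$name, places$name)

print(round(flows))
```

The resulting matrix has the same origin-destination shape that chord diagram functions expect, and it can be reshaped into a long table of origin and destination coordinates for map-based plotting.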
The strengths and weaknesses of each visualization technique have been assessed, and a final visualization workflow has been produced that combines standard mapping techniques such as Mapdeck with non-standard techniques such as chorddiag: the latter provide more opportunity for detailed data analysis, whilst traditional techniques provide the geographical context of the spatial data.
Improving the visualization methods of spatial data allows researchers to draw conclusions from such data with more clarity and confidence. Being able to identify large flows that may have otherwise been missed is important in the field of urban analytics, allowing researchers to identify potential areas of interest and further research.