Population data about a local authority can vary across different data sources, which has substantial implications for a whole host of policy areas such as housing, land allocation and health expenditure. In 2017, GP registers for Leeds counted 60,000 more people than the Office for National Statistics estimated to be living in the city. This project involves working with Leeds City Council to identify what discrepancies exist and why, in order to work towards a single population count which may then be used to inform future policy-making.
Explaining the science
Data and methods
A classification of the UK was conducted using variables derived from the 2011 Census outputs, in order to identify demographic patterns across the UK and how these influence the disparity between population estimates from mid-year estimates (MYEs) and GP registers.
Variables were selected to reflect themes including age, ethnic group, UK migration and social grade. Highly correlated variables were removed to reduce multicollinearity. Counts were converted to percentages for each Lower Super Output Area (LSOA).
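The variable preparation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual pipeline: the variable names, the counts and the 0.8 correlation threshold are all hypothetical, and the greedy rule for dropping correlated variables is only one of several reasonable choices.

```python
from math import sqrt


def to_percentages(counts, totals):
    """Convert per-area counts to percentages of each area's total population."""
    return [100.0 * c / t if t else 0.0 for c, t in zip(counts, totals)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def drop_correlated(columns, threshold=0.8):
    """Greedy filter: keep a variable only if its absolute correlation with
    every variable already kept is at or below the threshold (assumed 0.8)."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical counts for four areas; not real Census figures.
totals = [1000, 2000, 1500, 800]
counts = {
    "aged_20_24": [100, 400, 150, 40],
    "students": [90, 420, 140, 30],
    "born_outside_uk": [50, 300, 300, 40],
}

percentages = {name: to_percentages(vals, totals) for name, vals in counts.items()}
selected = drop_correlated(percentages)
print(list(selected))
```

In this toy data the student variable tracks the 20 to 24 age band almost exactly, so it is dropped, while the weakly related third variable is kept.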
K-means clustering was performed on the 10 selected variables to produce 7 clusters. The optimal number of clusters was determined using the elbow method. The city of London was not included in the final classification model because attributes that occur only in London were influencing the other clusters.
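To illustrate how k-means and the elbow method fit together, the sketch below runs a plain Lloyd's-algorithm k-means for several values of k and picks the k where the within-cluster sum of squares stops falling sharply. It rests on several assumptions that are not from the project: the data are twelve toy two-dimensional points rather than 10 Census variables, the centroids are seeded deterministically by farthest-first selection so the result is reproducible, and the "largest second difference" rule stands in for what is usually a visual reading of the elbow plot.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the old centroid if a cluster is empty
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def elbow_k(inertias):
    """Pick the k with the largest second difference in inertia, i.e. where a
    big drop is followed by a small one. One simple heuristic among several."""
    ks = sorted(inertias)
    bends = {
        k: (inertias[prev] - inertias[k]) - (inertias[k] - inertias[nxt])
        for prev, k, nxt in zip(ks, ks[1:], ks[2:])
    }
    return max(bends, key=bends.get)


# Toy data: three tight groups of four points each (hypothetical).
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 10), (20, 0)] for dx, dy in offsets]

inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
best_k = elbow_k(inertias)
print(inertias, best_k)
```

On real LSOA data the same loop would run over standardised percentage variables, and the inertia curve would normally be plotted and inspected rather than reduced to a single rule.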
Geographical Information Systems (GIS) were used to map cluster locations across the UK at LSOA level. GIS was also used to analyse patterns in the percentage difference between the two population estimates across Leeds.
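The quantity mapped for Leeds can be illustrated with a short calculation. The summary does not state the exact formula, so the sketch below assumes the percentage difference is the GP register count minus the MYE, expressed as a percentage of the MYE. The area codes and counts are hypothetical.

```python
def percent_difference(gp_count, mye):
    """Percentage by which the GP register count exceeds (positive) or falls
    below (negative) the mid-year estimate. The MYE baseline is an assumption."""
    return 100.0 * (gp_count - mye) / mye


# Hypothetical (GP count, MYE) pairs per area; not real Leeds figures.
counts = {
    "LSOA_A": (1650, 1500),
    "LSOA_B": (1480, 1600),
    "LSOA_C": (2000, 2000),
}

differences = {code: percent_difference(gp, mye) for code, (gp, mye) in counts.items()}
largest_gap = max(differences, key=lambda code: abs(differences[code]))
print(differences, largest_gap)
```

Values like these, joined to LSOA boundaries, are what a GIS package would shade to show where the two sources disagree most.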
An optimum number of 7 clusters was found using k-means; these clusters were visualised using heatmaps of percentages of the variables. Each cluster displayed distinct characteristics of the population.
The classification of the UK presented here has highlighted a recurring pattern of higher GP counts across the UK, which appears to be more pronounced in diverse clusters. This indicates that differences between population estimates are a wider problem occurring across the UK.
Leeds, however, differs from the rest of the UK in that it displays a higher frequency of LSOAs containing demographics that could be driving the disparity between MYEs and GP registration counts, suggesting that the ONS methods for producing population estimates in certain areas require reviewing. As cluster distribution across Leeds reflects the pattern of discrepancies between population estimates, this gives some indication that particular groups may have a larger influence on population estimates.
Since the 2011 Census, population estimates for Leeds based on counts of people registered with GPs have differed substantially from the mid-year estimates (MYEs) published by the Office for National Statistics (ONS). Large discrepancies across particular areas of Leeds have implications for many aspects of city planning, such as health planning, transport planning, and election preparation. A single agreed version of the population estimate is highly desirable. This project has used Geographical Information Systems and classification methods to assess where the discrepancies exist within Leeds and has given further understanding of why they are occurring, indicating potential recent changes in the population composition of Leeds which are unaccounted for by the MYEs.
This work will likely inform the future production and use of specific population data. Understanding where the differences in population estimates occur can contribute to reaching the single agreed version of population estimates that city planning services require.