Introduction

Can we automate the UK’s system for handling planning applications? In a Data Study Group challenge put forward by Agile Datum, we trained machine learning models on UK planning application documents to classify and detect floorplans, with the aim of speeding up the validation of submitted planning applications.

The workshop, which was part of the Turing’s December 2019 Data Study Group, was proposed by Agile Datum, a UK based company specialising in using data science in agile applications for public and private sector organisations. Data Study Groups are collaborative hackathons, organised by The Alan Turing Institute, bringing together organisations from industry, government, and the third sector, with talented multi-disciplinary researchers from academia. This case study contains extracts from the full report published on the Turing’s website.

Case study

Over 3.5 million planning applications are submitted in the United Kingdom each year. That is not surprising considering that every construction project in the UK, including building or extending a house, fitting new windows on a listed building, or chopping down a tree, requires the submission of complicated forms and technical drawings. Once submitted, each of these documents needs to be manually validated and approved by a planning officer.  

As a result, each application requires 30-60 minutes of council time to validate whether the information in the forms and drawings is correct. This manual validation represents around 250,000 person hours of administration per year.  This creates large backlogs and a lack of information for applicants, leading to additional calls and emails to chase progress, which further increases workload on planning officers.

Over one third (1.2 million) of submitted applications are rejected, often because the planning system in the UK remains complex, fragmented and thus inefficient. For example, there is as yet no standardised format for applications. Each council has its own application forms with different fields of entry, and there are no standard templates for the submission of drawings. The plans and forms are all submitted in PDF format with varying layouts, formats and content.

Despite the wide diversity of applications, independent research led by Agile Datum shows that over 80% of rejected applications occur due to 12 common mistakes. These errors can be categorised as issues in the application forms or in the planning drawings of the existing and proposed work. They include incomplete form sections, an incorrect application fee, a description of work that does not match the drawings, missing or incorrect floorplans, missing or incorrect elevations (façade plans), an incorrect or missing scale in drawings, a missing north arrow in drawings, and missing or incorrect site plans.

If AI methods could detect these common errors before the application gets to the stage of being reviewed by a planning officer, then the planning process could become much more efficient and streamlined.

In this Data Study Group, we used artificial intelligence (AI) and machine learning (ML) to detect and classify drawings and application forms, in order to automatically detect common mistakes. This automation could greatly reduce the time it takes a planning officer to validate and approve a planning application, leading to a faster, more streamlined planning process. The challenge goes beyond identifying errors: by detecting individual components within an application, it offers a method for creating a database of classified drawings and a suggestion for a common standardised planning application form.

The report developed from the Data Study Group at the Turing outlines the methodology used.

Figure 1: Most common words found in planning application documents

All planning application documents are publicly available and are generally separated into two categories: text forms and drawing plans.

To perform text analysis, we created three datasets: one of PDF file metadata, one of (computer-selectable) text extracted directly from the PDFs, and one of text generated from the image dataset using optical character recognition (OCR, a computer vision technique for identifying characters in images).
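A key decision in such a pipeline is whether a given page carries selectable text or needs OCR. The sketch below illustrates one way this routing could work; the function name and the 20-character threshold are illustrative assumptions, not the study's actual code, and the real extraction and OCR steps would use a PDF library and an engine such as Tesseract.

```python
# Illustrative sketch: route each PDF page either to direct text extraction
# or to OCR, depending on whether it carries selectable text.
# (The 20-character threshold is an assumed heuristic, not from the study.)

def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Pages with little or no selectable text are sent to OCR instead."""
    return len(extracted_text.strip()) < min_chars

# A scanned or hand-drawn page typically yields no selectable text:
print(needs_ocr(""))  # True -> run OCR on the page image
# A born-digital form page already carries its text:
print(needs_ocr("Application for planning permission"))  # False -> extract directly
```

Merging both routes into a single text dataset is what makes handwritten and scanned forms searchable alongside born-digital ones.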

For the drawings, we created a dataset of PDF pages converted to images. This allowed us to create a searchable database of all planning forms, including handwritten ones, in order to automatically detect errors in text forms.

Planning drawings in the UK are not standardised and can vary from having one drawing per page to multiple drawings per page, being computer generated or hand drawn, and using varying scales and colours. Thus, the classification of individual components of the documents becomes a challenge. Drawings are also usually flattened in PDFs, meaning that each page can contain one or many images of floorplans and elevations.  

Regardless of format, each planning application should contain the following six components: floorplan, siteplan, elevation plan, section, north arrow and textbox. Being able to detect these components is a first step towards automatically verifying whether an application contains the common errors mentioned earlier, such as a missing north symbol, that would result in an immediate rejection of the application. The challenge was to investigate whether a computer vision model could in fact allow us to automatically detect these different objects within the applications.

This first required us to create a training dataset, which we produced manually by labelling a sample of drawings from real-life applications. We created labels for each of the six key categories mentioned above: a. floorplan, b. siteplan, c. elevation plan, d. section, e. north arrow, f. textbox.
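Each labelled example can be stored as an annotation record mapping bounding boxes to class names. The structure below is an illustrative sketch only; the field names and filename are assumptions, not the study's actual schema.

```python
# Illustrative annotation record for one labelled page (field names assumed).
CLASSES = ["floorplan", "siteplan", "elevation", "section", "north_arrow", "textbox"]

annotation = {
    "image": "application_0001_page2.png",  # hypothetical filename
    "boxes": [
        # (x_min, y_min, x_max, y_max) in pixels, plus the class label
        {"bbox": (40, 60, 820, 700), "label": "floorplan"},
        {"bbox": (860, 60, 1020, 220), "label": "north_arrow"},
        {"bbox": (40, 740, 1020, 800), "label": "textbox"},
    ],
}

# Every label must come from the fixed class list used for training:
assert all(b["label"] in CLASSES for b in annotation["boxes"])
```

Keeping the class list fixed up front matters because the detector's output layer is sized to exactly these categories.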

Figure 2: Example of plans manually labelled to create the training dataset

Detecting floorplans

After creating the training dataset, we used object detection, a technique in image processing for detecting semantic objects (such as kangaroos, cats, humans or cars) in digital images or videos. While image localisation specifies the location of a single object in an image, object detection specifies the locations of multiple objects, and image segmentation creates a pixel-wise mask of each object in the image.


To perform object detection, we used the deep learning framework Keras and the Mask R-CNN library (mrcnn) to fine-tune a pretrained Mask R-CNN model on our training data to detect the components within the planning drawings. R-CNN (Region-based Convolutional Network) initialises small regions in an image and merges them through hierarchical grouping. The Mask Region-based Convolutional Network (Mask R-CNN) detects different objects in an image or video: given an image, it outputs the objects' bounding boxes, classes and masks. The model was pretrained on the MS COCO dataset (a large-scale dataset for object detection).
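Fine-tuning in the mrcnn package is driven by a small configuration class. The stand-in below shows the key settings for this task without requiring the library to be installed; the values are plausible assumptions for a ~30-image training set, not the study's exact configuration.

```python
# Stand-in for a subclass of mrcnn.config.Config, shown without the
# dependency; with mrcnn installed one would write `class PlanConfig(Config):`.
class PlanConfig:
    NAME = "planning_drawings"
    # Background + the six annotated classes (floorplan, siteplan,
    # elevation plan, section, north arrow, textbox).
    NUM_CLASSES = 1 + 6
    # Assumed values for a very small training set:
    STEPS_PER_EPOCH = 30
    DETECTION_MIN_CONFIDENCE = 0.7

# Training would then load the MS COCO weights and fine-tune only the
# network heads, roughly:
# model.load_weights("mask_rcnn_coco.h5", by_name=True, exclude=[...])
# model.train(train_set, val_set, learning_rate=0.001, epochs=5, layers="heads")
```

Excluding the COCO output layers when loading weights is what allows the class count to change from COCO's 81 to our 7.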

This tutorial by Jason Brownlee is a great resource for providing further details relating to the setup of such a pipeline. 

Even though we were only able to label 30 images in the given timeframe, the results were promising. The model was able to detect drawings and separate them from text blocks and north arrows. The bounding boxes were also quite accurate in most cases, enclosing each of the drawings separately.

Figures 3, 4 and 5 show example predictions made by the model on test and training images. The left image in each pair is the ground truth for that example (i.e. the bounding boxes labelled by a human) and the image on the right shows the model’s predicted bounding boxes. Each class is represented by a different colour.

Figures 3 and 5 show successful predictions. In Figure 3 the model accurately predicted 4/5 drawings with relatively accurate bounding boxes, and the labels and drawing title were also successfully categorised; in Figure 5 the model accurately detected the site plan and successfully categorised the north arrow symbol. Figure 4 shows a false prediction: the model recognised only half of the drawing, and part of it was misclassified as a drawing title.

Figure 3: Example of a successful prediction. The model accurately predicted 4/5 drawings, with relatively accurate bounding boxes. The labels and drawing title are also successfully categorised.
Figure 4: Example of a false prediction. The model recognised half of the drawing and part of it was misclassified as drawing title.
Figure 5: Example of a successful prediction in a Site plan. The model accurately classified the drawing and north symbol, though an extra class may need to be added for the scale sign.

The results show that the object detection process can provide a successful method for the digitisation of planning applications, though the model may benefit from a larger training dataset as well as hyperparameter tuning. The detection of discrete elements such as symbols or individual drawings can speed up the evaluation process by automatically eliminating planning applications with common mistakes, such as a missing north symbol or a missing floorplan.
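Once the detector has labelled the components on a page, checking for a rejection-worthy omission reduces to a set comparison. A minimal sketch, in which the list of required components is an assumption based on the common errors described earlier:

```python
# Illustrative completeness check on the classes the detector found.
# (The required-component set is assumed, not the study's validation rule.)
REQUIRED = {"floorplan", "siteplan", "elevation", "north_arrow"}

def missing_components(detected_labels):
    """Return the required components absent from the detector's output."""
    return sorted(REQUIRED - set(detected_labels))

print(missing_components(["floorplan", "siteplan", "elevation", "textbox"]))
# -> ['north_arrow']: the application can be flagged before a planning
#    officer ever sees it
```

An empty result means the application passes this automated check and proceeds to human review.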

The current model also does not yet recognise whether a plan shows an existing floorplan or a proposed one, nor can it distinguish between floorplans of different floors (i.e. if the proposed building has more than one floor). However, when planning officers look at a drawing, they can read the labels assigned to each plan. For the model to distinguish between proposed and existing plans, we could supply it with an additional feature: the title text found near each plan (usually labelled “proposed” or “existing”), so that it could learn to associate the two. Performing OCR on these title texts would then give us the semantic text of each label, and thus of each allocated drawing, yielding a dataset of classified drawings.
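Once the title text near a drawing has been OCR’d, assigning it to “proposed” or “existing” could start as simple keyword matching. The sketch below is an assumed baseline for this step, not the study’s method:

```python
# Illustrative classifier for OCR'd drawing titles (keyword matching is an
# assumed baseline, not the study's approach).
def classify_plan_status(title_text: str) -> str:
    """Assign a drawing to 'proposed' or 'existing' from its OCR'd title."""
    t = title_text.lower()
    if "proposed" in t:
        return "proposed"
    if "existing" in t:
        return "existing"
    return "unknown"

print(classify_plan_status("Proposed First Floor Plan"))   # proposed
print(classify_plan_status("EXISTING ELEVATIONS 1:100"))   # existing
print(classify_plan_status("Site Plan 1:500"))             # unknown
```

Titles that fall through to “unknown” would be exactly the cases worth routing back to a planning officer.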

Conclusions

This work was a proof of concept, a first step towards identifying the components in planning applications that many applicants trip over. There are further future uses of ML models in this context, such as automatically identifying and highlighting the changes between existing and proposed plans. In this report, we presented a prototype based on the Mask R-CNN framework to perform image detection and segmentation. Many other techniques for semantic segmentation now exist, in particular clustering image features produced by a pretrained model (which does not necessarily require further training or a labelled dataset).

Above all, this work has been a demonstration of assistive AI to support and improve the UK’s planning application services.

The planning support system in the UK remains decentralised to this day. Each local authority receives planning applications separately, and review is performed manually. AI can aid in this context by speeding up the evaluation process: not by taking over the existing system, but by quickly detecting common mistakes, digitising and categorising drawings, and creating a searchable database of plans and text. Beyond this, the work could lead to a digital 3D database of the UK’s building stock. The challenge would be to take the classified drawings, convert them to an editable semantic form, and reconstruct a 3D building from the 2D drawings. This could provide a platform for an urban digital twin: a virtual replica of a city’s past and new buildings, with the possibility of offering insights into performance and potential problems in the context of the smart city.

Acknowledgements

Our thanks to Agile Datum and the research team for their work on this case study.


Organisers

Anna Hadjitofi

Research Postgraduate Student, Institute of Perception, Action and Behaviour, University of Edinburgh