For a contact tracing system (automated or manual) to be trusted and effective, it will have to identify those who are actually at risk of contracting the virus and inform them of the need to self-isolate. If the contact tracing system is unable to accurately identify those who need to self-isolate, for example by notifying people who are not at risk (known as a ‘false positive’) or by not identifying people at increased risk (known as a ‘false negative’), the consequences may be severe.

Self-isolation may negatively impact people’s lives, by compromising their ability to go to work and feed their family, or affecting people’s mental health when they’re told that they’re at risk of having the virus. On the other hand, not notifying individuals who carry the virus can increase the reproduction number, resulting in potential outbreaks, unnecessary deaths and, as a result, restrictive policy decisions (such as further lockdown measures) being enforced on large parts of society, further impacting the economy and societal fabric.

It is therefore important to have an app that characterises risk appropriately; that is, one that notifies the individuals who are actually at risk of contracting and spreading the virus, so that they can self-isolate.

The problem

The app determines whether a specific individual is at risk of contracting the virus by assessing their interactions with other, potentially infectious individuals. A risk-scoring approach sets the level of risk from the distance between the devices, the duration of the contact, and the infectiousness of the individuals encountered.[1] Whilst the time and date of an encounter are captured accurately by the app (provided that the phone’s clock is appropriately synchronised), its distance and duration are estimated from the phones’ Bluetooth connectivity and signal strength.

The reliance on Bluetooth is an issue, as it was never designed to estimate distance in the first place. The Bluetooth signal strength is subject to a significant amount of interference, such as the Bluetooth signal bouncing off objects in the phones’ environment (also known as ‘multi-path’) or suffering from attenuation (e.g. by a phone being placed in a pocket behind a wallet).[2] Similarly, assessing the duration of contacts made based on Bluetooth signal strength is subject to errors in cases where the signal strength is low or the sampling rate is low. For instance, if we receive two Bluetooth signals between devices, 5 minutes apart, we do not know if two people have been sitting close to each other for that length of time, or just happened to be passing by at both points when the connections occurred.
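
To see why signal strength is a fragile proxy for distance, consider the textbook log-distance path-loss model, in which received power falls off with the logarithm of distance. The sketch below is illustrative only: the model itself, the reference power of -59 dBm at 1m and the path-loss exponent of 2 are assumptions chosen for the example, not values taken from the app.

```python
def rssi_to_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exponent=2.0):
    """Invert the log-distance path-loss model.

    Model (an assumption for this sketch):
        rssi = tx_power - 10 * n * log10(distance_in_metres)
    tx_power_dbm is the assumed signal strength at 1m, and
    path_loss_exponent (n) is 2 in free space.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


# Same physical separation, two readings: one with a clear line of sight,
# one with a hypothetical 10 dB of extra attenuation (e.g. phone in a pocket).
clear = rssi_to_distance(-65.0)
pocket = rssi_to_distance(-75.0)

print(f"clear line of sight: {clear:.2f} m")
print(f"with 10 dB attenuation: {pocket:.2f} m")
```

Under these assumptions, an extra 10 dB of attenuation inflates the estimated distance by a factor of about 3.2, even though the phones have not moved.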

This problem is faced by every contact tracing app being developed around the world that uses Bluetooth as a proxy for distance and duration, regardless of its software architecture. We cannot change the physical factors that affect Bluetooth signals, but we can statistically model the impact of these errors by attaching a degree of uncertainty to the signal strength, and use this knowledge to inform the risk score attached to individuals’ encounters.

A technical roadmap

Experts from The Alan Turing Institute, the Department of Health and Social Care and the National Cyber Security Centre are undertaking work to characterise and improve the performance of the app.

Using version 1.4 of the Google / Apple Exposure Notification (GAEN) Application Programming Interface (API), we have experimental and modelling evidence to suggest that the current app has an area under the receiver operating characteristic curve (AUC) of 0.68 and a detection performance of more than 99% (i.e. the phones almost always detect each other when within reasonable Bluetooth range). At the risk threshold selected for England's public trial of the contact tracing app, which starts this week, it has a true positive rate (TPR) of 69% and a false positive rate (FPR) of 45%. By way of a simple illustration, during the recent Leicester outbreak, the app would have generated ~50 false positives a day in a population of 330,000. These false positives would be individuals who had been in contact with an infected individual for more than 15 minutes while between 2m and 4m apart.[3]
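
For readers unfamiliar with these metrics, the following sketch shows how TPR, FPR and AUC are computed from risk scores and ground-truth labels. The scores and labels are invented toy values and the function names are hypothetical; none of this reproduces the experimental data behind the figures above.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when every score >= threshold triggers a notification."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def auc(scores, labels):
    """AUC as the probability that a random at-risk encounter outscores
    a random not-at-risk encounter (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: label 1 = truly within 2m for 15 minutes, 0 = not.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]

tpr, fpr = rates_at_threshold(scores, labels, threshold=0.5)
area = auc(scores, labels)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} AUC={area:.2f}")
```

Moving the threshold trades TPR against FPR along a single curve, whereas the AUC summarises the whole curve and so does not depend on the threshold chosen.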

Critically, the TPR and FPR values describe classification performance for one specific problem: deciding whether an interaction with an individual who has tested positive for Covid-19 took place within 2m for 15 minutes. In other words, if you have really been sufficiently close to an individual with Covid-19 for sufficiently long, and so are at risk, you will be notified. Conversely, if you’ve not been near an app user who has tested positive, then you will not be notified. The false positive rate is primarily associated with deciding whether a potentially affected user is within, or just over, the 2m threshold in terms of their interaction with a positive app user.[4] However, in these ambiguous cases, we can do better.

In order to achieve an improvement, we have developed four new ideas:[5]

  1. Introducing a probabilistic framework to the risk score to assess and compensate for uncertainties in the distance and duration calculations.[6]
  2. Creating a new likelihood function that encapsulates many forms of environmental error in the Bluetooth signal such as multi-path and attenuation. We have used the MIT dataset[7] in order to evaluate the performance of this new function, verified in simulation and against other trials data.
  3. Integrating a statistical approach to distance estimation via an Unscented Kalman Smoothing algorithm.[8] This specific algorithmic choice is a compromise between computational complexity, mathematical complexity and accuracy. We have demonstrated an improvement in localisation accuracy (compared to other widely used approaches), in simulation, on the MIT dataset, and using other experimental data captured in the UK.
  4. Exploring the use of existence probabilities in order to incorporate the actual probability of people staying in close proximity for a long period of time versus them having moved in and out of contact during that time.[9]
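
Ideas 1 and 3 can be illustrated together with a much simpler stand-in. The sketch below uses a linear, one-dimensional Kalman filter followed by a Rauch-Tung-Striebel smoothing pass, not an Unscented Kalman Smoother, and then converts each smoothed distance estimate and its variance into a probability of being within 2m. The random-walk motion model, the noise variances, the initial values and the readings are all assumptions made for the example.

```python
import math


def kalman_smooth(zs, q=0.05, r=1.0, x0=2.0, p0=4.0):
    """Smooth noisy distance readings zs (metres) under a random-walk model.

    q: assumed process variance per step; r: assumed measurement variance;
    x0, p0: assumed prior mean and variance. Returns (means, variances).
    """
    x, p = x0, p0
    x_pred, p_pred, x_filt, p_filt = [], [], [], []
    # Forward pass: standard Kalman predict and update steps.
    for z in zs:
        xp, pp = x, p + q              # predict (state transition is identity)
        k = pp / (pp + r)              # Kalman gain
        x = xp + k * (z - xp)          # update mean with the new reading
        p = (1.0 - k) * pp             # update variance
        x_pred.append(xp)
        p_pred.append(pp)
        x_filt.append(x)
        p_filt.append(p)
    # Backward pass: Rauch-Tung-Striebel smoother.
    xs, ps = list(x_filt), list(p_filt)
    for t in range(len(zs) - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        xs[t] = x_filt[t] + g * (xs[t + 1] - x_pred[t + 1])
        ps[t] = p_filt[t] + g * g * (ps[t + 1] - p_pred[t + 1])
    return xs, ps


def prob_within(mean, var, limit=2.0):
    """P(distance < limit) under a Gaussian approximation of the estimate."""
    return 0.5 * (1.0 + math.erf((limit - mean) / math.sqrt(2.0 * var)))


# Hypothetical noisy distance readings, one per sampling interval.
readings = [1.2, 3.1, 1.6, 2.4, 1.1, 1.9, 2.8, 1.4]
means, variances = kalman_smooth(readings)
risk = [prob_within(m, v) for m, v in zip(means, variances)]
for m, v, pr in zip(means, variances, risk):
    print(f"distance={m:.2f} m  variance={v:.2f}  P(<2m)={pr:.2f}")
```

The point of the sketch is that a risk score can be driven by a probability rather than a hard yes or no at 2m, so an encounter estimated at 2.1m with high uncertainty is not treated the same as one confidently placed at 4m. The Gaussian approximation ignores the fact that distance cannot be negative, which is one reason this is only a rough illustration.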

Modelling the impact of these ideas, in particular the introduction of item 3, has demonstrated that we can achieve an AUC above 0.8 (a level generally considered “excellent” in, for example, the international machine learning research community). We will continue to work with Apple and Google to obtain the best possible app using some of the ideas above.

Ongoing global collaboration

Finally, as with all scientific endeavours, we need to continue to test our hypotheses, revise our assumptions in light of new evidence, and listen to and collaborate with colleagues from around the world, so that we can help to build a universally valuable technical approach to digital contact tracing. This is a global challenge that will be solved through collaboration and open science as we rise to the challenges of a virus that knows no borders.


[1] For a detailed description of the risk-scoring approach, including other factors not listed here, see the (un-reviewed) paper on Risk scoring calculation for the current NHSx contact tracing app.

[3] Figures are for illustration purposes only and rely upon a set of simplifying assumptions for modelling, which are not detailed here.

[4] There is also a link between false positives and low-quality duration estimation, which is not detailed here for the sake of brevity.

[5] The novelty is in the combination of ideas that have been around for many years, all the way back to the moon landing in 1969!

[6] The beginning of this work can be found in the (un-reviewed) paper on Risk scoring calculation for the current NHSx contact tracing app.

[7] The MIT dataset used is available at https://github.com/mitll.

[8] For further details on the Unscented Kalman Smoothing algorithm, see the paper on Smoothing algorithms for state–space models and a write-up of our most recent work.

[9] For further details on the use of existence probabilities, see Bayesian visual tracking with existence process.