Introduction
Personal wearable devices, such as traditional activity trackers and heart monitors but also a variety of new-generation devices, are showing great potential for individuals' continuous self-monitoring in free-living settings. While these devices may enable affordable, preventive, personalised healthcare, effective techniques for exploiting the signals they produce are yet to be fully developed.
Explaining the science
Several techniques come together to address these problems. Firstly, features that algorithms can learn from must be extracted from the raw activity data. Traditionally, this has been achieved by translating the raw signal into activity intensity levels. In contrast, this project proposes using deep machine learning techniques to automatically extract more advanced features, that is, a new representation of the data containing the digital biomarkers that can be used for learning. This is challenging, as many different deep learning architectures may need to be explored to produce a useful representation space.
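As a rough illustration of the traditional approach described above, the sketch below maps per-epoch acceleration magnitudes to intensity levels and summarises the share of time spent at each level. The cut-points, level names, and function names are hypothetical and chosen only for illustration; they are not taken from the project or from any specific device.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut-points (in milli-g) separating intensity levels.
# Real studies calibrate these values per device and population.
CUT_POINTS = [30.0, 100.0, 400.0]
LEVELS = ["sedentary", "light", "moderate", "vigorous"]


def intensity_level(magnitude_mg):
    """Return the intensity level for one epoch's mean acceleration."""
    # bisect_right counts how many cut-points are <= the magnitude,
    # which is exactly the index of the matching level.
    return LEVELS[bisect_right(CUT_POINTS, magnitude_mg)]


def intensity_profile(epochs_mg):
    """Return the fraction of epochs spent at each intensity level."""
    if not epochs_mg:
        raise ValueError("at least one epoch is required")
    counts = Counter(intensity_level(m) for m in epochs_mg)
    total = len(epochs_mg)
    return {level: counts.get(level, 0) / total for level in LEVELS}


# Example: five epochs of made-up data for one participant.
example_profile = intensity_profile([10.0, 20.0, 50.0, 150.0, 500.0])
print(example_profile)
```

A fixed-length profile like this is the kind of hand-crafted feature vector that the proposed deep learning representation is intended to improve upon.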
Secondly, clustering techniques are used to group individuals according to these features, reflecting their common activity patterns, so that focused models can be learnt for each group. Finally, possible associations between an individual's set of digital biomarkers and their genotype will be explored, using statistical genome-wide association study (GWAS) techniques, to try to explain correlations between activity patterns and certain genetic mutations.
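The grouping step can be sketched with a plain k-means loop over per-participant feature vectors. This is only an illustrative assumption: the text does not say which clustering algorithm the project uses, and the feature values, the choice of two clusters, and the function names below are made up for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means clustering.

    Initial centroids are simply the first k points, an assumption made
    here so that the example is deterministic.
    """
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up feature vectors: (fraction sedentary, fraction moderate or above).
features = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.6], [0.3, 0.5]]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Once participants are labelled in this way, a separate predictive model could be trained on the members of each group, which is the sense in which the models are "focused".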
Project aims
The main aim is to demonstrate that signals collected from personal devices by individuals over a period of time, suitably combined with a person’s genotype information as well as other clinical and self-reported features, can be successfully used to train machine learning models for the prediction of certain metabolic diseases, primarily type 2 diabetes and cardiovascular disease.
The main dataset for this study is the UK Biobank, in which about 100,000 participants have each contributed one week's worth of activity-tracking data over a period of several years. Genotypes and a wealth of clinically assessed and self-reported data are also available for these individuals, making this an ideal testbed for the project.
Diagnoses of metabolic diseases for a fraction of these participants, along with their possible complications as documented by secondary care records (i.e., hospital admission events), provide the necessary ground truth used to experiment with a variety of predictive models, and ultimately to measure the success of the project.
Applications
The outcomes of this pilot project will inform emerging preventive and personalised healthcare practices. The project will also help address problems of data quality that currently limit the adoption of wearable sensing devices for medical use. It will suggest best practices for data collection to improve the standards of data completeness and reliability to the levels required by learning algorithms.