Introduction
This project aims to learn a mapping between motion-tracking data (from wearable devices such as Fitbit) and classes of activity, such as running, office work and household chores. This will be done by developing new models that combine the rigour of probabilistic reasoning with the flexibility of deep learning. Since the UK Biobank contains motion-tracking data, the models developed can be used to infer patterns of activity and relate them to health outcomes, for example the links between different types of exercise and heart disease.
Explaining the science
To learn jointly from a large amount of unlabelled data and a small amount of labelled data, semi-supervised learning is required. This motivates the use of deep generative models, which combine probabilistic reasoning and the representational power of deep learning.
Here, probabilistic models are defined for both the data and the unobserved variables the project is interested in (such as class of activity), and these models are parameterised using deep neural networks. The model is then optimised, yielding a classifier that can be applied to new data.
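As an illustration only, one widely used formulation of a semi-supervised deep generative model (Kingma et al., 2014) is sketched below; it is an assumption that the project's models resemble it. Here x is a window of sensor data, y its activity class and z a continuous latent variable. The generative model p_θ and the inference networks q_φ are parameterised by neural networks, D_l and D_u denote the labelled and unlabelled sets, and α is a weighting hyperparameter. None of these symbols appear in the project description.

```latex
\begin{aligned}
\log p_\theta(x, y) &\ge \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid y, z) + \log p_\theta(y) + \log p(z) - \log q_\phi(z \mid x, y)\big] = -\mathcal{L}(x, y), \\
\log p_\theta(x) &\ge \sum_{y} q_\phi(y \mid x)\,\big(-\mathcal{L}(x, y)\big) + \mathcal{H}\big(q_\phi(y \mid x)\big) = -\mathcal{U}(x), \\
\mathcal{J}^{\alpha} &= \sum_{(x, y) \in \mathcal{D}_l} \mathcal{L}(x, y) + \sum_{x \in \mathcal{D}_u} \mathcal{U}(x) + \alpha \sum_{(x, y) \in \mathcal{D}_l} \big(-\log q_\phi(y \mid x)\big).
\end{aligned}
```

Minimising J^α over θ and φ trains the generative model and the classifier together; in this formulation q_φ(y | x) is the classifier that is applied to new data.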
Project aims
The main aim is to develop a principled, high-performance method for time series segmentation and classification that can handle large amounts of data. While the principal interest is in accelerometry data, the models developed could be applied to video, audio, or other time series data.
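To make the segmentation step concrete, the simplest baseline is to cut a recording into fixed-length windows and classify each window separately. The sketch below assumes this baseline; the project's own segmentation may well be learned rather than fixed. The function name `segment_windows`, the (time, channels) array layout, and the window sizes in the usage note are all hypothetical.

```python
import numpy as np


def segment_windows(signal, window_len, step=None):
    """Split a (time, channels) array into windows of shape
    (n_windows, window_len, channels).

    Trailing samples that do not fill a whole window are dropped.
    """
    signal = np.asarray(signal)
    if signal.ndim != 2:
        raise ValueError("expected an array of shape (time, channels)")
    if step is None:
        step = window_len  # non-overlapping windows by default
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")

    n_samples, n_channels = signal.shape
    if n_samples < window_len:
        # Recording is shorter than one window: return an empty batch.
        return np.empty((0, window_len, n_channels), dtype=signal.dtype)

    # Start index of every window that fits entirely inside the recording.
    starts = np.arange(0, n_samples - window_len + 1, step)
    return np.stack([signal[s:s + window_len] for s in starts])
```

For example, for a tri-axial recording sampled at an assumed 100 Hz, `segment_windows(recording, window_len=3000)` would give non-overlapping 30-second windows, each of which can then be passed to a classifier.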
A large amount of raw data has been obtained from the UK Biobank, along with a small amount of curated, labelled data from an Oxford study. The latter consists of people wearing both a camera and an accelerometer, so that their activity type is known.
The models being developed can learn jointly from both the unlabelled data and the labelled data, to get better performance than learning from either on its own.
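A rough sketch of how a single training loss can draw on both kinds of data follows. It assumes a common semi-supervised set-up, not the project's confirmed method: labelled examples contribute a model loss for their known class plus a classification loss, while unlabelled examples contribute the model loss averaged over classes using the classifier's own predicted probabilities, minus the entropy of those predictions. All names (`joint_loss`, `alpha`, the array arguments) are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def joint_loss(lab_loss, lab_logits, lab_y,
               unlab_loss_per_class, unlab_logits, alpha=1.0):
    """Combine labelled and unlabelled terms into one scalar loss.

    lab_loss:             (N_l,)   model loss of each labelled example
                                   under its true class
    lab_logits:           (N_l, K) classifier logits for labelled examples
    lab_y:                (N_l,)   integer class labels
    unlab_loss_per_class: (N_u, K) model loss of each unlabelled example
                                   if it belonged to class k
    unlab_logits:         (N_u, K) classifier logits for unlabelled examples
    alpha:                weight on the supervised classification term
    """
    lab_y = np.asarray(lab_y)

    # Supervised part: cross-entropy of the classifier on known labels.
    lab_logp = log_softmax(np.asarray(lab_logits, dtype=float))
    cross_entropy = -lab_logp[np.arange(len(lab_y)), lab_y]

    # Unsupervised part: expected per-class loss under the classifier's
    # predicted class distribution, minus that distribution's entropy.
    unlab_logq = log_softmax(np.asarray(unlab_logits, dtype=float))
    q = np.exp(unlab_logq)
    entropy = -(q * unlab_logq).sum(axis=1)
    unlab = (q * np.asarray(unlab_loss_per_class)).sum(axis=1) - entropy

    return np.sum(lab_loss) + alpha * cross_entropy.sum() + unlab.sum()
```

In practice the per-example losses would come from a trained generative model and the whole expression would be minimised with an automatic-differentiation library; NumPy is used here only to show how the terms fit together.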
Applications
The models are being applied to UK Biobank data, but they could be applied more widely. The work aims in particular to improve the quality of insights drawn from wearables, but it could be used to analyse time series data in general.
Recent updates
May 2018
Paper published on using classical statistical machine learning methods, training a model on just the smaller CAPTURE-24 dataset and applying that model to the UK Biobank dataset.