Data-driven evaluation of treatments for type 2 diabetes

Using machine learning to investigate differential effects of therapies in patients with type 2 diabetes


There are currently limited guidelines to inform the prescription of second-line therapies (treatments given when the initial treatment does not work, or stops working) for patients with type 2 diabetes. This project seeks to conduct a comprehensive comparative study, using randomised controlled trial datasets, to identify subgroups of patients who experience particularly beneficial or harmful effects of an active treatment, to estimate their treatment effects and to characterise the identified patient subpopulations. A primary focus is the use of machine learning for variable selection, and an evaluation of its value for treatment selection relative to traditional regression models.

Project aims

In a heterogeneous population of patients diagnosed with type 2 diabetes, it is of interest to examine how patients vary in their response to second-line drug therapies and what might drive these differences. Primarily, this work seeks to present a comprehensive comparative study over this patient population, supported by careful validation (both internal and external).

Specifically, the study will evaluate treatment effect heterogeneity using machine learning and compare the results with those of a relevant traditional model. Numerous data-driven methods have recently been developed at the intersection of machine learning and causal inference to evaluate treatment effect heterogeneity; however, they have so far seen limited application in medical and public health research.

This work seeks to bridge this gap by using these machine learning algorithms to inform treatment selection for type 2 diabetes. In addition, variable selection is incorporated in a distinctive way: it explicitly accounts for the mixed nature (e.g. categorical, continuous) of patient characteristics, the associated selection bias and the presence of highly correlated predictors.


This work should be of interest to researchers in medicine, health outcomes and pharmacy, to applied statisticians, and to the pharmaceutical industry.


Dr John Dennis

Independent Research Fellow in Medical Statistics, University of Exeter

Researchers and collaborators