Can data trusts be the backbone of our future AI ecosystem?

Wednesday 03 Oct 2018

Interest in exploring the concept of ‘data trusts’, a way of sharing data responsibly between partners, has been gathering momentum, particularly since the government published the independent report Growing the artificial intelligence industry in the UK in October 2017.

This report, authored by Professor Dame Wendy Hall and Jérôme Pesenti, called for the establishment of data trusts to “facilitate the sharing of data between organisations holding data and organisations looking to use data to develop artificial intelligence.”

The term ‘data trust’ as currently understood is very broad. It could refer to a trust as a legal structure. It could also refer to a simple bilateral data-sharing agreement, or to a myriad of other possible structures.

Here at The Alan Turing Institute, we’re interested in how data trusts could help to shape the future artificial intelligence (AI) ecosystem. At present, many well-intentioned initiatives to create machine learning algorithms fail for lack of training datasets, in both the private and public sectors. A data trust could enable safe and secure data sharing that would allow the UK to develop and deploy AI systems to benefit society and the economy. For example, Secretary of State for Health and Social Care Matt Hancock recently described the power of genomics plus AI to use the NHS’s data to save lives as “literally greater than anywhere else on the planet.” However, it is imperative that data sharing occurs in a responsible, safe and ethical way that ensures trust and fosters healthy innovation.

In response to the government recommendation and our own interest in data trusts, we recently hosted a workshop to explore different structures and features data trusts might have, raise concerns and questions important to their development, and sketch potential solutions. With a broad group of stakeholders including academics, industry and third sector representatives, we looked at the opportunities and the challenges from legal, ethical, technical, security, economic and policy perspectives.

The main finding from the workshop was that establishing data trusts presents great opportunities, for example in healthcare, social care and public services, where sharing data through a data trust could make frontline services more efficient. However, the technical, legal, ethical and human rights concerns are significant: for example, linking sensitive datasets that contain highly personal information could unintentionally leak that information. To make progress, we likely need to focus on specific use-cases, identifying datasets, users and objectives in advance. A one-size-fits-all approach is unlikely to be productive.

Looking just at the ethical dimensions of data trusts, we learnt:

  • Individuals need to be empowered to understand the subtleties and complexities relating to sharing their personal data.

Personal data is highly complex: it is not limited to a name and an address but also comprises many other details, such as a person’s behaviours and social graph (including information on who their friends are). Individuals should give consent for personal data about them to be used, but there are questions about whether individuals are best placed to understand the full implications of giving consent, given that many consent agreements use opaque and lengthy legal language. There are also related issues, such as how long consent should last and what happens when individuals withdraw it. There may be a need for a data ombudsman as an interim body for the public to go to, as an early regulatory measure that doesn’t require legislation. Alternatively, an institution similar to Public Health England, established for the digital and data sphere, could tackle this issue.

  • There is a need to establish a risk register for ethical risks associated with a data trust.

Risk protocols that cover security risks relating to software infrastructure should include an additional component: a risk assessment for ethical breaches. One example is an evaluation of ethical risks from the human rights perspective. This type of risk could be relevant in the case of an attempt to merge data from the NHS and other government departments.

  • Ethical and political issues get intermingled.

Part of the complexity of assessing ethical risks relating to data sharing arises from the fact that it is often difficult to answer the question ‘is it ethical?’ without specifying subgroups or individuals. This question often morphs into the sub-question ‘ethical for whom?’. For instance, people can be ‘data poor’ or ‘data rich’ depending on the size of the digital and data footprint they leave. Data poverty may be intertwined with other types of poverty: often financial, as poorer people tend to be digitally excluded; or social; or it may be the result of a deliberate choice to avoid leaving a digital footprint. Whatever the reason, it is important to ensure data poverty does not result in or exacerbate other types of poverty. Services designed around individuals who are ‘data rich’ may not cater fairly to the needs of those who are ‘data poor’.

  • There are challenges relating to adequate ethical oversight in practice.

New software features and updates are typically developed (or ‘shipped’) on a very short timescale, often weekly. In this sort of operating environment, with its delivery time pressures, trade-offs in features, data usage or app design are made ‘on the fly’, and ethical oversight could struggle to keep up with the pace of software shipping schedules. An ethical oversight body related to a data trust should have a practical, not just theoretical, understanding of technology and related issues, and it would need to be agile in staying up to date and looking ahead.

Get in touch if you would like to be involved in future discussions at the Turing about the ethics of data sharing and data trusts.

Acknowledgements

The findings reported in this blog draw on contributions from Lorna McGregor from the Human Rights Centre at Essex University, Rachel Coldicutt from doteveryone, Sam Smith from MedConfidential and Turing Programme Director Adrian Weller.