Trustworthy AI forum: Human-machine teaming


Jonny Freeman, Communications & Engagement Coordinator

Monday 20 December 2021


How can we ensure that human-machine teams are trustworthy, safe, and ethical? And what are the logistical and regulatory challenges posed by advances in machine learning? These were some of the questions discussed at the Turing’s first Trustworthy AI forum, on the theme of human-machine teaming, which took place on Friday 29 October 2021.

The forum was convened to connect data practitioners in industry with academic experts in the field, in line with the Turing’s goal of acting as a trusted broker between industry and data science and AI researchers. Participants included Turing-affiliated researchers and senior data practitioners from diverse industrial sectors including finance, retail, media, aerospace, and government.

Human-machine teaming is the process by which humans and machines collaborate on decision-making. At the start of the forum, several broad questions were outlined: How can we design a machine with which humans can have a meaningful interaction? What’s the right structure for this machine? And at what point should the human intervene?

Human-in-the-loop vs. human-on-the-loop 

A major theme of discussion was the relative virtues of human-in-the-loop (HITL) vs. human-on-the-loop (HOTL) models of human-machine teaming. HITL is any model which requires human intervention to complete the process – the machine generates predictions or suggestions which are then actioned or vetoed by the human. By contrast, HOTL means the machine can complete the process without any action from the human, though the human still has oversight and the power to intervene if necessary.  
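
The distinction can be made concrete with a minimal sketch. Everything in it is a hypothetical illustration rather than anything presented at the forum: the `Suggestion` type, the `confidence` score, and the `approve` and `intervene` callbacks are assumed names standing in for the human's role.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    """A machine-generated suggestion (hypothetical structure)."""
    action: str
    confidence: float  # assumed score in [0, 1]


def run_hitl(suggestion: Suggestion,
             approve: Callable[[Suggestion], bool]) -> Optional[str]:
    """HITL: nothing is carried out until the human actions the suggestion.

    Returns the action if the human approves it, or None if it is vetoed.
    """
    return suggestion.action if approve(suggestion) else None


def run_hotl(suggestion: Suggestion,
             intervene: Callable[[Suggestion], Optional[str]]) -> str:
    """HOTL: the machine's action goes ahead unless the human steps in.

    `intervene` returns None when the human does nothing, or a replacement
    action when the human chooses to override the machine.
    """
    override = intervene(suggestion)
    return suggestion.action if override is None else override


if __name__ == "__main__":
    s = Suggestion(action="brake", confidence=0.55)
    # A human who only approves high-confidence suggestions vetoes this one.
    print(run_hitl(s, lambda sug: sug.confidence >= 0.8))
    # An overseer who takes no action lets the machine's choice stand.
    print(run_hotl(s, lambda sug: None))
```

In the HITL function the default outcome is inaction, whereas in the HOTL function the default outcome is the machine's own choice. That difference in defaults is what gives rise to the accountability questions.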

Each model poses different challenges, with varied legal and ethical implications. HITL might often be preferable for safety-critical systems, where machine error could result in death or injury. Under HITL, an accountable human stands behind every action the system carries out, which makes things simpler from a legal and regulatory perspective. HOTL is more complex – who is accountable for the harm done by an autonomous system? Is it the human on the loop who didn’t detect the error, or the system’s designer or deployer?

In cases where it is not clear who the accountable person is, one participant suggested it might be necessary to require that the algorithm be insured against any harm caused, for example in the case of a self-driving car which crashes into another vehicle. Another offered that for conventional cars, we have driving licences to regulate human capability, and an MOT to regulate machine capability. Do we need an equivalent for AI and users of AI?

One participant suggested that legal and regulatory frameworks which mandate HITL for safety and ethical reasons are fast becoming obsolete due to advances in machine learning, and that they stem from a misguided notion that humans are the “gold standard” when it comes to decision-making.

Making human-machine teams explainable and trustworthy

The work of the psychologist and economist Daniel Kahneman was referenced at several points in the discussion. Kahneman’s study of decision-making at boardroom level found executives to be significantly overconfident in their ability to make rational decisions, and one forum participant suggested this was down to human attachment to the emotional “rush” of making decisions. The forum agreed on the need to adapt algorithms to account for this psychological attachment to decision-making, and suggested that cognitive scientists and data scientists could work together, where appropriate, to convince humans to surrender some of their decision-making capacity to machines.

Though Kahneman’s studies have shown simple algorithms can outperform experts in some decision-making tasks, one participant counselled vigilance. Machine learning models trained on human decisions might reproduce – or even exacerbate – human errors and biases.

To inspire confidence in machine decision-making, one participant stressed the need to “take people on the journey” when explaining a machine learning process, “layering complexity” so that the user can follow each step. This level of explainability may be sufficient to inspire user confidence, without the need for the user to understand each individual element of the algorithm. Additionally, when evaluating the success of a human-machine team, it is important to evaluate each step of the “decision pathway” rather than just the individual machine learning elements.

Other participants identified the importance of increasing machine trustworthiness – and stressed that this is distinct from machine accuracy. A machine that is 60% accurate, but whose errors the team can anticipate, may be more trustworthy than one which is 90% accurate if the humans in the team do not know where that inaccurate 10% lies. Similarly, a scan informing a doctor that a patient has a 70% overall risk of a given disease may be more trustworthy if it can also highlight elements of the scan which indicate higher risk or lower risk. For a human-machine team to work well, the humans need to understand the limitations of the machine. One participant noted that “it is better to be roughly right than precisely wrong”.
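
The accuracy comparison can be worked through with some rough arithmetic. The sketch below rests on simplifying assumptions that did not come from the forum: the 60% machine flags exactly the cases it gets wrong, the human reviews only flagged cases, and the human resolves 85% of them correctly. The function name and the 85% figure are hypothetical.

```python
def team_accuracy(machine_acc: float, flags_errors: bool,
                  human_acc: float) -> float:
    """Expected accuracy of a human-machine team on a single decision.

    Assumption: if the machine flags the cases it gets wrong, the human
    reviews only those cases and resolves them correctly at rate human_acc.
    If it does not flag them, the human has no cue to intervene and the
    machine's errors pass straight through.
    """
    if flags_errors:
        return machine_acc + (1 - machine_acc) * human_acc
    return machine_acc


if __name__ == "__main__":
    # 60% accurate machine whose weak spots are known to the team.
    known_limits = team_accuracy(0.60, flags_errors=True, human_acc=0.85)
    # 90% accurate machine whose errors are invisible to the team.
    hidden_limits = team_accuracy(0.90, flags_errors=False, human_acc=0.85)
    print(f"{known_limits:.2f} vs {hidden_limits:.2f}")  # 0.94 vs 0.90
```

Under these assumptions the team built around the less accurate machine comes out ahead, because the human's effort is directed to where it is needed. With a less idealised flagging mechanism the gap would narrow or reverse.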

Another important aspect of trustworthiness is defining clear roles for the human and the machine in the team. A self-driving car with a steering wheel was given as an example of this: such designs are potentially very dangerous if the human does not know exactly when to intervene or to exactly what extent. However, human-driven cars with automatic AI-powered brakes or lane-correction are much safer – the role of the machine is more limited here, but clearer to the human.

As the forum wrapped up, one participant offered a note of caution regarding explainability – we shouldn’t necessarily assume that companies are acting in good faith and being totally transparent about how their systems work. We ought to consider what might be “hidden under the covers” when systems are deployed.

In addition, several participants stressed the need for a clearer and more strictly defined vocabulary when referring to concepts such as explainability and trustworthiness – and even machine learning and artificial intelligence – noting that people often mean different things when they use these terms.

An ongoing discussion

This forum was only the first in a series of discussions on trustworthy AI and human-machine teaming. Future forums will be increasingly focused on exploring specific solutions to some of the challenges outlined in this session. If you are interested in being involved in future discussions, please contact [email protected]