Mind the gap: machine learning systems and the passage of time

Changing societal values raise important questions about how we design predictive algorithms

Monday 11 Apr 2022

There are many aspects of our lives where the ability to make accurate predictions is useful. Predictions necessarily draw on past experience: as human experience is finite and inherently biased, machine learning (ML) systems trained on vast amounts of data have the potential to provide more accurate predictions than we are capable of alone. To deliver on such a promise, several factors need to be kept in check. One of them has to do with the passage of time. Whether it be in the domain of crime prevention, education or health, both our behaviour and the values we associate with these goals change over time.

On the behaviour front, computer scientists are familiar with the problems that arise when characteristics that used to be good predictors of future behaviour (say, a combination of educational profile and socio-cultural background) become less so due to interventions (such as new arrest policies) or sudden societal changes. While computer scientists refer to this issue as ‘concept drift’, few policy makers seem to grasp its importance. The AI Act is a case in point: none of its provisions aimed at avoiding scenarios where an ‘AI system’ is deployed in a setting whose characteristics are too distant from those of the training, validation and testing datasets considers the impact of the passage of time.
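To make the idea concrete, here is a minimal toy sketch of concept drift, using entirely made-up data. A fixed ‘model’ learned under an old regime (a simple threshold on one feature) keeps being applied after an intervention weakens the link between that feature and the outcome; its accuracy quietly degrades even though nothing about the model has changed.

```python
import random

random.seed(0)

def generate(year, n=1000):
    # Hypothetical toy data: one feature predicts the outcome, but a
    # policy change in "year 5" weakens the relationship thereafter.
    data = []
    for _ in range(n):
        x = random.random()
        if year < 5:
            y = 1 if x > 0.5 else 0  # strong, stable link
        else:
            # After the intervention the link holds only ~60% of the time.
            y = 1 if (x > 0.5) == (random.random() < 0.6) else 0
        data.append((x, y))
    return data

def accuracy(data, threshold=0.5):
    # The deployed "model": a rule fitted to the pre-intervention world.
    correct = sum(1 for x, y in data if (1 if x > threshold else 0) == y)
    return correct / len(data)

early = accuracy(generate(year=0))   # near-perfect on training-era data
late = accuracy(generate(year=10))   # degraded after the drift
```

Nothing in the model’s code signals the failure; only by re-evaluating it against fresh data does the drift become visible, which is why monitoring over time matters.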

The impact of the passage of time

Yet the passage of time also raises a different kind of problem. What if it’s not just the predictive power of the characteristics of those whose future behaviour is predicted – the convict applying for parole, say – which changes? What if important characteristics of those wielding those prediction tools – that’s all of us – also change as a result of these tools’ widespread use? Since tools have always shaped us in more or less dramatic ways, it is tempting to dismiss the latter question as a matter for anthropological study. Why would it concern those designing ML systems deployed in contexts such as healthcare or the justice system? The answer has to do with the way in which health and justice practices depend on our retaining precisely the characteristics likely to be affected by certain ways of designing ML-based prediction tools.

The values that preside over our justice and healthcare systems are in flux: we are constantly in the process of re-articulating the relative salience and meaning of an array of evolving aspirations, from solidarity to fairness via justice, integrity and toleration. This reflects our ‘work in progress’ nature as human beings who are always in the process of learning to live together. This dynamic dimension is ignored at our peril. 

The comforts of ‘objective’, algorithmically generated predictions can make us forget that these predictions merely reflect some past way of structuring inherently fallible, value-loaded practices. If predictions about the risk of being arrested remain mostly accurate 10 years down the line, it is not necessarily good news. Why? Because it may mean that our drive to call for policies that change the contours and roots of crime, arrest etc. has been eroded by increasingly uncritical reliance on ML tools. To make the outputs of such tools as conducive to contestation as their human counterparts requires design choices that go well beyond the current focus on individualistic transparency and explanation.

How does this work?

When it comes to ML systems meant to be deployed in morally-loaded contexts, our focus should shift towards making these systems’ outputs collectively contestable in the longer term, rather than for a particular individual at a particular time. To understand how this might work in practice, consider the following example:

Kasia and Omar follow a remote high school learning programme which optimises the selection of educational content based on their respective profiles. When Kasia asks why she gets less challenging science lessons than Omar:

  • Option 1: The course coordinator sends a note which suggests that the factors that had the most impact on the ML system’s content selection were her recent psychological test results (she scored highly on the anxiety scale).
  • Option 2: As well as sending a note, the course coordinator also shares with Kasia the results of a second and third ML system. In contrast to the first, the second does not allow psychological test results to influence the selection of content and tasks. The third comes in two versions, trained on data generated by girls-only or boys-only schools. Kasia is struck by the very different content recommendations issued by each system and starts questioning the extent to which she is well served by the system favoured by her course coordinator.
  • Option 3: Students are regularly ‘switched’ from one system to another. Every time a switch takes place, students are notified and asked to relay the extent to which they felt adequately challenged, motivated etc. Similar feedback is open to both parents and course coordinators, who are encouraged to discuss their views.

Option 1’s explanatory note does little to improve Kasia’s degree of agency within her education programme. Option 2 allows Kasia to compare different systems, putting her in a position where she can start to appreciate the impact of different parameters and training datasets. Expanding upon this comparison, option 3 also includes interactive features which foster more collective types of feedback. This is important. Value-loaded practices such as our justice and healthcare systems evolve in large part thanks to a process of bottom-up, spontaneous questioning that can be stifled by uncritical reliance on ML tools. 
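A minimal sketch of what Option 2’s comparison might look like in code – this is an illustration in the spirit of ‘ensemble contestability’, not the paper’s implementation, and all names (profile fields, scoring rules) are hypothetical. The point is simply that running alternative systems side by side makes their disagreement visible and therefore contestable.

```python
def model_full(profile):
    # Hypothetical system that uses all features, including the anxiety
    # score (the system Kasia's course coordinator relies on).
    difficulty = profile["science_grade"] - 2 * profile["anxiety"]
    return "challenging" if difficulty > 5 else "standard"

def model_no_psych(profile):
    # Alternative system that ignores psychological test results
    # (Option 2's second system).
    return "challenging" if profile["science_grade"] > 5 else "standard"

def contest_report(profile, models):
    # Surface every system's recommendation side by side, so that
    # disagreement is exposed rather than hidden behind one output.
    return {name: fn(profile) for name, fn in models.items()}

kasia = {"science_grade": 8, "anxiety": 3}
report = contest_report(
    kasia, {"full": model_full, "no_psych": model_no_psych}
)
# The systems disagree for Kasia: "full" demotes her to standard
# content because of her anxiety score; "no_psych" does not.
```

Option 3’s switching-and-feedback regime could then be layered on top: rotating which model is live and collecting structured reactions from students, parents and coordinators each time a switch occurs.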

The ‘ensemble contestability’ features proposed in my paper ‘Diachronic interpretability and machine learning systems’ point to one concrete way of designing ML tools meant for morally-loaded contexts so that they incentivise not just individual but collective contestation over time. The successful design of such incentives should mean that the practices within which these tools are embedded keep evolving. The structural changes involved in such evolution are likely to produce a degree of ‘concept drift’, to refer back to the first way in which the passage of time affects ML systems. This should be welcome, provided we wake up to the fact that the prediction tools we now rely on in so many aspects of our lives have the potential to trap us into futures that are less and less of our own making. To avoid this, the design choices outlined above should be combined with public debates and transparent monitoring policies.

Read the paper:
Diachronic interpretability and machine learning systems


Top image: Aron Visuals / Unsplash