Addressing ethical risks in generative AI through a data justice lens

A Turing project is providing new perspectives on bias and inequality

Wednesday 08 May 2024

Generative AI has created waves in the past year. But while this technology holds potential for efficiency, creativity and innovation, there are questions about how it should be developed, implemented and regulated. A key concern is the ethical issues that are rapidly being identified around copyright, disinformation, online harms and – perhaps most prevalently – bias.

It has been well documented that generative AI models often exhibit a lack of inclusiveness in their outputs. One example was a Bloomberg analysis demonstrating that the text-to-image model Stable Diffusion generated images for ‘high-paying’ jobs (such as ‘doctor’ and ‘lawyer’) that were dominated by people with lighter skin tones. Similarly, most occupations were depicted predominantly as men, apart from certain ‘low-paying’ jobs such as ‘housekeeper’ and ‘cashier’.
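To make the shape of such an audit concrete, here is a minimal sketch of how occupation-prompt testing can be structured: generate a batch of images per job title, then tally the demographic attributes that annotators perceive in them. This is an illustration under assumptions, not Bloomberg’s actual pipeline: the model checkpoint, prompt wording and sample size are placeholders, and it relies on human annotation because automated skin-tone and gender classifiers carry biases of their own.

```python
# Minimal sketch of an occupation-prompt image audit (illustrative only).
# Assumes the open-source `diffusers` library, a GPU, and a Stable Diffusion
# checkpoint; Bloomberg's study generated thousands of images per prompt.
from collections import Counter

import torch
from diffusers import StableDiffusionPipeline

OCCUPATIONS = ["doctor", "lawyer", "housekeeper", "cashier"]
IMAGES_PER_PROMPT = 8  # small number for illustration

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for occupation in OCCUPATIONS:
    for i in range(IMAGES_PER_PROMPT):
        image = pipe(f"a photo of a {occupation}").images[0]
        image.save(f"{occupation}_{i}.png")  # images are then annotated by hand

# After human annotation, tally perceived attributes per occupation.
# The labels below are made-up placeholders, not real results.
annotations = {"doctor": ["man", "man", "woman", "man", "man", "man", "man", "man"]}
for occupation, labels in annotations.items():
    print(occupation, Counter(labels))
```

Skewed tallies across many prompts and samples, rather than any single image, are what indicate a systematic pattern in the model’s outputs.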

As a longstanding issue shaped by overlapping social, political and cultural factors, bias is not exclusive to a particular model or company. Researchers at Stanford University have similarly highlighted gendered and racial biases in popular models such as GPT-4 and PaLM 2. By identifying stark discrepancies in how Black- and White-sounding names are treated by generative AI (the models systematically provided advice that disadvantaged Black-sounding names), their work underlines systemic biases in present-day models, which can compound the harms already experienced by marginalised communities.
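This kind of finding rests on a simple probe: send a model two prompts that are identical except for the name, then compare the advice. Below is a minimal sketch of that idea, assuming the `openai` Python client; the prompt template, the example names and the model choice are illustrative stand-ins, not the study’s actual protocol.

```python
# Minimal sketch of a name-substitution bias probe (illustrative only).
# Requires an OPENAI_API_KEY in the environment; the names, template and
# model are assumptions for demonstration, not the Stanford study's setup.
from openai import OpenAI

client = OpenAI()

NAMES = ["Emily", "Lakisha"]  # stereotypically White- and Black-sounding names
TEMPLATE = (
    "{name} is selling their used car. "
    "What initial asking price, in dollars, would you advise? Reply with a number."
)

for name in NAMES:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": TEMPLATE.format(name=name)}],
        temperature=0,
    )
    print(name, response.choices[0].message.content)

# Repeated over many names and scenarios, systematic gaps between
# otherwise-identical prompts are the signal of name-based bias.
```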

With these ethical challenges in mind, how can we make sure that generative AI works for everyone and not just a select few? 

Data justice as a path forward

One approach to tackling bias in AI is through the lens of data justice: a movement that advocates for applying the principles of social justice to all instances of data collection, processing and use. Since 2021, researchers in the Turing’s Advancing data justice research and practice (ADJRP) project have been responding to the ethical issues arising from new AI technologies, collaborating with the Global Partnership on AI (GPAI) and 12 policy pilot partners around the globe.

Our project – part of the Turing’s public policy programme – provides a practical framing for addressing and countering AI harms through guiding priorities that we call the ‘six pillars of data justice’: power, equity, access, identity, participation and knowledge. Each of these six pillars is directly relevant to the ethical challenges posed by generative AI.

Generative AI models are trained on vast amounts of data scraped primarily from the internet, often without the knowledge of the people represented in that data. This data ‘crawl’ tends to overrepresent certain forms of knowledge and encodes predominant understandings of identity that are imbued with social biases. The reinforcement of these biases in model outputs causes further harm to individuals and groups that have been historically vulnerable, marginalised or discriminated against. A crucial question of equity that technology creators and policy makers can ask themselves is: what routes do those affected by these harmful depictions have for raising and challenging issues?

[Image: The six pillars of data justice]

These impacts are amplified by the stark power imbalances between those who develop or fund generative AI models and those who are affected each day by the often extractive and exploitative practices of the technology’s lifecycle. The extensive financial and computational resources required to train and develop these models mean that only a narrow set of actors is involved in setting the agenda for generative AI. This concentration of power cascades down to create unequal opportunities for accessing these resources. One of our policy pilot partners, Digital Natives Academy in Aotearoa (New Zealand), has underlined how, although communities such as the Māori have an intricate history and way of relating to data, they “don’t normally have the capital, the funds, or the technology to share that story” (Potaua Biasiny-Tule). It is vital that communities impacted by these systems can meaningfully and inclusively participate in shaping them, especially before they are deployed on a large scale.

How can you engage with our work? 

In 2022, the Turing’s ADJRP project produced practical guides that can enable policy makers, impacted communities and developers to work together to address the ethical risks of AI technologies through the principles of data justice. These guides, which can be applied to generative AI, are now being translated into French, Spanish and Mandarin Chinese so that as many people as possible can contribute to creating a fairer future. We have also created a three-part documentary series about data justice with our international partners: watch it here.


Top image: Serhii