Building better AI classifiers for hate

Leveraging advances in artificial intelligence and data-centric methods to develop better models for detecting abuse and hate speech

Project status

Ongoing

Introduction

We use data-centric AI methods and leverage advances in language modelling to build better abuse classifiers, and explore how models can generalise and transfer to new domains.

The traditional AI paradigm focuses on increasingly complex models and static datasets. Data-centric AI instead focuses on constructing high-quality, efficient datasets that minimise the harm done to human annotators and are updated over time to account for temporal shifts in language. Developments in foundation language models likewise prompt a shift in focus away from model complexity and towards data quality. We curate high-quality datasets and employ techniques such as active learning to build more efficient and accurate models for detecting abuse. We work on both specialist and generalist classifiers, studying how models trained in one domain can adapt to new domains, to provide a basis for platforms and regulators to understand how to detect and monitor abusive content.
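The active-learning approach mentioned above can be illustrated with uncertainty sampling, one common strategy: at each round, the examples the current classifier is least sure about are sent to annotators, so labelling effort concentrates where it improves the model most. The sketch below is illustrative only, not the project's actual code; the function names and the toy probability scores are assumptions.

```python
def select_uncertain(probs, k):
    """Pick the k unlabelled examples a binary abuse classifier is
    least certain about (predicted probability closest to 0.5)."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def active_learning_round(model_probs, unlabelled_texts, budget):
    """One round: choose up to `budget` texts to send for human annotation.

    `model_probs[i]` is the classifier's estimated probability that
    `unlabelled_texts[i]` is abusive. After annotation, the newly
    labelled examples would be added to the training set and the
    model retrained -- that outer loop is omitted here.
    """
    chosen = select_uncertain(model_probs, budget)
    return [unlabelled_texts[i] for i in chosen]


# Toy example: scores near 0.5 are ambiguous and get queued for review.
texts = ["post A", "post B", "post C", "post D"]
probs = [0.95, 0.51, 0.05, 0.48]
print(active_learning_round(probs, texts, budget=2))  # → ['post B', 'post D']
```

Compared with labelling a random sample, this targets annotation at borderline content, which typically yields a more accurate classifier per label collected and reduces annotators' exposure to clearly abusive material.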

Organisers

Researchers and collaborators

Previous contributors