How data lies

Practical, actionable support for data scientists who are making efforts to be responsible, with recognition of why this can present challenges.

Course overview

This course is alternatively titled "How learning to lie with data is essential to prevent AI from being sexist and racist." The title is intended to catch attention, but it also reflects the content of the course, which aims to support data scientists who want to practise responsible AI. The first part of the title comes from the 1954 book "How to lie with statistics", which has been brought back into public consciousness by another book, "Rebooting AI".

The first part of this course presents ways in which data can be misleading, and provides concrete tips for identifying and addressing these data issues. The second part of the title refers to a series of recent scandals in which, it is argued, AI has not been used responsibly. These scandals, some of which are used as case studies in this course, are prompting legislation intended to ensure ethical uses of AI.

The second half of this course focuses on the ethical considerations needed to use AI responsibly. The course aims to help data scientists and their managers better understand potential ethical challenges in the application of AI, and provides concrete tips to support them in being responsible.

This course has been commissioned as part of our open funding call for Responsible AI courses, with funding from Accenture and the Alan Turing Institute.

Who is this course for?

This course is designed primarily for data scientists who are actively looking to be responsible in their work. Parts of it are also intended for managers of data scientists, or their collaborators, who may benefit from the broad discussions but can skip some of the practical details.

Learning outcomes

By the end of this course, you will be able to:

  • Demonstrate an awareness of some of the ethical considerations that are shaping the future of AI, and of why data scientists need to be responsible in their role.
  • Identify some common pitfalls where data misinterpretation can arise, and apply concrete advice to avoid them.
  • Draw correct conclusions from data containing complexities, using practical experience gained during the course.

License

This course is released under a CC BY 4.0 license.
Materials can also be found on GitHub.

Details

1. Introduction

Module Name     Topic
Introduction    Course background and information

2. Data considerations

Module Name          Topic
Module 1             Definitions Matter
Module 2             Data Matters
Module 3             Variability Matters
Module 4             Interactions Matter
Case study 1         COMPAS case study
Case study 2         Apple and Amazon case study
Case study 3         Ofqual case study
Case study 4         Protein folding case study
Practical approach   Interactive example to consume the content using STACK

3. Ethical Considerations

Module Name   Topic
Lesson 1      Introducing ethics of AI
Lesson 2      Fairness and debiasing
Lesson 3      AI ethics beyond debiasing
Lesson 4      Accreditation of Trustworthy AI

4. Conclusion

Module Name   Topic
Conclusion    Key takeaways

Instructors

Professor Peter Diggle

Distinguished Professor, CHICAS, Lancaster University and Steering Group Mentor, RSS COVID-19 Taskforce