Dr Margaret Mitchell, Dr Yacine Jernite, and members of the BigScience Data Governance group will join speakers from The Alan Turing Institute to lead a discussion on their approach to building a Large Language Model (LLM) in the open with researchers around the world.
About the event
The BigScience model, which finishes training in July 2022, has pioneered a new approach to building open and responsible AI. Inspired by large-scale open science collaborations such as CERN and the LHC, it aims to facilitate the creation of large-scale artefacts for the entire research community. This effort responds to the current model of data governance and LLM development by big technology companies, which poses problems for research advancement and from environmental, ethical, and societal perspectives. These include issues raised in a paper co-authored by Margaret: On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
In this open conversation with the Turing community, Margaret, Yacine, and members of the BigScience team will present BigScience’s model for data governance. Leaders from Turing’s tools, practices, and systems (TPS), public policy, and AI programmes will respond and address open questions with regard to building AI through distributed, international collaborations.
- Value pluralism
- Language modelling
- Distributed development
- Data sourcing
- Data protection laws
- Data hosting agreements
- Responsible AI Licensing
- BigScience Workshop
- Hugging Face Ethical Charter
- BigScience Workshop, Data Governance in the Age of Large-Scale Data-Driven Language Technology
- Māori Data Sovereignty License
- Mozilla Data Futures Lab, Overview of Governance Structures
- Ada Lovelace Institute, Participatory data stewardship
- Data Justice in Practice: A Guide for Developers