Matthew is a lecturer at the University of Birmingham. He qualified as a doctor in 1990, and trained in general medicine and neurology before starting his research career in functional brain imaging at the Hammersmith Hospital and Oxford University. He moved to the MRC Cognition and Brain Sciences Unit in 1996 to study brain imaging of movement, but his interests gradually shifted to methods for analysing brain images, and to teaching brain imaging methods to psychologists and neuroscientists.
From 2003 to 2005, he worked at the University of California, Berkeley, where he first came across the principles of reproducible science; this led him to implement the first reproducible paper on brain imaging analysis (Aston, Turkheimer & Brett 2006). While at Berkeley he started working with Jarrod Millman, at the Brain Imaging Center, on new libraries for brain imaging analysis in Python, later funded by an NIH grant. He became a core contributor to SciPy, and developed his own libraries for brain imaging analysis, including Nibabel, which is now the base layer for other Python brain imaging libraries.
He returned to Cambridge in 2005, but went back to Berkeley in 2008 to continue his work on scientific Python and brain imaging. While at Berkeley, Matthew started to concentrate on teaching brain imaging analysis, statistics and reproducible research, and on the central role of code in teaching basic ideas in imaging and statistics. He returned to the UK to work at the University of Birmingham, where his main focus is on teaching data science to undergraduates in the life sciences and across the University.
Matthew's primary interests are in two related areas:
- The definition of data science
- Teaching data science
The term "data science" carries little meaning without context, and it has been understood in many ways. Initial reports from industry characterised data scientists as scientists who had become much more effective in data analysis by using code. The first response from academia was to characterise data science as "a superset of the fields of statistics and machine learning which adds some technology for ‘scaling up’ to ‘big data’" (Donoho 2015).
Does this response from academia capture what is really important in data science? Donoho argues strongly that it does not. Our understanding of data science will have fundamental implications for what we study and what we teach. For example, Berkeley's hugely successful undergraduate course on data science puts great emphasis on understanding data through code, with relatively little time devoted to machine learning, and none to techniques for analysing big data.
Matthew is working on methods for teaching data science to first-year undergraduates from any background, using the Berkeley course as a starting point. The textbook for his 2018-19 course is at https://matthew-brett.github.io/dsfe. He also works on arguments for a productive definition of data science; see https://matthew-brett.github.io/dsfe/chapters/01/what-is-data-science for a summary.