Data analytics is the process of transforming a raw dataset into useful knowledge, and it comprises many distinct stages. Although software and tools have substantially advanced some parts of this process, there has been little methodological research into so-called data ‘wrangling’, even though it is often laborious and time-consuming and can account for up to 80% of the time spent on a typical data science project.
Data wrangling covers the work needed to prepare data for computer modelling: understanding what data is available, integrating data from multiple sources, identifying missing, messy or anomalous data, and extracting features.
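To make a few of these steps concrete, here is a minimal Python sketch (standard library only) that integrates two small sources, tidies messy labels, and flags missing or anomalous values. The dataset, field names and helpers (`integrate`, `clean_region`, `flag_anomalies`) are hypothetical, and the anomaly rule, a median-absolute-deviation "modified z-score" with a 3.5 cut-off, is one common heuristic chosen for illustration. It is not the project's method, which aims to automate such steps rather than hand-code them.

```python
from statistics import median

# Hypothetical source 1: sensor readings keyed by "site_id".
sensor_readings = [
    {"site_id": "A", "temp_c": 21.5},
    {"site_id": "B", "temp_c": None},   # missing value
    {"site_id": "C", "temp_c": 22.1},
    {"site_id": "D", "temp_c": 480.0},  # implausible reading
    {"site_id": "E", "temp_c": 20.9},   # no matching metadata below
]

# Hypothetical source 2: site metadata, with inconsistent formatting.
site_metadata = [
    {"site_id": "A", "region": "north"},
    {"site_id": "B", "region": "south"},
    {"site_id": "C", "region": " North "},  # messy label
    {"site_id": "D", "region": "east"},
]


def integrate(left, right, key):
    """Left-join two lists of records on `key`; unmatched rows keep only left fields."""
    index = {row[key]: row for row in right}
    return [{**row, **index.get(row[key], {})} for row in left]


def clean_region(value):
    """Normalise a free-text label (strip whitespace, lower-case); non-strings become None."""
    return value.strip().lower() if isinstance(value, str) else None


def flag_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score (based on median and MAD) exceeds `threshold`.

    None entries are skipped. If the MAD is zero the score is undefined,
    so no values are flagged.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return set()
    return {
        i
        for i, v in enumerate(values)
        if v is not None and 0.6745 * abs(v - med) / mad > threshold
    }


# Integrate the sources, then clean labels in place.
merged = integrate(sensor_readings, site_metadata, "site_id")
for row in merged:
    row["region"] = clean_region(row.get("region"))

# Identify rows with missing fields and rows with anomalous readings.
temps = [row["temp_c"] for row in merged]
missing = [r["site_id"] for r in merged if r["temp_c"] is None or r["region"] is None]
anomalous = [merged[i]["site_id"] for i in sorted(flag_anomalies(temps))]

print("missing:", missing)      # missing: ['B', 'E']
print("anomalous:", anomalous)  # anomalous: ['D']
```

Even in this toy case, each step involves a judgement (which join to use, how to normalise labels, what counts as an outlier), which is part of why wrangling is so time-consuming to do by hand.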
In this project, researchers at The Alan Turing Institute are drawing on new advances in artificial intelligence and machine learning to address data wrangling issues; they aim to develop systems that help to automate each stage of the data analytics process.
The resulting technology will benefit researchers, industry and government, dramatically improving the productivity of working data scientists, and revolutionising the speed and efficiency with which data can be transformed into useful knowledge.