The central research question for this group is how to close the loop between machine learning and systems. The group approaches it from two directions at once: developing and adapting novel systems techniques to enable more scalable, efficient, and accurate machine learning, and developing and adapting novel statistical machine learning techniques to enable next-generation intelligent data science infrastructures.
Explaining the science
Systems for machine learning
Enabling AI faces formidable challenges as problem scale and complexity keep increasing: growing data volumes, limited storage, compute, and network resources, and the need for timely responses to human data needs. Unless these challenges receive proper attention, enabling AI, where it is possible at all, will remain the preserve of a few resource-rich organisations that can 'throw resources at the problems'.
Machine learning for systems
The functioning of many critical system components depends on predicting and learning from previous activity, while balancing error guarantees against time and cost. Examples abound, from cache subsystems in distributed systems to data management systems, query processing engines, and big data analytics stacks. Statistical machine learning techniques and models are therefore well positioned to revolutionise software systems.
A mutual relationship
This interest group focuses on exploring and leveraging the mutually influential relationship between systems and statistical machine learning research. This relationship has the potential to deliver next-generation intelligent data science infrastructures and to enable timely, accurate, and scalable learning for data-driven applications across all domains of human endeavour.
The main aims of the group are to:
- Bring together and create a critical mass of expertise from systems, statistical machine learning (SML), and other relevant research areas to study in depth the cross-fertilisation possibilities of systems-SML interaction.
- Develop software artefacts and encourage their uptake by the community of data scientists, both within The Alan Turing Institute and further afield.
- Match the group's research expertise to relevant problems faced by industry whose solutions require intelligent data analytics stacks and/or scalable, efficient, and accurate SML models and algorithms.
Can we democratise data analytics systems/platforms, making them available even to individuals, organisations, or SMEs with few resources?
Challenges: Many analytics systems currently require large, sophisticated hardware, software, and network infrastructures that are expensive to use
Example output: Learning from systems' day-to-day functioning to predict their results without engaging them
How should machine learning be best scaled and distributed?
Challenges: Scale-up and scale-out infrastructures, together with distributed/parallel algorithms for data partitioning and resource management, play a key role here
Example output: Distributed/parallel SML model training methods
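As an illustration only, the following minimal sketch shows one common way such training can be organised: synchronous data-parallel gradient descent, in which each worker computes a gradient on its own data partition and a coordinator averages the results. It is not presented as the group's method. The function names (`local_gradient`, `train`), the toy linear model, and the in-process "workers" are all assumptions made for this example; a real deployment would run the per-partition step on separate machines.

```python
import random


def local_gradient(partition, w, b):
    """Gradient of mean squared error for y ~ w*x + b on one worker's partition."""
    n = len(partition)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in partition) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in partition) / n
    return grad_w, grad_b


def train(partitions, steps=500, lr=0.1):
    """Hypothetical coordinator loop: gather local gradients, average, update."""
    w, b = 0.0, 0.0
    total = sum(len(p) for p in partitions)
    for _ in range(steps):
        # In a real system each call below would run on a different worker.
        grads = [local_gradient(p, w, b) for p in partitions]
        # Weight each gradient by partition size so the average equals
        # the gradient over the full dataset.
        grad_w = sum(len(p) * g[0] for p, g in zip(partitions, grads)) / total
        grad_b = sum(len(p) * g[1] for p, g in zip(partitions, grads)) / total
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data (assumed for the example): y = 3x + 1 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1200)]
dataset = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

# Partition the data across four simulated workers.
partitions = [dataset[i::4] for i in range(4)]
w, b = train(partitions)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Because the averaged gradient matches the full-data gradient, this scheme yields the same model as single-machine training; the systems questions lie in how the data are partitioned and how much the workers must communicate at each step.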
How can system metrics best be included, so as to optimise jointly for time, resource usage, cost, and accuracy?
Challenges: Simultaneously achieving high accuracy and low response times, resource usage, and monetary costs
How should future system designs be altered to include SML models in order to improve performance?
Example output: Approximate query processing engines based on SML models, yielding a model-based (rather than data-based) query answering system that is faster, more accurate, and has smaller overheads
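To make the idea of model-based query answering concrete, here is a minimal sketch under simplifying assumptions. It treats a range-count query over a single numeric column, records the exact answers to past queries, and fits a simple linear model that predicts the count from the width of the queried range, so that new queries can be answered without scanning the data. The names (`exact_range_count`, `predict_range_count`), the synthetic uniform data, and the choice of a linear model are all hypothetical; a real engine would need richer models and error guarantees.

```python
import random


def exact_range_count(data, lo, hi):
    """Data-based answer: scan every value (the expensive path)."""
    return sum(1 for x in data if lo <= x <= hi)


def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def random_range(rng):
    """Draw a random query range inside [0, 100]."""
    lo = rng.uniform(0.0, 100.0)
    hi = rng.uniform(lo, 100.0)
    return lo, hi


rng = random.Random(0)
# Synthetic column (assumed for the example): uniform values in [0, 100].
data = [rng.uniform(0.0, 100.0) for _ in range(10_000)]

# Past workload: queries that the system answered exactly.
history = [random_range(rng) for _ in range(100)]
widths = [hi - lo for lo, hi in history]
counts = [exact_range_count(data, lo, hi) for lo, hi in history]

# Learn from the system's past behaviour.
slope, intercept = fit_line(widths, counts)


def predict_range_count(lo, hi):
    """Model-based answer: no access to the underlying data."""
    return slope * (hi - lo) + intercept


lo, hi = 20.0, 60.0
approx = predict_range_count(lo, hi)
exact = exact_range_count(data, lo, hi)
print(f"approx={approx:.0f}, exact={exact}")
```

The model-based answer costs a single arithmetic expression regardless of data size, whereas the exact answer scans every value. The width-only model works here only because the synthetic data are uniform; skewed data would call for a model that also depends on where the range lies.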
Andrew Mallinson, Intel