As new data science algorithms are implemented, there is a need to quantify their computational and statistical performance effectively and transparently, in appropriate settings and on a variety of systems. This project addresses the multiple-system aspect of the problem using containers, which carry reasonably low performance overhead. This will let researchers and users of data science algorithms determine which tools are right for them, and which tools and areas most need attention.
The primary work will be designing a flexible benchmarking infrastructure that supports benchmarking across systems with different hardware and software; the emphasis is on selecting appropriate, existing systems tools for this purpose. Particular attention will be paid to aspects specific to data science algorithms, e.g. the need to test against particular datasets and, for most problem classes, the need to account for approximation error in addition to computational time.
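To make the time-plus-error idea concrete, here is a minimal sketch of what one benchmark run might record. All names (`run_benchmark`, the toy estimators, the result fields) are hypothetical illustrations, not part of any existing tool:

```python
import time

def run_benchmark(implementation, dataset, reference_answer):
    """Run one implementation on a dataset, recording both wall-clock
    time and approximation error against a known reference answer.
    (Hypothetical sketch: a real harness would load implementations
    and datasets from a registry and repeat runs for stability.)"""
    start = time.perf_counter()
    result = implementation(dataset)
    elapsed = time.perf_counter() - start
    # Approximation error: distance of the result from the reference.
    error = abs(result - reference_answer)
    return {"seconds": elapsed, "error": error}

# Toy task: estimate the mean of a dataset, exactly and approximately.
data = list(range(1_000_000))
exact_mean = sum(data) / len(data)

def exact(ds):
    return sum(ds) / len(ds)

def subsampled(ds):
    sample = ds[::1000]  # cheap approximation on a 0.1% subsample
    return sum(sample) / len(sample)

for name, impl in [("exact", exact), ("subsampled", subsampled)]:
    print(name, run_benchmark(impl, data, exact_mean))
```

The approximate estimator should be much faster but carry nonzero error, which is exactly the trade-off the benchmark record is meant to expose.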
A secondary, but also crucial, element is the design of an appropriate interface through which people can run existing benchmarks, introduce new ones, and view the current state of the art for various tasks. A primary purpose of running the benchmarks is to introduce healthy competition across data science domains and to encourage researchers to submit new implementations with improved performance profiles.
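One way such an interface could work is a per-task leaderboard to which implementations submit their results. The sketch below is purely illustrative (the `Leaderboard` class and its methods are assumptions, not an existing API); it shows how "state of the art" could be queried under an error budget:

```python
from dataclasses import dataclass, field

@dataclass
class Leaderboard:
    """Hypothetical sketch: a per-task registry mapping submitted
    implementations to their benchmark results."""
    task: str
    results: dict = field(default_factory=dict)

    def submit(self, name, seconds, error):
        # Record (or overwrite) the result for a submitted implementation.
        self.results[name] = {"seconds": seconds, "error": error}

    def best(self, max_error=float("inf")):
        """State of the art: the fastest implementation whose
        approximation error stays within the given budget."""
        eligible = {n: r for n, r in self.results.items()
                    if r["error"] <= max_error}
        if not eligible:
            return None
        return min(eligible, key=lambda n: eligible[n]["seconds"])

board = Leaderboard("mean-estimation")
board.submit("exact", seconds=0.12, error=0.0)
board.submit("subsampled", seconds=0.004, error=0.5)
print(board.best())               # fastest overall: "subsampled"
print(board.best(max_error=0.1))  # fastest within a tight budget: "exact"
```

Ranking by time subject to an error budget is one plausible design choice; other domains might instead rank by error at a fixed time budget.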
This will benefit both algorithms researchers and the application areas in which benchmarked algorithms are used, and it will shed light on bottlenecks in algorithms, implementations, systems, languages and architectures.