Anomaly & Change Detection in Streaming Big Data
16th November at the British Library
Flexible, scalable models have proven key to detecting anomalies and changepoints in big data. Methodologies based on nonparametric approaches can represent a wide variety of patterns, and have been used successfully in machine learning and outlier detection as well as in application areas as diverse as credit scoring and neuroscience. A prototypical example, among many, is the use of Gaussian processes to detect changepoints in financial time series. Such methods become more important as larger, more heterogeneous volumes of data are collected, with high dimensionality both in time and in cross-section. In such cases a batch mode of analysis is often no longer applicable, leading instead to a streaming paradigm in which one must summarise and model nonstationary time series. Solving the challenges associated with this paradigm requires bringing together mathematics, statistics, computer science and engineering. This workshop seeks to scope key data science questions in this area as follows:
- How do we sequentially define outliers in the context of online nonparametric methods, yielding both controlled sensitivity and an ability to incorporate prior information?
- What mathematical structures can we develop and employ to analyse the performance of such methods? (For example, one special case arises when a class of features forms an algebra and there is an algebraic basis, enabling the separation of points by a linear quadratic functional in this basis. Another involves looking at the geometry of the system in high-dimensional settings.)
- How do we ensure that such methods are scalable? In particular, can we guarantee that their computational cost scales linearly, or even sublinearly, with the volume of data under suitable assumptions?
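
To make the streaming setting in these questions concrete, here is a minimal sketch of an online outlier detector. It is not a method proposed by the workshop, and it is a simple parametric stand-in rather than a nonparametric model: it keeps a running mean and variance using Welford's update and flags any observation that lies more than a chosen number of standard deviations from the running mean. The class name `RunningZScoreDetector` and the parameters `threshold` and `warmup` are illustrative assumptions. Each update takes constant time and memory, so the total cost is linear in the length of the stream.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningZScoreDetector:
    """Illustrative online outlier detector (hypothetical name and parameters)."""

    threshold: float = 4.0  # sensitivity: flag if |x - mean| > threshold * std
    warmup: int = 10        # observations to absorb before flagging anything
    n: int = 0              # number of observations absorbed so far
    mean: float = 0.0       # running mean
    m2: float = 0.0         # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against past data only, then absorb it. Returns True if flagged."""
        flagged = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            flagged = std > 0 and abs(x - self.mean) > self.threshold * std

        # Welford's update: O(1) time and memory per observation.
        # Simplifying assumption: flagged points are absorbed as well.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return flagged


if __name__ == "__main__":
    # Synthetic stream: small periodic fluctuations around 10, one spike at index 50.
    stream = [10.0 + 0.1 * ((i % 5) - 2) for i in range(50)]
    stream += [25.0]
    stream += [10.0 + 0.1 * ((i % 5) - 2) for i in range(10)]

    detector = RunningZScoreDetector()
    flagged_indices = [i for i, x in enumerate(stream) if detector.update(x)]
    print(flagged_indices)
```

In this sketch `threshold` is the only sensitivity control, and prior information could enter only crudely, for example by initialising `n`, `mean` and `m2` from historical data. Because it assumes a single stationary mean and variance, it does not address nonstationarity, which is exactly where the more flexible models discussed above come in.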