Time series, that is, series of data points indexed (or listed or graphed) in time order, are a key motif in modern data science and AI, but they introduce complexity wherever they appear. Because of this, data science tools for time series usually focus on a specific task or model class. The 'sktime' project aims to provide an integrated toolset for the easy construction and success control (i.e. performance evaluation) of algorithms in the time series context.
Explaining the science
The high-level interface includes:
- 'Task' objects that encapsulate meta-data from a dataset and the necessary information about the particular supervised learning task, e.g. the instructions on how to derive the target/labels for classification from the data
- 'Strategy' objects that wrap low-level estimators and expose fit and predict methods that take data and a task object
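The division of labour between task and strategy objects can be illustrated with a minimal sketch in plain Python. It is not sktime's actual API: the class names `Task`, `Strategy` and `MajorityClassifier`, their attributes, and the toy data are all hypothetical and chosen only to show the idea that the task says which column is the target, while the strategy wraps an estimator and uses the task to split the data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task object: names the target column and the feature columns."""
    target: str
    features: tuple


class MajorityClassifier:
    """Toy low-level estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class Strategy:
    """Hypothetical strategy object: wraps an estimator and reads the task to split data."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, task, data):
        # The task tells the strategy how to derive features and labels from the rows.
        self.task_ = task
        X = [[row[name] for name in task.features] for row in data]
        y = [row[task.target] for row in data]
        self.estimator.fit(X, y)
        return self

    def predict(self, data):
        X = [[row[name] for name in self.task_.features] for row in data]
        return self.estimator.predict(X)


# Each row holds one time series and its label (made-up values).
train = [
    {"series": [0.1, 0.2, 0.1], "label": "rest"},
    {"series": [0.0, 0.1, 0.2], "label": "rest"},
    {"series": [2.0, 3.1, 2.5], "label": "walk"},
]
task = Task(target="label", features=("series",))
strategy = Strategy(MajorityClassifier()).fit(task, train)
predictions = strategy.predict([{"series": [0.3, 0.2, 0.1]}])
```

Keeping the target definition in the task rather than in the estimator means the same strategy could, in principle, be reused on a different dataset by swapping only the task.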
The low-level interface extends the standard scikit-learn API to handle time series and panel data. Currently, the package implements:
- Various state-of-the-art approaches to supervised learning with time series features
- Transformations of time series, including series-to-series transforms (e.g. Fourier transform) and series-to-primitives transforms, also known as feature extractors (e.g. mean, variance); transformers are sub-divided into those that are fitted on the whole table and those that are applied row-wise
- Pipelining, which allows multiple transformers to be chained with a final estimator
- Meta-learning strategies including tuning and ensembling, accepting pipelines as the base estimator
- Off-the-shelf composite strategies, such as a fully customisable random forest for time-series classification, with interval segmentation and feature extraction
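How interval segmentation, row-wise feature extraction and pipelining fit together can be sketched in plain Python. This is an illustration only, not the package's implementation: the names `segment`, `extract_features`, `NearestCentroid` and `Pipeline` are hypothetical, the data are made up, and a simple nearest-centroid classifier stands in for the random forest mentioned above to keep the example dependency-free.

```python
from statistics import mean, pvariance


def segment(series, n_intervals):
    """Series-to-series step: split one series into equal-length contiguous intervals.

    Trailing points that do not fill a whole interval are dropped.
    """
    size = len(series) // n_intervals
    return [series[i * size:(i + 1) * size] for i in range(n_intervals)]


def extract_features(series, n_intervals=2):
    """Series-to-primitives step, applied row-wise: mean and variance per interval."""
    features = []
    for interval in segment(series, n_intervals):
        features += [mean(interval), pvariance(interval)]
    return features


class NearestCentroid:
    """Toy final estimator: predicts the label of the closest class centroid."""

    def fit(self, X, y):
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [mean(column) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        def distance(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: distance(row, self.centroids_[label]))
            for row in X
        ]


class Pipeline:
    """Chains row-wise transformers (plain functions) with a final estimator."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def _transform(self, X):
        for transform in self.transformers:
            X = [transform(row) for row in X]
        return X

    def fit(self, X, y):
        self.estimator.fit(self._transform(X), y)
        return self

    def predict(self, X):
        return self.estimator.predict(self._transform(X))


X_train = [
    [0.0, 0.1, 0.0, 0.1, 0.0, 0.1],  # flat
    [0.1, 0.0, 0.1, 0.0, 0.1, 0.0],  # flat
    [0.0, 0.0, 0.1, 1.0, 1.1, 0.9],  # step up in the second half
    [0.1, 0.0, 0.0, 0.9, 1.0, 1.1],  # step up in the second half
]
y_train = ["flat", "flat", "step", "step"]

pipeline = Pipeline([extract_features], NearestCentroid()).fit(X_train, y_train)
predictions = pipeline.predict([
    [0.0, 0.1, 0.1, 1.0, 0.9, 1.0],
    [0.1, 0.1, 0.0, 0.0, 0.1, 0.1],
])
```

Because the pipeline itself exposes `fit` and `predict`, it could be passed to a tuning or ensembling routine wherever a single estimator is expected, which is the property the meta-learning strategies rely on.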
The 'sktime' project aims to implement an open source time series toolbox within the PyData ecosystem.
Eventually, the project should support, via a unified interface, multiple different time series related modelling tasks, including:
- Time series classification and regression
- Classical forecasting
- Supervised/panel forecasting
- Time series segmentation
- Time-to-event and event risk modelling
- Unsupervised tasks such as motif discovery and anomaly detection, and diagnostic visualisation
- On-line and streaming tasks, e.g. as variations of the above
Data scientific modelling is a key part of the modern data science and AI workflow. Modelling software toolboxes with a unified modelling interface (one task, many solutions, one interface), such as Weka, mlr and scikit-learn, have become a core part of the modern data scientist's knowledge and tool base.
Typical functionality of AI toolbox packages usually includes:
- A unified model specification and model execution interface, for training and applying the models to data
- Model composition and model tuning functionality, for manual or automated construction of improved strategies out of simpler ones
- Success control functionality checking usefulness of the modelling strategies, often in the form of semi-automated benchmarking and evaluation workflows
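Success control in its simplest form can be sketched as a small benchmarking loop: several strategies share one fit/predict interface and are scored on the same cross-validation folds. The sketch below is plain Python under stated assumptions; the function and class names (`k_fold_indices`, `cross_val_accuracy`, `MajorityClassifier`, `NearestMeanClassifier`) and the data are hypothetical and are not taken from any particular toolbox.

```python
from collections import Counter
from statistics import mean


class MajorityClassifier:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestMeanClassifier:
    """Predicts the class whose average series mean is closest to the series mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for series, label in zip(X, y):
            sums[label] = sums.get(label, 0.0) + mean(series)
            counts[label] = counts.get(label, 0) + 1
        self.class_means_ = {label: sums[label] / counts[label] for label in sums}
        return self

    def predict(self, X):
        return [
            min(self.class_means_, key=lambda label: abs(mean(series) - self.class_means_[label]))
            for series in X
        ]


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold k tests on samples with index % n_folds == k."""
    for fold in range(n_folds):
        test_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, test_idx


def cross_val_accuracy(make_estimator, X, y, n_folds=3):
    """Mean accuracy over folds, fitting a fresh estimator on each training split."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), n_folds):
        estimator = make_estimator()
        estimator.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = estimator.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predicted, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Made-up data: short series labelled by their overall level.
X = [[0.1, 0.2], [1.0, 1.2], [0.0, 0.3], [0.9, 1.1], [0.2, 0.1], [1.1, 1.0]]
y = ["low", "high", "low", "high", "low", "high"]

strategies = {"majority": MajorityClassifier, "nearest_mean": NearestMeanClassifier}
results = {name: cross_val_accuracy(cls, X, y) for name, cls in strategies.items()}
```

Including a trivial baseline such as the majority classifier is a deliberate choice: a strategy is only shown to be useful if it beats the baseline on the same folds.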
Distinct gaps exist in a number of use cases involving time series, which this project is addressing. A toolbox is, by definition, a tool that facilitates applications broadly. Examples could be:
- Time series forecasting: predict tomorrow given today, e.g. extrapolate past observations in financial data to the future, or predict the weather tomorrow given past weather
- Time series classification: given a time series, assign a label to it, e.g. identify a spoken word from a recorded audio sequence, or identify a type of motion from a video recording
- Panel data prediction: given some time series, predict the values in others, e.g. predict the healthcare trajectory of a hospital patient, having previously observed other, similar patients
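The forecasting use case can be made concrete with a minimal sketch in plain Python: hold out the most recent observations, forecast them from the earlier ones, and compare the error of two simple rules. The function names (`naive_forecast`, `drift_forecast`, `mean_absolute_error`) and the series are illustrative assumptions, not part of any library.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def drift_forecast(history, horizon):
    """Extend the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (step + 1) for step in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up series with a steady upward trend.
series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]

# Temporal split: the past is used for fitting, the most recent points for evaluation.
history, future = series[:6], series[6:]

naive_mae = mean_absolute_error(future, naive_forecast(history, len(future)))
drift_mae = mean_absolute_error(future, drift_forecast(history, len(future)))
```

Unlike the shuffled splits common in tabular learning, the evaluation set here always lies after the training set in time, so the forecaster never sees the future it is asked to predict.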
Please visit www.github.com/alan-turing-institute/sktime for the most recent updates.
Please feel free to raise issues (e.g. feature requests) on the issue tracker. Contributions are very welcome.