Developing data science tools for improving enterprise cyber-security

This Data Study Group (DSG) challenge aims to carry out a preliminary investigation of some statistical and machine learning tools for analysing certain types of cyber-relevant data sources. Specifically, we consider a unified repository released by Los Alamos National Laboratory (LANL) comprising both network flow records and process-level Windows service logs collected on the same enterprise computer network over a three-month period.

This challenge tackled three aspects: anomaly detection, data fusion, and visualisation. During the DSG week, we considered whether fusing the data sources gives a more coherent view of the network’s behaviour, and which visualisations can help analysts prioritise potential threats. Other explorations developed during the study group are also reported, together with their potential applications and limitations. This report is not a ‘white paper’ on cyber-security tools; rather, it details the methods attempted by different groups of participants in this DSG.

Citation information

Data Study Group team. (2019, November 29). Data Study Group Final Report: Imperial College London, Los Alamos National Laboratory, Heilbronn Institute. Zenodo. http://doi.org/10.5281/zenodo.3558251

Additional information

Bertrand Nortier, University of Bristol
Camelia Simoiu, Stanford University
Francesco Sanna Passino, Imperial College London
Ghita Berrada, King's College London
Hanne Hoitzing, G-Research
Henry Clausen, University of Edinburgh
John Booth
Karl Hallgren
Keli Liu
Leigh Shlomovich, Imperial College London
Qi (Katherine) He, UCL
Roberto Jordaney, HP Labs
Silvia Metelli, Imperial College London

Turing affiliated authors