Data-Driven Discovery of Models

Speaker: Wade Shen (Defense Advanced Research Projects Agency, USA)

Date: 8 June 2017

Time: 14:00 – 15:00

Location: The Alan Turing Institute

Email Turing Events to register your place

Watch the live stream

Recordings will be made available on our YouTube channel following the event



Understanding the complex and increasingly data-intensive world around us relies on the construction of robust empirical models, i.e. representations of real, complex systems that enable decision makers to predict behaviours and answer “what-if” questions.

Today, construction of complex empirical models is largely a manual process requiring a team of subject matter experts and data scientists. With ever more data becoming available via improved sensing and open sources, the opportunity exists to build models to speed scientific discovery, enhance Department of Defence/Intelligence Community’s intelligence and improve United States Government logistics and workforce management. However, capitalising on this opportunity is fundamentally limited by the availability of data scientists.

The Data-Driven Discovery of Models (D3M) programme aims to develop automated model discovery systems that enable users with subject matter expertise but no data science background to create empirical models of real, complex processes.

This capability will enable subject matter experts to create empirical models without the need for data scientists and will increase the productivity of expert data scientists via automation. The D3M automated model discovery process will be enabled by three key technologies to be developed in the course of the programme:

  • A library of selectable primitives. A discoverable archive of data modeling primitives will be developed to serve as the basic building blocks for complex modeling pipelines.
  • Automated composition of complex models. Techniques will be developed for automatically selecting model primitives and for composing selected primitives into complex modeling pipelines based on user-specified data and outcome(s) of interest.
  • Human-model interaction that enables curation of models by subject matter experts. A method and interface will be developed to facilitate human-model interaction that enables formal definition of modeling problems and curation of automatically constructed models by users who are not data scientists.
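The first two technologies above can be illustrated with a toy sketch. It is not the D3M software or its API: the primitive names, the two small libraries and the exhaustive search below are all assumptions made for illustration. Each primitive is a small building block, and the composition step tries every transform/model pair and keeps the one with the lowest error on held-out data.

```python
from itertools import product
from statistics import mean

# Hypothetical library of transform primitives (illustrative, not D3M's).
# Each maps a single input value to a single feature.
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
}

def fit_mean(xs, ys):
    """Model primitive: always predict the mean of the training outcomes."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Model primitive: least-squares line through a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x - mx)

# Hypothetical library of model primitives.
MODELS = {"mean": fit_mean, "line": fit_line}

def mse(predict, xs, ys):
    """Mean squared error of a predictor on (xs, ys)."""
    return mean((predict(x) - y) ** 2 for x, y in zip(xs, ys))

def discover(train, valid):
    """Fit every transform/model pipeline on train, score it on valid,
    and return (score, transform name, model name) for the best one."""
    (xtr, ytr), (xva, yva) = train, valid
    results = []
    for (t_name, t), (m_name, fit) in product(TRANSFORMS.items(), MODELS.items()):
        predict = fit([t(x) for x in xtr], ytr)
        score = mse(predict, [t(x) for x in xva], yva)
        results.append((score, t_name, m_name))
    return min(results)

# The user supplies only data and the outcome of interest (here y = x squared).
train = ([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
valid = ([-3, 3], [9, 9])
best = discover(train, valid)
print(best)
```

On this toy data the search selects the "square" transform followed by the "line" model. A real system would search a far larger space of primitives and could not enumerate every pipeline, which is where the automated composition techniques come in.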

Automated model discovery systems developed by the D3M programme will be tested on real-world problems that become progressively harder over the course of the programme. Toward the end of the programme, D3M will target problems that are both unsolved and underspecified in terms of the data and instances of outcomes available for modeling.

This talk is suitable for technical experts in artificial intelligence, machine learning and data science.