Data-driven geoscience

Merging data-centric techniques with modelling and simulation to answer questions in geoscience and make timely decisions

This project aims to tackle three of the major data science challenges in geoscience: the systematic acquisition and processing of very large amounts of geoscientific data, their integration across modalities and scales, and their use in deriving and validating models to answer scientific questions and make timely decisions.

Project aims

The three major data science challenges being addressed in this project are:

  1. Acquisition and processing: A wide variety of geophysical data (potential fields, electromagnetic data, seismic data, weather data, etc.) is acquired across very broad wavelength ranges from surface sensor arrays, drilled wells, satellites and many other sources. Collectively, these data sets are among the largest scientific data sets in use, comparable in size and complexity only to those from astronomy and particle physics.
  2. Integration: Jointly understanding the different types of data is a major challenge. Methodologically, there is a major gap between statistical modelling and machine learning on one side and numerical or physical modelling on the other. Hence a systematic approach to consistent data integration and model building is of the highest value and priority.
  3. Deriving and validating models: Both data sources and models come with recognised issues that existing methodologies have difficulty coping with, such as features for which exact physical models are unknown (e.g., sub-surface geology, earthquakes) or models which are difficult to reconcile (e.g., seismic measurements vs social media alerts), but which novel data-science-based approaches can address.

Research will be conducted together with translational stakeholders and world-leading domain experts, focusing on the interrelated topic areas detailed in 'Applications'.

Applications


  1. Geohazards and geo-risk:
    Earthquakes, landslides, tsunamis, flood risks, risks to mechanical structures, predictions and risk assessment, risk mitigation, early warning, emergency response.
  2. Geomodelling and model inversion:
    Shared Earth models, parameter fusion for deep Earth physics, assimilation of disparate and distributed data sources, multi-scale modelling, data integration, upscaling and downscaling.
  3. Resources, energy, carbon capture/storage and avoidance:
    Optimal, responsible and sustainable resource use and exploitation, water, mineral, and hydrocarbon exploration and production, carbon cycle understanding, carbon capture and storage for energy-cost and climate sustainability.
  4. Advanced analytics, statistics, machine learning:
    Statistical modelling, machine learning and modern data analytics methodology for the geosciences, e.g., spatio-temporal modelling, probabilistic modelling and uncertainty quantification; open methodological questions in model building, model assessment, prediction and forecasting workflows.
  5. Data acquisition and integrated models:
    Obtaining and integrating different types of Earth-related data, of different provenance, different scales and resolutions, reconciling and integrating different types of models, e.g., numerical vs statistical.
  6. Integrated data federation for geoscience:
    Practical challenges in sustaining collaboration across different communities, data sources, ranges of scientific expertise, computational resources and translational stakeholders.


Contact info

[email protected]