Traditional data processing and storage applications are increasingly inadequate for the explosion of Earth system data from models, in-situ observations, and remote sensing. The Turing will work with the Met Office and Microsoft to support Pangeo, the growing international platform for big data geoscience. The Pangeo project provides a framework for this work on the cloud and on high-performance computing systems, built from open-source components in the Python ecosystem. It allows interactive and scalable computing on the large, gridded datasets used by ocean, atmosphere, land and climate scientists.
Led by the Turing's Research Engineering Group (REG), who will work closely with the Informatics Lab at the Met Office and Microsoft’s AI for Earth programme, the project aims to improve Pangeo’s ability to handle petabyte-scale datasets on the Microsoft Azure platform. Two new environmental data science projects will also be piloted and Pangeo’s available tooling will be extended. The organisers welcome interest from those who would like to contribute new case studies on Pangeo.
- Development of a working prototype JupyterHub + Azure Machine Learning spawner application that lets users easily spin up Azure Machine Learning Compute Instances and Workspaces and interact with them via JupyterLab. This has been deployed both on the Met Office Azure subscription and at the Turing.
- The Met Office and Research Software Engineering team have developed an intake package that makes it easy for users to access some Met Office datasets in a cloud-optimised way.
- Theo McCaie, Scientific Systems Manager at the Met Office Informatics Lab, presented a poster and talk demonstrating these developments at AGU.
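To give a feel for why cloud-optimised, chunked access matters for large gridded datasets, here is a minimal sketch of the underlying idea: each chunk of a grid is reduced independently, and the partial results are then combined. This is an illustration only, written with the Python standard library. It is not Pangeo's API or the Met Office intake package, and the function names (`make_grid`, `iter_chunks`, `partial_sum`, `chunked_mean`) and the synthetic field are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

Grid = List[List[float]]


def make_grid(n_lat: int, n_lon: int) -> Grid:
    """Build a toy lat x lon field; values are synthetic, not real data."""
    return [[float(i + j) for j in range(n_lon)] for i in range(n_lat)]


def iter_chunks(grid: Grid, rows_per_chunk: int) -> Iterator[Grid]:
    """Yield consecutive blocks of rows, mimicking chunked storage."""
    for start in range(0, len(grid), rows_per_chunk):
        yield grid[start:start + rows_per_chunk]


def partial_sum(chunk: Grid) -> Tuple[float, int]:
    """Reduce one chunk to (sum, count) so results can be combined later."""
    total = sum(sum(row) for row in chunk)
    count = sum(len(row) for row in chunk)
    return total, count


def chunked_mean(grid: Grid, rows_per_chunk: int = 2) -> float:
    """Compute the grid mean by reducing chunks independently, then merging."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, iter_chunks(grid, rows_per_chunk)))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    field = make_grid(n_lat=6, n_lon=4)
    print(chunked_mean(field, rows_per_chunk=2))
```

Because each chunk only needs to return a sum and a count, no single worker ever has to hold the whole grid in memory. That property is what lets this style of computation scale out across many machines when the chunks live in cloud storage.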
Researchers and collaborators
To get involved please contact Aida Mehonic at [email protected].