Traditional data processing and storage applications are increasingly inadequate for handling the explosion of Earth system data from models, in-situ observations and remote sensing. Moreover, working with these large, complex datasets presents a substantial barrier to new users, who must master the tooling before they can focus on scientific exploration. The Turing is working closely with the NERC Research Centres, The Turing Way, the Met Office and Microsoft to support Pangeo, a growing international platform that empowers users to process and extract information from big geoscience data. The Pangeo project provides a framework for working with such datasets in the cloud and on high-performance computing systems, built from open-source components of the Python ecosystem. It enables interactive and scalable computing on the large, gridded datasets used by ocean, atmosphere, land and climate scientists.
Led by the Turing's data science for science research programme and the Research Engineering Group (REG), in collaboration with the NERC Research Centres, The Turing Way, the Informatics Lab at the Met Office and Microsoft’s AI for Earth programme, the project aims to improve Pangeo’s ability to: i) handle petabyte-scale datasets on the Microsoft Azure platform; and ii) provide demonstrators that help new users apply new digital technologies. The organisers welcome interest from anyone who would like to contribute new case studies on Pangeo.
- Alejandro Coca-Castro (Turing PDRA) is actively supporting the expansion of the Pangeo Europe network, an emerging regional community-driven initiative that aims to bring together and highlight European users and developers of Pangeo’s open-source software and infrastructure for Big Data geoscience. Further information on regional meetings, along with meeting notes, can be found on the Pangeo website.
- Alejandro leads the development of the Environmental Data Science book (EnvDS book), a community-driven online resource that showcases and supports the publication of data, research and open-source developments in the environmental sciences. Working closely with the Pangeo Gallery and The Turing Way, the community has published numerous Python-based notebooks covering data visualisation and modelling across different environmental systems. The notebooks use common components of the Pangeo stack, e.g. intake, iris, xarray and hvPlot, for interactive visualisation and manipulation of a wide diversity of environmental datasets.
- The executable notebooks within the EnvDS book follow the FAIR principles by testing and incorporating technologies developed by REsearch LIfecycle mAnagemeNt for Earth Science (RELIANCE). The EU-funded RELIANCE project develops a suite of innovative, interconnected services integrated into the European Open Science Cloud (EOSC), supporting research lifecycle management for Earth science communities and Copernicus users.
Alejandro, on behalf of the EnvDS book community, presented a talk on the main progress of the initiative at a dedicated Pangeo Europe session of the European Geosciences Union General Assembly 2022 (EGU22).
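As a rough illustration of the kind of xarray workflow the EnvDS book notebooks build on, here is a minimal sketch that computes a monthly climatology, anomalies and an area-weighted mean on a gridded temperature field. The dataset is synthetic and the variable names and grid are hypothetical stand-ins; a real notebook would typically load its data through intake or iris rather than generating it.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic daily temperature field on a coarse lat/lon grid,
# standing in for a real gridded dataset loaded via intake or iris.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
lat = np.linspace(-60.0, 60.0, 5)
lon = np.linspace(0.0, 350.0, 8)
rng = np.random.default_rng(seed=0)
temperature = xr.DataArray(
    15.0 + rng.normal(size=(time.size, lat.size, lon.size)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="air_temperature",
    attrs={"units": "degC"},
)

# Monthly climatology: average all days that fall in the same calendar month.
climatology = temperature.groupby("time.month").mean("time")

# Anomalies: subtract the matching month's climatology from each day.
anomalies = temperature.groupby("time.month") - climatology

# Area-weighted spatial mean, weighting each latitude band by cos(latitude).
weights = np.cos(np.deg2rad(temperature.lat))
mean_anomaly = anomalies.weighted(weights).mean(("lat", "lon"))
```

Because the labelled dimensions (`time`, `lat`, `lon`) travel with the data, each step reads as a named operation rather than index arithmetic, which is what makes this style of notebook approachable for new users.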
- Development of a working prototype JupyterHub + Azure Machine Learning spawner application that lets users easily spin up Azure Machine Learning Compute Instances and Workspaces and interact with them via JupyterLab. This has been deployed both on the Met Office Azure subscription and at the Turing.
- The Met Office and the Research Software Engineering team have developed an intake package that makes it easy for users to access selected Met Office datasets in a cloud-optimised way.
- Theo McCaie, Scientific Systems Manager at the Met Office Informatics Lab, presented a poster and a talk demonstrating these developments at AGU.
Researchers and collaborators
To get involved, please contact Aida Mehonic.