Introduction
Cryogenic electron microscopy (cryo-EM) is the fastest-growing technique for exploring the structure of biological macromolecules. Determining structures from cryo-EM images can be challenging, and often the data alone are not enough to produce a solution. This project aims to develop a computational pipeline that can exploit much more of the existing knowledge about biological structures in the cryo-EM structure determination process. This approach should improve the reconstruction results for particularly challenging datasets.
Explaining the science
To limit radiation damage, cryo-EM images are recorded under low-dose conditions, which leads to high levels of experimental noise. To reduce the noise, one averages over many images, but this requires alignment and classification algorithms that are robust to the high levels of noise. When signal-to-noise ratios drop, cryo-EM 3D reconstruction algorithms become susceptible to overfitting, which ultimately limits their applicability.
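The effect of averaging can be illustrated with a minimal sketch. It assumes a one-dimensional signal, independent Gaussian noise, and copies that are already perfectly aligned, which is exactly what real cryo-EM data do not provide. The function names (`noisy_copy`, `average_aligned_copies`, `rms_error`) are hypothetical and chosen only for this illustration.

```python
import math
import random


def noisy_copy(signal, sigma, rng):
    """Return the signal with independent Gaussian noise added to every sample."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


def average_aligned_copies(signal, sigma, n_copies, rng):
    """Average n_copies noisy copies that are assumed to be perfectly aligned."""
    total = [0.0] * len(signal)
    for _ in range(n_copies):
        for i, value in enumerate(noisy_copy(signal, sigma, rng)):
            total[i] += value
    return [t / n_copies for t in total]


def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the noise-free signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))
```

Under these assumptions, averaging N copies reduces the residual noise by a factor of about the square root of N, so 100 copies give roughly a tenfold reduction. In practice the relative orientations of the images are unknown, which is why the alignment and classification steps must themselves cope with the noise.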
These algorithms can be improved by incorporating prior knowledge. To date, however, all implementations have used a relatively poor source of prior information: the observation that in a typical cryo-EM real-space reconstruction the density values do not change rapidly from one voxel (3D pixel) to the next. This project is investigating a more informative prior that draws on the vast amount of knowledge that structural biology has gathered in the past 50 years.
This prior would incorporate the knowledge that proteins are made of polypeptide chains, which fold into specific secondary structure elements and adopt specific domain shapes, while the surrounding solvent region remains flat. It would also restrict the number of possible solutions for a given dataset, and thus make it possible to process data with higher levels of noise. Previously, it was hard to imagine how the rich knowledge available about protein structures could be expressed in a form suitable for computational optimisation. In recent years, however, convolutional neural networks (CNNs) have been shown to be well suited to do just that.
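For comparison, the simpler smoothness prior mentioned above can be sketched as a penalty on differences between neighbouring density values. The example below is a one-dimensional toy, not the formulation used in any particular cryo-EM package: it minimises a squared data term plus a weighted sum of squared neighbour differences by plain gradient descent. The names `roughness` and `smooth_map_estimate` and the parameter `weight` are assumptions made for this illustration.

```python
def roughness(values):
    """Sum of squared differences between neighbouring samples."""
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def smooth_map_estimate(observed, weight, n_iter=500):
    """Minimise sum_i (x_i - y_i)^2 + weight * roughness(x) by gradient descent."""
    x = list(observed)
    n = len(x)
    # The gradient is Lipschitz with constant at most 2 + 8 * weight,
    # so this step size guarantees a monotone decrease of the objective.
    step = 1.0 / (2.0 + 8.0 * weight)
    for _ in range(n_iter):
        # Gradient of the data term.
        grad = [2.0 * (x[i] - observed[i]) for i in range(n)]
        # Gradient of the smoothness penalty, one neighbouring pair at a time.
        for i in range(n - 1):
            d = 2.0 * weight * (x[i + 1] - x[i])
            grad[i] -= d
            grad[i + 1] += d
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x
```

A larger `weight` suppresses more noise but also blurs genuine detail, since the penalty cannot tell noise from sharp structural features. This is the limitation that a prior carrying actual structural knowledge is meant to address.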
Project aims
This project aims to develop a computational pipeline that can exploit much more of the existing knowledge about biological structures in the cryo-EM structure determination process. This prior knowledge will be expressed through convolutional neural networks (CNNs) that have been trained on many reconstructions, and these networks will be used in novel algorithms that optimise a regularised likelihood function.
Similar approaches have excelled in image de-noising and reconstruction in related areas. Preliminary results with simulated data suggest that significant improvements beyond the existing methods are possible, both in computational speed and in signal recovery capabilities.
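The text above does not specify how the networks enter the optimisation, so the following is only one possible reading, sketched under loud assumptions. It alternates a gradient step on the data term with a denoising step, a scheme often called plug-and-play regularisation. The denoiser here is a hypothetical stand-in (`flat_solvent_denoiser`, a simple threshold that flattens weak density) and not a trained CNN; `regularised_reconstruction`, `threshold` and `step` are likewise illustrative names, and one-dimensional lists of aligned images replace real 3D reconstructions.

```python
def flat_solvent_denoiser(volume, threshold=0.4):
    """Hypothetical stand-in for a trained CNN: set weak density to zero."""
    return [v if abs(v) >= threshold else 0.0 for v in volume]


def regularised_reconstruction(images, denoiser, n_iter=50, step=0.5):
    """Alternate a data-fidelity gradient step with a denoising (prior) step."""
    n_images = len(images)
    n_samples = len(images[0])
    # For aligned images with Gaussian noise, the data term depends on the
    # images only through their mean.
    mean_image = [sum(img[i] for img in images) / n_images for i in range(n_samples)]
    x = list(mean_image)
    for _ in range(n_iter):
        # Gradient step on 0.5 * sum_i (x_i - mean_i)^2: pull x towards the data.
        x = [xi - step * (xi - mi) for xi, mi in zip(x, mean_image)]
        # Prior step: the denoiser pushes x towards what it considers plausible.
        x = denoiser(x)
    return x
```

Because the denoiser is passed in as an argument, a trained network could replace the threshold function without changing the loop. Whether such an iteration converges, and to what, depends on the properties of the denoiser, which this sketch does not address.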
Applications
The objective is to explore machine learning methods that have been successful in other imaging modalities, such as computed tomography, to produce more informative priors for cryo-EM structure determination. The proposed methods will enable faster computations with less user involvement but, most importantly, they will extend the applicability of cryo-EM structure determination to many more samples by relaxing the existing experimental requirements on particle size, ice thickness and sample purity.