Machine learning for protein folding

This report presents the outputs of a week-long collaboration between The Alan Turing Institute and the Woolfson laboratory in the School of Chemistry at the University of Bristol. The aim was to use machine learning methods to predict, from linear sequences of amino acids, the different states of a type of protein fold called coiled coils (CCs) (Woolfson, 2017), which are structural motifs consisting of two or more α-helices winding around each other.

The main objective of the data study group was to develop a machine learning model that, given a sequence of amino acids, predicts whether the sequence will participate in a) a parallel dimer, b) an anti-parallel dimer or c) any other structure, such as a trimer. This objective did not take into account whether the sequence folds with another homomer or heteromer, or whether it binds with a sequence from the same or a different protein chain. The orientation of non-dimer structures was also not considered.

The second objective was to predict the same three classes using both the sequence and the information about whether the sequence binds with another homomer or a heteromer.
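
The two objectives can be read as one three-class classification problem with two alternative input sets. The sketch below illustrates only that framing; it is not the team's method. The amino-acid composition encoding, the names `CoiledCoilClass`, `composition_features` and `build_features`, and the example sequence are all assumptions made for illustration, since this summary does not state which features or models were used.

```python
from enum import Enum
from typing import List, Optional

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class CoiledCoilClass(Enum):
    """Target classes as described in the objectives (names are hypothetical)."""
    PARALLEL_DIMER = 0
    ANTIPARALLEL_DIMER = 1
    OTHER = 2  # e.g. trimers; orientation of non-dimers is ignored


def composition_features(sequence: str) -> List[float]:
    """Fraction of each standard amino acid in the sequence (20 values).

    Assumed encoding for illustration only; it discards residue order.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def build_features(sequence: str, is_homomer: Optional[bool] = None) -> List[float]:
    """Build a model input vector.

    First objective: sequence only (is_homomer=None, 20 features).
    Second objective: sequence plus a homomer/heteromer flag (21 features).
    """
    features = composition_features(sequence)
    if is_homomer is not None:
        features.append(1.0 if is_homomer else 0.0)
    return features


# Illustrative sequence, not taken from the report's data.
x_first = build_features("EIAALKQ")
x_second = build_features("EIAALKQ", is_homomer=True)
```

Any standard multi-class classifier could then be trained on such vectors with `CoiledCoilClass` values as labels. A composition encoding is only a simple baseline; an order-aware encoding may well be needed in practice.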

Citation information

Data Study Group team. (2020, June 5). Data Study Group Network Final Report - Woolfson Laboratory. Zenodo. http://doi.org/10.5281/zenodo.3877119

Additional information

Stephanie Seiermann, BJSS
Christopher Doris, University of Bristol and Heilbronn Institute for Mathematical Research
Misa Ogura, BBC
Amit Kumar Jaiswal, University of Bedfordshire
Sydney Vertigan, University of Bristol
Qingfen Yu, UCL