Introduction
The ability to predict loop structure in a protein is useful in many studies, including homology modeling, protein design, and docking.
Obtaining high-quality models becomes significantly more difficult as loop length increases.
The current research aims to overcome the challenges caused by the ruggedness of the energy landscape around a native protein structure, i.e. the presence of high energy barriers immediately around the structure, by locally manipulating the shape of the energy landscape during certain steps of the conformational search.
Methodology
Sequence-Robust Loop Modeling with PyRosetta
Aparajita Dasgupta, Dr. Sheldon Park
Department of Chemical and Biological Engineering, University at Buffalo, SUNY, Email: [email protected], [email protected]
PyRosetta is the Python interface to Rosetta, a software suite for computational protein structure analysis. Within Rosetta, the kinematic closure (KIC) loop algorithm predicts the structure of loops of up to twelve amino acids with high accuracy, i.e. < 1 Å (Mandell et al., Nature Methods 2009, 6:551-2).
We note that protein structure, especially the main chain conformation, is often robust to small sequence variations. Transient mutations that smooth the energy landscape may therefore improve the results of the conformational search.
Figure 1: Procedure to improve conformational search by introducing transient mutations using KIC loop protocol in PyRosetta
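The set of sequence variants implied by this procedure can be sketched in plain Python. This is a minimal illustration, not the authors' script: it assumes that each transient mutant replaces exactly one loop residue with alanine (which would match the 12 mutants plus 1 wild type described for Figure 4), and the helper name `alanine_variants` and the example loop sequence are hypothetical. The loop modeling itself would be run with the KIC protocol in PyRosetta on each variant and is not shown.

```python
def alanine_variants(loop_sequence, skip=("A",)):
    """Return {label: sequence} for the wild type and single-alanine mutants.

    Assumption (not stated in the poster): one loop position is mutated
    to alanine per variant. Positions already in `skip` are left alone.
    """
    variants = {"WT": loop_sequence}
    for i, residue in enumerate(loop_sequence):
        if residue in skip:
            continue  # mutating alanine to alanine would change nothing
        label = f"{residue}{i + 1}A"  # e.g. "K4A": lysine at position 4 -> alanine
        variants[label] = loop_sequence[:i] + "A" + loop_sequence[i + 1:]
    return variants


if __name__ == "__main__":
    loop = "GSTKDLNPEYRV"  # made-up 12-residue loop, for illustration only
    for label, sequence in alanine_variants(loop).items():
        print(label, sequence)
```

Each variant would then go through the same conformational search, and the lowest-energy model would be mutated back to the wild type sequence before comparison with the native structure.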
Results
Most protein structures yielded “funnel-shaped” continuous graphs, while only some diverged from this trend
Merely increasing the number of wild type structures (structures without any alanine mutation) did not lead to improved results
Future Work
Citations
Acknowledgments
Figure 2: RMSD vs minimized energy for each of the 20 wild type (non-mutated) proteins. Each graph represents 600 structures generated by the KIC loop protocol. Note the funnel-shaped contour in most cases. For the proteins where the contour develops differently, prediction of loop structure is very difficult because multiple conformations with different energies occur at the same RMSD.
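One rough way to put a number on how funnel-like an RMSD vs energy plot is can be sketched as follows. This is an assumption made for illustration, not the analysis used on the poster: it uses the Pearson correlation between RMSD and energy as a proxy (in a funnel, energy falls as RMSD falls, so the correlation is strongly positive). The function names and the decoy values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def funnel_score(decoys):
    """decoys: list of (rmsd, energy). Values near 1 suggest a funnel."""
    rmsds = [r for r, _ in decoys]
    energies = [e for _, e in decoys]
    return pearson(rmsds, energies)


def lowest_energy_decoy(decoys):
    """The model that would be selected by energy alone."""
    return min(decoys, key=lambda d: d[1])


if __name__ == "__main__":
    # Illustrative numbers only, not data from the poster.
    funnel = [(0.5, -120.0), (1.0, -115.0), (2.0, -105.0), (3.0, -98.0)]
    rugged = [(0.5, -100.0), (1.0, -118.0), (2.0, -101.0), (3.0, -119.0)]
    print(funnel_score(funnel), lowest_energy_decoy(funnel))
    print(funnel_score(rugged), lowest_energy_decoy(rugged))
```

In the second toy set the lowest-energy decoy sits at high RMSD, which is the situation described above where energy alone does not identify the near-native loop.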
Figure 3: RMSD vs minimized energy for 3 wild type proteins (1cnv, 1t1d and 1i7p). Each graph represents 7500 structures generated by the KIC loop protocol. Although the overall energy surface behaves similarly to the 600-structure case, there is no marked improvement in either minimizing energy or predicting loop structure. Increasing the number of structures also did not yield the classic “funnel-shaped” energy contour that is favorable for loop prediction, as is evident for 1cnv and 1i7p. Although the number of sampled conformations increases, the energy landscape is not smoothed, so feasible structures that lie behind a local energy maximum are still not sampled. This leads to the conclusion that site directed mutagenesis is indeed the right approach.
One-dimensional analysis of RMSD did not yield conclusive results indicating which amino acids (if any) led to more difficult energy landscapes for modeling purposes
Mutated structures led to lower energy and resulted in better structure prediction
Figure 4: Boxplots depicting the distribution of LRMSD for each of the 20 amino acids. For each protein and its 13 versions (12 mutants and 1 wild type), the minimum RMSD was calculated and the mutated residue for that particular structure was noted. Boxplots were plotted to see whether any clear trend indicated which amino acids posed a problem in de novo modeling. While some amino acids occur more often than others, no clear trend was visible. The main conclusion drawn from this exercise was that a one-dimensional analysis does not yield any trends, and that a two-dimensional analysis of RMSD against another observable property (energy, in the current experiment) is vital to clearly understand the bottlenecks associated with loop modeling.
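The grouping behind such a box plot can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction under assumptions, not the authors' code: the input layout (protein ID, one-letter code of the mutated residue, list of RMSD values for that variant) and the function names are hypothetical, and the numbers are invented.

```python
from collections import defaultdict
import statistics


def min_rmsd_by_residue(runs):
    """Group the minimum RMSD of each variant by the residue that was mutated.

    runs: iterable of (protein_id, mutated_residue, rmsd_values).
    """
    grouped = defaultdict(list)
    for _protein_id, residue, rmsd_values in runs:
        grouped[residue].append(min(rmsd_values))
    return dict(grouped)


def box_stats(values):
    """Five-number summary, i.e. the values a box plot draws."""
    ordered = sorted(values)
    if len(ordered) == 1:
        q1 = q2 = q3 = ordered[0]  # a single point has no spread
    else:
        q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": q2,
            "q3": q3, "max": ordered[-1]}


if __name__ == "__main__":
    # Invented values, for illustration only.
    runs = [
        ("p1", "L", [2.0, 1.2, 3.1]),
        ("p2", "L", [0.9, 1.5]),
        ("p1", "K", [2.5, 2.2]),
    ]
    for residue, minima in min_rmsd_by_residue(runs).items():
        print(residue, box_stats(minima))
```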
Figure 5: RMSD vs minimized energy for all 20 proteins for wild type and mutant structures. Each data point on each graph represents a single average structure from the clusters formed for each type of mutant. The blue data points are mutant structures and the purple data points are wild type structures. In all cases the mutated structures had lower energy than the wild type structure. This leads us to the conclusion that site directed mutagenesis can indeed lead to improved de novo structure prediction when coupled with the KIC loop protocol. Since energy and RMSD are significantly lower than for the wild type structures, the odds of arriving at a correct structure increase greatly when using these mutated structures.
While applying site directed mutagenesis led to better results, there are still minor differences in the predicted structure and the actual structure
Our initial approach was to combine all mutants and wild type structures together and determine whether this smoothed the energy landscape further
However, this approach did not yield conclusive results
The current approach aims to linearize the relationship between RMSD and energy for each protein near the lowest energy threshold obtained, using linear regression techniques and neural networks
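As a rough sketch of the regression part of this idea, one could restrict the decoys to a window above the lowest energy and fit a straight line of energy against RMSD by ordinary least squares. The window definition, the function names, and the numbers below are assumptions made for illustration; the poster does not specify how the threshold is chosen, and the neural network variant is not shown.

```python
def low_energy_window(decoys, width):
    """Keep decoys whose energy lies within `width` of the minimum energy.

    decoys: list of (rmsd, energy). `width` is a hypothetical parameter.
    """
    e_min = min(energy for _, energy in decoys)
    return [(r, e) for r, e in decoys if e <= e_min + width]


def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Invented decoys, for illustration only.
    decoys = [(0.5, -120.0), (1.0, -115.0), (1.5, -111.0), (3.0, -90.0)]
    window = low_energy_window(decoys, width=10.0)
    slope, intercept = fit_line([r for r, _ in window], [e for _, e in window])
    print(len(window), slope, intercept)
```

A positive slope in the low-energy window would indicate that, near the energy minimum, lower energy still tracks lower RMSD.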
The authors would like to thank the UB School of Engineering and Applied Science
Figure 6: 1cnv native structure and minimum energy model mutated back to wild type. The RMSD is 3.3 Å for this system. The current algorithm still leaves a few questions to be answered with regard to the energy function, the role of each type of amino acid and the characteristic energy landscape for each protein.
1. Mandell, D. J., Coutsias, E. A., & Kortemme, T. (2009). Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nature Methods, 6, 551-552.
2. Baugh, E. H., Lyskov, S., Weitzner, B. D., & Gray, J. J. (2011). Real-time PyMOL visualization for Rosetta and PyRosetta. PLoS ONE.
3. Das, R., & Baker, D. (2008). Macromolecular modeling with Rosetta. Annual Review of Biochemistry, 77, 363-382.