The accurate modeling of energetic contributions to protein structure is a fundamental challenge in computational approaches to protein analysis and design. We describe a general computational method, EmCAST (empirical Cα stabilization), for scoring and optimizing protein sequences for a given structure. The method relies on an empirical potential derived from a database of Cα dihedral-angle preferences for all possible four-residue sequences, compiled from structures in the Protein Data Bank. For solvent-exposed mutation sites, its stability predictions naturally correlate one-to-one with experimental results. EmCAST predicted four mutations that increased the stability of a three-helix bundle, UBA(1), from 2.4 to 4.8 kcal/mol by optimizing residues in both helices and turns. For a set of eight variants, predicted and experimental stabilizations correlate closely (R² = 0.97), with a slope near 1 and a standard error of 0.16 kcal/mol for the EmCAST predictions. Tests against literature data on the stability effects of surface-exposed mutations show that EmCAST outperforms existing stability prediction methods. UBA(1) variants were crystallized so that their structures could be verified and analyzed at atomic resolution. Thermodynamic and kinetic folding experiments were performed to determine the magnitude and mechanism of stabilization. Our method has the potential to enable rapid, rational optimization of natural proteins, broaden analysis of the sequence/structure relationship, and supplement existing protein design strategies.
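The abstract does not spell out how the Cα dihedral database becomes an energy, so the following is only a minimal sketch under stated assumptions. It takes the dihedral to be the pseudo-torsion through four consecutive Cα atoms (residues i to i+3), which pairs naturally with four-residue sequences. It then converts binned angle frequencies into energies by Boltzmann inversion against a sequence-independent background. The Boltzmann-inversion step, the 10° bins, the pseudocount, the RT value, and all function names (`ca_dihedral`, `dihedral_bin`, `build_potential`, `score_sequence`) are hypothetical illustrations, not EmCAST's published formulation.

```python
import math
from collections import Counter, defaultdict

RT = 0.593  # kcal/mol near 298 K (ASSUMPTION: temperature choice)


def ca_dihedral(p0, p1, p2, p3):
    """Signed pseudo-dihedral (degrees) through four consecutive C-alpha positions."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    # atan2(|b2| * b1.(b2 x b3), n1.n2) gives the IUPAC-signed torsion in [-180, 180].
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def dihedral_bin(angle, n_bins=36):
    """Map an angle in degrees to one of n_bins equal-width bins."""
    return int(((angle + 180.0) % 360.0) // (360.0 / n_bins))


def build_potential(samples, n_bins=36, pseudocount=1.0):
    """ASSUMPTION: Boltzmann inversion, E = -RT ln(P(bin | tetrapeptide) / P(bin)).

    samples: iterable of (four-letter sequence, C-alpha dihedral in degrees).
    """
    counts = defaultdict(Counter)
    background = Counter()
    for seq, angle in samples:
        b = dihedral_bin(angle, n_bins)
        counts[seq][b] += 1
        background[b] += 1
    total_bg = sum(background.values())
    potential = {}
    for seq, c in counts.items():
        n_seq = sum(c.values())
        for b in range(n_bins):
            # Pseudocounts keep energies finite for unobserved bins.
            p_obs = (c[b] + pseudocount) / (n_seq + pseudocount * n_bins)
            p_ref = (background[b] + pseudocount) / (total_bg + pseudocount * n_bins)
            potential[(seq, b)] = -RT * math.log(p_obs / p_ref)
    return potential


def score_sequence(sequence, dihedrals, potential, n_bins=36):
    """Sum window energies; dihedrals[i] belongs to residues i..i+3.

    Windows absent from the table contribute 0 (ASSUMPTION).
    """
    if len(dihedrals) != len(sequence) - 3:
        raise ValueError("need one dihedral per four-residue window")
    return sum(potential.get((sequence[i:i + 4], dihedral_bin(a, n_bins)), 0.0)
               for i, a in enumerate(dihedrals))
```

In this sketch a more negative score means a sequence is better matched to the backbone conformation, and `n_bins` must be the same when building and scoring.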
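The fit statistics quoted for the eight variants (R², slope, and standard error) come from comparing predicted with measured stabilizations. An ordinary least-squares fit yields all three, as in the hypothetical helper `fit_stats` below. It assumes experimental values are regressed on predicted values and reports the standard error of the estimate (residual standard deviation with n − 2 degrees of freedom). The paper may define its standard error differently, and no data from the study are included.

```python
import math


def fit_stats(predicted, experimental):
    """Least-squares fit of experimental on predicted stabilization values (kcal/mol)."""
    n = len(predicted)
    if n != len(experimental) or n < 3:
        raise ValueError("need at least three paired values")
    mx = sum(predicted) / n
    my = sum(experimental) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in experimental)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, experimental))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(predicted, experimental))
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / syy,
        # ASSUMPTION: standard error of the estimate with n - 2 degrees of freedom.
        "std_error": math.sqrt(ss_res / (n - 2)),
    }
```

With toy inputs predicted = [0, 1, 2, 3] and measured = [1, 3, 5, 7], the function returns a slope of 2, an intercept of 1, R² = 1, and a standard error of 0. A one-to-one agreement of the kind described above corresponds to a slope near 1 and an intercept near 0.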
- Databases, Protein
- Protein Folding