Sampling Protocols
Introduction
Much of the work in IMP involves trying to find conformations that minimize a scoring function. The scoring function is composed of scoring terms, and the space of possible conformations is called the conformation space or sampling space.
Most of the scoring functions used act on points or spheres in 3D and include terms that limit the packing density (excluded volume, VDW, sterics, etc.).
Protocol details
Most sampling protocols involve a combination of:
- manipulating the representation to reduce the number of dimensions of the conformation space
- manipulating the scoring function to smooth it or to bring minima closer to the current conformation
- running an optimizer to improve the current conformation
- perturbing the conformation in the sampling space by adding noise
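These ingredients can be combined into a simple loop. The following is a minimal, self-contained Python sketch, not IMP code: `score` is a hypothetical rough 1D scoring function, `local_optimize` is plain gradient descent standing in for a real optimizer, and noise comes from Gaussian perturbations. Only improvements are kept, so the loop is a greedy, basin-hopping-style search.

```python
import math
import random

def score(x):
    # Hypothetical rough scoring function: a broad well with ripples,
    # standing in for a real scoring function with many local minima.
    return 0.1 * x * x + math.sin(5.0 * x)

def local_optimize(x, f, step=1e-3, iters=2000, h=1e-6):
    # Plain gradient descent with a central finite-difference derivative;
    # a stand-in for a real optimizer such as conjugate gradients.
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= step * grad
    return x

def protocol(x0, rounds=50, noise=1.0, seed=0):
    rng = random.Random(seed)
    best = local_optimize(x0, score)
    for _ in range(rounds):
        # Add noise to the best conformation so far, then relax it locally.
        trial = local_optimize(best + rng.gauss(0.0, noise), score)
        if score(trial) < score(best):
            best = trial
    return best
```

For example, `protocol(3.0)` returns a point whose score is no worse than that of plain gradient descent started from 3.0. Representation and scoring-function manipulations would slot in as extra stages around `local_optimize`.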
Sampling pitfalls
- instability due to excessively high values of various terms of the scoring function
- rough scoring functions have many local minima in which protocols can get stuck
Representation manipulations
- use rigid bodies
  - reduces the dimensionality
- use a coarser-grained representation of the entities, e.g., represent a protein with 10 spheres rather than 1000 atoms
  - reduces the dimensionality
  - smooths the scoring function
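Rigid bodies reduce the dimensionality because a rigid body of N points has only 6 degrees of freedom (3 for translation, 3 for rotation) instead of 3N. The Python sketch below is illustrative and is not IMP's rigid body API: it recomputes all member coordinates from those 6 parameters, encoding the rotation as a rotation vector (axis times angle) applied with Rodrigues' formula. The names `rotate` and `rigid_body_coordinates` are hypothetical.

```python
import math

def rotate(v, axis_angle):
    """Rotate vector v by a rotation vector (unit axis * angle) via Rodrigues' formula."""
    rx, ry, rz = axis_angle
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return tuple(v)  # no rotation
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    vx, vy, vz = v
    c, s = math.cos(theta), math.sin(theta)
    # Cross product k x v.
    cx = ky * vz - kz * vy
    cy = kz * vx - kx * vz
    cz = kx * vy - ky * vx
    # Dot product k . v.
    d = kx * vx + ky * vy + kz * vz
    return (
        vx * c + cx * s + kx * d * (1.0 - c),
        vy * c + cy * s + ky * d * (1.0 - c),
        vz * c + cz * s + kz * d * (1.0 - c),
    )

def rigid_body_coordinates(local_coords, params):
    """Map 6 rigid-body parameters to the global coordinates of every member.

    params = (tx, ty, tz, rx, ry, rz): a translation plus a rotation vector.
    """
    t = params[:3]
    r = params[3:]
    return [tuple(p + ti for p, ti in zip(rotate(c, r), t)) for c in local_coords]
```

An optimizer would then search over the 6 entries of `params` rather than over every coordinate of every member.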
Scoring function manipulations
- remove "detail" terms from the scoring function. Examples include excluded volume interactions and force fields.
  - this smooths the scoring function
  - it can also bring minima closer to the current conformation (by increasing the set of conformations that score at the minimum)
- remove long-range terms to allow the optimizer to find a conformation in which the short-range ones are satisfied before (re)adding the long-range ones
  - trying to satisfy both short-range and long-range terms simultaneously can be too much
  - satisfying the long-range ones first can lead to a system that is too packed to sort out the short-range terms
  - perhaps best to do this "after" finding conformations that satisfy the long-range ones (just a guess)
- cap terms, for example by replacing Harmonic with TruncatedHarmonic terms
  - improves the numerical and algorithmic stability of the optimizers
  - it is probably better to remove the terms entirely and add them back systematically
- scale parameters of the system. For example, it is useful to scale the radii of particles down when there are many steric clashes. This
  - removes many terms from the scoring function (since, when using HarmonicLowerBound-style terms, pairs of particles that no longer overlap contribute nothing)
  - reduces the magnitude of the remaining terms, which removes numerical instabilities and keeps terms in balance
- remove high-resolution terms from the scoring function. For example, remove all terms that refer to atoms and just deal with the residues. This is best accompanied by coarsening the representation.
  - this smooths the scoring function
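Capping and radius scaling can both be seen on a toy excluded-volume score. In this Python sketch, which is not IMP's implementation, each pair of spheres contributes a harmonic lower-bound penalty when the spheres overlap. `radius_scale` shrinks the radii, turning overlapping pairs into non-overlapping ones whose terms vanish, and `cap` bounds each term with a simple `min()`. The hard cap is an assumption chosen for brevity and is not necessarily how IMP's TruncatedHarmonic terms are defined; all function names are hypothetical.

```python
import itertools
import math

def harmonic_lower_bound(d, d0, k):
    # Zero when d >= d0; 0.5 * k * (d - d0)**2 when the spheres overlap.
    return 0.5 * k * (d - d0) ** 2 if d < d0 else 0.0

def truncated_harmonic_lower_bound(d, d0, k, cap):
    # Same penalty, but never larger than `cap` (hypothetical hard cap).
    return min(harmonic_lower_bound(d, d0, k), cap)

def excluded_volume(centers, radii, k=1.0, radius_scale=1.0, cap=None):
    """Return (total score, number of nonzero terms) over all sphere pairs."""
    total, active = 0.0, 0
    for (ci, ri), (cj, rj) in itertools.combinations(zip(centers, radii), 2):
        d = math.dist(ci, cj)
        d0 = radius_scale * (ri + rj)  # contact distance after scaling radii
        if cap is None:
            term = harmonic_lower_bound(d, d0, k)
        else:
            term = truncated_harmonic_lower_bound(d, d0, k, cap)
        if term > 0.0:
            total += term
            active += 1
    return total, active
```

For three unit spheres spaced 1 apart on a line, the unscaled score has two active terms, while `radius_scale=0.5` leaves none.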
Adding noise to the system
Adding noise to the system is a good way to expand the set of conformations searched and to escape from local minima. Ways to add noise include:
- randomized starting conformations
- Monte Carlo steps
- high-temperature MD simulation
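Monte Carlo steps add noise in a controlled way. A minimal Metropolis sketch in Python follows; it is a generic illustration rather than IMP's Monte Carlo optimizer. The move set (perturbing one randomly chosen coordinate by at most `step_size`) and the `temperature` are the kind of parameters that need tuning; all names are hypothetical.

```python
import math
import random

def metropolis_accept(delta, temperature, rng):
    # Always accept downhill moves; accept uphill ones with probability exp(-delta / T).
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / temperature)

def monte_carlo(score, x, steps, step_size, temperature, seed=0):
    """Minimal Metropolis Monte Carlo over a list of coordinates.

    Returns the best coordinates seen and their score.
    """
    rng = random.Random(seed)
    current = score(x)
    best_x, best = list(x), current
    for _ in range(steps):
        # Move set: perturb a single randomly chosen coordinate.
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        trial_score = score(trial)
        if metropolis_accept(trial_score - current, temperature, rng):
            x, current = trial, trial_score
            if current < best:
                best_x, best = list(x), current
    return best_x, best
```

Higher temperatures accept more uphill moves and so explore more widely; lower temperatures make the search behave more like a greedy minimizer.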
Optimization support in IMP
Currently supported methods
- Monte Carlo
  - good for adding noise to the system
  - many parameters to play with: the move set, the temperature
- Conjugate Gradients
  - needs parameters with comparable ranges (e.g., angles have to be rescaled to the same range as x, y, z)
  - gets stuck when large parts of the system cannot be improved: CG steps change all the parameters along a common direction. If many of the parameters cannot be significantly improved, yet still have nonzero derivatives, CG is unlikely to find a direction in which it can take a reasonably sized step.
  - when the scoring function is smooth, it can take large steps
  - not numerically stable when some terms are extremely large
- molecular dynamics/Brownian dynamics
  - mostly local optimization
  - moves particles independently, so it can still make progress even if many particles are stuck
  - not numerically stable when there are very large forces (BD takes variable-sized steps, so it is better in this regard)
- Domino
  - needs a sampling of subspaces of the conformation space and then searches the product set efficiently
  - Keren should write something for this
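The independence of particle moves in Brownian dynamics can be seen in a single update step. This Python sketch uses the standard Euler-Maruyama update for overdamped Langevin dynamics, x += (D dt / kT) F + sqrt(2 D dt) * N(0, 1), where F is the force (minus the gradient of the score), D a diffusion coefficient and kT the thermal energy. It is not IMP's BrownianDynamics implementation, and the `max_step` clamp is a hypothetical safeguard against huge forces, not IMP's variable step scheme; all names are illustrative.

```python
import math
import random

def brownian_step(positions, forces, diffusion, dt, kT, rng, max_step=None):
    """One Euler-Maruyama step of overdamped (Brownian) dynamics.

    Each particle moves according to its own force plus Gaussian noise.
    """
    sigma = math.sqrt(2.0 * diffusion * dt)  # noise amplitude per coordinate
    mobility = diffusion * dt / kT           # converts force into displacement
    new_positions = []
    for p, f in zip(positions, forces):
        step = [mobility * fi + sigma * rng.gauss(0.0, 1.0) for fi in f]
        if max_step is not None:
            # Hypothetical safeguard: shrink displacements longer than max_step.
            length = math.sqrt(sum(s * s for s in step))
            if length > max_step:
                step = [s * max_step / length for s in step]
        new_positions.append(tuple(pi + si for pi, si in zip(p, step)))
    return new_positions
```

Because the update of each particle depends only on its own force, a particle stuck against its neighbours does not stop the others from moving.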
Future methods
- Quasi-Newton
  - Frido likes it. We have an implementation based on GSL.
- Langevin dynamics
- normal modes
- motion planning
- iterative Domino