Introduction

IMP is a library for solving a wide variety of molecular structure and dynamics problems using many different data sources. As a result, it provides a great deal of flexibility. To make good decisions about how to use IMP on a particular problem, it helps to understand the overall structure of IMP.

  1. Theory
  2. Concepts
  3. Examples
  4. Modules
  5. C++ vs Python
  6. Conventions
  7. Incremental scoring
  8. Graphs
  9. Making IMP run faster
  10. Reporting bugs
  11. Where to go next

1. Theory

Structure and dynamics modeling in IMP proceeds in a five-stage iterative process:

[Figure: the five-stage iterative modeling process]

IMP provides a large amount of functionality to facilitate this process. Links to representative classes are given for future reference.

  1. Data acquisition: we can't be much help here as we are computer people and don't know what to do with test tubes.
  2. Representation selection: representation in IMP is via a collection of entities called particles (IMP::Particle objects). Each particle can contain one or more sets of data, and IMP can enforce relationships between particles. New types of representation can easily be added via a decorator mechanism, which is explained further below and on the IMP::Decorator page.

    Representations can be loaded from a number of standard file types; for example, see IMP::atom::read_pdb() and IMP::atom::read_mol2().
  3. Encoding the data as a scoring function: Proposed models are scored based on how well they match the data, with a low score meaning a closer fit than a high score. In IMP the scoring function is a sum of terms, each of which is computed by an IMP::Restraint object. Some terms are tied to a particular kind of data; others can be formed by using IMP::SingletonScore, IMP::PairScore, IMP::TripletScore and IMP::QuadScore objects in conjunction with general purpose restraint creators. The latter allow a large number of parts to be scored in similar ways more efficiently than creating many separate restraints.
  4. Sampling good conformations: Once the scoring function has been designed, you need to search for conformations of the model that have low scores (and therefore fit the data well). Sampling produces a set of conformations of the model, organized into an IMP::ConfigurationSet. Currently IMP provides two sampling protocols: IMP::core::MCCGSampler, which uses a combination of IMP::core::MonteCarlo and IMP::core::ConjugateGradients with randomized starting conformations, and IMP::domino::DominoOptimizer, which uses a graph-based inference algorithm. Sampling is an iterative process that tends to be structured as follows:
    [Figure: the iterative sampling process]
  5. Analysis of good conformations: Finally, one needs to analyze the set of conformations produced by sampling. IMP provides a variety of tools to help display the conformations, in IMP::display, and to cluster them, in IMP::statistics. Display capabilities include
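To make stage 3 concrete, here is a minimal sketch in plain Python of a scoring function built as a sum of terms, where a lower total means a closer fit. It does not use IMP; DistanceTerm and total_score are hypothetical names invented for this illustration.

```python
import math

# Hypothetical stand-ins for illustration; these are not IMP classes.
class DistanceTerm:
    """Harmonic penalty on the distance between two points."""

    def __init__(self, i, j, target, k=1.0):
        self.i, self.j, self.target, self.k = i, j, target, k

    def evaluate(self, coords):
        d = math.dist(coords[self.i], coords[self.j])
        return 0.5 * self.k * (d - self.target) ** 2


def total_score(terms, coords):
    """The scoring function is the sum of its terms; lower is a closer fit."""
    return sum(t.evaluate(coords) for t in terms)


terms = [DistanceTerm(0, 1, target=1.0), DistanceTerm(1, 2, target=1.0)]
good = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bad = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

print(total_score(terms, good))  # both distances match their targets
print(total_score(terms, bad))   # both distances are off, so the score is higher
```

Sampling (stage 4) then amounts to searching for coordinates that make this total small.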

Knowledge about the system being modeled enters the process at all stages, but a few need extra note:

Coming up with the right choices for representation, scoring and sampling for a given system typically takes a few iterations and trial and error. IMP provides tools to help monitor how things are performing.

2. Concepts

As has already been hinted at, IMP is organized around a number of core concepts. Representation is handled via a collection of IMP::Particle objects. Each has a set of arbitrary attributes (such as an x coordinate or a mass). To make particles friendlier to work with, we provide IMP::Decorator classes which, guess what, decorate an existing particle to provide a higher level interface for manipulating its attributes. See the IMP::Decorator page for more details.
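The decorator idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not the IMP API: Particle and XYZDecorator here are hypothetical toy classes, and the attribute keys "x", "y" and "z" are an assumption.

```python
# Hypothetical stand-ins for illustration; not the IMP API.
class Particle:
    """A bag of named attributes."""

    def __init__(self):
        self.attributes = {}


class XYZDecorator:
    """Wraps an existing particle and exposes its coordinate attributes."""

    KEYS = ("x", "y", "z")

    def __init__(self, particle):
        self.particle = particle

    @classmethod
    def setup_particle(cls, particle, coords):
        # Add the attributes the decorator needs, then wrap the particle.
        for key, value in zip(cls.KEYS, coords):
            particle.attributes[key] = float(value)
        return cls(particle)

    def get_coordinates(self):
        return tuple(self.particle.attributes[k] for k in self.KEYS)

    def set_coordinates(self, coords):
        for key, value in zip(self.KEYS, coords):
            self.particle.attributes[key] = float(value)


p = Particle()
d = XYZDecorator.setup_particle(p, (1, 2, 3))
d.set_coordinates((4, 5, 6))
print(p.attributes)
```

The decorator holds no state of its own: all data lives in the particle, so a second decorator wrapping the same particle sees the same coordinates.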

IMP provides containers to aid in managing sets of particles. These inherit from IMP::Container (notice that IMP::Particle objects are containers and can contain lists of particles). A container could be as simple as an IMP::container::ListSingletonContainer, which simply stores a list of particles. However, it could also be more involved, such as the IMP::container::ClosePairContainer, which keeps track of all pairs of particles that are close to one another in space. It can be used to implement non-bonded operations, for example.

Scoring is handled by a collection of IMP::Restraint objects. Each of these keeps a list of particles and scores those particles based on how well they fit some sort of data. Some restraints are designed to see if a set of particles fits a particular experimental measurement (eg IMP::em::FitRestraint). Other restraints are more general, allowing a particular sort of score function to be applied to an arbitrary container. For example, one can use an IMP::container::PairsRestraint, coupled with an IMP::container::ClosePairContainer and an IMP::core::SoftSpherePairScore, to make sure that a collection of balls don't overlap.
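The way a general restraint combines a container with a score can be sketched in plain Python. The three functions below play the roles of the close pair container, the pair score and the pairs restraint; they are hypothetical stand-ins written for this illustration and make no claim about how IMP implements these classes.

```python
import itertools
import math

# Hypothetical stand-ins for illustration; not the IMP API.
# A sphere is (x, y, z, radius).

def surface_distance(a, b):
    """Distance between sphere surfaces; negative when the spheres overlap."""
    return math.dist(a[:3], b[:3]) - a[3] - b[3]


def close_pairs(spheres, distance):
    """Container role: all index pairs whose surfaces are within `distance`."""
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(spheres)), 2)
        if surface_distance(spheres[i], spheres[j]) <= distance
    ]


def soft_sphere_score(a, b, k=1.0):
    """Pair score role: harmonic penalty on overlap, zero otherwise."""
    d = surface_distance(a, b)
    return 0.5 * k * d * d if d < 0 else 0.0


def pairs_restraint(pair_score, pairs, spheres):
    """Restraint role: apply one pair score to every pair in the container."""
    return sum(pair_score(spheres[i], spheres[j]) for i, j in pairs)


spheres = [(0, 0, 0, 1), (1, 0, 0, 1), (10, 0, 0, 1)]
pairs = close_pairs(spheres, distance=0.0)
print(pairs, pairs_restraint(soft_sphere_score, pairs, spheres))
```

Only the first two spheres overlap, so only that pair reaches the score; the far-away sphere is never examined by the restraint.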

The IMP::Model ties together the representation and the scoring. In addition to storing all the particles and restraints, it allows one to enforce invariants between particles (eg it allows IMP to implement rigid bodies using IMP::core::RigidBody) and to specify maximum allowable scores for restraints and ranges for attributes. Invariants are enforced using IMP::Constraint objects stored in the IMP::Model. They maintain some hard invariant of the representation. Examples include keeping a rigid body rigid, or ensuring that the IMP::container::ClosePairContainer really contains all close pairs. Constraints are updated as part of IMP::Model::evaluate(). This means that a constraint is only guaranteed to hold during score evaluation. To ensure that all constraints hold, call IMP::Model::evaluate() before inspecting the particles.
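The consequence of updating constraints only during evaluation can be shown with a toy model in plain Python. ToyModel and midpoint_constraint are hypothetical names for this sketch and do not mirror the IMP classes.

```python
# Hypothetical stand-ins for illustration; not the IMP API.
class ToyModel:
    def __init__(self):
        self.values = {}        # attribute name -> float
        self.constraints = []   # callables that repair derived attributes
        self.restraints = []    # callables that return one score term

    def evaluate(self):
        # Constraints are brought up to date first, then the terms are summed.
        for update in self.constraints:
            update(self.values)
        return sum(term(self.values) for term in self.restraints)


def midpoint_constraint(values):
    """Invariant: 'mid' is always the average of 'a' and 'b'."""
    values["mid"] = 0.5 * (values["a"] + values["b"])


m = ToyModel()
m.values.update(a=0.0, b=2.0, mid=1.0)
m.constraints.append(midpoint_constraint)
m.restraints.append(lambda v: (v["mid"] - 3.0) ** 2)

m.values["b"] = 6.0            # 'mid' is now stale
stale = m.values["mid"]
score = m.evaluate()           # the constraint is re-applied here
fresh = m.values["mid"]
print(stale, fresh, score)
```

Reading the derived attribute between the change and the evaluation gives the stale value, which is why one should evaluate before inspecting the particles.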

Once the representation and scoring are set up, one needs to find good conformations of the model. This is done via IMP::Optimizer and IMP::Sampler-derived classes. The former take the current state of the model and try to change the optimized attributes so that the score improves (an optimized attribute is a float attribute where IMP::Particle::get_is_optimized() returns true). The latter run more involved sampling algorithms and return an IMP::ConfigurationSet, which allows one to inspect the found conformations. The process of optimization or sampling can be observed and influenced via IMP::OptimizerState objects. The most generally useful of these write the optimization steps to files so that the process can be observed (eg IMP::atom::WritePDBOptimizerState). The set of attributes manipulated by the optimizers is controlled by setting the optimized flag using IMP::Particle::set_is_optimized() or a decorator method such as IMP::core::XYZR::set_coordinates_are_optimized(). Certain attributes which are computed as functions of other attributes should never be set as optimized. Examples include the x,y,z coordinates of members of a rigid body.
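The role of the optimized flag can be illustrated with a small plain Python sketch. The optimizer below is a greedy random search chosen for brevity; it is not one of IMP's optimizers, and the names score, optimize and optimized are hypothetical.

```python
import random

# Hypothetical stand-ins for illustration; not the IMP API.
def score(values):
    """Toy score: x and y should both be close to 1."""
    return (values["x"] - 1.0) ** 2 + (values["y"] - 1.0) ** 2


def optimize(values, optimized, steps=2000, step_size=0.1, seed=0):
    """Greedy random search that only perturbs attributes flagged as optimized."""
    rng = random.Random(seed)
    best = score(values)
    names = sorted(n for n, flag in optimized.items() if flag)
    for _ in range(steps):
        name = rng.choice(names)
        old = values[name]
        values[name] = old + rng.uniform(-step_size, step_size)
        new = score(values)
        if new <= best:
            best = new            # keep the improving move
        else:
            values[name] = old    # reject the move and restore the attribute
    return best


values = {"x": 0.0, "y": 0.0}
optimized = {"x": True, "y": False}   # y is held fixed
start = score(values)
final = optimize(values, optimized)
print(start, final, values)
```

Because y is not flagged as optimized it never moves, so the score cannot drop below the contribution of the y term no matter how long the search runs.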

Finally, the found conformations should be analyzed. These conformations would typically be stored in an IMP::ConfigurationSet. The conformations can be clustered via the IMP::statistics::create_lloyds_kmeans() function coupled with an IMP::statistics::ConfigurationSetXYZEmbedding. Alternatively, they can be exported as PDB files (IMP::atom::write_pdb()), Pymol files (IMP::display::PymolWriter) or Chimera files (IMP::display::ChimeraWriter).
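For readers unfamiliar with Lloyd's k-means, the clustering step can be sketched in plain Python as follows. This is a generic textbook version under simplifying assumptions (centers start at the first k points, each conformation is already embedded as a coordinate tuple); it is not the IMP implementation, and lloyds_kmeans is a hypothetical name.

```python
import math

def lloyds_kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; centers start at the first k points."""
    centers = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
        if new_assignment == assignment:
            break                 # assignments stopped changing
        assignment = new_assignment
    return centers, assignment


# Four toy "conformations", each embedded as a 2D point.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, assignment = lloyds_kmeans(points, k=2)
print(centers, assignment)
```

Each cluster center can then serve as a representative against which to compare the members of that cluster.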

3. Examples

The following examples give some idea of the basics of using IMP. They are all in Python, but the C++ code is nearly the same.

Each module has an examples page linked from its main page.

Creating some particles

The function creates a bunch of particles and uses the IMP::core::XYZR decorator to give them random coordinates and a radius of 1. Note that this is not a fully runnable snippet. Please see, eg, the coarse grained nup84 example for similar code that can be run.

import IMP
import IMP.atom
import IMP.container
import IMP.display
import IMP.statistics
import IMP.example
import IMP.system
import parameters

display_restraints=[]

def create_representation():
    print "creating representation"
    m= IMP.Model()
    all=IMP.atom.Hierarchy.setup_particle(IMP.Particle(m))
    all.set_name("the universe")
    def create_protein(name, ds):
        h=IMP.atom.create_protein(m, name, parameters.resolution, ds)
        leaves= IMP.atom.get_leaves(h)
        all.add_child(h)
        r=IMP.atom.create_connectivity_restraint([IMP.atom.Selection(c)\
                                                  for c in h.get_children()],
                                                 parameters.k)
        if r:
            m.add_restraint(r)
            display_restraints.append(r)
            m.set_maximum_score(r, parameters.k)
    def create_protein_from_pdbs(name, files):
        def create_from_pdb(file):
            sls=IMP.SetLogState(IMP.NONE)
            t=IMP.atom.read_pdb( IMP.get_example_path("data/"+file), m,
                                 IMP.atom.ATOMPDBSelector())
            del sls
            #IMP.atom.show_molecular_hierarchy(t)
            c=IMP.atom.Chain(IMP.atom.get_by_type(t, IMP.atom.CHAIN_TYPE)[0])
            if c.get_number_of_children()==0:
                IMP.atom.show_molecular_hierarchy(t)
            # there is no reason to use all atoms, just approximate the pdb shape instead
            s=IMP.atom.create_simplified_along_backbone(c,
                                                        parameters.resolution/2.0)
            IMP.atom.destroy(t)
            # make the simplified structure rigid
            rb=IMP.atom.create_rigid_body(s)
            rb.set_coordinates_are_optimized(True)
            return s
        if len(files) >1:
            p= IMP.Particle(m)
            h= IMP.atom.Hierarchy.setup_particle(p)
            h.set_name(name)
            for i, f in enumerate(files):
                c=create_from_pdb(f)
                h.add_child(c)
                c.set_name(name+" chain "+str(i))
            r=IMP.atom.create_connectivity_restraint([IMP.atom.Selection(c)\
                                                      for c in h.get_children()],
                                                     parameters.k)
            if r:
                m.add_restraint(r)
                display_restraints.append(r)
                m.set_maximum_score(r, parameters.k)
        else:
            h= create_from_pdb(files[0])
            h.set_name(name)
        all.add_child(h)
    create_protein("Nup85", 570)
    ct= IMP.atom.Selection(all, molecule="Nup85", terminus= IMP.atom.Selection.C)
    d= IMP.core.XYZ(ct.get_selected_particles()[0])
    d.set_coordinates(IMP.algebra.Vector3D(0,0,0))
    d.set_coordinates_are_optimized(False)
    create_protein("Nup84", 460)
    create_protein("Nup145C", 442)
    create_protein("Nup120", [0, 500, 761])
    create_protein("Nup133", [0, 450, 778, 1160])
    create_protein_from_pdbs("Seh1", ["seh1.pdb"])
    create_protein_from_pdbs("Sec13", ["sec13.pdb"])
    return (m, all)

def create_restraints(m, all):
    print "creating restraints"
    def add_connectivity_restraint(s):
        r= IMP.atom.create_connectivity_restraint(s, parameters.k)
        m.add_restraint(r)
        m.set_maximum_score(r, parameters.k)
        display_restraints.append(r)
    def add_distance_restraint(s0, s1):
        r=IMP.atom.create_distance_restraint(s0,s1, 0, parameters.k)
        m.add_restraint(r)
        m.set_maximum_score(r, parameters.k)
        display_restraints.append(r)
    evr=IMP.atom.create_excluded_volume_restraint([all])
    m.add_restraint(evr)
    S= IMP.atom.Selection
    s0=S(hierarchy=all, molecule="Nup145C", residue_indexes=[(0,423)])
    s1=S(hierarchy=all, molecule="Nup84", molecules=[])
    s2=S(hierarchy=all, molecule="Sec13")
    add_connectivity_restraint([s0,s1,s2])
    add_distance_restraint(S(hierarchy=all, molecule="Nup145C", residue_indexes=[(0,423)]),
                           S(hierarchy=all, molecule="Nup85"))
    add_distance_restraint(S(hierarchy=all, molecule="Nup145C", residue_indexes=[(0,423)]),
                           S(hierarchy=all, molecule="Nup120",
                             residue_indexes= [(500, 762)]))
    add_distance_restraint(S(hierarchy=all, molecule="Nup84"),
                           S(hierarchy=all, molecule="Nup133",
                             residue_indexes=[(778, 1160)]))
    add_distance_restraint(S(hierarchy=all, molecule="Nup85"),
                           S(hierarchy=all, molecule="Seh1"))
    add_distance_restraint(S(hierarchy=all, molecule="Nup145C",
                             residue_indexes=[(0,423)]),
                           S(hierarchy=all, molecule="Sec13"))
    for l in IMP.atom.get_leaves(all):
        r= IMP.example.ExampleRestraint(l, parameters.k)
        m.add_restraint(r)
        # make sure rigid bodies are as not all particles in them can be on the x,y plane
        m.set_maximum_score(.5*parameters.resolution**2*parameters.k)


def create_geometry(all):
    print "creating geometry"
    gs=[]
    for i in range(all.get_number_of_children()):
        color= IMP.display.get_display_color(i)
        n= all.get_child(i)
        name= n.get_name()
        for l in IMP.atom.get_leaves(n):
            g= IMP.core.XYZRGeometry(l)
            g.set_color(color)
            g.set_name(name)
            gs.append(g)
    # also display the restraints to see which particles they connect
    for r in display_restraints:
        try:
            g= IMP.display.create_restraint_geometry(r)
            gs.append(g)
        except Exception:
            # skip restraints for which no geometry could be created
            pass
    return gs

Adding some restraints

Once the particles are created, we have to add some restraints. To do this, you must choose which particles to restrain and then how to restrain them. Given that, you create a restraint, initializing it with the chosen particles, and then add it to the model.

import IMP.example
(m,c)=IMP.example.create_model_and_particles()

uf= IMP.core.Harmonic(0,1)
df= IMP.core.DistancePairScore(uf)
r= IMP.core.PairRestraint(df, IMP.ParticlePair(c.get_particle(0), c.get_particle(1)))
m.add_restraint(r)

Preventing collisions

The IMP::container::ClosePairContainer maintains a list of all pairs of particles that are closer than a certain distance. The IMP::core::HarmonicLowerBound forces the spheres apart.

import IMP.example

(m,c)=IMP.example.create_model_and_particles()

# this container lists all pairs that are close at the time of evaluation
nbl= IMP.container.ClosePairContainer(c, 0,2)
h= IMP.core.HarmonicLowerBound(0,1)
sd= IMP.core.SphereDistancePairScore(h)
# use the lower bound on the inter-sphere distance to push the spheres apart
nbr= IMP.container.PairsRestraint(sd, nbl)
m.add_restraint(nbr)

# alternatively, one could just do
r = IMP.core.ExcludedVolumeRestraint(c)
m.add_restraint(r)

# get the current score
print m.evaluate(False)

Restraining bonds

Load a protein and restrain all the bonds to have the correct length. Bond angles are a bit trickier at the moment.

import IMP.atom
import IMP.container
m= IMP.Model()
prot= IMP.atom.read_pdb(IMP.atom.get_example_path("example_protein.pdb"), m)
IMP.atom.add_bonds(prot)
bds= IMP.atom.get_internal_bonds(prot)
bl= IMP.container.ListSingletonContainer(bds)
h= IMP.core.Harmonic(0,1)
bs= IMP.atom.BondSingletonScore(h)
br= IMP.container.SingletonsRestraint(bs, bl)
m.add_restraint(br)
print m.evaluate(False)

Sampling and analysis

Once we have set up our restraints, we can run a sampler to compute some good conformations. Our basic sampler is the IMP::core::MCCGSampler which uses a combination of Monte Carlo and conjugate gradients to find conformations. It then returns an object which allows one to load the saved conformations for analysis.

import IMP.example
import IMP.statistics

(m,c)=IMP.example.create_model_and_particles()
ps= IMP.core.DistancePairScore(IMP.core.HarmonicLowerBound(1,1))
r= IMP.container.PairsRestraint(ps, IMP.container.ClosePairContainer(c, 2.0))
m.add_restraint(r)
# we don't want to see lots of log messages about restraint evaluation
m.set_log_level(IMP.WARNING)

# the container (c) stores a list of particles, which are also XYZR particles
# we can construct a list of all the decorated particles
xyzrs= c.get_particles()

s= IMP.core.MCCGSampler(m)
s.set_number_of_attempts(10)
# but we do want something to watch
s.set_log_level(IMP.TERSE)
s.set_number_of_monte_carlo_steps(10)
# find some configurations which move the particles far apart
configs= s.get_sample()
for i in range(0, configs.get_number_of_configurations()):
    configs.load_configuration(i)
    # print out the sphere containing the point set
    # - Why? - Why not?
    sphere= IMP.core.get_enclosing_sphere(xyzrs)
    print sphere

# cluster the solutions based on their coordinates
e= IMP.statistics.ConfigurationSetXYZEmbedding(configs, c)

# of course, this doesn't return anything of interest since the points are
# randomly distributed, but, again, why not?
clustering = IMP.statistics.create_lloyds_kmeans(e, 3, 1000)
for i in range(0,clustering.get_number_of_clusters()):
    # load the configuration for a central point
    configs.load_configuration(clustering.get_cluster_representative(i))
    sphere= IMP.core.get_enclosing_sphere(xyzrs)
    print sphere

Writing a simple restraint

See IMP::example::ExampleRestraint.

4. Modules

Functionality in IMP is grouped into modules, each with its own namespace (in C++) or package (in Python). For example, the functionality for IMP::core can be accessed as

IMP::core::XYZ(p)

in C++ and

IMP.core.XYZ(p)

in Python.

A module contains classes, methods and data which are related and controlled by a set of authors. The names of the authors, the license for the module, its version and an overview of the module can be found on the module main page (eg IMP::example). See the "Modules" tab above for a complete list of modules in this version of IMP.

Modules are either grouped based on types of experimental data (eg IMP::em) or based on shared functionality (IMP::core or IMP::container).

5. C++ vs Python

IMP can be used from both C++ and Python. We recommend that you:

If you are new to programming, you should check out a general Python introduction such as the official introduction to Python and Python 101. Users who have programmed before but are not familiar with Python should take a look at Dive into Python, especially chapters 1-6 and 15-18.

While effort has been made to ensure that the interfaces are the same between the two languages, a number of differences remain due to differences in the languages and limitations of the program used to generate the connection between the two languages. Key differences are

6. Conventions

To ensure consistency and ease of use, certain conventions should be adhered to when writing code using or in IMP.

Measurements

Unless there is a good reason, the following units are to be used

Anything that breaks from these conventions must be labeled clearly and accompanied by an explanation of why the normal units could not be used.

Biological names

When describing biological entities, natural biological names should be used as much as possible. That means residues should be referred to by their index in the protein (that is, the residue index in the PDB file), rather than by their offset from the beginning of the loaded set of residues.
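As a small illustration of the difference, consider a loaded fragment whose first residue is numbered 100 in the protein. The data and the get_residue helper below are hypothetical and do not use IMP.

```python
# Hypothetical data for illustration: a fragment whose first loaded residue
# has residue index 100 in the protein.
loaded = [(100, "ALA"), (101, "GLY"), (102, "SER")]

by_index = {index: name for index, name in loaded}


def get_residue(index):
    """Look a residue up by its index in the protein, not by list position."""
    return by_index[index]


print(get_residue(101))   # addressed by its biological index
print(loaded[1][1])       # same residue, addressed by a fragile offset
```

The offset 1 only means "residue 101" for this particular fragment; load a different range and it silently refers to something else, whereas the index keeps its meaning.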

Passing and storing data

Values and Objects

As is conventional in C++, IMP classes are divided into two types

Python does not have this distinction.

A few classes in IMP are designed for fast, low level use. Their default constructor leaves them in an unspecified state. This is similar to the built in types in C++ (int, double). For example

      IMP::algebra::VectorD<3> v; // the vector has unknown coordinates
      std::cout << v << std::endl; // illegal
      v= IMP::algebra::VectorD<3>(0,1,2); // now we can use v

Unless the documentation says otherwise, all value class objects in IMP can be compared with other equivalent objects based on their contents. Object class objects allow checking of equality to see if they are the same object (not whether two have the same state). In C++, this is done by comparing the pointers.
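The two kinds of comparison can be mimicked in plain Python, where a dataclass compares by contents and an ordinary class compares by identity unless told otherwise. Vector3 and Restraint below are hypothetical toy classes, not the IMP ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector3:           # value-like: compared by contents
    x: float
    y: float
    z: float


class Restraint:         # object-like: compared by identity
    def __init__(self, name):
        self.name = name


a, b = Vector3(0, 1, 2), Vector3(0, 1, 2)
r1, r2 = Restraint("r"), Restraint("r")
print(a == b, r1 == r2, r1 == r1)
```

Two vectors with the same coordinates are equal, while two restraints with the same state are still different restraints.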

Standard Methods

All objects should have a const method show(std::ostream&), which writes some basic information about the object to the supplied stream. On the C++ side, all objects also support standard output to a stream via <<, and in Python all objects support __str__ so that they can be printed and displayed.

Names in IMP

RAII

RAII-style objects are a convenient way of controlling a resource. They assume "ownership" of the resource on creation and then "free" it on destruction. Examples include using a reference counted pointer to make sure an object is destroyed when it is no longer needed:

    {
      Pointer<PymolWriter> pw= new PymolWriter("afile.pym");
      // write to pw
    } // pw is deleted here

Or temporarily removing a restraint from the model

    {
      ScopedRemoveRestraint srr(new MyRestraint(), m->get_root_restraint_set());
      // optimize the "relaxed" model without the restraint
    } // restraint is automatically added back

RAII objects also help with exception safety since they guarantee that the cleanup code runs even when an exception occurs. Compare

    void transform_map(std::string in) {
      DensityMap *map= read_map(in);
      // transform the map
      write_map(map, "/unwriteable/directory/map.mrc");
      delete map;
    }

When write_map() throws an exception due to being unable to open the file for writing, the (large) block of memory allocated in map is lost. Instead, one should do:

    void transform_map(std::string in) {
      Pointer<DensityMap> map= read_map(in);
      // transform the map
      write_map(map, "/unwriteable/directory/map.mrc");
    }

So that map is always destroyed.

As pretty much any operation can throw an exception at any time, one can never count on general cleanup code to execute.
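Python has no destructors with C++ timing guarantees, but a context manager gives a comparable scoped guarantee. The sketch below mirrors the "temporarily remove a restraint" case in plain Python; scoped_remove is a hypothetical helper and a list of strings stands in for the restraints.

```python
from contextlib import contextmanager


# Hypothetical stand-in for illustration; not the IMP API.
@contextmanager
def scoped_remove(restraints, restraint):
    """Remove a restraint for the duration of a with-block, then add it back."""
    restraints.remove(restraint)
    try:
        yield
    finally:
        # Runs on normal exit and also when an exception propagates.
        restraints.append(restraint)


restraints = ["excluded_volume", "connectivity"]
try:
    with scoped_remove(restraints, "connectivity"):
        assert "connectivity" not in restraints
        raise RuntimeError("optimization failed")
except RuntimeError:
    pass

print(restraints)
```

Even though the block exits through an exception, the restraint is back in the list afterwards, which is the same guarantee the C++ scoped object provides.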

7. Incremental Scoring

Scoring in IMP can be performed in two different ways,

Whole model scoring is faster when more than approximately half of the particles change each time Model::evaluate() is called. Either one will produce the correct (and same) answer in all instances.

To set up incremental scoring call IMP::Model::set_is_incremental() with the value true. See Scoring for implementation information.
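The trade-off between the two approaches can be sketched in plain Python: cache every term and, when a particle moves, recompute only the terms that depend on it. IncrementalScorer, whole_model_score and spring are hypothetical names, and this says nothing about how IMP implements incremental scoring internally.

```python
# Hypothetical stand-ins for illustration; not the IMP API.
class IncrementalScorer:
    """Caches each term and re-evaluates only terms touching a moved particle."""

    def __init__(self, terms, coords):
        # terms: list of (particle indexes, function of the coordinate list)
        self.terms = terms
        self.coords = list(coords)
        self.cache = [f(self.coords) for _, f in terms]
        self.evaluations = 0

    def move(self, index, value):
        self.coords[index] = value
        for n, (indexes, f) in enumerate(self.terms):
            if index in indexes:
                self.cache[n] = f(self.coords)
                self.evaluations += 1

    def score(self):
        return sum(self.cache)


def whole_model_score(terms, coords):
    """Re-evaluate every term from scratch."""
    return sum(f(coords) for _, f in terms)


def spring(i, j):
    """A term on particles i and j that prefers them one unit apart."""
    return ((i, j), lambda c: (abs(c[i] - c[j]) - 1.0) ** 2)


terms = [spring(0, 1), spring(1, 2), spring(2, 3)]
scorer = IncrementalScorer(terms, [0.0, 1.0, 2.0, 3.0])
scorer.move(3, 5.0)   # only the last term depends on particle 3
print(scorer.score(), whole_model_score(terms, scorer.coords), scorer.evaluations)
```

Both routes give the same total, but the incremental one re-evaluated a single term. When most particles move at once, the bookkeeping costs more than it saves and whole model scoring wins.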

8. Graphs

Graphs in IMP are represented using the Boost Graph Library. All graphs used in IMP are VertexAndEdgeListGraphs, have vertex_name properties, and are BidirectionalGraphs if they are directed.

The Boost.Graph interface cannot be easily exported to Python so we instead provide a simple wrapper IMP::PythonDirectedGraph.

9. Making IMP run faster

Sampling can often be very computationally expensive. If your computation is taking longer than you would like, the first thing you should do is profile it. We find Shark and Instruments, which are part of the Macintosh developer tools, to be the best of the free Mac/Linux options. gprof is a free alternative on Linux, but it requires a static build (and hence can't work with Python) and is not so friendly to use.

Once you know where your application is spending its time, we provide a number of facilities to speed up IMP computations. These include:

Parameter tuning

Certain classes, such as IMP::container::ClosePairContainer, have parameters which influence how fast they perform. For IMP::container::ClosePairContainer this parameter is the slack, and there is a helper function, IMP::container::get_slack_estimate(), which tries to figure out a good value for it.

Specializing for speed

The usual pattern in IMP is to plug various classes together via what are known as virtual function calls. Composing this way is very flexible, but is not necessarily very fast, as the C++ compiler is not able to take advantage of simplifications across function calls. To get around this, we provide some specialized classes which act as composites of other classes. For example:

C++ users can also compose classes via templates at compile time. This is done using IMP::core::TupleRestraint, IMP::core::TupleConstraint, IMP::container::ContainerRestraint and IMP::container::ContainerConstraint. When using these, make sure you provide the actual types of the scores, modifiers and containers used (not the base classes). For example do

m->add_restraint(IMP::core::create_restraint(new IMP::core::HarmonicDistancePairScore(3, 1),
                                            IMP::ParticlePair(p0, p1)));

which is equivalent to

m->add_restraint(new IMP::core::TupleRestraint<IMP::core::HarmonicDistancePairScore>(new IMP::core::HarmonicDistancePairScore(3, 1),
                                                                                     IMP::ParticlePair(p0, p1)));

which, in turn, is equivalent to, but faster than

m->add_restraint(new IMP::core::PairRestraint(new IMP::core::HarmonicDistancePairScore(3,1),
                                              IMP::ParticlePair(p0, p1)));

10. Reporting bugs

While we strive for perfection, we, lamentably, slip up from time to time. If you find a bug in IMP, please report it on the IMP bug tracker. This will ensure it does not get lost. The best way to report a bug is to provide a short script file that demonstrates the problem.

11. Where to go next

Instructions on how to build and install IMP can be found in the installation instructions.

There are a few areas of core functionality that have already been mentioned.

Then look through the examples which can be found linked from the page of each module.

There are a variety of useful base classes which are used to provide most functionality. They are:

There are a few blocks of functionality that cut across modules. They include

When programming with IMP, one of the more useful pages is the modules list.

For general help, you can use the imp-users mailing list.


Generated on Fri Feb 10 2012 23:36:19 for IMP by doxygen 1.7.5.1