Ambiguity in IMP

On this page we'll address the elusive notion of ambiguity.

What is ambiguity

Several distinct types of ambiguity can be distinguished, and they are usually mixed with one another during the modelling process. Three major cases are detailed below.

Ambiguous identities

Experimental data usually provide knowledge about protein types, whereas when building a model one is ultimately concerned with "instances" of proteins. For example, a complex may contain two proteins of type A. When building a model, one must ultimately distinguish the two instances, A1 and A2, of protein type A.

One part of the model building process consists in converting (or interpreting) "type-based" data coming from experimental sources into a 3D "instance-based" description.
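The conversion from type-based to instance-based data can be illustrated with a minimal sketch. It assumes a hypothetical copy-number table `copies` and a hypothetical helper `instance_pairs` (neither is part of IMP), and simply enumerates every instance-level pair that is compatible with a single type-level observation such as "A touches B":

```python
from itertools import product

def instance_pairs(type_a, type_b, copies):
    """Enumerate instance-level pairs compatible with one type-level contact.

    copies maps a protein type to its number of copies (an assumed input).
    Instances are named by appending a 1-based index to the type name.
    """
    a_instances = [f"{type_a}{i}" for i in range(1, copies[type_a] + 1)]
    b_instances = [f"{type_b}{i}" for i in range(1, copies[type_b] + 1)]
    pairs = set()
    for a, b in product(a_instances, b_instances):
        if a != b:  # an instance cannot be in contact with itself
            pairs.add(tuple(sorted((a, b))))  # (A1, A2) and (A2, A1) are the same pair
    return sorted(pairs)

copies = {"A": 2, "B": 1}
a_b_candidates = instance_pairs("A", "B", copies)
a_a_candidates = instance_pairs("A", "A", copies)
```

With two copies of A and one of B, the observation "A touches B" leaves two candidate instance pairs, and the model has to decide which of them (possibly both) is realised.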

Ambiguous connectivity mechanism

Certain experiments tell you that two proteins interact, but not how. For example, you could know that two proteins touch, but not which interface they touch through.

Ambiguous connectivity graph

Some experimental techniques (such as affinity purification or pull-outs) provide information on the interconnectivity of a set of proteins. In such cases, the only accessible information is that these proteins are somehow connected with one another; the exact arrangement of the proteins is unknown.

Handling ambiguity when modelling

Methods to handle ambiguities have been proposed in the following publications:

How is ambiguity handled in IMP?

IMP provides some tools to help handle ambiguities:

In all of these, certain edges can be explicitly disallowed by dropping them from the candidate set or setting their cost to infinity.
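As an illustration of disallowing edges, here is a minimal sketch of a minimum spanning tree (MST) computed with Kruskal's algorithm over a set of candidate edges. This is not IMP's API: the function `mst_cost`, its arguments, and the example costs are assumptions made for illustration. An edge is excluded either by listing it in `disallowed` or by giving it an infinite cost, and both routes behave identically.

```python
import math

def mst_cost(nodes, edge_costs, disallowed=()):
    """Kruskal's algorithm over candidate edges.

    edge_costs maps (u, v) tuples to a non-negative cost.
    Returns (total_cost, chosen_edges), or (inf, []) if the allowed
    edges cannot connect all nodes.
    """
    banned = {frozenset(e) for e in disallowed}
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    total, chosen = 0.0, []
    for (u, v), cost in sorted(edge_costs.items(), key=lambda kv: kv[1]):
        # Dropped edges and infinite-cost edges are both skipped.
        if frozenset((u, v)) in banned or math.isinf(cost):
            continue
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two separate components
            parent[ru] = rv
            total += cost
            chosen.append((u, v))
    if len(chosen) != len(nodes) - 1:
        return math.inf, []
    return total, chosen

nodes = ["A", "B", "C"]
costs = {("A", "B"): 0.0, ("B", "C"): 2.0, ("A", "C"): 1.0}
free_total, free_edges = mst_cost(nodes, costs)
restricted_total, restricted_edges = mst_cost(nodes, costs, disallowed=[("A", "C")])
```

In this example, forbidding the A-C edge forces the tree through the more expensive B-C edge, so the total cost rises from 1.0 to 2.0.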

Towards better handling of ambiguous data

Cases where we would like a better tree to be computed:

  1. we would like to enforce that a graph is connected
  2. we would like to be able to generate multiple equivalent graphs in the case of symmetry
  3. we would like to be able to limit stoichiometry in the graph?

The first is relatively easy. Instead of computing the MST, one simply grows a tree from each of the copies in a particular class by adding edges (if desired) in sorted order and takes the cheapest tree. The score will be 0 if and only if the relevant things are connected.
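One possible reading of this procedure is sketched below. It assumes that the tree must contain exactly one copy of every protein type, and that an edge costs 0 when its two endpoints are in contact. The names `grow_tree`, `cheapest_tree`, `node_type` and the example costs are hypothetical and not taken from IMP.

```python
def grow_tree(root, node_type, cost):
    """Greedily grow a tree from root until every type is represented once.

    node_type maps an instance name to its protein type.
    cost(u, v) returns the cost of the edge between two instances.
    """
    needed = set(node_type.values()) - {node_type[root]}
    in_tree, total, edges = [root], 0.0, []
    while needed:
        best = None
        # Cheapest edge from the current tree to an instance of a missing type.
        for u in in_tree:
            for v, t in node_type.items():
                if t in needed:
                    c = cost(u, v)
                    if best is None or c < best[0]:
                        best = (c, u, v)
        c, u, v = best
        total += c
        edges.append((u, v))
        in_tree.append(v)
        needed.discard(node_type[v])
    return total, edges

def cheapest_tree(root_type, node_type, cost):
    """Grow one tree per copy of root_type and keep the cheapest."""
    roots = [n for n, t in node_type.items() if t == root_type]
    return min((grow_tree(r, node_type, cost) for r in roots), key=lambda r: r[0])

node_type = {"A1": "A", "A2": "A", "B1": "B", "C1": "C"}
pair_cost = {
    frozenset(("A1", "B1")): 3, frozenset(("A2", "B1")): 0,
    frozenset(("A1", "C1")): 5, frozenset(("A2", "C1")): 4,
    frozenset(("B1", "C1")): 0,
}
best_total, best_edges = cheapest_tree(
    "A", node_type, lambda u, v: pair_cost[frozenset((u, v))]
)
```

Here the tree grown from A1 costs 3, while the tree grown from A2 costs 0 because A2-B1 and B1-C1 are both in contact, so the restraint is scored as satisfied.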

The second can be done if one of the types has the same number of copies as the number of trees. Then you grow a tree from each copy. The cost of an edge is given by the cost of the minimum matching between the copies of the endpoints in the tree with any of the copies of the appropriate type not in the tree.

The last is hard and seems to require heuristic search (e.g. A*).

Ref: Presentation on mass spec

IMP: Ambiguity (last edited 2011-03-10 00:09:51 by ArgyrisPolitis)