Domino 2
This page is a discussion of proposed revisions to domino. They are in 3 main areas.
Efficient evaluation of score for using normal optimizers
Dave wants to use MD or some other "normal" IMP optimizer to do local sampling within the sets in the junction tree.
problem: only some particles changed, but many evaluate calls are made with the same set of changing particles. Using normal evaluation would result in recomputing the scores of the unmoved bits many times. Incremental evaluate would help, but it would involve subtracting off the old score each time.
proposed solution: add a method get_dependencies(Particles) to Model which returns all the restraints that depend on a set of particles, and then set all the rest to weight 0 (restraints with weight 0 are not evaluated). The only real change needed is to make it easy to set all the other restraints to weight 0. One solution would be to add a per-restraint weight. We have to work out how that interacts with RestraintSets, but it wouldn't be bad.
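A minimal sketch of the idea in plain Python, not IMP code. ToyRestraint, ToyModel, restrict_to and harmonic are hypothetical stand-ins invented for illustration; only the name get_dependencies comes from the proposal above. The sketch assumes every restraint starts with weight 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Coords = Dict[str, float]  # particle name -> 1D coordinate (toy stand-in)


@dataclass
class ToyRestraint:
    """Hypothetical stand-in for a restraint: a score over named particles."""
    name: str
    particles: FrozenSet[str]
    score: Callable[[Coords], float]
    weight: float = 1.0


def harmonic(a, b, mean=1.0, k=0.5):
    """Toy harmonic score on the distance between particles a and b."""
    return lambda c: 0.5 * k * (abs(c[a] - c[b]) - mean) ** 2


class ToyModel:
    def __init__(self, restraints):
        self.restraints = list(restraints)
        self.calls = 0  # number of restraint scores actually computed

    def get_dependencies(self, moved):
        """Return the restraints that depend on any particle in `moved`."""
        moved = set(moved)
        return [r for r in self.restraints if r.particles & moved]

    def restrict_to(self, moved):
        """Give weight 0 to every restraint that does not depend on `moved`.

        Assumption: all weights were 1 before, so they are reset to 1 here.
        """
        active = {r.name for r in self.get_dependencies(moved)}
        for r in self.restraints:
            r.weight = 1.0 if r.name in active else 0.0

    def evaluate(self, coords):
        total = 0.0
        for r in self.restraints:
            if r.weight == 0.0:
                continue  # restraints with weight 0 are not evaluated
            self.calls += 1
            total += r.weight * r.score(coords)
        return total
```

With a chain of restraints p0-p1, p1-p2, p2-p3, calling restrict_to(["p0"]) leaves only the p0-p1 restraint active, so each later evaluate call computes one restraint score instead of three.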
The nonbonded computations could be sped up by
- either only patching the list when the set of moved particles is small
- or, more simply, replacing it with a static list (since things aren't allowed to move too far).
Neither is hard, but the second requires no work, so probably should be tried first.
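The static-list option can be sketched in plain Python as follows. This is an illustration under stated assumptions, not IMP's nonbonded list: build_static_pair_list and nonbonded_score are hypothetical names, the repulsion term is a toy, and the list is only valid while every particle stays within `slack` of its starting position.

```python
import itertools
import math


def build_static_pair_list(positions, cutoff, slack):
    """Build the close-pair list once, padded so it stays valid while every
    particle remains within `slack` of its starting position."""
    padded = cutoff + 2.0 * slack
    return [
        (i, j)
        for i, j in itertools.combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) < padded
    ]


def nonbonded_score(positions, pairs, cutoff):
    """Toy soft-sphere repulsion summed over the given pairs only."""
    total = 0.0
    for i, j in pairs:
        d = math.dist(positions[i], positions[j])
        if d < cutoff:
            total += (cutoff - d) ** 2
    return total
```

The padding of 2 * slack covers the worst case where two particles each move `slack` toward one another, so no pair that can come within the cutoff is missing from the list.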
Make Domino simpler for a naive user to figure out
- Make domino a sampler which produces a configurationset
- automatically build dependency graph and call the external program to generate the junction tree. Perhaps need API to exclude certain interactions (or just get in the habit of replacing nbl with static list)
- don't use separate name for particle in domino, just use pointer value and translate on I/O
- defining the state space for a particle
- sampler-sampling space abstraction: get number of states for particle p, put particle p in state i
- make the code that defines the set of states for a node optionally replaceable. That way Dave can replace it for doing MD-based sampling, and one can replace it when a permutation is called for, but one doesn't have to deal with it otherwise. Proposal: have the sampler store a mapping from Particle to state space; then multiple particles can share a state space, for example for permutation-based sampling. The two abstractions above would define two ABCs.
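One way to picture the two abstractions and the particle-to-state-space mapping is the plain Python sketch below. All names (ToyParticle, StateSpace, PointsInBox, SubsetStates, ProductSubsetStates, equivalence_sets) are hypothetical and chosen for illustration; particles are keyed by object identity, in the spirit of "just use pointer value".

```python
import random
from abc import ABC, abstractmethod


class ToyParticle:
    """Stand-in particle; hashed by identity, like a pointer value."""
    def __init__(self, name):
        self.name = name
        self.xyz = (0.0, 0.0, 0.0)


class StateSpace(ABC):
    """First abstraction: the states available to a single particle."""
    @abstractmethod
    def get_number_of_states(self, particle): ...

    @abstractmethod
    def update_to_state(self, particle, i): ...


class PointsInBox(StateSpace):
    """n random points inside an axis-aligned box."""
    def __init__(self, n, lower, upper, seed=0):
        rng = random.Random(seed)
        self.points = [
            tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))
            for _ in range(n)
        ]

    def get_number_of_states(self, particle):
        return len(self.points)

    def update_to_state(self, particle, i):
        particle.xyz = self.points[i]


class SubsetStates(ABC):
    """Second abstraction: the states available to a set of particles."""
    def __init__(self, table):
        self.table = table  # mapping: particle -> StateSpace

    @abstractmethod
    def get_number_of_states(self, subset): ...

    @abstractmethod
    def get_state(self, subset, i): ...


class ProductSubsetStates(SubsetStates):
    """Default behaviour: the full product of the per-particle states."""
    def get_number_of_states(self, subset):
        n = 1
        for p in subset:
            n *= self.table[p].get_number_of_states(p)
        return n

    def get_state(self, subset, i):
        state = []
        for p in subset:  # decode i as a mixed-radix number
            n = self.table[p].get_number_of_states(p)
            state.append(i % n)
            i //= n
        return state


def equivalence_sets(table):
    """Group particles that share the same state-space object."""
    groups = {}
    for p, space in table.items():
        groups.setdefault(id(space), []).append(p)
    return list(groups.values())
```

A permutation-based SubsetStates could use equivalence_sets to find the particles that share a state space and return only permutations of that space's states for them.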
Simplified example. Does this detail all the customization points we would desire?
import IMP
import IMP.algebra
import IMP.core
import IMP.domino2
NUM_PARTICLES=6
NUM_STATES=40
MAX_SCORE=1
#### Example main code ####
mdl=IMP.Model()
#1. set up the particles
print "setting up particles"
ps=IMP.Particles()
for i in range(NUM_PARTICLES):
    p=IMP.Particle(mdl)
    IMP.core.XYZR.setup_particle(p,IMP.algebra.Sphere3D(IMP.algebra.Vector3D(0.,0.,0.),1.))
    ps.append(p)
#3. add restraints (defining the scoring function)
print "setting up restraints"
sf = IMP.core.Harmonic(1.0, 0.5)
for pair in [[0,1],[0,2],[1,2],[2,3],[3,4],[4,5],[3,5]]:
    r=IMP.core.DistanceRestraint(sf, ps[pair[0]], ps[pair[1]])
    mdl.add_restraint(r)
#5. optimize
print "optimizing"
d_sampler = IMP.domino2.DominoSampler(mdl)
#2. set up the discrete set of states
print "setting up a discrete set of states"
bb=IMP.algebra.BoundingBox3D(IMP.algebra.Vector3D(-10.,-10.,-10.),
                             IMP.algebra.Vector3D(10.,10.,10.))
# could provide keys and upper and lower corner instead when creating InBoundingBox
# could replace the object the DominoSampler uses to generate the table for a node to,
# for example, allow multiple particles to be assigned to the same state
# also could replace the evaluator to do something other than call model evaluate
d_sampler.set_sampling(ps, IMP.domino2.InBoundingBox(ps, NUM_STATES, bb))
d_sampler.set_maximum_score(MAX_SCORE)
sol_set=d_sampler.get_sample()
#6. get results
for i in range(0,sol_set.get_number_of_solutions()):
    sol_set.set_configuration(i)
    print "solution with score:", mdl.evaluate(False)
Proposed domino2 module public API (skipping implementations of the various ABCs). Does this API provide enough flexibility to support all usages of domino that are desired?
/** Handle the states for a particular particle (or "class" of
    particles). For example, a state enumerator class could take
    a bounding box and a number, n, and generate n points in the
    bounding box. Then get_number_of_states() would return
    n and update_to_state() would modify the particle to have the
    coordinates for state i.
*/
class StateEnumerator: public Object {
public:
  virtual unsigned int get_number_of_states(Particle*) const=0;
  virtual void update_to_state(Particle *, unsigned int) const=0;
};
/** The set of particles defining a node in the junction tree. A
SingletonContainer is used so that the pointer value uniquely
identifies a node (and the container has a name so that
it can be nicely written for display).
*/
typedef SingletonContainer Subset;
/** Store the association between particles and the classes
    which manage their states. I'm not a huge fan of having
    this class, but I haven't thought of a better way to store
    the information that is easily exposed to Python
    and gets to all the right places. It is initialized internally
    in the DominoSampler.
*/
class StateEnumeratorTable: public Object {
  std::map<Particle*, internal::OwnerPointer<StateEnumerator> > enumerators_;
public:
  // implementation methods use this to get the enumerator
  StateEnumerator* get_enumerator(Particle *) const;
};
/** Enumerate the states of a particular subset. Straightforward examples
    would just return the product of the number of states returned by
    the StateEnumerator for each of the particles, while a permutation-based
    one would have methods to define equivalency sets and only return
    permutations of the states of these sets.
    The default one might look at the StateEnumerators and treat particles
    with the same state enumerator as an equivalency set and only return
    permutations for them. Not sure.
*/
class SubsetStateEnumerator: public Object {
  Pointer<StateEnumeratorTable> table_;
public:
  void set_enumerator_table(StateEnumeratorTable *table){ table_=table;}
  virtual unsigned int get_number_of_states(Subset*) const=0;
  virtual Ints get_state(Subset*, unsigned int i) const=0;
};
/** Return the score for a state defined by the subset of particles in
    the given enumerated states. set_enumerator_table() is called when
    the evaluator is handed to Domino. set_current_subset() is called
    at the start of the evaluation of a subset.
*/
class Evaluator: public Object {
  Pointer<StateEnumeratorTable> table_;
  Pointer<Subset> subset_;
public:
  Subset *get_current_subset() const;
  StateEnumeratorTable* get_enumerator_table() const;
  void set_enumerator_table(StateEnumeratorTable*);
  virtual void set_current_subset(Subset*);
  virtual double get_score(const Ints& state) const=0;
};
//! A simple sampler.
class IMPCOREEXPORT DominoSampler : public Sampler
{
  internal::OwnerPointer<StateEnumeratorTable> enumerators_;
  internal::OwnerPointer<SubsetStateEnumerator> node_enumerator_;
public:
  DominoSampler(Model *m);
  // use these functions to set up the state space for the particles
  void set_state_enumerator(Particle *p, StateEnumerator *se);
  /** Advanced. Default values are provided, you only need to replace these
      if you want to do something special.
  */
  void set_evaluator(Evaluator *eval);
  void set_subset_state_enumerator(SubsetStateEnumerator *cse);
  IMP_SAMPLER(DominoSampler);
};
The DominoSampler::do_sample method would look something like this internally:
- extract dependency graph from the model
- write out the graph to a file
- run the junction tree Java app on the file
- read in results to define the Subsets
- for each subset
  - call SubsetStateEnumerator::get_number_of_states()
  - for each of these states
    - get the state
    - call Evaluator::get_score() on the state
- propagate up tree making sure sets of states overlap
- build final solutions into ConfigurationSet using StateEnumeratorTable
- return ConfigurationSet
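The enumerate, score and merge steps above can be sketched in plain Python. This is a toy under stated assumptions, not the domino implementation: the junction tree subsets are supplied by hand instead of coming from the external program, states are small integers, every restraint score is assumed non-negative (so a subset whose partial score already exceeds the maximum can be dropped safely), and all function names are hypothetical.

```python
import itertools


def unit_distance(x, y):
    """Toy non-negative pair score: zero when the two states differ by 1."""
    return (abs(x - y) - 1) ** 2


def subset_states(subset, n_states, restraints, max_score):
    """Enumerate assignments for one subset; keep those within max_score."""
    local = [(pair, f) for pair, f in restraints if set(pair) <= set(subset)]
    kept = []
    for values in itertools.product(range(n_states), repeat=len(subset)):
        assignment = dict(zip(subset, values))
        score = sum(f(assignment[a], assignment[b]) for (a, b), f in local)
        if score <= max_score:
            kept.append(assignment)
    return kept


def merge(left, right):
    """Join two lists of assignments that agree on their shared particles."""
    merged = []
    for l in left:
        for r in right:
            if all(l[p] == r[p] for p in l.keys() & r.keys()):
                merged.append({**l, **r})
    return merged


def toy_domino_sample(subsets, n_states, restraints, max_score):
    """Return (score, assignment) pairs within max_score, best first."""
    solutions = None
    for subset in subsets:
        states = subset_states(subset, n_states, restraints, max_score)
        solutions = states if solutions is None else merge(solutions, states)
    result = []
    for s in solutions:
        # rescore with every restraint once, so nothing is double counted
        total = sum(f(s[a], s[b]) for (a, b), f in restraints)
        if total <= max_score:
            result.append((total, s))
    return sorted(result, key=lambda t: t[0])
```

For the restraint graph of the simplified example (pairs 0-1, 0-2, 1-2, 2-3, 3-4, 4-5, 3-5), one valid hand-written set of subsets is (0, 1, 2), (2, 3), (3, 4, 5): every restraint falls inside at least one subset, which is what the per-subset scoring relies on.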
Initially, this functionality would be implemented using existing domino classes where possible. As we discover ways to make it more efficient, we would replace the existing domino implementation with an implementation in domino2.
Make Domino faster
- the mapped sampler just has add_state(Floats) and takes the FloatKeys and Particles in its constructor
- permutation sampler is similar
Allow Domino to support "interesting" table generation methods
Dave would then implement the following for his work (plus other things in helping to craft the new domino). First, a class for storing the shared data which both his Evaluator and his SubsetStateEnumerator have access to:
class DavesData {
  struct SubsetData {
    struct StateData {
      double score;
      int config_set_index;
    };
    std::map<Ints, StateData > states;
    ConfigurationSet* configurations;
  };
  std::map<Subset*, SubsetData> map;
};
Then three functions would be needed:
DavesSubsetStateEnumerator::get_number_of_states():
- run md
- for each md frame, figure out which state it is and, if it is new or has a lower score, store things in map[subset].states[state]
- return map[subset].states.size()
DavesSubsetStateEnumerator::get_state():
- return the key of the i-th entry in map[subset].states
DavesEvaluator::get_score():
- return map[subset].states[state].score
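A plain Python sketch of the shared cache and the three functions, under stated assumptions: frames are tuples of coordinates supplied by the caller rather than produced by a real MD run, `classify` and `score` are hypothetical callables that map a frame to its discrete state and its score, and a Python list stands in for the ConfigurationSet.

```python
class SubsetData:
    """Per-subset cache, mirroring the SubsetData struct."""
    def __init__(self):
        self.states = {}          # state tuple -> (score, config_set_index)
        self.configurations = []  # stand-in for a ConfigurationSet


def record_frames(data, subset, frames, classify, score):
    """Store each frame whose state is new or has a lower score.

    Returns the number of distinct states seen for `subset`, which is what
    get_number_of_states() would return after running MD.
    """
    sd = data.setdefault(subset, SubsetData())
    for frame in frames:
        state = classify(frame)
        s = score(frame)
        if state not in sd.states or s < sd.states[state][0]:
            sd.configurations.append(frame)
            sd.states[state] = (s, len(sd.configurations) - 1)
    return len(sd.states)


def get_state(data, subset, i):
    """Return the i-th state in sorted key order, as a std::map would."""
    return sorted(data[subset].states)[i]


def get_score(data, subset, state):
    """Look up the cached score of a state."""
    return data[subset].states[state][0]
```

Sorting the keys on every get_state call keeps the sketch short; a real implementation would more likely keep an ordered container or cache the sorted key list.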
To run the MD:
- set the weights of irrelevant restraints to 0 (needs access to the dependency graph; Daniel will provide nice access)
- make sure frames are stored (Dave can implement an OptimizerState to add to a ConfigurationSet every k frames)
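Storing every k-th frame could look roughly like the plain Python sketch below. SaveEveryK is a hypothetical stand-in for the OptimizerState mentioned above, a list stands in for the ConfigurationSet, and toy_descent is a steepest-descent loop used only to drive the example; it is not MD.

```python
class SaveEveryK:
    """Hypothetical stand-in for an OptimizerState storing every k-th frame."""

    def __init__(self, k, frames):
        self.k = k
        self.frames = frames  # stand-in for a ConfigurationSet
        self.calls = 0

    def update(self, coords):
        self.calls += 1
        if self.calls % self.k == 0:
            self.frames.append(list(coords))  # copy: coords keep changing


def toy_descent(coords, gradient, steps, step_size, states):
    """Toy optimizer loop that calls each state object after every step."""
    for _ in range(steps):
        g = gradient(coords)
        for i in range(len(coords)):
            coords[i] -= step_size * g[i]
        for s in states:
            s.update(coords)
    return coords
```

Copying the coordinates on each save matters because the optimizer updates them in place; storing a reference would leave every saved frame equal to the final one.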