IMP::parallel Namespace Reference

Detailed Description

Support for parallel tasks that communicate infrequently.

This module allows IMP tasks to be distributed to multiple processors or machines. It employs a master-slave model: the main (master) IMP process sends the tasks out to one or more slaves. Tasks cannot communicate with each other, but they return their results to the master. The master can then start new tasks, possibly using results returned from completed tasks. The system is fault tolerant: if a slave fails, any tasks running on that slave are automatically moved to another slave.

To use the module, first create a Manager object. Add one or more slaves to the Manager using its add_slave() method (example slaves are LocalSlave, which simply starts another IMP process on the same machine as the master, and SGEQsubSlaveArray, which starts an array of multiple slaves on a Sun Grid Engine cluster). Next, call the Manager::get_context() method, which creates and returns a new Context object. Add tasks to the Context with the Context::add_task() method (each task is simply a Python function or other callable object). Finally, call Context::get_results_unordered() to send the tasks out to the slaves (a slave runs only a single task at a time; if there are more tasks than slaves, later tasks are queued until a slave is done with an earlier task). This method returns the results from each task as it completes.
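
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the module's own example: the helper name run_unordered and its n_slaves parameter are hypothetical, and the IMP.parallel module is passed in as an argument so that the sketch makes no assumption about how IMP is installed. Only the class and method names mentioned above (Manager, add_slave, LocalSlave, get_context, add_task, get_results_unordered) are used.

```python
def run_unordered(parallel, tasks, n_slaves=2):
    """Distribute `tasks` over local slaves and collect results as they finish.

    `parallel` is expected to be the IMP.parallel module. `run_unordered`
    and `n_slaves` are hypothetical names used only for this sketch.
    """
    manager = parallel.Manager()

    # LocalSlave starts another IMP process on the same machine as the master.
    for _ in range(n_slaves):
        manager.add_slave(parallel.LocalSlave())

    # A Context groups tasks that run in the same environment.
    context = manager.get_context()

    # Each task is a Python function or other callable object.
    for task in tasks:
        context.add_task(task)

    # The tasks are sent out here; results arrive as each task completes,
    # so their order need not match the order in which tasks were added.
    return list(context.get_results_unordered())
```

With IMP available, this would presumably be called as run_unordered(IMP.parallel, my_tasks). Because results come back in completion order, a task that needs to be identified afterwards should include its own inputs or an identifier in the value it returns.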

Setup in IMP is often expensive, and thus the Manager::get_context() method allows you to specify a Python function or other callable object to do any setup for the tasks. This function will be run on the slave before any tasks from that context are started (the return values from this function are passed to the task functions). If multiple tasks from the same context are run on the same slave, the setup function is only called once.
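
A setup function might be used along these lines. This is a sketch under stated assumptions: the names AddOffset, expensive_setup and run_with_setup are hypothetical, the setup callable is assumed to be accepted as the first positional argument of get_context(), and its return value is assumed to reach each task as a positional argument. The IMP.parallel module is again passed in as a parameter rather than imported.

```python
class AddOffset:
    """Hypothetical task: adds its value to whatever the setup function returned."""

    def __init__(self, value):
        self.value = value

    def __call__(self, offset):
        # `offset` is assumed to be the return value of the setup function.
        return self.value + offset


def expensive_setup():
    """Hypothetical setup function; stands in for costly IMP model construction."""
    return 100


def run_with_setup(parallel, setup, tasks):
    """Run `tasks` on one local slave in a context prepared by `setup`.

    `parallel` is expected to be the IMP.parallel module.
    """
    manager = parallel.Manager()
    manager.add_slave(parallel.LocalSlave())

    # Assumption: get_context() takes the setup callable as its first argument.
    context = manager.get_context(setup)

    for task in tasks:
        context.add_task(task)

    return list(context.get_results_unordered())
```

Since both tasks here share one context and one slave, the setup function would be expected to run only once, with its result reused by each task.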

Troubleshooting

Several common problems with this module are described below, together with solutions.

Examples:

Author(s): Ben Webb

Version: SVN.r14091

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Publications:

Classes

class  Context
 A collection of tasks that run in the same environment.
class  Error
 Base class for all errors specific to the parallel module.
class  LocalSlave
 A slave running on the same machine as the master.
class  Manager
 Manages slaves and contexts.
class  NetworkError
 Error raised if a problem occurs with the network.
class  NoMoreSlavesError
 Error raised if all slaves failed, so tasks cannot be run.
class  RemoteError
 Error raised if a slave has an unhandled exception.
class  SGEPESlaveArray
 An array of slaves in a Sun Grid Engine system parallel environment.
class  SGEQsubSlaveArray
 An array of slaves on a Sun Grid Engine system, started with 'qsub'.
class  Slave
 Representation of a single slave.
class  SlaveArray
 Representation of an array of slaves.
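
The error classes listed above could be handled as in the following sketch. It assumes that errors surface while iterating over Context::get_results_unordered() and that NoMoreSlavesError, RemoteError and NetworkError all derive from Error, as the description of Error suggests; the helper name collect_or_report is hypothetical, and the IMP.parallel module is passed in as a parameter.

```python
def collect_or_report(parallel, context):
    """Gather results from `context`, returning (results, error).

    `parallel` is expected to be the IMP.parallel module. `error` is None
    if every task completed. `collect_or_report` is a hypothetical name.
    """
    results = []
    try:
        for result in context.get_results_unordered():
            results.append(result)
    except parallel.NoMoreSlavesError as err:
        # All slaves failed, so the remaining tasks cannot be run.
        return results, err
    except parallel.Error as err:
        # Any other error specific to the parallel module,
        # such as RemoteError or NetworkError.
        return results, err
    return results, None
```

Returning the partial results alongside the error lets the caller decide whether to add more slaves and resubmit the unfinished tasks or to abort.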

Generated on Tue May 22 2012 23:33:37 for IMP by doxygen 1.8.1