Good Practices When Using IMP
IMP is a fairly complicated piece of software, so it is important to follow good practices when using it, both to reduce the chance of important bugs making their way into your programs and to make it easier to track down the source of problems when they do arise. These suggestions are divided into two not very well separated groups, based on whether they are simply good programming practices or are somewhat IMP-specific, followed by a list of things to do when you find a problem.
General good practices
- delegate: look for existing code which already does what you want to do. The more people who use some code, the fewer bugs it will have, mostly.
- document: comment your code and include a readme with it which at least says what it does and how to run it. Someone else will look at it later, if you are lucky (or you will use it much later). Similarly, provide examples for anything that took work for you to figure out. Submit them to IMP.
- simplify: keep everything as simple as possible by having each piece of code have a single clear purpose. This makes it easier to check for correctness and to understand.
- orthogonalize: make each piece of code as independent as possible. This results in simpler code that is easier to test and understand.
- reduce: pull any shared functionality into a new object or function. This way you only have to debug the shared functionality once and fix each bug in one place. Otherwise you will forget to fix some copies of the bug. Ideally, you should never write the same bit of code twice. Never copy one of your existing functions, paste it somewhere and modify it slightly.
- as a special case, when writing C++ code, use RAII (look it up if you don't know what it means) to control all resources (such as memory allocated with new) to make sure they are cleaned up properly. That is, whenever you call new, hand off the memory to an IMP::Pointer, boost::scoped_ptr, boost::scoped_array etc. Even better, don't use new, and use a std::vector (or a boost::array) instead of an array.
- organize: use the filesystem to organize things by having separate directories for data, scripts, results etc. Perhaps just copy the IMP conventions, as they are pretty ok.
- be consistent: use consistent naming schemes for classes, variables and functions, preferably the IMP naming scheme. This helps make it clear what data is used as input and what is modified (eg a function called get_x() should not modify anything).
- track: put everything (scripts, data, output) in version control (eg svn) and check them in frequently.
- leverage: take advantage of all the tools out there designed to make your life easier. In C++:
- turn on all warnings (-Wall) when compiling and inspect each of them
- learn to use gdb (break IMP::internal::assert_fail) and valgrind (valgrind --gen-suppressions=yes --db-attach=yes --undef-value-errors=no --suppressions=/flute1/home/drussel/src/IMP/svn/svn/tools/valgrind-python.supp)
- avoid undocumented behavior: if the documentation or the name of the function does not explicitly say that it does something, don't assume that it always will. Eg, the documentation says that anything derived from IMP.Object (eg particles) can be compared against one another, but it doesn't say what order these comparisons produce. So don't assume that p0 < p1 just because p0 was created before p1. If you are unsure of behavior or want to count on undocumented behavior, ask on one of the IMP lists.
- ask: someone else probably knows how to do what you are trying to figure out and a third person is probably also interested in how to do it. So ask, preferably by posting to the imp-users list so that everyone can read the answer, but in person if more efficient.
- refactor: when you need to make changes, first rearrange the working code to make the changes fit in easily, then verify that the code still works, check it in to svn and only then add the new code. By dividing the task in two, you make it much easier to avoid errors.
- push work down: use outside-in design by first writing the outermost bit of code (the test, usage or example code) and using that to specify the interface. Then work to implement the functionality needed, specifying the next layer of functionality in the process. For example, if we want to add force fields to IMP, we want to be able to write something like
restraint=create_force_field(molecules)
where molecules is a set of hierarchies. So we write some code which sets up a model, loads some molecules and calls the create_force_field function. We then implement that function, writing it in terms of lower-level functions such as load_data and get_bonds, and finally go about implementing each of those in turn.
- test: write test cases to verify any behavior that is important to you and make sure things behave appropriately. The test cases should be simple and run quickly, otherwise you won’t run them very often.
- disambiguate: make your code as unambiguous as possible. Eg, make sure that all the parameters of a function are unambiguous and that one can’t accidentally exchange the order of two of them without something complaining. To do this, either make sure that the type of each parameter (eg a number vs a restraint) is unique, or use Python named parameters.
- ratchet: whenever you fix a bug or a mistake, add a check (or ask the developer of the functionality to add a check) to make sure the same bug cannot happen again. These can take the form of test cases (aka regression tests) or simple runtime checks added to the code.
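The outside-in approach from the "push work down" item can be sketched roughly as follows. This is only an illustration in plain Python: load_data and get_bonds are named in the text above, but their signatures, their stubbed return values and the tuple-based "restraint" are assumptions made for the sketch and are not IMP API.

```python
# Hypothetical sketch of outside-in design; none of these functions are IMP API.

def load_data():
    """Next layer down, stubbed: would read force field parameters."""
    return {"bond_length": 1.5}  # placeholder value, not a real parameter


def get_bonds(molecule):
    """Next layer down, stubbed: treat the molecule as a simple chain."""
    return [(i, i + 1) for i in range(len(molecule) - 1)]


def create_force_field(molecules):
    """Written second, in terms of helpers that were only names at first."""
    data = load_data()
    restraint = []
    for molecule in molecules:
        for a, b in get_bonds(molecule):
            # One (atom, atom, target length) entry per bond.
            restraint.append((molecule[a], molecule[b], data["bond_length"]))
    return restraint


# Outermost usage code, written first; it fixes the interface we want.
molecules = [["N", "CA", "C"], ["O", "H"]]
restraint = create_force_field(molecules)
print(restraint)
```

Each stub can then be replaced by a real implementation, one at a time, while the usage code at the bottom keeps running.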
IMP-specific
- make it easy to dynamically simplify your code
- support using very coarse grained representation without major rearrangements of your code. Running with fewer particles makes it much easier to figure out what is going wrong. To do this, make sure there is a level of indirection such as IMP::atom::Selection used when selecting the particles that each restraint acts on. Likewise, use IMP::atom::create_simplified_along_backbone() to allow one to easily change how many particles are used to represent a structure.
- make sure it is trivial to ignore part of your biological system for a while so that you can focus on a misbehaving part. To do this you have to make sure that molecules are referred to in as few places as possible.
- always run each script with checks turned on each time things (IMP, data, compiler, or your script) change. We put lots of effort into catching common usage errors with runtime checks. To turn checks on, use either a debug build, or a release build with the check level set to IMP.USAGE_AND_INTERNAL using IMP.set_check_level().
- run final scripts against a fast build of IMP. A fast build can be one or more orders of magnitude faster than a release build even with checks and logging disabled.
- use visual debugging: make use of the IMP.display module to display all intermediate structures and restraints graphically. It is often trivial to spot problems via visual inspection that are very hard to spot in text output. In addition, it is important to see what IMP sees for the representation, and not assume that what IMP sees matches what you believe it should see.
- every once in a while, inspect all of the solutions by hand (or at least a random sample) to make sure that they are consistent with your expectations
- write functions to analyze the running and results of your scripts. If you need access to some sort of data from IMP you can't easily extract, or don't know how to implement some analysis, ask. Others probably want the same thing.
- use logging: all objects derived from IMP.Object have a name which is used to make the logging messages clearer. Make sure the names make sense to you; the default ones aren't always very informative. If you produce lots of output, make sure you read it all so that warnings don't get lost, or send stderr and stdout to different files. You can change the log level for individual objects so that you can, for example, have a single restraint on verbose while the model as a whole is silent.
- keep track of the last working SVN version of IMP tested against: have a file in the repository that keeps track of the version of IMP used with that version of the scripts (eg by dumping the "Configuring" lines from the IMP build output).
- report difficulties: if you find it hard to figure something out, alert the person responsible so they can try to make it easier. Even better, provide an example which shows how to do it so that others can benefit from your effort.
- be aware of the IMP coding standards as they help make it clear what various functions do and how they do it
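The advice above about a level of indirection, and about referring to molecules in as few places as possible, can be illustrated with a small plain-Python stand-in. It deliberately does not use IMP::atom::Selection or any other IMP call; SYSTEM, ACTIVE, RESIDUES_PER_BEAD, select and build_restraints are all hypothetical names invented for this sketch.

```python
# Plain-Python stand-in for a selection layer; all names are hypothetical.

SYSTEM = {            # component name -> number of residues (made-up sizes)
    "ProteinA": 120,
    "ProteinB": 80,
    "ProteinC": 200,
}
ACTIVE = ["ProteinA", "ProteinB"]  # the only place components are switched on/off
RESIDUES_PER_BEAD = 40             # the only place the resolution is set


def select(name):
    """Return the beads representing one component at the current resolution."""
    n_beads = -(-SYSTEM[name] // RESIDUES_PER_BEAD)  # ceiling division
    return [f"{name}:{i}" for i in range(n_beads)]


def build_restraints():
    """Build one chain of consecutive-bead pairs per active component."""
    restraints = []
    for name in ACTIVE:
        beads = select(name)
        restraints.append(list(zip(beads, beads[1:])))
    return restraints


restraints = build_restraints()
for name, pairs in zip(ACTIVE, restraints):
    print(name, pairs)
```

Because every restraint goes through select(), dropping ProteinB or making the representation coarser is a one-line change rather than a rearrangement of the script.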
Tricks
- start from existing examples and modify them gradually to have the functionality you need. That way you can always stay close to working code.
When you find a (potential) problem
- run with all checks turned on to see if there is a usage error that IMP can catch
- run gdb/valgrind to isolate the problem (a stack trace is way more useful than a python exception printout)
- keep track of what IMP version you used and what type of machine (or in the lab, what machine name) you encountered it on
- simplify your code to provide a fast block of code, free of random numbers and parameters, which reproduces the problem. In addition to making life much easier for the person fixing the code, the simplification process often reveals problems in the code calling IMP.
- send it to the IMP list
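As a rough sketch of the record-keeping step above, the following collects the machine details worth including in a report, using only the Python standard library. The describe_environment helper is hypothetical, and reading the version from an IMP.__version__ attribute is an assumption; the code falls back to a placeholder string if IMP cannot be imported or the attribute is missing.

```python
import platform
import sys


def describe_environment():
    """Collect the details worth pasting into a problem report."""
    try:
        import IMP  # only available where IMP is installed
        # Assumption: the build exposes a version string under this name.
        imp_version = str(getattr(IMP, "__version__", "unknown"))
    except ImportError:
        imp_version = "IMP not importable"
    return {
        "imp_version": imp_version,
        "machine": platform.node(),        # machine name
        "platform": platform.platform(),   # OS and architecture
        "python": sys.version.split()[0],  # interpreter version
    }


report = describe_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

Saving this output next to the simplified script means the report sent to the list already says where and with what the problem occurred.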