Desired functionality
- Input of forcefield data from common file formats (e.g. CHARMM)
- Patching; methylation, etc.; disulfide bridges; addition/removal of atoms, residues, chains (e.g. grand canonical MC)
- Population of IMP bond, angle, dihedral decorators (for nonbond exclusions)
- Extensible support for non-standard residues (e.g. ligands, PDB HETATM residues)
- Use of internal coordinates/Z matrices to generate starting coordinates for a set of connected residues (e.g. polypeptide extended chain)
- Nonbonded interactions: soft sphere (basically already done), Lennard Jones, Coulomb
- Statistical nonbonded potentials (e.g. DOPE): could be treated as a "mini forcefield" or implemented separately
- Conversion between PDB, force field and internal atom/residue names
- Handling of coordinates missing from PDBs - e.g. construction of hydrogen positions from internal coordinates
- Support for a superset of the residues in PDB - e.g. different protonation states
- Possible to use forcefields for only part of the model, or different forcefields for different sets of particles (e.g. atomistic CHARMM for one protein, coarse grained approach for another)
- Different forcefields have different representations - e.g. a CHARMM heavy-atom-only forcefield or even a CA-only. (The topology implied by the forcefield is therefore different.)
- Should be possible to do limited tweaking of the model on a per-atom basis (e.g. changing the radius or charge of a single atom)
Existing Modeller approach
- Reads in CHARMM format top.lib and par.lib and stores the information in memory (essentially a direct translation of the files - little processing is done).
- top.lib is modified from standard CHARMM to use IUPAC atom names (Modeller does not convert them internally, although it used to).
- par.lib is essentially unmodified (some minor tweaks to force constants here and there).
- CHARMM top.lib IC records used to construct coordinates from primary sequence (extended chain when starting from scratch, loop conformations, addition of missing atoms such as hydrogens).
- Each atom has integer CHARMM and DOPE types; these are populated by lookup of the CHARMM atom type (e.g. CT1) and atom name/residue name pair (e.g. CA/ALA) respectively.
- Nonbonded interactions (soft sphere, LJ, Coulomb, DOPE) are evaluated by enumerating all nonbonded pairs, looking up the atom types, extracting the parameters from the force field, and adding the score to the nonbonded restraint.
- Bonded interactions are built once as a static list of restraints. All of the bond/angle/dihedrals are enumerated, the atom types are looked up in the forcefield, and a restraint is created. For example, a harmonic restraint is generated for each CHARMM bond.
Proposed IMP solution
- Force field class reads in CHARMM forcefield (top.lib and par.lib; not using CHARMM API - parsing the very simple file format ourselves)
- It holds a container of IMP particles; when particles are added to the forcefield container they are assigned an atom type by lookup of existing atom/residue names, and radius and charge attributes (assigned from the force field); bond/angle/dihedral decorators are added.
- atom types are an implementation detail, should be considered opaque; could be simply integer indices into the forcefield arrays.
- radius and charge are stored in the atoms, not atom types, so that they can be tweaked on a per-atom basis (e.g. sidechain modeling, polarizable forcefields).
- Stereochemistry restraint takes a forcefield and a particle container (usually the same container owned by the ff); at evaluate time it enumerates the bonds/angles/dihedrals, looks up the atom types and calculates FF scoring terms. Parameters are looked up in the force field, not in the decorators (since there are a variety of functional forms for a typical forcefield).
Lennard Jones PairScore takes a forcefield, and looks up the atom types at evaluate time.
- CHARMMForceField class could provide convenience factory functions to create restraint objects linked to itself.
DOPE PairScore is a subclass of the existing TypedPairScore, and does its own conversion of atom/residue names to internal atom types.DSR: probably better to just have a function which adds the DOPE pairs to a TypedPairScore. That makes it easier if you want to extend DOPE (and if we don't support extending DOPE, it should not subsclass TypedPairScore).
Force field class has a method to generate xyz coordinates from ICs for a list of particles.DSR: I don't see why this has anything to do with the forcefield.
PDB reader can be given a force field object and will try to impose the FF's topology and atom types as it reads the file (e.g. excluding atoms not in the topology).DSR: The two bits should be separate: a PDB reader which creates a MolecularHierarchy and a function which processes a MolecularHierarchy for a forcefield. Otherwise you have problems if you create things internally.