Tutorial


Advanced example:
Modeling of a protein-ligand complex based on multiple templates and user-specified restraints

All input and output files for this example are available to download, in either zip format (for Windows) or .tar.gz format (for Unix/Linux).

An important aim of modeling is to contribute to the understanding of the function of the modeled protein. Inspection of the 4mdh:A template structure (from the basic modeling tutorial) revealed that loop 93-100, one of the functionally most important parts of the enzyme, is more disordered than the rest of the protein. The long active site loop appears to be flexible in the absence of a ligand and could not be seen well in the diffraction map. The unreliability of the template coordinates and the inability of MODELLER to model long insertions are why this loop was poorly modeled in TvLDH, as indicated by PROSAII.

PROSAII profile for model TvLDH.B99990001

Since we are interested in understanding differences in specificity between two similar proteins, we need to build precise and accurate models. Therefore, we need to search for another malate dehydrogenase template structure, which may have a lower overall sequence similarity to TvLDH but a better resolved active site loop. The old and new templates can then be used together to build a model of TvLDH. The active site loop tends to be better defined if the structure is solved together with its physiological ligand and a cofactor. A model based on a template with ligands bound is also expected to be more relevant for our study of enzymatic specificity, especially if we also build the model with the ligands.

1emd, a malate dehydrogenase from E. coli, was identified in the PDB. While the 1emd sequence shares only 32% sequence identity with TvLDH, the active site loop and its environment are more conserved. The loop in the 1emd structure is well resolved. Moreover, 1emd was solved in the presence of a citrate substrate analog and the NADH cofactor. The new alignment in the PAP format is shown below (file `TvLDH-4mdh-1emd_ed.pap').

 _aln.pos         10        20        30        40        50        60
1emd_ed   -------------------------------------------------------------------- 
4mdhA     -SEPIRVLVTGAAGQIAYSLLYSIGNGSVFGKDQPIILVLLDITPMMGVLDGVLMELQDCALPLLKDV 
TvLDH     MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF 
 _consrvd


 _aln.p   70        80        90       100       110       120       130
1emd_ed   ------------------SAGVRRKPGMDRSDLFNV--------------NAGI-------------- 
4mdhA     IATDKEEIAFKDLDVAILVGSM--------------PRRDGMERKDLLKANVKIFKCQGAALDKYAKK 
TvLDH     VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISS--------------NSVIFKNTGEYLSKWAKP 
 _consrvd                                                   *  *


 _aln.pos  140       150       160       170       180       190       200
1emd_ed   -------------------------------------------------------------------- 
4mdhA     SVKVIVVGNPANTNCLTASKSAPSIPKENFSCLTRLDHNRAKAQIALKLGVTSDDVKNVIIWGNHSST 
TvLDH     SVKVLVIGNPDNTNCEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGES 
 _consrvd


 _aln.pos    210       220       230       240       250       260       270
1emd_ed   -------------------------------------------------------------------- 
4mdhA     QYPDVNHAKVKLQAKEVGVYEAVKDDSWLKGEFITTVQQRGAAVIKARKLSSAMSAAKAICDHVRDIW 
TvLDH     MVADLTQATFTKEGKTQKVVDVLDHD-YVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWL 
 _consrvd


 _aln.pos      280       290       300       310       320       330       340
1emd_ed   -------------------------------------------------------------------- 
4mdhA     FGTPEGEFVSMGIISD-GNSYGVPDDLLYSFPVTIK-DKTWKIVEGLPINDFSREKMDLTAKELAEEK 
TvLDH     FGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFTEKDLFHEK 
 _consrvd


 _aln.pos        350       360       370       380       390       400
1emd_ed   ----------VKNLVQQVAKTCPKACIGIITNPVNTTVAIAAEVLKKAGVYDKNKLFGVTTLDIIRSN 
4mdhA     ETAFEFLSSA---------------------------------------------------------- 
TvLDH     EIALNHLAQ----------------------------------------------------------- 
 _consrvd


 _aln.p  410       420       430       440       450       460       470
1emd_ed   TFVAELKGKQPGEVEVPVIGGHSGVTILPLLSQVPGVSFTEQEVADLTKRIQNAGTEVVEAKAGGGSA 
4mdhA     -------------------------------------------------------------------- 
TvLDH     -------------------------------------------------------------------- 
 _consrvd


 _aln.pos  480       490       500       510       520       530       540
1emd_ed   TLSMGQAAARFGLSLVRALQGEQGVVECAYVEGDGQYARFFSQPLLLGKNGVEERKSIGTLSAFEQNA 
4mdhA     -------------------------------------------------------------------- 
TvLDH     -------------------------------------------------------------------- 
 _consrvd


 _aln.pos    550       560
1emd_ed   LEGMLDTLKKDIALGQEFVNK/-.. 
4mdhA     ---------------------/.-- 
TvLDH     ---------------------/..- 
 _consrvd

File: TvLDH-4mdh-1emd_ed.pap

The modified alignment uses an edited 1emd structure (1emd_ed) as a second template. The alignment corresponds to a model that is based on 1emd_ed in its active site loop and on 4mdh:A in the rest of the fold. Four residues on both sides of the active site loop are aligned with both templates to ensure that the loop has a good orientation relative to the rest of the model.
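The division of labor between the two templates can be checked directly from the alignment rows. The following Python sketch is not part of MODELLER. It takes the second block of the alignment above (positions 69-136), with gap runs written as repeat counts, and classifies each TvLDH residue by the templates that have a residue in the same column. The name `template_coverage' is introduced here for illustration only.

```python
from collections import Counter

# Second block of the PAP alignment above (alignment positions 69-136).
# Gap runs are written as repeat counts so that they are easy to check.
ROWS = {
    "1emd_ed": "-" * 18 + "SAGVRRKPGMDRSDLFNV" + "-" * 14 + "NAGI" + "-" * 14,
    "4mdhA": "IATDKEEIAFKDLDVAILVGSM" + "-" * 14 + "PRRDGMERKDLLKANVKIFKCQGAALDKYAKK",
    "TvLDH": "VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISS" + "-" * 14 + "NSVIFKNTGEYLSKWAKP",
}


def template_coverage(rows, target, gap="-"):
    """For each target residue, list the templates with a residue in the same column."""
    lengths = {len(seq) for seq in rows.values()}
    if len(lengths) != 1:
        raise ValueError("alignment rows must have equal length")
    templates = [name for name in rows if name != target]
    coverage = []
    for col, residue in enumerate(rows[target]):
        if residue == gap:
            continue  # no target residue in this column
        coverage.append(tuple(t for t in templates if rows[t][col] != gap))
    return coverage


counts = Counter(template_coverage(ROWS, "TvLDH"))
for templates, n in sorted(counts.items()):
    print(", ".join(templates), n)
```

For this block, the sketch reports 14 TvLDH residues covered by 1emd_ed alone (the loop), 8 covered by both templates (four on each side of the loop), and 32 covered by 4mdhA alone.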

The modeling script below has several changes with respect to `model-single.top'. First, the name of the alignment file assigned to ALNFILE is updated. Next, the variable KNOWNS is redefined to include both templates. Finally, the `SET HETATM_IO = ON' command is added to allow reading of the non-standard pyruvate and NADH residues from the input PDB files. The script is shown next (file `model-multiple-hetero.top').

# Homology modeling with multiple templates and ligands
INCLUDE
SET ALNFILE = 'TvLDH-4mdh-1emd_ed.ali'   # alignment file
SET KNOWNS = '4mdhA' '1emd_ed'           # both templates
SET SEQUENCE = 'TvLDH'                   # target sequence
SET STARTING_MODEL = 1
SET ENDING_MODEL = 5                     # build models 1 to 5
SET HETATM_IO = ON                       # read HETATM records (pyruvate, NADH)
CALL ROUTINE = 'model'

# User-defined restraints: harmonic upper bound of 3.5 +/- 0.1 on each distance
SUBROUTINE ROUTINE = 'special_restraints'
    ADD_RESTRAINT ATOM_IDS =  'NH1:161' 'O1A:334',;
            RESTRAINT_PARAMETERS = 2 1 1 22 2 2 0 3.5 0.1
    ADD_RESTRAINT ATOM_IDS =  'NH2:161' 'O1B:334',;
            RESTRAINT_PARAMETERS = 2 1 1 22 2 2 0 3.5 0.1
    ADD_RESTRAINT ATOM_IDS =  'NE2:186' 'O2:334',;
            RESTRAINT_PARAMETERS = 2 1 1 22 2 2 0 3.5 0.1
RETURN
END_SUBROUTINE

File: model-multiple-hetero.top

MODELLER can include a ligand in a model in two ways. In the first case, the ligand is not present in the template structure but is defined in the MODELLER residue topology library. Such ligands include water molecules, metal ions, nucleotides, heme groups, and many others (see question 17 in the MODELLER FAQ). This situation is not explored further here.

In the second case, the ligand is already present in the template structure. We can assume either that the ligand interacts similarly with the target and the template, in which case we can rely on MODELLER to extract and satisfy distance restraints automatically, or that the relative orientation is not necessarily conserved, in which case the user needs to supply restraints on the relative orientation of the ligand and the target (the conformation of the ligand is assumed to be rigid). These two alternatives are illustrated by the modeling of the NADH cofactor and of pyruvate, respectively.

The 1emd structure file contains a citrate substrate analog. To obtain a model with pyruvate, the physiological substrate of TvLDH, we convert the citrate analog in 1emd into pyruvate by deleting the CH(COOH)2 group, thus obtaining the 1emd_ed template file.

Both NADH and pyruvate are indicated by the `.' characters at the end of each sequence in the alignment file above (the `/' character indicates a chain break). In general, the `.' character in MODELLER indicates an arbitrary generic residue called a "block" residue (for details see the section on block residues in the MODELLER manual). A major advantage of using the `.' characters is that it is not necessary to define the residue topology.
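To see how the `.' characters tie each ligand of the target to a template, the ends of the three alignment rows can be compared column by column. The Python sketch below is an illustration outside of MODELLER, and the helper name `ligand_sources' is hypothetical. It splits each row at the `/' chain break and, for every block residue of the target, lists the templates that have a block residue in the same column.

```python
# Ends of the rows in the final alignment block above: the characters
# after the '/' chain break are '.' block residues (ligands) or '-' gaps.
TAILS = {
    "1emd_ed": "/-..",
    "4mdhA": "/.--",
    "TvLDH": "/..-",
}


def ligand_sources(tails, target, block=".", chain_break="/"):
    """Map each target block residue to the templates aligned with it."""
    # Keep only the part of each row that follows the chain break.
    columns = {name: row.split(chain_break, 1)[1] for name, row in tails.items()}
    sources = []
    for col, symbol in enumerate(columns[target]):
        if symbol != block:
            continue  # the target has no ligand in this column
        sources.append([name for name, row in columns.items()
                        if name != target and row[col] == block])
    return sources


print(ligand_sources(TAILS, "TvLDH"))
```

Read this way, the first block residue of TvLDH is aligned only with a block residue of 4mdhA, and the second only with one of 1emd_ed. The alignment characters alone do not say which ligand is which.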

To obtain the restraints on pyruvate, we first superpose the structures of several LDH and MDH enzymes solved with ligands. Such a comparison allows us to identify absolutely conserved electrostatic interactions involving catalytic residues Arg161 and His186 on one hand, and the oxo groups of the lactate and malate ligands on the other hand. The modeling script can now be expanded by appending a routine that specifies the user-defined distance restraints between the conserved atoms of the active site residues and their substrate.

The ADD_RESTRAINT command has two arguments. ATOM_IDS defines the restrained atoms by specifying their atom types and residue numbers as listed in the model coordinate file. RESTRAINT_PARAMETERS defines the restraints by specifying the mathematical form (e.g., harmonic, cosine, cubic spline), the modality, the type of the restrained feature (e.g., distance, angle, dihedral angle), the number of atoms in the restraint, and the restraint parameters. In this case, a harmonic upper bound restraint of 3.5±0.1 Å is imposed on the distances between the specified pairs of atoms. A trick is used to prevent MODELLER from automatically calculating distance restraints on the pyruvate-TvLDH complex: the ligand in the 1emd_ed template is moved beyond the upper bound on the ligand-protein distance restraints (i.e., 10 Å).
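The behavior of a harmonic upper bound can be sketched in a few lines of Python. This is a simplified illustration rather than MODELLER's actual objective function: it assumes that the restraint contributes nothing when the distance is at or below the bound of 3.5 and grows quadratically above it, scaled by the standard deviation of 0.1, and it ignores any constants or energy units that MODELLER applies. The function names are hypothetical.

```python
def parse_atom_id(atom_id):
    """Split an identifier such as 'NH1:161' into (atom, residue number)."""
    atom, residue = atom_id.split(":")
    return atom, int(residue)


def upper_bound_penalty(distance, mean=3.5, stdev=0.1):
    """Unitless penalty: zero up to `mean`, harmonic above it (assumed form)."""
    if distance <= mean:
        return 0.0
    return 0.5 * ((distance - mean) / stdev) ** 2


# Atom pairs restrained in the 'special_restraints' routine of the script.
PAIRS = [("NH1:161", "O1A:334"), ("NH2:161", "O1B:334"), ("NE2:186", "O2:334")]

for first, second in PAIRS:
    print(parse_atom_id(first), parse_atom_id(second))

# Trial distances chosen for illustration only; they are not model data.
for d in (3.0, 3.5, 3.7, 4.0):
    print(d, round(upper_bound_penalty(d), 3))
```

Under this assumed form, distances shorter than the bound are not penalized at all, which is what distinguishes an upper bound from an ordinary harmonic restraint centered on 3.5.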

The new script produces a model with a significantly improved PROSAII profile. The predicted error in the 90-100 active site loop is much smaller, and the error in the loop region 220-250 is practically resolved.

PROSAII profile for model TvLDH.B99990022

The overall Z-score is improved from -10.7 to -11.7, which compares well with the template Z-score of -12.7. With this favorable evaluation, we gain confidence in the final model. The model was used for interpreting site-directed mutagenesis experiments aimed at elucidating the determinants of enzyme specificity in this class of enzymes.