Tutorial
Iterative example:
The alignment-modeling-evaluation cycle. The case of the
Haloferax volcanii dihydrofolate reductase.
All input and output files for this example are available to download,
in either zip format (for Windows) or
.tar.gz format (for Unix/Linux).
Several structures of dihydrofolate reductase (DHFR) are known. However,
the structure of DHFR from Haloferax volcanii was not known and its
sequence identity with DHFRs of known structure is rather low at ~30%. A model
of H. volcanii DHFR (HVDFR) was constructed before the experimental
structure was solved. This example illustrates the power of the iterative
alignment-modeling-evaluation approach to comparative modeling.
Of all the available DHFR structures, HVDHFR has the sequence most
similar to DHFR from E. coli. The PDB entry
4dfr corresponds to a high resolution (1.7Å)
E. coli DHFR structure. It contains two copies of the molecule,
named chain A and chain B. According to the authors, the structure for chain
B is of better quality than that of chain A. The following script file aligns
HVDFR and chain B of 4dfr.
from modeller import *
env = environ()
mdl = model(env, file='4dfr.pdb', model_segment=('FIRST:B', 'LAST:B'))
aln = alignment(env)
aln.append_model(mdl, align_codes='4dfr')
aln.append(file='hvdfr.seq', align_codes='hvdfr')
aln.align2d()
aln.write(file='hvdfr-4dfr.ali')
aln.write(file='hvdfr-4dfr.pap', alignment_format='PAP',
alignment_features='INDICES HELIX BETA')
File: align2d-4.py
Some options used in this example include model_segment,
which is used to indicate chain B of 4dfr; and
alignment_features, which is used to output information
such as secondary structure to the alignment file in the PAP format.
_aln.pos 10 20 30 40 50 60
4dfr -MISLIAALAVDRVIGMENAMPW-NLPADLAWFKRNTLDKPVIMGRHTWESIGRPLPGRKNIILSSQ-
hvdfr MELVSVAALAENRVIGRDGELPWPSIPADKKQYRSRIADDPVVLGRTTFESMRDDLPGSAQIVMSRSE
_helix 999999999999 999999999
_beta 9999999999 999999 99999999
_aln.p 70 80 90 100 110 120 130
4dfr -PGTDDRVTWVKSVDEAIAACG--DVPEIMVIGGGRVYEQFLPKAQKLYLTHIDAEVEGDTHFPDYEP
hvdfr RSFSVDTAHRAASVEEAVDIAASLDAETAYVIGGAAIYALFQPHLDRMVLSRVPGEYEGDTYYPEWDA
_helix 9999999999 99999999
_beta 99999 9999999 999999999
_aln.pos 140 150 160
4dfr DDWESVFSEFHDADAQNSHSYCFKILERR
hvdfr AEWELDAETDHEG---FTLQEWVRSASSR
_helix
_beta 999999999999 999999999999
File: hvdfr-4dfr.pap
Using the PIR alignment file hvdfr-4dfr.ali,
an initial model is calculated:
from modeller import *
from modeller.automodel import *
env = environ()
a = automodel(env, alnfile='hvdfr-4dfr.ali',
knowns='4dfr', sequence='hvdfr')
a.starting_model = 1
a.ending_model = 1
a.make()
File: model4.py
Because the sequence identity between 4dfr and
HVDFR is relatively low (30%), the automated alignment is likely to contain
errors. The evaluation of the model with the DOPE potential in MODELLER
shows two regions with positive energy.
DOPE score profile for model inihvdfr.B99990001.pdb
The first region is around residue 85, the second region is at the
C-terminal end of the protein. Referring to the target--template alignment
above (hvdfr-4dfr.pap), it is easy to understand
why the first positive peak appears. The insertion between position 85 and
88 of the alignment was placed in the middle of an α-helix in the
template (the "9" characters on the first line below the sequence
mark the helices). Moving the insertion to the end of the α-helix may
improve the model.
The second problem, which occurs in the C-terminal region of the alignment,
is less clear. The deletion in that region of the alignment corresponds to the
loop between the last two β-strands of 4dfr
(a β-hairpin). Since the profile suggests that this region is in error,
an alternative alignment should be tried. One possibility is that the deletion
is actually longer, making the C-terminal β-hairpin shorter in HVDFR.
One plausible alignment based on these considerations is shown here.
_aln.pos 10 20 30 40 50 60
4dfr M-ISLIAALAVDRVIGMENAMPW-NLPADLAWFKRNTLDKPVIMGRHTWESIGRPLPGRK
hvdfr MELVSVAALAENRVIGRDGELPWPSIPADKKQYRSRIADDPVVLGRTTFESMRDDLPGSA
_helix 999999999999 999999999
_beta 9 999999999 999999 999
_aln.pos 70 80 90 100 110 120
4dfr NIILSSQPGT--DDRVTWVKSVDEAIAACG--DVPEIMVIGGGRVYEQFLPKAQKLYLTH
hvdfr QIVMSRSERSFSVDTAHRAASVEEAVDIAASLDAETAYVIGGAAIYALFQPHLDRMVLSR
_helix 9999999999 99999999
_beta 99999 99999 9999999 9999999
_aln.pos 130 140 150 160
4dfr IDAEVEGDTHFPDYEPDDWESVFSEFHDADAQNSHSYCFKILERR----
hvdfr VPGEYEGDTYYPEWDAAEWELDAETDHE-------GFTLQEWVRSASSR
_helix
_beta 99 999999999999 999999999999
File: hvdfr-4dfr-2.pap
A new model was calculated using this alignment and the script,
modified to use the new alignment (see file
`model5.py'). Its DOPE score profile
is shown in the next figure.
DOPE profile for model hvdfr.B99990001.pdb
The main positive peak disappeared and the new global DOPE score dropped from
-15498.7 to -15720.3.
The process outlined here could be iterated until no improvement in the evaluation
can be achieved. This iterative alignment-building-evaluation process has been developed
in the MOULDER protocol which will be further implemented in MODELLER .
|