
[modeller_usage] Sequence alignment with missing residues



Dear Modeller experts,

 

My apologies for the multiple posts. I’m using Modeller to align sequences, find the residues missing from a PDB file, and then perform loop modeling for the missing regions. I have a small issue with the sequence alignment when residues are missing. Here’s my .ali file:

 

>P1;6cqx_a_unmod

structureX:6cqx_a_unmod:3:A:+532:A:::-1.00:-1.00

-REDAELLVTVRGGRLRGIRLKTPGGPVSAFLGIPFAEPPMGPRRFLPPEPKQPWSGVVDATTFQSVCYQYVDTL

YPGFEGTEMWNPNRELSEDCLYLNVWTPYPRPTSPTPVLVWIYGGGFYSGASSLDVYDGRFLVQAERTVLVSMNY

RVGAFGFLALPGSREAPGNVGLLDQRLALQWVQENVAAFGGDPTSVTLFGE.AGAASVGMHLLSPPSRGLFHRAV

LQSGAPNGPWATVGMGEARRRATQLAHLVGCPPGG---NDTELVACLRTRPAQVLVNHEWHVLPQESVFRFSFVP

VVDGDFLSDTPEALINAGDFHGLQVLVGVVKDEGSYFLVYGAPGFSKDNESLISRAEFLAGVRVGVPQVSDLAAE

AVVLHYTDWLHPEDPARLREALSDVVGDHNVVCPVAQLAGRLAAQGARVYAYVFEHRASTLSWPLWMGVPHGYEI

EFIFGIPLDPSRNYTAEEKIFAQRLMRYWANFARTGDPNE-----APQWPPYTAGAQQYVSLDLRPLEVRRGLRA

QACAFWNRFLPKLLSA-*

 

>P1;6cqx_A

sequence::: :: :::-1.00:-1.00

GREDAELLVTVRGGRLRGIRLKTPGGPVSAFLGIPFAEPPMGPRRFLPPEPKQPWSGVVDATTFQSVCYQYVDTL

YPGFEGTEMWNPNRELSEDCLYLNVWTPYPRPTSPTPVLVWIYGGGFYSGASSLDVYDGRFLVQAERTVLVSMNY

RVGAFGFLALPGSREAPGNVGLLDQRLALQWVQENVAAFGGDPTSVTLFGESAGAASVGMHLLSPPSRGLFHRAV

LQSGAPNGPWATVGMGEARRRATQLAHLVGCPPGGTGGNDTELVACLRTRPAQVLVNHEWHVLPQESVFRFSFVP

VVDGDFLSDTPEALINAGDFHGLQVLVGVVKDEGSYFLVYGAPGFSKDNESLISRAEFLAGVRVGVPQVSDLAAE

AVVLHYTDWLHPEDPARLREALSDVVGDHNVVCPVAQLAGRLAAQGARVYAYVFEHRASTLSWPLWMGVPHGYEI

EFIFGIPLDPSRNYTAEEKIFAQRLMRYWANFARTGDPNEPRDPKAPQWPPYTAGAQQYVSLDLRPLEVRRGLRA

QACAFWNRFLPKLLSAT*
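To make the "find missing residues" step concrete, here is a minimal pure-Python sketch of how the gap runs in the structure row can be read off an alignment like the one above. It does not use Modeller; the helper name `missing_segments` is hypothetical, and it assumes the full-sequence row contains no gaps, as is the case here. The two rows are the fourth line of each entry in the file.

```python
import re

def missing_segments(structure_row, full_row):
    """List (start, end, residues) for each run of '-' in structure_row.

    start and end are 1-based column positions within the rows passed in.
    Assumes full_row has no gaps and both rows have the same length.
    """
    segments = []
    for match in re.finditer(r"-+", structure_row):
        start, end = match.start(), match.end()
        segments.append((start + 1, end, full_row[start:end]))
    return segments

# Fourth line of each entry in the .ali file above
structure_row = "LQSGAPNGPWATVGMGEARRRATQLAHLVGCPPGG---NDTELVACLRTRPAQVLVNHEWHVLPQESVFRFSFVP"
full_row = "LQSGAPNGPWATVGMGEARRRATQLAHLVGCPPGGTGGNDTELVACLRTRPAQVLVNHEWHVLPQESVFRFSFVP"

segments = missing_segments(structure_row, full_row)
print(segments)
```

The positions count columns within the row that was passed in, not PDB residue numbers, so they would need to be offset before being used to select a loop region.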

 

These are the alignment commands I’m using:

 

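from modeller import *

e = Environ()

aln = Alignment(e)

# Read both sequences from the input file
aln.append(file='two_seq.seq', align_codes='all')

# Align them using the 1D gap penalties given here
aln.align(gap_penalties_1d=(-600, -400))

aln.write(file='two_seq.ali')

quit()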

 

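The first sequence is read from a PDB file, with some residues missing (shown as “-”). The second sequence is the full sequence. However, because the GG motif occurs on both sides of the missing residues (GGTGG in the full sequence), the gap can be placed in more than one way, and the alignment is not correct in this region: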

 

GCPPGG---NDTEL (from the pdb file)

GCPPGGTGGNDTEL (the full sequence)

 

After checking the structure file, I found that the alignment should be as follows:

 

GCPP---GGNDTEL (from the pdb file)

GCPPGGTGGNDTEL (the full sequence)
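As a small illustration of why a sequence-only alignment cannot tell the two gap placements above apart, the pure-Python sketch below compares them column by column against the full sequence. It does not use Modeller; the function name `identities` is hypothetical, and the full-sequence fragment is taken from the second entry of the .ali file so that the columns line up.

```python
# Fragments from the post; each row has one column per full-sequence residue
full = "GCPPGGTGGNDTEL"
aligner_placement = "GCPPGG---NDTEL"    # what the alignment produced
structure_placement = "GCPP---GGNDTEL"  # what the structure file shows

def identities(gapped_row, reference_row):
    """Count columns where the gapped row has the same residue as the reference."""
    return sum(a == b for a, b in zip(gapped_row, reference_row))

for name, row in [("aligner", aligner_placement),
                  ("structure", structure_placement)]:
    ungapped = row.replace("-", "")
    print(name, ungapped, identities(row, full), row.count("-"))
```

Both rows reduce to the same residues once the gaps are removed, and both have the same number of identical columns and the same single three-residue gap. Any score based only on the sequences is therefore the same for both placements, and only the structure (residue numbering or coordinates) can say which residues are really missing.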

 

Is there any way to avoid this type of “mismatch” in the sequence alignment? Thank you very much for your kind advice in advance.  

 

 

Massive Thanks,

Amy

 

 

--

Amy He

Chemistry Graduate Teaching Assistant

Hadad Research Group

Ohio State University