
[modeller_usage] Re: Missing Residues at the Start, As Well As in the Middle, of the Sequence of a Chain



Hi Siddhartha, you're welcome :) Feel free to ask again any time if you
have followed the instructions and still get an error.


On Thu, May 30, 2024 at 9:10 AM Siddhartha Barua via modeller_usage <
modeller_usage@listsrv.ucsf.edu> wrote:

> Dear Ben (Ben Webb, Modeller Caretaker) and Joel (Subach),
>
> Thanks a lot for your tips!
>
> I tinkered with the alignment and Python script files and got Modeller to
> model the missing residues.
>
> I found two possible solutions to the problem:
>
> *1) Use of a dash at the beginning of the structure-derived sequence
> portion of the alignment file, for each of the residues that were missing
> relative to the full-length protein sequence, as per NCBI's RefSeq
> (Reference Sequence):*
>
> For this, I used the following alignment file (with additional formatting
> at the relevant portions, for emphasis; I used the plain-text version
> for modelling), where I *explicitly specified the starting and ending
> residue positions of the model segment* that had coordinates (except for
> the short 6-residue stretch S(431)ATDIG(436), which has no coordinates):
>
> >P1;5bs8_B
> structure:5bs8_B.pdb:*425:B:675:B*:DNA Gyrase:::
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> *A*
> LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
> >P1;5bs8B_fill
> sequence:::::::::
>
> MAAQKKKAQDEYGAASITILEGLEAVRKRPGMYIGSTGERGLHHLIWEVVDNAVDEAMAGYATTVNVVLLEDGGVEVADDGRGIPVATHASGIPTVDVVMTQLHAGGKFDSDAYAISGGLHGVGVSVVNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGAPTKKTGSTVRFWADPAVFETTEYDFETVARRLQEMAFLNKGLTINLTDERVTQDEVVDEVVSDVAEAPKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAGYSESVHTFANTINTHEGGTHEEGFRSALTSVVNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSEPQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKVVVNKAVSSAQARIAARKAR
> *E**LVRRK**SATDIG*
> *GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
> *
>
> To obtain the string of 424 dashes ("-"s) for the above alignment file
> without having to type and count them manually, I used the following
> short Python script and pasted its output at the start of the
> structure-derived sequence. It is written for Python 3 and may need to be
> modified for older versions, which use a different syntax for print
> statements [e.g.: print "hello" vs print("hello")]:
>
> """Generate a contiguous run of dashes; the count is read from a prompt."""
>
> n = int(input("Number of dashes to print (a positive integer): "))
> if n < 1:
>     raise ValueError("Please enter a positive integer.")
>
> # String repetition builds the whole run at once, so no loop is needed.
> dashes = "-" * n
>
> print(dashes)
> print()
> print(f"The number of dashes stored in the variable 'dashes' is {len(dashes)}.")
>
> This modelled the long stretch of 424 missing residues at the start of the
> structure-derived sequence portion of the alignment file (the first of the
> two sequences in the file) as a long loop region, without secondary
> structures. I then simply deleted the unnecessary residues at the
> N-terminal part of each Modeller-generated model, in UCSF Chimera (i.e., I
> deleted residues 1-422) and saved the modified PDB file.
>
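The number of leading dashes follows directly from the first residue number that has coordinates (425 - 1 = 424). A minimal sketch of that arithmetic, assuming the full-length numbering starts at 1; the helper name pad_template and the shortened sequence fragment are only for illustration and are not part of Modeller:

```python
def pad_template(structure_seq: str, first_resnum: int) -> str:
    """Prefix the structure-derived sequence with one gap per residue
    that precedes first_resnum (numbering assumed to start at 1)."""
    if first_resnum < 1:
        raise ValueError("first_resnum must be >= 1")
    return "-" * (first_resnum - 1) + structure_seq


# Shortened fragment of the structure-derived sequence, for illustration only.
fragment = "ALVRRK------GLPGK"
padded = pad_template(fragment, 425)

# Number of leading gaps that were added.
print(len(padded) - len(fragment))  # 424
```

The padded string can then be pasted into the structure entry of the alignment file in place of the hand-counted dashes.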
> 2) Use of only a portion of the full-length protein sequence from NCBI
> (NCBI RefSeq) as the target sequence (the second sequence listed in the
> alignment file): residues 425-675, which correspond exactly to the
> residues present in the atom/structure file used (a PDB file generated
> from the original PDB 5BS8 by selecting chain B and saving only the
> selected atoms as a separate PDB file), except for the 6 missing residues
> inside this chain (S(431)ATDIG(436)):
>
> For this, in the alignment file, I mentioned the model segment bearing
> atom records (coordinates) as 425:B:675:B as shown below:
>
> >P1;5bs8_B
> structure:5bs8_B.pdb:425:B:675:B:DNA Gyrase:::
> *A*
> LVRRK------GLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
> >P1;5bs8B_fill
> sequence:::::::::
> *E*
> LVRRKSATDIGGLPGKLADCRSTDPRKSELYVVEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLKNTEVQAIITALGTGIHDEFDIGKLRYHKIVLMADADVDGQHISTLLLTLLFRFMRPLIENGHVFLAQPPLYKLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQVTLDDAAAADELFSILMGEDVDARRSFITRNAKDVRFLDV*
>
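In this second alignment the run of dashes in the structure entry marks exactly the residues that have no coordinates. A small sketch for locating such runs as 1-based alignment columns; the function name gap_runs and the shortened fragment are hypothetical. The columns equal the model residue numbers only on the assumption that the target line itself contains no gaps, as is the case here:

```python
import re


def gap_runs(aligned_template: str) -> list[tuple[int, int]]:
    """Return (start, end) for each run of '-' in an aligned sequence,
    as 1-based inclusive alignment columns."""
    seq = aligned_template.rstrip("*")  # drop the PIR terminator, if present
    return [(m.start() + 1, m.end()) for m in re.finditer(r"-+", seq)]


# Shortened fragment of the structure entry, for illustration only.
print(gap_runs("ALVRRK------GLPGK*"))  # [(7, 12)]
```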
> *In the Python script file, I only replaced the following line in the
> definition of the select_atoms function [def select_atoms(self):]*
>
> *return Selection(self.residue_range('431:B', '436:B'))*
> *with*
> *return Selection(self.residue_range('7:A', '12:A'))*
>
> This specifies the portion that is allowed to move during model
> generation/refinement; the rest of the atoms stay fixed. *The residue
> ranges '431:B', '436:B' and '7:A', '12:A' both refer to S(431)ATDIG(436).
> The former uses the numbering of the full-length sequence (NCBI RefSeq)
> and the template's chain ID*, whereas the latter uses the *numbering
> that Modeller gives to each newly generated model*, which *starts at
> residue number 1* in chain A.
>
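The conversion between the two numberings is a fixed offset. A minimal sketch, assuming the model's first residue corresponds to template residue 425, the template numbering is contiguous with no insertion codes, and the single model chain is labelled A by default; the helper name to_model_number is hypothetical:

```python
def to_model_number(template_resnum: int, first_template_resnum: int) -> int:
    """Map a template (PDB) residue number to the default model numbering,
    in which the first modelled residue is number 1."""
    if template_resnum < first_template_resnum:
        raise ValueError("residue lies before the first modelled residue")
    return template_resnum - first_template_resnum + 1


start = to_model_number(431, 425)
end = to_model_number(436, 425)
print(f"{start}:A", f"{end}:A")  # 7:A 12:A
```

These are the two strings passed to residue_range in select_atoms above.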
> The two residues at positions 423 and 424 (as per the full-length
> sequence) could then be modelled as a dipeptide using UCSF Chimera's
> Build Structure and saved as a PDB file. After opening this file in UCSF
> Chimera along with the Modeller-generated model, the two chains
> (Chimera-generated dipeptide and Modeller-generated model) could be
> joined into a single model by forming a peptide bond between them using
> the Join Model function/tool in UCSF Chimera.
>
> Note that the start of the sequence in chain B of PDB 5BS8 that has atom
> records/coordinates (sequence *A*LVRRK...) differs from the corresponding
> sequence in the NCBI RefSeq (where it is *E*LVRRK...) by a single
> residue. Modeller builds E rather than A at this position because the
> model takes its residues from the target sequence, given as the second
> sequence in the alignment file. So, if I wanted "A" in the model, as in
> the structure file's sequence, I would need to make this change in the
> second (target) sequence of the alignment file.
>
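Differences such as A vs E are easy to miss by eye in a long alignment. A rough sketch that lists the columns where both aligned sequences have a residue but the residues differ; the function name mismatches and the shortened fragments are only for illustration:

```python
def mismatches(template: str, target: str) -> list[tuple[int, str, str]]:
    """Return (column, template_residue, target_residue) for each aligned
    column where both sequences have a residue and the residues differ."""
    a_seq, b_seq = template.rstrip("*"), target.rstrip("*")
    if len(a_seq) != len(b_seq):
        raise ValueError("aligned sequences must have equal length")
    return [
        (col, a, b)
        for col, (a, b) in enumerate(zip(a_seq, b_seq), start=1)
        if a != b and "-" not in (a, b)  # gap columns are not mismatches
    ]


# Shortened fragments of the two alignment entries, for illustration only.
print(mismatches("ALVRRK------GLPGK*", "ELVRRKSATDIGGLPGK*"))  # [(1, 'A', 'E')]
```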
> Thanks, and regards,
> Siddhartha
>
> On Wed, May 29, 2024 at 12:30 PM Modeller Caretaker <
> modeller-care@ucsf.edu> wrote:
>
>> On 5/28/24 11:48 PM, Siddhartha Barua via modeller_usage wrote:
>> > *KeyError: 'No such residue: 431:B'*
>>
>> Residues in the model are by default numbered starting at 1 and the
>> chains labeled alphabetically starting at A. Since you only have a
>> single chain, it will be labeled A, not B.
>> See https://salilab.org/modeller/10.5/manual/node23.html
>> If you want to number the residues differently, see
>> https://salilab.org/modeller/10.5/manual/node30.html
>>
>> It looks like you are mistakenly using the template residue numbering
>> here.
>>
>>         Ben Webb, Modeller Caretaker
>> --
>> modeller-care@ucsf.edu             https://salilab.org/modeller/
>> Modeller mail list: https://salilab.org/mailman/listinfo/modeller_usage
>>
>
>
> --
> Siddhartha A. Barua, Ph.D.
> Mb.: +91 7777093994