Profile file

The profile file is a plain text file with the following format:


# Number of sequences:      4
# Length of profile  :     20
# N_PROF_ITERATIONS  :      3
# GAP_PENALTIES_1D   :   -900.0   -50.0
# MATRIX_OFFSET      :    0.0
# RR_FILE            : ${MODINSTALLCVS}/modlib//as1.sim.mat
    1 2ctx                                     X     0    71     1    71     0     0     0    0.    0.0     IRCFITPDITS---KDCPN-
    2 2abx                                     X     0    74     1    74     0     0     0    0.    0.0     IVCHTTATIPS-SAVTCPPG
    3 2nbt                                     X     0    66     1    66     0     0     0    0.    0.0     RTCLISPSS---TPQTCPNG
    4 1fas                                     X     0    61     1    61     0     0     0    0.    0.0     TMCYSHTTTSRAILTNCG--

The first six lines begin with a '#' in the first column and give a few general details of the profile.

The first line gives the number of sequences in the profile. It must follow the Fortran format '(24x,i6)': 24 skipped characters followed by an integer in a field 6 characters wide.

The second line gives the number of positions in the profile, also in '(24x,i6)' format.

The third line gives the value of the n_prof_iterations variable. The fourth line gives the value of the gap_penalties_1d variable. The fifth line gives the value of the matrix_offset variable. The sixth line gives the value of the rr_file variable.

The number of sequences in the profile and its length are used to allocate memory for the profile arrays, so they should provide an accurate description of the profile.
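As an illustration of the '(24x,i6)' layout, here is a minimal Python sketch that reads the two counts from the first two header lines. It is not part of MODELLER; the function name read_profile_dimensions is hypothetical, and the sketch assumes the integer lies in columns 25 through 30, as the format string implies.

```python
def read_profile_dimensions(lines):
    """Return (number of sequences, profile length) from the header.

    Hypothetical helper, not a MODELLER function. Mimics the Fortran
    format '(24x,i6)': skip 24 characters, then read a 6-character
    integer field.
    """
    def read_count(line):
        if not line.startswith('#'):
            raise ValueError("header lines must start with '#'")
        # Columns 25-30 (1-based) are characters 24..29 (0-based).
        return int(line[24:30])

    return read_count(lines[0]), read_count(lines[1])


header = [
    "# Number of sequences:      4",
    "# Length of profile  :     20",
]
print(read_profile_dimensions(header))  # (4, 20)
```

A reader written this way can use the two values to size its own arrays before reading the alignment lines, which is why the counts need to match the file contents.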

MODELLER does not use the values on lines 3 through 6 internally, but the Profile.read() command still expects a total of six header lines. These records are informative when the profile was constructed with Profile.build().

The remaining lines contain the alignment of the sequences in the profile, one sequence per line, in the following format: '(i5,1x,a40,1x,a1,1x,7(i5,1x),f5.0,1x,g10.2,1x,32767a1)'

The various columns that precede the sequence are:

  1. The index number of the sequence in the profile.

  2. The code of the sequence (similar to Sequence.code).

  3. The type of sequence ('S' for sequence, 'X' for structure). This depends on the original source of the sequences. (See Alignment.to_profile() and SequenceDB.read()).

  4. The iteration in which the sequence was selected as significant. (See Profile.build()).

  5. The length of the database sequence.

  6. The starting position of the target sequence in the alignment.

  7. The ending position of the target sequence in the alignment.

  8. The starting position of the database sequence in the alignment.

  9. The ending position of the database sequence in the alignment.

  10. The number of equivalent positions in the alignment.

  11. The sequence identity between the target sequence and the database sequence.

  12. The e-value of the alignment. (See Profile.build()).

  13. The sequence alignment.

Many of the fields described above are meaningful only when the profile that is written out is the result of Profile.build().
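The fixed-width layout of the alignment lines can be sketched in Python as follows. This is a minimal illustration, not MODELLER code: the function name parse_profile_line and the dictionary keys are hypothetical, and the column offsets are derived directly from the format string '(i5,1x,a40,1x,a1,1x,7(i5,1x),f5.0,1x,g10.2,1x,32767a1)', assuming each line follows it exactly.

```python
def parse_profile_line(line):
    """Split one alignment line of a text profile into its 13 fields.

    Hypothetical helper; offsets follow the Fortran format
    (i5,1x,a40,1x,a1,1x,7(i5,1x),f5.0,1x,g10.2,1x,32767a1).
    """
    line = line.rstrip("\n")
    # Seven 5-character integers, each followed by one blank,
    # starting at 0-based offset 49.
    ints = [int(line[49 + 6 * i:54 + 6 * i]) for i in range(7)]
    return {
        "index": int(line[0:5]),
        "code": line[6:46].strip(),
        "type": line[47],                  # 'S' or 'X'
        "iteration": ints[0],
        "db_length": ints[1],
        "target_start": ints[2],
        "target_end": ints[3],
        "db_start": ints[4],
        "db_end": ints[5],
        "n_equivalent": ints[6],
        "identity": float(line[91:96]),
        "evalue": float(line[97:107]),
        "sequence": line[108:],
    }


# Build a line with the same field widths as the format string, using the
# values from the first sequence of the example profile above.
sample = (f"{1:5d} {'2ctx':<40} X "
          + "".join(f"{v:5d} " for v in (0, 71, 1, 71, 0, 0, 0))
          + f"{0.0:5.0f} {0.0:10.2g} "
          + "IRCFITPDITS---KDCPN-")

record = parse_profile_line(sample)
print(record["code"], record["type"], record["db_length"], record["sequence"])
```

Slicing by column rather than splitting on whitespace keeps the gap characters and any trailing blanks in the sequence field intact, and does not depend on the code field being free of spaces.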