File: mast.protein.tcm.txt

********************************************************************************
MAST - Motif Alignment and Search Tool
********************************************************************************
	MAST version 3.0 (Release date: 2004/08/18 09:07:01)

	For further information on how to interpret these results or to get
	a copy of the MAST software please access http://meme.sdsc.edu.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
	If you use this program in your research, please cite:

	Timothy L. Bailey and Michael Gribskov,
	"Combining evidence using p-values: application to sequence homology
	searches", Bioinformatics, 14(1):48-54, 1998.
********************************************************************************


********************************************************************************
DATABASE AND MOTIFS
********************************************************************************
	DATABASE farntrans5.s (peptide)
	Last updated on Mon Aug 16 21:19:59 2004
	Database contains 5 sequences, 1900 residues

	MOTIFS meme.farntrans5.tcm.txt (peptide)
	MOTIF WIDTH BEST POSSIBLE MATCH
	----- ----- -------------------
	  1    30   GGFQGRPNKEVHTCYTYWALAALAILNKLH
	  2    14   INKEKLIQWIKSCQ

	PAIRWISE MOTIF CORRELATIONS:
	MOTIF     1
	----- -----
	   2   0.22
	No overly similar pairs (correlation > 0.60) found.

	Random model letter frequencies (from non-redundant database):
	A 0.073 C 0.018 D 0.052 E 0.062 F 0.040 G 0.069 H 0.022 I 0.056 K 0.058 
	L 0.092 M 0.023 N 0.046 P 0.051 Q 0.041 R 0.052 S 0.074 T 0.059 V 0.064 
	W 0.013 Y 0.033 
********************************************************************************


********************************************************************************
SECTION I: HIGH-SCORING SEQUENCES
********************************************************************************
	- Each of the following 5 sequences has E-value less than 10.
	- The E-value of a sequence is the expected number of sequences
	  in a random database of the same size that would match the motifs as
	  well as the sequence does and is equal to the combined p-value of the
	  sequence times the number of sequences in the database.
	- The combined p-value of a sequence measures the strength of the
	  match of the sequence to all the motifs and is calculated by
	    o finding the score of the single best match of each motif
	      to the sequence (best matches may overlap),
	    o calculating the sequence p-value of each score,
	    o forming the product of the p-values,
	    o taking the p-value of the product.
	- The sequence p-value of a score is defined as the
	  probability of a random sequence of the same length containing
	  some match with as good or better a score.
	- The score for the match of a position in a sequence to a motif
	  is computed by summing the appropriate entry from each column of
	  the position-dependent scoring matrix that represents the motif.
	- Sequences shorter than one or more of the motifs are skipped.
	- The table is sorted by increasing E-value.
********************************************************************************
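
The last step in the list above, "taking the p-value of the product", can be sketched as follows. This is a minimal illustration, not MAST's own code: it assumes the per-motif sequence p-values are independent and uniformly distributed under the null model, in which case the probability that a product of n such values is at most x has the closed form x * sum_{i=0}^{n-1} (-ln x)^i / i!. The function name `pvalue_of_product` is hypothetical.

```python
import math


def pvalue_of_product(product: float, n: int) -> float:
    """Probability that the product of n independent uniform(0, 1)
    p-values is less than or equal to `product`.

    Assumption: independence and uniformity under the null model.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if product <= 0.0:
        return 0.0
    if product >= 1.0:
        return 1.0
    neg_log = -math.log(product)
    term = 1.0   # (-ln x)^0 / 0!
    total = 1.0
    for i in range(1, n):
        term *= neg_log / i   # builds (-ln x)^i / i! incrementally
        total += term
    return min(1.0, product * total)


if __name__ == "__main__":
    # Two motifs with hypothetical sequence p-values 1e-10 and 1e-8.
    print(pvalue_of_product(1e-10 * 1e-8, n=2))
```

With a single motif the function returns the p-value unchanged; with more motifs the result is always larger than the raw product, which is why the product itself cannot be used as a p-value.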

SEQUENCE NAME                      DESCRIPTION                   E-VALUE  LENGTH
-------------                      -----------                   -------- ------
BET2_YEAST                         YPT1/SEC4 PROTEINS GERANY...    2.9e-27    325
RATRABGERB                         Rat rab geranylgeranyl tr...    1.4e-25    331
CAL1_YEAST                         RAS PROTEINS GERANYLGERAN...    9.7e-22    376
PFTB_RAT                           PROTEIN FARNESYLTRANSFERA...    7.6e-21    437
RAM1_YEAST                         PROTEIN FARNESYLTRANSFERA...    6.2e-20    431

********************************************************************************
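
The relation stated in Section I (E-value = combined p-value times the number of sequences in the database) can be checked against this report. The sketch below copies the combined p-values from the per-sequence summary lines in Section III and the E-values from the table above; the names `e_value`, `COMBINED_P` and `REPORTED_E` are illustrative only.

```python
# Number of sequences in the database, from the DATABASE AND MOTIFS section.
N_SEQUENCES = 5

# Combined p-values as printed in the Section III summary lines.
COMBINED_P = {
    "BET2_YEAST": 5.77e-28,
    "RATRABGERB": 2.83e-26,
    "CAL1_YEAST": 1.94e-22,
    "PFTB_RAT": 1.53e-21,
    "RAM1_YEAST": 1.24e-20,
}

# E-values as printed in the Section I table (two significant digits).
REPORTED_E = {
    "BET2_YEAST": 2.9e-27,
    "RATRABGERB": 1.4e-25,
    "CAL1_YEAST": 9.7e-22,
    "PFTB_RAT": 7.6e-21,
    "RAM1_YEAST": 6.2e-20,
}


def e_value(combined_p_value: float, n_sequences: int) -> float:
    """E-value = combined p-value * number of sequences in the database."""
    return combined_p_value * n_sequences


if __name__ == "__main__":
    for name, p in COMBINED_P.items():
        computed = e_value(p, N_SEQUENCES)
        print(f"{name:<12} computed {computed:.3e}  reported {REPORTED_E[name]:.1e}")
```

Because the report prints rounded values, the computed and reported E-values agree only to about two significant digits.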



********************************************************************************
SECTION II: MOTIF DIAGRAMS
********************************************************************************
	- The ordering and spacing of all non-overlapping motif occurrences
	  are shown for each high-scoring sequence listed in Section I.
	- A motif occurrence is defined as a position in the sequence whose
	  match to the motif has POSITION p-value less than 0.0001.
	- The POSITION p-value of a match is the probability of
	  a single random subsequence of the length of the motif
	  scoring at least as well as the observed match.
	- For each sequence, all motif occurrences are shown unless there
	  are overlaps.  In that case, a motif occurrence is shown only if its
	  p-value is less than the product of the p-values of the other
	  (lower-numbered) motif occurrences that it overlaps.
	- The table also shows the E-value of each sequence.
	- Spacers and motif occurrences are indicated by
	   o -d-    `d' residues separate the end of the preceding motif 
		    occurrence and the start of the following motif occurrence
	   o [n]  occurrence of motif `n' with p-value less than 0.0001.
********************************************************************************

SEQUENCE NAME                      E-VALUE   MOTIF DIAGRAM
-------------                      --------  -------------
BET2_YEAST                          2.9e-27  6_[2]_3_[1]_1_[2]_4_[1]_4_[2]_
                                             3_[1]_1_[2]_3_[1]_21_[1]_1_[2]_
                                             4_[1]_24
RATRABGERB                          1.4e-25  65_[2]_3_[1]_1_[2]_3_[1]_1_[2]_
                                             3_[1]_18_[1]_1_[2]_4_[1]_26
CAL1_YEAST                          9.7e-22  125_[2]_50_[2]_1_[1]_4_[2]_22_
                                             [1]_22_[1]_5_[2]_1
PFTB_RAT                            7.6e-21  120_[2]_3_[1]_4_[2]_3_[1]_1_[2]_
                                             3_[1]_1_[2]_4_[1]_14_[2]_4_[1]_60
RAM1_YEAST                          6.2e-20  144_[1]_5_[2]_4_[1]_1_[2]_4_[1]_
                                             1_[2]_4_[1]_4_[2]_5_[1]_35_[2]_4

********************************************************************************
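
The diagram notation described in Section II can be read mechanically. The sketch below splits a diagram into spacers and motif occurrences and sums their widths, which should reproduce the sequence length. It assumes the peptide form used in this report, where occurrences are written `[n]` with no strand sign, and takes the motif widths (30 and 14) from the DATABASE AND MOTIFS section. The function names are hypothetical.

```python
import re

# Motif widths from the MOTIFS table of this report.
MOTIF_WIDTHS = {1: 30, 2: 14}

_MOTIF_TOKEN = re.compile(r"\[(\d+)\]")


def parse_diagram(diagram: str):
    """Return a list of ("spacer", d) and ("motif", n) items.

    Whitespace is removed first, so a diagram wrapped over several
    lines can be passed in as one string.
    """
    items = []
    for token in "".join(diagram.split()).split("_"):
        if not token:
            continue  # tolerate a trailing underscore left by line wrapping
        match = _MOTIF_TOKEN.fullmatch(token)
        if match:
            items.append(("motif", int(match.group(1))))
        else:
            items.append(("spacer", int(token)))
    return items


def diagram_length(diagram: str, widths=MOTIF_WIDTHS) -> int:
    """Total residues covered: spacer lengths plus motif widths."""
    return sum(
        widths[value] if kind == "motif" else value
        for kind, value in parse_diagram(diagram)
    )


if __name__ == "__main__":
    bet2 = "6_[2]_3_[1]_1_[2]_4_[1]_4_[2]_3_[1]_1_[2]_3_[1]_21_[1]_1_[2]_4_[1]_24"
    print(diagram_length(bet2))
```

Comparing the result with the LENGTH column in Section I is a quick consistency check on a parsed diagram.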



********************************************************************************
SECTION III: ANNOTATED SEQUENCES
********************************************************************************
	- The positions and p-values of the non-overlapping motif occurrences
	  are shown above the actual sequence for each of the high-scoring
	  sequences from Section I.
	- A motif occurrence is defined as a position in the sequence whose
	  match to the motif has POSITION p-value less than 0.0001 as 
	  defined in Section II.
	- For each sequence, the first line specifies the name of the sequence.
	- The second (and possibly more) lines give a description of the 
	  sequence.
	- Following the description line(s) is a line giving the length, 
	  combined p-value, and E-value of the sequence as defined in Section I.
	- The next line reproduces the motif diagram from Section II.
	- The entire sequence is printed on the following lines.
	- Motif occurrences are indicated directly above their positions in the
	  sequence on lines showing
	   o the motif number of the occurrence,
	   o the position p-value of the occurrence,
	   o the best possible match to the motif, and
	   o columns whose match to the motif has a positive score (indicated 
	     by a plus sign).
********************************************************************************
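
The summary line that follows each description (length, combined p-value, E-value) has a fixed layout and can be extracted with a regular expression. This is a minimal sketch based only on the lines in this report; `SUMMARY_RE` and `parse_summary` are hypothetical names, and other MAST versions may format the line differently.

```python
import re

# Matches e.g. "  LENGTH = 325  COMBINED P-VALUE = 5.77e-28  E-VALUE =  2.9e-27"
SUMMARY_RE = re.compile(
    r"LENGTH\s*=\s*(\d+)\s+"
    r"COMBINED P-VALUE\s*=\s*(\S+)\s+"
    r"E-VALUE\s*=\s*(\S+)"
)


def parse_summary(line: str):
    """Return (length, combined_p_value, e_value) from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    line = "  LENGTH = 325  COMBINED P-VALUE = 5.77e-28  E-VALUE =  2.9e-27"
    print(parse_summary(line))
```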


BET2_YEAST
  YPT1/SEC4 PROTEINS GERANYLGERANYLTRANSFERASE BETA SUBUNIT (EC 2.
  LENGTH = 325  COMBINED P-VALUE = 5.77e-28  E-VALUE =  2.9e-27
  DIAGRAM: 6_[2]_3_[1]_1_[2]_4_[1]_4_[2]_3_[1]_1_[2]_3_[1]_21_[1]_1_[2]_4_[1]_24

           [2]              [1]                            [2]               [1]
           5.2e-05          2.7e-10                        6.6e-10           5.9
           INKEKLIQWIKSCQ   GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ    GGF
             +++ +++++++     ++  + +++ +++ +++++ +++++++ + ++++++++++++++    + +
1    MSGSLTLLKEKHIRYIESLDTNKHNFEYWLTEHLRLNGIYWGLTALCVLDSPETFVKEEVISFVLSCWDDKYGAF

                                    [2]              [1]
     e-14                           4.8e-07          2.3e-17
     QGRPNKEVHTCYTYWALAALAILNKLH    INKEKLIQWIKSCQ   GGFQGRPNKEVHTCYTYWALAALAILN
     +  +++++++  +  ++++++ +++++    +++ +++++++ ++   + ++++  +++++++ +++++++++++
76   APFPRHDAHLLTTLSAVQILATYDALDVLGKDRKVRLISFIRGNQLEDGSFQGDRFGEVDTRFVYTALSALSILG

         [2]              [1]                                                [1]
         5.1e-07          1.4e-18                                            4.6
     KLH INKEKLIQWIKSCQ   GGFQGRPNKEVHTCYTYWALAALAILNKLH                     GGF
     +++ ++++ ++++++++    ++++  ++++++++++++++++++++++++                     +++
151  ELTSEVVDPAVDFVLKCYNFDGGFGLCPNAESHAAQAFTCLGALAIANKLDMLSDDQLEEIGWWLCERQLPEGGL

                                 [2]               [1]
     e-22                        2.0e-13           3.8e-17
     QGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ    GGFQGRPNKEVHTCYTYWALAALAILNKL
     ++++ ++++++++++++++++++++++ ++ +++++++++++    ++++++++++++++++ ++++++++++ +
226  NGRPSKLPDVCYSWWVLSSLAIIGRLDWINYEKLTEFILKCQDEKKGGISDRPENEVDVFHTVFGVAGLSLMGYD

     
     
     H
     +
301  NLVPIDPIYCMPKSVTSKFKKYPYK


RATRABGERB
  Rat rab geranylgeranyl transferase beta-subunit
  LENGTH = 331  COMBINED P-VALUE = 2.83e-26  E-VALUE =  1.4e-25
  DIAGRAM: 65_[2]_3_[1]_1_[2]_3_[1]_1_[2]_3_[1]_18_[1]_1_[2]_4_[1]_26

                                                                      [2]
                                                                      1.0e-11
                                                                      INKEKLIQWI
                                                                      +++++++ ++
1    MGTQQKDVTIKSDAPDTLLLEKHADYIASYGSKKDDYEYCMSEYLRMSGVYWGLTVMDLMGQLHRMNKEEILVFI

            [1]                            [2]              [1]
            1.6e-14                        1.4e-09          5.4e-19
     KSCQ   GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ   GGFQGRPNKEVHTCYTYWAL
     ++++   ++ +  +++++++ ++  ++++++++++++ +++++++ ++++++   + ++++++++++++++++++
76   KSCQHECGGVSASIGHDPHLLYTLSAVQILTLYDSIHVINVDKVVAYVQSLQKEDGSFAGDIWGEIDTRFSFCAV

                [2]              [1]
                3.8e-12          4.8e-19
     AALAILNKLH INKEKLIQWIKSCQ   GGFQGRPNKEVHTCYTYWALAALAILNKLH
     ++++++++++ ++++++++++++++   ++++++++ ++++++++++++ ++++++++
151  ATLALLGKLDAINVEKAIEFVLSCMNFDGGFGCRPGSESHAGQIYCCTGFLAITSQLHQVNSDLLGWWLCERQLP

      [1]                            [2]               [1]
      3.9e-21                        1.2e-12           9.6e-18
      GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ    GGFQGRPNKEVHTCYTYWALAALAI
      +++++++++++++++++++++++ ++++++ ++++++++++++++    ++++++++++++ +++ ++++++++
226  SGGLNGRPEKLPDVCYSWWVLASLKIIGRLHWIDREKLRSFILACQDEETGGFADRPGDMVDPFHTLFGIAGLSL

     
     
     LNKLH
     +++++
301  LGEEQIKPVSPVFCMPEEVLQRVNVQPELVS


CAL1_YEAST
  RAS PROTEINS GERANYLGERANYLTRANSFERASE (EC 2.5.1.-) (PROTEIN GER
  LENGTH = 376  COMBINED P-VALUE = 1.94e-22  E-VALUE =  9.7e-22
  DIAGRAM: 125_[2]_50_[2]_1_[1]_4_[2]_22_[1]_22_[1]_5_[2]_1


                                                       [2]
                                                       1.8e-08
                                                       INKEKLIQWIKSCQ
                                                        +++++++++++++
76   LDDTENTVISGFVGSLVMNIPHATTINLPNTLFALLSMIMLRDYEYFETILDKRSLARFVSKCQRPDRGSFVSCL

                                            [2]            [1]
                                            4.8e-10        8.7e-14
                                            INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALA
                                            +++++++ ++++++  + + + +++++ +++ ++++
151  DYKTNCGSSVDSDDLRFCYIAVAILYICGCRSKEDFDEYIDTEKLLGYIMSQQCYNGAFGAHNEPHSGYTSCALS

                  [2]                                 [1]
                  5.9e-08                             5.9e-20
     ALAILNKLH    INKEKLIQWIKSCQ                      GGFQGRPNKEVHTCYTYWALAALAIL
     +++++++++      ++++++++++++                      +++++++++ +++++++++++++ ++
226  TLALLSSLEKLSDKFKEDTITWLLHRQVSSHGCMKFESELNASYDQSDDGGFQGRENKFADTCYAFWCLNSLHLL

                               [1]                                [2]
                               4.0e-13                            2.1e-07
     NKLH                      GGFQGRPNKEVHTCYTYWALAALAILNKLH     INKEKLIQWIKSCQ
     ++++                      ++++ + ++++++++++ + +++++++        + ++ +++++++++
301  TKDWKMLCQTELVTNYLLDRTQKTLTGGFSKNDEEDADLYHSCLGSAALALIEGKFNGELCIPQEIFNDFSKRCC


PFTB_RAT
  PROTEIN FARNESYLTRANSFERASE BETA SUBUNIT (EC 2.5.1.-) (CAAX FARNES
  LENGTH = 437  COMBINED P-VALUE = 1.53e-21  E-VALUE =  7.6e-21
  DIAGRAM: 120_[2]_3_[1]_4_[2]_3_[1]_1_[2]_3_[1]_1_[2]_4_[1]_14_[2]_4_[1]_60


                                                  [2]              [1]
                                                  1.3e-07          2.8e-19
                                                  INKEKLIQWIKSCQ   GGFQGRPNKEVHT
                                                  ++ ++++++++ ++   +++++++++ +++
76   EKHFHYLKRGLRQLTDAYECLDASRPWLCYWILHSLELLDEPIPQIVATDVCQFLELCQSPDGGFGGGPGQYPHL

                          [2]              [1]                            [2]
                          2.3e-09          2.1e-14                        1.8e-0
     CYTYWALAALAILNKLH    INKEKLIQWIKSCQ   GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKL
     + +++++++++++++++    ++++++++++ +++   + + ++ +++++++ +++++++++++++++ +  +++
151  APTYAAVNALCIIGTEEAYNVINREKLLQYLYSLKQPDGSFLMHVGGEVDVRSAYCAASVASLTNIITPDLFEGT

                [1]                            [2]               [1]
     8          7.4e-20                        1.8e-08           2.2e-16
     IQWIKSCQ   GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ    GGFQGRPNKEVHTCY
     ++++ +++   +++++ +++++++++++++++++ ++++++  + +++++++++++    ++++++ ++++++++
226  AEWIARCQNWEGGIGGVPGMEAHGGYTFCGLAALVILKKERSLNLKSLLQWVTSRQMRFEGGFQGRCNKLVDGCY

                                  [2]               [1]
                                  5.0e-08           3.1e-15
     TYWALAALAILNKLH              INKEKLIQWIKSCQ    GGFQGRPNKEVHTCYTYWALAALAILNK
     ++++++ + ++++                ++++++++++++++    +++ +++++  +++++++++++++++++
301  SFWQAGLLPLLHRALHAQGDPALSMSHWMFHQQALQEYILMCCQCPAGGLLDKPGKSRDFYHTCYCLSGLSIAQH

     
     
     LH
     +
376  FGSGAMLHDVVMGVPENVLQPTHPVYNIGPDKVIQATTHFLQKPVPGFEECEDAVTSDPATD


RAM1_YEAST
  PROTEIN FARNESYLTRANSFERASE BETA SUBUNIT (EC 2.5.1.-) (CAAX FARN
  LENGTH = 431  COMBINED P-VALUE = 1.24e-20  E-VALUE =  6.2e-20
  DIAGRAM: 144_[1]_5_[2]_4_[1]_1_[2]_4_[1]_1_[2]_4_[1]_4_[2]_5_[1]_35_[2]_4


                                                                          [1]
                                                                          8.8e-1
                                                                          GGFQGR
                                                                          + ++++
76   PALTKEFHKMYLDVAFEISLPPQMTALDASQPWMLYWIANSLKVMDRDWLSDDTKRKIVVKLFTISPSGGPFGGG

                                  [2]               [1]
     7                            6.4e-07           1.0e-13
     PNKEVHTCYTYWALAALAILNKLH     INKEKLIQWIKSCQ    GGFQGRPNKEVHTCYTYWALAALAILNK
     ++++++++ ++++++++++ ++++     ++++++++++ +++    +  ++ ++++++++ +++++++++++++
151  PGQLSHLASTYAAINALSLCDNIDGCWDRIDRKGIYQWLISLKEPNGGFKTCLEVGEVDTRGIYCALSIATLLNI

        [2]               [1]                            [2]               [1]
        2.5e-08           3.1e-17                        4.7e-11           2.4e-
     LH INKEKLIQWIKSCQ    GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ    GGFQG
     ++ + ++++++++++++    +  ++  +++++++++++++++++++++++ ++++++++++++++     ++ +
226  LTEELTEGVLNYLKNCQNYEGGFGSCPHVDEAHGGYTFCATASLAILRSMDQINVEKLLEWSSARQLQEERGFCG

                                  [2]                [1]
     16                           4.9e-09            2.7e-13
     RPNKEVHTCYTYWALAALAILNKLH    INKEKLIQWIKSCQ     GGFQGRPNKEVHTCYTYWALAALAILN
     + ++++++++++++ +++++++++     +++++++++++ ++      +++++++++++++++ +++ ++++++
301  RSNKLVDGCYSFWVGGSAAILEAFGYGQCFNKHALRDYILYCCQEKEQPGLRDKPGAHSDFYHTNYCLLGLAVAE

                                           [2]
                                           9.8e-05
     KLH                                   INKEKLIQWIKSCQ
     +                                     +++++++ + +++
376  SSYSCTPNDSPHNIKCTPDRLIGSSKLTDVNPVYGLPIENVRKIIHYFKSNLSSPS

********************************************************************************


CPU: pmgm2
Time 0.130000 secs.

mast meme.farntrans5.tcm.txt -text -stdout