<TITLE>Blocks Format</TITLE>
<H1><IMG SRC="/blocks/icons/small-blocks.xbm"> Format of a Block</H1>

<!-- PRE is required to preserve multiple spaces, etc. -->
<PRE>
ID   short_identifier; BLOCK
AC   block_number; distance from previous block = (<I>min,max</I>)
DE   description
BL   <I>xxx</I> motif; width=<I>w</I>; seqs=<I>s</I>; 99.5%=<I>n1</I>; strength=<I>n2</I>
sequence_id  (offset) sequence_segment  sequence_weight
.
.
.
//</PRE><HR>

ID line starts a block entry and contains a short identifier for the group
of sequences from which the block was made.
If the block was taken from InterPro, the identifier is the InterPro group ID.
The identifier is terminated by a semicolon, and the word "BLOCK"
indicates the entry type.<P>

AC line contains the block number, a seven-character group number for
sequences from which the block was made, followed by a letter (A-Z) 
indicating the order of the block in the sequences.
If the group has only one block, the letter is omitted.
If the block was made from InterPro group IPRnnnnnn, the block number
is IPBnnnnnna.
If the block was converted from Terri Attwood's Prints Database, the
block number is PRnnnnna. 
<I>min,max</I> =  minimum,maximum number of amino acids from the previous block
for sequences in this block. For the first block in the group, this is the
distance from the beginning of the sequences.<P>
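As an illustration, the AC line can be split into its parts with a short Python sketch. The function name parse_ac_line and the sample lines in the usage note are hypothetical and are not part of the Blocks software; the pattern assumes the layout shown in the template at the top of this page.

```python
import re

# Assumed layout, taken from the template above:
#   AC   block_number; distance from previous block = (min,max)
AC_PATTERN = re.compile(
    r"^AC\s+(?P<block>\w+);\s*distance from previous block\s*=\s*"
    r"\(\s*(?P<min>-?\d+)\s*,\s*(?P<max>-?\d+)\s*\)"
)


def parse_ac_line(line):
    """Return (block_number, group, order_letter, min_dist, max_dist)."""
    m = AC_PATTERN.match(line)
    if m is None:
        raise ValueError("not an AC line: %r" % line)
    block = m.group("block")
    # A trailing letter gives the order of the block within the group;
    # it is omitted when the group has only one block.
    if block[-1].isalpha():
        group, letter = block[:-1], block[-1]
    else:
        group, letter = block, None
    return block, group, letter, int(m.group("min")), int(m.group("max"))
```

For a made-up line such as "AC   IPB000001A; distance from previous block = (10,25)", the sketch separates the group IPB000001 from the order letter A and returns the distances as integers.<P>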

DE line contains a description of the group of sequences from which
the block was made. If the block was taken from InterPro, it will be
a slightly edited version of the InterPro description.<P>

BL line contains information about the block:<BR>
<I>xxx</I> =  the amino acids in the spaced triplet found by MOTIF upon
which the block is based.<BR>
<I>w</I> =  width of the sequence segments (columns) in the block.<BR>
<I>s</I> =  number of sequence segments (rows) in the block.<BR>
<I>n1</I> =  raw calibration score; 99.5th percentile score of true
negative sequences. Raw search scores are normalized by
dividing by this score and multiplying by 1000.<BR>
<I>n2</I> =  median normalized score of known true positive sequences as
documented in InterPro.<P>
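The BL fields and the score normalization described above can be sketched in Python as follows. The names parse_bl_line and normalize_score are hypothetical, and the sketch assumes that all numeric fields on the BL line are integers.

```python
def parse_bl_line(line):
    """Parse 'BL   xxx motif; width=w; seqs=s; 99.5%=n1; strength=n2'.

    Returns a dict keyed by the field names as they appear on the line.
    Assumption: every 'key=value' field holds an integer.
    """
    if not line.startswith("BL"):
        raise ValueError("not a BL line: %r" % line)
    fields = {}
    for part in line[2:].split(";"):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = int(value)
        elif part.endswith("motif"):
            # The spaced triplet precedes the word "motif".
            fields["motif"] = part[: -len("motif")].strip()
    return fields


def normalize_score(raw_score, n1):
    """Divide a raw search score by the calibration score n1, times 1000."""
    return 1000.0 * raw_score / n1
```

With this normalization, a raw score equal to the calibration score <I>n1</I> maps to 1000, so normalized scores above 1000 exceed the 99.5th percentile of true negative sequences.<P>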

Following the BL line are lines for each sequence with
a segment in the block. The segments may be clustered, with clusters
separated by blank lines.
Each segment line contains a sequence identifier,
the offset from the beginning of the sequence
to the block in parentheses, the sequence segment, and a weight for the
segment. The weights are normalized so that
the most distant segment has a weight of 100.<P>
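A minimal Python sketch of reading the segment lines into clusters is given below. The name parse_segments is hypothetical, and the pattern assumes that the identifier and the segment contain no white space and that the offset may be padded with spaces inside the parentheses.

```python
import re

# sequence_id  (offset) sequence_segment  sequence_weight
SEGMENT = re.compile(r"^(\S+)\s+\(\s*(\d+)\s*\)\s+(\S+)\s+(\S+)\s*$")


def parse_segments(lines):
    """Group segment lines into clusters separated by blank lines.

    Returns a list of clusters; each cluster is a list of
    (sequence_id, offset, segment, weight) tuples.
    """
    clusters, current = [], []
    for line in lines:
        if not line.strip():
            # A blank line closes the current cluster, if any.
            if current:
                clusters.append(current)
                current = []
            continue
        m = SEGMENT.match(line)
        if m is None:
            raise ValueError("not a segment line: %r" % line)
        seq_id, offset, segment, weight = m.groups()
        current.append((seq_id, int(offset), segment, float(weight)))
    if current:
        clusters.append(current)
    return clusters
```

The caller is expected to pass only the lines between the BL line and the terminating // line.<P>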

// line terminates a block entry.
<P>
<A HREF="/blocks/help/blocks_release.html">Current Blocks Database Release</A>
<P>
<A HREF="/blocks/help/about_blocks.html">About the Blocks Database</A>
<HR>
<H2>Other Multiple Alignment Formats</H2>
<H3><A NAME="fasta">FASTA Format</A><BR></H3>
Each sequence in the multiple alignment starts with a FASTA
title line containing the sequence name followed by the
aligned sequence residues with dashes representing gaps:
<PRE>
>JC2395
NVSDVNLNK---YIWRTAEKMK---ICDAKKFARQHKIPESKIDEIEHNSPQDAAE----
-------------------------QKIQLLQCWYQSHGKT--GACQALIQGLRKANRCD
IAEEIQAM
>KPEL_DROME
MAIRLLPLPVRAQLCAHLDAL-----DVWQQLATAVKLYPDQVEQISSQKQRGRS-----
-------------------------ASNEFLNIWGGQYN----HTVQTLFALFKKLKLHN
AMRLIKDY
>FASA_MOUSE
NASNLSLSK---YIPRIAEDMT---IQEAKKFARENNIKEGKIDEIMHDSIQDTAE----
-------------------------QKVQLLLCWYQSHGKS--DAYQDLIKGLKKAECRR
TLDKFQDM
</PRE>
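Reading an alignment in this form can be sketched in Python as follows. The name parse_aligned_fasta is hypothetical; the sketch takes the first word of each title line as the sequence name and joins the residue lines that follow it, keeping the dashes.

```python
def parse_aligned_fasta(text):
    """Return a list of (name, aligned_sequence) pairs, keeping '-' gaps."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record.
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_width(records):
    """Return the common aligned length, or raise if the rows differ."""
    widths = {len(seq) for _, seq in records}
    if len(widths) != 1:
        raise ValueError("sequences are not aligned to the same length")
    return widths.pop()
```

Checking that all rows have the same length is a cheap way to detect a truncated or unaligned input before further processing.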

<P><HR>
<H3><A NAME="clustal">CLUSTAL/STOCKHOLM Format</A></H3>
<A HREF="http://www2.ebi.ac.uk/clustalw">ClustalW Site</A>.<BR>
The first non-blank line must contain the word "CLUSTAL" or "STOCKHOLM".
Sequences are interleaved on separate lines with gaps represented
by dashes. Each sequence line starts with the sequence name which is
separated from the aligned sequence residues by spaces or tabs.
Each set of interleaved sequence segments is separated by one or more
blank lines. Lines containing sequence conservation symbols (CLUSTAL)
or "//" (STOCKHOLM) are ignored.
<BR>
(Please note: some WWW sites post-process Clustal output so that
its format differs from this example; in that case, use
FASTA format instead.)
<PRE>
CLUSTAL W(1.60) multiple sequence alignment



JC2395          NVSDVNLNK---YIWRTAEKMK---ICDAKKFARQHKIPESKIDEIEHNSPQDAAE----
KPEL_DROME      MAIRLLPLPVRAQLCAHLDAL-----DVWQQLATAVKLYPDQVEQISSQKQRGRS-----
FASA_MOUSE      NASNLSLSK---YIPRIAEDMT---IQEAKKFARENNIKEGKIDEIMHDSIQDTAE----


JC2395          -------------------------QKIQLLQCWYQSHGKT--GACQALIQGLRKANRCD
KPEL_DROME      -------------------------ASNEFLNIWGGQYN----HTVQTLFALFKKLKLHN
FASA_MOUSE      -------------------------QKVQLLLCWYQSHGKS--DAYQDLIKGLKKAECRR


JC2395          IAEEIQAM
KPEL_DROME      AMRLIKDY
FASA_MOUSE      TLDKFQDM



</PRE>
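The rules above can be sketched as a small Python reader. The name parse_clustal is hypothetical. Two details are assumptions that go beyond the description: conservation lines are recognized because they start with white space, and lines starting with "#" are skipped as Stockholm markup.

```python
def parse_clustal(text):
    """Return {name: aligned_sequence} from interleaved CLUSTAL/STOCKHOLM text."""
    body = iter(text.splitlines())
    # The first non-blank line must contain "CLUSTAL" or "STOCKHOLM".
    for line in body:
        if line.strip():
            if "CLUSTAL" not in line and "STOCKHOLM" not in line:
                raise ValueError("missing CLUSTAL/STOCKHOLM header")
            break
    else:
        raise ValueError("empty input")

    seqs = {}  # dicts keep insertion order in Python 3.7+
    for line in body:
        if not line.strip() or line.startswith("//"):
            continue
        # Assumption: conservation lines begin with white space,
        # and '#' introduces Stockholm markup lines.
        if line[0] in " \t#":
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, segment = parts[0], parts[1]
        # Segments of the same sequence are concatenated across sets.
        seqs[name] = seqs.get(name, "") + segment
    return seqs
```

Applied to the example above, the reader would return one entry per sequence name, each holding the three interleaved segments joined into a single gapped string.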

<P><HR>
<H3><A NAME="msf">MSF Format</A></H3>
Any comments at the beginning of the file are terminated with a line
starting with two slashes.
Sequences are interleaved on separate lines with gaps represented
by periods. Each sequence line starts with the sequence name which is
separated from the aligned sequence residues by white space:
<PRE>
//


                1                                                   50
JC2395          NVSDVNLNK. ..YIWRTAEK MK...ICDAK KFARQHKIPE SKIDEIEHNS 
KPEL_DROME      MAIRLLPLPV RAQLCAHLDA L.....DVWQ QLATAVKLYP DQVEQISSQK 
FASA_MOUSE      NASNLSLSK. ..YIPRIAED MT...IQEAK KFARENNIKE GKIDEIMHDS

                51                                                 100
JC2395          PQDAAE.... .......... .......... .....QKIQL LQCWYQSHGK
KPEL_DROME      QRGRS..... .......... .......... .....ASNEF LNIWGGQYN.
FASA_MOUSE      IQDTAE.... .......... .......... .....QKVQL LLCWYQSHGK

                101
JC2395          T..GACQALI QGLRKANRCD IAEEIQAM
KPEL_DROME      ...HTVQTLF ALFKKLKLHN AMRLIKDY
FASA_MOUSE      S..DAYQDLI KGLKKAECRR TLDKFQDM


</PRE>

<P><HR>
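As a sketch of how an MSF alignment of the kind described above might be read in Python: the name parse_msf is hypothetical, lines consisting only of numbers are treated as position rulers, and periods are converted to dashes so that the result matches the FASTA gap convention (a choice made for this example only).

```python
def parse_msf(text):
    """Return {name: aligned_sequence} from MSF text, with '.' gaps as '-'."""
    lines = text.splitlines()
    # Comments end at the first line starting with two slashes.
    start = None
    for i, line in enumerate(lines):
        if line.startswith("//"):
            start = i + 1
            break
    if start is None:
        raise ValueError("no '//' line found")

    seqs = {}
    for line in lines[start:]:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line, or a ruler with a single number
        if all(p.isdigit() for p in parts):
            continue  # position ruler such as "1 ... 50"
        name = parts[0]
        # Residues come in white-space separated groups; join them.
        segment = "".join(parts[1:]).replace(".", "-")
        seqs[name] = seqs.get(name, "") + segment
    return seqs
```

Because the residue groups are joined before concatenation, the spacing between the groups of ten does not affect the result.
<P>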
<A href="/blocks">[Blocks home]</A>