File: blocks_release.html

<HTML>
<TITLE>Current BLOCKS Database Release</TITLE>

<H1><IMG SRC="/blocks/icons/small-blocks.xbm">Current Release of BLOCKS</H1>

<P ALIGN=CENTER>
               BLOCKS Database Version 14.2, March 2006<BR>
       Copyright 2006 by Fred Hutchinson Cancer Research Center<BR>
              1100 Fairview AV N, A1-162, Seattle, WA 98109

<P>
Version 14.2 of the BLOCKS Database consists of 29,767 blocks representing
6149 groups documented in InterPro 12.0, keyed to SWISS-PROT 48.3 and
TrEMBL 31.3, and obtained from the
<A HREF="http://www.ebi.ac.uk/interpro/">InterPro server</A>.<P>
The BLOCKS Database is based on InterPro entries with sequences from
<A HREF="http://www.expasy.org/sprot/">SWISS-PROT</A>
and
<A HREF="http://www.ebi.ac.uk/trembl/">TrEMBL</A>
and with cross-references to
<A HREF="http://www.expasy.ch/prosite/">PROSITE</A> 
and/or
<A HREF="http://www.bioinf.man.ac.uk/dbbrowser/PRINTS">PRINTS</A>
and/or
<A HREF="http://smart.embl-heidelberg.de">SMART</A>
and/or
<A HREF="http://pfam.wustl.edu">PFAM</A>
and/or
<A HREF="http://prodes.toulouse.inra.fr/prodom/current/html/home.php">ProDom</A>
entries.
<P>
The BLOCKS Database was constructed by the PROTOMAT system 
(S Henikoff &amp; JG Henikoff, "Automated assembly of protein blocks for
database searching", NAR (1991) 19:6565-6572)
using the MOTIF algorithm
(HO Smith, et al, "Finding sequence motifs in groups of functionally
related proteins", PNAS (1990) 87:826-830)
as implemented in
<A HREF="/blocks/make_blocks.html">Block Maker</A>.
<P>
To avoid using possible false positive sequences added to the
InterPro entries automatically (without human oversight),
BLOCKS were made for each InterPro entry using just the sequences
in SWISS-PROT, and then TrEMBL sequences were added if they fit the
resulting BLOCKS model.
<P>
Version 14.2 is an incremental update: sequences were added to 5261
entries carried over from BLOCKS 14.1, 33 entries were dropped, and
449 entries were added.
InterPro 12.0 consisted of 12,542 entries. The 6149 of these that are
represented in BLOCKS 14.2 were selected as follows:<BR>
<PRE>

12542
-2850 entries with no PROSITE, PRINTS, SMART, PFAM or ProDom component (1)
-3104 entries with fewer than 3 SWISS-PROT sequences eligible for PROTOMAT (2)
 -293 entries participating in InterPro parent/child relationships (3)
  -29 entries with too many sequences to process with PROTOMAT
  -12 entries for which PROTOMAT failed to find blocks
 -105 entries for which final blocks were obviously useless (5)
 6149

<A HREF="/blocks/help/names.prints">
 1372 blocks entries taken from PRINTS</A> (4)

<A HREF="/blocks/help/names.protomat">
 4777 blocks entries made by PROTOMAT
</A>

NOTES:

(1) InterPro now contains entries from several other sources. However,
these five sources tend to define a protein family in the terms most
amenable to the BLOCKS model, which consists of short, highly conserved
regions. In particular, PROTOMAT will generally produce unsatisfactory
results for groups composed of a few long, globally alignable sequences.

(2) PROTOMAT requires at least 3 sequences to make blocks. To be more
confident that the sequences used are actually members of the InterPro
protein family, we used only sequences from SWISS-PROT. Then, to reduce
redundancy, we used only the longest SWISS-PROT sequence among those
with the same gene name (characters before the "_" in the SWISS-PROT ID)
and similar organism name (first three characters following the "_").
For example, if an InterPro group included SWISS-PROT sequences named
AANT_HDVAM|P25989     LENGTH=214
AANT_HDVD3|P29996     LENGTH=195
AANT_HDVWO|P29997     LENGTH=205
then only AANT_HDVAM would be used by PROTOMAT.

(3) Several InterPro entries are arranged into parent/child hierarchies
where all the sequences in a child entry are included in the parent
entry. Since PROTOMAT will tend to find the same blocks for the parent
and children, each major branch of a hierarchy is represented by only
one BLOCKS entry.

(4) Because the PRINTS model is the same as the BLOCKS model and PRINTS is
a curated collection of alignments, the PRINTS blocks were used directly 
for InterPro entries with only a PRINTS component as long as the PRINTS blocks 
had at least three sequences from any source. Then additional sequences
were added from TrEMBL if they fit the PRINTS model.

(5) These entries tend to be sites (e.g. IPR000886, IPB001216),
repeats (e.g. IPR000479, IPR001473) and viral proteins (e.g. IPR000208,
IPR000752).
</PRE>
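<P>
The redundancy filter described in note (2) can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not
PROTOMAT's actual code: the function name, the (ID, length) input format and
the tie-breaking rule are all hypothetical, and the accession numbers from
the example are left out.
<PRE>
```python
# Hypothetical sketch of the redundancy filter in note (2).
# Not PROTOMAT's actual code; names and input format are assumptions.

def longest_per_gene_and_organism(entries):
    """Keep the longest sequence per (gene name, organism prefix) pair.

    entries: iterable of (swissprot_id, length) tuples, where
    swissprot_id looks like "AANT_HDVAM".
    """
    best = {}
    for seq_id, length in entries:
        # Gene name: characters before the "_" in the SWISS-PROT ID.
        gene, _, organism = seq_id.partition("_")
        # "Similar" organism: same first three characters after the "_".
        key = (gene, organism[:3])
        # On a tie the first entry seen is kept (an assumption; the
        # source does not say how ties are resolved).
        if key not in best or length > best[key][1]:
            best[key] = (seq_id, length)
    return [seq_id for seq_id, _ in best.values()]


group = [
    ("AANT_HDVAM", 214),
    ("AANT_HDVD3", 195),
    ("AANT_HDVWO", 205),
]
print(longest_per_gene_and_organism(group))  # ['AANT_HDVAM']
```
</PRE>
<P>
All three example IDs share the gene name AANT and the organism prefix HDV,
so only the longest of them, AANT_HDVAM, is retained.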
<P>
<I>Please note:</I> The PROSITE pattern is not used in any way to make the
BLOCKS Database, and BLOCKS made from an InterPro PROSITE group may or
may not contain the PROSITE pattern.
Similarly, the SMART and PFAM multiple alignments are not used
in any way to make the BLOCKS Database, and BLOCKS made from an
InterPro PROSITE, SMART or PFAM group may or may not overlap with the
multiple alignments in those databases.
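<P>
The selection tally for Version 14.2 given earlier on this page can be
checked with a short Python sketch. The figures are copied from the table;
the variable names and shortened category labels are illustrative only.
<PRE>
```python
# Figures copied from the selection table; names are illustrative.
interpro_total = 12542
excluded = {
    "no PROSITE, PRINTS, SMART, PFAM or ProDom component": 2850,
    "fewer than 3 eligible SWISS-PROT sequences": 3104,
    "parent/child relationships": 293,
    "too many sequences for PROTOMAT": 29,
    "PROTOMAT found no blocks": 12,
    "final blocks obviously useless": 105,
}
remaining = interpro_total - sum(excluded.values())

# The remaining entries split into PRINTS-derived and PROTOMAT-made.
from_prints = 1372
from_protomat = 4777

assert remaining == 6149
assert from_prints + from_protomat == remaining
print(remaining)  # 6149
```
</PRE>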

<HR>
<H1><A NAME="MINUS">BLOCKS without compositionally biased blocks</A></H1>
To avoid the over-representation of compositionally biased blocks in
search results, this subset of the BLOCKS Database excludes several
<A HREF="biased_list.html">biased blocks</A>. It may give better results,
especially with DNA queries.

<HR>
<A href="/blocks">BLOCKS home</A>

<HR>
<A href="contact.html">Contact us</A> <P>
Page last modified <MODIFICATION_DATE>March 2006</MODIFICATION_DATE>
</HTML>