From phd@EMBL-Heidelberg.de Wed Nov 25 10:24:25 1998
Date: Tue, 24 Nov 1998 17:45:25 +0100
From: Protein Prediction <phd@EMBL-Heidelberg.de>
To: eric.beitz@uni-tuebingen.de
Subject: PredictProtein
The following information has been received by the server:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________
reference predict_h25873 (Tue Nov 24 17:43:21 MET 1998)
from eric.beitz@uni-tuebingen.de
password(###)
resp MAIL
orig HTML
prediction of: -secondary structure (PHDsec)-solvent accessibility (PHDacc)-
return msf format
# no description
MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSIATLAQSVGHISGAHSNPAVT
LGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLLENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRR
RRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD
RMKVWTSGQVEEYDLDADDINSRVEMKPK
________________________________________________________________________________
Result of PROSITE search (Amos Bairoch):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
please quote: A Bairoch, P Bucher & K Hofmann: The PROSITE database,
its status in 1997. Nucl. Acids Res., 1997, 25, 217-221.
________________________________________________________________________________
--------------------------------------------------------
--------------------------------------------------------
Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern: N[^P][ST][^P]
42 NQTL
250 NFSN
Pattern-ID: GLYCOSAMINOGLYCAN PS00002 PDOC00002
Pattern-DE: Glycosaminoglycan attachment site
Pattern: SG.G
135 SGQG
Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern: [ST].[RK]
157 TDR
398 TDR
Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern: [ST].{2}[DE]
118 SLLE
383 SRVE
Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern: G[^EDRKHPFYW].{2}[STAGCN][^P]
30 GSALGF
92 GLSIAT
179 GLLLSC
288 GAIVAS
407 GITSSL
544 GVNSGQ
722 GLSVAL
917 GINPAR
1141 GSALAV
Pattern-ID: PROKAR_LIPOPROTEIN PS00013 PDOC00013
Pattern-DE: Prokaryotic membrane lipoprotein lipid attachment site
Pattern: [^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C
77 PAVTLGLLLSC
Pattern-ID: MIP PS00221 PDOC00193
Pattern-DE: MIP family signature
Pattern: [HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]
74 HSNPAVTLG
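The patterns listed above are written in a regular-expression style, so the hits can be re-derived locally. The following is a minimal sketch, not the server's own code: it assumes the patterns can be passed unchanged to Python's `re` module, and the names `SEQ`, `PATTERNS` and `scan` are hypothetical. Positions are 1-based offsets into the 269-residue sequence echoed at the top of this report and need not agree with the offsets printed by the server.

```python
import re

# Query sequence as echoed by the server (269 residues).
SEQ = (
    "MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSIATLAQSVGHISGAHSNPAVT"
    "LGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLLENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRR"
    "RRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD"
    "RMKVWTSGQVEEYDLDADDINSRVEMKPK"
)

# A subset of the patterns printed in the report, copied verbatim.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "GLYCOSAMINOGLYCAN": r"SG.G",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MIP": r"[HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]",
}


def scan(pattern, seq):
    """Return (1-based start, matched text) for every match.

    The pattern is wrapped in a lookahead so that overlapping
    matches are reported as well (an assumption about the desired
    behaviour, not something the report states).
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(seq)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for pos, hit in scan(pattern, SEQ):
            print(f"{name:20s} {pos:4d} {hit}")
```

Short, low-complexity patterns such as the phosphorylation sites match by chance in most proteins, so hits of this kind are candidates to check rather than confirmed sites.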
________________________________________________________________________________
Result of ProDom domain search (Corpet, Gouzy, Kahn):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- please quote: ELL Sonnhammer & D Kahn, Prot. Sci., 1994, 3, 482-492
________________________________________________________________________________
--- ------------------------------------------------------------
--- Results from running BLAST against PRODOM domains
---
--- PLEASE quote:
--- F Corpet, J Gouzy, D Kahn (1998). The ProDom database
--- of protein domain families. Nucleic Ac Res 26:323-326.
---
--- BEGIN of BLASTP output
BLASTP 1.4.7 [16-Oct-94] [Build 17:06:52 Oct 31 1994]
Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol.
215:403-10.
Query= prot (#) ppOld, no description /home/phd/server/work/predict_h25873
(269 letters)
Database: /home/phd/ut/prodom/prodom_34_2
53,597 sequences; 6,740,067 total letters.
Searching..................................................done
Sequences producing High-scoring Segment Pairs (columns: High Score, Smallest Sum Probability P(N), N):
390 p34.2 (45) MIP(6) AQP1(4) GLPF(4) // PROTEIN INTRIN... 270 2.0e-32 1
45663 p34.2 (1) AQPZ_ECOLI // AQUAPORIN Z. 90 3.2e-13 2
45611 p34.2 (1) AQP2_HUMAN // AQUAPORIN-CD (AQP-CD) (WAT... 136 6.0e-13 1
304 p34.2 (61) AQP2(10) GLPF(6) MIP(5) // PROTEIN CHANN... 121 9.2e-11 1
45607 p34.2 (1) PMIP_NICAL // POLLEN-SPECIFIC MEMBRANE I... 80 1.2e-07 2
45606 p34.2 (1) BIB_DROME // NEUROGENIC PROTEIN BIG BRAIN. 80 1.2e-05 2
2027 p34.2 (15) GLPF(9) AQP3(2) // PROTEIN FACILITATOR ... 60 3.4e-05 2
45615 p34.2 (1) GLPF_STRPN // GLYCEROL UPTAKE FACILITATO... 63 0.024 1
45638 p34.2 (1) AQP5_HUMAN // AQUAPORIN 5. 61 0.044 1
>390 p34.2 (45) MIP(6) AQP1(4) GLPF(4) // PROTEIN INTRINSIC CHANNEL WATER
AQUAPORIN TONOPLAST MEMBRANE FOR PLASMA LENS
Length = 88
Score = 270 (125.3 bits), Expect = 2.0e-32, P = 2.0e-32
Identities = 47/67 (70%), Positives = 56/67 (83%)
Query: 156 TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVG 215
T D+RR +GGSAPL IG SVALGHL+ I YTGCG+NPARSFG AV+T NF+NHW++WVG
Sbjct: 22 TDDKRRGSVGGSAPLPIGFSVALGHLIGIPYTGCGMNPARSFGPAVVTGNFTNHWVYWVG 81
Query: 216 PFIGSAL 222
P IG+ L
Sbjct: 82 PIIGAVL 88
Score = 95 (44.1 bits), Expect = 2.3e-06, P = 2.3e-06
Identities = 20/33 (60%), Positives = 23/33 (69%)
Query: 136 GQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSA 168
GQ L +EIIGT QLV CV ATTD +RR G +
Sbjct: 1 GQNLVVEIIGTFQLVYCVFATTDDKRRGSVGGS 33
>45663 p34.2 (1) AQPZ_ECOLI // AQUAPORIN Z.
Length = 96
Score = 90 (41.8 bits), Expect = 3.2e-13, Sum P(2) = 3.2e-13
Identities = 18/36 (50%), Positives = 25/36 (69%)
Query: 166 GSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAV 201
G AP+AIGL++ L HL++I T +NPARS A+
Sbjct: 25 GFAPIAIGLALTLIHLISIPVTNTSVNPARSTAVAI 60
Score = 63 (29.2 bits), Expect = 3.2e-13, Sum P(2) = 3.2e-13
Identities = 11/25 (44%), Positives = 14/25 (56%)
Query: 210 WIFWVGPFIGSALAVLIYDFILAPR 234
W FWV P +G + LIY +L R
Sbjct: 71 WFFWVVPIVGGIIGGLIYRTLLEKR 95
>45611 p34.2 (1) AQP2_HUMAN // AQUAPORIN-CD (AQP-CD) (WATER CHANNEL PROTEIN FOR
RENAL COLLECTING DUCT) (ADH WATER CHANNEL) (AQUAPORIN 2) (COLLECTING DUCT
WATER CHANNEL PROTEIN) (WCH-CD).
Length = 49
Score = 136 (63.1 bits), Expect = 6.0e-13, P = 6.0e-13
Identities = 23/42 (54%), Positives = 34/42 (80%)
Query: 50 VKVSLAFGLSIATLAQSVGHISGAHSNPAVTLGLLLSCQISI 91
+++++AFGL I TL Q++GHISGAH NPAVT+ L+ C +S+
Sbjct: 8 LQIAMAFGLGIGTLVQALGHISGAHINPAVTVACLVGCHVSV 49
>304 p34.2 (61) AQP2(10) GLPF(6) MIP(5) // PROTEIN CHANNEL WATER AQUAPORIN
INTRINSIC DUCT COLLECTING FOR TONOPLAST WCH-CD
Length = 43
Score = 121 (56.1 bits), Expect = 9.2e-11, P = 9.2e-11
Identities = 24/43 (55%), Positives = 31/43 (72%)
Query: 70 ISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAIL 112
ISG H NPAVT+GLL+ + LRAV YI AQ +GA+ +A+L
Sbjct: 1 ISGGHINPAVTIGLLIGGRFPFLRAVFYIAAQLLGAVAGAALL 43
>45607 p34.2 (1) PMIP_NICAL // POLLEN-SPECIFIC MEMBRANE INTEGRAL PROTEIN.
Length = 69
Score = 80 (37.1 bits), Expect = 1.2e-07, Sum P(2) = 1.2e-07
Identities = 17/54 (31%), Positives = 32/54 (59%)
Query: 149 LVLCVLATTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVL 202
L++ V++ R +G A +A+G+++ L +A +G +NPARS G A++
Sbjct: 13 LLMFVISGVATDDRAIGQVAGIAVGMTITLNVFVAGPISGASMNPARSIGPAIV 66
Score = 34 (15.8 bits), Expect = 1.2e-07, Sum P(2) = 1.2e-07
Identities = 8/18 (44%), Positives = 11/18 (61%)
Query: 136 GQGLGIEIIGTLQLVLCV 153
GQ L IEII + L+ +
Sbjct: 1 GQSLAIEIIISFLLMFVI 18
>45606 p34.2 (1) BIB_DROME // NEUROGENIC PROTEIN BIG BRAIN.
Length = 119
Score = 80 (37.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
Identities = 15/34 (44%), Positives = 24/34 (70%)
Query: 1 MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALG 34
M +EI+ FWR++++E LA ++VFI G+A G
Sbjct: 55 MQAEIRTLEFWRSIISECLASFMYVFIVCGAAAG 88
Score = 39 (18.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
Identities = 9/17 (52%), Positives = 12/17 (70%)
Query: 53 SLAFGLSIATLAQSVGH 69
+LA GL++ATL Q H
Sbjct: 103 ALASGLAMATLTQCFLH 119
>2027 p34.2 (15) GLPF(9) AQP3(2) // PROTEIN FACILITATOR GLYCEROL UPTAKE
AQUAPORIN DIFFUSION UPTAKE/EFFLUX PEPX 5'REGION ORF1
Length = 55
Score = 60 (27.8 bits), Expect = 3.4e-05, Sum P(2) = 3.4e-05
Identities = 17/46 (36%), Positives = 20/46 (43%)
Query: 156 TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAV 201
T D GG PL +G V + TG INPAR FG +
Sbjct: 10 TDDGNNVPSGGLHPLMVGFLVMGIGMSLGGTTGYAINPARDFGPRI 55
Score = 37 (17.2 bits), Expect = 3.4e-05, Sum P(2) = 3.4e-05
Identities = 7/10 (70%), Positives = 8/10 (80%)
Query: 149 LVLCVLATTD 158
L+ CVLA TD
Sbjct: 2 LIACVLALTD 11
>45615 p34.2 (1) GLPF_STRPN // GLYCEROL UPTAKE FACILITATOR PROTEIN.
Length = 26
Score = 63 (29.2 bits), Expect = 0.025, P = 0.024
Identities = 13/23 (56%), Positives = 18/23 (78%)
Query: 205 NFSNHWIFWVGPFIGSALAVLIY 227
++S WI VGP IG+ALAVL++
Sbjct: 1 DWSYAWIPVVGPVIGAALAVLVF 23
>45638 p34.2 (1) AQP5_HUMAN // AQUAPORIN 5.
Length = 27
Score = 61 (28.3 bits), Expect = 0.045, P = 0.044
Identities = 11/19 (57%), Positives = 18/19 (94%)
Query: 50 VKVSLAFGLSIATLAQSVG 68
++++LAFGL+I TLAQ++G
Sbjct: 8 LQIALAFGLAIGTLAQALG 26
Parameters:
E=0.1
B=500
V=500
-ctxfactor=1.00
Query ----- As Used ----- ----- Computed ----
Frame MatID Matrix name Lambda K H Lambda K H
+0 0 BLOSUM62 0.322 0.138 0.394 same same same
Query
Frame MatID Length Eff.Length E S W T X E2 S2
+0 0 269 269 0.10 69 3 11 22 0.22 33
Statistics:
Query Expected Observed HSPs HSPs
Frame MatID High Score High Score Reportable Reported
+0 0 59 (27.4 bits) 270 (125.3 bits) 14 14
Query Neighborhd Word Excluded Failed Successful Overlaps
Frame MatID Words Hits Hits Extensions Extensions Excluded
+0 0 5349 3124825 609708 2510548 4569 2
Database: /home/phd/ut/prodom/prodom_34_2
Release date: unknown
Posted date: 12:24 PM MET DST May 06, 1998
# of letters in database: 6,740,067
# of sequences in database: 53,597
# of database sequences satisfying E: 9
No. of states in DFA: 564 (111 KB)
Total size of DFA: 226 KB (256 KB)
Time to generate neighborhood: 0.03u 0.00s 0.03t Real: 00:00:00
Time to search database: 9.80u 0.03s 9.83t Real: 00:00:10
Total cpu time: 9.90u 0.06s 9.96t Real: 00:00:10
--- END of BLASTP output
--- ------------------------------------------------------------
---
--- Again: these results were obtained based on the domain data-
--- base collected by Daniel Kahn and his coworkers in Toulouse.
---
--- PLEASE quote:
--- F Corpet, J Gouzy, D Kahn (1998). The ProDom database
--- of protein domain families. Nucleic Ac Res 26:323-326.
---
--- The general WWW page is on:
---- ---------------------------------------
--- http://www.toulouse.inra.fr/prodom.html
---- ---------------------------------------
---
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, follow the following links (each line is ONE
--- single link for your protein!!):
---
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=390 ==> multiple alignment, consensus, PDB and PROSITE links of domain 390
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=390 ==> graphical output of all proteins having domain 390
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45663 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45663
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45663 ==> graphical output of all proteins having domain 45663
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45611 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45611
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45611 ==> graphical output of all proteins having domain 45611
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=304 ==> multiple alignment, consensus, PDB and PROSITE links of domain 304
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=304 ==> graphical output of all proteins having domain 304
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45607 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45607
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45607 ==> graphical output of all proteins having domain 45607
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45606 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45606
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45606 ==> graphical output of all proteins having domain 45606
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=2027 ==> multiple alignment, consensus, PDB and PROSITE links of domain 2027
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=2027 ==> graphical output of all proteins having domain 2027
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45615 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45615
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45615 ==> graphical output of all proteins having domain 45615
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45638 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45638
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45638 ==> graphical output of all proteins having domain 45638
---
--- NOTE: if you want to use the link, make sure the entire line
--- is pasted as URL into your browser!
---
--- END of PRODOM
--- ------------------------------------------------------------
________________________________________________________________________________
--- Database used for sequence comparison:
--- SEQBASE RELEASE 34.0 OF EMBL/SWISS-PROT WITH 59021 SEQUENCES
The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________
--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID : identifier of aligned (homologous) protein
--- STRID : PDB identifier (only for known structures)
--- PIDE : percentage of pairwise sequence identity
--- WSIM : percentage of weighted similarity
--- LALI : number of residues aligned
--- NGAP : number of insertions and deletions (indels)
--- LGAP : number of residues in all indels
--- LSEQ2 : length of aligned sequence
--- ACCNUM : SwissProt accession number
--- NAME : one-line description of aligned protein
---
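The PIDE and LALI columns defined above can be approximated directly from two rows of the MSF alignment. The following is a minimal sketch under stated assumptions: identity is counted over columns where neither row has a gap ('.'), lower-case letters are treated as ordinary residues, and the function name `percent_identity` is hypothetical. MAXHOM's own definition may differ in detail.

```python
def percent_identity(row_a, row_b):
    """Return (percent identity, aligned length) for two MSF rows.

    Blanks between the 10-residue groups are ignored. Columns in
    which either row has a gap character '.' are skipped.
    """
    a = row_a.replace(" ", "")
    b = row_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("rows must cover the same alignment columns")
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b)
             if x != "." and y != "."]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs), len(pairs)


# Columns 1 to 50 of the query and of aqp1_human, copied from the MSF block.
QUERY = "MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV"
HUMAN = "MASEFKKKLF WRAVVAEFLA TTLFVFISIG SALGFKYPVG NNQTAVQDNV"

if __name__ == "__main__":
    pide, lali = percent_identity(QUERY, HUMAN)
    print(f"identity {pide:.1f}% over {lali} aligned columns")
```

This example covers only the first 50 alignment columns, so its result need not match the full-length value in the summary table.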
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID STRID IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME
aqp1_rat 100 100 269 0 0 269 P29975 PROXIMAL TUBULE) (AQUAPOR
aqp1_mouse 98 99 269 0 0 269 Q02013 PROXIMAL TUBULE) (AQUAPOR
aqp1_human 93 97 269 0 0 269 P29972 PROXIMAL TUBULE) (AQUAPOR
aqp1_bovin 90 95 269 1 2 271 P47865 PROXIMAL TUBULE) (AQUAPOR
aqp1_sheep 90 94 269 2 3 272 P56401 PROXIMAL TUBULE) (AQUAPOR
aqpa_ranes 78 89 268 2 5 272 P50501 AQUAPORIN FA-CHIP.
aqp2_dasno 49 73 109 1 7 109 P79164 PROTEIN) (WCH-CD) (FRAGME
aqp2_bovin 49 73 109 1 7 109 P79099 PROTEIN) (WCH-CD) (FRAGME
aqp2_canfa 48 72 109 1 7 109 P79144 PROTEIN) (WCH-CD) (FRAGME
aqp2_rabit 48 73 109 1 7 109 P79213 PROTEIN) (WCH-CD) (FRAGME
aqp2_elema 47 72 109 1 7 109 P79168 PROTEIN) (WCH-CD) (FRAGME
aqp2_horse 47 72 109 1 7 109 P79165 PROTEIN) (WCH-CD) (FRAGME
aqp2_proha 47 73 109 1 7 109 P79229 PROTEIN) (WCH-CD) (FRAGME
mip_rat 46 73 259 1 7 261 P09011 LENS FIBER MAJOR INTRINSI
aqp2_oryaf 46 72 109 1 7 109 P79200 PROTEIN) (WCH-CD) (FRAGME
mip_mouse 46 73 261 1 7 263 P51180 LENS FIBER MAJOR INTRINSI
mip_ranpi 45 73 261 1 7 263 Q06019 LENS FIBER MAJOR INTRINSI
mip_bovin 45 73 261 1 7 263 P06624 LENS FIBER MAJOR INTRINSI
mip_human 45 73 261 1 7 263 P30301 LENS FIBER MAJOR INTRINSI
mip_chick 45 72 110 1 1 112 P28238 LENS FIBER MAJOR INTRINSI
aqp5_rat 44 71 262 2 8 265 P47864 AQUAPORIN 5.
aqp5_human 44 71 262 2 8 265 P55064 AQUAPORIN 5.
aqp2_human 44 72 261 2 8 271 P41181 PROTEIN) (WCH-CD).
aqp4_human 43 70 266 2 5 323 P55087 AQUAPORIN 4 (WCH4) (MERCU
aqp4_rat 43 70 266 2 5 323 P47863 AQUAPORIN 4 (WCH4) (MERCU
aqp4_mouse 43 69 265 3 6 322 P55088 AQUAPORIN 4 (WCH4) (MERCU
aqp2_rat 42 71 261 2 8 271 P34080 PROTEIN) (WCH-CD).
aqp2_mouse 42 71 261 2 8 271 P56402 PROTEIN) (WCH-CD).
wc2a_arath 42 67 248 4 12 287 P43286 PLASMA MEMBRANE INTRINSIC
aqp6_human 42 68 260 2 9 282 Q13520 AQUAPORIN 6 (AQUAPORIN-2
wc2c_arath 41 66 248 4 12 285 P30302 INTRINSIC PROTEIN) (WSI-T
wc2b_arath 41 66 248 4 12 285 P43287 PLASMA MEMBRANE INTRINSIC
wc1c_arath 41 65 238 4 10 286 Q08733 (TMP-B).
wc1b_arath 41 65 238 4 10 286 Q06611 (TMP-A).
tipw_lyces 40 65 237 4 10 286 Q08451 (RIPENING-ASSOCIATED MEMB
wc1a_arath 40 64 238 4 10 286 P43285 PLASMA MEMBRANE INTRINSIC
tipw_pea 40 64 237 4 11 289 P25794 RESPONSIVE PROTEIN 7A).
tipa_arath 38 64 250 3 9 268 P26587 TONOPLAST INTRINSIC PROTE
aqua_atrca 38 64 246 4 10 282 P42767 AQUAPORIN.
dip_antma 38 65 242 2 4 250 P33560 PROBABLE TONOPLAST INTRIN
aqpz_ecoli 37 59 220 4 17 231 P48838 AQUAPORIN Z (BACTERIAL NO
tip2_tobac 37 64 242 2 4 250 P24422 TONOPLAST INTRINSIC PROTE
tip1_tobac 37 64 242 2 4 250 P21653 TONOPLAST INTRINSIC PROTE
tipg_arath 33 62 241 2 4 251 P25818 TONOPLAST INTRINSIC PROTE
bib_drome 33 60 260 4 10 700 P23645 NEUROGENIC PROTEIN BIG BR
tipr_arath 33 62 243 2 4 253 P21652 TONOPLAST INTRINSIC PROTE
tipa_phavu 33 62 246 2 4 256 P23958 TONOPLAST INTRINSIC PROTE
tipg_orysa 32 62 240 2 5 250 P50156 TONOPLAST INTRINSIC PROTE
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/server/work/predict_h25873-22040.hssp from: 1 to: 269
/home/phd/server/work/predict_h25873-22040.msfRet MSF: 269 Type: P 24-Nov-98 17:44:5 Check: 3448 ..
Name: predict_h258 Len: 269 Check: 8331 Weight: 1.00
Name: aqp1_rat Len: 269 Check: 8331 Weight: 1.00
Name: aqp1_mouse Len: 269 Check: 7552 Weight: 1.00
Name: aqp1_human Len: 269 Check: 6501 Weight: 1.00
Name: aqp1_bovin Len: 269 Check: 7067 Weight: 1.00
Name: aqp1_sheep Len: 269 Check: 7582 Weight: 1.00
Name: aqpa_ranes Len: 269 Check: 4844 Weight: 1.00
Name: aqp2_dasno Len: 269 Check: 8933 Weight: 1.00
Name: aqp2_bovin Len: 269 Check: 9649 Weight: 1.00
Name: aqp2_canfa Len: 269 Check: 8990 Weight: 1.00
Name: aqp2_rabit Len: 269 Check: 8787 Weight: 1.00
Name: aqp2_elema Len: 269 Check: 9381 Weight: 1.00
Name: aqp2_horse Len: 269 Check: 8993 Weight: 1.00
Name: aqp2_proha Len: 269 Check: 8855 Weight: 1.00
Name: mip_rat Len: 269 Check: 9773 Weight: 1.00
Name: aqp2_oryaf Len: 269 Check: 8554 Weight: 1.00
Name: mip_mouse Len: 269 Check: 9723 Weight: 1.00
Name: mip_ranpi Len: 269 Check: 5937 Weight: 1.00
Name: mip_bovin Len: 269 Check: 1430 Weight: 1.00
Name: mip_human Len: 269 Check: 372 Weight: 1.00
Name: mip_chick Len: 269 Check: 4658 Weight: 1.00
Name: aqp5_rat Len: 269 Check: 9033 Weight: 1.00
Name: aqp5_human Len: 269 Check: 6547 Weight: 1.00
Name: aqp2_human Len: 269 Check: 6209 Weight: 1.00
Name: aqp4_human Len: 269 Check: 2589 Weight: 1.00
Name: aqp4_rat Len: 269 Check: 4412 Weight: 1.00
Name: aqp4_mouse Len: 269 Check: 2845 Weight: 1.00
Name: aqp2_rat Len: 269 Check: 5748 Weight: 1.00
Name: aqp2_mouse Len: 269 Check: 6526 Weight: 1.00
Name: wc2a_arath Len: 269 Check: 4866 Weight: 1.00
Name: aqp6_human Len: 269 Check: 9404 Weight: 1.00
Name: wc2c_arath Len: 269 Check: 6187 Weight: 1.00
Name: wc2b_arath Len: 269 Check: 7328 Weight: 1.00
Name: wc1c_arath Len: 269 Check: 8575 Weight: 1.00
Name: wc1b_arath Len: 269 Check: 9544 Weight: 1.00
Name: tipw_lyces Len: 269 Check: 9283 Weight: 1.00
Name: wc1a_arath Len: 269 Check: 598 Weight: 1.00
Name: tipw_pea Len: 269 Check: 9253 Weight: 1.00
Name: tipa_arath Len: 269 Check: 6544 Weight: 1.00
Name: aqua_atrca Len: 269 Check: 2848 Weight: 1.00
Name: dip_antma Len: 269 Check: 9619 Weight: 1.00
Name: aqpz_ecoli Len: 269 Check: 5641 Weight: 1.00
Name: tip2_tobac Len: 269 Check: 490 Weight: 1.00
Name: tip1_tobac Len: 269 Check: 622 Weight: 1.00
Name: tipg_arath Len: 269 Check: 3231 Weight: 1.00
Name: bib_drome Len: 269 Check: 7687 Weight: 1.00
Name: tipr_arath Len: 269 Check: 4476 Weight: 1.00
Name: tipa_phavu Len: 269 Check: 5563 Weight: 1.00
Name: tipg_orysa Len: 269 Check: 3537 Weight: 1.00
//
1 50
predict_h258 MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_rat MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_mouse MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_human MASEFKKKLF WRAVVAEFLA TTLFVFISIG SALGFKYPVG NNQTAVQDNV
aqp1_bovin MASEFKKKLF WRAVVAEFLA MILFIFISIG SALGFHYPIK SNQTtvQDNV
aqp1_sheep MASEFKKKLF WRAVVAEFLA MILFIFISIG SALGFHYPIK SNQTtvQDNV
aqpa_ranes MASEFKKKAF WRAVIAEFLA MILFVFISIG AALGFNFPIE EKANQtqDIV
aqp2_dasno ......SVAF SRAVLAEFLA TLIFVFFGLG SALSWPQALP S.......VL
aqp2_bovin ......SIAF SRAVLAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_canfa ......SVAF SRAVFAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_rabit ......SIAF SRAVFAEFLA TLLFVFFGLG SALNWPSALP S.......TL
aqp2_elema ......SIAF SRAVFSEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_horse ......SIAF SRAVLAEFLA TLLFVFFGLG SALNWPQAMP S.......VL
aqp2_proha ......SIAF SRAVLSEFLA TLLFVFFGLG SALNWPQALP S.......VL
mip_rat ...ELRSASF WRAIFAEFFA TLFYVFFGLG SSLRWA.... ...PGPLHVL
aqp2_oryaf ......SIAF SKAVFSEFLA TLLFVFFGLG SALNWPQALP S.......GL
mip_mouse .MWELRSASF WRAIFAEFFA TLFYVFFGLG ASLRWA.... ...PGPLHVL
mip_ranpi .MWEFRSFSF WRAVFAEFFG TMFYVFFGLG ASLKWAAGPA .......NVL
mip_bovin .MWELRSASF WRAICAEFFA SLFYVFFGLG ASLRWA.... ...PGPLHVL
mip_human .MWELRSASF WRAIFAEFFA TLFYVFFGLG SSLRWA.... ...PGPLHVL
mip_chick .......... .......... .......... .......... ..........
aqp5_rat MKKEVCSLAF FKAVFAEFLA TLIFVFFGLG SALKWPSALP T.......IL
aqp5_human MKKEVCSVAF LKAVFAEFLA TLIFVFFGLG SALKWPSALP T.......IL
aqp2_human .MWELRSIAF SRAVFAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp4_human AFKGVWTQAF WKAVTAEFLA MLIFVLLSLG STINWG...G TEKPLPVDMV
aqp4_rat AFKGVWTQAF WKAVTAEFLA MLIFVLLSVG STINWG...G SENPLPVDMV
aqp4_mouse AFKGVWTQAF WKAVSAEFLA TLIFVL.GVG STINWG...G SENPLPVDMV
aqp2_rat .MWELRSIAF SRAVLAEFLA TLLFVFFGLG SALQWASSPP S.......VL
aqp2_mouse .MWELRSIAY CRAVLAEFLA TLLFVFFGLG SALQWASSPP S.......VL
wc2a_arath DGAELKKWSF YRAVIAEFVA TLLFLYITVL TVIGYKIQSD TDAGGVdgIL
aqp6_human MLACRLWKAI SRALFAEFLA TGLYVFFGVG SVMRWPTALP S.......VL
wc2c_arath DAEELTKWSL YRAVIAEFVA TLLFLYVTVL TVIGYKIQSD TKAGGVdgIL
wc2b_arath DADELTKWSL YRAVIAEFVA TLLFLYITVL TVIGYKIQSD TKAGGVdgIL
wc1c_arath EPGELSSWSF YRAGIAEFIA TFLFLYITVL TVMGVKRA.. PNMCASVGIQ
wc1b_arath EPGELASWSF WRAGIAEFIA TFLFLYITVL TVMGVKR..S PNMCASVGIQ
tipw_lyces EPGELSSWSF YRAGIAEFMA TFLFLYITIL TVMGLKRSDS LCSSV..GIQ
wc1a_arath EPGELSSWSF WRAGIAEFIA TFLFLYITVL TVMGVKR..S PNMCASVGIQ
tipw_pea EPSELTSWSF YRAGIAEFIA TFLFLYITVL TVMGVVRESS KCKTV..GIQ
tipa_arath RADEATHPDS IRATLAEFLS TFVFVFAAEG SILSLDKLYW EHAAHAGTni
aqua_atrca DMGELKLWSF WRAAIAEFIA TLLFLYITVA TVIGYKKETD PCASVGL..L
dip_antma SIGDSFSVAS IKAYVAEFIA TLLFVFAGVG SAIAYNKLTS DAALDPAGLV
aqpz_ecoli .........M FRKLAAECFG TFWLVFGGCG SAVLAAGFPE ....LGIGFA
tip2_tobac SIGDSFSVGS LKAYVAEFIA TLLFVFAGVG SAIAYNKLTA DAALDPAGLV
tip1_tobac SIGDSFSVGS LKAYVAEFIA TLLFVFAGVG SAIAYNKLTA DAALDPAGLV
tipg_arath RPDEATRPDA LKAALAEFIS TLIFVVAGSG SGMAFNKLTE NGATTPSGLV
bib_drome MQAEIRTLEF WRSIISECLA SFMYVFIVCG AAAGVGVGAS VSSVL....L
tipr_arath RPDEATRPDA LKAALAEFIS TLIFVVAGSG SGMAFNKLTE NGATTPSGLV
tipa_phavu RTDEATHPDS MRASLAEFAS TFIFVFAGEG SGLALVKIYQ DSAFSAGELL
tipg_orysa SHQEVYHPGA LKAALAEFIS TLIFVFAGQG SGMAFSKLTG GGATTPAGLI
51 100
predict_h258 KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_rat KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_mouse KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_human KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS IFRALMYIIA
aqp1_bovin KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS VLRAIMYIIA
aqp1_sheep KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS ILRAIMYIIA
aqpa_ranes KVSLAFGISI ATMAQSVGHV SGAHLNPAVT LGCLLSCQIS ILKAVMYIIA
aqp2_dasno QIALAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_bovin QIAMAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAVFYVAA
aqp2_canfa QIAMAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_rabit QIAMAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_elema QIAMAFGLAI GTLVQTLGHI SGAHINPAVT VACLVGCHVS FLRATFYLAA
aqp2_horse QIAMAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_proha QIAMAFGLAI GTLVQTLGHI SGAHINPAVT IACLVGCHVS FLRALFYLAA
mip_rat QVALAFGLAL ATLVQTVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYIAA
aqp2_oryaf QIAMAFGLAI GTLVQTLGHI SGAHINPAVT VACLVGCHVS FLRAIFYVAA
mip_mouse QVALAFGLAL ATLVQTVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYIAA
mip_ranpi VIALAFGLVL ATMVQSIGHV SGAHINPAVT FAFLIGSQMS LFRAIFYIAA
mip_bovin QVALAFGLAL ATLVQAVGHI SGAHVNPAVT FAFLVGSQMS LLRAICYMVA
mip_human QVAMAFGLAL ATLVQSVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYMAA
mip_chick .......... .......... .......... .......... ..........
aqp5_rat QISIAFGLAI GTLAQALGPV SGGHINPAIT LALLIGNQIS LLRAVFYVAA
aqp5_human QIALAFGLAI GTLAQALGPV SGGHINPAIT LALLVGNQIS LLRAFFYVAA
aqp2_human QIAMAFGLGI GTLVQALGHI SGAHINPAVT VACLVGCHVS VLRAAFYVAA
aqp4_human LISLCFGLSI ATMVQCFGHI SGGHINPAVT VAMVCTRKIS IAKSVFYIAA
aqp4_rat LISLCFGLSI ATMVQCFGHI SGGHINPAVT VAMVCTRKIS IAKSVFYITA
aqp4_mouse LISLCFGLSI ATMVQCLGHI SGGHINPAVT VAMVCTRKIS IAKSVFYIIA
aqp2_rat QIAVAFGLGI GILVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_mouse QIAVAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
wc2a_arath GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LPRALLYIIA
aqp6_human QIAITFNLVT AMAVQVTWKT SGAHANPAVT LAFLVGSHIS LPRAVAYVAA
wc2c_arath GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LIRAVLYMVA
wc2b_arath GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LIRAVLYMVA
wc1c_arath GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVFYIVM
wc1b_arath GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVYYIVM
tipw_lyces GVAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVFYMVM
wc1a_arath GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRALYYIVM
tipw_pea GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAIFYMVM
tipa_arath LVALAHAFAL FAAVSAAINV SGGHVNPAVT FGALVGGRVT AIRAIYYWIA
aqua_atrca GIAWSFGGMI FVLVYCTAGI SGGHINPAVT FGLFLARKVS LLRALVYMIA
dip_antma AVAVAHAFAL FVGVSMAANV SGGHLNPAVT LGLAVGGNIT ILTGLFYWIA
aqpz_ecoli GVALAFGLTV LTMAFAVGHI SGGHFNPAVT IGLWAGGRFP AKEVVGYVIA
tip2_tobac AVAVAHAFAL FVGVSIAANI SGGHLNPAVT LGLAVGGNIT ILTGFFYWIA
tip1_tobac AVAVAHAFAL FVGVSIAANI SGGHLNPAVT LGLAVGGNIT ILTGFFYWIA
tipg_arath AAAVAHAFGL FVAVSVGANI SGGHVNPAVT FGAFIGGNIT LLRGILYWIA
bib_drome ATALASGLAM ATLTQCFLHI SGAHINPAVT LALCVVRSIS PIRAAMYITA
tipr_arath AAAVAHAFGL FVAVSVGANI SGGHVNPAVT FGAFIGGNIT LLRGILYWIA
tipa_phavu ALALAHAFAL FAAVSASMHV SGGHVNPAVS FGALIGGRIS VIRAVYYWIA
tipg_orysa AAAVAHAFAL FVAVSVGANI SGGHVNPAVT FGAFVGGNIT LFRGLLYWIA
101 150
predict_h258 QCVGAIVASA ILSGITSSLL ENSLGRNDLA RGVNSGQGLG IEIIGTLQLV
aqp1_rat QCVGAIVASA ILSGITSSLL ENSLGRNDLA RGVNSGQGLG IEIIGTLQLV
aqp1_mouse QCVGAIVATA ILSGITSSLV DNSLGRNDLA HGVNSGQGLG IEIIGTLQLV
aqp1_human QCVGAIVATA ILSGITSSLT GNSLGRNDLA DGVNSGQGLG IEIIGTLQLV
aqp1_bovin QCVGAIVATA ILSGITSSLP DNSLGLNALA PGVNSGQGLG IEIIGTLQLV
aqp1_sheep QCVGAIVATV ILSGITSSLP DNSLGLNALA PGVNSGQGLG IEIIGTLQLV
aqpa_ranes QCLGAVVATA ILSGITSGLE NNSLGLNGLS PGVSAGQGLG VEILVTFQLV
aqp2_dasno QLLGAVAGAA ILHEITPPDV RG........ .......... ..........
aqp2_bovin QLLGAVAGAA LLHEITPPAI RG........ .......... ..........
aqp2_canfa QLLGAVAGAA LLHEITPPHV RG........ .......... ..........
aqp2_rabit QLLGAVAGAA LLHEITPAEV RG........ .......... ..........
aqp2_elema QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
aqp2_horse QLLGAVAGAA LLHEITPPDI RR........ .......... ..........
aqp2_proha QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
mip_rat QLLGAVAGAA VLYSVTPPAV RGNLALNTLH AGVSVGQATT VEIFLTLQFV
aqp2_oryaf QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
mip_mouse QLLGAVAGAA VLYSVTPPAV RGNLALNTLH TGVSVGQATT VEIFLTLQFV
mip_ranpi QLLGAVAGAA VLYGVTPAAI RGNLALNTLH PGVSLGQATT VEIFLTLQFV
mip_bovin QLLGAVAGAA VLYSVTPPAV RGNLALNTLH PGVSVGQATI VEIFLTLQFV
mip_human QLLGAVAGAA VLYSVTPPAV RGNLALNTLH PAVSVGQATT VEIFLTLQFV
mip_chick .......... .......... .......... .......... ..........
aqp5_rat QLVGAIAGAG ILYWLAPLNA RGNLAVNALN NNTTPGKAMV VELILTFQLA
aqp5_human QLVGAIAGAG ILYGVAPLNA RGNLAVNALN NNTTQGQAMV VELILTFQLA
aqp2_human QLLGAVAGAA LLHEITPADI RGDLAVNALS NSTTAGQAVT VELFLTLQLV
aqp4_human QCLGAIIGAG ILYLVTPPSV VGGLGVTMVH GNLTAGHGLL VELIITFQLV
aqp4_rat QCLGAIIGAG ILYLVTPPSV VGGLGVTTVH GNLTAGHGLL VELIITFQLV
aqp4_mouse QCLGAIIGAG ILYLVTPPSV VGGLGVTTVH GNLTAGHGLL VELIITFQLV
aqp2_rat QLLGAVAGAA ILHEITPVEI RGDLAVNALH NNATAGQAVT VELFLTMQLV
aqp2_mouse QLLGAVAGAA ILHEITPVEI RGDLAVNALH NNATAGQAVT VELFLTMQLV
wc2a_arath QCLGAICGVG FVKAFQSSYY TRYGGgnSLA DGYSTGTGLA AEIIGTFVLV
aqp6_human QLVGATVGAA LLYGVMPGDI RETLGINVVR NSVSTGQAVA VELLLTLQLV
wc2c_arath QCLGAICGVG FVKAFQSSHY VNYGGgnFLA DGYNTGTGLA AEIIGTFVLV
wc2b_arath QCLGAICGVG FRQSFQSSYY DRYGGgnSLA DGYNTGTGLA AEIIGTFVLV
wc1c_arath QCLGAICGAG VVKGFQPNPY QtgGGANTVA HGYTKGSGLG AEIIGTFVLV
wc1b_arath QCLGAICGAG VVKGFQPKQY QagGGANTIA HGYTKGSGLG AEIIGTFVLV
tipw_lyces QCLGAICGAG VVKGFMVGPY QrgGGANVVN PGYTKGDGLG AEIIGTFVLV
wc1a_arath QCLGAICGAG VVKGFQPKQY QagGGANTVA HGYTKGSGLG AEIIGTFVLV
tipw_pea QVLGAICGAG VVKGFEGKQR FGDLNgnFVA PGYTKGDGLG AEIVGTFILV
tipa_arath QLLGAILACL LLRLTTNGMR PVGFR...LA SGVGAVNGLV LEIILTFGLV
aqua_atrca QCAGAICGVG LVKAFMKGPY NqgGGANSVA LGYNKGTAFG AELIGTFVLV
dip_antma QCLGSTVACL LLKFVTNGL. ..SVPTHGVA AGMDAIQGVV MEIIITFALV
aqpz_ecoli QVVGGIVAAA LLYLIASGKT GFDAAASGFA sgYSMLSALV VELVLSAGFL
tip2_tobac QLLGSTVACL LLKYVTNGL. ..AVPTHGVA AGLNGFQGVV MEIIITFALV
tip1_tobac QLLGSTVACL LLKYVTNGL. ..AVPTHGVA AGLNGLQGVV MEIIITFALV
tipg_arath QLLGSVVACL ILKFATGGLA VPAFG...LS AGVGVLNAFV FEIVMTFGLV
bib_drome QCGGGIAGAA LLYGVTVPGY QGNLQAasHS AALAAWERFG VEFILTSLVV
tipr_arath QLLGSVVACL ILKFATGGLA VPPFG...LS AGVGVLNAFV FEIVMTFGLV
tipa_phavu QLLGSIVAAL VLRLVTNNMR PSGF...HVS PGVGVGHMFI LEVVMTFGLM
tipg_orysa QLLGSTVACF LLRFSTGGLA TGTFGL.... TGVSVWEALV LEIVMTFGLV
151 200
predict_h258 LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_rat LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_mouse LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_human LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_bovin LCVLATTDRR RRDLGGSGPL AIGFSVALGH LLAIDYTGCG INPARSFGSS
aqp1_sheep LCVLATTDRR RrdLGDSGPL AIGFSVALGH LLAIDYTGCG INPARSFGSS
aqpa_ranes LCVVAVTDRR RHDVSGSVPL AIGLSVALGH LIAIDYTGCG MNPARSFGSA
aqp2_dasno .......... .......... .......... .......... ..........
aqp2_bovin .......... .......... .......... .......... ..........
aqp2_canfa .......... .......... .......... .......... ..........
aqp2_rabit .......... .......... .......... .......... ..........
aqp2_elema .......... .......... .......... .......... ..........
aqp2_horse .......... .......... .......... .......... ..........
aqp2_proha .......... .......... .......... .......... ..........
mip_rat LCIFATYDER RNGRMGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
aqp2_oryaf .......... .......... .......... .......... ..........
mip_mouse LCIFATYDER RNGRMGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
mip_ranpi LCIFATYDER RNGRLGSVSL AIGFSLTLGH LFGLYYTGAS MNPARSFAPA
mip_bovin LCIFATYDER RNGRLGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
mip_human LCIFATYDER RNGQLGSVAL AVGFSLALGH LFGMYYTGAG MNPARSFAPA
mip_chick ........DR HDGRPGSAAL PVGFSLALGH LFGIPFTGAG MNPARSFAPA
aqp5_rat LCIFSSTDSR RTSPVGSPAL SIGLSVTLGH LVGIYFTGCS MNPARSFGPA
aqp5_human LCIFASTDSR RTSPVGSPAL SIGLSVTLGH LVGIYFTGCS MNPARSFGPA
aqp2_human LCIFASTDER RGENPGTPAL SIGFSVALGH LLGIHYTGCS MNPARSLAPA
aqp4_human FTIFASCDSK RTDVTGSIAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp4_rat FTIFASCDSK RTDVTGSVAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp4_mouse FTVFASCDSK RTDVTGSIAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp2_rat LCIFASTDER RGDNLGSPAL SIGFSVTLGH LLGIYFTGCS MNPARSLAPA
aqp2_mouse LCIFASTDER RSDNLGSPAL SIGFSVTLGH LLGIYFTGCS MNPARSLAPA
wc2a_arath YTVFSATDPK RSavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
aqp6_human LCVFASTDSR QTS..GSPAT MIGISWALGH LIGILFTGCS MNPARSFGPA
wc2c_arath YTVFSATDPK RNavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
wc2b_arath YTVFSATDPK RNavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAS
wc1c_arath YTVFSATDAK RSavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
wc1b_arath YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
tipw_lyces YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
wc1a_arath YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITATG INPARSLGAA
tipw_pea YTVFSATDAK RSavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
tipa_arath YVVYStiDPK RGSLGIIAPL AIGLIVGANI LVGGPFSGAS MNPARAFGPA
aqua_atrca YTVFSATDPK RSavPILAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
dip_antma YTVYAtaDPK KGSLGVIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
aqpz_ecoli LVIHGATDKF APA..GFAPI AIGLALTLIH LISIPVTNTS VNPARSTAVA
tip2_tobac YTVYAtaDPK KGSLGTIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
tip1_tobac YTVYAtaDPK KGSLGTIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
tipg_arath YTVYAtiDPK NGSLGTIAPI AIGFIVGANI LAGGAFSGAS MNPAVAFGPA
bib_drome LCYFVSTDPM KKFMGNS.AA SIGCAYSACC FVSMPYLN.. ..PARSLGPS
tipr_arath YTVYAtiDPK NGSLGTIAPI AIGFIVGANI LAGGAFSGAS MNPAVAFGPA
tipa_phavu YTVYGtiDPK RGAVSYIAPL AIGLIVGANI LVGGPFDGAC MNPALAFGPS
tipg_orysa YTVYAtvDPK KGSLGTIAPI AIGFIVGANI LVGGAFDGAS MNPAVSFGPA
201 250
predict_h258 VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_rat VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_mouse VLTRNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_human VITHNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqp1_bovin VITHNFQDHW IFWVGPFIGA ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqp1_sheep VITHNFQDHW IFWVGPFIGA ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqpa_ranes VLTKNFTYHW IFWVGPMIGG AAAAIIYDFI LAPRTSDLTD RMKVWTNGQV
aqp2_dasno .......... .......... .......... .......... ..........
aqp2_bovin .......... .......... .......... .......... ..........
aqp2_canfa .......... .......... .......... .......... ..........
aqp2_rabit .......... .......... .......... .......... ..........
aqp2_elema .......... .......... .......... .......... ..........
aqp2_horse .......... .......... .......... .......... ..........
aqp2_proha .......... .......... .......... .......... ..........
mip_rat ILTRNFSNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSVSE RLSILKGARP
aqp2_oryaf .......... .......... .......... .......... ..........
mip_mouse ILTRNFSNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSVSE RLSILKGARP
mip_ranpi VLTRNFTNHW VYWVGPIIGG ALGGLVYDFI LFPRMRGLSE RLSILKGARP
mip_bovin ILTRNFTNHW VYWVGPVIGA GLGSLLYDFL LFPRLKSVSE RLSILKGSRP
mip_human ILTGNFTNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSISE RLSVLKGAKP
mip_chick VITRNFTNHW VFWAGPLLGA ALAALLYELA LCPRARSMAE RLAV.LRGEP
aqp5_rat VVMNRFssHW VFWVGPIVGA MLAAILYFYL LFPSSLSLHD RVAVVKGTYE
aqp5_human VVMNRFsaHW VFWVGPIVGA VLAAILYFYL LFPNSLSLSE RVAIIKGTYE
aqp2_human VVTGKFDDHW VFWIGPLVGA ILGSLLYNYV LFPPAKSLSE RLAVLKGLEp
aqp4_human VIMGNWENHW IYWVGPIIGA VLAGGLYEYV FCPDVEFKRR FKEAFSKaqT
aqp4_rat VIMGNWENHW IYWVGPIIGA VLAGALYEYV FCPDVELKRR LKEAFSKaqT
aqp4_mouse VIMGNWANHW IYWVGPIMGA VLAGALYEYV FCPDVELKRR LKEAFSKaqT
aqp2_rat VVTGKFDDHW VFWIGPLVGA IIGSLLYNYL LFPSAKSLQE RLAVLKGLEp
aqp2_mouse VVTGKFDDHW VFWIGPLVGA IIGSLLYNYL LFPSTKSLQE RLAVLKGLEp
wc2a_arath VIYnpWDDHW IFWVGPFIGA AIAAFYHQFV LRASGSKSLG SFRSAANV..
aqp6_human IIIGKFTVHW VFWVGPLMGA LLASLIYNFV LFPDTKTLAQ RLAILTGTVE
wc2c_arath VIFnpWDDHW IFWVGPFIGA TIAAFYHQFV LRASGSKSLG SFRSAANV..
wc2b_arath VIYnpWDDHW IFWVGPFIGA AIAAFYHQFV LRASGSKSLG SFRSAANV..
wc1c_arath IIYnaWDDHW IFWVGPFIGA ALAALYHQLV IRAIPFKSRS ..........
wc1b_arath IIFnaWDDHW VFWVGPFIGA ALAALYHVIV IRAIPFKSRS ..........
tipw_lyces IIYnaWNDHW IFWVGPMIGA ALAAIYHQII IRAMPFHRS. ..........
wc1a_arath IIYnsWDDHW VFWVGPFIGA ALAALYHVVV IRAIPFKSRS ..........
tipw_pea IVFngWNDHW IFWVGPFIGA ALAALYHQVV IRAIPFKSK. ..........
tipa_arath LVGWRWHDHW IYWVGPFIGS ALAALIYEYM VIPTEPPTHH AHGVHQPLAP
aqua_atrca VIyrVWDDHW IFWVGPFVGA LAAAAYHQYV LRAAAIKALG SFRSNPTN..
dip_antma VASGDFSQNW IYWAGPLIGG ALAGFIYGDV FITAHAPLPT SEDYA.....
aqpz_ecoli IFQgaLEQLW FFWVVPIVGG IIGGLIYRTL LEKRD..... ..........
tip2_tobac VVAGDFSQNW IYWAGPLIGG GLAGFIYGDV FIGCHTPLPT SEDYA.....
tip1_tobac VVAGDFSQNW IYWAGPLIGG GLAGFIYGDV FIGCHTPLPT SEDYA.....
tipg_arath VVSWTWTNHW VYWAGPLVGG GIAGLIYEVF FINTTHEQLP TTDY......
bib_drome FVLNKWDSHW VYWFGPLVGG MASGLVYEYI FNSRNRNLRH NKGSIDNDSS
tipr_arath VVSWTWTNHW VYWAGPLVGG GIAGLIYEVF FINTTHTSSS NHRLLN....
tipa_phavu LVGWQWHQHW IFWVGPLLGA ALAALVYEYA VIPIEPPPHH HQPLATEDY.
tipg_orysa LVSWSWESQW VYWVGPLIGG GLAGVIYEVL FISHTHEQLP TTDY......
251 269
predict_h258 EEYDLDADDI NSRVEMKPK
aqp1_rat EEYDLDADDI NSRVEMKPK
aqp1_mouse EEYDLDADDI NSRVEMKPK
aqp1_human EEYDLDADDI NSRVEMKPK
aqp1_bovin EEYDLDADDI NSRVEMKPK
aqp1_sheep EEYDLDADDI NSRVEMKPK
aqpa_ranes EEYELDGDD. NTRVEMKPK
aqp2_dasno .......... .........
aqp2_bovin .......... .........
aqp2_canfa .......... .........
aqp2_rabit .......... .........
aqp2_elema .......... .........
aqp2_horse .......... .........
aqp2_proha .......... .........
mip_rat SDSNGQPEGT GEPVELKTQ
aqp2_oryaf .......... .........
mip_mouse SDSNGQPEGT GEPVELKTQ
mip_ranpi AEPEGQQEAT GEPIELKTQ
mip_bovin SESNGQPEVT GEPVELKTQ
mip_human DVSNGQPEVT GEPVELNTQ
mip_chick PAAAPPPEPP AEPLELKTQ
aqp5_rat PEEDWEDHRE ERKKTIELT
aqp5_human PDEDWEEQRE ERKKTMELT
aqp2_human tDWEEREVRR RQSVELHSP
aqp4_human KGSYMEVEDN RSQVETDDL
aqp4_rat KGSYMEVEDN RSQVETEDL
aqp4_mouse KGSYMEVEDN RSQVETEDL
aqp2_rat tDWEEREVRR RQSVELHSP
aqp2_mouse tDWEEREVRR RQSVELHSP
wc2a_arath .......... .........
aqp6_human VGTGARAGAE PLKKESQPG
wc2c_arath .......... .........
wc2b_arath .......... .........
wc1c_arath .......... .........
wc1b_arath .......... .........
tipw_lyces .......... .........
wc1a_arath .......... .........
tipw_pea .......... .........
tipa_arath EDY....... .........
aqua_atrca .......... .........
dip_antma .......... .........
aqpz_ecoli .......... .........
tip2_tobac .......... .........
tip1_tobac .......... .........
tipg_arath .......... .........
bib_drome SIHSEDELNY DMDMEKPNK
tipr_arath .......... .........
tipa_phavu .......... .........
tipg_orysa .......... .........
________________________________________________________________________________
Prediction of:
- secondary structure, by PHDsec
- solvent accessibility, by PHDacc
- and helical transmembrane regions, by PHDhtm
PHD: Profile fed neural network systems from HeiDelberg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Burkhard Rost
EMBL, Heidelberg, FRG
Meyerhofstrasse 1, 69 117 Heidelberg
Internet: Predict-Help@EMBL-Heidelberg.DE
All rights reserved.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Secondary structure prediction by PHDsec:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Burkhard Rost
EMBL, Heidelberg, FRG
Meyerhofstrasse 1, 69 117 Heidelberg
Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.
About the network method
~~~~~~~~~~~~~~~~~~~~~~~
The network procedure is described in detail in:
1) Rost, Burkhard; Sander, Chris:
Prediction of protein structure at better than 70% accuracy.
J. Mol. Biol., 1993, 232, 584-599.
A brief description is given in:
Rost, Burkhard; Sander, Chris:
Improved prediction of protein secondary structure by use of
sequence profiles and neural networks.
Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.
The PHD mail server is described in:
2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:
PHD - an automatic mail server for protein secondary structure
prediction.
CABIOS, 1994, 10, 53-60.
The latest improvement steps (up to 72%) are explained in:
3) Rost, Burkhard; Sander, Chris:
Combining evolutionary information and neural networks to predict
protein secondary structure.
Proteins, 1994, 19, 55-72.
To be quoted for publications of PHD output:
Papers 1-3 for the prediction of secondary structure and the
prediction server.
About the input to the network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The prediction is performed by a system of neural networks.
The input is a multiple sequence alignment. It is taken from an HSSP
file (produced by the program MaxHom):
Sander, Chris & Schneider, Reinhard: Database of Homology-Derived
Structures and the Structural Meaning of Sequence Alignment.
Proteins, 1991, 9, 56-68.
For optimal results the alignment should contain sequences with varying
degrees of sequence similarity relative to the input protein.
The following is an ideal situation:
+-----------------+----------------------+
| sequence: | sequence identity |
+-----------------+----------------------+
| target sequence | 100 % |
| aligned seq. 1 | 90 % |
| aligned seq. 2 | 80 % |
| ... | ... |
| aligned seq. 7 | 30 % |
+-----------------+----------------------+
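To judge how close an alignment comes to the ideal spread of identities above, the pairwise identity to the target can be estimated directly from the aligned rows. The following is a minimal sketch, not the MaxHom/HSSP definition: it assumes identity is counted over positions where neither sequence has a gap ('.'), and it treats lower-case letters as ordinary residues. The function name percent_identity is hypothetical.

```python
def percent_identity(target: str, other: str, gap: str = ".") -> float:
    """Percent of aligned, non-gap positions at which both sequences agree.

    Assumption: gaps are written as '.', blanks are only layout, and
    lower-case letters are compared case-insensitively.
    """
    a = target.replace(" ", "")
    b = other.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Alignment columns 51-100 of two rows taken from the alignment above.
aqp1_rat = "KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA"
aqp1_human = "KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS IFRALMYIIA"
identity = percent_identity(aqp1_rat, aqp1_human)
print(f"identity over this block: {identity:.1f}%")
```

The value covers only the 50 columns shown; the identity over the full alignment may differ.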
Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A careful cross validation test on some 250 protein chains (in total
about 55,000 residues) with less than 25% pairwise sequence identity
gave the following results:
++================++-----------------------------------------+
|| Qtotal = 72.1% || ("overall three state accuracy") |
++================++-----------------------------------------+
+----------------------------+-----------------------------+
| Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |
| Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |
| Qloop (% of observed)=79% | Qloop (% of predicted)=72% |
+----------------------------+-----------------------------+
..........................................................................
These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|                     number of correctly predicted residues
|Qtotal             = --------------------------------------- (*100)
|                             number of all residues
|
|                     no of res correctly predicted to be in helix
|Qhelix (% of obs)  = -------------------------------------------- (*100)
|                        no of all res observed to be in helix
|
|
|                     no of res correctly predicted to be in helix
|Qhelix (% of pred) = -------------------------------------------- (*100)
|                      no of all residues predicted to be in helix
..........................................................................
Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues. However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well. Computing first the three state
accuracy for each protein chain, and then averaging over 250 chains
yields the following average:
+-------------------------------====--+
| Qtotal/averaged over chains = 72.2% |
+-------------------------------====--+
| standard deviation = 9.3% |
+-------------------------------------+
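The difference between the per-residue and the per-chain average can be illustrated with a small sketch. The three chains below are invented for illustration and are not taken from the test set. The use of the population standard deviation is also an assumption, since the text does not say which estimator was used.

```python
import statistics

# Hypothetical data: (correctly predicted residues, chain length) per chain.
example_chains = [(80, 100), (150, 300), (45, 50)]


def q_per_residue(chains):
    """Pool all residues first, then compute one percentage."""
    correct = sum(c for c, _ in chains)
    total = sum(n for _, n in chains)
    return 100.0 * correct / total


def q_per_chain(chains):
    """Compute the accuracy of each chain, then average over chains."""
    per_chain = [100.0 * c / n for c, n in chains]
    return statistics.mean(per_chain), statistics.pstdev(per_chain)


pooled = q_per_residue(example_chains)
chain_mean, chain_std = q_per_chain(example_chains)
print(f"per residue: {pooled:.1f}%  per chain: {chain_mean:.1f}% +/- {chain_std:.1f}%")
```

Long chains dominate the pooled value, whereas every chain counts equally in the per-chain mean. The standard deviation indicates how far a single protein may deviate from the average.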
..........................................................................
Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthews correlation coefficient:
+---------------------------------------------+
| Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |
+---------------------------------------------+
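As an illustration, the correlation coefficients can be recomputed from the accuracy matrix given in this section. The sketch below uses the standard Matthews formula on a one-state-versus-rest split; the function name matthews is hypothetical.

```python
import math

# Rows: observed state; columns: state predicted by the network.
# Counts copied from the accuracy matrix in this section.
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def matthews(matrix, state):
    """Matthews correlation for one state against the other two."""
    total = sum(sum(row.values()) for row in matrix.values())
    tp = matrix[state][state]
    fn = sum(matrix[state].values()) - tp             # observed, not predicted
    fp = sum(row[state] for row in matrix.values()) - tp  # predicted, not observed
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


for s in "HEL":
    print(s, round(matthews(MATRIX, s), 2))
```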
..........................................................................
Average length of predicted secondary structure segments:
            +------------+----------+
            | predicted  | observed |
+-----------+------------+----------+
| Lhelix  = |    10.3    |    9.3   |
| Lstrand = |     5.0    |    5.3   |
| Lloop   = |     7.2    |    5.9   |
+-----------+------------+----------+
..........................................................................
The accuracy matrix in detail:
+---------------------------------------+
| number of residues with H, E, L |
+---------+------+------+------+--------+
| |net H |net E |net L |sum obs |
+---------+------+------+------+--------+
| obs H |12447 | 1255 | 3990 | 17692 |
| obs E | 949 | 7493 | 3750 | 12192 |
| obs L | 2604 | 2875 |19962 | 25441 |
+---------+------+------+------+--------+
| sum Net |16000 |11623 |27702 | 55325 |
+---------+------+------+------+--------+
Note: This table is to be read in the following manner:
Of all residues predicted to be in helix, 12447 were observed to
be in helix; 949, however, belong to observed strands, and 2604 to
observed loop regions. The term "observed" refers to the DSSP
assignment of secondary structure calculated from 3D coordinates
of experimentally determined structures (Dictionary of Secondary
Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22,
2577-2637).
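The percentages defined earlier can be recomputed from this matrix. The following sketch shows the arithmetic only; the helper names are hypothetical.

```python
STATES = ("H", "E", "L")

# COUNTS[i][j]: residues observed in state i and predicted in state j.
COUNTS = [
    [12447, 1255, 3990],   # obs H
    [949, 7493, 3750],     # obs E
    [2604, 2875, 19962],   # obs L
]


def q_total(counts):
    """Overall three-state accuracy in percent."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    return 100.0 * correct / sum(map(sum, counts))


def q_observed(counts, i):
    """Percent of residues observed in state i that are predicted correctly."""
    return 100.0 * counts[i][i] / sum(counts[i])


def q_predicted(counts, i):
    """Percent of residues predicted in state i that are correct."""
    return 100.0 * counts[i][i] / sum(row[i] for row in counts)


print(f"Qtotal = {q_total(COUNTS):.1f}%")
for i, s in enumerate(STATES):
    print(s, f"obs {q_observed(COUNTS, i):.1f}%", f"pred {q_predicted(COUNTS, i):.1f}%")
```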
Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The network predicts the three secondary structure types using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all"). However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit can be used to derive a "reliability index". This index is given
for each residue along with the prediction. The index is scaled to
have values between 0 (lowest reliability), and 9 (highest).
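A minimal sketch of such an index is given below. The exact scaling used by PHD is not stated here, so the sketch assumes output values between 0 and 1 and an index obtained by truncating 10 times the difference, capped at 9. Both the scaling and the function names are assumptions for illustration.

```python
def reliability_index(outputs):
    """Index 0-9 from the gap between the two largest output units.

    Assumption: outputs lie in [0, 1]; index = int(10 * difference),
    limited to the range 0..9.
    """
    ranked = sorted(outputs, reverse=True)
    difference = ranked[0] - ranked[1]
    return max(0, min(9, int(10 * difference)))


def predict(outputs, states="HEL"):
    """Winner-takes-all state together with its reliability index."""
    winner = max(range(len(outputs)), key=outputs.__getitem__)
    return states[winner], reliability_index(outputs)


print(predict([0.9, 0.15, 0.05]))   # clear winner, high index
print(predict([0.4, 0.35, 0.25]))   # close call, low index
```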
The accuracies (Qtot) to be expected for residues with values above a
particular value of the index are given below as well as the fraction
of such residues (%res):
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | | | | | | |
| Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|
| | | | | | | | | | | |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|
| E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|
| | | | | | | | | | | |
| H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|
| E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
The above table gives the cumulative results, e.g. 62.5% of all
residues have a reliability of at least 5. The overall three-state
accuracy for this subset of almost two thirds of all residues is 82.9%.
For this subset, e.g., 83.1% of the observed helices are correctly
predicted, and 86.9% of all residues predicted to be in helix are
correct.
..........................................................................
The following table gives the non-cumulative quantities, i.e. the
values per reliability index range. These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| %res | 8.8| 9.5| 9.3| 9.1| 9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | | | | | |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
| | | | | | | | | | |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
| | | | | | | | | | |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
For example, for residues with Relindex = 5, 64% of all predicted
beta-strand residues are correctly identified.
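The two tables are related: the fraction of residues with exactly index i is the difference between neighbouring cumulative fractions. The sketch below recomputes the per-index %res row from the cumulative one. The value it yields for index 0 is not listed in the table above and is inferred here only from the difference.

```python
# Cumulative %res (residues with reliability index >= i), i = 0..9.
CUMULATIVE_PCT_RES = [100.0, 99.2, 90.4, 80.9, 71.6, 62.5, 52.8, 42.3, 29.8, 14.1]


def per_index_fraction(cumulative):
    """Percent of residues with exactly index i, for i = 0..9."""
    shifted = cumulative[1:] + [0.0]
    return [round(a - b, 1) for a, b in zip(cumulative, shifted)]


per_index = per_index_fraction(CUMULATIVE_PCT_RES)
for i, pct in enumerate(per_index):
    print(f"index {i}: {pct:4.1f}% of residues")
```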
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Solvent accessibility prediction by PHDacc:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Burkhard Rost
EMBL, Heidelberg, FRG
Meyerhofstrasse 1, 69 117 Heidelberg
Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.
About the network method
~~~~~~~~~~~~~~~~~~~~~~~
The network for prediction of secondary structure is described in
detail in:
Rost, Burkhard; Sander, Chris:
Prediction of protein structure at better than 70% accuracy.
J. Mol. Biol., 1993, 232, 584-599.
The analysis of the prediction of solvent exposure is given in:
Rost, Burkhard; Sander, Chris:
Conservation and prediction of solvent accessibility in protein
families. Proteins, 1994, 20, 216-226.
To be quoted for publications of PHD exposure prediction:
Both papers quoted above.
Definition of accessibility
~~~~~~~~~~~~~~~~~~~~~~~~~~
For training the residue solvent accessibility the DSSP (Dictionary of
Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,
2577-2637) values of accessible surface area have been used. The
prediction provides values for the relative solvent accessibility. The
normalisation is the following:
|                         ACCESSIBILITY (from DSSP in Angstrom^2)
|RELATIVE_ACCESSIBILITY = --------------------------------------- * 100
|                            MAXIMAL_ACC (amino acid type i)
where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.
The maximal values are:
+----+----+----+----+----+----+----+----+----+----+----+----+
| A | B | C | D | E | F | G | H | I | K | L | M |
| 106| 160| 135| 163| 194| 197| 84| 184| 169| 205| 164| 188|
+----+----+----+----+----+----+----+----+----+----+----+----+
| N | P | Q | R | S | T | V | W | X | Y | Z |
| 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|
+----+----+----+----+----+----+----+----+----+----+----+
Notation: one letter code for amino acid, B stands for D or N; Z stands
for E or Q; and X stands for undetermined.
The solvent accessibility (the DSSP value, before normalisation) can be
used to estimate the number of water molecules (W) in contact with the
residue:
W = ACCESSIBILITY /10
The prediction is given in 10 states for relative accessibility, with
RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)
where PREDICTED_ACC = 0 - 9.
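The normalisation and the 10-state encoding can be sketched as follows. The maximal values are copied from the table above. The inverse mapping from a relative accessibility to a state (integer part of the square root, capped at 9) is an assumption derived from the rule that state i stands for i*i %. All function names are hypothetical.

```python
import math

# Maximal accessibility per amino acid type, from the table above.
MAXIMAL_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(aa, acc):
    """Relative accessibility in percent for a DSSP accessibility value."""
    return 100.0 * acc / MAXIMAL_ACC[aa.upper()]


def water_molecules(acc):
    """Estimated number of water molecules in contact: W = ACCESSIBILITY / 10."""
    return acc / 10.0


def state_to_relative(state):
    """Relative accessibility (percent) encoded by a predicted state 0-9."""
    if not 0 <= state <= 9:
        raise ValueError("state must be between 0 and 9")
    return state * state


def relative_to_state(rel_acc):
    """Assumed inverse mapping: largest i <= 9 with i*i <= rel_acc."""
    return min(9, int(math.sqrt(max(rel_acc, 0.0))))


rel = relative_accessibility("A", 53)
print(rel, relative_to_state(rel), state_to_relative(7), water_molecules(53))
```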
Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A careful cross validation test on some 238 protein chains (in total
about 62,000 residues) with less than 25% pairwise sequence identity
gave the following results:
Correlation
...........
The correlation between observed and predicted solvent accessibility
is:
-----------
corr = 0.53
-----------
This value ought to be compared to the worst and best case prediction
scenario: random prediction (corr = 0.0) and homology modelling
(corr = 0.66). (Note: homology modelling yields a relatively accurate
prediction in 3D if, and only if, a significantly identical sequence
has a known 3D structure.)
3-state accuracy
................
Often the relative accessibility is projected onto, e.g., 3 states:
b = buried (here defined as < 9% relative accessibility),
i = intermediate ( 9% <= rel. acc. < 36% ),
e = exposed ( rel. acc. >= 36% ).
A projection onto 3 states or 2 states (buried/exposed) enables the
compilation of a 3- and 2-state prediction accuracy. PHD reaches an
overall 3-state accuracy of:
Q3 = 57.5%
(compared to 35% for random prediction and 70% for homology modelling).
In detail:
+-----------------------------------+-------------------------+
| Qburied (% of observed)=77% | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed (% of observed)=78% | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+
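The projection onto three states uses the thresholds defined above and can be written as a short sketch. Applying it to the 10 predicted states relies on state i standing for i*i % relative accessibility. The function names are hypothetical.

```python
def three_state(rel_acc):
    """Project a relative accessibility (percent) onto b / i / e."""
    if rel_acc < 9:
        return "b"   # buried
    if rel_acc < 36:
        return "i"   # intermediate
    return "e"       # exposed


def three_state_from_ten(state):
    """Project a predicted state 0-9 (i.e. i*i percent) onto b / i / e."""
    return three_state(state * state)


projection = "".join(three_state_from_ten(s) for s in range(10))
print(projection)
```

Under these thresholds, states 0-2 count as buried, 3-5 as intermediate and 6-9 as exposed.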
10-state accuracy
.................
The network predicts relative solvent accessibility in 10 states, with
state i (i = 0-9) corresponding to a relative solvent accessibility of
i*i %. The 10-state accuracy of the network is:
Q10 = 24.5%
..........................................................................
These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|                      number of correctly predicted residues
|Q3                  = --------------------------------------- (*100)
|                              number of all residues
|
|                      no of res. correctly predicted to be buried
|Qburied (% of obs)  = ------------------------------------------- (*100)
|                         no of all res. observed to be buried
|
|
|                      no of res. correctly predicted to be buried
|Qburied (% of pred) = ------------------------------------------- (*100)
|                       no of all residues predicted to be buried
..........................................................................
Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues. However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well. Computing first the correlation
between observed and predicted accessibility for each protein chain, and
then averaging over all 238 chains yields the following average:
+-------------------------------====--+
| corr/averaged over chains = 0.53 |
+-------------------------------====--+
| standard deviation = 0.11 |
+-------------------------------------+
..........................................................................
Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The accuracy matrix in detail:
..............................
-------+----------------------------------------------------+-----------
\ PHD | 0 1 2 3 4 5 6 7 8 9 | SUM %obs
-------+----------------------------------------------------+-----------
OBS 0 | 8611 140 8 44 82 169 772 334 27 0 | 10187 16.6
OBS 1 | 4367 164 0 50 106 231 738 346 44 3 | 6049 9.8
OBS 2 | 3194 168 1 68 125 303 951 513 42 7 | 5372 8.7
OBS 3 | 2760 159 8 80 136 327 1246 746 58 19 | 5539 9.0
OBS 4 | 2312 144 2 72 166 396 1615 1245 124 19 | 6095 9.9
OBS 5 | 1873 96 3 84 138 425 1979 1834 187 27 | 6646 10.8
OBS 6 | 1387 67 1 60 80 278 2237 2627 231 51 | 7019 11.4
OBS 7 | 1082 35 0 32 56 225 1871 3107 302 60 | 6770 11.0
OBS 8 | 660 25 0 27 43 136 1206 2374 325 87 | 4883 7.9
OBS 9 | 325 20 2 27 29 74 648 1159 366 214 | 2864 4.7
-------+----------------------------------------------------+-----------
SUM |26571 1018 25 544 961 2564 13263 14285 1706 487 |
-------+----------------------------------------------------+-----------
Note: This table is to be read in the following manner:
8611 of all residues predicted to be exposed by 0% were
observed with 0% relative accessibility. However, 325 of all
residues predicted to have 0% are observed as completely exposed
(obs = 9 -> rel. acc. >= 81%). The term "observed" refers to the
DSSP compilation of area of solvent accessibility calculated from
3D coordinates of experimentally determined structures (Dictionary
of Secondary Structure of Proteins: Kabsch & Sander (1983)
Biopolymers, 22, 2577-2637).
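The 3-state accuracy can be recovered from this 10-state matrix by merging states. The sketch below assumes the grouping implied by the 3-state thresholds (states 0-2 buried, 3-5 intermediate, 6-9 exposed). The helper names are hypothetical.

```python
# MATRIX[obs][pred]: counts copied from the accuracy matrix above.
MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]

# Assumed grouping of the 10 states into buried / intermediate / exposed.
GROUPS = {"b": range(0, 3), "i": range(3, 6), "e": range(6, 10)}


def collapse(matrix):
    """Merge the 10x10 matrix into a 3x3 matrix keyed by b / i / e."""
    return {
        obs: {
            pred: sum(matrix[r][c] for r in rows for c in cols)
            for pred, cols in GROUPS.items()
        }
        for obs, rows in GROUPS.items()
    }


def q3(matrix):
    """Three-state accuracy in percent from the 10-state matrix."""
    small = collapse(matrix)
    correct = sum(small[s][s] for s in GROUPS)
    return 100.0 * correct / sum(map(sum, matrix))


collapsed = collapse(MATRIX)
print(f"Q3 = {q3(MATRIX):.1f}%")
```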
Accuracy for each amino acid:
.............................
+---+------------------------------+-----+-------+------+
|AA | Q3 b%o b%p i%o i%p e%o e%p | Q10 | corr | N |
+---+------------------------------+-----+-------+------+
| A | 59.0 87 60 2 38 66 57 | 31 | 0.530 | 5054 |
| C | 62.0 91 67 5 39 25 21 | 34 | 0.244 | 893 |
| D | 56.5 21 45 6 49 94 57 | 20 | 0.321 | 3536 |
| E | 60.8 9 40 3 41 98 61 | 21 | 0.347 | 3743 |
| F | 63.3 94 67 9 46 29 37 | 27 | 0.366 | 2436 |
| G | 52.1 75 51 1 31 67 53 | 22 | 0.405 | 4787 |
| H | 50.9 63 53 23 45 71 50 | 18 | 0.442 | 1366 |
| I | 64.9 95 68 6 41 30 38 | 34 | 0.360 | 3437 |
| K | 66.6 2 11 2 37 98 67 | 23 | 0.267 | 3652 |
| L | 61.6 93 65 8 44 31 40 | 31 | 0.368 | 5016 |
| M | 60.1 92 64 5 39 45 44 | 29 | 0.452 | 1371 |
| N | 55.5 45 45 8 38 87 59 | 17 | 0.410 | 2923 |
| P | 53.0 48 48 9 39 83 56 | 18 | 0.364 | 2920 |
| Q | 54.3 27 44 7 44 92 56 | 20 | 0.344 | 2225 |
| R | 49.9 15 47 36 47 76 51 | 18 | 0.372 | 2765 |
| S | 55.6 69 53 3 51 81 56 | 22 | 0.464 | 3981 |
| T | 51.8 61 51 8 38 78 53 | 21 | 0.432 | 3740 |
| V | 61.1 93 65 5 40 39 42 | 34 | 0.418 | 4156 |
| W | 56.2 85 62 20 49 29 27 | 21 | 0.318 | 891 |
| Y | 49.7 73 52 33 49 36 38 | 19 | 0.359 | 2301 |
+---+------------------------------+-----+-------+------+
Abbreviations:
AA: amino acid in one-letter code
b%o, i%o, e%o: = Qburied, Qintermediate, Qexposed (% of observed),
i.e. percentage of correct prediction in each state, see above
b%p, i%p, e%p: = Qburied, Qintermediate, Qexposed (% of predicted),
i.e. probability of correct prediction in each state, see above
Q10: percentage of correctly predicted residues in each of the 10
states of predicted relative accessibility.
corr: correlation between predicted and observed rel. acc.
N: number of residues in data set
Accuracy for different secondary structure:
...........................................
+--------+------------------------------+----+-------+-------+
| type | Q3 b%o b%p i%o i%p e%o e%p |Q10 | corr | N |
+--------+------------------------------+----+-------+-------+
| helix | 59.5 79 64 8 44 80 56 | 27 | 0.574 | 20100 |
| strand | 61.3 84 73 9 46 69 37 | 35 | 0.524 | 13356 |
| loop | 54.4 64 43 11 44 78 61 | 18 | 0.442 | 27968 |
+--------+------------------------------+----+-------+-------+
Abbreviations as before.
Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The network predicts the 10 states for relative accessibility using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all"). However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit (with the constraint that the second largest output is chosen
among all units at least 2 positions off the maximal unit) can be used
to derive a "reliability index". This index is given for each residue
along with the prediction. The index is scaled to have values between
0 (lowest reliability), and 9 (highest).
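A minimal Python sketch of such an index follows. The function name, the
scaling factor of 10 and the truncation to an integer are assumptions made
for illustration; the text above only states that the index is derived from
the difference and scaled to 0-9.

```python
def reliability_index(outputs, min_offset=2, scale=10.0):
    """Estimate a 0-9 reliability index from real-valued output units.

    outputs:    output units (10 for PHDacc), assumed to lie in [0, 1]
    min_offset: the runner-up must be at least this many positions away
                from the winning unit (constraint stated in the text)
    scale:      ASSUMED factor mapping the difference onto 0-9
    """
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    # Only units far enough from the winner compete for second place.
    rivals = [v for i, v in enumerate(outputs) if abs(i - winner) >= min_offset]
    runner_up = max(rivals) if rivals else 0.0
    diff = outputs[winner] - runner_up
    # Small epsilon guards against float results such as 3.9999999999.
    return max(0, min(9, int(scale * diff + 1e-9)))


# Hypothetical output vector: unit 2 wins, unit 3 is adjacent and ignored,
# so unit 5 is the runner-up.
example = [0.0, 0.0, 0.75, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0]
print(reliability_index(example))
```

Ignoring the direct neighbours of the winning unit means that two adjacent
accessibility states with similar outputs do not lower the index.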
The accuracies (Q3, corr, etc.) to be expected for residues with values
above a particular value of the index are given below, as well as the
fraction of such residues (%res):
+---+------------------------------+----+-------+-------+
|RI | Q3 b%o b%p i%o i%p e%o e%p |Q10 | corr | %res |
+---+------------------------------+----+-------+-------+
| 0 | 57.5 77 60 9 44 78 56 | 24 | 0.535 | 100.0 |
| 1 | 59.1 76 63 9 45 82 57 | 25 | 0.560 | 91.2 |
| 2 | 61.7 79 66 4 47 87 58 | 27 | 0.594 | 77.1 |
| 3 | 66.6 87 70 1 51 89 63 | 30 | 0.650 | 57.1 |
| 4 | 70.0 89 72 0 83 91 67 | 32 | 0.686 | 45.8 |
| 5 | 72.9 92 75 0 0 93 70 | 34 | 0.722 | 35.6 |
| 6 | 76.3 95 77 0 0 93 75 | 36 | 0.769 | 24.7 |
| 7 | 79.0 97 79 0 0 93 78 | 39 | 0.803 | 16.0 |
| 8 | 80.9 98 80 0 0 91 81 | 43 | 0.824 | 9.6 |
| 9 | 81.2 99 80 0 0 88 83 | 45 | 0.828 | 5.9 |
+---+------------------------------+----+-------+-------+
Abbreviations as before.
The above table gives the cumulative results, e.g. 45.8% of all
residues have a reliability of at least 4. The correlation for this
most reliably predicted half of the residues is 0.686, i.e. a value
comparable to what could be expected if homology modelling were
possible. For this subset of 45.8% of all residues, 89% of the buried
residues are correctly predicted, and 72% of all residues predicted to
be buried are correct.
..........................................................................
The following table gives the non-cumulative quantities, i.e. the
values per reliability index range. These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.
+---+------------------------------+----+-------+-------+
|RI | Q3 b%o b%p i%o i%p e%o e%p |Q10 | corr | %res |
+---+------------------------------+----+-------+-------+
| 0 | 40.9 79 40 16 41 21 40 | 14 | 0.175 | 8.8 |
| 1 | 45.4 61 46 28 44 48 44 | 17 | 0.278 | 14.1 |
| 2 | 47.4 53 52 10 46 80 44 | 19 | 0.343 | 19.9 |
| 3 | 52.9 75 59 4 50 77 47 | 23 | 0.439 | 11.4 |
| 4 | 60.0 81 63 0 83 84 56 | 25 | 0.547 | 10.1 |
| 5 | 65.2 82 70 0 0 93 62 | 28 | 0.607 | 10.9 |
| 6 | 71.3 90 72 0 0 94 70 | 31 | 0.692 | 8.8 |
| 7 | 76.0 94 76 0 0 95 75 | 34 | 0.762 | 6.3 |
| 8 | 80.5 97 81 0 0 94 79 | 39 | 0.808 | 3.8 |
| 9 | 81.2 99 80 0 0 88 83 | 45 | 0.828 | 5.9 |
+---+------------------------------+----+-------+-------+
For example, for residues with RI = 4, 83% of all predicted intermediate
residues are correctly predicted as such.
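The two tables are linked through the %res column: the cumulative value
for index i is the sum of the per-index values for i to 9. A short Python
sketch (the function name is chosen here for illustration) that rebuilds
the cumulative column from the non-cumulative one:

```python
# %res per reliability index 0..9, copied from the non-cumulative table.
PER_INDEX_PCT = [8.8, 14.1, 19.9, 11.4, 10.1, 10.9, 8.8, 6.3, 3.8, 5.9]


def cumulative_coverage(per_index):
    """Return, for each index i, the percentage of residues with RI >= i."""
    totals = []
    running = 0.0
    for pct in reversed(per_index):
        running += pct
        totals.append(round(running, 1))
    return totals[::-1]


coverage = cumulative_coverage(PER_INDEX_PCT)
print(coverage[4])  # share of residues with a reliability of at least 4
```

The sums agree with the cumulative table to within 0.1, the difference
being due to rounding of the published per-index values.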
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prediction of helical transmembrane segments by PHDhtm:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Author: Burkhard Rost
EMBL, Heidelberg, FRG
Meyerhofstrasse 1, 69 117 Heidelberg
Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.
About the network method
~~~~~~~~~~~~~~~~~~~~~~~
The PHD mail server is described in:
Rost, Burkhard; Sander, Chris; Schneider, Reinhard:
PHD - an automatic mail server for protein secondary structure
prediction.
CABIOS, 1994, 10, 53-60.
To be quoted for publications of PHDhtm output:
Rost, Burkhard; Casadio, Rita; Fariselli, Piero; Sander, Chris:
Prediction of helical transmembrane segments at 95% accuracy.
Protein Science, 1995, 4, 521-533.
Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A cross validation test on 69 helical trans-membrane proteins (in total
about 30,000 residues) with less than 25% pairwise sequence identity
gave the following results:
++================++-----------------------------------------+
|| Qtotal = 94.7% || ("overall two state accuracy") |
++================++-----------------------------------------+
+----------------------------+-----------------------------+
| Qhelix (% of observed)=92% | Qhelix (% of predicted)=83% |
| Qloop (% of observed)=96% | Qloop (% of predicted)=97% |
+----------------------------+-----------------------------+
..........................................................................
These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| number of correctly predicted residues
|Qtotal = --------------------------------------- (*100)
| number of all residues
|
| no of res correctly predicted to be in helix
|Qhelix (% of obs) = -------------------------------------------- (*100)
| no of all res observed to be in helix
|
|
| no of res correctly predicted to be in helix
|Qhelix (% of pred)= -------------------------------------------- (*100)
| no of all residues predicted to be in helix
..........................................................................
Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Matthews correlation coefficient:
+---------------------------------------------+
| Chelix = 0.84, Cloop = 0.84 |
+---------------------------------------------+
..........................................................................
Average length of predicted secondary structure segments:
| +------------+----------+
| | predicted | observed |
+-----------+------------+----------+
| Lhelix = | 24.6 | 22.2 |
+-----------+------------+----------+
..........................................................................
The accuracy matrix in detail:
+---------------------------------+
| number of residues with H, L |
+---------+------+-------+--------+
| |net H | net L |sum obs |
+---------+------+-------+--------+
| obs H | 5214 | 492 | 5706 |
| obs L | 1050 | 22423 | 23473 |
+---------+------+-------+--------+
| sum Net | 6264 | 22915 | 29179 |
+---------+------+-------+--------+
Note: This table is to be read in the following manner:
5214 of all residues predicted to be in a helical trans-membrane
region were observed to be in the lipid bilayer; 1050, however,
were observed either inside or outside of the protein, i.e. in
loop (or non-membrane) regions. The term "observed" refers to DSSP
assignment of secondary structure calculated from 3D coordinates
of experimentally determined structures (Dictionary of Secondary
Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22,
2577-2637) where these were available. For all other proteins,
the assignment of trans-membrane segments has been taken from the
Swissprot data bank (Bairoch, A.; Boeckmann, B.: The SWISS-PROT
protein sequence data bank. Nucl. Acids Res. 20: 2019-2022, 1992).
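The definitions given earlier can be applied directly to this matrix. The
Python sketch below is an illustration only (function and variable names
are not part of PHDhtm); it pools all residues, which may differ from the
way the published figures were averaged.

```python
from math import sqrt

# Accuracy matrix from the table above (rows: observed, columns: predicted).
OBS_H_NET_H, OBS_H_NET_L = 5214, 492
OBS_L_NET_H, OBS_L_NET_L = 1050, 22423


def two_state_scores(hh, hl, lh, ll):
    """Per-residue scores from a 2x2 matrix (hl = observed H, predicted L)."""
    total = hh + hl + lh + ll
    return {
        "q_total": 100.0 * (hh + ll) / total,
        "q_helix_obs": 100.0 * hh / (hh + hl),
        "q_helix_pred": 100.0 * hh / (hh + lh),
        "q_loop_obs": 100.0 * ll / (ll + lh),
        "q_loop_pred": 100.0 * ll / (ll + hl),
        # Matthews correlation coefficient for the helix state.
        "c_helix": (hh * ll - hl * lh)
        / sqrt((hh + hl) * (hh + lh) * (ll + lh) * (ll + hl)),
    }


scores = two_state_scores(OBS_H_NET_H, OBS_H_NET_L, OBS_L_NET_H, OBS_L_NET_L)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Qtotal (94.7%) and the Matthews coefficient (0.84) are reproduced. The
pooled Qhelix (% of observed) comes out near 91.4%, slightly below the
rounded 92% quoted above; the source does not state how that figure was
averaged.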
..........................................................................
Overlap between predicted and observed segments:
+-----------------+---------------+----------------+
| segment overlap | % of observed | % of predicted |
| Sov helix | 95.6% | 95.5% |
| Sov loop | 83.6% | 97.2% |
+-----------------+---------------+----------------+
| Sov total | 86.0% | 96.8% |
+-----------------+---------------+----------------+
Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.
As helical trans-membrane segments are longer than globular helices,
correctly predicted segments can easily be made out. PHDhtm
misses 5 out of 258 observed segments, predicts 6 where none is
observed, and 3 times the predicted helical segment overlaps two
observed regions. Thus, in total more than 95% of all segments
are correctly predicted.
..........................................................................
Entropy of prediction (information measure):
+-----------------+
| I = 0.64 |
+-----------------+
(For comparison: homology modelling of globular proteins in three
states: I=0.62.)
Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The network predicts two states, helical trans-membrane region and rest,
using two output units. The prediction is assigned by choosing the
maximal unit ("winner takes all"). However, the real numbers of the
output units contain additional information.
E.g. the difference between the two output units can be used to derive
a "reliability index". This index is given for each residue along with
the prediction. The index is scaled to have values between 0 (lowest
reliability), and 9 (highest).
The accuracies (Qtot) to be expected for residues with values above a
particular value of the index are given below, as well as the fraction
of such residues (%res):
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| %res |100.0| 98.8| 97.3| 95.9| 94.1| 92.3| 89.9| 86.2| 75.0| 66.8|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| | | | | | | | | | | |
| Qtot | 94.7| 95.2| 95.6| 96.2| 96.7| 97.2| 97.7| 98.4| 99.4| 99.8|
| | | | | | | | | | | |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 91.8| 92.9| 93.8| 94.4| 95.0| 95.7| 96.2| 96.8| 95.5| 78.7|
| L%obs| 95.3| 95.7| 96.1| 96.6| 97.0| 97.5| 98.1| 98.8| 99.7|100.0|
| | | | | | | | | | | |
| H%prd| 82.7| 83.8| 85.0| 86.7| 88.1| 89.7| 91.4| 93.8| 96.3| 97.1|
| L%prd| 97.9| 98.3| 98.5| 98.7| 98.8| 99.0| 99.2| 99.4| 99.7| 99.9|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
The above table gives the cumulative results, e.g. 92.3% of all
residues have a reliability of at least 5. The overall two-state
accuracy for this subset is 97.2%. For this subset, e.g., 95.7% of
the observed helical trans-membrane residues are correctly predicted,
and 89.7% of all residues predicted to be in helical trans-membrane
segments are correct.
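One way to use the cumulative table is to pick the lowest index that
reaches a desired accuracy and read off how many residues remain. The
Python sketch below hard-codes the %res and Qtot rows; the function name
is an illustrative choice, not part of the server output.

```python
# Rows copied from the cumulative table above (index 0..9).
PCT_RES = [100.0, 98.8, 97.3, 95.9, 94.1, 92.3, 89.9, 86.2, 75.0, 66.8]
Q_TOT = [94.7, 95.2, 95.6, 96.2, 96.7, 97.2, 97.7, 98.4, 99.4, 99.8]


def lowest_index_for(target_q):
    """Return (index, %res) for the lowest index with Qtot >= target_q.

    Returns None if no index in the table reaches the target.
    """
    for idx, q in enumerate(Q_TOT):
        if q >= target_q:
            return idx, PCT_RES[idx]
    return None


print(lowest_index_for(97.0))
```

For a target of 97% this selects index 5, the case discussed in the
paragraph above.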
The resulting network (PHD) prediction is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________________________________________________________
PHD: Profile fed neural network systems from HeiDelberg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prediction of:
secondary structure, by PHDsec
solvent accessibility, by PHDacc
and helical transmembrane regions, by PHDhtm
Author:
Burkhard Rost
EMBL, 69012 Heidelberg, Germany
Internet: Rost@EMBL-Heidelberg.DE
All rights reserved.
The network systems are described in:
PHDsec: B Rost & C Sander: JMB, 1993, 232, 584-599.
B Rost & C Sander: Proteins, 1994, 19, 55-72.
PHDacc: B Rost & C Sander: Proteins, 1994, 20, 216-226.
PHDhtm: B Rost et al.: Prot. Science, 1995, 4, 521-533.
Some statistics
~~~~~~~~~~~~~~~
Percentage of amino acids:
+--------------+--------+--------+--------+--------+--------+
| AA: | L | A | S | G | I |
| % of AA: | 13.0 | 10.0 | 9.7 | 8.9 | 8.6 |
+--------------+--------+--------+--------+--------+--------+
| AA: | V | R | T | F | D |
| % of AA: | 7.8 | 5.2 | 4.5 | 4.5 | 4.5 |
+--------------+--------+--------+--------+--------+--------+
| AA: | N | Q | E | P | K |
| % of AA: | 4.1 | 3.0 | 3.0 | 2.6 | 2.6 |
+--------------+--------+--------+--------+--------+--------+
| AA: | Y | M | W | H | C |
| % of AA: | 1.9 | 1.9 | 1.5 | 1.5 | 1.5 |
+--------------+--------+--------+--------+--------+--------+
Percentage of secondary structure predicted:
+--------------+--------+--------+--------+
| SecStr: | H | E | L |
| % Predicted: | 43.9 | 16.7 | 39.4 |
+--------------+--------+--------+--------+
According to the following classes:
all-alpha: %H>45 and %E< 5; all-beta : %H<5 and %E>45
alpha-beta : %H>30 and %E>20; mixed: rest,
this means that the predicted class is: mixed class
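The class assignment can be written out as a short Python sketch. The
thresholds are the ones listed above; the function name is hypothetical.

```python
def structural_class(pct_helix, pct_strand):
    """Assign the structural class from predicted %H and %E."""
    if pct_helix > 45 and pct_strand < 5:
        return "all-alpha"
    if pct_helix < 5 and pct_strand > 45:
        return "all-beta"
    if pct_helix > 30 and pct_strand > 20:
        return "alpha-beta"
    return "mixed"


# Values from the table above: %H = 43.9, %E = 16.7.
print(structural_class(43.9, 16.7))
```

With %H = 43.9 and %E = 16.7 none of the first three conditions holds, so
the protein falls into the mixed class.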
PHD output for your protein
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tue Nov 24 17:44:57 1998
Jury on: 10 different architectures (version 5.94_317).
Note: differently trained architectures, i.e., different versions can
result in different predictions.
About the protein
~~~~~~~~~~~~~~~~~
HEADER /home/phd/server/work/predict_h25873-220
COMPND
SOURCE
AUTHOR
SEQLENGTH 269
NCHAIN 1 chain(s) in predict_h25873-22040 data set
NALIGN 48
(=number of aligned sequences in HSSP file)
Abbreviations: PHDsec
~~~~~~~~~~~~~~~~~~~~~
sequence:
AA : amino acid sequence
secondary structure:
HEL: H=helix, E=extended (sheet), blank=other (loop)
PHD: Profile network prediction HeiDelberg
Rel: Reliability index of prediction (0-9)
detail:
prH: 'probability' for assigning helix
prE: 'probability' for assigning strand
prL: 'probability' for assigning loop
note: the 'probabilities' are scaled to the interval 0-9, e.g.,
prH=5 means that the first output node is 0.5-0.6
subset:
SUB: a subset of the prediction, for all residues with an expected
average accuracy > 82% (tables in header)
note: for this subset the following symbols are used:
L: is loop (for which above " " is used)
".": means that no prediction is made for this residue, as the
reliability is: Rel < 5
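The SUB line can be derived from the PHD and Rel lines with the rule just
described. The Python sketch below is an illustration under that reading;
the function name and the short input strings are hypothetical and not
taken from the output.

```python
def subset_line(prediction, reliability, min_rel=5, blank_symbol="L"):
    """Build a SUB line from a prediction line and its reliability digits.

    Residues with Rel < min_rel become ".", blanks (loop) become
    blank_symbol, and all other symbols are copied unchanged.
    """
    if len(prediction) != len(reliability):
        raise ValueError("prediction and reliability must have equal length")
    out = []
    for state, rel in zip(prediction, reliability):
        if int(rel) < min_rel:
            out.append(".")
        elif state == " ":
            out.append(blank_symbol)
        else:
            out.append(state)
    return "".join(out)


# Hypothetical 7-residue fragment.
print(subset_line("  HHHE ", "9348725"))
```

The same function would cover the PHDacc subset by passing min_rel=4 and
blank_symbol="I", following the PHDacc abbreviations.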
Abbreviations: PHDacc
~~~~~~~~~~~~~~~~~~~~~
SS : secondary structure
HEL: H=helix, E=extended (sheet), blank=other (loop)
solvent accessibility:
3st: relative solvent accessibility (acc) in 3 states:
b = 0-9%, i = 9-36%, e = 36-100%.
PHD: Profile network prediction HeiDelberg
Rel: Reliability index of prediction (0-9)
O_3: observed relative acc. in 3 states: B, I, E
note: for convenience a blank is used for intermediate (i).
P_3: predicted relative accessibility in 3 states
10st:relative accessibility in 10 states:
= n corresponds to a relative acc. of n*n %
subset:
SUB: a subset of the prediction, for all residues with an expected
average correlation > 0.69 (tables in header)
note: for this subset the following symbols are used:
"I": is intermediate (for which above " " is used)
".": means that no prediction is made for this residue, as the
reliability is: Rel < 4
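The relation between the 10-state and 3-state descriptions can be sketched
in Python. The treatment of the boundary values (9% counted as
intermediate, 36% as exposed) is an assumption; it is consistent with the
output below, where state 6 appears as "e" and state 5 as a blank.

```python
def ten_state_to_percent(n):
    """Relative accessibility (%) for 10-state value n, i.e. n*n."""
    if not 0 <= n <= 9:
        raise ValueError("10-state value must be between 0 and 9")
    return n * n


def three_state(n):
    """Map a 10-state value onto b / i / e (boundary handling assumed)."""
    acc = ten_state_to_percent(n)
    if acc < 9:
        return "b"
    if acc < 36:
        return "i"
    return "e"


print("".join(three_state(n) for n in range(10)))
```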
Abbreviations: PHDhtm
~~~~~~~~~~~~~~~~~~~~~
secondary structure:
HL: T=helical transmembrane region, blank=other (loop)
PHD: Profile network prediction HeiDelberg
PHDF:filtered prediction, i.e., too long transmembrane segments
are split, too short ones are deleted
Rel: Reliability index of prediction (0-9)
detail:
prH: 'probability' for assigning helical transmembrane region
prL: 'probability' for assigning loop
note: the 'probabilities' are scaled to the interval 0-9, e.g.,
prH=5 means that the first output node is 0.5-0.6
subset:
SUB: a subset of the prediction, for all residues with an expected
average accuracy > 82% (tables in header)
note: for this subset the following symbols are used:
L: is loop (for which above " " is used)
".": means that no prediction is made for this residue, as the
reliability is: Rel < 5
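The 0-9 scaling of the output nodes used in the prH, prE and prL lines can
be sketched as follows. Truncation toward zero and the cap at 9 for an
output of exactly 1.0 are assumptions; the text only gives the example
that prH=5 corresponds to an output of 0.5-0.6.

```python
def output_to_digit(value):
    """Scale a network output in [0, 1] to a single digit 0-9."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("output value must lie in [0, 1]")
    # ASSUMPTION: truncate, and cap 1.0 at 9 so the result stays one digit.
    return min(9, int(value * 10))


print(output_to_digit(0.55))
```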
protein: predict length 269
....,....1....,....2....,....3....,....4....,....5....,....6
AA |MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI|
PHD sec | HHHHHHHHHHHHHHHHHHHHHHHHHHEE HHHHHHHHHHHHH|
Rel sec |998443148899999999999998997676530312469989998623353579999999|
detail:
prH sec |001223468899999999999998888777653112210000000145566788999999|
prE sec |000011000000000000000001001111233542100000000000323211000000|
prL sec |998665420100000000000000000011112244578988998753100000000000|
subset: SUB sec |LLL.....HHHHHHHHHHHHHHHHHHHHHHH......LLLLLLLLL...H.HHHHHHHHH|
ACCESSIBILITY
3st: P_3 acc |eeeebee bbb bbbbbbbbbbbbbbbbbbbbbebeee eeeeeeeeebbbbbbbbbbbb|
10st: PHD acc |997706650005000000000000000000000607775779776677000000000000|
Rel acc |735421110541467608662789996343122133420454330023453975664547|
subset: SUB acc |e.ee.....bb.bbbb.bbb.bbbbbb.b.......e..eee......bb.bbbbbbbbb|
....,....7....,....8....,....9....,....10...,....11...,....12
AA |ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL|
PHD sec |HHHHHHHHHE HHHHEHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH |
Rel sec |999996412122653167703135552356779999999999999999999998467213|
detail:
prH sec |998986544334223477843456665567779999999999999999999998611343|
prE sec |001001123420010000145432101221110000000000000000000000000000|
prL sec |000001232245765521000000123210000000000000000000000000278555|
subset: SUB sec |HHHHHH......LL..HHH....HHH..HHHHHHHHHHHHHHHHHHHHHHHHHH.LL...|
ACCESSIBILITY
3st: P_3 acc |bbbbebbbebbbbbb bbbbbbbbbbbebbbbbbbbbbbbbbbbbbbbbbbbeebbeeeb|
10st: PHD acc |000060006000000500000000000600000000000000000000000067006760|
Rel acc |456515321655013144869663400154551757478936465465467713401400|
subset: SUB acc |bbbb.b...bbb....bbbbbbb.b...bbbb.bbbbbbb.bbbbbbbbbbb..b..e..|
....,....13...,....14...,....15...,....16...,....17...,....18
AA |ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH|
PHD sec | HHH EEEEEEEEEEEEEEEEEEE E E HHHHHH|
Rel sec |359985212134223651899898866789799875436658889963211351457756|
detail:
prH sec |320002345432332111000000000000100000221120000000001113567767|
prE sec |100000000000011014899888877789789886100000000013544222221111|
prL sec |568986543466545763100000011100000112567768889975454564210111|
subset: SUB sec |.LLLLL.........LL.EEEEEEEEEEEEEEEEEE..LLLLLLLLL.....L..HHHHH|
ACCESSIBILITY
3st: P_3 acc |eeebbbebbbeebeebeebbbbbbbbbbbbbbbbbbbeeeeeeeebbbbbbbbbbbbbbb|
10st: PHD acc |677000600077076077000000000000000000077767767000000000000000|
Rel acc |133100124043040233247198656399879530035414413123255869586654|
subset: SUB acc |........b.e..e.....bb.bbbbb.bbbbbb....ee.ee......bbbbbbbbbbb|
....,....19...,....20...,....21...,....22...,....23...,....24
AA |LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD|
PHD sec |HEEEE E HHHEEEE EEEEEE HHHHHHHHHHHHHEEEEE |
Rel sec |321341126989622145152653534229996251699999999973147525556642|
detail:
prH sec |521100000000145432463121122000000114789999999875421111121124|
prE sec |244564431000000000015765121358997510000000000013467642110000|
prL sec |233234457889754567411012655530002364200000000010010136667765|
subset: SUB sec |........LLLLL....H.H.EE.L....EEEE.L.HHHHHHHHHHH...EE.LLLLL..|
ACCESSIBILITY
3st: P_3 acc |bbbbebbbbbbebb bbbbbbbbeebeebbbbbbbbbbbbbbbbbbbbbbbbeeeee ee|
10st: PHD acc |000060000006005000000007606600000000000000000000000076777577|
Rel acc |754424240102242141047612131118967874356346635751777031345044|
subset: SUB acc |bbbb.b.b.....b..b..bbb.......bbbbbbb.bb.bbb.bbb.bbb....ee.ee|
....,....25...,....26...,....27...,....28...,....29...,....30
AA |RMKVWTSGQVEEYDLDADDINSRVEMKPK|
PHD sec |HHHHHH |
Rel sec |66775259975467555457776422699|
detail:
prH sec |77887520012221222221111100000|
prE sec |00000000000000000000001233200|
prL sec |11112379987678777678887655799|
subset: SUB sec |HHHHH.LLLLL.LLLLL.LLLLL...LLL|
ACCESSIBILITY
3st: P_3 acc |ebebbeeeeeeeeeeeeeeeeeebeeeee|
10st: PHD acc |60700787677777677777767067789|
Rel acc |10411563134335144444514212559|
subset: SUB acc |..e..ee...e..e.eeeeee.e...eee|
PHDhtm Helical transmembrane prediction
note: PHDacc and PHDsec are reliable for water-
soluble globular proteins, only. Thus,
please take the predictions above with
particular caution wherever transmembrane
helices are predicted by PHDhtm!
PHDhtm
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION: SYMBOLS
--- AA : amino acid in one-letter code
--- PHD htm : HTM's predicted by the PHD neural network
--- system (T=HTM, ' '=not HTM)
--- Rel htm : Reliability index of prediction (0-9, 0 is low)
--- detail : Neural network output in detail
--- prH htm : 'Probability' for assigning a helical trans-
--- membrane region (HTM)
--- prL htm : 'Probability' for assigning a non-HTM region
--- note: 'Probabilities' are scaled to the interval
--- 0-9, e.g., prH=5 means that the first
--- output node is 0.5-0.6
--- subset : Subset of more reliable predictions
--- SUB htm : All residues for which the expected average
--- accuracy is > 82% (tables in header).
--- note: for this subset the following symbols are used:
--- L: is loop (for which above ' ' is used)
--- '.': means that no prediction is made for this
--- residue, as the reliability is: Rel < 5
--- other : predictions derived based on PHDhtm
--- PHDFhtm : filtered prediction, i.e., too long HTM's are
--- split, too short ones are deleted
--- PHDRhtm : refinement of neural network output
--- PHDThtm : topology prediction based on refined model
--- symbols used:
--- i: intra-cytoplasmic
--- T: transmembrane region
--- o: extra-cytoplasmic
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION
....,....1....,....2....,....3....,....4....,....5....,....6
AA |MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI|
PHD htm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTTT|
detail: | |
prH htm |000000000001136788999999999988875321110000000123678889999988|
prL htm |999999999998863211000000000011124678889999999876321110000011|
other: | |
PHDFhtm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTT|
PHDRhtm | TTTTTTTTTTTTTTTTTT TTTTTTTTTTT|
PHDThtm |iiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTToooooooooooooooooTTTTTTTTTTT|
subset: | |
SUB htm |............................................................|
....,....7....,....8....,....9....,....10...,....11...,....12
AA |ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL|
PHD htm |TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT |
detail: | |
prH htm |888888877777666677788888888888888888888888888888888876543211|
prL htm |111111122222333322211111111111111111111111111111111123456788|
other: | |
PHDFhtm |TTTTTTTTTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT |
PHDRhtm |TTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTT |
PHDThtm |TTTTTTTTiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTTTTTTTTToooooooooooooo|
subset: | |
SUB htm |............................................................|
....,....13...,....14...,....15...,....16...,....17...,....18
AA |ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH|
PHD htm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTTTT|
detail: | |
prH htm |000000000001234567788888999988887643211111111235788899998888|
prL htm |999999999998765432211111000011112356788888888764211100001111|
other: | |
PHDFhtm | TTTTTTTTTTTTTTTTTTT TTTTTTTTTTTTT|
PHDRhtm | TTTTTTTTTTTTTTTTTT TTTTTTTTTTTT|
PHDThtm |ooooooooooooooooTTTTTTTTTTTTTTTTTTiiiiiiiiiiiiiiTTTTTTTTTTTT|
subset: | |
SUB htm |............................................................|
....,....19...,....20...,....21...,....22...,....23...,....24
AA |LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD|
PHD htm |TTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT |
detail: | |
prH htm |888887765443432233334566777777788888888888888888887542100000|
prL htm |111112234556567766665433222222211111111111111111112457899999|
other: | |
PHDFhtm |TTTTTTTTT TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT |
PHDRhtm |TTTTTT TTTTTTTTTTTTTTTTTTT |
PHDThtm |TTTTTToooooooooooooooooooooooooTTTTTTTTTTTTTTTTTTTiiiiiiiiii|
subset: | |
SUB htm |............................................................|
....,....25...,....26...,....27...,....28...,....29...,....30
AA |RMKVWTSGQVEEYDLDADDINSRVEMKPK|
PHD htm | |
detail: | |
prH htm |00000000000000000000000000000|
prL htm |99999999999999999999999999999|
other: | |
PHDFhtm | |
PHDRhtm | |
PHDThtm |iiiiiiiiiiiiiiiiiiiiiiiiiiiii|
subset: | |
SUB htm |.............................|
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION END
---
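For downstream use, the PHDThtm rows can be turned into a list of segments
with residue ranges. The Python sketch below copies the five PHDThtm rows
from the block above; the function name and the tuple layout are choices
made for illustration.

```python
from itertools import groupby

# PHDThtm rows copied from the output above, in order.
TOPOLOGY_ROWS = [
    "iiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTToooooooooooooooooTTTTTTTTTTT",
    "TTTTTTTTiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTTTTTTTTToooooooooooooo",
    "ooooooooooooooooTTTTTTTTTTTTTTTTTTiiiiiiiiiiiiiiTTTTTTTTTTTT",
    "TTTTTToooooooooooooooooooooooooTTTTTTTTTTTTTTTTTTTiiiiiiiiii",
    "iiiiiiiiiiiiiiiiiiiiiiiiiiiii",
]


def topology_segments(rows):
    """Return (symbol, start, end) runs with 1-based inclusive positions."""
    topology = "".join(rows)
    segments = []
    pos = 1
    for symbol, run in groupby(topology):
        length = len(list(run))
        segments.append((symbol, pos, pos + length - 1))
        pos += length
    return segments


helices = [seg for seg in topology_segments(TOPOLOGY_ROWS) if seg[0] == "T"]
for symbol, start, end in helices:
    print(f"HTM {start}-{end} (length {end - start + 1})")
```

Joining the rows before grouping matters, because several transmembrane
regions continue across a row break; counted this way the topology line
contains six transmembrane segments.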
________________________________________________________________________________
________________________________________________________________________________
-----------------------------------------------------------------------------
--- PredictProtein: NEWS from January, 1997 ---
--- ---
--- Dear user, ---
--- ---
--- as of January 1, 1997, EMBL has effectively decided not to ---
--- support the PredictProtein service with personnel. I do ---
--- maintain the program, so to speak, in my private time. However, ---
--- my contract obliges me to do science, instead. Unfortunately, ---
--- the computer environment at EMBL is at the same time starting ---
--- to become increasingly unstable. The consequence of these two ---
--- recent developments is that the PredictProtein service is not ---
--- as stable as it was. ---
--- ---
--- I apologise for the problems this may cause. In particular, ---
--- I apologise for my inability to reply to the 20-30 daily ---
--- personal mails, and suggest re-submitting requests after 24 hours! ---
--- ---
--- Hoping that I shall find a more convenient solution for the ---
--- future of the PredictProtein I remain with my best regards, ---
--- ---
--- Burkhard Rost ---
-----------------------------------------------------------------------------
--- PredictProtein: NEWS from April, 1998 ---
--- ---
-------------------------------- ---
--- MOVING PredictProtein ---
--- There appears to be light on the horizon! PP may well have ---
--- many hiccups over the next months (as I shall leave EMBL). ---
--- However, the server seems to have a fair chance of survival thanks ---
--- to major support that is being raised by Columbia University, ---
--- New York, U.S.A. I hope that this will settle the issue for ---
--- the years to come ... ---
-------------------------------- ---
--- WARNING ---
--- After a major rewriting of most of the PP code over the last, ---
--- I am afraid that not all errors have been traced by me, yet. ---
--- Thus, please have mercy and report any bug you'll encounter! ---
--- THANKS, Burkhard Rost ---
-------------------------------- ---
--- NEW PREDICTION DEFAULTS ---
--- * Coiled-coil regions: now by default the program COILS written by ---
--- Andrei Lupas is run on your sequence. An output is returned if a ---
--- coiled-coil region has been detected. ---
--- * Functional sequence motifs: now by default the PROSITE database ---
--- written by Amos Bairoch, Philip Bucher and Kay Hofmann is scanned ---
--- for sequence motifs. An output is returned if any motif has been ---
--- detected. ---
-------------------------------- ---
--- see http://www.embl-heidelberg.de/predictprotein/ppNews.html ---
--- for a description of the following new options. ---
--- NEW INPUT OPTION ---
--- * Your input sequence(s) in FASTA-list format ("# FASTA list ") ---
--- NEW OUTPUT OPTIONS ---
--- * Return also BLASTP output ("return blast") ---
--- * Return prediction additionally in RDB format ("return phd rdb") ---
--- * Return topits hssp ("return topits hssp") ---
--- * Return topits strip ("return topits strip") ---
--- * Return topits own ("return topits own") ---
--- * Return no coils ("return no coils") ---
--- * Return no prosite ("return no prosite") ---
-----------------------------------------------------------------------------