.. Copyright (c) 2013-2020, SIB - Swiss Institute of Bioinformatics and
.. Biozentrum - University of Basel
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.
Structural Data
================================================================================
.. currentmodule:: promod3.loop
The :class:`StructureDB` serves as a container for structural backbone and
sequence data. Custom accessor objects can be implemented that relate
arbitrary features to structural data. Examples provided by |project| include
accession using matching stem geometry (see: :class:`FragDB`) or sequence
features (see: :class:`Fragger`).
Besides backbone and sequence data, derived features such as sequence profiles
or secondary structure information can optionally be stored.
Optional data includes:
* The phi/psi dihedral angles
* The secondary structure state as defined by dssp
* The solvent accessibility in square Angstrom
* The amino acid frequencies as given by an input sequence profile
* The residue depth, defined as the minimum distance of a residue to any of
the exposed residues.
Distances are calculated using CB positions (artificially constructed in case
of glycine). A residue counts as exposed if it has a
relative solvent accessibility > 25% and at least one atom exposed
to the OUTER surface. To determine whether an atom is part of that outer
surface, the full structure is placed into a 3D grid and a flood fill
algorithm is used to determine the atoms of interest.
This approach excludes internal cavities. It is a simplified
version of the residue depth as discussed in [chakravarty1999]_ and is
calculated directly when structural information is added to the StructureDB.
* The amino acid frequency derived from structural alignments as described
in [zhou2005]_. Since the calculation of such a profile already requires a
StructureDB, this is a chicken-and-egg problem: when adding
structural information to the StructureDB, the corresponding memory is
only allocated and set to zero. Using this information
is therefore only meaningful if you calculate these profiles
and set them manually (or load the provided default database).
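
The flood fill idea behind the outer surface detection can be illustrated
with a small, self-contained sketch. It is not the |project| implementation:
the function name ``outer_cells``, the integer grid cells and the toy
structure are assumptions made for this example. Empty cells are filled
starting from the grid boundary, so an enclosed cavity is never reached.

```python
from collections import deque

def outer_cells(occupied, shape):
    """Return all empty grid cells reachable from the grid boundary.

    occupied: set of (x, y, z) integer cells filled by atoms
    shape: (nx, ny, nz) grid dimensions
    """
    nx, ny, nz = shape
    # Seed the fill with every empty cell on the boundary of the grid.
    seeds = [
        (x, y, z)
        for x in range(nx) for y in range(ny) for z in range(nz)
        if (x in (0, nx - 1) or y in (0, ny - 1) or z in (0, nz - 1))
        and (x, y, z) not in occupied
    ]
    outside = set(seeds)
    queue = deque(seeds)
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in steps:
            n = (x + dx, y + dy, z + dz)
            inside_grid = 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz
            if inside_grid and n not in occupied and n not in outside:
                outside.add(n)
                queue.append(n)
    return outside

# Toy example: a hollow 3x3x3 block of "atoms" centered in a 5x5x5 grid.
block = {(x, y, z) for x in range(1, 4) for y in range(1, 4) for z in range(1, 4)}
cavity = (2, 2, 2)
occupied = block - {cavity}
outside = outer_cells(occupied, (5, 5, 5))
print(cavity in outside)  # the enclosed cavity is not reached by the fill
```

Atoms adjacent to a cell in ``outside`` would then count as part of the outer
surface, while atoms that only line the cavity would not.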
Defining Chains and Fragments
--------------------------------------------------------------------------------
.. class:: CoordInfo()
The CoordInfo gets automatically generated when new chains are added to
a :class:`StructureDB`. It contains internal information of how a
connected stretch of residues is stored in the database.
.. attribute:: id
An id string specified when adding the corresponding stretch to the
structure db
.. attribute:: chain_name
A chain name string specified when adding the corresponding stretch to the
structure db
.. attribute:: offset
All residues of the added stretch are stored in a linear memory layout.
The offset parameter tells us where it exactly starts in the global data
structure. (:class:`int`)
.. attribute:: size
The number of residues in that stretch (:class:`int`)
.. attribute:: start_resnum
Residue number of the first residue in the added stretch. The residue number
is relative to the SEQRES provided in the input profile when adding the
stretch to the structure db. (:class:`int`)
.. attribute:: shift
Translation from original coordinates that has been applied before storing
structural information in db. (:class:`ost.geom.Vec3`)
.. class:: FragmentInfo(chain_index, offset, length)
The FragmentInfo defines any fragment in the :class:`StructureDB`. If you
implement your own accessor object, this is the information you want to store.
:param chain_index: Fills :attr:`chain_index`
:param offset: Fills :attr:`offset`
:param length: Fills :attr:`length`
.. attribute:: chain_index
The index of the chain (defined by :class:`CoordInfo`) in the
:class:`StructureDB` this fragment belongs to. (:class:`int`)
.. attribute:: offset
Index of the residue in **chain** at which the fragment starts. (0-based, :class:`int`)
.. attribute:: length
Length of the fragment (:class:`int`)
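
To make the relation between the two descriptors more concrete, here is a
minimal sketch using plain Python stand-ins. ``StretchSketch`` and
``FragmentSketch`` are hypothetical classes that only mirror the documented
attributes; they are not the |project| classes. The sketch assumes that a
fragment is valid if it fits entirely into one connected stretch, and that its
position in the linear memory layout is the stretch offset plus the fragment
offset.

```python
from dataclasses import dataclass

@dataclass
class StretchSketch:
    """Hypothetical stand-in for CoordInfo (only the fields used here)."""
    offset: int  # start of the stretch in the global residue array
    size: int    # number of residues in the stretch

@dataclass
class FragmentSketch:
    """Hypothetical stand-in for FragmentInfo."""
    chain_index: int
    offset: int  # 0-based start within the stretch
    length: int

def fragment_fits(frag, stretches):
    """True if the fragment lies completely inside its stretch."""
    if not 0 <= frag.chain_index < len(stretches):
        return False
    size = stretches[frag.chain_index].size
    return frag.length > 0 and frag.offset >= 0 and frag.offset + frag.length <= size

def global_indices(frag, stretches):
    """Indices of the fragment residues in the assumed linear layout."""
    if not fragment_fits(frag, stretches):
        raise ValueError("fragment does not fit into a connected stretch")
    start = stretches[frag.chain_index].offset + frag.offset
    return list(range(start, start + frag.length))

stretches = [StretchSketch(offset=0, size=100), StretchSketch(offset=100, size=50)]
ok = FragmentSketch(chain_index=1, offset=45, length=5)
too_long = FragmentSketch(chain_index=1, offset=46, length=5)
print(fragment_fits(ok, stretches), fragment_fits(too_long, stretches))
```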
The Structure Database
--------------------------------------------------------------------------------
The following code example demonstrates how to create a structural database
and fill it with content.
.. literalinclude:: ../../../tests/doc/scripts/loop_structure_db.py
Calculating the structural profiles is expensive and heavily depends on
the size of the database used as source. If you want to do this for a larger
database, you might want to consider two things:
1. Use a database of limited size to generate the actual profiles (something
in between 5000 and 10000 nonredundant chains is enough)
2. Use the :class:`ost.seq.ProfileDB` to gather profiles produced from jobs
running in parallel
.. class:: StructureDBDataType
The StructureDBDataType enum has to be passed at initialization of a
StructureDB in order to define what data you want to store additionally
to backbone coordinates and sequence.
For the bare minimum (only backbone coordinates and sequence), use Minimal.
If you want to store all data possible, use All. If you only want a subset,
you can combine some of the datatypes with a bitwise or operation
(see example script for :class:`StructureDB`). One important note:
If you enable AAFrequenciesStruct, the actual information is not automatically
assigned. Only the corresponding memory is allocated and set to zero; the actual
information must be assigned manually (see the example script).
Minimal, All, Dihedrals, SolventAccessibilities, ResidueDepths, DSSP,
AAFrequencies, AAFrequenciesStruct
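
How combining data types with a bitwise or works can be sketched with
Python's standard ``enum.IntFlag``. ``DataTypeSketch`` and its numeric values
are assumptions for illustration only; the actual values of
:class:`StructureDBDataType` may differ. The helper ``has_data`` only mimics
the kind of check that :meth:`StructureDB.HasData` provides.

```python
from enum import IntFlag

class DataTypeSketch(IntFlag):
    # Hypothetical values; the real enum may use different numbers.
    Minimal = 0
    Dihedrals = 1
    SolventAccessibilities = 2
    ResidueDepths = 4
    DSSP = 8
    AAFrequencies = 16
    AAFrequenciesStruct = 32
    All = 63

def has_data(stored, data_type):
    """True if every bit of data_type is set in stored."""
    return (stored & data_type) == data_type

# Request dihedral angles and DSSP states in addition to the minimal data.
stored = DataTypeSketch.Dihedrals | DataTypeSketch.DSSP
print(has_data(stored, DataTypeSketch.DSSP))
print(has_data(stored, DataTypeSketch.ResidueDepths))
```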
.. class:: StructureDB(data_to_store)
Generates an empty :class:`StructureDB` that can be filled with content
through :func:`AddCoordinates`. The information extracted there is defined by
*data_to_store*. Have a look at the :class:`StructureDBDataType`
documentation and at the example script...
:param data_to_store: Specifies what data to store in the database, several
flags can be combined with a bitwise or operator.
:type data_to_store: :class:`StructureDBDataType`
.. staticmethod:: Load(filename)
LoadPortable(filename)
Loads raw binary file generated with :meth:`Save` (optimized for fast
reading) / portable file generated with :meth:`SavePortable` (slower but
less machine-dependent).
:param filename: Path to the file from which to load the database.
:type filename: :class:`str`
:returns: The loaded database
:rtype: :class:`StructureDB`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened or if
file cannot be parsed (see :ref:`here <portableIO>` for details).
.. method:: Save(filename)
SavePortable(filename)
Saves a raw / portable binary representation. Use portable files for
distribution and convert locally to raw files. See :ref:`here <portableIO>`
for details.
:param filename: Path to the file where the database will be saved
:type filename: :class:`str`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened
.. method:: HasData(data_type)
Checks, whether requested data type is available in the current database.
:param data_type: Data type to check
:type data_type: :class:`StructureDBDataType`
:returns: Whether the requested datatype is available
:rtype: :class:`bool`
.. method:: AddCoordinates(id, chain_name, ent, seqres, prof=None, \
only_longest_stretch=True)
This method takes an entity and adds coordinates and the sequence
of one of its chains to the structural database. Additionally, all
data as specified at the initialization of the database is extracted
fully automatically by considering the full *ent* (e.g. when
calculating solvent accessibilities etc.).
The only exception is AAFrequencies, where a valid sequence profile
is expected in *prof* whose sequence matches *seqres*.
All residues in chain with name *chain_name* must have residue numbers
that directly relate them to the *seqres* with an indexing scheme
starting from one.
If this is not the case, an error gets thrown. You might want to
consider using :meth:`ost.seq.Renumber` for proper numbering.
Based on consecutive numbering and additionally checking for valid
peptide bonds, connected stretches are identified
and every added connected stretch gets its own entry with
:class:`CoordInfo` as a descriptor.
To avoid cluttering the database with many small fragments, the flag
*only_longest_stretch* can be used. Set it to False if all
connected stretches of chain with name *chain_name* should be added.
There is one final catch you have to consider: Due to the internal
lossy data compression for the positions, the extent in x, y and
z direction for every connected stretch is limited to 655 A. This should
be sufficient for most structures, but stretches exceeding this maximum
are discarded. To store the structural data within these limits,
a translation is applied that gets stored as the *shift* attribute
in the corresponding :class:`CoordInfo` object.
:param id: identifier of the added structure (e.g. pdb id)
:param chain_name: Name of the chain in *ent* you want to add
:param ent: The full entity that must contain a chain named
as specified by *chain_name*.
:param seqres: The reference sequence of chain with name *chain_name*
:param prof: Profile information for the chain with name
*chain_name*. The profile sequence must match *seqres*.
:param only_longest_stretch: Flag whether you want to add only the longest
connected stretch of residues or all connected
stretches of residues
:type id: :class:`str`
:type chain_name: :class:`str`
:type ent: :class:`ost.mol.EntityHandle` /
:class:`ost.mol.EntityView`
:type seqres: :class:`ost.seq.SequenceHandle`
:type prof: :class:`ost.seq.ProfileHandle`
:type only_longest_stretch: :class:`bool`
:returns: indices of added stretches in db
:rtype: :class:`list` of :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if the residues in chain with
name *chain_name* do not match *seqres* given the
residue numbers, or if AAFrequencies have to be extracted and
the sequence in *prof* does not match *seqres* or *prof* is
invalid.
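
The extent limit and the stored *shift* can be made plausible with a small
sketch of lossy position compression. The details here are assumptions and not
taken from the |project| source: the sketch assumes that each coordinate is
stored as a 16-bit unsigned integer with a resolution of 0.01 A, which would
give a maximal extent of 655.35 A and is therefore consistent with the limit
stated above. The function names and the sign convention of the shift are
hypothetical as well.

```python
MAX_UINT16 = 2**16 - 1  # largest value of a 16-bit unsigned integer

def compress(positions, resolution=0.01):
    """Translate positions to start at the origin and quantize them.

    Returns the quantized positions and the translation (shift) needed
    to restore the original frame.
    """
    shift = tuple(min(p[i] for p in positions) for i in range(3))
    packed = []
    for p in positions:
        q = tuple(round((p[i] - shift[i]) / resolution) for i in range(3))
        if max(q) > MAX_UINT16:
            raise ValueError("stretch extent exceeds the representable range")
        packed.append(q)
    return packed, shift

def decompress(packed, shift, resolution=0.01):
    """Approximate inverse of compress (lossy up to half the resolution)."""
    return [tuple(q[i] * resolution + shift[i] for i in range(3)) for q in packed]

positions = [(-12.3456, 5.0, 100.0), (10.0, 7.25, 130.5)]
packed, shift = compress(positions)
restored = decompress(packed, shift)
max_error = max(abs(a - b) for p, r in zip(positions, restored) for a, b in zip(p, r))

# A stretch spanning 700 A in x cannot be represented under these assumptions.
try:
    compress([(0.0, 0.0, 0.0), (700.0, 0.0, 0.0)])
    too_large_rejected = False
except ValueError:
    too_large_rejected = True
```

Under these assumptions the round trip error stays below half the resolution,
which is the price paid for the compact storage.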
.. method:: RemoveCoordinates(coord_idx)
Removes coordinates at specified location and all its associated data. This
has an impact on the offset values of all :class:`CoordInfo` objects
that are internally stored afterwards and on the actual coord indices
(all shifted by one). So make sure that you adapt your data access
accordingly!
:param coord_idx: Specifies coordinates to be removed
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if *coord_idx* is invalid
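
The bookkeeping consequences of a removal can be pictured with a toy model.
The list of dictionaries below is a hypothetical stand-in for the internal
:class:`CoordInfo` entries, and the sketch assumes that offsets of later
entries decrease by the size of the removed stretch. It only illustrates why
previously stored coord indices and offsets must be refreshed after calling
:meth:`RemoveCoordinates`.

```python
def remove_stretch(stretches, coord_idx):
    """Return a new list of entries with the entry at coord_idx removed."""
    if not 0 <= coord_idx < len(stretches):
        raise IndexError("invalid coord_idx")
    removed_size = stretches[coord_idx]["size"]
    # Entries before the removed one are unaffected.
    updated = [dict(s) for s in stretches[:coord_idx]]
    # Entries after it move up by one index and their offsets shrink.
    for s in stretches[coord_idx + 1:]:
        updated.append(
            {"id": s["id"], "offset": s["offset"] - removed_size, "size": s["size"]}
        )
    return updated

db = [
    {"id": "A", "offset": 0, "size": 10},
    {"id": "B", "offset": 10, "size": 20},
    {"id": "C", "offset": 30, "size": 5},
]
db_after = remove_stretch(db, 1)
print([(i, s["id"], s["offset"]) for i, s in enumerate(db_after)])
```

In this toy model, entry "C" moves from index 2 to index 1, so any index
obtained before the removal would now point at the wrong entry.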
.. method:: GetCoordIdx(id, chain_name)
:returns: The :class:`StructureDB` indices (in [0, :meth:`GetNumCoords`-1])
of all coords (connected stretches) with matching
*id* / *chain_name*.
:rtype: :class:`list` of :class:`int`
:param id: Identifier given when calling :meth:`AddCoordinates`
:param chain_name: Name of chain given when calling :meth:`AddCoordinates`
:type id: :class:`str`
:type chain_name: :class:`str`
.. method:: GetCoordInfo(idx)
:returns: Object describing the stretch of connected residues with
index *idx*.
:rtype: :class:`CoordInfo`
:param idx: The :class:`StructureDB` index (in [0, :meth:`GetNumCoords`-1])
:type idx: :class:`int`
.. method:: GetNumCoords()
:returns: Number of connected stretches of residues that have been added to
the database.
:rtype: :class:`int`
.. method:: PrintStatistics()
Prints out some information about the db.
.. method:: GetBackboneList(fragment, sequence)
GetBackboneList(n_stem, c_stem, fragment, sequence="")
GetBackboneList(coord_idx, sequence="")
GetBackboneList(n_stem, c_stem, coord_idx, sequence="")
:returns: Backbone list with positions extracted from *fragment* or
full entry at *coord_idx*
:rtype: :class:`BackboneList`
:param fragment: Fragment definition from which to extract positions.
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract positions.
:type coord_idx: :class:`int`
:param sequence: Sequence of the returned backbone list. If not
set, the original sequence at specified location in the
database is used.
:type sequence: :class:`str`
:param n_stem: Positions onto which the backbone list's N-terminus should be
superposed.
:type n_stem: :class:`ost.mol.ResidueHandle`
:param c_stem: Positions onto which the backbone list's C-terminus should be
superposed.
:type c_stem: :class:`ost.mol.ResidueHandle`
:raises: :exc:`~exceptions.RuntimeError` if the length of *sequence* does
not match with the desired backbone list, if *sequence* contains
a character which does not belong to the 20 proteinogenic amino
acids or if *fragment* or *coord_idx* is invalid. Fragment can
be invalid when it does not fully fit into one of the connected
stretches of residues in the database.
.. method:: GetSequence(fragment)
GetSequence(coord_idx)
:returns: The sequence of *fragment* or full entry at *coord_idx*
:rtype: :class:`str`
:param fragment: Fragment definition from which to extract the sequence.
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the sequence
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if fragment or coord_idx is
invalid. Fragment can be invalid when it does not fully fit into
one of the connected stretches of residues in the database.
.. method:: GetDSSPStates(fragment)
GetDSSPStates(coord_idx)
:returns: The dssp states of *fragment* or full entry at *coord_idx*
:rtype: :class:`str`
:param fragment: Fragment definition from which to extract the states.
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the dssp states
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if database does not contain dssp
data or if fragment / coord_idx is invalid. Fragment can be invalid
when it does not fully fit into one of the connected stretches of
residues in the database.
.. method:: GetDihedralAngles(fragment)
GetDihedralAngles(coord_idx)
:returns: The phi and psi dihedral angles of every residue of *fragment*
or full entry at *coord_idx*
:rtype: :class:`list` of pairs (:class:`tuple`)
:param fragment: Fragment definition from which to extract the dihedrals.
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the dihedral angles
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if database does not contain
dihedral angle data or if fragment / coord_idx is invalid.
Fragment can be invalid when it does not fully fit into one of the
connected stretches of residues in the database.
.. method:: GetResidueDepths(fragment)
GetResidueDepths(coord_idx)
:returns: Residue depth for each residue of *fragment* or full entry
at *coord_idx*
:rtype: :class:`list` of :class:`float`
:param fragment: Fragment definition from which to extract the residue
depths
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the residue depths
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if database does not contain
residue depth data or if fragment / coord_idx is invalid.
Fragment can be invalid when it does not fully fit into one of the
connected stretches of residues in the database.
.. method:: GetSolventAccessibilitites(fragment)
GetSolventAccessibilitites(coord_idx)
:returns: Solvent accessibility for each residue of *fragment* or full entry
at *coord_idx* in square A as calculated by
:meth:`~ost.mol.alg.Accessibility` when adding the structure to
the database.
:rtype: :class:`list` of :class:`float`
:param fragment: Fragment definition from which to extract the solvent
accessibilities
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the solvent
accessibilities
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if database does not contain
solvent accessibility data or if fragment / coord_idx is invalid.
Fragment can be invalid when it does not fully fit into one of the
connected stretches of residues in the database.
.. method:: GetSequenceProfile(fragment)
GetSequenceProfile(coord_idx)
:returns: The sequence profile for the residues defined by *fragment* or
full entry at *coord_idx* with the BLOSUM62 probabilities as NULL
model.
:rtype: :class:`ost.seq.ProfileHandle`
:param fragment: Fragment definition from which to extract the sequence
profile
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the sequence profile
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if database does not contain
sequence profile data or if fragment / coord_idx is invalid.
Fragment can be invalid when it does not fully fit into one of the
connected stretches of residues in the database.
.. method:: GetStructureProfile(fragment)
GetStructureProfile(coord_idx)
:returns: The structure profile for the residues defined by *fragment* or
full entry at *coord_idx* with the BLOSUM62 probabilities as NULL
model.
:rtype: :class:`ost.seq.ProfileHandle`
:param fragment: Fragment definition from which to extract the structure
profile
:type fragment: :class:`FragmentInfo`
:param coord_idx: Idx of entry from which to extract the structure profile
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if database does not contain
structure profile data or if fragment / coord_idx is invalid.
Fragment can be invalid when it does not fully fit into one of the
connected stretches of residues in the database.
.. method:: GenerateStructureProfile(bb_list, residue_depths)
Calculates a structure profile for *bb_list* with given *residue_depths*
using the full internal data of this StructureDB.
:param bb_list: Positions for which to calculate the structural profile
:type bb_list: :class:`BackboneList`
:param residue_depths: The residue depth for each residue in *bb_list*
as you would extract it from any StructureDB
containing that data.
:type residue_depths: :class:`list` of :class:`float`
:returns: The structure profile for the input with the BLOSUM62
probabilities as NULL model.
:rtype: :class:`ost.seq.ProfileHandle`
:raises: :exc:`~exceptions.RuntimeError` if *bb_list* and
*residue_depths* differ in size, when their size is 0
or when database does not contain residue depth data.
.. method:: SetStructureProfile(coord_idx, prof)
Takes the *prof* and sets the corresponding StructureProfile
frequencies in entry with *coord_idx*
:param prof: Source of profile frequencies
:param coord_idx: StructureDB index of entry for which to set frequencies
(in [0, :meth:`GetNumCoords`-1])
:type prof: :class:`ost.seq.ProfileHandle`
:type coord_idx: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if *coord_idx* does not match
any entry in the db, when the size of the *prof* does not
exactly match the size of entry at *coord_idx* or when database
does not contain aa frequency struct data.
.. method:: GetSubDB(indices)
:return: A new database containing only the structural infos specified by
your input. This can be useful for testing, e.g. if you want
to make sure that you have no close homologue in the database.
:rtype: :class:`StructureDB`
:param indices: Indices of chains to be added to the sub database (in [0,
:meth:`GetNumCoords`-1])
:type indices: :class:`list`
:raises: :exc:`~exceptions.RuntimeError` if you provide an invalid index
Finding Fragments based on Geometric Features
--------------------------------------------------------------------------------
The fragment database allows you to organize, search and access the information
stored in a structural database (:class:`StructureDB`). In its current form it
groups fragments in bins according to their length (incl. stems) and the
geometry of their N-stem and C-stem (described by 4 angles and the distance
between the N-stem C atom and the C-stem N atom). It can therefore be searched
for fragments matching a certain geometry of N and C stems. The bins are
accessed through a hash table, which makes searching the database fast.
This example illustrates how to create a custom FragDB based on a StructureDB:
.. literalinclude:: ../../../tests/doc/scripts/loop_frag_db.py
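
The binning idea itself can be sketched independently of |project|. In the
following example a Python dictionary plays the role of the hash table and a
tuple of bin indices serves as key. The function ``stem_key``, the way angles
are wrapped into [0, 360) and the example values are assumptions made for
illustration; how the four angles are defined and how the actual hash is
computed is not covered here.

```python
from collections import defaultdict

def stem_key(length, dist, angles, dist_bin_size, angle_bin_size):
    """Build a hashable bin key from fragment length and stem geometry.

    dist: stem distance in A, angles: four angles in degrees
    """
    dist_bin = int(dist // dist_bin_size)
    angle_bins = tuple(int((a % 360.0) // angle_bin_size) for a in angles)
    return (length, dist_bin) + angle_bins

bins = defaultdict(list)

# Store a fragment label under the key derived from its stem geometry.
key = stem_key(5, 7.3, (10.0, -170.0, 95.0, 359.0), dist_bin_size=1.0, angle_bin_size=20)
bins[key].append("frag_a")

# A query with similar geometry falls into the same bin ...
similar = stem_key(5, 7.9, (15.0, -175.0, 85.0, 350.0), 1.0, 20)
hits = bins.get(similar, [])

# ... while a query with a different stem distance does not.
different = stem_key(5, 8.1, (15.0, -175.0, 85.0, 350.0), 1.0, 20)
misses = bins.get(different, [])
print(hits, misses)
```

A lookup is a single dictionary access regardless of how many fragments are
stored, which is the reason such a layout can be searched quickly. Queries
close to a bin border may miss similar fragments in the neighboring bin, which
is what the *extra_bins* parameter of :meth:`FragDB.SearchDB` addresses.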
.. class:: FragDB(dist_bin_size, angle_bin_size)
:param dist_bin_size: Size of the distance parameter binning in A
:param angle_bin_size: Size of the angle parameter binning in degree
:type dist_bin_size: :class:`float`
:type angle_bin_size: :class:`int`
.. staticmethod:: Load(filename)
LoadPortable(filename)
Loads raw binary file generated with :meth:`Save` (optimized for fast
reading) / portable file generated with :meth:`SavePortable` (slower but
less machine-dependent).
:param filename: Path to the file from which to load the database.
:type filename: :class:`str`
:returns: The loaded database
:rtype: :class:`FragDB`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened or if
file cannot be parsed (see :ref:`here <portableIO>` for details).
.. method:: Save(filename)
SavePortable(filename)
Saves a raw / portable binary representation. Use portable files for
distribution and convert locally to raw files. See :ref:`here <portableIO>`
for details.
:param filename: path to the file where the database will be saved
:type filename: :class:`str`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened.
.. method:: GetAngularBinSize()
The size of the bins for the 4 angles describing the stem geometry and used
to organize the fragments in the database.
:return: The bin size in degrees
:rtype: :class:`int`
.. method:: GetDistBinSize()
The size of the bins for the distance describing the stem geometry and used
to organize the fragments in the database.
:return: The bin size
:rtype: :class:`float`
.. method:: AddFragments(fragment_length, rmsd_cutoff, structure_db)
Iterates over all fragments of length **fragment_length** in
**structure_db** and adds them to the fragment database.
Fragments will be skipped if there is already a fragment in the database
that has an RMSD smaller than **rmsd_cutoff**, where RMSD is calculated
upon superposing the stem residues.
As the fragments are added they are organized in bins described by their
length and the geometry of their N and C stem.
:param fragment_length: The length of the fragments that should be added to the database
:param rmsd_cutoff: The minimal RMSD between two fragments in the fragment database
:param structure_db: Database delivering the structural info
:type fragment_length: :class:`int`
:type rmsd_cutoff: :class:`float`
:type structure_db: :class:`StructureDB`
.. method:: PrintStatistics()
Prints statistics about the fragment database, notably:
1. the number of different stem groups (number of bins used to group the
fragments according to the geometry of their stem residues)
2. The total number of fragments in the database
3. The minimal and maximal number of fragments found in a stem group.
.. method:: GetNumStemPairs()
GetNumStemPairs(loop_length)
Returns the number of stem groups (number of bins used to group the
fragments according to the geometry of their stem residues) for the whole db
or for fragments of a given length.
:param loop_length: The length of the fragments
:type loop_length: :class:`int`
:returns: The number of groups
:rtype: :class:`int`
.. method:: GetNumFragments()
GetNumFragments(loop_length)
Returns the number of fragments in the database in total or for fragments of
a given length.
:param loop_length: The length of the fragments
:type loop_length: :class:`int`
:returns: Number of fragments
:rtype: :class:`int`
.. method:: HasFragLength(loop_length)
:param loop_length: The length of the fragments
:type loop_length: :class:`int`
:returns: True if fragments of given length exist.
:rtype: :class:`bool`
.. method:: MaxFragLength()
:returns: Maximal fragment length contained in db.
:rtype: :class:`int`
.. method:: SearchDB(n_stem, c_stem, frag_size, extra_bins=0)
Search the database for fragments matching the geometry of the **n_stem**
and **c_stem** and of the same length as the **frag_size**.
:param n_stem: The N-stem
:type n_stem: :class:`ost.mol.ResidueHandle`
:param c_stem: The C-stem
:type c_stem: :class:`ost.mol.ResidueHandle`
:param frag_size: Number of residues of the fragment
:type frag_size: :class:`int`
:param extra_bins: Whether to extend the search to include fragments from
*extra_bins* additional bins surrounding the bin given by
the *n_stem* and *c_stem* geometry. If odd, we extend to
the closer bin, otherwise symmetrically.
:type extra_bins: :class:`int`
:returns: A list of :class:`FragmentInfo` objects. These objects are related
to the structural database with which you called the AddFragments
function.
Finding Fragments based on Sequence Features
--------------------------------------------------------------------------------
In some cases you might want to use the :class:`StructureDB` to search
for fragments that possibly represent the structural conformation of interest.
The :class:`Fragger` searches a :class:`StructureDB` for n fragments,
that maximize a certain score and gathers a set of fragments with a guaranteed
structural diversity based on an rmsd threshold. You can use the :class:`Fragger`
wrapped in a full-fledged pipeline implemented in
:class:`~promod3.modelling.FraggerHandle` or search for fragments from scratch
using an arbitrary linear combination of scores:
* **SeqID**:
Calculates the fraction of amino acids being identical when comparing
a potential fragment from the :class:`StructureDB` and the target sequence
* **SeqSim**:
Calculates the avg. substitution matrix based sequence similarity of amino acids
when comparing a potential fragment from the :class:`StructureDB` and the target
sequence
* **SSAgree**:
Calculates the avg. agreement of the predicted secondary structure by PSIPRED [Jones1999]_
and the dssp [kabsch1983]_ assignment stored in the :class:`StructureDB`.
The Agreement term is based on a probabilistic approach also used in HHSearch [soding2005]_.
* **TorsionProbability**:
Calculates the avg. probability of observing the phi/psi dihedral angles of a potential
fragment from the :class:`StructureDB` given the target sequence. The probabilities are
extracted from the :class:`TorsionSampler` class.
* **SequenceProfile**:
Calculates the avg. profile score between the amino acid frequencies of a potential
fragment from the :class:`StructureDB` and a target profile assuming a gapfree alignment
in between them. The scores are calculated as L1 distances between the profile columns.
* **StructureProfile**:
Calculates the avg. profile score between the amino acid frequencies of a potential
fragment from the :class:`StructureDB` and a target profile assuming a gapfree alignment
in between them. The scores are calculated as L1 distances between the profile columns.
In this case, the amino acid frequencies extracted from structural alignments are used.
.. literalinclude:: ../../../tests/doc/scripts/loop_fragger.py
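
Two of the score components listed above are simple enough to sketch in plain
Python, together with the linear combination of weighted components. The
function names, the list-of-lists profile representation and the example
numbers are assumptions for illustration and not the :class:`Fragger`
implementation. In particular, the sketch only returns the average L1 distance
between profile columns and does not show how such a distance is turned into a
score.

```python
def seq_id(frag_seq, target_seq):
    """Fraction of identical amino acids in a gap-free comparison."""
    if len(frag_seq) != len(target_seq) or not target_seq:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(frag_seq, target_seq)) / len(target_seq)

def avg_l1_distance(frag_prof, target_prof):
    """Average L1 distance between corresponding profile columns.

    Each profile is a list of columns, each column a list of frequencies.
    """
    if len(frag_prof) != len(target_prof) or not target_prof:
        raise ValueError("profiles must be non-empty and of equal length")
    per_column = [
        sum(abs(f - t) for f, t in zip(fc, tc))
        for fc, tc in zip(frag_prof, target_prof)
    ]
    return sum(per_column) / len(per_column)

def linear_combination(weighted_components):
    """Sum of weight * value over all (weight, value) pairs."""
    return sum(w * v for w, v in weighted_components)

sid = seq_id("AGGA", "AGCA")
dist = avg_l1_distance([[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.5], [0.0, 1.0]])
# The second component value (0.2) is a placeholder for any other score.
total = linear_combination([(1.0, sid), (0.5, 0.2)])
print(sid, dist, total)
```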
.. class:: Fragger(seq)
A Fragger object to search a :class:`StructureDB` for fragments with **seq**
as target sequence. You need to add some score components before you can
finally call the Fill function.
:param seq: Sequence of fragments to be searched
:type seq: :class:`str`
.. method:: Fill(db, rmsd_thresh, num_fragments)
Searches **db** for **num_fragments** fragments with maximal
score based on the defined score components.
There will be no pair of fragments with RMSD below **rmsd_thresh**.
:param db: Source of structural data
:param rmsd_thresh: To guarantee structural diversity
:param num_fragments: Number of fragments to be extracted
:type db: :class:`StructureDB`
:type rmsd_thresh: :class:`float`
:type num_fragments: :class:`int`
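The guarantee described for :meth:`Fill` (best scores first, no pair of
fragments closer than **rmsd_thresh**) can be pictured with a greedy selection.
This is only a sketch under stated assumptions: the documentation does not say
that the search is greedy, the helper ``select_diverse`` is hypothetical, and
the one-dimensional "fragments" with an absolute difference stand in for real
backbone coordinates and RMSD.

```python
def select_diverse(candidates, distance, thresh, num_fragments):
    """Pick up to num_fragments items with high score and pairwise
    distance of at least thresh.

    candidates: list of (score, item) tuples
    distance:   function taking two items, returning a float
    """
    selected = []
    # Visit candidates from best to worst score
    for score, item in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Accept only if far enough from everything accepted so far
        if all(distance(item, other) >= thresh for _, other in selected):
            selected.append((score, item))
            if len(selected) == num_fragments:
                break
    return selected


# Toy data: items are numbers, distance is the absolute difference
candidates = [(0.9, 0.0), (0.8, 0.1), (0.7, 1.0), (0.6, 1.05), (0.5, 2.5)]
picked = select_diverse(candidates, lambda a, b: abs(a - b), 0.5, 3)
picked_items = [item for _, item in picked]
```

In the toy data the second and fourth candidates are dropped because they lie
too close to an already accepted, better scoring one.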
.. method:: AddSeqIDParameters(w)
Add SeqID score component with linear weight **w**
:param w: linear weight
:type w: :class:`float`
.. method:: AddSeqSimParameters(w, subst)
Add SeqSim score component with linear weight **w**
:param w: linear weight
:param subst: Substitution matrix to calculate sequence similarity
:type w: :class:`float`
:type subst: :class:`ost.seq.SubstWeightMatrix`
.. method:: AddSSAgreeParameters(w, psipred_prediction)
Add SSAgree score component with linear weight **w**
:param w: linear weight
:param psipred_prediction: PSIPRED prediction for the fragger's target sequence
:type w: :class:`float`
:type psipred_prediction: :class:`PsipredPrediction`
.. method:: AddTorsionProbabilityParameters(w, torsion_sampler, \
aa_before, aa_after)
AddTorsionProbabilityParameters(w, torsion_sampler_list, \
aa_before, aa_after)
Add TorsionProbability component of linear weight **w**
:param w: linear weight
:type w: :class:`float`
:param torsion_sampler: Torsion sampler to be used for all residues.
:type torsion_sampler: :class:`TorsionSampler`
:param torsion_sampler_list: One torsion sampler for each residue.
:type torsion_sampler_list: :class:`list` of :class:`TorsionSampler`
:param aa_before: Name (3 letter code) of the residue before the sequence
linked to this object.
:type aa_before: :class:`str`
:param aa_after: Name (3 letter code) of the residue after the sequence
linked to this object.
:type aa_after: :class:`str`
.. method:: AddSequenceProfileParameters(w, prof)
Add SequenceProfile component of linear weight **w**
:param w: linear weight
:param prof: Profile for the fragger's target sequence
:type w: :class:`float`
:type prof: :class:`ost.seq.ProfileHandle`
.. method:: AddStructureProfileParameters(w, prof)
Add StructureProfile component of linear weight **w**
:param w: linear weight
:param prof: Profile for the fragger's target sequence
:type w: :class:`float`
:type prof: :class:`ost.seq.ProfileHandle`
.. method:: __len__()
:returns: Number of fragments stored in fragger.
.. method:: __getitem__(index)
:param index: Index of the fragment to extract
:returns: Fragment at given position
:rtype: :class:`BackboneList`
.. method:: GetFragmentInfo(index)
:param index: Index of fragment
:type index: :class:`int`
:returns: :class:`FragmentInfo` of fragment at position **index**
.. method:: GetScore(index)
Returns the final score of fragment defined by **index**
:param index: Index of fragment
:type index: :class:`int`
:returns: Score of fragment at position **index**
.. method:: GetScore(parameter_index, index)
Returns the score of a single component, selected by **parameter_index**,
for the fragment at position **index**. Components are indexed in the order
in which they were added to the :class:`Fragger`.
:param parameter_index: Index of score (0-indexed in order of score
components that were added)
:param index: Index of fragment
:type parameter_index: :class:`int`
:type index: :class:`int`
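The relation between the two :meth:`GetScore` variants can be sketched as a
weighted sum. This assumes that "linear weight" means the final score is the
sum of each component score multiplied by its weight, which this documentation
does not state explicitly. The function ``final_score`` and the numbers are
made up for illustration.

```python
def final_score(weights, component_scores):
    """Combine per-component scores of one fragment linearly.

    Both lists follow the order in which the components were added,
    so position i corresponds to parameter_index i.
    """
    if len(weights) != len(component_scores):
        raise ValueError("one weight per score component is required")
    return sum(w * s for w, s in zip(weights, component_scores))


# Two components, e.g. added with w=2.0 first and w=0.5 second
weights = [2.0, 0.5]
fragment_scores = [0.25, 0.5]
total = final_score(weights, fragment_scores)
```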
.. class:: FraggerMap
A simple storable map of Fragger objects. The idea is that one can use the map
to cache fragger lists that have already been generated.
You can use :meth:`Contains` to check if an item with a given key
(:class:`int`) already exists and access items with the [] operator (see
:meth:`__getitem__` and :meth:`__setitem__`).
Serialization is meant to be temporary and is not guaranteed to be portable.
.. method:: Load(filename, db)
Loads raw binary file generated with :meth:`Save`.
:param filename: Path to the file.
:type filename: :class:`str`
:param db: Source of structural data used when filling the fragments.
:type db: :class:`StructureDB`
:returns: The loaded map.
:rtype: :class:`FraggerMap`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened.
.. method:: Save(filename)
Saves raw binary representation of this map. Only fragment infos and scores
are stored and not the parameters for scoring. The coordinates are to be
reread from a structure db.
:param filename: Path to the file.
:type filename: :class:`str`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened.
.. method:: LoadBB(filename)
Loads raw binary file generated with :meth:`SaveBB`.
:param filename: Path to the file.
:type filename: :class:`str`
:returns: The loaded map.
:rtype: :class:`FraggerMap`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened.
.. method:: SaveBB(filename)
Saves raw binary representation of this map. Only fragments and scores
are stored and not the parameters for scoring. Here, we also store the
coordinates. This file will hence be much larger than the one saved with
:meth:`Save`.
:param filename: Path to the file.
:type filename: :class:`str`
:raises: :exc:`~exceptions.RuntimeError` if file cannot be opened.
.. method:: Contains(id)
:returns: True if and only if a fragger object for this id is already in the map.
:rtype: :class:`bool`
.. method:: __getitem__(id)
__setitem__(id)
Allow read/write access (with [*id*]) to fragger object with given ID.
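The caching idea behind :class:`FraggerMap` can be sketched as a
check-then-build pattern. To keep the example runnable without ProMod3 it uses
a hypothetical stand-in class ``FraggerCache`` that only mirrors the documented
``Contains`` and ``[]`` access. The helper ``get_or_build`` and the string
placeholders for fragger objects are assumptions as well.

```python
class FraggerCache:
    """Hypothetical stand-in with the same access pattern as FraggerMap."""

    def __init__(self):
        self._items = {}

    def Contains(self, key):
        return key in self._items

    def __getitem__(self, key):
        return self._items[key]

    def __setitem__(self, key, value):
        self._items[key] = value


def get_or_build(cache, key, build):
    """Return the cached entry for key, building it only if missing."""
    if not cache.Contains(key):
        cache[key] = build(key)
    return cache[key]


calls = []

def expensive_search(key):
    # Placeholder for setting up a fragger and filling it from a database
    calls.append(key)
    return "fragments for loop %d" % key


cache = FraggerCache()
first = get_or_build(cache, 7, expensive_search)
second = get_or_build(cache, 7, expensive_search)
```

The second lookup is served from the cache, so the expensive search runs only
once per key.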
The PsipredPrediction class
--------------------------------------------------------------------------------
.. class:: PsipredPrediction
A container for the secondary structure prediction by PSIPRED [Jones1999]_.
.. method:: PsipredPrediction()
Constructs empty container
.. method:: PsipredPrediction(prediction, confidence)
Constructs container with given content
:param prediction: Secondary structure prediction, each element in ['H','E','C']
:param confidence: Confidence of prediction, each element in [0,9]
:type prediction: :class:`list`
:type confidence: :class:`list`
:raises: :exc:`~exceptions.RuntimeError` if the sizes of **prediction** and
**confidence** differ or if they contain an invalid
element
.. method:: FromHHM(filename)
Static function to load a :class:`PsipredPrediction` object from an hhm file,
as provided by the HHsearch suite
:param filename: Name of file
:type filename: :class:`str`
.. method:: FromHoriz(filename)
Static function to load a :class:`PsipredPrediction` object from a horiz file,
as produced by the PSIPRED executable
:param filename: Name of file
:type filename: :class:`str`
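To illustrate what kind of data is read from a horiz file, here is a rough
sketch of a parser in plain Python. It is not the ProMod3 implementation: it
assumes the usual horiz layout with ``Conf:`` and ``Pred:`` lines, the function
``parse_horiz`` is hypothetical, and the input is a made-up four-residue
example.

```python
def parse_horiz(text):
    """Collect per-residue predictions and confidences from horiz text."""
    prediction, confidence = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Conf:"):
            # One digit (0-9) per residue
            confidence.extend(int(c) for c in stripped[5:].strip())
        elif stripped.startswith("Pred:"):
            # One state letter (H, E or C) per residue
            prediction.extend(stripped[5:].strip())
    if len(prediction) != len(confidence):
        raise RuntimeError("prediction and confidence differ in size")
    return prediction, confidence


# Made-up input modelled on the horiz layout
example = """# PSIPRED HFORMAT

Conf: 9753
Pred: CHHE
  AA: MKVL
"""
prediction, confidence = parse_horiz(example)
```

The two resulting lists have the shape expected by the
``PsipredPrediction(prediction, confidence)`` constructor described above.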
.. method:: Add(prediction, confidence)
Appends a single-residue PSIPRED prediction at the end
:param prediction: Prediction, must be one in ['H','E','C']
:param confidence: Confidence of prediction, must be in [0,9]
:type prediction: :class:`str`
:type confidence: :class:`int`
:raises: :exc:`~exceptions.RuntimeError` if input contains invalid elements
.. method:: Extract(from, to)
Extracts content and returns a sub-:class:`PsipredPrediction` with range **from**
to **to**, not including **to** itself
:param from: Idx to start
:param to: Idx to end
:type from: :class:`int`
:type to: :class:`int`
:returns: :class:`PsipredPrediction` with the specified range
:raises: :exc:`~exceptions.RuntimeError` if **from** or **to** is invalid
.. method:: GetPrediction(idx)
:param idx: Index to get prediction from
:type idx: :class:`int`
:returns: Psipred prediction at pos **idx**
:raises: :exc:`~exceptions.RuntimeError` if **idx** is invalid
.. method:: GetConfidence(idx)
:param idx: Index to get confidence from
:type idx: :class:`int`
:returns: Psipred confidence at pos **idx**
:raises: :exc:`~exceptions.RuntimeError` if **idx** is invalid
.. method:: GetPredictions()
Get all the predictions in the container
:returns: :class:`list` containing all the predictions in the container
.. method:: GetConfidences()
Get all the confidences in the container
:returns: :class:`list` containing all the confidences in the container
.. method:: __len__()
:returns: Number of elements in container
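The documented container behaviour (validated input, appending, half-open
extraction) can be mimicked with a small stand-in class. This is a sketch and
not the ProMod3 class: ``PsipredSketch`` is hypothetical, the exact validity
rules for the range in ``Extract`` are an assumption, and the range arguments
are named ``start`` and ``stop`` because ``from`` is a reserved word in Python
(so the real method has to be called with positional arguments).

```python
class PsipredSketch:
    """Hypothetical stand-in mirroring the documented container rules."""

    VALID_STATES = ("H", "E", "C")

    def __init__(self, prediction=(), confidence=()):
        if len(prediction) != len(confidence):
            raise RuntimeError("prediction and confidence differ in size")
        self._pred = []
        self._conf = []
        for p, c in zip(prediction, confidence):
            self.Add(p, c)

    def Add(self, prediction, confidence):
        # Reject anything outside the documented value ranges
        if prediction not in self.VALID_STATES:
            raise RuntimeError("prediction must be one of H, E, C")
        if not isinstance(confidence, int) or not 0 <= confidence <= 9:
            raise RuntimeError("confidence must be an integer in [0,9]")
        self._pred.append(prediction)
        self._conf.append(confidence)

    def Extract(self, start, stop):
        # Half-open range: stop itself is not included
        if not 0 <= start < stop <= len(self._pred):
            raise RuntimeError("invalid range")
        return PsipredSketch(self._pred[start:stop], self._conf[start:stop])

    def GetPredictions(self):
        return list(self._pred)

    def GetConfidences(self):
        return list(self._conf)

    def __len__(self):
        return len(self._pred)


full = PsipredSketch(list("CHHHEC"), [9, 8, 7, 6, 5, 4])
sub = full.Extract(1, 4)
```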