Connectivity
================================================================================

.. currentmodule:: ost.conop


Motivation
--------------------------------------------------------------------------------

The connectivity of atoms is notoriously difficult to come by for biological 
macromolecules. PDB files, the de facto standard exchange format for structural 
information, allow bonds to be specified in CONECT records. However, these
records are not mandatory. Many programs, especially those that do not depend on
atom connectivity, do not write CONECT records. As a result, programs and
structural biology frameworks can't rely on connectivity information being
present. The connectivity information needs to be derived in the program itself.

Loader heuristics are great if you are the one who implemented them, but are 
problematic if you are just a user of software that has them. As time goes 
on, these heuristics become buried in thousands of lines of code and are 
often hard, if not impossible, to trace back.

Different clients of the framework have different requirements. Visualisation 
software wants to read in a PDB file as is, without making any changes. A 
script in an automated pipeline, however, wants to either strictly reject 
files that are incomplete or fill in missing structural features. All these 
aspects are implemented in the conop module, separated from the loading of the 
PDB file, giving clients fine-grained control over the loading process. The
conop logic can thus be reused in code requiring the presence of connectivity
information.

The conop module defines a :class:`Processor` interface for running connectivity 
algorithms, that is, for connecting atoms with bonds and performing basic
clean-up of erroneous structures. Clients of the conop module can specify how
the :class:`Processor` should treat unknown amino acids, missing atoms and
chemically infeasible bonds.

Processors
--------------------------------------------------------------------------------

The exact behaviour of a processor is implementation-specific. So far, two
classes implement the processor interface: a heuristic and a rule-based
processor. The processors mainly differ in the source of their connectivity
information.

The :class:`HeuristicProcessor` uses a hard-coded heuristic connectivity
table for the 20 standard amino acids as well as nucleotides. For other
compounds such as ligands, the :class:`HeuristicProcessor` runs a distance-based
connectivity algorithm that connects two atoms if they belong to the same or
two consecutive residues, and are within a 
:func:`reasonable distance <ost.conop.IsBondFeasible>` of each other.
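
The distance-based idea described above can be illustrated with a small,
self-contained sketch. It is not the conop implementation: the atom tuples, the
function names and the fixed 1.9 Angstrom cutoff are assumptions made for
illustration only, and :func:`IsBondFeasible` applies its own criterion.

```python
import math
from itertools import combinations

# Hypothetical cutoff in Angstrom, chosen only for this sketch.
MAX_BOND_LENGTH = 1.9


def is_within_bond_distance(pos_a, pos_b, cutoff=MAX_BOND_LENGTH):
    """Return True if two positions are at most `cutoff` apart."""
    return math.dist(pos_a, pos_b) <= cutoff


def connect_by_distance(atoms, cutoff=MAX_BOND_LENGTH):
    """Connect atoms of the same or consecutive residues that are close enough.

    `atoms` is a list of hypothetical (residue_index, atom_name, (x, y, z))
    tuples. Returns a list of (i, j) index pairs into `atoms`.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        same_or_consecutive = abs(a[0] - b[0]) <= 1
        if same_or_consecutive and is_within_bond_distance(a[2], b[2], cutoff):
            bonds.append((i, j))
    return bonds


atoms = [
    (1, "C1", (0.0, 0.0, 0.0)),
    (1, "O1", (1.4, 0.0, 0.0)),
    (2, "C1", (2.8, 0.0, 0.0)),
    (4, "C1", (3.5, 0.0, 0.0)),  # close to atom 2, but residues 2 and 4 are not consecutive
]
bonds = connect_by_distance(atoms)
print(bonds)
```

The last atom stays unconnected even though it is the closest neighbour of the
third atom, because the two residues are neither identical nor consecutive.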

The :class:`RuleBasedProcessor` uses the :doc:`compound library <compoundlib>`,
a connectivity library containing all molecular components present in the
PDB files on PDB.org. The library can easily be extended with custom
connectivity information, if required.
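
To make the contrast with the heuristic approach concrete, the following sketch
looks up bonds in a tiny in-memory table instead of measuring distances. The
table, the function name and the return values are hypothetical and stand in
for the far richer compound library; this is not how the
:class:`RuleBasedProcessor` is implemented.

```python
# Hypothetical miniature "compound library": residue name -> bonded atom-name pairs.
TOY_COMPOUND_LIB = {
    "GLY": [("N", "CA"), ("CA", "C"), ("C", "O")],
}


def connect_by_rules(res_name, atom_names, lib=TOY_COMPOUND_LIB):
    """Return (bonds, unknown_atoms) for a single residue.

    Bonds are only created between atoms that are actually present.
    Atoms the library does not know about are reported separately, so that
    the caller can decide how to treat them.
    """
    if res_name not in lib:
        raise KeyError(f"unknown residue {res_name!r}")
    present = set(atom_names)
    known = {name for pair in lib[res_name] for name in pair}
    bonds = [(a, b) for a, b in lib[res_name] if a in present and b in present]
    unknown_atoms = sorted(present - known)
    return bonds, unknown_atoms


# "O" is missing and "XX" is not part of the toy definition of GLY.
bonds, unknown = connect_by_rules("GLY", ["N", "CA", "C", "XX"])
print(bonds, unknown)
```

Because connectivity comes from a table, missing and unknown atoms become
explicit events that a client can choose to warn about, remove or reject.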


.. class:: Processor

  .. attribute:: check_bond_feasibility

    Whether an additional bond feasibility check is performed. Disabled by
    default. If turned on, atoms are only connected by bonds if they are within
    a reasonable distance (as defined by :func:`IsBondFeasible`).

    :type: :class:`bool`

  .. attribute:: assign_torsions

    Whether backbone torsions should be added to the backbone. Enabled by
    default. If turned on, PHI, PSI and OMEGA torsions are assigned to the
    peptide residues. See also :func:`AssignBackboneTorsions`.

    :type: :class:`bool`

  .. attribute:: connect

    Whether to connect atoms by bonds. Enabled by default. Turn this off if you
    would like to speed up the loading process and do not require connectivity
    information to be present in your structures. Note though that
    :attr:`peptide_bonds` may be ignored if this is turned off.

    :type: :class:`bool`

  .. attribute:: peptide_bonds

    Whether to connect residues by peptide bonds. Enabled by default. This also
    sets the :attr:`~ost.mol.ResidueHandle.is_protein` property of residues when
    peptide bonds are created. Turn this off if you would like to create your
    own peptide bonds.

    :type: :class:`bool`

  .. attribute:: zero_occ_treatment

    Controls how atoms with zero occupancy are treated. By default, this is
    set to ``CONOP_WARN``.

    :type: :class:`ConopAction`

  .. attribute:: connect_hetatm

    Whether to connect atoms that are both hetatms. Enabled by default.
    Disabling can be useful if there are compounds which are not covered
    by the PDB component dictionary and you prefer to create your own
    connectivity for those.

    :type: :class:`bool`

  .. method:: Process(ent)
  
    Processes the entity *ent* according to the current options.


.. class:: HeuristicProcessor(check_bond_feasibility=False, \
                              assign_torsions=True, connect=True, \
                              peptide_bonds=True, \
                              connect_hetatm=True, \
                              zero_occ_treatment=CONOP_WARN)
   
  The :class:`HeuristicProcessor` implements the :class:`Processor` interface.
  Refer to the :class:`Processor` documentation for methods and accessors
  common to all processors.

  :param check_bond_feasibility: Sets :attr:`~Processor.check_bond_feasibility`
  :param assign_torsions: Sets :attr:`~Processor.assign_torsions`
  :param connect: Sets :attr:`~Processor.connect`
  :param peptide_bonds: Sets :attr:`~Processor.peptide_bonds`
  :param connect_hetatm: Sets :attr:`~Processor.connect_hetatm`
  :param zero_occ_treatment: Sets :attr:`~Processor.zero_occ_treatment`


.. class:: RuleBasedProcessor(compound_lib, fix_elements=True, \
                              strict_hydrogens=False, \
                              unknown_res_treatment=CONOP_WARN, \
                              unknown_atom_treatment=CONOP_WARN, \
                              check_bond_feasibility=False, \
                              assign_torsions=True, connect=True, \
                              peptide_bonds=True, connect_hetatm=True, \
                              zero_occ_treatment=CONOP_WARN)
   
  The :class:`RuleBasedProcessor` implements the :class:`Processor` interface.
  Refer to the :class:`Processor` documentation for methods and accessors
  common to all processors.

  :param compound_lib: The compound library to use
  :type compound_lib:  :class:`CompoundLib`
  :param fix_elements: Sets :attr:`fix_elements`
  :param strict_hydrogens: Sets :attr:`strict_hydrogens`
  :param unknown_res_treatment: Sets :attr:`unk_res_treatment`
  :param unknown_atom_treatment: Sets :attr:`unk_atom_treatment`
  :param check_bond_feasibility: Sets :attr:`~Processor.check_bond_feasibility`
  :param assign_torsions: Sets :attr:`~Processor.assign_torsions`
  :param connect: Sets :attr:`~Processor.connect`
  :param peptide_bonds: Sets :attr:`~Processor.peptide_bonds`
  :param connect_hetatm: Sets :attr:`~Processor.connect_hetatm`
  :param zero_occ_treatment: Sets :attr:`~Processor.zero_occ_treatment`

  .. attribute:: fix_elements

    Whether the element of the atom should be changed to the element defined in
    the compound library. Enabled by default.

    :type: :class:`bool`

  .. attribute:: strict_hydrogens

    Whether to use strict hydrogen naming rules outlined in the compound library.
    Disabled by default.

    :type: :class:`bool`

  .. attribute:: unk_atom_treatment

    Treatment upon encountering an unknown atom. Warn by default.

    :type: :class:`ConopAction`

  .. attribute:: unk_res_treatment

    Treatment upon encountering an unknown residue. Warn by default.

    :type: :class:`ConopAction`


.. class:: ConopAction

  Defines actions to take when certain events happen during processing. Possible
  values:

    ``CONOP_WARN``, ``CONOP_SILENT``, ``CONOP_REMOVE``, ``CONOP_REMOVE_ATOM``,
    ``CONOP_REMOVE_RESIDUE``, ``CONOP_FATAL``
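
As a usage illustration, the options documented above could be combined into a
strict processing step along the following lines. This is a sketch under stated
assumptions: the helper name ``process_strictly`` is hypothetical, the
attributes are assumed to be writable on a constructed processor, the action
constants are assumed to be accessible as ``conop.CONOP_FATAL``, and
:meth:`Processor.Process` is assumed to modify *ent* in place. Obtaining the
entity and the compound library is outside the scope of this section.

```python
def process_strictly(ent, compound_lib):
    """Run a rule-based processor that rejects unknown residues and atoms.

    `ent` is the entity to process and `compound_lib` a CompoundLib instance;
    both must be provided by the caller.
    """
    # Imported inside the function so that the sketch can be defined without
    # OpenStructure being installed; calling it requires the ost package.
    from ost import conop

    proc = conop.RuleBasedProcessor(compound_lib)
    # Only create bonds between atoms that are within a feasible distance.
    proc.check_bond_feasibility = True
    # Assumption: CONOP_FATAL aborts processing instead of only warning.
    proc.unk_res_treatment = conop.CONOP_FATAL
    proc.unk_atom_treatment = conop.CONOP_FATAL
    proc.Process(ent)
    return ent
```

A visualisation client would more likely keep the default ``CONOP_WARN``
treatment, so that structures are displayed as they are.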