namespace ost { namespace conop {
/**
\page module_conop The conop module

\section conop_module_intro The conop module

The main task of the conop module is to connect atoms with bonds. While the 
bond class itself is part of the base module, the conop module is responsible 
for setting up the correct bonds between atoms.

\subsection conop_motivation Motivation

Traditionally, the connectivity between atoms has not been reliably described in 
PDB files. Different programs have adopted various ways of finding out whether 
two atoms are connected. One way is to rely on proper naming of the atoms. For 
example, the backbone atoms of the standard amino acids are named N, CA, C 
and O, and if atoms with these names appear in the same residue they are shown 
as connected. Another way is to apply additional heuristics to find out whether 
a peptide bond is formed between two consecutive residues. Breaks in the 
backbone are indicated, e.g., by a discontinuity in the residue numbering.
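
As an illustration of the two approaches just described, the following minimal 
sketch connects backbone atoms by name and adds a peptide bond only when the 
residue numbering is continuous. It is not OpenStructure code: the function 
`connect_backbone`, the `BACKBONE_BONDS` table and the tuple-based residue 
representation are assumptions made for this example.

```python
# Illustrative only: plain tuples stand in for residues; this is not OST API.
BACKBONE_BONDS = [("N", "CA"), ("CA", "C"), ("C", "O")]


def connect_backbone(residues):
    """Return bonds as ((resnum, atom), (resnum, atom)) pairs.

    residues: list of (residue_number, set_of_atom_names) in chain order.
    """
    bonds = []
    for number, names in residues:
        # Intra-residue bonds: connect backbone atoms purely by their names.
        for first, second in BACKBONE_BONDS:
            if first in names and second in names:
                bonds.append(((number, first), (number, second)))
    for (num_a, names_a), (num_b, names_b) in zip(residues, residues[1:]):
        # Peptide bond C(i)-N(i+1); a gap in the numbering counts as a break.
        if num_b == num_a + 1 and "C" in names_a and "N" in names_b:
            bonds.append(((num_a, "C"), (num_b, "N")))
    return bonds


chain = [(1, {"N", "CA", "C", "O"}),
         (2, {"N", "CA", "C", "O"}),
         (4, {"N", "CA", "C", "O"})]  # residue 3 is missing
bonds = connect_backbone(chain)
```

In this toy chain, residues 1 and 2 are linked by a peptide bond, whereas 
residues 2 and 4 are not, because the numbering is discontinuous.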

Loader heuristics are great if you are the one who implemented them, but they 
are problematic if you are just a user of software that relies on them. As time 
goes on, these heuristics become buried in thousands of lines of code and are 
often hard, if not impossible, to trace back.

Different clients of the framework have different requirements. Visualisation 
software wants to read in a PDB file as is, without making any changes. A 
script in an automated pipeline, however, wants either to strictly reject 
files that are incomplete or to fill in missing structural features. All these 
aspects are implemented in the conop module, separated from the loading of the 
PDB file, giving clients fine-grained control over the loading process. 

\subsection builder_interface The Builder interface

The conop module defines a Builder interface for running connectivity 
algorithms, that is, for connecting the atoms with bonds and performing basic 
clean-up of erroneous structures. Clients of the conop module can specify how 
the Builder should treat unknown amino acids, missing atoms and chemically 
infeasible bonds.

So far, two classes implement the Builder interface: a heuristic and a 
rule-based builder. The builders mainly differ in the source of their 
connectivity information. The HeuristicBuilder uses a hard-coded heuristic 
connectivity table for the 20 standard amino acids as well as nucleotides. 
For other compounds such as ligands, the HeuristicBuilder runs a distance-based 
connectivity algorithm that connects two atoms if they are closer than a 
certain threshold. The RuleBasedBuilder uses a connectivity library containing 
all molecular components present in the PDB files on PDB.org. The library can 
easily be extended with custom connectivity information, if required. By 
default, the heuristic builder is used; however, the builder may be switched by 
setting the RuleBasedBuilder as the default. To do so, one first has to 
create a new instance of a RuleBasedBuilder and register it in the builder 
registry of the conop module. In Python, this can be achieved with:

\code
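from ost import conop
# load the compound library (the path is elided here)
compound_lib=conop.CompoundLib.Load('...')
# create a rule-based builder backed by the compound library
rbb=conop.RuleBasedBuilder(compound_lib)
# register the builder under the name 'rbb' and make it the default
conop.Conopology.Instance().RegisterBuilder(rbb,'rbb')
conop.Conopology.Instance().SetDefaultBuilder('rbb')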
\endcode

All subsequent calls to io::LoadEntity will use the RuleBasedBuilder 
instead of the heuristic builder. See \ref convert_mmcif "here" for more 
information on how to create the files necessary for the rule-based builder.
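
To make the distance-based fallback of the HeuristicBuilder mentioned above 
more concrete, the following is a minimal, self-contained sketch of such an 
algorithm. It does not reproduce the actual implementation: the function name 
`connect_by_distance`, the tuple-based atom representation and the default 
threshold of 1.9 Angstrom are assumptions chosen for illustration only.

```python
import itertools
import math


def connect_by_distance(atoms, threshold=1.9):
    """Return (name, name) pairs for all atoms closer than the threshold.

    atoms: list of (atom_name, (x, y, z)) tuples, coordinates in Angstrom.
    threshold: assumed cutoff; the value used by the real builder may differ.
    """
    bonds = []
    # Test every unordered pair of atoms exactly once.
    for (name_a, pos_a), (name_b, pos_b) in itertools.combinations(atoms, 2):
        if math.dist(pos_a, pos_b) < threshold:
            bonds.append((name_a, name_b))
    return bonds


# Hypothetical ligand fragment with one atom far away from the rest.
ligand = [("C1", (0.0, 0.0, 0.0)),
          ("C2", (1.5, 0.0, 0.0)),
          ("O1", (2.2, 1.2, 0.0)),
          ("X1", (10.0, 0.0, 0.0))]
ligand_bonds = connect_by_distance(ligand)
```

The pairwise loop scales quadratically with the number of atoms, which is 
acceptable for small ligands; a spatial grid would be the usual choice for 
larger structures.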

\subsection connecting_atoms Connecting atoms

The high-level interface is exposed by the Conopology singleton instance:

\code
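from ost import conop

# the Conopology singleton gives access to the registered builders
cc=conop.Conopology.Instance()

# BuildRawModel(...) is a placeholder for code that creates an entity
ent=BuildRawModel(...)
# connect the atoms of the entity using the builder returned by GetBuilder()
cc.ConnectAll(cc.GetBuilder(), ent)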
\endcode


For fine-grained control, the Builder interface may be used directly.


\subsection convert_mmcif Converting the mmCIF dictionary

The CompoundLib may be created from an mmCIF dictionary. The latest dictionary 
can be found on the <a href="http://www.wwpdb.org/ccd.html">wwPDB site</a>. 

After downloading the dictionary in mmCIF format, use the chemdict_tool to 
convert it into our internal format.

\code
chemdict_tool create components.cif compounds.chemlib
\endcode

*/
}}