.. currentmodule:: ost.conop

The compound library
================================================================================

Compound libraries contain information on chemical compounds, such as their
connectivity, chemical class and one-letter-code. The compound library has
several uses, but the most important one is to provide the connectivity
information for the :class:`rule-based processor <RuleBasedBuilder>`.

The compound definitions for standard PDB files are taken from the 
components.cif dictionary provided by the PDB. The dictionary is updated with 
every PDB release and augmented with the compound definitions of newly 
crystallized compounds. Follow :ref:`these instructions <mmcif-convert>` to
build the compound library.

In general, compound libraries built with older versions of OST are compatible
with newer versions of OST, so it may not be necessary to build a new one.
However, some functionality may not be available. Currently, compound libraries
built with OST 1.5.0 or later can be loaded.

.. function:: GetDefaultLib()

  Get the default compound library. This is set by :func:`SetDefaultLib`.

  If you obtained OpenStructure as a container or you
  :ref:`compiled <cmake-flags>` it with a specified ``COMPOUND_LIB`` flag,
  this function will return a compound library.

  You can override the default compound library by pointing the
  ``OST_COMPOUNDS_CHEMLIB`` environment variable to a valid compound library
  file.

  :return: Default compound library.
  :rtype:  :class:`CompoundLib` or None if no library set

.. function:: SetDefaultLib(lib)

  :param lib: Library to be set as default compound library.
  :type lib:  :class:`CompoundLib`
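
A minimal sketch of how these two functions could be combined with
:meth:`CompoundLib.Load`: reuse the default library if one is set, otherwise
load one from disk and register it. The helper name ``ensure_default_lib`` and
its ``conop_module`` argument are illustrative and not part of OST; the module
is passed in explicitly only so that the helper can be tried without
OpenStructure installed.

.. code-block:: python

  def ensure_default_lib(conop_module, path):
      """Return the default compound library, loading ``path`` if none is set.

      ``conop_module`` is expected to be ``ost.conop`` (hypothetical helper,
      not part of OST).
      """
      lib = conop_module.GetDefaultLib()
      if lib is None:
          # Load() is documented to return None on failure.
          lib = conop_module.CompoundLib.Load(path)
          if lib is None:
              raise RuntimeError("could not load compound library: %s" % path)
          conop_module.SetDefaultLib(lib)
      return lib

With OpenStructure available, this would be called as
``ensure_default_lib(ost.conop, 'compounds.chemlib')``.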


.. class:: CompoundLib

  .. staticmethod:: Load(database, readonly=True)
  
    Load the compound lib from database with the given name.
    
    :param readonly: Whether the library should be opened in read-only mode.
      Only one program at a time can have write access to a compound library.
      If multiple programs try to open the compound library in write mode, they
      can deadlock.
    :type readonly: :class:`bool`
    
    :returns: The loaded compound lib or None if it failed.
    
  .. staticmethod:: Create(database)
    
    Create a new compound library
    
  .. method:: FindCompound(id, dialect='PDB')

    Look up a compound by its three-letter-code, e.g. ALA. If no compound with
    that name exists, the function returns None. Compounds are cached after
    they have been loaded with FindCompound. To delete the compound cache, use
    :meth:`ClearCache`.
    
    :returns: The found compound
    :rtype: :class:`Compound`

  .. method:: FindCompounds(query, by, dialect='PDB')

    Look up one or more compounds by SMILES string, InChI code, InChI key or
    formula.

    The compound library is queried for exact string matches. Many SMILES
    strings can represent the same compound, so this function is only useful
    for SMILES strings coming from the PDB (or canonical SMILES from the
    OpenEye Toolkits). This is also the case for InChI codes, although to a
    lesser extent.

    Obsolete compounds will be sorted at the back of the list. However, there
    is no guarantee that the first compound is active.

    :param query: the string to lookup.
    :type query: :class:`string`
    :param by: the field to match the query against. One of: "smiles",
      "inchi_code", "inchi_key" or "formula".
    :type by: :class:`string`
    :param dialect: the dialect to select for (typically "PDB", or "CHARMM" if
      your compound library was built with charmm support).
    :type dialect: :class:`string`
    :returns: A list of found compounds, or an empty list if no compound was
      found.
    :rtype: :class:`list` of :class:`Compound`
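
    Because the first hit is not guaranteed to be active, callers may want to
    filter on :attr:`Compound.obsolete` themselves. A minimal sketch follows;
    ``first_active`` is a hypothetical helper, not part of OST, and works on
    any list of objects that have an ``obsolete`` attribute:

    .. code-block:: python

      def first_active(compounds):
          """Return the first non-obsolete compound, or None if there is none."""
          for compound in compounds:
              if not compound.obsolete:
                  return compound
          return None

    For example, ``first_active(lib.FindCompounds('O', 'smiles'))`` would
    return the first active compound whose stored SMILES string is exactly
    ``O``, assuming ``lib`` is a loaded :class:`CompoundLib`.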

  .. method:: Copy(dst_filename)
  
    Copy database to dst_filename. The new library will be an exact copy of the 
    database. The special name `:memory:` will create an in-memory version of 
    the database. At the expense of memory, database lookups will become much 
    faster.
    
    :returns: The copied compound library
    
    :rtype: :class:`CompoundLib`
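
    A minimal sketch of trading memory for lookup speed, assuming the on-disk
    library fits into memory. ``load_in_memory`` is a hypothetical helper, not
    part of OST; the :class:`CompoundLib` class is passed in as an argument
    only so that the sketch can be tried without OpenStructure installed:

    .. code-block:: python

      def load_in_memory(compound_lib_cls, path):
          """Load the library at ``path`` and return an in-memory copy of it."""
          on_disk = compound_lib_cls.Load(path)
          if on_disk is None:  # Load() is documented to return None on failure
              raise RuntimeError("could not load compound library: %s" % path)
          # ':memory:' is the special destination name described above.
          return on_disk.Copy(':memory:')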

  .. method:: ClearCache()
  
    Clear the compound cache.

  .. method:: SetChemLibInfo()

     When creating a new library, the current date and the version of OST used
     are stored in the table chemlib_info.

  .. method:: GetOSTVersionUsed()

     :return: OST version (ost_version_used from the table chemlib_info)
     :rtype:  :class:`str`

  .. method:: GetCreationDate()

     :return: creation date (creation_date from the table chemlib_info)
     :rtype:  :class:`str`
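
  The two getters above can be combined to report where a library came from,
  for instance when deciding whether to rebuild it. A minimal sketch follows;
  ``describe_lib`` is a hypothetical helper, not part of OST, and the exact
  format of the returned strings is not assumed:

  .. code-block:: python

    def describe_lib(lib):
        """Return a one-line summary of the chemlib_info metadata of ``lib``."""
        return "created %s with OST %s" % (lib.GetCreationDate(),
                                           lib.GetOSTVersionUsed())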


.. class:: Compound

  Holds the description of a chemical compound, such as three-letter-code, and
  chemical class.

  .. attribute:: id
  
    Alias for :attr:`three_letter_code`
    
  .. attribute:: three_letter_code
  
    Three-letter code of the residue, e.g. ALA for alanine. The three-letter
    code is unique for each compound, always in uppercase letters and between
    1 and 3 characters long.
    
  .. attribute:: one_letter_code
  
    The one letter code of the residue, e.g. 'G' for glycine. If undefined, the 
    one letter code of the residue is set to '?'

  .. attribute:: formula
  
    The chemical composition, e.g. 'H2 O' for water. The elements are listed in 
    alphabetical order.
    
  .. attribute:: dialect
  
    The dialect of the compound.
    
  .. attribute:: atom_specs

    The atom definitions of this compound. Read-only.

    :type: list of :class:`AtomSpec`
          
  .. attribute:: bond_specs
  
    The bond definitions of this compound. Read-only.
    
    :type: list of :class:`BondSpec`
    
  .. attribute:: chem_class
  
    The :class:`~ost.mol.ChemClass` of this compound. Read-only.
    
    :type: :class:`str`
    
  .. attribute:: chem_type
  
    The :class:`~ost.mol.ChemType` of this compound. Read-only.
    
    :type: :class:`str`
    
  .. attribute:: inchi
  
    The InChI code of this compound, e.g. '1S/H2O/h1H2' for water, or an empty
    string if missing. Read-only.
    
    :type: :class:`str`
    
  .. attribute:: inchi_key
  
    The InChIKey of this compound, e.g.
    'XLYOFNOQVPJJNP-UHFFFAOYSA-N' for water, or an empty string if missing.
    Read-only.
    
    :type: :class:`str`

  .. attribute:: smiles

    The SMILES string of this compound, e.g. 'O' for water, or an empty string
    if missing. Read-only.

    The string is read from the canonical SMILES produced by the
    OpenEye OEToolkits.

    :type: :class:`str`

  .. attribute:: obsolete

    Whether the component has been obsoleted by the PDB.

    :type: :class:`bool`

  .. attribute:: replaced_by

    If the component has been obsoleted by the PDB, this is the three-letter
    code of the compound that replaces it. This is not set for all obsolete
    compounds.

    :type: :class:`str`
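
    A minimal sketch of following :attr:`replaced_by` links until an active
    compound is reached. ``resolve_replacement`` and its ``max_hops`` guard are
    hypothetical and not part of OST, and the sketch assumes that an unset
    :attr:`replaced_by` is an empty string (or None):

    .. code-block:: python

      def resolve_replacement(lib, tlc, max_hops=10):
          """Follow ``replaced_by`` links from ``tlc`` to an active compound.

          Returns the last compound reached, which may still be obsolete if no
          replacement is recorded, or None if ``tlc`` is unknown.
          """
          compound = lib.FindCompound(tlc)
          for _ in range(max_hops):  # guard against circular replacements
              if compound is None or not compound.obsolete:
                  break
              # Assumption: an unset replaced_by is an empty string (or None).
              if not compound.replaced_by:
                  break
              successor = lib.FindCompound(compound.replaced_by)
              if successor is None:
                  break
              compound = successor
          return compound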
    

.. class:: AtomSpec

  Definition of an atom
  
  .. attribute:: element
  
    The element of the atom
    
  .. attribute:: name
  
    The primary name of the atom
    
  .. attribute:: alt_name
  
    Alternative atom name. If the atom has only one name, this is identical to 
    :attr:`name`
    
  .. attribute:: is_leaving
  
    Whether this atom is a leaving atom, i.e. an atom that is not required for
    a residue to be complete. The best example of a leaving atom is the *OXT*
    atom of amino acids, which is lost when a peptide bond is formed.

  .. attribute:: charge

    The charge of the atom.

.. class:: BondSpec

  Definition of a bond
  
  .. attribute:: atom_one
    
    The first atom of the bond, encoded as index into the 
    :attr:`Compound.atom_specs` array.
    
  .. attribute:: atom_two
  
    The second atom of the bond, encoded as index into the 
    :attr:`Compound.atom_specs` array.
    
  .. attribute:: order
  
    The bond order, 1 for single bonds, 2 for double-bonds and 3 for 
    triple-bonds
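
  Because :attr:`atom_one` and :attr:`atom_two` are indices rather than names,
  they have to be resolved through :attr:`Compound.atom_specs`. A minimal
  sketch follows; ``named_bonds`` is a hypothetical helper, not part of OST:

  .. code-block:: python

    def named_bonds(compound):
        """Return (first atom name, second atom name, bond order) per bond."""
        atoms = compound.atom_specs
        return [(atoms[bond.atom_one].name, atoms[bond.atom_two].name,
                 bond.order)
                for bond in compound.bond_specs]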
    

Example: Translating SEQRES entries
--------------------------------------------------------------------------------

In this example we will translate the three-letter-codes given in the SEQRES
record to one-letter-codes. Note that this automatically takes care of modified
amino acids such as selenomethionine.


.. code-block:: python

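  from ost import conop

  # Load the compound library from a file on disk.
  compound_lib = conop.CompoundLib.Load('compounds.chemlib')
  seqres = 'ALA GLY MSE VAL PHE'
  sequence = ''
  for tlc in seqres.split():
    # FindCompound returns None for unknown three-letter codes.
    compound = compound_lib.FindCompound(tlc)
    if compound:
      sequence += compound.one_letter_code
  print(sequence)  # prints 'AGMVF'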
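
The loop above silently skips codes that are not in the library, so the result
can be shorter than the SEQRES record. A variant that keeps one character per
entry could look like the following sketch. ``seqres_to_sequence`` is a
hypothetical helper, not part of OST; the default placeholder ``'?'`` is chosen
to match the value documented for an undefined :attr:`Compound.one_letter_code`.

.. code-block:: python

  def seqres_to_sequence(lib, seqres, unknown='?'):
      """Translate whitespace-separated three-letter codes to one-letter codes.

      Codes not found in ``lib`` are replaced by ``unknown`` so that the
      result has one character per SEQRES entry.
      """
      letters = []
      for tlc in seqres.split():
          compound = lib.FindCompound(tlc)
          letters.append(compound.one_letter_code if compound else unknown)
      return ''.join(letters)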

.. _mmcif-convert:

Creating a compound library
--------------------------------------------------------------------------------

The simplest way to create a compound library is to use the
:program:`chemdict_tool`. The program allows you to import the chemical
description of the compounds from an mmCIF dictionary, e.g. the components.cif
dictionary provided by the PDB. The latest dictionary can be downloaded from
the `wwPDB site <http://www.wwpdb.org/ccd.html>`_. The files are rather large,
so it is recommended to download the gzipped version.

After downloading the file, use :program:`chemdict_tool` to convert the mmCIF
dictionary into our internal format.

.. code-block:: bash
  
  chemdict_tool create <components.cif> <compounds.chemlib>

Notes:

- The :program:`chemdict_tool` only understands `.cif` and `.cif.gz` files. If you would like to use other sources for the compound definitions, consider writing a script by using the :doc:`compound library <compoundlib>` API.
- This also loads compounds which are reserved or obsoleted by the PDB to maximize compatibility with older PDB files. You can change that and skip obsolete entries with the `-o` flag, and reserved entries with the `-i` flag.

.. code-block:: bash

  chemdict_tool create <components.cif> <compounds.chemlib> -i -o

If you are working with CHARMM trajectory files, you will also have to add the
definitions for CHARMM. Assuming you are in the top-level source directory of
OpenStructure, this can be achieved by:

.. code-block:: bash

  chemdict_tool update modules/conop/data/charmm.cif <compounds.chemlib> charmm


Once your library has been created, you need to tell cmake where to find it and 
make sure it gets staged.


.. code-block:: bash
  
  cmake -DCOMPOUND_LIB=compounds.chemlib
  make