.. currentmodule:: ost.conop
The compound library
================================================================================
Compound libraries contain information on chemical compounds, such as their
connectivity, chemical class and one-letter-code. The compound library has
several uses, but the most important one is to provide the connectivity
information for the :class:`rule-based processor <RuleBasedBuilder>`.
The compound definitions for standard PDB files are taken from the
components.cif dictionary provided by the PDB. The dictionary is updated with
every PDB release and augmented with the compound definitions of newly
crystallized compounds. Follow :ref:`these instructions <mmcif-convert>` to
build the compound library.
In general, compound libraries built with older versions of OST are compatible
with newer versions of OST, so it may not be necessary to build a new one.
However, some functionality may not be available. Currently, compound libraries
built with OST 1.5.0 or later can be loaded.
.. function:: GetDefaultLib()
Get the default compound library. This is set by :func:`SetDefaultLib`.
If you obtained OpenStructure as a container or you
:ref:`compiled <cmake-flags>` it with a specified ``COMPOUND_LIB`` flag,
this function will return a compound library.
You can override the default compound library by pointing the
``OST_COMPOUNDS_CHEMLIB`` environment variable to a valid compound library
file.
:return: Default compound library.
:rtype: :class:`CompoundLib` or None if no library set
.. function:: SetDefaultLib(lib)
:param lib: Library to be set as default compound library.
:type lib: :class:`CompoundLib`
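A minimal sketch of how the two functions might be combined: ask for the default library first and fall back to an explicit file only if none is set. The helper name ``resolve_compound_lib`` and its ``fallback_path`` argument are illustrative and not part of OST; the ``conop`` module is passed in as an argument so the logic stays easy to follow.

```python
def resolve_compound_lib(conop, fallback_path=None):
    """Return the default compound library, loading a fallback if none is set.

    `conop` is expected to be the `ost.conop` module. `fallback_path` is a
    hypothetical path to a compound library file.
    """
    lib = conop.GetDefaultLib()
    if lib is None and fallback_path is not None:
        # Load() returns None on failure, so check before registering it.
        lib = conop.CompoundLib.Load(fallback_path)
        if lib is not None:
            conop.SetDefaultLib(lib)
    return lib
```

With OST installed, a call could look like ``resolve_compound_lib(ost.conop, 'compounds.chemlib')``.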
.. class:: CompoundLib
.. staticmethod:: Load(database, readonly=True)
Load the compound lib from database with the given name.
:param readonly: Whether the library should be opened in read-only mode.
Note that only one program at a time can have write access to the
compound library. If multiple programs try to open the compound library in
write mode, the programs can deadlock.
:type readonly: :class:`bool`
:returns: The loaded compound lib or None if it failed.
.. staticmethod:: Create(database)
Create a new compound library
.. method:: FindCompound(id, dialect='PDB')
Look up a compound by its three-letter-code, e.g. ALA. If no compound with that
name exists, the function returns None. Compounds are cached after they
have been loaded with FindCompound. To delete the compound cache, use
:meth:`ClearCache`.
:returns: The found compound
:rtype: :class:`Compound`
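Because :meth:`FindCompound` returns None for unknown codes, callers usually need to separate hits from misses. The following is a small sketch of that pattern; ``lookup_compounds`` is a hypothetical helper, and ``lib`` is assumed to be a loaded :class:`CompoundLib`.

```python
def lookup_compounds(lib, codes, dialect="PDB"):
    """Split three-letter codes into found compounds and missing codes."""
    found, missing = {}, []
    for code in codes:
        compound = lib.FindCompound(code, dialect)
        if compound is None:
            # No compound with that name exists in the library
            missing.append(code)
        else:
            found[code] = compound
    return found, missing
```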
.. method:: FindCompounds(query, by, dialect='PDB')
Look up one or more compounds by SMILES string, InChI code, InChI key or
formula.
The compound library is queried for exact string matches. Many SMILES
strings can represent the same compound, so this function is only useful
for SMILES strings coming from the PDB (or canonical SMILES from the
OpenEye Toolkits). This is also the case for InChI codes, although to a
lesser extent.
Obsolete compounds will be sorted at the back of the list. However, there
is no guarantee that the first compound is active.
:param query: the string to look up.
:type query: :class:`string`
:param by: the field in which to look up the query. One of: "smiles",
"inchi_code", "inchi_key" or "formula".
:type by: :class:`string`
:param dialect: the dialect to select for (typically "PDB", or "CHARMM" if
your compound library was built with CHARMM support).
:type dialect: :class:`string`
:returns: A list of found compounds, or an empty list if no compound was
found.
:rtype: :class:`list` of :class:`Compound`
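Since the first hit is not guaranteed to be active, one option is to filter the result on :attr:`Compound.obsolete`. This is a sketch only; ``find_active_compounds`` is a hypothetical helper, and ``lib`` is assumed to be a loaded :class:`CompoundLib`.

```python
# Search fields listed in the documentation of FindCompounds
ALLOWED_FIELDS = ("smiles", "inchi_code", "inchi_key", "formula")


def find_active_compounds(lib, query, by="inchi_key", dialect="PDB"):
    """Return only the non-obsolete compounds matching `query`."""
    if by not in ALLOWED_FIELDS:
        raise ValueError("'by' must be one of %s" % ", ".join(ALLOWED_FIELDS))
    hits = lib.FindCompounds(query, by, dialect)
    # Keep the library's ordering, drop compounds obsoleted by the PDB
    return [compound for compound in hits if not compound.obsolete]
```

An empty result then means that no active compound matched, which is easier to test for than inspecting the first element.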
.. method:: Copy(dst_filename)
Copy database to dst_filename. The new library will be an exact copy of the
database. The special name `:memory:` will create an in-memory version of
the database. At the expense of memory, database lookups will become much
faster.
:returns: The copied compound library
:rtype: :class:`CompoundLib`
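For lookup-heavy scripts, the in-memory copy can be wrapped in a one-line helper. This is a sketch; ``to_in_memory`` is a hypothetical name, and ``lib`` is assumed to be a :class:`CompoundLib` that was loaded from disk.

```python
def to_in_memory(lib):
    """Return an in-memory copy of `lib`, trading memory for lookup speed."""
    # ':memory:' is the special destination name described above
    return lib.Copy(":memory:")
```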
.. method:: ClearCache()
Clear the compound cache.
.. method:: SetChemLibInfo()
When creating a new library, the current date and the version of OST used
are stored in the table chemlib_info.
.. method:: GetOSTVersionUsed()
:return: OST version (ost_version_used from the table chemlib_info)
:rtype: :class:`str`
.. method:: GetCreationDate()
:return: creation date (creation_date from the table chemlib_info)
:rtype: :class:`str`
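The two getters can be combined to report which library a script is running against, for instance in a log message. A minimal sketch, where ``describe_library`` is a hypothetical helper and ``lib`` is assumed to be a loaded :class:`CompoundLib`:

```python
def describe_library(lib):
    """Return a one-line provenance summary for a compound library."""
    return "built with OST {} on {}".format(
        lib.GetOSTVersionUsed(), lib.GetCreationDate()
    )
```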
.. class:: Compound
Holds the description of a chemical compound, such as three-letter-code, and
chemical class.
.. attribute:: id
Alias for :attr:`three_letter_code`
.. attribute:: three_letter_code
Three-letter code of the residue, e.g. ALA for alanine. The three-letter
code is unique for each compound, always in uppercase letters and between
1 and 3 characters long.
.. attribute:: one_letter_code
The one-letter code of the residue, e.g. 'G' for glycine. If undefined, the
one-letter code of the residue is set to '?'.
.. attribute:: formula
The chemical composition, e.g. 'H2 O' for water. The elements are listed in
alphabetical order.
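The formula string can be turned into element counts with a few lines of Python. This sketch assumes that the formula consists of space-separated tokens of the form element symbol plus optional count, as in the 'H2 O' example; tokens that do not fit this pattern are skipped. ``parse_formula`` is a hypothetical helper, not part of OST.

```python
import re

# One token: element symbol (e.g. "H", "Cl") followed by an optional count
_TOKEN = re.compile(r"^([A-Z][a-z]?)(\d*)$")


def parse_formula(formula):
    """Convert a formula such as 'H2 O' into a dict of element counts."""
    counts = {}
    for token in formula.split():
        match = _TOKEN.match(token)
        if match is None:
            continue  # skip tokens that are not "element + count"
        element, number = match.groups()
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts
```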
.. attribute:: dialect
The dialect of the compound.
.. attribute:: atom_specs
The atom definitions of this compound. Read-only.
:type: list of :class:`AtomSpec`
.. attribute:: bond_specs
The bond definitions of this compound. Read-only.
:type: list of :class:`BondSpec`
.. attribute:: chem_class
The :class:`~ost.mol.ChemClass` of this compound. Read-only.
:type: :class:`str`
.. attribute:: chem_type
The :class:`~ost.mol.ChemType` of this compound. Read-only.
:type: :class:`str`
.. attribute:: inchi
The InChI code of this compound, e.g. '1S/H2O/h1H2' for water, or an empty
string if missing.
Read-only.
:type: :class:`str`
.. attribute:: inchi_key
The InChIKey of this compound, e.g.
'XLYOFNOQVPJJNP-UHFFFAOYSA-N' for water, or an empty string if missing.
Read-only.
:type: :class:`str`
.. attribute:: smiles
The SMILES string of this compound, e.g. 'O' for water, or an empty string
if missing. Read-only.
The string is read from the canonical SMILES produced by the
OpenEye OEToolkits.
:type: :class:`str`
.. attribute:: obsolete
Whether the component has been obsoleted by the PDB.
:type: :class:`bool`
.. attribute:: replaced_by
If the component has been obsoleted by the PDB, this is the three-letter
code of the compound that replaces it. This is not set for all obsolete
compounds.
:type: :class:`str`
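The :attr:`obsolete` and :attr:`replaced_by` attributes can be used together to follow an obsolete compound to its successor. The sketch below assumes that ``lib`` is a loaded :class:`CompoundLib` and that an unset :attr:`replaced_by` is an empty string; ``resolve_replacement`` and ``max_hops`` are hypothetical names.

```python
def resolve_replacement(lib, tlc, max_hops=10):
    """Follow replaced_by links until a non-obsolete compound is reached.

    Returns the last compound that could be resolved, or None if `tlc`
    is not in the library.
    """
    seen = set()
    compound = lib.FindCompound(tlc)
    while (compound is not None and compound.obsolete
           and compound.replaced_by
           and compound.replaced_by not in seen
           and len(seen) < max_hops):
        seen.add(compound.id)  # guards against circular replacements
        successor = lib.FindCompound(compound.replaced_by)
        if successor is None:
            break  # the replacement is not in this library
        compound = successor
    return compound
```

The result can still be obsolete when no replacement is recorded, so callers should check :attr:`obsolete` on the returned compound.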
.. class:: AtomSpec
Definition of an atom
.. attribute:: element
The element of the atom
.. attribute:: name
The primary name of the atom
.. attribute:: alt_name
Alternative atom name. If the atom has only one name, this is identical to
:attr:`name`
.. attribute:: is_leaving
Whether this atom is a leaving atom, i.e. an atom that can be absent from an
otherwise complete residue. The best example of a leaving atom is the *OXT*
atom of amino acids, which is lost when a peptide bond is formed.
.. attribute:: charge
The charge of the atom.
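As an illustration of the :class:`AtomSpec` attributes, the sketch below splits the atoms of a compound into required and leaving atoms and sums the charges. It assumes that :attr:`is_leaving` is True for leaving atoms such as *OXT* and that :attr:`charge` is numeric; ``summarize_atoms`` is a hypothetical helper.

```python
def summarize_atoms(compound):
    """Return (required atom names, leaving atom names, net charge)."""
    required = [atom.name for atom in compound.atom_specs if not atom.is_leaving]
    leaving = [atom.name for atom in compound.atom_specs if atom.is_leaving]
    net_charge = sum(atom.charge for atom in compound.atom_specs)
    return required, leaving, net_charge
```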
.. class:: BondSpec
Definition of a bond
.. attribute:: atom_one
The first atom of the bond, encoded as index into the
:attr:`Compound.atom_specs` array.
.. attribute:: atom_two
The second atom of the bond, encoded as index into the
:attr:`Compound.atom_specs` array.
.. attribute:: order
The bond order, 1 for single bonds, 2 for double-bonds and 3 for
triple-bonds
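Because :attr:`atom_one` and :attr:`atom_two` are indices, they have to be resolved through :attr:`Compound.atom_specs` to obtain atom names. A minimal sketch, where ``named_bonds`` is a hypothetical helper and ``compound`` is assumed to be a :class:`Compound`:

```python
def named_bonds(compound):
    """Return bonds as (first atom name, second atom name, order) tuples."""
    atoms = compound.atom_specs
    return [
        (atoms[bond.atom_one].name, atoms[bond.atom_two].name, bond.order)
        for bond in compound.bond_specs
    ]
```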
Example: Translating SEQRES entries
--------------------------------------------------------------------------------
In this example we will translate the three-letter-codes given in the SEQRES record to one-letter-codes. Note that this automatically takes care of modified amino acids such as selenomethionine.
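.. code-block:: python

  from ost import conop

  # Load the compound library from a local file
  compound_lib = conop.CompoundLib.Load('compounds.chemlib')
  seqres = 'ALA GLY MSE VAL PHE'
  sequence = ''
  for tlc in seqres.split():
    # FindCompound returns None for unknown three-letter-codes
    compound = compound_lib.FindCompound(tlc)
    if compound:
      sequence += compound.one_letter_code
  print(sequence) # prints 'AGMVF'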
.. _mmcif-convert:
Creating a compound library
--------------------------------------------------------------------------------
The simplest way to create a compound library is to use the :program:`chemdict_tool`. The program allows you to import the chemical
description of the compounds from an mmCIF dictionary, e.g. the components.cif dictionary provided by the PDB. The latest dictionary can be downloaded from the `wwPDB site <http://www.wwpdb.org/ccd.html>`_. Because the files are rather large, it is recommended to download the gzipped version.
After downloading the file, use :program:`chemdict_tool` to convert the mmCIF dictionary into our internal format.
.. code-block:: bash
chemdict_tool create <components.cif> <compounds.chemlib>
Notes:
- The :program:`chemdict_tool` only understands `.cif` and `.cif.gz` files. If you would like to use other sources for the compound definitions, consider writing a script using the :doc:`compound library <compoundlib>` API.
- This also loads compounds which are reserved or obsoleted by the PDB to maximize compatibility with older PDB files. You can change that and skip obsolete entries with the `-o` flag, and reserved entries with the `-i` flag.
.. code-block:: bash
chemdict_tool create <components.cif> <compounds.chemlib> -i -o
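When the library is rebuilt regularly, the command line above can be assembled from a script. The sketch below only builds the argument list using the ``create`` command and the ``-i`` and ``-o`` flags described in the notes; ``chemdict_create_command`` and its parameters are hypothetical names.

```python
def chemdict_create_command(cif_path, chemlib_path,
                            skip_reserved=False, skip_obsolete=False):
    """Build the argument list for 'chemdict_tool create'."""
    command = ["chemdict_tool", "create", cif_path, chemlib_path]
    if skip_reserved:
        command.append("-i")  # skip entries reserved by the PDB
    if skip_obsolete:
        command.append("-o")  # skip entries obsoleted by the PDB
    return command
```

The resulting list can be passed to ``subprocess.run(command, check=True)``, provided that :program:`chemdict_tool` is on the ``PATH``.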
If you are working with CHARMM trajectory files, you will also have to add the
definitions for CHARMM. Assuming you are in the top-level source directory of
OpenStructure, this can be achieved by:
.. code-block:: bash
chemdict_tool update modules/conop/data/charmm.cif <compounds.chemlib> charmm
Once your library has been created, you need to tell cmake where to find it and
make sure it gets staged.
.. code-block:: bash
cmake -DCOMPOUND_LIB=compounds.chemlib
make