Usage
*****

Using the library for output consists of first creating a hierarchy of
Python objects that together describe the system, and then dumping that
hierarchy to an mmCIF file. For a complete worked example, see the
`simple docking example <https://github.com/ihmwg/python-ihm/blob/main/examples/simple-docking.py>`_.

The top level of the hierarchy in IHM is the :class:`ihm.System`. All other
objects are referenced from a System object.

Datasets
========

Any data used anywhere in the modeling (including in validation) can be
referenced with an :class:`ihm.dataset.Dataset`. For example,
electron microscopy data is referenced with
:class:`ihm.dataset.EMDensityDataset` and small angle scattering data with
:class:`ihm.dataset.SASDataset`.

A dataset uses an
:class:`ihm.location.Location` object to describe where it is stored.
Typically this is an :class:`ihm.location.DatabaseLocation` for something
that's deposited in an experiment-specific database such as PDB, EMDB, PRIDE,
or EMPIAR, or an :class:`ihm.location.InputFileLocation` for something that's
stored as a simple file, either on the local disk or at a location described
with a DOI such as `Zenodo <https://zenodo.org>`_ or a publication's
supplementary information. See the
`locations example <https://github.com/ihmwg/python-ihm/blob/main/examples/locations.py>`_
for more examples.
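
The following is a minimal sketch, assuming a recent python-ihm release, of
the two kinds of location described above: a 3D EM map held in EMDB and a
SAXS profile archived as a file at a DOI. The accession code, DOI, URL and
file path are placeholders, not real deposited data.

.. code-block:: python

    import ihm.dataset
    import ihm.location

    # A 3D EM map deposited in EMDB ('EMD-0000' is a placeholder accession).
    em_location = ihm.location.EMDBLocation('EMD-0000')
    em_dataset = ihm.dataset.EMDensityDataset(em_location)

    # A SAXS profile archived as a file at a DOI, e.g. on Zenodo (placeholder
    # DOI and URL). Because a repository is given, the path is taken as
    # relative to that repository's contents.
    repo = ihm.location.Repository(
        doi='10.5281/zenodo.0000000',
        url='https://zenodo.org/record/0000000/files/data.zip')
    sas_location = ihm.location.InputFileLocation(
        'saxs/profile.dat', repo=repo, details='SAXS profile')
    sas_dataset = ihm.dataset.SASDataset(sas_location)
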

System architecture
===================

The architecture of the system is described with a number of classes:

- :class:`ihm.Entity` describes each unique sequence.
- :class:`ihm.AsymUnit` describes each asymmetric unit (chain) in the system.
  For example, a homodimer would consist of two asymmetric units, both
  pointing to the same entity, while a heterodimer contains two entities.
  It is also possible for an entity to exist with no asymmetric units pointing
  to it; this typically corresponds to something seen in an experiment (such
  as a cross-linking study) which was not modeled. Note that the IHM
  extension currently contains no support for symmetry, so two chains that
  are symmetrically related should each be represented as an "asymmetric"
  unit.
- :class:`ihm.Assembly` groups asymmetric units and/or entities, or parts of
  them. Assemblies are used to describe which parts of the system correspond
  to each input source of data, or which parts were modeled.
- :class:`ihm.representation.Representation` describes how each part of the
  system was represented in the modeling, for example
  :class:`as atoms <ihm.representation.AtomicSegment>` or
  :class:`as coarse-grained spheres <ihm.representation.FeatureSegment>`.
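
As a rough sketch of how the classes above fit together, a homodimer could
be set up as shown here. The sequence is made up, and the particular split
into an atomic chain and a coarse-grained chain is purely illustrative.

.. code-block:: python

    import ihm
    import ihm.representation

    system = ihm.System()

    # One entity (unique sequence); 'MELSEQKLA' is a made-up sequence.
    entity = ihm.Entity('MELSEQKLA', description='Subunit')
    system.entities.append(entity)

    # A homodimer: two asymmetric units (chains) pointing to the same entity.
    asym_a = ihm.AsymUnit(entity, details='Subunit copy A')
    asym_b = ihm.AsymUnit(entity, details='Subunit copy B')
    system.asym_units.extend([asym_a, asym_b])

    # An assembly containing all of chain A but only residues 1-4 of chain B
    # (calling an AsymUnit with a residue range selects part of it).
    assembly = ihm.Assembly([asym_a, asym_b(1, 4)], name='Modeled assembly')

    # Chain A modeled atomistically; residues 1-4 of chain B as 2 rigid spheres.
    rep = ihm.representation.Representation([
        ihm.representation.AtomicSegment(asym_a, rigid=False),
        ihm.representation.FeatureSegment(asym_b(1, 4), rigid=True,
                                          primitive='sphere', count=2),
    ])
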

Restraints and sampling
=======================

Restraints, which score or otherwise fit the computational model against
the input data, can be created as :class:`ihm.restraint.Restraint` objects.
These generally take as input a :class:`~ihm.dataset.Dataset` pointing to
the input data, and an :class:`~ihm.Assembly` describing which part of the
model the data corresponds to. For example, there are restraints for
:class:`3D EM <ihm.restraint.EM3DRestraint>` and
:class:`small angle scattering <ihm.restraint.SASRestraint>`.

:class:`ihm.protocol.Protocol` objects describe how models were generated
from the input data. A protocol can consist of
:class:`multiple steps <ihm.protocol.Step>`, such as molecular dynamics or
Monte Carlo, followed by one or more analyses, such as clustering, filtering,
rescoring, or validation, described by :class:`ihm.analysis.Analysis` objects.
These objects generally take an :class:`~ihm.Assembly` to indicate what part
of the system was considered and a
:class:`group of datasets <ihm.dataset.DatasetGroup>` to show which data
guided the modeling or analysis.
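
A minimal sketch of a 3D EM restraint plus a one-step protocol with a
clustering analysis follows. The EMDB accession, the sampling method name and
the model counts are illustrative placeholders, not values from a real study.

.. code-block:: python

    import ihm
    import ihm.analysis
    import ihm.dataset
    import ihm.location
    import ihm.protocol
    import ihm.restraint

    system = ihm.System()
    entity = ihm.Entity('MELSEQKLA')  # made-up sequence
    system.entities.append(entity)
    asym = ihm.AsymUnit(entity)
    system.asym_units.append(asym)
    assembly = ihm.Assembly([asym], name='Modeled assembly')

    # Input data: a 3D EM map (placeholder EMDB accession code).
    em_dataset = ihm.dataset.EMDensityDataset(
        ihm.location.EMDBLocation('EMD-0000'))

    # A restraint fitting this part of the model against the EM map.
    system.restraints.append(ihm.restraint.EM3DRestraint(em_dataset, assembly))

    # The data that guided sampling, grouped together.
    data = ihm.dataset.DatasetGroup([em_dataset], name='EM data')

    # One sampling step (Monte Carlo, generating 1000 models from none) ...
    protocol = ihm.protocol.Protocol(name='Modeling')
    protocol.steps.append(ihm.protocol.Step(
        assembly=assembly, dataset_group=data, method='Monte Carlo',
        num_models_begin=0, num_models_end=1000))

    # ... followed by an analysis that clusters those models down to 200.
    analysis = ihm.analysis.Analysis()
    analysis.steps.append(ihm.analysis.ClusterStep(
        feature='RMSD', num_models_begin=1000, num_models_end=200))
    protocol.analyses.append(analysis)

The protocol is not added to the system directly here; it would normally be
referenced from the models it produced.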

Model coordinates
=================

:class:`ihm.model.Model` objects give the actual coordinates of the final
generated models. These point to the :class:`~ihm.Assembly` of what was
modeled, the :class:`~ihm.protocol.Protocol` describing how the modeling
was done, and the :class:`~ihm.representation.Representation` showing how
the model was represented.

Models can be grouped together for any purpose using the
:class:`ihm.model.ModelGroup` class. If a given group describes an ensemble
of models, the :class:`ihm.model.Ensemble` class allows additional
information on the ensemble to be provided, such as its precision and
:class:`localization densities <ihm.model.LocalizationDensity>` of parts of
the system. Because of their size, generally only representative models
of an ensemble are deposited in mmCIF, but the :class:`~ihm.model.Ensemble`
class allows the full ensemble to be referred to, for example in a more
compact binary format (e.g. DCD) deposited at a given DOI. Groups of models
can also be shown as corresponding to different states of the system using
the :class:`ihm.model.State` class.
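
The sketch below, which assumes a recent python-ihm release, builds one
coarse-grained model, places it in a group, marks that group as a state, and
describes the larger ensemble it came from, with the full ensemble referred
to as a DCD file at a DOI. The coordinates, ensemble size, file name and DOI
are all placeholders.

.. code-block:: python

    import ihm
    import ihm.location
    import ihm.model
    import ihm.protocol
    import ihm.representation

    system = ihm.System()
    entity = ihm.Entity('MELSEQKLA')  # made-up sequence
    system.entities.append(entity)
    asym = ihm.AsymUnit(entity)
    system.asym_units.append(asym)
    assembly = ihm.Assembly([asym], name='Modeled assembly')
    rep = ihm.representation.Representation(
        [ihm.representation.FeatureSegment(asym, rigid=False,
                                           primitive='sphere', count=3)])
    protocol = ihm.protocol.Protocol(name='Modeling')

    # A model made of three coarse-grained spheres, each covering 3 residues.
    # Coordinates and radii here are arbitrary illustrative values.
    model = ihm.model.Model(assembly=assembly, protocol=protocol,
                            representation=rep, name='Best scoring model')
    for i in range(3):
        model.add_sphere(ihm.model.Sphere(
            asym_unit=asym, seq_id_range=(3 * i + 1, 3 * i + 3),
            x=5.0 * i, y=0.0, z=0.0, radius=3.0))

    # Group the model and record that the group represents one state.
    group = ihm.model.ModelGroup([model], name='Cluster 1')
    system.state_groups.append(
        ihm.model.StateGroup([ihm.model.State([group], name='Unbound')]))

    # Describe the full ensemble, stored outside mmCIF as a DCD file
    # at a placeholder DOI.
    repo = ihm.location.Repository(doi='10.5281/zenodo.0000000')
    ensemble = ihm.model.Ensemble(
        group, num_models=200, name='Cluster 1',
        file=ihm.location.OutputFileLocation('cluster1.dcd', repo=repo))
    system.ensembles.append(ensemble)
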

Metadata
========

Metadata can also be added to the system, such as:

- :class:`ihm.Citation`: publication(s) that describe this modeling or the
  methods used in it.
- :class:`ihm.Software`: software packages used to process the experimental
  data, generate intermediate inputs, do the modeling itself, and/or
  process the output.
- :class:`ihm.Grant`: funding support for the modeling.
- :class:`ihm.reference.UniProtSequence`: information from UniProt on a
  sequence used in the modeling.
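
Each kind of metadata listed above is attached through a list on the system,
or, for reference sequences, on the entity. In this sketch the citation,
grant, and UniProt entry are placeholders rather than a real publication,
award, or database record; only the IMP software details refer to a real
package.

.. code-block:: python

    import ihm
    import ihm.reference

    system = ihm.System()

    # Placeholder citation and grant (not a real publication or award).
    system.citations.append(ihm.Citation(
        pmid=None, title='Placeholder title', journal='Placeholder J.',
        volume=1, page_range=(1, 10), year=2024, authors=['Doe, J.'],
        doi='10.0000/placeholder', is_primary=True))
    system.grants.append(ihm.Grant(
        funding_organization='Placeholder funding agency',
        country='United States', grant_number='XX-000000'))

    # Software used in the modeling.
    system.software.append(ihm.Software(
        name='IMP', classification='integrative model building',
        description='Integrative Modeling Platform',
        location='https://integrativemodeling.org'))

    # A UniProt entry for the modeled sequence is attached to the Entity
    # (hypothetical entry name, accession, and sequence).
    uniprot = ihm.reference.UniProtSequence(
        db_code='EXAMPLE_HUMAN', accession='P00000', sequence='MELSEQKLA')
    system.entities.append(ihm.Entity('MELSEQKLA', references=[uniprot]))
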

Residue numbering
=================

The library keeps track of several numbering schemes to reflect the reality
of the data used in modeling:

- *Internal numbering*. Residues are always numbered sequentially starting at
  1 in an :class:`~ihm.Entity`. All references to residues or residue ranges in
  the library use this numbering. For polymers, this internal numbering matches
  the ``seq_id`` used in the mmCIF dictionary, while for branched entities,
  this matches ``num`` in the dictionary. (For other types of entities, such
  as non-polymers and waters, ``seq_id`` is not used in mmCIF, but the
  residues are still numbered sequentially from 1 in this library.)
- *Author-provided numbering*. If a different numbering scheme is used by the
  authors, for example to correspond to the numbering of the original sequence
  that is modeled, this can be given as an author-provided numbering for
  one or more asymmetric units. See the ``auth_seq_id_map`` and
  ``orig_auth_seq_id_map`` parameters to :class:`~ihm.AsymUnit`. (The mapping
  between author-provided and internal numbering is given in tables such
  as ``pdbx_poly_seq_scheme`` in the mmCIF file.) Two maps are provided
  because PDB allows for two distinct author-provided schemes: the "original"
  author-provided numbering ``orig_auth_seq_id_map`` is entirely unrestricted
  but is only used internally, while ``auth_seq_id_map`` must follow certain
  PDB rules (and generally matches the residue numbers used in legacy PDB
  files). In most cases, only ``auth_seq_id_map`` is used.
- *Starting model numbering*. If the initial state of the modeling is given
  by one or more PDB files, the numbering of residues in those files may not
  line up with the internal numbering. In this case an offset from starting
  model numbering to internal numbering can be provided; see the ``offset``
  parameter to :class:`~ihm.startmodel.StartingModel`.
- *Reference sequence numbering*. The modeled sequence may differ from that
  in a database such as UniProt, which is itself numbered sequentially from 1
  (for example, the modeled sequence may be a subset of the UniProt sequence,
  such that the first modeled residue is not the first residue in UniProt).
  The correspondence between the internal and reference sequences is given
  with :class:`ihm.reference.Alignment` objects.
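
The sketch below illustrates three of these schemes for a made-up 9-residue
entity: an integer offset and an explicit dict for ``auth_seq_id_map``, and an
:class:`~ihm.reference.Alignment` placing the modeled residues at positions
21-29 of a hypothetical UniProt entry. The ``to_author`` function is a
hypothetical helper written only to show how the two map types are read; it
is not part of python-ihm, and its treatment of residues missing from a dict
(left unchanged) is an assumption.

.. code-block:: python

    import ihm
    import ihm.reference

    # Hypothetical UniProt entry: 20 unmodeled residues followed by the
    # 9 modeled ones, so modeled residues 1-9 are UniProt residues 21-29.
    db_sequence = 'G' * 20 + 'MELSEQKLA'
    uniprot = ihm.reference.UniProtSequence(
        db_code='EXAMPLE_HUMAN', accession='P00000', sequence=db_sequence,
        alignments=[ihm.reference.Alignment(db_begin=21, db_end=29,
                                            entity_begin=1, entity_end=9)])

    # Internal numbering of this (made-up) entity always runs from 1 to 9.
    entity = ihm.Entity('MELSEQKLA', references=[uniprot])

    # Chain A: author numbering is internal numbering plus a fixed offset,
    # so internal residue 1 is author residue 101.
    offset = 100
    asym_a = ihm.AsymUnit(entity, auth_seq_id_map=offset)

    # Chain B: an explicit internal -> author mapping; the authors skipped
    # number 4, so internal residues 4-9 are author residues 5-10.
    skip_4 = {i: (i + 1 if i >= 4 else i) for i in range(1, 10)}
    asym_b = ihm.AsymUnit(entity, auth_seq_id_map=skip_4)


    def to_author(seq_id, auth_seq_id_map):
        """Hypothetical helper (not part of python-ihm): an int map is an
        offset; a dict maps internal to author numbers, and residues it does
        not mention are assumed to keep their internal number."""
        if isinstance(auth_seq_id_map, int):
            return seq_id + auth_seq_id_map
        return auth_seq_id_map.get(seq_id, seq_id)
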

Output
======

Once the hierarchy of classes is complete, it can be freely inspected or
modified. All the classes are simple lightweight Python objects, generally
with the relevant data available as member variables. For example, modeling
packages such as `IMP <https://integrativemodeling.org>`_ will typically
generate an IHM hierarchy from their own internal data models, but in many
cases some information relevant to IHM (such as
the :class:`associated publication <ihm.Citation>`) cannot be determined
automatically and can be filled in by adding more objects to the hierarchy.

The complete hierarchy can be written out to an mmCIF or BinaryCIF file using
the :func:`ihm.dumper.write` function.
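
A minimal sketch of writing a small system (made-up sequence) as mmCIF
follows. It writes to an in-memory buffer so that the result can be
inspected; for a real deposition you would pass a file opened for writing
instead.

.. code-block:: python

    import io

    import ihm
    import ihm.dumper

    system = ihm.System(title='Example system')
    entity = ihm.Entity('MELSEQKLA', description='Subunit')  # made-up sequence
    system.entities.append(entity)
    system.asym_units.append(ihm.AsymUnit(entity))

    # write() takes a file-like object and a list of systems.
    buf = io.StringIO()
    ihm.dumper.write(buf, [system])
    cif_text = buf.getvalue()

For BinaryCIF output, our understanding is that the file should be opened in
binary mode (``'wb'``) and ``format='BCIF'`` passed to
:func:`ihm.dumper.write`, which requires the ``msgpack`` Python package.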

Input
=====

Hierarchies of IHM classes can also be read from mmCIF or BinaryCIF files.
This is done using the :func:`ihm.reader.read` function, which returns a list of
:class:`ihm.System` objects.
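
As a sketch, the round trip below first writes a tiny system (made-up
sequence) to an in-memory buffer and then reads it back; in practice you
would pass a handle to an existing ``.cif`` file instead. Each data block in
the file is expected to yield one :class:`ihm.System`.

.. code-block:: python

    import io

    import ihm
    import ihm.dumper
    import ihm.reader

    # Build and write a minimal system so that there is something to read.
    system = ihm.System()
    entity = ihm.Entity('MELSEQKLA')
    system.entities.append(entity)
    system.asym_units.append(ihm.AsymUnit(entity))
    buf = io.StringIO()
    ihm.dumper.write(buf, [system])

    # read() takes a file-like object and returns a list of ihm.System objects.
    systems = ihm.reader.read(io.StringIO(buf.getvalue()))
    first = systems[0]
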