Usage
*****

Usage of the library for output consists of first creating a hierarchy of
Python objects that together describe the system, and then dumping that
hierarchy to an mmCIF file.

For a complete worked example, see the
`simple docking example <https://github.com/ihmwg/python-ihm/blob/main/examples/simple-docking.py>`_.

The top level of the hierarchy in IHM is the :class:`ihm.System`. All other
objects are referenced from a System object.

Datasets
========

Any data used anywhere in the modeling (including in validation) can be
referenced with an :class:`ihm.dataset.Dataset`. For example,
electron microscopy data is referenced with
:class:`ihm.dataset.EMDensityDataset` and small angle scattering data with
:class:`ihm.dataset.SASDataset`.

A dataset uses an :class:`ihm.location.Location` object to describe where it
is stored. Typically this is an :class:`ihm.location.DatabaseLocation` for
something that's deposited in an experiment-specific database such as PDB,
EMDB, PRIDE, or EMPIAR, or an :class:`ihm.location.InputFileLocation` for
something that's stored as a simple file, either on the local disk or at a
location identified by a DOI, such as `Zenodo <https://zenodo.org>`_ or a
publication's supplementary information. See the
`locations example <https://github.com/ihmwg/python-ihm/blob/main/examples/locations.py>`_
for more examples.

System architecture
===================

The architecture of the system is described with a number of classes:

 - :class:`ihm.Entity` describes each unique sequence.
 - :class:`ihm.AsymUnit` describes each asymmetric unit (chain) in the system.
   For example, a homodimer would consist of two asymmetric units, both
   pointing to the same entity, while a heterodimer contains two entities.
   It is also possible for an entity to exist with no asymmetric units pointing
   to it; this typically corresponds to something seen in an experiment (such
   as a cross-linking study) that was not modeled. Note that the IHM
   extension currently contains no support for symmetry, so two chains that
   are symmetrically related should each be represented as an "asymmetric"
   unit.
 - :class:`ihm.Assembly` groups asymmetric units and/or entities, or parts of
   them. Assemblies are used to describe which parts of the system correspond
   to each input source of data, or which parts were modeled.
 - :class:`ihm.representation.Representation` describes how each part of the
   system was represented in the modeling, for example
   :class:`as atoms <ihm.representation.AtomicSegment>` or
   :class:`as coarse-grained spheres <ihm.representation.FeatureSegment>`.

Restraints and sampling
=======================

Restraints, which score or otherwise fit the computational model against
the input data, can be created as :class:`ihm.restraint.Restraint` objects.
These generally take as input a :class:`~ihm.dataset.Dataset` pointing to
the input data, and an :class:`~ihm.Assembly` describing which part of the
model the data corresponds to. For example, there are restraints for
:class:`3D EM <ihm.restraint.EM3DRestraint>` and
:class:`small angle scattering <ihm.restraint.SASRestraint>`.

:class:`ihm.protocol.Protocol` objects describe how models were generated
from the input data. A protocol can consist of
:class:`multiple steps <ihm.protocol.Step>`, such as molecular dynamics or
Monte Carlo, followed by one or more analyses, such as clustering, filtering,
rescoring, or validation, described by :class:`ihm.analysis.Analysis` objects.
These objects generally take an :class:`~ihm.Assembly` to indicate what part
of the system was considered and a
:class:`group of datasets <ihm.dataset.DatasetGroup>` to show which data
guided the modeling or analysis.

Model coordinates
=================

:class:`ihm.model.Model` objects give the actual coordinates of the final
generated models. These point to the :class:`~ihm.Assembly` of what was
modeled, the :class:`~ihm.protocol.Protocol` describing how the modeling
was done, and the :class:`~ihm.representation.Representation` showing how
the model was represented.

Models can be grouped together for any purpose using the
:class:`ihm.model.ModelGroup` class. If a given group describes an ensemble
of models, the :class:`ihm.model.Ensemble` class allows for additional
information on the ensemble to be provided, such as
:class:`localization densities <ihm.model.LocalizationDensity>` of parts of
the system and the ensemble's precision. Because of their size, generally only
representative models of an ensemble are deposited in mmCIF, but the
:class:`~ihm.model.Ensemble` class allows the full ensemble to be referred to,
for example in a more compact binary format (e.g. DCD) deposited at a given
DOI. Groups of models
can also be shown as corresponding to different states of the system using
the :class:`ihm.model.State` class.

Metadata
========

Metadata can also be added to the system, such as:

 - :class:`ihm.Citation`: publication(s) that describe this modeling or the
   methods used in it.
 - :class:`ihm.Software`: software packages used to process the experimental
   data, generate intermediate inputs, do the modeling itself, and/or
   process the output.
 - :class:`ihm.Grant`: funding support for the modeling.
 - :class:`ihm.reference.UniProtSequence`: information on the UniProt entry
   for a sequence used in the modeling.

Residue numbering
=================

The library keeps track of several numbering schemes to reflect the reality
of the data used in modeling:

 - *Internal numbering*. Residues are always numbered sequentially starting at
   1 in an :class:`~ihm.Entity`. All references to residues or residue ranges in
   the library use this numbering. For polymers, this internal numbering matches
   the ``seq_id`` used in the mmCIF dictionary, while for branched entities,
   this matches ``num`` in the dictionary. (For other types of entities, such
   as non-polymers and waters, ``seq_id`` is not used in mmCIF,
   but the residues are still numbered sequentially from 1 in this library.)
 - *Author-provided numbering*. If a different numbering scheme is used by the
   authors, for example to correspond to the numbering of the original sequence
   that is modeled, this can be given as an author-provided numbering for
   one or more asymmetric units. See the ``auth_seq_id_map`` and
   ``orig_auth_seq_id_map`` parameters to :class:`~ihm.AsymUnit`. (The mapping
   between author-provided and internal numbering is given in tables such
   as ``pdbx_poly_seq_scheme`` in the mmCIF file.) Two maps are provided
   because PDB allows for two distinct author-provided schemes: the "original"
   author-provided numbering, ``orig_auth_seq_id_map``, is entirely
   unrestricted but is only used internally, while ``auth_seq_id_map`` must
   follow certain PDB rules (and generally matches the residue numbers used in
   legacy PDB files). In most cases, only ``auth_seq_id_map`` is used.
 - *Starting model numbering*. If the initial state of the modeling is given
   by one or more PDB files, the numbering of residues in those files may not
   line up with the internal numbering. In this case an offset from starting
   model numbering to internal numbering can be provided; see the ``offset``
   parameter to :class:`~ihm.startmodel.StartingModel`.
 - *Reference sequence numbering*. The modeled sequence may differ from that
   in a database such as UniProt, which is itself numbered sequentially from 1
   (for example, the modeled sequence may be a subset of the UniProt sequence,
   such that the first modeled residue is not the first residue in UniProt).
   The correspondence between the internal and reference sequences is given
   with :class:`ihm.reference.Alignment` objects.

Output
======

Once the hierarchy of classes is complete, it can be freely inspected or
modified. All the classes are simple lightweight Python objects, generally
with the relevant data available as member variables. For example, modeling
packages such as `IMP <https://integrativemodeling.org>`_ will typically
generate an IHM hierarchy from their own internal data models, but in many
cases some information relevant to IHM (such as
the :class:`associated publication <ihm.Citation>`) cannot be determined
automatically and can be filled in by adding more objects to the hierarchy.

The complete hierarchy can be written out to an mmCIF or BinaryCIF file using
the :func:`ihm.dumper.write` function.

Input
=====

Hierarchies of IHM classes can also be read from mmCIF or BinaryCIF files.
This is done using the :func:`ihm.reader.read` function, which returns a list of
:class:`ihm.System` objects.