1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335
|
--$Revision: 6.0 $
--**********************************************************************
--
-- Biological Macromolecule 3-D Structure Data Types for MMDB,
-- A Molecular Modeling Database
--
-- Definitions for structural models
--
-- By Hitomi Ohkawa, Jim Ostell, Chris Hogue and Steve Bryant
--
-- National Center for Biotechnology Information
-- National Institutes of Health
-- Bethesda, MD 20894 USA
--
-- July, 1996
--
--**********************************************************************
MMDB-Structural-model DEFINITIONS ::=
BEGIN
EXPORTS Biostruc-model, Model-id, Model-coordinate-set-id;
IMPORTS Chem-graph-pntrs, Atom-pntrs, Chem-graph-alignment,
Sphere, Cone, Cylinder, Brick, Transform FROM MMDB-Features
Biostruc-id FROM MMDB
Pub FROM NCBI-Pub;
-- A structural model maps chemical components into a measured three-
-- dimensional space. PDB-derived biostrucs generally contain 4 models,
-- corresponding to "views" of the structure of a biomolecular assemble with
-- increasing levels of complexity. Model types indicate the complexity of the
-- view.
-- The model named "NCBI all atom" represents a view suitable for most
-- computational biology applications. It provides complete atomic coordinate
-- data for a "single best" model, omitting statistical disorder information
-- and/or ensemble structure descriptions provided in the source PDB file.
-- Construction of the single best model is based on the assumption that the
-- contents of the "alternate conformation" field from pdb imply no correlation
-- among the occupancies of multiple sites assigned to sets of atoms: the best
-- site is chosen only on the basis of highest occupancy. Note, however, that
-- alternate conformation sets where correlation is implied are generally
-- constrained in crystallographic refinement to have uniform occupancy, and
-- will thus be selected as a set. For ensemble models the model which assigns
-- coordinates to the most atoms is chosen. If numbers of coordinates are the
-- same, the model occurring first in the PDB file is selected. The single
-- best model includes complete coordinates for all nonpolymer components, but
-- omits those classified as "solvent". Model type is 3 for this model.
-- The model named "NCBI backbone" represents a simple view intended for
-- graphic displays and rapid transmission over a network. It includes only
-- alpha carbon or backbone phosphate coordinates for biopolymers. It is based
-- on selection of alpha-carbon and backbone phosphate atoms from the "NCBI
-- all atom" model. The model type is set to 2. An even simpler model gives
-- only a cartoon representation, using cylinders corresponding to secondary
-- structure elements. This is named "NCBI vector", and has model type 1.
-- The models named "PDB Model 1", "PDB Model 2", etc. represent the complete
-- information provided by PDB, including full descriptions of statistical
-- disorder. The name of the model is based on the contents of the PDB MODEL
-- record, with a default name of "PDB Model 1" for PDB files which contain
-- only a single model. Construction of these models is based on the
-- assumption that contents of the PDB "alternate conformation" field are
-- intended to imply correlation among the occupancies of atom sets flagged by
-- the same identifier. The special flag " " (blank) is assumed to indicate
-- sites occupied in all alternate conformations, and sites flagged otherwise,
-- together with " ", to indicate a distinct member of an ensemble of
-- alternate conformations. Note that construction of ensemble members
-- according to these assumption requires two validation checks on PDB
-- "alternate conformation" flags: they must be unique among sites assigned to
-- the same atom, and that the special " " flag must occur only for unique
-- sites. Sites which violate the first check are flagged as "u", for
-- "unknown"; they are omitted from all ensemble definitions but are
-- nontheless retained in the coordinate list. Sites which violate the second
-- check are flagged "b" for "blank", and are included in an appropriately
-- named ensemble. The model type for pdb all models is 4.
-- Note that in the MMDB database models are stored in the ASN.1 stream in
-- order of increasing model type value. Since models occur as the last item
-- in a biostruc, parsers may avoid reading the entire stream if the desired
-- model is one of the simplified types, which occur first in the stream. This
-- can save considerable I/O time, particularly for large ensemble models from
-- NMR determinations.
Biostruc-model ::= SEQUENCE {
id Model-id,
type Model-type,
descr SEQUENCE OF Model-descr OPTIONAL,
model-space Model-space OPTIONAL,
model-coordinates SEQUENCE OF Model-coordinate-set OPTIONAL }
Model-id ::= INTEGER
Model-type ::= INTEGER {
ncbi-vector(1),
ncbi-backbone(2),
ncbi-all-atom(3),
pdb-model(4),
other(255)}
Model-descr ::= CHOICE {
name VisibleString,
pdb-reso VisibleString,
pdb-method VisibleString,
pdb-comment VisibleString,
other-comment VisibleString,
attribution Pub }
-- The model space defines measurement units and any external reference frame.
-- Coordinates refer to a right-handed orthogonal system defined on axes
-- tagged x, y and z in the coordinate and feature definitions of a biostruc.
-- Coordinates from PDB-derived structures are reported without change, in
-- angstrom units. The units of temperature and occupancy factors are not
-- defined explicitly in PDB, but are inferred from their value range.
Model-space ::= SEQUENCE {
coordinate-units ENUMERATED {
angstroms(1),
nanometers(2),
other(3),
unknown(255)},
thermal-factor-units ENUMERATED {
b(1),
u(2),
other(3),
unknown(255)} OPTIONAL,
occupancy-factor-units ENUMERATED {
fractional(1),
electrons(2),
other(3),
unknown(255)} OPTIONAL,
density-units ENUMERATED {
electrons-per-unit-volume(1),
arbitrary-scale(2),
other(3),
unknown(255)} OPTIONAL,
reference-frame Reference-frame OPTIONAL }
-- An external reference frame is a pointer to another biostruc, with an
-- optional operator to rotate and translate coordinates into its model space.
-- This item is intended for representation of homology-derived model
-- structures, and is not present for structures from PDB.
Reference-frame ::= SEQUENCE {
biostruc-id Biostruc-id,
rotation-translation Transform OPTIONAL }
-- Atomic coordinates may be assigned literally or by reference to another
-- biostruc. The reference coordinate type is used to represent homology-
-- derived model structures. PDB-derived structures have literal coordinates.
-- Referenced coordinates identify another biostruc, any transformation to be
-- applied to coordinates from that biostruc, and a mapping of the chemical
-- graph of the present biostruc onto that of the referenced biostruc. They
-- give an "alignment" of atoms in the current biostruc with those in another,
-- from which the coordinates of matched atoms may be retrieved. For non-
-- atomic models "alignment" may also be represented by molecule and residue
-- equivalence lists. Referenced coordinates are a data item inteded for
-- representation of homology models, with an explicit pointer to their source
-- information. They do not occur in PDB-derived models.
Model-coordinate-set ::= SEQUENCE {
id Model-coordinate-set-id OPTIONAL,
descr SEQUENCE OF Model-descr OPTIONAL,
coordinates CHOICE {
literal Coordinates,
reference Chem-graph-alignment } }
Model-coordinate-set-id ::= INTEGER
-- Literal coordinates map chemical components into the model space. Three
-- mapping types are allowed, atomic coordinate models, density-grid models,
-- and surface models. A model consists of a sequence of such coordinate sets,
-- and may thus combine coordinate subsets which have a different source.
-- PDB-derived models contain a single atomic coordinate set, as they by
-- definition represent information from a single source.
Coordinates ::= CHOICE {
atomic Atomic-coordinates,
surface Surface-coordinates,
density Density-coordinates }
-- Literal atomic coordinate values give location, occupancy and order
-- parameters, and a pointer to a specific atom defined in the biostruc graph.
-- Temperature and occupancy factors have their conventional crystallographic
-- definitions, with units defined in the model space declaration. Atoms,
-- sites, temperature-factors, occupancies and alternate-conformation-ids
-- are parallel arrays, i.e. the have the same number of values as given by
-- number-of-points. Conformation ensembles represent distinct correlated-
-- disorder subsets of the coordinates. They will be present only for certain
-- "views" of PDB structures, as described above. Their derivation from PDB-
-- supplied "alternate-conformation" ids is described below.
Atomic-coordinates ::= SEQUENCE {
number-of-points INTEGER,
atoms Atom-pntrs,
sites Model-space-points,
temperature-factors Atomic-temperature-factors OPTIONAL,
occupancies Atomic-occupancies OPTIONAL,
alternate-conf-ids Alternate-conformation-ids OPTIONAL,
conf-ensembles SEQUENCE OF Conformation-ensemble OPTIONAL }
-- The atoms whose location is described by each coordinate are identified
-- via a hierarchical pointer to the chemical graph of the biomolecular
-- assembly. Coordinates may be matched with atoms in the chemical structure
-- by the values of the molecule, residue and atom id's given here, which
-- match exactly the items of the same type defined in Biostruc-graph.
-- Coordinates are given as integer values, with a scale factor to convert
-- to real values for each x, y or z, in the units indicated in model-space.
-- Integer values must be divided by the the scale factor. This use of integer
-- values reduces the ASN.1 stream size. The scale factors for temperature
-- factors and occupancies are given separately, but must be used in the same
-- fashion to produce properly scaled real values.
Model-space-points ::= SEQUENCE {
scale-factor INTEGER,
x SEQUENCE OF INTEGER,
y SEQUENCE OF INTEGER,
z SEQUENCE OF INTEGER }
Atomic-temperature-factors ::= CHOICE {
isotropic Isotropic-temperature-factors,
anisotropic Anisotropic-temperature-factors }
Isotropic-temperature-factors ::= SEQUENCE {
scale-factor INTEGER,
b SEQUENCE OF INTEGER }
Anisotropic-temperature-factors ::= SEQUENCE {
scale-factor INTEGER,
b-11 SEQUENCE OF INTEGER,
b-12 SEQUENCE OF INTEGER,
b-13 SEQUENCE OF INTEGER,
b-22 SEQUENCE OF INTEGER,
b-23 SEQUENCE OF INTEGER,
b-33 SEQUENCE OF INTEGER }
Atomic-occupancies ::= SEQUENCE {
scale-factor INTEGER,
o SEQUENCE OF INTEGER }
-- An alternate conformation id is optionally associated with each coordinate.
-- Aside from corrections due to the validation checks described above, the
-- contents of MMDB Alternate-conformation-ids are identical to the PDB
-- "alternate conformation" field.
Alternate-conformation-ids ::= SEQUENCE OF Alternate-conformation-id
Alternate-conformation-id ::= VisibleString
-- Correlated disorder ensemble is defined by a set of alternate conformation
-- id's which identify coordinates relevant to that ensemble. These are
-- defined from the validated and corrected contents of the PDB "alternate
-- conformation" field as described above. A given ensemble, for example, may
-- consist of atom sites flagged by " " and "A" Alternate-conformation-ids.
-- Names for ensembles are constructed from these flags. This example would be
-- named, in its description, "PDB Ensemble blank plus A".
-- Note that this interpretation is consistent with common PDB usage of the
-- "alternate conformation" field, but that PDB specifications do not formally
-- distinguish between correlated and uncorrelated disorder in crystallographic
-- models. Ensembles identified in MMDB thus may not correspond to the meaning
-- intended by PDB or the depositor. No information is lost, however, and
-- if the intended meaning is known alternative ensemble descriptions may be
-- reconstructed directly from the Alternate-conformation-ids.
-- Note that correlated disorder as defined here is allowed within an atomic
-- coordinate set but not between the multiple sets which may define a model.
-- Multiple sets within the same model are intended as a means to represent
-- assemblies modeled from different experimentally determined structures,
-- where correlated disorder between coordinate sets is not relevant.
Conformation-ensemble ::= SEQUENCE {
name VisibleString,
alt-conf-ids SEQUENCE OF Alternate-conformation-id }
-- Literal surface coordinates define the chemical components whose structure
-- is described by a surface, and the surface itself. The surface may be
-- either a regular geometric solid or a triangle-mesh of arbitrary shape.
Surface-coordinates ::= SEQUENCE {
contents Chem-graph-pntrs,
surface CHOICE { sphere Sphere,
cone Cone,
cylinder Cylinder,
brick Brick,
tmesh T-mesh,
triangles Triangles } }
T-mesh ::= SEQUENCE {
number-of-points INTEGER,
scale-factor INTEGER,
swap SEQUENCE OF BOOLEAN,
x SEQUENCE OF INTEGER,
y SEQUENCE OF INTEGER,
z SEQUENCE OF INTEGER }
Triangles ::= SEQUENCE {
number-of-points INTEGER,
scale-factor INTEGER,
x SEQUENCE OF INTEGER,
y SEQUENCE OF INTEGER,
z SEQUENCE OF INTEGER,
number-of-triangles INTEGER,
v1 SEQUENCE OF INTEGER,
v2 SEQUENCE OF INTEGER,
v3 SEQUENCE OF INTEGER }
-- Literal density coordinates define the chemical components whose structure
-- is described by a density grid, parameters of this grid, and density values.
Density-coordinates ::= SEQUENCE {
contents Chem-graph-pntrs,
grid-corners Brick,
grid-steps-x INTEGER,
grid-steps-y INTEGER,
grid-steps-z INTEGER,
fastest-varying ENUMERATED {
x(1),
y(2),
z(3)},
slowest-varying ENUMERATED {
x(1),
y(2),
z(3)},
scale-factor INTEGER,
density SEQUENCE OF INTEGER }
END
|