<!-- ============================================
::DATATOOL:: Generated from "gbseq.asn"
::DATATOOL:: by application DATATOOL version 2.4.4
::DATATOOL:: on 12/18/2013 23:04:02
============================================ -->
<!-- ============================================ -->
<!-- This section is mapped from module "NCBI-GBSeq"
================================================= -->
<!--
$Revision: 413850 $
*********************************************************
ASN.1 and XML for the components of a GenBank format sequence
J.Ostell 2002
Updated 25 May 2010
*********************************************************
-->
<!--
********
GBSeq represents the elements in a GenBank style report
of a sequence with some small additions to structure and support
for protein (GenPept) versions of GenBank format as seen in
Entrez. While this is a simplification, a reduction of detail, and a
flattening of GenBank format to a single-sequence perspective (compared
with the full ASN.1 or XML from which GenBank and this format are
derived at NCBI), it is presented in ASN.1 or XML for automated parsing
and processing. It is hoped that this compromise will be useful for
those who bulk process data at the GenBank format level of detail
today. Since it is a compromise, a number of pragmatic decisions have
been made.
In pursuit of simplicity and familiarity a number of
fields do not have full substructure defined here where there is
already a standard GenBank format string. For example:
Date DD-Mon-YYYY
Authors LastName, Initials (with periods)
Journal JournalName Volume (issue), page-range (year)
FeatureLocations as per GenBank feature table, but FeatureIntervals
may also be provided as a convenience
FeatureQualifiers as per GenBank feature table
Primary has a string that represents a table to construct
a third party (TPA) sequence.
other-seqids can have strings with the "vertical bar format" sequence
identifiers used in BLAST, for example, when they are non-GenBank types.
Currently in GenBank format you only see GI, but there are others, such as
patents, submitter clone names, etc., which will appear here, as they
always have in the ASN.1 format and the full XML format.
source-db is a formatted text block for peptides in GenPept format that
carries information from the source protein database.
There are also a number of elements that could have been
more exactly specified, but in the interest of simplicity
have simply been left optional. For example:
accession and accession.version will always appear in a GenBank record;
they are optional because this format can also be used for non-GenBank
sequences, which in that case will have only "other-seqids".
sequences will normally all have "sequence" filled in. But contig records
will have a "join" statement in the "contig" slot, and no "sequence".
We also may consider a retrieval option with no sequence of any kind
and no feature table to quickly check minimal values.
a reference may have an author list, or be from a consortium, or both.
some fields, such as taxonomy, do appear as separate elements in GenBank
format but without a specific linetype (in GenBank format this comes
under ORGANISM). Another example is the separation of primary accession
from the list of secondary accessions. In GenBank format primary
accession is just the first one on the list that includes all secondaries
after it.
create-date deserves special comment. The date you see on the right hand
side of the LOCUS line in GenBank format is actually the last date the
record was modified (or the update-date). The date the record was
first submitted to GenBank appears in the first submission citation in
the reference section. Internally in the databases and ASN.1 NCBI keeps
the first date the record was released into the sequence database at
NCBI as create-date. For records from EMBL, which supports create-date,
it is the date provided by EMBL. For DDBJ records, which do not supply
a create-date (same as GenBank format) the create-date is the first date
NCBI saw the record from DDBJ. For older GenBank records, before NCBI
took responsibility for GenBank, it is just the first date NCBI saw the
record. Create-date can be very useful, so we expose it here, but users
must understand it is only an approximation and comes from many sources,
and with many exceptions and caveats. It does NOT tell you the first
date the public might have seen this record and thus is NOT an accurate
measure for legal issues of precedence.
********
-->
<!ELEMENT GBSet (GBSeq*)>
<!ELEMENT GBSeq (
GBSeq_locus?,
GBSeq_length,
GBSeq_strandedness?,
GBSeq_moltype,
GBSeq_topology?,
GBSeq_division?,
GBSeq_update-date?,
GBSeq_create-date?,
GBSeq_update-release?,
GBSeq_create-release?,
GBSeq_definition?,
GBSeq_primary-accession?,
GBSeq_entry-version?,
GBSeq_accession-version?,
GBSeq_other-seqids?,
GBSeq_secondary-accessions?,
GBSeq_project?,
GBSeq_keywords?,
GBSeq_segment?,
GBSeq_source?,
GBSeq_organism?,
GBSeq_taxonomy?,
GBSeq_references?,
GBSeq_comment?,
GBSeq_comment-set?,
GBSeq_struc-comments?,
GBSeq_primary?,
GBSeq_source-db?,
GBSeq_database-reference?,
GBSeq_feature-table?,
GBSeq_feature-set?,
GBSeq_sequence?,
GBSeq_contig?,
GBSeq_alt-seq?,
GBSeq_xrefs?)>
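<!--
A minimal instance sketch of the GBSeq content model above. Only
GBSeq_length and GBSeq_moltype are required; every other child is
optional, and children must appear in the declared order. All values
shown (locus name, length, date, sequence) are hypothetical and for
illustration only; they are not taken from a real record.

<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE01</GBSeq_locus>
    <GBSeq_length>12</GBSeq_length>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_update-date>25-May-2010</GBSeq_update-date>
    <GBSeq_sequence>acgtacgtacgt</GBSeq_sequence>
  </GBSeq>
</GBSet>
-->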
<!ELEMENT GBSeq_locus (#PCDATA)>
<!ELEMENT GBSeq_length (%INTEGER;)>
<!ELEMENT GBSeq_strandedness (#PCDATA)>
<!ELEMENT GBSeq_moltype (#PCDATA)>
<!ELEMENT GBSeq_topology (#PCDATA)>
<!ELEMENT GBSeq_division (#PCDATA)>
<!ELEMENT GBSeq_update-date (#PCDATA)>
<!ELEMENT GBSeq_create-date (#PCDATA)>
<!ELEMENT GBSeq_update-release (#PCDATA)>
<!ELEMENT GBSeq_create-release (#PCDATA)>
<!ELEMENT GBSeq_definition (#PCDATA)>
<!ELEMENT GBSeq_primary-accession (#PCDATA)>
<!ELEMENT GBSeq_entry-version (#PCDATA)>
<!ELEMENT GBSeq_accession-version (#PCDATA)>
<!ELEMENT GBSeq_other-seqids (GBSeqid*)>
<!ELEMENT GBSeq_secondary-accessions (GBSecondary-accn*)>
<!ELEMENT GBSeq_project (#PCDATA)>
<!ELEMENT GBSeq_keywords (GBKeyword*)>
<!ELEMENT GBSeq_segment (#PCDATA)>
<!ELEMENT GBSeq_source (#PCDATA)>
<!ELEMENT GBSeq_organism (#PCDATA)>
<!ELEMENT GBSeq_taxonomy (#PCDATA)>
<!ELEMENT GBSeq_references (GBReference*)>
<!ELEMENT GBSeq_comment (#PCDATA)>
<!ELEMENT GBSeq_comment-set (GBComment*)>
<!ELEMENT GBSeq_struc-comments (GBStrucComment*)>
<!ELEMENT GBSeq_primary (#PCDATA)>
<!ELEMENT GBSeq_source-db (#PCDATA)>
<!ELEMENT GBSeq_database-reference (#PCDATA)>
<!ELEMENT GBSeq_feature-table (GBFeature*)>
<!ELEMENT GBSeq_feature-set (GBFeatureSet*)>
<!-- Optional for contig, wgs, etc. -->
<!ELEMENT GBSeq_sequence (#PCDATA)>
<!ELEMENT GBSeq_contig (#PCDATA)>
<!ELEMENT GBSeq_alt-seq (GBAltSeqData*)>
<!ELEMENT GBSeq_xrefs (GBXref*)>
<!ELEMENT GBSeqid (#PCDATA)>
<!ELEMENT GBSecondary-accn (#PCDATA)>
<!ELEMENT GBKeyword (#PCDATA)>
<!ELEMENT GBReference (
GBReference_reference,
GBReference_position?,
GBReference_authors?,
GBReference_consortium?,
GBReference_title?,
GBReference_journal,
GBReference_xref?,
GBReference_pubmed?,
GBReference_remark?)>
<!ELEMENT GBReference_reference (#PCDATA)>
<!ELEMENT GBReference_position (#PCDATA)>
<!ELEMENT GBReference_authors (GBAuthor*)>
<!ELEMENT GBReference_consortium (#PCDATA)>
<!ELEMENT GBReference_title (#PCDATA)>
<!ELEMENT GBReference_journal (#PCDATA)>
<!ELEMENT GBReference_xref (GBXref*)>
<!ELEMENT GBReference_pubmed (%INTEGER;)>
<!ELEMENT GBReference_remark (#PCDATA)>
<!ELEMENT GBAuthor (#PCDATA)>
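<!--
A possible GBReference instance, sketched from the declarations above.
GBReference_reference and GBReference_journal are the only required
children. The author and journal strings follow the GenBank format
conventions described at the top of this file; the names, title, and
numbers are invented for illustration.

<GBReference>
  <GBReference_reference>1</GBReference_reference>
  <GBReference_authors>
    <GBAuthor>Doe, J.A.</GBAuthor>
    <GBAuthor>Roe, R.</GBAuthor>
  </GBReference_authors>
  <GBReference_title>A hypothetical title</GBReference_title>
  <GBReference_journal>Example Journal 12 (3), 45-67 (2010)</GBReference_journal>
</GBReference>
-->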
<!ELEMENT GBXref (
GBXref_dbname,
GBXref_id)>
<!ELEMENT GBXref_dbname (#PCDATA)>
<!ELEMENT GBXref_id (#PCDATA)>
<!ELEMENT GBComment (
GBComment_type?,
GBComment_paragraphs)>
<!ELEMENT GBComment_type (#PCDATA)>
<!ELEMENT GBComment_paragraphs (GBCommentParagraph*)>
<!ELEMENT GBCommentParagraph (#PCDATA)>
<!ELEMENT GBStrucComment (
GBStrucComment_name?,
GBStrucComment_items)>
<!ELEMENT GBStrucComment_name (#PCDATA)>
<!ELEMENT GBStrucComment_items (GBStrucCommentItem*)>
<!ELEMENT GBStrucCommentItem (
GBStrucCommentItem_tag?,
GBStrucCommentItem_value?,
GBStrucCommentItem_url?)>
<!ELEMENT GBStrucCommentItem_tag (#PCDATA)>
<!ELEMENT GBStrucCommentItem_value (#PCDATA)>
<!ELEMENT GBStrucCommentItem_url (#PCDATA)>
<!ELEMENT GBFeatureSet (
GBFeatureSet_annot-source?,
GBFeatureSet_features)>
<!ELEMENT GBFeatureSet_annot-source (#PCDATA)>
<!ELEMENT GBFeatureSet_features (GBFeature*)>
<!ELEMENT GBFeature (
GBFeature_key,
GBFeature_location,
GBFeature_intervals?,
GBFeature_operator?,
GBFeature_partial5?,
GBFeature_partial3?,
GBFeature_quals?,
GBFeature_xrefs?)>
<!ELEMENT GBFeature_key (#PCDATA)>
<!ELEMENT GBFeature_location (#PCDATA)>
<!ELEMENT GBFeature_intervals (GBInterval*)>
<!ELEMENT GBFeature_operator (#PCDATA)>
<!ELEMENT GBFeature_partial5 EMPTY>
<!ATTLIST GBFeature_partial5 value ( true | false ) #REQUIRED >
<!ELEMENT GBFeature_partial3 EMPTY>
<!ATTLIST GBFeature_partial3 value ( true | false ) #REQUIRED >
<!ELEMENT GBFeature_quals (GBQualifier*)>
<!ELEMENT GBFeature_xrefs (GBXref*)>
<!ELEMENT GBInterval (
GBInterval_from?,
GBInterval_to?,
GBInterval_point?,
GBInterval_iscomp?,
GBInterval_interbp?,
GBInterval_accession)>
<!ELEMENT GBInterval_from (%INTEGER;)>
<!ELEMENT GBInterval_to (%INTEGER;)>
<!ELEMENT GBInterval_point (%INTEGER;)>
<!ELEMENT GBInterval_iscomp EMPTY>
<!ATTLIST GBInterval_iscomp value ( true | false ) #REQUIRED >
<!ELEMENT GBInterval_interbp EMPTY>
<!ATTLIST GBInterval_interbp value ( true | false ) #REQUIRED >
<!ELEMENT GBInterval_accession (#PCDATA)>
<!ELEMENT GBQualifier (
GBQualifier_name,
GBQualifier_value?)>
<!ELEMENT GBQualifier_name (#PCDATA)>
<!ELEMENT GBQualifier_value (#PCDATA)>
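<!--
A sketch of one GBFeature instance under the declarations above. Note
that boolean elements such as GBFeature_partial5 are declared EMPTY and
carry their value in a required "value" attribute, and that
GBInterval_accession is the only required child of GBInterval. The
location, coordinates, accession, and qualifier text are hypothetical.
In an instance document the "less than" sign of a partial location must
be written as an entity reference.

<GBFeature>
  <GBFeature_key>CDS</GBFeature_key>
  <GBFeature_location>&lt;10..90</GBFeature_location>
  <GBFeature_intervals>
    <GBInterval>
      <GBInterval_from>10</GBInterval_from>
      <GBInterval_to>90</GBInterval_to>
      <GBInterval_accession>EX000001.1</GBInterval_accession>
    </GBInterval>
  </GBFeature_intervals>
  <GBFeature_partial5 value="true"/>
  <GBFeature_quals>
    <GBQualifier>
      <GBQualifier_name>product</GBQualifier_name>
      <GBQualifier_value>hypothetical protein</GBQualifier_value>
    </GBQualifier>
  </GBFeature_quals>
</GBFeature>
-->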
<!ELEMENT GBAltSeqData (
GBAltSeqData_name,
GBAltSeqData_items?)>
<!-- e.g., contig, wgs, scaffold, cage, genome -->
<!ELEMENT GBAltSeqData_name (#PCDATA)>
<!ELEMENT GBAltSeqData_items (GBAltSeqItem*)>
<!ELEMENT GBAltSeqItem (
GBAltSeqItem_interval?,
GBAltSeqItem_isgap?,
GBAltSeqItem_gap-length?,
GBAltSeqItem_gap-type?,
GBAltSeqItem_gap-linkage?,
GBAltSeqItem_gap-comment?,
GBAltSeqItem_first-accn?,
GBAltSeqItem_last-accn?,
GBAltSeqItem_value?)>
<!ELEMENT GBAltSeqItem_interval (GBInterval)>
<!ELEMENT GBAltSeqItem_isgap EMPTY>
<!ATTLIST GBAltSeqItem_isgap value ( true | false ) #REQUIRED >
<!ELEMENT GBAltSeqItem_gap-length (%INTEGER;)>
<!ELEMENT GBAltSeqItem_gap-type (#PCDATA)>
<!ELEMENT GBAltSeqItem_gap-linkage (#PCDATA)>
<!ELEMENT GBAltSeqItem_gap-comment (#PCDATA)>
<!ELEMENT GBAltSeqItem_first-accn (#PCDATA)>
<!ELEMENT GBAltSeqItem_last-accn (#PCDATA)>
<!ELEMENT GBAltSeqItem_value (#PCDATA)>
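<!--
A sketch of a GBAltSeqData instance that is valid against the
declarations above: one item pointing at an interval and one item
describing a gap. This shows only what the content model permits; how
these fields are populated in practice is not specified in this file,
and the name, accession, coordinates, and gap length are hypothetical.

<GBAltSeqData>
  <GBAltSeqData_name>contig</GBAltSeqData_name>
  <GBAltSeqData_items>
    <GBAltSeqItem>
      <GBAltSeqItem_interval>
        <GBInterval>
          <GBInterval_from>1</GBInterval_from>
          <GBInterval_to>500</GBInterval_to>
          <GBInterval_accession>EX000002.1</GBInterval_accession>
        </GBInterval>
      </GBAltSeqItem_interval>
    </GBAltSeqItem>
    <GBAltSeqItem>
      <GBAltSeqItem_isgap value="true"/>
      <GBAltSeqItem_gap-length>100</GBAltSeqItem_gap-length>
    </GBAltSeqItem>
  </GBAltSeqData_items>
</GBAltSeqData>
-->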