File: NCBI_GBSeq.mod.dtd

python-biopython 1.68+dfsg-3
<!-- ============================================
     ::DATATOOL:: Generated from "gbseq.asn"
     ::DATATOOL:: by application DATATOOL version 2.4.4
     ::DATATOOL:: on 12/18/2013 23:04:02
     ============================================ -->

<!-- ============================================ -->
<!-- This section is mapped from module "NCBI-GBSeq"
================================================= -->

<!--
$Revision: 413850 $
*********************************************************

 ASN.1 and XML for the components of a GenBank format sequence
 J.Ostell 2002
 Updated 25 May 2010

*********************************************************
-->

<!--
********
  GBSeq represents the elements in a GenBank style report
    of a sequence with some small additions to structure and support
    for protein (GenPept) versions of GenBank format as seen in
    Entrez. While this represents the simplification, reduction of
    detail, and flattening to a single sequence perspective of GenBank
    format (compared with the full ASN.1 or XML from which GenBank and
    this format are derived at NCBI), it is presented in ASN.1 or XML for
    automated parsing and processing. It is hoped that this compromise
    will be useful for those bulk processing at the GenBank format level
    of detail today. Since it is a compromise, a number of pragmatic
    decisions have been made.

  In pursuit of simplicity and familiarity a number of
    fields do not have full substructure defined here where there is
    already a standard GenBank format string. For example:

   Date  DD-Mon-YYYY
   Authors   LastName, Initials (with periods)
   Journal   JournalName Volume (issue), page-range (year)
   FeatureLocations as per GenBank feature table, but FeatureIntervals
    may also be provided as a convenience
   FeatureQualifiers  as per GenBank feature table
   Primary has a string that represents a table to construct
    a third party (TPA) sequence.
   other-seqids can have strings with the "vertical bar format" sequence
    identifiers used in BLAST for example, when they are non-GenBank types.
    Currently in GenBank format you only see GI, but there are others, like
    patents, submitter clone names, etc., which will appear here, as they
    always have in the ASN.1 format, and full XML format.
   source-db is a formatted text block for peptides in GenPept format that
    carries information from the source protein database.

  There are also a number of elements that could have been
   more exactly specified, but in the interest of simplicity
   have been simply left as options. For example:

  accession and accession.version will always appear in a GenBank record;
   they are optional because this format can also be used for non-GenBank
   sequences, which in that case will have only "other-seqids".

  sequences will normally all have "sequence" filled in. But contig records
    will have a "join" statement in the "contig" slot, and no "sequence".
    We also may consider a retrieval option with no sequence of any kind
     and no feature table to quickly check minimal values.

  a reference may have an author list, or be from a consortium, or both.

  some fields, such as taxonomy, do appear as separate elements in GenBank
    format but without a specific linetype (in GenBank format this comes
    under ORGANISM). Another example is the separation of primary accession
    from the list of secondary accessions. In GenBank format primary
    accession is just the first one on the list that includes all secondaries
    after it.

  create-date deserves special comment. The date you see on the right hand
    side of the LOCUS line in GenBank format is actually the last date the
    record was modified (or the update-date). The date the record was
    first submitted to GenBank appears in the first submission citation in
    the reference section. Internally in the databases and ASN.1 NCBI keeps
    the first date the record was released into the sequence database at
    NCBI as create-date. For records from EMBL, which supports create-date,
    it is the date provided by EMBL. For DDBJ records, which do not supply
    a create-date (same as GenBank format) the create-date is the first date
    NCBI saw the record from DDBJ. For older GenBank records, before NCBI
    took responsibility for GenBank, it is just the first date NCBI saw the
    record. Create-date can be very useful, so we expose it here, but users
    must understand it is only an approximation and comes from many sources,
    and with many exceptions and caveats. It does NOT tell you the first
    date the public might have seen this record and thus is NOT an accurate
    measure for legal issues of precedence.

********
-->
<!ELEMENT GBSet (GBSeq*)>


<!ELEMENT GBSeq (
        GBSeq_locus?, 
        GBSeq_length, 
        GBSeq_strandedness?, 
        GBSeq_moltype, 
        GBSeq_topology?, 
        GBSeq_division?, 
        GBSeq_update-date?, 
        GBSeq_create-date?, 
        GBSeq_update-release?, 
        GBSeq_create-release?, 
        GBSeq_definition?, 
        GBSeq_primary-accession?, 
        GBSeq_entry-version?, 
        GBSeq_accession-version?, 
        GBSeq_other-seqids?, 
        GBSeq_secondary-accessions?, 
        GBSeq_project?, 
        GBSeq_keywords?, 
        GBSeq_segment?, 
        GBSeq_source?, 
        GBSeq_organism?, 
        GBSeq_taxonomy?, 
        GBSeq_references?, 
        GBSeq_comment?, 
        GBSeq_comment-set?, 
        GBSeq_struc-comments?, 
        GBSeq_primary?, 
        GBSeq_source-db?, 
        GBSeq_database-reference?, 
        GBSeq_feature-table?, 
        GBSeq_feature-set?, 
        GBSeq_sequence?, 
        GBSeq_contig?, 
        GBSeq_alt-seq?, 
        GBSeq_xrefs?)>
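
<!--
  Illustration only, not part of the generated DTD. In the content model
  above only GBSeq_length and GBSeq_moltype are required; every other child
  is optional. A minimal Python sketch of reading a GBSet with the standard
  library follows. The sample document and the function name "summarize"
  are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second record carries only the required children.
SAMPLE = """<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE1</GBSeq_locus>
    <GBSeq_length>12</GBSeq_length>
    <GBSeq_moltype>DNA</GBSeq_moltype>
  </GBSeq>
  <GBSeq>
    <GBSeq_length>7</GBSeq_length>
    <GBSeq_moltype>AA</GBSeq_moltype>
  </GBSeq>
</GBSet>"""


def summarize(xml_text):
    """Return (locus or None, length, moltype) for each GBSeq in a GBSet."""
    rows = []
    for seq in ET.fromstring(xml_text).iter("GBSeq"):
        locus = seq.findtext("GBSeq_locus")         # optional, may be None
        length = int(seq.findtext("GBSeq_length"))  # required integer
        moltype = seq.findtext("GBSeq_moltype")     # required
        rows.append((locus, length, moltype))
    return rows


print(summarize(SAMPLE))
```

  Code that consumes this format should treat a missing optional child as
  normal and not as an error.
-->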

<!ELEMENT GBSeq_locus (#PCDATA)>

<!ELEMENT GBSeq_length (%INTEGER;)>

<!ELEMENT GBSeq_strandedness (#PCDATA)>

<!ELEMENT GBSeq_moltype (#PCDATA)>

<!ELEMENT GBSeq_topology (#PCDATA)>

<!ELEMENT GBSeq_division (#PCDATA)>

<!ELEMENT GBSeq_update-date (#PCDATA)>

<!ELEMENT GBSeq_create-date (#PCDATA)>
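
<!--
  Illustration only. The notes at the top of this file give the date format
  as DD-Mon-YYYY. A minimal Python sketch of parsing such a string follows.
  It assumes English month abbreviations and a C or English locale, since
  the %b directive is locale dependent; "parse_gb_date" is a hypothetical
  helper name.

```python
from datetime import datetime


def parse_gb_date(text):
    """Parse a DD-Mon-YYYY string into a datetime.date."""
    return datetime.strptime(text.strip(), "%d-%b-%Y").date()


print(parse_gb_date("18-Dec-2013"))
```

  As the notes say, a parsed create-date is still only an approximation of
  when the record first appeared.
-->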

<!ELEMENT GBSeq_update-release (#PCDATA)>

<!ELEMENT GBSeq_create-release (#PCDATA)>

<!ELEMENT GBSeq_definition (#PCDATA)>

<!ELEMENT GBSeq_primary-accession (#PCDATA)>

<!ELEMENT GBSeq_entry-version (#PCDATA)>

<!ELEMENT GBSeq_accession-version (#PCDATA)>

<!ELEMENT GBSeq_other-seqids (GBSeqid*)>

<!ELEMENT GBSeq_secondary-accessions (GBSecondary-accn*)>

<!ELEMENT GBSeq_project (#PCDATA)>

<!ELEMENT GBSeq_keywords (GBKeyword*)>

<!ELEMENT GBSeq_segment (#PCDATA)>

<!ELEMENT GBSeq_source (#PCDATA)>

<!ELEMENT GBSeq_organism (#PCDATA)>

<!ELEMENT GBSeq_taxonomy (#PCDATA)>

<!ELEMENT GBSeq_references (GBReference*)>

<!ELEMENT GBSeq_comment (#PCDATA)>

<!ELEMENT GBSeq_comment-set (GBComment*)>

<!ELEMENT GBSeq_struc-comments (GBStrucComment*)>

<!ELEMENT GBSeq_primary (#PCDATA)>

<!ELEMENT GBSeq_source-db (#PCDATA)>

<!ELEMENT GBSeq_database-reference (#PCDATA)>

<!ELEMENT GBSeq_feature-table (GBFeature*)>

<!ELEMENT GBSeq_feature-set (GBFeatureSet*)>

<!-- Optional for contig, wgs, etc. -->
<!ELEMENT GBSeq_sequence (#PCDATA)>

<!ELEMENT GBSeq_contig (#PCDATA)>
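
<!--
  Illustration only. The notes at the top of this file say that most records
  fill in "sequence", that contig records carry a "join" statement in
  "contig" with no "sequence", and that a record might carry neither. A
  minimal Python sketch of handling the three cases follows. The helper
  name and the sample join string are hypothetical.

```python
import xml.etree.ElementTree as ET


def sequence_source(gbseq):
    """Return ("sequence", text), ("contig", text) or ("none", None)."""
    seq = gbseq.findtext("GBSeq_sequence")
    if seq is not None:
        return "sequence", seq
    contig = gbseq.findtext("GBSeq_contig")
    if contig is not None:
        return "contig", contig
    return "none", None


record = ET.fromstring(
    "<GBSeq>"
    "<GBSeq_length>100</GBSeq_length>"
    "<GBSeq_moltype>DNA</GBSeq_moltype>"
    "<GBSeq_contig>join(U00001.1:1..100)</GBSeq_contig>"
    "</GBSeq>"
)
print(sequence_source(record))
```
-->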

<!ELEMENT GBSeq_alt-seq (GBAltSeqData*)>

<!ELEMENT GBSeq_xrefs (GBXref*)>


<!ELEMENT GBSeqid (#PCDATA)>
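
<!--
  Illustration only. Each GBSeqid holds one identifier in the "vertical bar
  format" mentioned in the notes at the top of this file. A minimal Python
  sketch that splits such a string into its leading type tag and the
  remaining fields follows. It does not interpret the fields, whose meaning
  depends on the identifier type; "split_seqid" and the sample identifiers
  are hypothetical.

```python
def split_seqid(seqid):
    """Split a vertical bar identifier into (type tag, list of fields)."""
    tag, *fields = seqid.split("|")
    return tag, fields


print(split_seqid("gi|12345"))
print(split_seqid("gb|U00001.1|"))
```

  A trailing bar yields an empty final field, which the sketch keeps so
  that no information is lost.
-->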


<!ELEMENT GBSecondary-accn (#PCDATA)>


<!ELEMENT GBKeyword (#PCDATA)>


<!ELEMENT GBReference (
        GBReference_reference, 
        GBReference_position?, 
        GBReference_authors?, 
        GBReference_consortium?, 
        GBReference_title?, 
        GBReference_journal, 
        GBReference_xref?, 
        GBReference_pubmed?, 
        GBReference_remark?)>

<!ELEMENT GBReference_reference (#PCDATA)>

<!ELEMENT GBReference_position (#PCDATA)>

<!ELEMENT GBReference_authors (GBAuthor*)>

<!ELEMENT GBReference_consortium (#PCDATA)>

<!ELEMENT GBReference_title (#PCDATA)>

<!ELEMENT GBReference_journal (#PCDATA)>

<!ELEMENT GBReference_xref (GBXref*)>

<!ELEMENT GBReference_pubmed (%INTEGER;)>

<!ELEMENT GBReference_remark (#PCDATA)>


<!ELEMENT GBAuthor (#PCDATA)>
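
<!--
  Illustration only. As the notes at the top of this file say, a reference
  may have an author list, a consortium, or both. A minimal Python sketch
  that collects both without assuming either is present follows. The helper
  name and the sample reference are hypothetical.

```python
import xml.etree.ElementTree as ET


def reference_credit(ref):
    """Return (list of author strings, consortium or None) for a GBReference."""
    authors = [a.text for a in ref.findall("GBReference_authors/GBAuthor")]
    consortium = ref.findtext("GBReference_consortium")
    return authors, consortium


ref = ET.fromstring(
    "<GBReference>"
    "<GBReference_reference>1</GBReference_reference>"
    "<GBReference_authors><GBAuthor>Doe,J.</GBAuthor></GBReference_authors>"
    "<GBReference_consortium>Example Consortium</GBReference_consortium>"
    "<GBReference_journal>Unpublished</GBReference_journal>"
    "</GBReference>"
)
print(reference_credit(ref))
```
-->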


<!ELEMENT GBXref (
        GBXref_dbname, 
        GBXref_id)>

<!ELEMENT GBXref_dbname (#PCDATA)>

<!ELEMENT GBXref_id (#PCDATA)>


<!ELEMENT GBComment (
        GBComment_type?, 
        GBComment_paragraphs)>

<!ELEMENT GBComment_type (#PCDATA)>

<!ELEMENT GBComment_paragraphs (GBCommentParagraph*)>


<!ELEMENT GBCommentParagraph (#PCDATA)>


<!ELEMENT GBStrucComment (
        GBStrucComment_name?, 
        GBStrucComment_items)>

<!ELEMENT GBStrucComment_name (#PCDATA)>

<!ELEMENT GBStrucComment_items (GBStrucCommentItem*)>


<!ELEMENT GBStrucCommentItem (
        GBStrucCommentItem_tag?, 
        GBStrucCommentItem_value?, 
        GBStrucCommentItem_url?)>

<!ELEMENT GBStrucCommentItem_tag (#PCDATA)>

<!ELEMENT GBStrucCommentItem_value (#PCDATA)>

<!ELEMENT GBStrucCommentItem_url (#PCDATA)>


<!ELEMENT GBFeatureSet (
        GBFeatureSet_annot-source?, 
        GBFeatureSet_features)>

<!ELEMENT GBFeatureSet_annot-source (#PCDATA)>

<!ELEMENT GBFeatureSet_features (GBFeature*)>


<!ELEMENT GBFeature (
        GBFeature_key, 
        GBFeature_location, 
        GBFeature_intervals?, 
        GBFeature_operator?, 
        GBFeature_partial5?, 
        GBFeature_partial3?, 
        GBFeature_quals?, 
        GBFeature_xrefs?)>

<!ELEMENT GBFeature_key (#PCDATA)>

<!ELEMENT GBFeature_location (#PCDATA)>

<!ELEMENT GBFeature_intervals (GBInterval*)>

<!ELEMENT GBFeature_operator (#PCDATA)>

<!ELEMENT GBFeature_partial5 EMPTY>
<!ATTLIST GBFeature_partial5 value ( true | false ) #REQUIRED >


<!ELEMENT GBFeature_partial3 EMPTY>
<!ATTLIST GBFeature_partial3 value ( true | false ) #REQUIRED >
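
<!--
  Illustration only. Boolean fields in this DTD are EMPTY elements whose
  required "value" attribute is "true" or "false", and the element itself
  is optional in its parent. A minimal Python sketch that distinguishes
  true, false and absent follows. "read_flag" and the sample feature are
  hypothetical.

```python
import xml.etree.ElementTree as ET


def read_flag(parent, tag):
    """Return True or False from a flag element, or None if it is absent."""
    node = parent.find(tag)
    if node is None:
        return None
    return node.get("value") == "true"


feature = ET.fromstring(
    "<GBFeature>"
    "<GBFeature_key>CDS</GBFeature_key>"
    "<GBFeature_location>&lt;1..99</GBFeature_location>"
    '<GBFeature_partial5 value="true"/>'
    "</GBFeature>"
)
print(read_flag(feature, "GBFeature_partial5"))
print(read_flag(feature, "GBFeature_partial3"))
```

  Returning None for an absent element keeps "not stated" separate from an
  explicit "false".
-->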


<!ELEMENT GBFeature_quals (GBQualifier*)>

<!ELEMENT GBFeature_xrefs (GBXref*)>


<!ELEMENT GBInterval (
        GBInterval_from?, 
        GBInterval_to?, 
        GBInterval_point?, 
        GBInterval_iscomp?, 
        GBInterval_interbp?, 
        GBInterval_accession)>

<!ELEMENT GBInterval_from (%INTEGER;)>

<!ELEMENT GBInterval_to (%INTEGER;)>

<!ELEMENT GBInterval_point (%INTEGER;)>

<!ELEMENT GBInterval_iscomp EMPTY>
<!ATTLIST GBInterval_iscomp value ( true | false ) #REQUIRED >


<!ELEMENT GBInterval_interbp EMPTY>
<!ATTLIST GBInterval_interbp value ( true | false ) #REQUIRED >


<!ELEMENT GBInterval_accession (#PCDATA)>
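
<!--
  Illustration only. In GBInterval the children from, to and point are all
  optional, while accession is required. A minimal Python sketch follows.
  It assumes that an interval carries either a point or a from/to pair;
  the DTD itself does not enforce this, so the sketch raises an error when
  neither is found. "interval_bounds" is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def interval_bounds(interval):
    """Return (start, end, accession); a point interval has start == end."""
    accession = interval.findtext("GBInterval_accession")  # required child
    point = interval.findtext("GBInterval_point")
    if point is not None:
        return int(point), int(point), accession
    start = interval.findtext("GBInterval_from")
    end = interval.findtext("GBInterval_to")
    if start is None or end is None:
        raise ValueError("interval has neither a point nor a from/to pair")
    return int(start), int(end), accession


span = ET.fromstring(
    "<GBInterval>"
    "<GBInterval_from>1</GBInterval_from>"
    "<GBInterval_to>99</GBInterval_to>"
    "<GBInterval_accession>U00001.1</GBInterval_accession>"
    "</GBInterval>"
)
print(interval_bounds(span))
```
-->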


<!ELEMENT GBQualifier (
        GBQualifier_name, 
        GBQualifier_value?)>

<!ELEMENT GBQualifier_name (#PCDATA)>

<!ELEMENT GBQualifier_value (#PCDATA)>
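
<!--
  Illustration only. GBQualifier_name is required and GBQualifier_value is
  optional, so a qualifier may appear with a name alone. A minimal Python
  sketch follows. It returns a list of pairs, not a dict, on the assumption
  that a qualifier name may repeat within one feature; "qualifiers" and the
  sample feature are hypothetical.

```python
import xml.etree.ElementTree as ET


def qualifiers(feature):
    """Return [(name, value or None), ...] for a GBFeature, in document order."""
    pairs = []
    for qual in feature.findall("GBFeature_quals/GBQualifier"):
        name = qual.findtext("GBQualifier_name")
        value = qual.findtext("GBQualifier_value")  # None when omitted
        pairs.append((name, value))
    return pairs


gene = ET.fromstring(
    "<GBFeature>"
    "<GBFeature_key>gene</GBFeature_key>"
    "<GBFeature_location>1..99</GBFeature_location>"
    "<GBFeature_quals>"
    "<GBQualifier><GBQualifier_name>gene</GBQualifier_name>"
    "<GBQualifier_value>abc1</GBQualifier_value></GBQualifier>"
    "<GBQualifier><GBQualifier_name>pseudo</GBQualifier_name></GBQualifier>"
    "</GBFeature_quals>"
    "</GBFeature>"
)
print(qualifiers(gene))
```
-->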


<!ELEMENT GBAltSeqData (
        GBAltSeqData_name, 
        GBAltSeqData_items?)>

<!-- e.g., contig, wgs, scaffold, cage, genome -->
<!ELEMENT GBAltSeqData_name (#PCDATA)>

<!ELEMENT GBAltSeqData_items (GBAltSeqItem*)>


<!ELEMENT GBAltSeqItem (
        GBAltSeqItem_interval?, 
        GBAltSeqItem_isgap?, 
        GBAltSeqItem_gap-length?, 
        GBAltSeqItem_gap-type?, 
        GBAltSeqItem_gap-linkage?, 
        GBAltSeqItem_gap-comment?, 
        GBAltSeqItem_first-accn?, 
        GBAltSeqItem_last-accn?, 
        GBAltSeqItem_value?)>

<!ELEMENT GBAltSeqItem_interval (GBInterval)>

<!ELEMENT GBAltSeqItem_isgap EMPTY>
<!ATTLIST GBAltSeqItem_isgap value ( true | false ) #REQUIRED >


<!ELEMENT GBAltSeqItem_gap-length (%INTEGER;)>

<!ELEMENT GBAltSeqItem_gap-type (#PCDATA)>

<!ELEMENT GBAltSeqItem_gap-linkage (#PCDATA)>

<!ELEMENT GBAltSeqItem_gap-comment (#PCDATA)>

<!ELEMENT GBAltSeqItem_first-accn (#PCDATA)>

<!ELEMENT GBAltSeqItem_last-accn (#PCDATA)>

<!ELEMENT GBAltSeqItem_value (#PCDATA)>