<!-- ============================================
::DATATOOL:: Generated from "insdseq.asn"
::DATATOOL:: by application DATATOOL version 2.0.0
::DATATOOL:: on 08/02/2010 23:05:14
============================================ -->
<!-- ============================================ -->
<!-- This section is mapped from module "INSD-INSDSeq"
================================================= -->
<!--
$Revision: 192674 $
************************************************************************
ASN.1 and XML for the components of a GenBank/EMBL/DDBJ sequence record
The International Nucleotide Sequence Database (INSD) collaboration
Version 1.6, 25 May 2010
************************************************************************
-->
<!--
INSDSeq provides the elements of a sequence as presented in the
GenBank/EMBL/DDBJ-style flatfile formats, with a small amount of
additional structure.
Although this single perspective of the three flatfile formats
provides a useful simplification, it hides to some extent the
details of the actual data underlying those formats. Nevertheless,
the XML version of INSD-Seq is being provided in
the hope that it will prove useful to those who bulk-process
sequence data at the flatfile-format level of detail. Further
documentation regarding the content and conventions of those formats
can be found at:
URLs for the DDBJ, EMBL, and GenBank Feature Table Document:
http://www.ddbj.nig.ac.jp/FT/full_index.html
http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html
URLs for the DDBJ, EMBL, and GenBank Release Notes:
ftp://ftp.ddbj.nig.ac.jp/database/ddbj/ddbjrel.txt
http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
Because INSDSeq is a compromise, a number of pragmatic decisions have
been made:
In pursuit of simplicity and familiarity a number of fields do not
have full substructure defined here where there is already a
standard flatfile format string. For example:
Dates: DD-MON-YYYY (e.g. 10-JUN-2003)
Author: LastName, Initials (e.g. Smith, J.N.)
or LastName Initials (e.g. Smith J.N.)
Journal: JournalName Volume (issue), page-range (year)
or JournalName Volume(issue):page-range(year)
e.g. Appl. Environ. Microbiol. 61 (4), 1646-1648 (1995)
Appl. Environ. Microbiol. 61(4):1646-1648(1995).
FeatureLocations are represented as in the flatfile feature table,
but FeatureIntervals may also be provided as a convenience.
FeatureQualifiers are represented as in the flatfile feature table.
Primary has a string that represents a table to construct
a third party (TPA) sequence.
other-seqids can have strings with the "vertical bar format" sequence
identifiers used in BLAST, for example, when they are non-INSD types.
Currently the flatfile format shows only accession numbers, but other
identifiers, such as patents and submitter clone names, will
appear here.
There are also a number of elements that could have been more exactly
specified, but in the interest of simplicity have been simply left as
optional. For example:
All publicly accessible sequence records in INSDSeq format will
include accession and accession.version. However, these elements are
optional in INSDSeq so that this format can also be used
for non-public sequence data, prior to the assignment of accessions and
version numbers. In such cases, records will have only "other-seqids".
Sequences will normally all have "sequence" filled in, but contig records
will have a "join" statement in the "contig" slot, and no "sequence".
We may also consider a retrieval option with no sequence of any kind
and no feature table to quickly check minimal values.
Four (optional) elements are specific to records represented via the EMBL
sequence database: INSDSeq_update-release, INSDSeq_create-release,
INSDSeq_entry-version, and INSDSeq_database-reference.
One (optional) element is specific to records originating at the GenBank
and DDBJ sequence databases: INSDSeq_segment.
********
-->
<!ELEMENT INSDSet (INSDSeq*)>
<!ELEMENT INSDSeq (
INSDSeq_locus,
INSDSeq_length,
INSDSeq_strandedness?,
INSDSeq_moltype,
INSDSeq_topology?,
INSDSeq_division,
INSDSeq_update-date,
INSDSeq_create-date?,
INSDSeq_update-release?,
INSDSeq_create-release?,
INSDSeq_definition,
INSDSeq_primary-accession?,
INSDSeq_entry-version?,
INSDSeq_accession-version?,
INSDSeq_other-seqids?,
INSDSeq_secondary-accessions?,
INSDSeq_project?,
INSDSeq_keywords?,
INSDSeq_segment?,
INSDSeq_source?,
INSDSeq_organism?,
INSDSeq_taxonomy?,
INSDSeq_references?,
INSDSeq_comment?,
INSDSeq_comment-set?,
INSDSeq_struc-comments?,
INSDSeq_primary?,
INSDSeq_source-db?,
INSDSeq_database-reference?,
INSDSeq_feature-table?,
INSDSeq_feature-set?,
INSDSeq_sequence?,
INSDSeq_contig?,
INSDSeq_alt-seq?)>
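<!--
As an illustration of the content model above, the following Python
sketch lists which required children (those declared without a "?")
are absent from one INSDSeq element. The names REQUIRED and
missing_required are hypothetical and not part of this DTD or of any
INSD tool. The check covers presence only; it does not verify element
order and is not a substitute for validating against the DTD.

```python
import xml.etree.ElementTree as ET

# Children of INSDSeq declared without "?" in the content model.
REQUIRED = [
    "INSDSeq_locus",
    "INSDSeq_length",
    "INSDSeq_moltype",
    "INSDSeq_division",
    "INSDSeq_update-date",
    "INSDSeq_definition",
]


def missing_required(record):
    """List required child elements absent from one INSDSeq element."""
    present = {child.tag for child in record}
    return [name for name in REQUIRED if name not in present]


# Usage with a deliberately incomplete, made-up record.
record = ET.fromstring(
    "<INSDSeq>"
    "<INSDSeq_locus>EXAMPLE</INSDSeq_locus>"
    "<INSDSeq_length>10</INSDSeq_length>"
    "</INSDSeq>"
)
print(missing_required(record))
```
-->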
<!ELEMENT INSDSeq_locus (#PCDATA)>
<!ELEMENT INSDSeq_length (%INTEGER;)>
<!ELEMENT INSDSeq_strandedness (#PCDATA)>
<!ELEMENT INSDSeq_moltype (#PCDATA)>
<!ELEMENT INSDSeq_topology (#PCDATA)>
<!ELEMENT INSDSeq_division (#PCDATA)>
<!ELEMENT INSDSeq_update-date (#PCDATA)>
<!ELEMENT INSDSeq_create-date (#PCDATA)>
<!ELEMENT INSDSeq_update-release (#PCDATA)>
<!ELEMENT INSDSeq_create-release (#PCDATA)>
<!ELEMENT INSDSeq_definition (#PCDATA)>
<!ELEMENT INSDSeq_primary-accession (#PCDATA)>
<!ELEMENT INSDSeq_entry-version (#PCDATA)>
<!ELEMENT INSDSeq_accession-version (#PCDATA)>
<!ELEMENT INSDSeq_other-seqids (INSDSeqid*)>
<!ELEMENT INSDSeq_secondary-accessions (INSDSecondary-accn*)>
<!ELEMENT INSDSeq_project (#PCDATA)>
<!ELEMENT INSDSeq_keywords (INSDKeyword*)>
<!ELEMENT INSDSeq_segment (#PCDATA)>
<!ELEMENT INSDSeq_source (#PCDATA)>
<!ELEMENT INSDSeq_organism (#PCDATA)>
<!ELEMENT INSDSeq_taxonomy (#PCDATA)>
<!ELEMENT INSDSeq_references (INSDReference*)>
<!ELEMENT INSDSeq_comment (#PCDATA)>
<!ELEMENT INSDSeq_comment-set (INSDComment*)>
<!ELEMENT INSDSeq_struc-comments (INSDStrucComment*)>
<!ELEMENT INSDSeq_primary (#PCDATA)>
<!ELEMENT INSDSeq_source-db (#PCDATA)>
<!ELEMENT INSDSeq_database-reference (#PCDATA)>
<!ELEMENT INSDSeq_feature-table (INSDFeature*)>
<!ELEMENT INSDSeq_feature-set (INSDFeatureSet*)>
<!-- Optional for contig, wgs, etc. -->
<!ELEMENT INSDSeq_sequence (#PCDATA)>
<!ELEMENT INSDSeq_contig (#PCDATA)>
<!ELEMENT INSDSeq_alt-seq (INSDAltSeqData*)>
<!ELEMENT INSDSeqid (#PCDATA)>
<!ELEMENT INSDSecondary-accn (#PCDATA)>
<!ELEMENT INSDKeyword (#PCDATA)>
<!--
INSDReference_position contains a string value indicating the
basepair span(s) to which a reference applies. The allowable
formats are:
X..Y : Where X and Y are integers separated by two periods,
X >= 1 , Y <= sequence length, and X <= Y
Multiple basepair spans can exist, separated by a
semicolon and a space. For example: 10..20; 100..500
sites : The string literal 'sites', indicating that a reference
provides sequence annotation information, but the specific
basepair spans are either not captured, or were too numerous
to record.
The 'sites' literal string is singly occurring, and
cannot be used in conjunction with any X..Y basepair spans.
References that lack an INSDReference_position element apply
to the entire sequence.
-->
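<!--
The INSDReference_position rules described above can be sketched in
Python as follows. This is a minimal illustration, assuming the
sequence length is known to the caller; the function name
parse_reference_position is hypothetical and not part of this DTD or
of any INSD tool.

```python
import re

# One X..Y span: two integers separated by exactly two periods.
_SPAN = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_reference_position(text, seq_length):
    """Return 'sites', or a list of (from, to) tuples for X..Y spans."""
    if text == "sites":
        return "sites"
    spans = []
    # Multiple spans are separated by a semicolon and a space.
    for part in text.split("; "):
        match = _SPAN.match(part)
        if match is None:
            raise ValueError("not an X..Y span: %r" % part)
        start, end = int(match.group(1)), int(match.group(2))
        # Enforce X >= 1, Y <= sequence length, and X <= Y.
        if not (1 <= start <= end <= seq_length):
            raise ValueError("span out of range: %r" % part)
        spans.append((start, end))
    return spans


print(parse_reference_position("10..20; 100..500", 500))
```

Because 'sites' is only accepted as the whole string, a mixed value
such as "sites; 10..20" is rejected.
-->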
<!ELEMENT INSDReference (
INSDReference_reference,
INSDReference_position?,
INSDReference_authors?,
INSDReference_consortium?,
INSDReference_title?,
INSDReference_journal,
INSDReference_xref?,
INSDReference_pubmed?,
INSDReference_remark?)>
<!ELEMENT INSDReference_reference (#PCDATA)>
<!ELEMENT INSDReference_position (#PCDATA)>
<!ELEMENT INSDReference_authors (INSDAuthor*)>
<!ELEMENT INSDReference_consortium (#PCDATA)>
<!ELEMENT INSDReference_title (#PCDATA)>
<!ELEMENT INSDReference_journal (#PCDATA)>
<!ELEMENT INSDReference_xref (INSDXref*)>
<!ELEMENT INSDReference_pubmed (%INTEGER;)>
<!ELEMENT INSDReference_remark (#PCDATA)>
<!ELEMENT INSDAuthor (#PCDATA)>
<!--
INSDXref provides a method for referring to records in
other databases. INSDXref_dbname is a string value that
provides the name of the database, and INSDXref_id
is a string value that provides the record's identifier
in that database.
-->
<!ELEMENT INSDXref (
INSDXref_dbname,
INSDXref_id)>
<!ELEMENT INSDXref_dbname (#PCDATA)>
<!ELEMENT INSDXref_id (#PCDATA)>
<!ELEMENT INSDComment (
INSDComment_type?,
INSDComment_paragraphs)>
<!ELEMENT INSDComment_type (#PCDATA)>
<!ELEMENT INSDComment_paragraphs (INSDCommentParagraph*)>
<!ELEMENT INSDCommentParagraph (
INSDCommentParagraph_items)>
<!ELEMENT INSDCommentParagraph_items (INSDCommentItem*)>
<!ELEMENT INSDCommentItem (
INSDCommentItem_value?,
INSDCommentItem_url?)>
<!ELEMENT INSDCommentItem_value (#PCDATA)>
<!ELEMENT INSDCommentItem_url (#PCDATA)>
<!ELEMENT INSDStrucComment (
INSDStrucComment_name?,
INSDStrucComment_items)>
<!ELEMENT INSDStrucComment_name (#PCDATA)>
<!ELEMENT INSDStrucComment_items (INSDStrucCommentItem*)>
<!ELEMENT INSDStrucCommentItem (
INSDStrucCommentItem_tag?,
INSDStrucCommentItem_value?,
INSDStrucCommentItem_url?)>
<!ELEMENT INSDStrucCommentItem_tag (#PCDATA)>
<!ELEMENT INSDStrucCommentItem_value (#PCDATA)>
<!ELEMENT INSDStrucCommentItem_url (#PCDATA)>
<!--
INSDFeature_operator contains a string value describing
the relationship among a set of INSDInterval within
INSDFeature_intervals. The allowable formats are:
join : The string literal 'join' indicates that the
INSDInterval intervals are biologically joined
together into a contiguous molecule.
order : The string literal 'order' indicates that the
INSDInterval intervals are in the presented
order, but they are not necessarily contiguous.
Either 'join' or 'order' is required if INSDFeature_intervals
consists of more than one INSDInterval.
-->
<!ELEMENT INSDFeatureSet (
INSDFeatureSet_annot-source?,
INSDFeatureSet_features)>
<!ELEMENT INSDFeatureSet_annot-source (#PCDATA)>
<!ELEMENT INSDFeatureSet_features (INSDFeature*)>
<!ELEMENT INSDFeature (
INSDFeature_key,
INSDFeature_location,
INSDFeature_intervals?,
INSDFeature_operator?,
INSDFeature_partial5?,
INSDFeature_partial3?,
INSDFeature_quals?,
INSDFeature_xrefs?)>
<!ELEMENT INSDFeature_key (#PCDATA)>
<!ELEMENT INSDFeature_location (#PCDATA)>
<!ELEMENT INSDFeature_intervals (INSDInterval*)>
<!ELEMENT INSDFeature_operator (#PCDATA)>
<!ELEMENT INSDFeature_partial5 EMPTY>
<!ATTLIST INSDFeature_partial5 value ( true | false ) #REQUIRED >
<!ELEMENT INSDFeature_partial3 EMPTY>
<!ATTLIST INSDFeature_partial3 value ( true | false ) #REQUIRED >
<!ELEMENT INSDFeature_quals (INSDQualifier*)>
<!ELEMENT INSDFeature_xrefs (INSDXref*)>
<!--
INSDInterval_iscomp is a boolean indicating whether
an INSDInterval_from / INSDInterval_to location
represents a location on the complement strand.
When INSDInterval_iscomp is TRUE, it essentially
confirms that a 'from' value which is greater than
a 'to' value is intentional, because the location
is on the opposite strand of the presented sequence.
INSDInterval_interbp is a boolean indicating whether
a feature (such as a restriction site) is located
between two adjacent basepairs. When INSDInterval_interbp
is TRUE, the 'from' and 'to' values must differ by
exactly one base.
-->
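<!--
A minimal Python sketch of the two interval flags, reading the
"differ by exactly one base" rule as applying to INSDInterval_interbp.
The helper names are hypothetical, the accession in the usage example
is made up, and reporting a 'from' greater than 'to' without iscomp
as a problem is an assumption of this sketch, not a rule this DTD
enforces.

```python
import xml.etree.ElementTree as ET


def _flag(interval, name):
    """Read a DTD boolean: an EMPTY element with value="true" or "false"."""
    node = interval.find(name)
    return node is not None and node.get("value") == "true"


def check_interval(interval):
    """Return a list of problems found in one INSDInterval element."""
    problems = []
    start = interval.findtext("INSDInterval_from")
    end = interval.findtext("INSDInterval_to")
    if start is None or end is None:
        return problems  # no from/to pair to compare
    start, end = int(start), int(end)
    if start > end and not _flag(interval, "INSDInterval_iscomp"):
        problems.append("from > to without iscomp")
    if _flag(interval, "INSDInterval_interbp") and abs(start - end) != 1:
        problems.append("interbp span is not two adjacent bases")
    return problems


# Usage: a complement-strand interval with a made-up accession.
interval = ET.fromstring(
    "<INSDInterval>"
    "<INSDInterval_from>20</INSDInterval_from>"
    "<INSDInterval_to>10</INSDInterval_to>"
    '<INSDInterval_iscomp value="true"/>'
    "<INSDInterval_accession>EXAMPLE.1</INSDInterval_accession>"
    "</INSDInterval>"
)
print(check_interval(interval))
```
-->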
<!ELEMENT INSDInterval (
INSDInterval_from?,
INSDInterval_to?,
INSDInterval_point?,
INSDInterval_iscomp?,
INSDInterval_interbp?,
INSDInterval_accession)>
<!ELEMENT INSDInterval_from (%INTEGER;)>
<!ELEMENT INSDInterval_to (%INTEGER;)>
<!ELEMENT INSDInterval_point (%INTEGER;)>
<!ELEMENT INSDInterval_iscomp EMPTY>
<!ATTLIST INSDInterval_iscomp value ( true | false ) #REQUIRED >
<!ELEMENT INSDInterval_interbp EMPTY>
<!ATTLIST INSDInterval_interbp value ( true | false ) #REQUIRED >
<!ELEMENT INSDInterval_accession (#PCDATA)>
<!ELEMENT INSDQualifier (
INSDQualifier_name,
INSDQualifier_value?)>
<!ELEMENT INSDQualifier_name (#PCDATA)>
<!ELEMENT INSDQualifier_value (#PCDATA)>
<!ELEMENT INSDAltSeqData (
INSDAltSeqData_name,
INSDAltSeqData_items?)>
<!--
e.g., CON-division-join, WGS-contig-range,
WGS-scaffold-range, MGA/CAGE-range, genome
-->
<!ELEMENT INSDAltSeqData_name (#PCDATA)>
<!ELEMENT INSDAltSeqData_items (INSDAltSeqItem*)>
<!ELEMENT INSDAltSeqItem (
INSDAltSeqItem_interval?,
INSDAltSeqItem_isgap?,
INSDAltSeqItem_gap-length?,
INSDAltSeqItem_gap-type?,
INSDAltSeqItem_gap-linkage?,
INSDAltSeqItem_gap-comment?,
INSDAltSeqItem_first-accn?,
INSDAltSeqItem_last-accn?,
INSDAltSeqItem_value?)>
<!--
INSDInterval_iscomp is a boolean indicating whether
an INSDInterval_from / INSDInterval_to location
represents a location on the complement strand.
When INSDInterval_iscomp is TRUE, it essentially
confirms that a 'from' value which is greater than
a 'to' value is intentional, because the location
is on the opposite strand of the presented sequence.
INSDInterval_interbp is a boolean indicating whether
a feature (such as a restriction site) is located
between two adjacent basepairs. When INSDInterval_interbp
is TRUE, the 'from' and 'to' values must differ by
exactly one base.
-->
<!ELEMENT INSDAltSeqItem_interval (INSDInterval)>
<!ELEMENT INSDAltSeqItem_isgap EMPTY>
<!ATTLIST INSDAltSeqItem_isgap value ( true | false ) #REQUIRED >
<!ELEMENT INSDAltSeqItem_gap-length (%INTEGER;)>
<!ELEMENT INSDAltSeqItem_gap-type (#PCDATA)>
<!ELEMENT INSDAltSeqItem_gap-linkage (#PCDATA)>
<!ELEMENT INSDAltSeqItem_gap-comment (#PCDATA)>
<!ELEMENT INSDAltSeqItem_first-accn (#PCDATA)>
<!ELEMENT INSDAltSeqItem_last-accn (#PCDATA)>
<!ELEMENT INSDAltSeqItem_value (#PCDATA)>