The OBITools formatted taxonomy
===============================

Management of the taxonomy
--------------------------

Filtering and annotation steps in the processing of DNA metabarcoding sequence data are greatly 
eased by explicitly associating taxonomic information with sequences and by easy 
access to the taxonomy. Taxonomic information, including a taxonomic identifier, can thus be 
stored in the set of attributes of each sequence record. Specifically, the `taxid` attribute 
is the one the OBITools use when querying the taxonomic information of a sequence record, although 
several OBITools commands can also annotate sequence records with taxonomy-related attributes for 
the user's convenience. The value of the `taxid` attribute must be a unique integer referring 
unambiguously to one taxon in the associated taxonomic database (note that a taxon can be any node 
in the taxonomic tree). Although this is not mandatory, the NCBI taxonomy is a preferred source of 
taxonomic information as the OBITools provide commands to easily extract the full taxonomic 
information from it. The `obitaxonomy` command builds a taxonomic database in the 
OBITools format from a dump of the NCBI taxonomic database (downloadable from 
ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz). The `obitaxonomy` command can also 
enrich an existing taxonomy with private taxa, making it possible to associate sequence records with 
taxa not initially present in the reference taxonomic database. Because the OBITools have access to the 
full topology of the taxonomic tree, they can retrieve the higher taxonomic levels of any taxon from its 
identifier (e.g. the family, order, class and phylum corresponding to a genus), which makes annotating 
and querying taxonomic information simple and efficient.
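
The lookup of higher taxonomic levels from a `taxid` can be illustrated with a minimal Python 
sketch. It is not the OBITools implementation and does not use the OBITools API. It only assumes 
the `nodes.dmp` layout of the NCBI dump (taxid, parent taxid and rank separated by `|`, with the 
root being its own parent) and a sequence header carrying `key=value;` attributes. The taxids, the 
header and the helper names `load_nodes`, `taxid_of` and `lineage` are hypothetical and chosen for 
illustration only.

```python
import re

# Toy excerpt in the layout of NCBI's nodes.dmp: taxid | parent taxid | rank | ...
# The identifiers are made up for illustration; they are not real NCBI taxids.
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tclass\t|\n"
    "20\t|\t10\t|\torder\t|\n"
    "30\t|\t20\t|\tfamily\t|\n"
    "40\t|\t30\t|\tgenus\t|\n"
    "50\t|\t40\t|\tspecies\t|\n"
)


def load_nodes(text):
    """Map each taxid to its (parent taxid, rank) pair."""
    nodes = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        taxid, parent, rank = int(fields[0]), int(fields[1]), fields[2]
        nodes[taxid] = (parent, rank)
    return nodes


def taxid_of(header):
    """Extract the taxid attribute from a 'key=value;' style header, or None."""
    match = re.search(r"\btaxid=(\d+);", header)
    return int(match.group(1)) if match else None


def lineage(nodes, taxid):
    """Return {rank: taxid} for a taxon and all its ancestors up to the root."""
    ranks = {}
    while True:
        parent, rank = nodes[taxid]
        ranks.setdefault(rank, taxid)
        if parent == taxid:  # the root is its own parent
            return ranks
        taxid = parent


nodes = load_nodes(NODES_DMP)
# Hypothetical sequence record header annotated with a genus-level taxid.
header = ">seq_0001 taxid=40; count=12; hypothetical sequence record"
higher = lineage(nodes, taxid_of(header))
print(higher["family"], higher["order"], higher["class"])  # prints "30 20 10"
```

Because every node stores only its parent, a single walk towards the root is enough to recover 
every higher rank of a taxon, so a sequence record needs to carry nothing more than its `taxid`.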