% File: assignTaxonomy.Rd
% Debian package: r-bioc-dada2 1.34.0+dfsg-2
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/taxonomy.R
\name{assignTaxonomy}
\alias{assignTaxonomy}
\title{Classifies sequences against a reference training dataset.}
\usage{
assignTaxonomy(
  seqs,
  refFasta,
  minBoot = 50,
  tryRC = FALSE,
  outputBootstraps = FALSE,
  taxLevels = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
  multithread = FALSE,
  verbose = FALSE
)
}
\arguments{
\item{seqs}{(Required). A character vector of the sequences to be assigned, or an object 
coercible by \code{\link{getUniques}}.}

\item{refFasta}{(Required). The path to the reference fasta file, or an
R connection. The file can be compressed.
This reference fasta file should be formatted so that the id lines correspond to the
taxonomy (or classification) of the associated sequence, and each taxonomic level is
separated by a semicolon. E.g.

 >Kingdom;Phylum;Class;Order;Family;Genus;
 ACGAATGTGAAGTAA......}

\item{minBoot}{(Optional). Default 50. 
The minimum bootstrap confidence for assigning a taxonomic level.}

\item{tryRC}{(Optional). Default FALSE. 
If TRUE, the reverse-complement of each sequence will be used for classification if it is a better match to the reference
sequences than the forward sequence.}

\item{outputBootstraps}{(Optional). Default FALSE.
If TRUE, bootstrap values will be retained in an integer matrix, and a named list containing the assigned taxonomies (named "taxa")
and the bootstrap values (named "boot") will be returned. Minimum bootstrap confidence filtering still takes place;
to see the full taxonomy, set minBoot=0.}

\item{taxLevels}{(Optional). Default is c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species").
The taxonomic levels being assigned. Truncated if deeper levels are not present in
the training fasta.}

\item{multithread}{(Optional). Default is FALSE.
If TRUE, multithreading is enabled and the number of available threads is automatically determined.   
If an integer is provided, the number of threads to use is set by passing the argument on to
\code{\link{setThreadOptions}}.}

\item{verbose}{(Optional). Default FALSE.
If TRUE, print status to standard output.}
}
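\section{Sketch: reading reference id lines}{
The \code{refFasta} argument expects each id line to carry the taxonomy of its sequence as semicolon-separated levels. The base R sketch below shows one way such a line could be split into named levels. The helper \code{parseTaxonomyHeader} is hypothetical and not part of dada2, and truncating each header to the levels it actually contains is a per-line simplification of the \code{taxLevels} behavior described above, not necessarily how dada2 handles it.

\preformatted{
## Hypothetical helper, not part of dada2: split one reference FASTA
## id line into named taxonomic levels.
parseTaxonomyHeader <- function(header,
                                taxLevels = c("Kingdom", "Phylum", "Class", "Order",
                                              "Family", "Genus", "Species")) {
  id <- trimws(sub("^>", "", header))         # drop the leading ">" and padding
  ranks <- strsplit(id, ";", fixed = TRUE)[[1]] # a trailing ";" adds no empty field
  n <- min(length(ranks), length(taxLevels))  # keep only the depth available
  setNames(ranks[seq_len(n)], taxLevels[seq_len(n)])
}

parseTaxonomyHeader(">Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;")
}

With the default levels, the call above yields a named vector running from Kingdom to Genus, with no Species entry because the header stops at the genus.
}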
\value{
A character matrix of assigned taxonomies exceeding the minBoot level of
  bootstrap confidence. Rows correspond to the provided sequences, columns to the
  taxonomic levels. NA indicates that the sequence was not consistently classified at
  that level at the minBoot threshold.
  
  If outputBootstraps is TRUE, a named list containing the assigned taxonomies (named "taxa") 
  and the bootstrap values (named "boot") will be returned.
}
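\section{Sketch: applying the minBoot threshold}{
The sketch below illustrates how a bootstrap matrix, like the one returned with \code{outputBootstraps=TRUE}, relates to the NA entries of the returned taxonomy matrix. The helper \code{applyMinBoot} and the toy values are hypothetical. It assumes bootstrap values are integers out of 100 and that a level is kept when its value is at least \code{minBoot}. If support is counted only for replicates that also match every shallower level, it cannot increase with depth, so masking in this way leaves NA only at the deepest levels.

\preformatted{
## Hypothetical helper, not part of dada2: replace levels whose bootstrap
## support falls below minBoot with NA.
applyMinBoot <- function(taxa, boot, minBoot = 50) {
  stopifnot(identical(dim(taxa), dim(boot)))
  out <- taxa
  out[boot < minBoot] <- NA   # assumption: a level is kept when boot >= minBoot
  out
}

## Toy values for one sequence (illustrative only)
taxa <- matrix(c("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"), nrow = 1,
               dimnames = list("seq1", c("Kingdom", "Phylum", "Class", "Order")))
boot <- matrix(c(100L, 97L, 62L, 41L), nrow = 1, dimnames = dimnames(taxa))
applyMinBoot(taxa, boot, minBoot = 50)  # Order becomes NA
applyMinBoot(taxa, boot, minBoot = 80)  # Class and Order become NA
}

Setting \code{minBoot = 0} in this sketch leaves every level in place, which mirrors the advice for \code{outputBootstraps} above.
}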
\description{
assignTaxonomy implements the RDP Naive Bayesian Classifier algorithm described in
Wang et al. Applied and Environmental Microbiology 2007, with a k-mer size of 8 and 100 bootstrap
replicates. Properly formatted reference files for several popular taxonomic databases
are available at \url{http://benjjneb.github.io/dada2/training.html}.
}
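\section{Sketch of the classification algorithm}{
The description above names the RDP naive Bayesian classifier; the base R sketch below walks through its main steps under simplifying assumptions. It is not the dada2 implementation, and the function names (\code{kmerWords}, \code{trainRDP}, \code{bestGenus}, \code{classifyRDP}) are hypothetical. Following Wang et al. (2007), each reference sequence is reduced to its set of 8-mers; a word found in n(w) of N references gets the prior (n(w) + 0.5)/(N + 1), and the probability of that word given a class with M references, m(w) of which contain it, is (m(w) + prior)/(M + 1). A query is assigned to the class with the largest summed log-probability, and each of 100 bootstrap replicates reclassifies a random one-eighth of the query's words, drawn with replacement. Here each distinct semicolon-separated lineage string is treated as one class, and words never seen in training are ignored.

\preformatted{
## Hypothetical sketch of an RDP-style naive Bayesian classifier
## (Wang et al. 2007); NOT the dada2 implementation.

# All distinct k-mers ("words") of a DNA sequence.
kmerWords <- function(seq, k = 8) {
  n <- nchar(seq) - k + 1
  if (n < 1) return(character(0))
  unique(substring(seq, seq_len(n), seq_len(n) + k - 1))
}

# Training: log P(w | class) for every word (rows) and class (columns).
trainRDP <- function(refSeqs, refTax, k = 8) {
  words <- lapply(refSeqs, kmerWords, k = k)
  vocab <- unique(unlist(words))
  # prior P_w = (n(w) + 0.5) / (N + 1), n(w) = references containing w
  nw <- as.numeric(table(factor(unlist(words), levels = vocab)))
  prior <- (nw + 0.5) / (length(refSeqs) + 1)
  genera <- unique(refTax)
  # P(w | G) = (m(w, G) + P_w) / (M + 1), M = references labelled G
  logp <- do.call(cbind, lapply(genera, function(g) {
    idx <- which(refTax == g)
    m <- as.numeric(table(factor(unlist(words[idx]), levels = vocab)))
    log((m + prior) / (length(idx) + 1))
  }))
  dimnames(logp) <- list(vocab, genera)
  list(logp = logp, genera = genera, k = k)
}

# Class with the highest summed log-probability over the given words.
bestGenus <- function(model, words) {
  rows <- match(words, rownames(model$logp))
  rows <- rows[!is.na(rows)]          # words unseen in training are ignored
  scores <- colSums(model$logp[rows, , drop = FALSE])
  model$genera[which.max(scores)]
}

# Full assignment plus per-level bootstrap support (percent).
classifyRDP <- function(model, seq, nBoot = 100) {
  words <- kmerWords(seq, model$k)
  stopifnot(length(words) > 0)
  best <- strsplit(bestGenus(model, words), ";", fixed = TRUE)[[1]]
  agree <- numeric(length(best))
  nSub <- max(1, floor(length(words) / 8))   # one-eighth of the words
  for (b in seq_len(nBoot)) {
    pick <- words[sample.int(length(words), nSub, replace = TRUE)]
    hit <- strsplit(bestGenus(model, pick), ";", fixed = TRUE)[[1]][seq_along(best)]
    # a replicate supports level j only if it matches levels 1..j
    agree <- agree + cumprod(!is.na(hit) & hit == best)
  }
  list(taxa = best, boot = round(100 * agree / nBoot))
}
}

For example, \code{classifyRDP(trainRDP(refSeqs, refTax), query)} returns the assigned lineage and a bootstrap percentage per level, where \code{refSeqs}, \code{refTax} and \code{query} stand for user-supplied character vectors. Counting support only when all shallower levels also match is what keeps the bootstrap values non-increasing with depth in this sketch.
}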
\examples{
seqs <- getSequences(system.file("extdata", "example_seqs.fa", package="dada2"))
training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- assignTaxonomy(seqs, training_fasta)
taxa80 <- assignTaxonomy(seqs, training_fasta, minBoot=80, multithread=2)

}