File: learnErrors.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/errorModels.R
\name{learnErrors}
\alias{learnErrors}
\title{Learns the error rates from an input list, or vector, of file names or a list of \code{\link{derep-class}} objects.}
\usage{
learnErrors(
  fls,
  nbases = 1e+08,
  nreads = NULL,
  errorEstimationFunction = loessErrfun,
  multithread = FALSE,
  randomize = FALSE,
  MAX_CONSIST = 10,
  OMEGA_C = 0,
  qualityType = "Auto",
  verbose = FALSE,
  ...
)
}
\arguments{
\item{fls}{(Required). \code{character}.
The file path(s) to the fastq file(s), or a directory containing fastq file(s).
Compressed file formats such as .fastq.gz and .fastq.bz2 are supported.
A list of \code{\link{derep-class}} objects can also be provided.}

\item{nbases}{(Optional). Default 1e8.
The minimum number of total bases to use for error rate learning. Samples are read into memory
until at least this number of total bases has been reached, or all provided samples have been
read in.}

\item{nreads}{(Optional). Default NULL. DEPRECATED.
Please update your code to use the nbases parameter.}

\item{errorEstimationFunction}{(Optional). Function. Default \code{\link{loessErrfun}}.

 \code{errorEstimationFunction} is applied to the matrix of observed transitions
 after each sample inference step to generate the new matrix of estimated error rates.}

\item{multithread}{(Optional). Default is FALSE.
If TRUE, multithreading is enabled and the number of available threads is automatically determined.   
If an integer is provided, the number of threads to use is set by passing the argument on to
\code{\link{setThreadOptions}}.}

\item{randomize}{(Optional). Default FALSE.
If FALSE, samples are read in the provided order until enough bases are obtained (see \code{nbases}).
If TRUE, samples are picked at random from those provided.}

\item{MAX_CONSIST}{(Optional). Default 10.
The maximum number of times to step through the self-consistency loop. If convergence is not
reached in \code{MAX_CONSIST} steps, the error rates estimated in the last step are returned.}

\item{OMEGA_C}{(Optional). Default 0.
The threshold at which unique sequences inferred to contain errors are corrected in the final output,
 and used to estimate the error rates (see more at \code{\link{setDadaOpt}}). For reasons of convergence,
 and because it is more conservative, it is recommended to set this value to 0, which means that all
 reads are counted and contribute to estimating the error rates.}

\item{qualityType}{(Optional). \code{character(1)}.
The quality encoding of the fastq file(s). "Auto" (the default) means to
attempt to auto-detect the encoding. This may fail for PacBio files with
uniformly high quality scores, in which case use "FastqQuality". This
parameter is passed on to \code{\link[ShortRead]{readFastq}}; see
information there for details.}

\item{verbose}{(Optional). Default FALSE.
 Print verbose text output. More fine-grained control is available by providing an integer argument.
\itemize{ 
 \item{0: Silence. No text output (same as FALSE).}
 \item{1: Basic text output (same as TRUE). }
 \item{2: Detailed text output, mostly intended for debugging. }
}}

\item{...}{(Optional). Additional arguments will be passed on to the \code{\link{dada}} function.}
}
\value{
A named list with three entries:
 $err_out: A numeric matrix with the learned error rates.
 $err_in: The initialization error rates (unimportant).
 $trans: A feature table of observed transitions for each type (e.g. A->C) and quality score.
}
\description{
Error rates are learned by alternating between sample inference and error rate estimation 
 until convergence. Sample inference is performed by the \code{\link{dada}} function.
 Error rate estimation is performed by \code{errorEstimationFunction}.
 The output of this function serves as input to the \code{\link{dada}} function through its \code{err} parameter.
}
\examples{
 fl1 <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
 fl2 <- system.file("extdata", "sam2F.fastq.gz", package="dada2")
 err <- learnErrors(c(fl1, fl2))
 err <- learnErrors(c(fl1, fl2), nbases=5000000, randomize=TRUE)
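 # Illustrative sketch (not part of the original examples): an integer passed to
 # multithread sets the thread count, and an integer passed to verbose selects the
 # output level. The values 2 and 1 are arbitrary choices for demonstration.
 err <- learnErrors(c(fl1, fl2), multithread=2, verbose=1)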
 # Using a list of derep-class objects
 dereps <- derepFastq(c(fl1, fl2))
 err <- learnErrors(dereps, multithread=TRUE, randomize=TRUE, MAX_CONSIST=20)
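 # Hedged sketch of downstream use, assuming the default dada() settings suit these data:
 # the learned error model is supplied to dada() through its err parameter.
 names(err)  # the entries described in the Value section
 dd <- dada(dereps, err=err, multithread=FALSE)
 # Plot the estimated error rates against the observed transitions to judge the fit.
 plotErrors(err, nominalQ=TRUE)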

}
\seealso{
\code{\link{derepFastq}}, \code{\link{plotErrors}}, \code{\link{loessErrfun}}, \code{\link{dada}}
}