File: removePrimers.Rd

package info (click to toggle)
r-bioc-dada2 1.34.0%2Bdfsg-2
links: PTS, VCS
area: main
in suites: sid, trixie
size: 3,016 kB
sloc: cpp: 3,096; makefile: 5
file content (88 lines) | stat: -rw-r--r-- 3,465 bytes
parent folder | download | duplicates (3)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/filter.R
\name{removePrimers}
\alias{removePrimers}
\title{Removes primers and orients reads in a consistent direction.}
\usage{
removePrimers(
  fn,
  fout,
  primer.fwd,
  primer.rev = NULL,
  max.mismatch = 2,
  allow.indels = FALSE,
  trim.fwd = TRUE,
  trim.rev = TRUE,
  orient = TRUE,
  compress = TRUE,
  verbose = FALSE
)
}
\arguments{
\item{fn}{(Required). \code{character}.
The path(s) to the input fastq file(s). Can be compressed.}

\item{fout}{(Required). \code{character}.
The path(s) to the output fastq file(s) corresponding to the \code{fwd} input files.
If directory containing the file does not exist, it will be created.
Output files are gzip compressed by default.}

\item{primer.fwd}{(Required). \code{character}.
The forward primer sequence expected to be at the beginning of the sequenced amplicon.
Can contain IUPAC ambiguous nucleotide codes.}

\item{primer.rev}{(Optional). Default NULL.
The reverse primer sequence expected to be at the end of the sequenced amplicon.
Can contain IUPAC ambiguous nucleotide codes.
NOTE: `primer.rev` should be provided in the orientation that would appear in a DNA sequence
starting at the forward primer and being read towards the reverse primer. Thus, it is
often necessary to reverse-complement the reverse primer sequence before providing it to
this function.}

\item{max.mismatch}{(Optional). Default 2.
The number of mismatches to tolerate when matching reads to primer sequences.
See \code{\link[Biostrings]{vmatchPattern}} for details.}

\item{allow.indels}{(Optional). Default FALSE.
If TRUE, indels ared allowed when matching the primer sequences to the read. If FALSE,
no indels are allowed. Note that when `allow.indels=TRUE`, primer matching is significantly
slower, currently about 4x slower.}

\item{trim.fwd}{(Optional). Default TRUE.
If TRUE, reads are trimmed to the end of the forward primer, i.e. the forward
primer and any preceding sequence are trimmed off.}

\item{trim.rev}{(Optional). Default TRUE.
If TRUE, reads are trimmed to the beginning of the reverse primer, i.e. the reverse
primer and any subsequent sequence are trimmed off.}

\item{orient}{(Optional). Default TRUE.
If TRUE, reads are re-oriented if the reverse complement of the read is a better match to the
provided primer sequence(s). This is recommended for PacBio CCS reads, which come in a random
mix of forward and reverse-complement orientations.}

\item{compress}{(Optional). Default TRUE.
If TRUE, the output fastq file(s) are gzipped.}

\item{verbose}{(Optional). Default FALSE.
Whether to output status messages.}
}
\value{
Integer matrix. Returned invisibly (i.e. only if assigned to something).
 Rows correspond to the input files, columns record the number of reads.in and reads.out after
 discarding reads that didn't match the provided primers.
}
\description{
Removes primer(s) and orients the reads in input fastq file(s) (can be compressed).
Reads that do not contain the primer(s) are discarded.
Intended for use with PacBio CCS data.
Faster external solutions such as cutadapt or trimmomatic are recommended for short-read data.
}
\examples{
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
fn <- system.file("extdata", "samPBprimers.fastq.gz", package="dada2")
fn.noprime <- tempfile(fileext=".fastq.gz")
removePrimers(fn, fn.noprime, primer.fwd=F27, primer.rev=rc(R1492), orient=TRUE, verbose=TRUE)

}