% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/paired.R
\name{mergePairs}
\alias{mergePairs}
\title{Merge denoised forward and reverse reads.}
\usage{
mergePairs(
  dadaF,
  derepF,
  dadaR,
  derepR,
  minOverlap = 12,
  maxMismatch = 0,
  returnRejects = FALSE,
  propagateCol = character(0),
  justConcatenate = FALSE,
  trimOverhang = FALSE,
  verbose = FALSE,
  ...
)
}
\arguments{
\item{dadaF}{(Required). A \code{\link{dada-class}} object, or a list of such objects.
The \code{\link{dada-class}} object(s) generated by denoising the forward reads.}

\item{derepF}{(Required). \code{character} or \code{\link{derep-class}}.
The file path(s) to the fastq file(s), or a directory containing fastq file(s), corresponding to
the forward reads of the samples to be merged. Compressed file formats such as .fastq.gz and .fastq.bz2 are supported.
A \code{\link{derep-class}} object (or list thereof) returned by \code{\link{derepFastq}} can also be provided.
These \code{\link{derep-class}} object(s) or fastq files should correspond to those used
as input to the \code{\link{dada}} function when denoising the forward reads.}

\item{dadaR}{(Required). A \code{\link{dada-class}} object, or a list of such objects.
The \code{\link{dada-class}} object(s) generated by denoising the reverse reads.}

\item{derepR}{(Required). \code{character} or \code{\link{derep-class}}.
The file path(s) to the fastq file(s), or a directory containing fastq file(s), corresponding to
the reverse reads of the samples to be merged. Compressed file formats such as .fastq.gz and .fastq.bz2 are supported.
A \code{\link{derep-class}} object (or list thereof) returned by \code{\link{derepFastq}} can also be provided.
These \code{\link{derep-class}} object(s) or fastq files should correspond to those used
as input to the \code{\link{dada}} function when denoising the reverse reads.}

\item{minOverlap}{(Optional). Default 12.
The minimum length of the overlap required for merging the forward and reverse reads.}

\item{maxMismatch}{(Optional). Default 0. 
The maximum mismatches allowed in the overlap region.}

\item{returnRejects}{(Optional). Default FALSE.
If TRUE, the pairs that were rejected based on mismatches in the overlap
region are retained in the return \code{data.frame}.}

\item{propagateCol}{(Optional). \code{character}. Default \code{character(0)}.
Names of columns in the $clustering \code{data.frame} of the provided
\code{\link{dada-class}} objects. The values in these columns are included in the return \code{data.frame}.}

\item{justConcatenate}{(Optional). Default FALSE.
If TRUE, the forward and reverse-complemented reverse read are concatenated rather than merged,
  with a NNNNNNNNNN (10 Ns) spacer inserted between them.}

\item{trimOverhang}{(Optional). Default FALSE.
If TRUE, "overhangs" in the alignment between the forward and reverse reads are trimmed off.
An "overhang" occurs when the reverse read extends past the start of the forward read, or vice versa,
as can happen when reads are longer than the amplicon and read into the other-direction primer region.}

\item{verbose}{(Optional). Default FALSE.
If TRUE, a summary of the function results is printed to standard output.}

\item{...}{(Optional). Further arguments to pass on to \code{\link{nwalign}}.
By default, \code{mergePairs} uses alignment parameters that heavily penalize mismatches and gaps
when aligning the forward and reverse sequences.}
}
\value{
A \code{data.frame}, or a list of \code{data.frames}. 

The return \code{data.frame}(s) has a row for each unique pairing of forward/reverse denoised sequences, 
and the following columns:
\itemize{
 \item{\code{$abundance}: Number of reads corresponding to this forward/reverse combination.}
 \item{\code{$sequence}: The merged sequence.}
 \item{\code{$forward}: The index of the forward denoised sequence.}
 \item{\code{$reverse}: The index of the reverse denoised sequence.}
 \item{\code{$nmatch}: Number of matching nucleotides in the overlap region.}
 \item{\code{$nmismatch}: Number of mismatches in the overlap region.}
 \item{\code{$nindel}: Number of indels in the overlap region.}
 \item{\code{$prefer}: The sequence used for the overlap region. 1=forward; 2=reverse.}
 \item{\code{$accept}: TRUE if overlap between forward and reverse denoised sequences was at least 
               \code{minOverlap} and had at most \code{maxMismatch} differences. FALSE otherwise.}
 \item{\code{$...}: Additional columns specified in \code{propagateCol}.}
}
A list of \code{data.frame}s is returned if a list of input objects was provided.
}
\description{
This function attempts to merge each denoised pair of forward and reverse reads, 
rejecting any pairs which do not sufficiently overlap or which contain too many 
(>0 by default) mismatches in the overlap region. Note: This function assumes that 
the fastq files for the forward and reverse reads were in the same order.
}
\examples{
fnF <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
fnR <- system.file("extdata", "sam1R.fastq.gz", package="dada2")
dadaF <- dada(fnF, selfConsist=TRUE)
dadaR <- dada(fnR, selfConsist=TRUE)
merger <- mergePairs(dadaF, fnF, dadaR, fnR)
merger <- mergePairs(dadaF, fnF, dadaR, fnR, returnRejects=TRUE, propagateCol=c("n0", "birth_ham"))
merger <- mergePairs(dadaF, fnF, dadaR, fnR, justConcatenate=TRUE)
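# A hedged sketch, not part of the original examples: with returnRejects=TRUE
# the rejected pairs are kept in the result, so the overlap statistics
# described under Value can be inspected before filtering on the accept column.
# The names `withRejects` and `accepted` are illustrative only.
withRejects <- mergePairs(dadaF, fnF, dadaR, fnR, returnRejects=TRUE)
head(withRejects[, c("abundance", "nmatch", "nmismatch", "nindel", "accept")])
accepted <- withRejects[withRejects$accept, ]
# Loosening or tightening the merge criteria uses the documented arguments;
# the values 20 and 1 below are arbitrary choices for illustration.
relaxed <- mergePairs(dadaF, fnF, dadaR, fnR, minOverlap=20, maxMismatch=1)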

}
\seealso{
\code{\link{derepFastq}}, \code{\link{dada}}, \code{\link{fastqPairedFilter}}
}