1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
|
\name{stackStringsFromBam}
\alias{stackStringsFromBam}
\title{Stack the read sequences stored in a BAM file on a region of interest}
\description{
\code{stackStringsFromBam} lays the read sequences (or their quality
strings) stored in a BAM file alongside the reference space, and stacks
them on the specified region.
}
\usage{
stackStringsFromBam(file, index=file, param,
what="seq", use.names=FALSE,
D.letter="-", N.letter=".",
Lpadding.letter="+", Rpadding.letter="+")
}
\arguments{
\item{file, index}{
The path to the BAM file to read, and to the index file of the BAM file
to read, respectively. The latter is given \emph{without} the '.bai'
extension. See \code{\link{scanBam}} for more information.
}
\item{param}{
A \link{ScanBamParam} object containing exactly 1 genomic region
(i.e. \code{unlist(bamWhich(param))} must have length 1).
Alternatively, \code{param} can be a \link[GenomicRanges]{GRanges} or
\link[IRanges]{RangesList} object containing exactly 1 genomic region,
or a character string specifying a single genomic region (in the
\code{"chr14:5201-5300"} format).
}
\item{what}{
A single string. Either \code{"seq"} or \code{"qual"}. If \code{"seq"}
(the default), the read sequences will be stacked. If \code{"qual"},
the read quality strings will be stacked.
}
\item{use.names}{
Use the query template names (QNAME field) as the names of the returned
object? If not (the default), then the returned object has no names.
}
\item{D.letter, N.letter}{
A single letter used as a filler for injections. The 2 arguments are
passed down to the \code{\link{sequenceLayer}} function.
See \code{?\link{sequenceLayer}} for more details.
}
\item{Lpadding.letter, Rpadding.letter}{
A single letter to use for padding the sequences on the left, and another
one to use for padding on the right. The 2 arguments are passed down to
the \code{\link[Biostrings]{stackStrings}} function defined in the
\pkg{Biostrings} package.
See \code{?\link[Biostrings]{stackStrings}} in the \pkg{Biostrings}
package for more details.
}
}
\details{
\code{stackStringsFromBam} performs the 3 following steps:
\enumerate{
\item Load the read sequences (or their quality strings) from the BAM
file. Only the read sequences that overlap with the specified region
are loaded. This is done with the
\code{\link{readGAlignmentsFromBam}} function. Note that if the
file contains paired-end reads, the pairing is ignored.
\item Lay the sequences alongside the reference space, using their CIGARs.
This is done with the \code{\link{sequenceLayer}} function.
\item Stack them on the specified region. This is done with the
\code{\link[Biostrings]{stackStrings}} function defined in the
\pkg{Biostrings} package.
}
}
\value{
A rectangular (i.e. constant-width) \link[Biostrings]{DNAStringSet} object
(if \code{what} is \code{"seq"}) or \link[Biostrings]{BStringSet} object
(if \code{what} is \code{"qual"}).
}
\note{
TWO IMPORTANT CAVEATS:
Specifying a big genomic region, say >= 100000 bp, can require a lot of
memory (especially with high coverage reads) and is not recommended.
See the \code{\link{pileLettersAt}} function for piling the read letters
on top of a set of individual genomic positions, which is more
flexible and more memory efficient.
Paired-end reads are treated as single-end reads (i.e. they're not paired).
}
\author{H. Pages}
\seealso{
\itemize{
\item The \code{\link{pileLettersAt}} function for piling the letters
of a set of aligned reads on top of a set of individual genomic
positions.
\item The \code{\link{readGAlignmentsFromBam}} function for loading read
sequences (or their quality strings) from a BAM file (via a
\link{GAlignments} object).
\item The \code{\link{sequenceLayer}} function for laying read sequences
alongside the reference space, using their CIGARs.
\item The \code{\link[Biostrings]{stackStrings}} function in the
\pkg{Biostrings} package for stacking an arbitrary
\link[Biostrings]{XStringSet} object.
\item The SAMtools mpileup command available at
\url{http://samtools.sourceforge.net/} as part of the
SAMtools project.
}
}
\examples{
## ---------------------------------------------------------------------
## A. EXAMPLE WITH TOY DATA
## ---------------------------------------------------------------------
bamfile <- BamFile(system.file("extdata", "ex1.bam", package="Rsamtools"))
stackStringsFromBam(bamfile, param=GRanges("seq1", IRanges(1, 60)))
options(showHeadLines=18)
options(showTailLines=6)
stackStringsFromBam(bamfile, param=GRanges("seq1", IRanges(61, 120)))
stacked_qseq <- stackStringsFromBam(bamfile, param="seq2:1509-1519")
stacked_qseq # deletion in read 13
stackStringsFromBam(bamfile, param="seq2:1509-1519", what="qual")
consensusMatrix(stacked_qseq)
## ---------------------------------------------------------------------
## B. EXAMPLE WITH REAL DATA
## ---------------------------------------------------------------------
library(RNAseqData.HNRNPC.bam.chr14)
bamfile <- BamFile(RNAseqData.HNRNPC.bam.chr14_BAMFILES[1])
## My Region Of Interest:
my_ROI <- GRanges("chr14", IRanges(19650095, 19650159))
readGAlignments(bamfile, param=ScanBamParam(which=my_ROI))
stackStringsFromBam(bamfile, param=my_ROI)
}
\keyword{methods}
\keyword{manip}
|