File: stackStringsFromBam.Rd

package info (click to toggle)
r-bioc-genomicalignments 1.0.6-1
  • links: PTS, VCS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 2,980 kB
  • ctags: 54
  • sloc: ansic: 1,493; makefile: 4; sh: 3
file content (154 lines) | stat: -rw-r--r-- 5,631 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
\name{stackStringsFromBam}

\alias{stackStringsFromBam}


\title{Stack the read sequences stored in a BAM file on a region of interest}

\description{
  \code{stackStringsFromBam} lays the read sequences (or their quality
  strings) stored in a BAM file alongside the reference space, and stacks
  them on the specified region.
}

\usage{
stackStringsFromBam(file, index=file, param,
                    what="seq", use.names=FALSE,
                    D.letter="-", N.letter=".",
                    Lpadding.letter="+", Rpadding.letter="+")
}

\arguments{
  \item{file, index}{
    The path to the BAM file to read, and to the index file of the BAM file
    to read, respectively. The latter is given \emph{without} the '.bai'
    extension. See \code{\link{scanBam}} for more information.
  }
  \item{param}{
    A \link{ScanBamParam} object containing exactly 1 genomic region
    (i.e. \code{unlist(bamWhich(param))} must have length 1).
    Alternatively, \code{param} can be a \link[GenomicRanges]{GRanges} or
    \link[IRanges]{RangesList} object containing exactly 1 genomic region,
    or a character string specifying a single genomic region (in the
    \code{"chr14:5201-5300"} format).
  }
  \item{what}{
    A single string. Either \code{"seq"} or \code{"qual"}. If \code{"seq"}
    (the default), the read sequences will be stacked. If \code{"qual"},
    the read quality strings will be stacked.
  }
  \item{use.names}{
    Use the query template names (QNAME field) as the names of the returned
    object? If not (the default), then the returned object has no names.
  }
  \item{D.letter, N.letter}{
    A single letter used as a filler for injections. The 2 arguments are
    passed down to the \code{\link{sequenceLayer}} function.
    See \code{?\link{sequenceLayer}} for more details.
  }
  \item{Lpadding.letter, Rpadding.letter}{
    A single letter to use for padding the sequences on the left, and another
    one to use for padding on the right. The 2 arguments are passed down to
    the \code{\link[Biostrings]{stackStrings}} function defined in the
    \pkg{Biostrings} package.
    See \code{?\link[Biostrings]{stackStrings}} in the \pkg{Biostrings}
    package for more details.
  }
}

\details{
  \code{stackStringsFromBam} performs the 3 following steps:
  \enumerate{
    \item Load the read sequences (or their quality strings) from the BAM
          file. Only the read sequences that overlap with the specified region
          are loaded. This is done with the
          \code{\link{readGAlignmentsFromBam}} function. Note that if the
          file contains paired-end reads, the pairing is ignored.

    \item Lay the sequences alongside the reference space, using their CIGARs.
          This is done with the \code{\link{sequenceLayer}} function.

    \item Stack them on the specified region. This is done with the
          \code{\link[Biostrings]{stackStrings}} function defined in the
          \pkg{Biostrings} package.
  }
}

\value{
  A rectangular (i.e. constant-width) \link[Biostrings]{DNAStringSet} object
  (if \code{what} is \code{"seq"}) or \link[Biostrings]{BStringSet} object
  (if \code{what} is \code{"qual"}).
}

\note{
  TWO IMPORTANT CAVEATS:

  Specifying a big genomic region, say >= 100000 bp, can require a lot of
  memory (especially with high coverage reads) and is not recommended.
  See the \code{\link{pileLettersAt}} function for piling the read letters
  on top of a set of individual genomic positions, which is more
  flexible and more memory efficient.

  Paired-end reads are treated as single-end reads (i.e. they're not paired).
}

\author{H. Pages}

\seealso{
  \itemize{
    \item The \code{\link{pileLettersAt}} function for piling the letters
          of a set of aligned reads on top of a set of individual genomic
          positions.

    \item The \code{\link{readGAlignmentsFromBam}} function for loading read
          sequences (or their quality strings) from a BAM file (via a
          \link{GAlignments} object).

    \item The \code{\link{sequenceLayer}} function for laying read sequences
          alongside the reference space, using their CIGARs.

    \item The \code{\link[Biostrings]{stackStrings}} function in the
          \pkg{Biostrings} package for stacking an arbitrary
          \link[Biostrings]{XStringSet} object.

    \item The SAMtools mpileup command available at
          \url{http://samtools.sourceforge.net/} as part of the
          SAMtools project.
  }
}

\examples{
## ---------------------------------------------------------------------
## A. EXAMPLE WITH TOY DATA
## ---------------------------------------------------------------------

bamfile <- BamFile(system.file("extdata", "ex1.bam", package="Rsamtools"))

stackStringsFromBam(bamfile, param=GRanges("seq1", IRanges(1, 60)))

options(showHeadLines=18)
options(showTailLines=6)
stackStringsFromBam(bamfile, param=GRanges("seq1", IRanges(61, 120)))

stacked_qseq <- stackStringsFromBam(bamfile, param="seq2:1509-1519")
stacked_qseq  # deletion in read 13

stackStringsFromBam(bamfile, param="seq2:1509-1519", what="qual")
consensusMatrix(stacked_qseq)

## ---------------------------------------------------------------------
## B. EXAMPLE WITH REAL DATA
## ---------------------------------------------------------------------

library(RNAseqData.HNRNPC.bam.chr14)
bamfile <- BamFile(RNAseqData.HNRNPC.bam.chr14_BAMFILES[1])

## My Region Of Interest:
my_ROI <- GRanges("chr14", IRanges(19650095, 19650159))

readGAlignments(bamfile, param=ScanBamParam(which=my_ROI))
stackStringsFromBam(bamfile, param=my_ROI)
}

\keyword{methods}
\keyword{manip}