File: MALDIquant-intro.Rnw

package info (click to toggle)
r-cran-maldiquant 1.19.3-1
  • links: PTS, VCS
  • area: main
  • in suites:
  • size: 2,864 kB
  • sloc: ansic: 315; sh: 10; makefile: 4
file content (338 lines) | stat: -rw-r--r-- 11,654 bytes parent folder | download | duplicates (8)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
%\VignetteEngine{knitr}
%\VignetteIndexEntry{MALDIquant: Quantitative Analysis of Mass Spectrometry Data}
%\VignetteKeywords{Bioinformatics, Proteomics, Mass Spectrometry}
%\VignettePackage{MALDIquant}

\documentclass[12pt]{article}

\usepackage{natbib}
\usepackage{hyperref}
\usepackage{tikz}
\usepackage{bibentry}       % inline bibentries
\nobibliography*            % no special bibliography for bibentry

\newcommand{\R}{\texttt{R}}
\newcommand{\CRAN}{\texttt{CRAN}}
\newcommand{\Rfunction}[1]{{\texttt{#1}}}
\newcommand{\Robject}[1]{{\texttt{#1}}}
\newcommand{\Rpackage}[1]{{\texttt{#1}}}
\newcommand{\Mq}{\Rpackage{MALDIquant}}
\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}}

\newcommand{\workflow}{
  \tikzstyle{line} = [draw, -latex]
  \tikzstyle{mqbox}=[rectangle, draw, text width=8.5em, text centered,
             minimum height=2.0em, fill=white]

  \begin{tikzpicture}[node distance=3.2em]
  % nodes
  \node [mqbox] (di) {Data Import};
  \node [mqbox, below of=di] (qc) {Quality Control};
  \node [mqbox, below of=qc] (vs) {Transformation \& Smoothing};
  \node [mqbox, below of=vs] (bc) {Baseline Correction};
  \node [mqbox, below of=bc] (cb) {Intensity Calibration};
  \node [mqbox, below of=cb] (sa) {Spectra Alignment};
  \node [mqbox, below of=sa] (pd) {Peak Detection};
  \node [mqbox, below of=pd] (pb) {Peak Binning};
  \node [mqbox, below of=pb] (fm) {Feature Matrix};

  % edges
  \path [line] (di) -- (qc);
  \path [line] (qc) -- (vs);
  \path [line] (vs) -- (bc);
  \path [line] (bc) -- (cb);
  \path [line] (cb) -- (sa);
  \path [line] (sa) -- (pd);
  \path [line] (pd) -- (pb);
  \path [line] (pb) -- (fm);

  \end{tikzpicture}
}


\title{\Mq{}: Quantitative Analysis of Mass Spectrometry Data}

\author{
  Sebastian Gibb%
  \thanks{\email{mail@sebastiangibb.de}}
}
\date{\today}

\begin{document}

<<setup, include=FALSE, cache=FALSE>>=
library("knitr")
opts_chunk$set(tidy.opts=list(width.cutoff=45), tidy=FALSE, fig.align="center",
               fig.height=4.25, comment=NA, prompt=TRUE)
@
<<env, echo=FALSE>>=
suppressPackageStartupMessages(library("MALDIquant"))
@

\maketitle

\begin{abstract}
  \Mq{} provides a complete analysis pipeline for MALDI-TOF and other
  2D mass spectrometry data.\\
  This vignette describes the usage of the \Mq{} package and guides the user
  through a typical preprocessing workflow.
\end{abstract}

\clearpage

\tableofcontents

\section*{Foreword}

\Mq{} is free and open source software for the \R{} \citep{RPROJECT}
environment and under active development.
If you use it, please support the project by citing it in publications:

\begin{quote}
  \bibentry{MALDIquant}
\end{quote}

If you have any questions, bugs, or suggestions do not hesitate to contact
me (\email{mail@sebastiangibb.de}). \\
Please visit \url{http://strimmerlab.org/software/maldiquant/}.

\section{Introduction}

\Mq{} comprising all steps from importing of raw data, preprocessing
(e.g. baseline removal), peak detection and non-linear peak alignment to
calibration of mass spectra.

\Mq{} was initially developed for clinical proteomics using
Matrix-Assisted Laser Desorption/Ionization (MALDI) technology.
However, the algorithms implemented in \Mq{} are generic and may be equally
applied to other 2D mass spectrometry data.

\Mq{} was carefully designed to be independent of any specific mass spectrometry
hardware. Nonetheless, a lot of open and native file formates, e.g. binary data
files from Bruker flex series instruments, mzXML, mzML, etc. are supported
through the associated \R{} package \Rpackage{MALDIquantForeign}.

\section{Setup}

After starting \R{} we could install \Mq{} and \Rpackage{MALDIquantForeign}
directly from \CRAN{} using \Rfunction{install.packages}:
<<mqsetup, eval=FALSE>>=
install.packages(c("MALDIquant", "MALDIquantForeign"))
@
Before we can use \Mq{} we have to load the package.
<<mqlibrary, eval=FALSE>>=
library("MALDIquant")
@

\section{MALDIquant objects}
\Mq{} is written in an object-oriented programming approach and uses
\R{}'s S4 objects. A spectrum is represented by an \Robject{MassSpectrum}
and a list of peaks by an \Robject{MassPeaks} instance. To create such objects
manually we could use \Rfunction{createMassSpectrum} and
\Rfunction{createMassPeaks}. In general we do not need these functions because
\Rpackage{MALDIquantForeign's} import routines will generate the
\Robject{MassSpectrum}/\Robject{MassPeaks} objects.
<<mqobjects>>=
s <- createMassSpectrum(mass=1:10, intensity=1:10,
                        metaData=list(name="Spectrum1"))
s
@
Each \Robject{MassSpectrum}/\Robject{MassPeaks} stores the mass and intensity
values of a spectrum respective of the peaks. Additionally they contain a list
of metadata. To access these information we use \Rfunction{mass},
\Rfunction{intensity} and \Rfunction{metaData}.
<<mqaccess>>=
mass(s)
intensity(s)
metaData(s)
@

\section{Workflow}

A Mass Spectrometry Analysis often follows the same workflow (see also Fig.
\ref{fig:workflow}). After importing the raw data
(see also the \Rpackage{MALDIquantForeign} package) we control the quality of
the spectra and draw some plots. We apply a variance-stabilizing transformation
and smoothing filter. Next we remove the chemical background using a
Baseline Correction method. To compare the intensities across spectra
we calibrate the intensity values (often called normalization) and the mass
values (warping, alignment). Subsequently we perfom a Peak Detection and do some
post processing like filtering etc.

\begin{figure}[htbp]
  \centering
    \begin{small}
    \workflow{}
  \caption{MS Analysis Workflow}
  \label{fig:workflow}
    \end{small}
\end{figure}

\clearpage
\subsection{Data Import}

Normally we will use some of the import methods provided by
\Rpackage{MALDIquantForeign}, e.g. \Rfunction{importBrukerFlex},
\Rfunction{importMzMl}, etc.
But in this vignette we will use a small example dataset
shipped with \Mq{}. This dataset is a
subset of MALDI-TOF data described in \cite{Fiedler2009}.
<<mqdataimport>>=
data(fiedler2009subset)
@
\Robject{fiedler2009subset} is a \Robject{list} of 16 \Robject{MassSpectrum}
objects. The 16 spectra are 8 biological samples with 2 technical replicates.
<<mqdataimport2>>=
length(fiedler2009subset)
fiedler2009subset[1:2]
@

\subsection{Quality Control}

For a basic quality control we test whether all spectra contain the same number
of data points and are not empty.
<<mqqclength>>=
any(sapply(fiedler2009subset, isEmpty))
table(sapply(fiedler2009subset, length))
@

Subsequently we control the mass difference between each data point
(should be equal or monotonically increasing) because
\Mq{} is designed for profile data and not for centroided data.
<<mqqcregular>>=
all(sapply(fiedler2009subset, isRegular))
@
Finally we draw some plots and inspect the spectra visually.
<<mqqcplots>>=
plot(fiedler2009subset[[1]])
plot(fiedler2009subset[[16]])
@

\subsection{Variance Stabilization}
We use the square root transformation to simplify graphical visualization and
to overcome the potential dependency of the variance from the mean.
<<mqvs>>=
spectra <- transformIntensity(fiedler2009subset,
                              method="sqrt")
@
\subsection{Smoothing}
Next we use a 21 point Savitzky-Golay-Filter \citep{Savitzky1964}
to smooth the spectra.
<<mqsm>>=
spectra <- smoothIntensity(spectra, method="SavitzkyGolay",
                           halfWindowSize=10)
@

\subsection{Baseline Correction}
Before we correct the baseline we visualize it. Here we use the SNIP algorithm
\citep{Ryan1988}.
<<mqve>>=
baseline <- estimateBaseline(spectra[[16]], method="SNIP",
                             iterations=100)
plot(spectra[[16]])
lines(baseline, col="red", lwd=2)
@
If we are satisfied with our estimated baseline we remove it.
<<mqbc>>=
spectra <- removeBaseline(spectra, method="SNIP",
                          iterations=100)
plot(spectra[[1]])
@

\subsection{Intensity Calibration/Normalization}
For better comparison and to overcome (very) small batch effects we equalize the
intensity values using the Total-Ion-Current-Calibration (often called
normalization).
<<mqcb>>=
spectra <- calibrateIntensity(spectra, method="TIC")
@

\subsection{Warping/Alignment}
Now we (re)calibrate the mass values. Our
alignment procedure is a peak based warping algorithm. If you need a finer
control or want to investigate the impact of different parameters please use
\Rfunction{determineWarpingFunctions} instead of the easier
\Rfunction{alignSpectra}.
<<mqpa>>=
spectra <- alignSpectra(spectra,
                        halfWindowSize=20,
                        SNR=2,
                        tolerance=0.002,
                        warpingMethod="lowess")
@
Before we call the Peak Detection we want to average the technical replicates.
Therefore we look for the sample name that is stored in the metadata because
each technical replicate has the same sample name.
<<mqav1>>=
samples <- factor(sapply(spectra,
                         function(x)metaData(x)$sampleName))
@
Next we use \Rfunction{averageMassSpectra} to create a mean spectrum for each
biological sample.
<<mqav2>>=
avgSpectra <- averageMassSpectra(spectra, labels=samples,
                                 method="mean")
@

\subsection{Peak Detection}
The next crucial step is the Peak Detection. Before we perform the peak
detection algorithm we estimate the noise of the spectra to get a feeling for
the signal-to-noise ratio.
<<mqpd1>>=
noise <- estimateNoise(avgSpectra[[1]])
plot(avgSpectra[[1]], xlim=c(4000, 5000), ylim=c(0, 0.002))
lines(noise, col="red")
lines(noise[,1], noise[, 2]*2, col="blue")
@
We decide to use a signal-to-noise ratio of 2 (blue line).
<<mqpd2>>=
peaks <- detectPeaks(avgSpectra, method="MAD",
                     halfWindowSize=20, SNR=2)
plot(avgSpectra[[1]], xlim=c(4000, 5000), ylim=c(0, 0.002))
points(peaks[[1]], col="red", pch=4)
@
\subsection{Peak Binning}
After the alignment the peak positions (mass) are very similar but not
identical. The binning is needed to make similar peak mass values identical.
<<mqpb>>=
peaks <- binPeaks(peaks, tolerance=0.002)
@
\subsection{Feature Matrix}
We choose a very low signal-to-noise ratio to keep as much features as possible.
To remove some false positive peaks we remove less frequent peaks.
<<mqfp>>=
peaks <- filterPeaks(peaks, minFrequency=0.25)
@
At the end of the analysis we create a feature matrix that could be used in
further statistical analysis. Please note that missing values (not detected
peaks) are imputed/interpolated from the corresponding spectrum.
<<mqim>>=
featureMatrix <- intensityMatrix(peaks, avgSpectra)
head(featureMatrix[, 1:3])
@

\section{Summary}
We shortly described a complete example workflow of a mass spectrometry data
analysis. Please note that this workflow is only an example and could not cover
every use case.\\
\Mq{} provides a lot of more functions than we mentioned in this vignette. The
described functions are the most used ones but they have a lot of more
parameters which could/need adjust to your data (e.g. \Robject{halfWindowSize},
\Robject{SNR}, \Robject{tolerance}, etc.). That's why we suggest the
user to read the manual pages of theses functions carefully.\\
We also provide more examples in the demo directory and at:
\begin{quote}
\url{http://strimmerlab.org/software/maldiquant/}
\end{quote}
Please do not hesitate to contact me (\email{mail@sebastiangibb.de}) if you have
any questions.

\section{Session Information}
<<sessioninfo, echo=FALSE, results="asis">>=
toLatex(sessionInfo(), locale=FALSE)
@

\bibliographystyle{apalike}
\bibliography{bibliography}

\end{document}