File: apply.Rmd

package info (click to toggle)
r-bioc-singlecellexperiment 1.28.1%2Bds-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 640 kB
  • sloc: makefile: 2
file content (172 lines) | stat: -rw-r--r-- 6,541 bytes parent folder | download | duplicates (4)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
title: "Applying a function over a SingleCellExperiment's contents"
author: 
- name: Aaron Lun
  email: infinite.monkeys.with.keyboards@gmail.com
package: SingleCellExperiment
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{2. Applying over a SingleCellExperiment object}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r options, include=FALSE, echo=FALSE}
library(BiocStyle)
knitr::opts_chunk$set(warning=FALSE, error=FALSE, message=FALSE)
```

# Motivation

The `SingleCellExperiment` is quite a complex class that can hold multiple aspects of the same dataset.
It is possible to have multiple assays, multiple dimensionality reduction results, and multiple alternative Experiments - 
each of which can further have multiple assays and `reducedDims`!
In some scenarios, it may be desirable to loop over these pieces and apply the same function to each of them.
This is made conveniently possible via the `applySCE()` framework.

# Quick start

Let's say we have a moderately complicated `SingleCellExperiment` object, 
containing multiple alternative Experiments for different data modalities.

```{r}
library(SingleCellExperiment)
counts <- matrix(rpois(100, lambda = 10), ncol=10, nrow=10)
sce <- SingleCellExperiment(counts)

altExp(sce, "Spike") <- SingleCellExperiment(matrix(rpois(20, lambda = 5), ncol=10, nrow=2))
altExp(sce, "Protein") <- SingleCellExperiment(matrix(rpois(50, lambda = 100), ncol=10, nrow=5))
altExp(sce, "CRISPR") <- SingleCellExperiment(matrix(rbinom(80, p=0.1, 1), ncol=10, nrow=8))

sce
```

Assume that we want to compute the total count for each modality, using the first assay.
We might define a function that looks like the below.
(We will come back to the purpose of `multiplier=` and `subset.row=` later.)

```{r}
totalCount <- function(x, i=1, multiplier=1, subset.row=NULL) {
    mat <- assay(x, i)
    if (!is.null(subset.row)) {
        mat <- mat[subset.row,,drop=FALSE]
    }
    colSums(mat) * multiplier
}
```

We can then easily apply this function across the main and alternative Experiments with:

```{r}
totals <- applySCE(sce, FUN=totalCount)
totals
```

# Design explanation 

The `applySCE()` call above is functionally equivalent to:

```{r}
totals.manual <- list( 
    totalCount(sce),
    Spike=totalCount(altExp(sce, "Spike")),
    Protein=totalCount(altExp(sce, "Protein")),
    CRISPR=totalCount(altExp(sce, "CRISPR"))
)
stopifnot(identical(totals, totals.manual))
```

Besides being more verbose than `applySCE()`, this approach does not deal well with common arguments.
Say we wanted to set `multiplier=10` for all calls.
With the manual approach above, this would involve specifying the argument multiple times:

```{r}
totals10.manual <- list( 
    totalCount(sce, multiplier=10),
    Spike=totalCount(altExp(sce, "Spike"), multiplier=10),
    Protein=totalCount(altExp(sce, "Protein"), multiplier=10),
    CRISPR=totalCount(altExp(sce, "CRISPR"), multiplier=10)
)
```

Whereas with the `applySCE()` approach, we can just set it once.
This makes it easier to change and reduces the possibility of errors when copy-pasting parameter lists across calls.

```{r}
totals10.apply <- applySCE(sce, FUN=totalCount, multiplier=10)
stopifnot(identical(totals10.apply, totals10.manual))
```

Now, one might consider just using `lapply()` in this case, which also avoids the need for repeated specification:

```{r}
totals10.lapply <- lapply(c(List(sce), altExps(sce)),
    FUN=totalCount, multiplier=10)
stopifnot(identical(totals10.apply, totals10.lapply))
```

However, this runs into the opposite problem - it is no longer possible to specify _custom_ arguments for each call.
For example, say we wanted to subset to a different set of features for each main and alternative Experiment.
With `applySCE()`, this is still possible:

```{r}
totals.custom <- applySCE(sce, FUN=totalCount, multiplier=10, 
    ALT.ARGS=list(Spike=list(subset.row=2), Protein=list(subset.row=3:5)))
totals.custom
```

In cases where we have a mix between custom and common arguments, `applySCE()` provides a more convenient and flexible interface than manual calls or `lapply()`ing.

# Simplifying to a `SingleCellExperiment`

The other convenient aspect of `applySCE()` is that, if the specified `FUN=` returns a `SingleCellExperiment`, `applySCE()` will try to format the output as a `SingleCellExperiment`.
To demonstrate, let's use the `head()` function to take the first few features for each main and alternative Experiment:

```{r}
head.sce <- applySCE(sce, FUN=head, n=5)
head.sce
```

Rather than returning a list of `SingleCellExperiment`s, we can see that the output is neatly organized as a `SingleCellExperiment` with the specified `n=5` features.
Moreover, each of the alternative Experiments is also truncated to its first 5 features (or fewer, if there weren't that many to begin with).
This output mirrors, as much as possible, the format of the input `sce`, and is much more convenient to work with than a list of objects.

```{r}
altExp(head.sce)
altExp(head.sce, "Protein")
altExp(head.sce, "CRISPR")
```

To look under the hood, we can turn off simplification and see what happens.
We see that the function indeed returns a list of `SingleCellExperiment` objects corresponding to the `head()` of each Experiment.
When `SIMPLIFY=TRUE`, this list is passed through `simplifyToSCE()` to attempt the reorganization into a single object.

```{r}
head.sce.list <- applySCE(sce, FUN=head, n=5, SIMPLIFY=FALSE) 
head.sce.list
```

For comparison, if we had to do this manually, it would be rather tedious and error-prone,
e.g., if we forgot to set `n=` or if we re-assigned the output of `head()` to the wrong alternative Experiment.

```{r}
manual.head <- head(sce, n=5)
altExp(manual.head, "Spike") <- head(altExp(sce, "Spike"), n=5)
altExp(manual.head, "Protein") <- head(altExp(sce, "Protein"), n=5)
altExp(manual.head, "CRISPR") <- head(altExp(sce, "CRISPR"), n=5)
manual.head
```

Of course, this simplification is only possible when circumstances permit.
It requires that `FUN=` returns a `SingleCellExperiment` at each call, and that no more than one result is generated for each alternative Experiment.
Failure to meet these conditions will result in a warning and a non-simplified output.

Developers may prefer to set `SIMPLIFY=FALSE` and manually call `simplifyToSCE()`, possibly with `warn.level=3` to trigger an explicit error when simplification fails.

# Session information {-}

```{r}
sessionInfo()
```