% Source: correlateNull.Rd from r-bioc-scran 1.26.2+dfsg-1 (Debian bookworm)
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/correlateNull.R
\name{correlateNull}
\alias{correlateNull}
\title{Build null correlations}
\usage{
correlateNull(
  ncells,
  iters = 1e+06,
  block = NULL,
  design = NULL,
  equiweight = TRUE,
  BPPARAM = SerialParam()
)
}
\arguments{
\item{ncells}{An integer scalar indicating the number of cells in the data set.
Not required if \code{block} or \code{design} is specified.}

\item{iters}{An integer scalar specifying the number of values in the null distribution.}

\item{block}{A factor specifying the blocking level for each cell.}

\item{design}{A numeric design matrix containing uninteresting factors to be ignored.}

\item{equiweight}{A logical scalar indicating whether statistics from each block should be given equal weight.
Otherwise, each block is weighted according to its number of cells.
Only used if \code{block} is specified.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object that specifies the manner of parallel processing to use.}
}
\value{
A numeric vector of length \code{iters} containing the sorted correlations under the null hypothesis of no correlation.
}
\description{
Build a distribution of correlations under the null hypothesis of independent expression between pairs of genes.
This function is deprecated because \code{\link{correlatePairs}} now uses an approximation instead.
}
\details{
The \code{correlateNull} function constructs an empirical null distribution for Spearman's rank correlation when it is computed with \code{ncells} cells.
This is done by shuffling the ranks, calculating the correlation and repeating until \code{iters} values are obtained.
Tied ranks are not taken into account, so the p-values in \code{\link{correlatePairs}} may be inaccurate for genes with many tied values (e.g., many zero counts).
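
The shuffling procedure described above can be sketched in base R as follows.
This is a minimal illustration, not the compiled implementation used by \code{correlateNull}; the helper \code{nullSketch} is hypothetical.
It relies on the fact that, in the absence of ties, Pearson's correlation computed on ranks equals Spearman's rank correlation.
\preformatted{
# Minimal sketch of the shuffling approach in base R.
# nullSketch() is a hypothetical helper, not part of scran.
nullSketch <- function(ncells, iters) {
    ranks <- seq_len(ncells)               # untied ranks 1, ..., ncells
    out <- numeric(iters)
    for (i in seq_len(iters)) {
        shuffled <- sample(ranks)          # random permutation of the ranks
        # Without ties, Pearson's correlation of ranks is Spearman's rho.
        out[i] <- cor(ranks, shuffled)
    }
    sort(out)                              # returned sorted, as documented
}

set.seed(0)
null.dist <- nullSketch(100, 1000)
}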

If \code{block} is specified, a null correlation is created within each level of \code{block} using the shuffled ranks.
The final correlation is then defined as the average of the per-level correlations,
where each level is weighted by its number of cells if \code{equiweight=FALSE} and given equal weight otherwise.
Levels with fewer than 3 cells are ignored, and if no level has 3 or more cells, all returned correlations will be \code{NA}.
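
The blocked procedure can be sketched in base R as shown here, following the rules stated above: independent shuffles within each level, levels with fewer than 3 cells dropped, and a weighted average of the per-level correlations.
The helper \code{blockNullSketch} is hypothetical and is only meant to illustrate the logic, not to reproduce the compiled code in \pkg{scran}.
\preformatted{
# Hypothetical sketch of the blocked null distribution (not part of scran).
blockNullSketch <- function(block, iters, equiweight = TRUE) {
    by.level <- split(seq_along(block), block)
    sizes <- lengths(by.level)
    keep <- sizes >= 3                      # levels with < 3 cells are ignored
    if (!any(keep)) {
        return(rep(NA_real_, iters))        # no usable level: all NA
    }
    sizes <- sizes[keep]
    # Equal weights, or weights proportional to the number of cells.
    weights <- if (equiweight) rep(1, length(sizes)) else sizes
    out <- numeric(iters)
    for (i in seq_len(iters)) {
        # One null correlation per level, from shuffled ranks in that level.
        rhos <- vapply(sizes, function(n) {
            r <- seq_len(n)
            cor(r, sample(r))
        }, numeric(1))
        out[i] <- sum(rhos * weights) / sum(weights)
    }
    sort(out)
}

set.seed(0)
block <- sample(LETTERS[1:3], 100, replace = TRUE)
null.dist <- blockNullSketch(block, 1000)
}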

If \code{design} is specified, the same process is performed on ranks derived from simulated residuals computed by fitting the linear model to a vector of normally distributed values.
If there are fewer than 3 residual degrees of freedom, all returned correlations will be \code{NA}.
The \code{design} argument cannot be used at the same time as \code{block}.
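
A rough base R sketch of the \code{design} case is shown here.
It assumes that, in each iteration, two independent vectors of normal values are simulated, residualized against \code{design} with a QR decomposition, ranked, and correlated.
This is an assumption made for illustration; the compiled implementation in \pkg{scran} may differ, and the helper \code{designNullSketch} is hypothetical.
\preformatted{
# Hypothetical sketch of the design-matrix null (not part of scran).
# Assumes two independent simulated residual vectors per iteration.
designNullSketch <- function(design, iters) {
    qr.out <- qr(design)
    resid.df <- nrow(design) - qr.out$rank
    if (resid.df < 3) {
        return(rep(NA_real_, iters))        # too few residual d.f.: all NA
    }
    out <- numeric(iters)
    for (i in seq_len(iters)) {
        # Residuals after fitting the linear model to normal values.
        r1 <- qr.resid(qr.out, rnorm(nrow(design)))
        r2 <- qr.resid(qr.out, rnorm(nrow(design)))
        # Spearman's rho, computed as Pearson's correlation of the ranks.
        out[i] <- cor(rank(r1), rank(r2))
    }
    sort(out)
}

set.seed(0)
covariate <- runif(100)
X <- model.matrix(~covariate)
null.dist <- designNullSketch(X, 1000)
}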

% Yeah, we could use a t-distribution for this, but the empirical distribution is probably more robust if you have few cells (or effects, after batch correction).
}
\examples{
set.seed(0)
ncells <- 100

# Simplest case:
null.dist <- correlateNull(ncells, iters=10000)
hist(null.dist)

# With a blocking factor:
block <- sample(LETTERS[1:3], ncells, replace=TRUE)
null.dist <- correlateNull(block=block, iters=10000)
hist(null.dist)

# With a design matrix:
cov <- runif(ncells)
X <- model.matrix(~cov)
null.dist <- correlateNull(design=X, iters=10000)
hist(null.dist)

}
\seealso{
\code{\link{correlatePairs}}, where the null distribution is used to compute p-values.
}
\author{
Aaron Lun
}