% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/differential_expression.R
\name{diff_mean_test}
\alias{diff_mean_test}
\title{Non-parametric differential expression test for sparse non-negative data}
\usage{
diff_mean_test(
  y,
  group_labels,
  compare = "each_vs_rest",
  R = 99,
  log2FC_th = log2(1.2),
  mean_th = 0.05,
  cells_th = 5,
  only_pos = FALSE,
  only_top_n = NULL,
  mean_type = "geometric",
  verbosity = 1
)
}
\arguments{
\item{y}{A matrix of counts; must be (or inherit from) class dgCMatrix; genes are rows,
cells are columns}

\item{group_labels}{The group labels (e.g. cluster identities); 
will be converted to factor}

\item{compare}{Specifies which groups to compare, see details; default is 'each_vs_rest'}

\item{R}{The number of random permutations used to derive the p-values; default is 99}

\item{log2FC_th}{Threshold to remove genes from testing; absolute log2FC must be at least
this large for a gene to be tested; default is \code{log2(1.2)}}

\item{mean_th}{Threshold to remove genes from testing; gene mean must be at least this
large for a gene to be tested; default is 0.05}

\item{cells_th}{Threshold to remove genes from testing; gene must be detected (non-zero count)
in at least this many cells in the group with higher mean; default is 5}

\item{only_pos}{Test only genes with positive fold change (mean in group 1 > mean in group 2);
default is FALSE}

\item{only_top_n}{Test only this number of genes from both ends of the log2FC spectrum
after all of the above filters have been applied; useful to get only the top markers; 
only used if set to a numeric value; default is NULL}

\item{mean_type}{Which type of mean to use; if \code{'geometric'} (default) the geometric mean is
used; to avoid \code{log(0)}, counts are transformed with \code{log1p} (add 1, then take the log),
the arithmetic mean is calculated, and the result is back-transformed with \code{expm1}
(exponentiate, then subtract 1); if this parameter is set to \code{'arithmetic'} the data is used as is}

\item{verbosity}{Integer controlling how many messages the function prints; 
0 is silent, 1 (default) is not}
}
\value{
Data frame of results
}
\description{
Non-parametric differential expression test for sparse non-negative data
}
\section{Details}{

This model-free test is applied to each gene (row) individually but is
optimized to make use of the efficient sparse data representation of
the input. A permutation null distribution is used to assess the
significance of the observed difference in mean between two groups.

The observed difference in mean is compared against a distribution
obtained by random shuffling of the group labels. For each gene every 
random permutation yields a difference in mean and from the population of
these background differences we estimate a mean and standard
deviation for the null distribution. 
This mean and standard deviation are used to turn the observed
difference in mean into a z-score and then into a p-value. Finally,
all p-values (for the tested genes) are adjusted using the Benjamini & Hochberg
method (fdr). The log2FC values in the output are \code{log2(mean1 / mean2)}.
Empirical p-values are also calculated: \code{emp_pval = (b + 1) / (R + 1)},
where \code{b} is the number of times the absolute difference in mean from a random
permutation is at least as large as the absolute value of the observed difference
in mean, and \code{R} is the number of random permutations. This is an upper bound of
the real empirical p-value that would be obtained by enumerating all possible
group label permutations.
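
The calculation described above can be sketched for a single gene as follows.
This is an illustration only, not the package's internal code: the vectors
\code{x} and \code{g} are made-up data, the arithmetic mean is used for
simplicity, and a two-sided normal p-value is assumed.

\preformatted{
set.seed(1)
# made-up expression values for 40 cells in two groups of 20
x <- c(rpois(20, lambda = 2), rpois(20, lambda = 4))
g <- rep(c(1, 2), each = 20)
R <- 99

# difference in mean between group 1 and group 2
mean_diff <- function(x, g) mean(x[g == 1]) - mean(x[g == 2])
obs <- mean_diff(x, g)

# null distribution: differences in mean after shuffling the group labels
null <- replicate(R, mean_diff(x, sample(g)))

# z-score from the mean and standard deviation of the null distribution
zscore <- (obs - mean(null)) / sd(null)
pval <- 2 * pnorm(-abs(zscore))  # assumption: two-sided p-value

# empirical p-value as defined above
b <- sum(abs(null) >= abs(obs))
emp_pval <- (b + 1) / (R + 1)
}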

There are multiple ways to specify the group comparisons via the \code{compare}
parameter. The default, \code{'each_vs_rest'}, does multiple comparisons, one per
group vs all remaining cells. \code{'all_vs_all'} also does multiple comparisons,
covering all group pairs. If \code{compare} is set to a length two character vector, e.g.
\code{c('T-cells', 'B-cells')}, one comparison between those two groups is done.
To put multiple groups on either side of a single comparison, use a list of length two,
e.g. \code{compare = list(c('cluster1', 'cluster5'), c('cluster3'))}.
}

\examples{
\donttest{
clustering <- 1:ncol(pbmc) \%\% 2
vst_out <- vst(pbmc, return_corrected_umi = TRUE)
de_res <- diff_mean_test(y = vst_out$umi_corrected, group_labels = clustering)
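
# Sketch: a single comparison between the two groups defined above, using the
# length-two character vector form of 'compare' described in Details; this
# assumes the labels '0' and '1' are the factor levels derived from 'clustering'
de_pair <- diff_mean_test(y = vst_out$umi_corrected, group_labels = clustering,
                          compare = c('0', '1'))

# Toy illustration (made-up vector) of the default geometric mean described
# for 'mean_type': log1p-transform, average, then back-transform with expm1
x <- c(0, 0, 1, 3, 7)
geo_mean <- expm1(mean(log1p(x)))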
}

}