% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/getTopHVGs.R
\name{getTopHVGs}
\alias{getTopHVGs}
\title{Identify HVGs}
\usage{
getTopHVGs(
  stats,
  var.field = "bio",
  n = NULL,
  prop = NULL,
  var.threshold = 0,
  fdr.field = "FDR",
  fdr.threshold = NULL,
  row.names = !is.null(rownames(stats))
)
}
\arguments{
\item{stats}{A \linkS4class{DataFrame} of variance modelling statistics with one row per gene.
Alternatively, a \linkS4class{SummarizedExperiment} object, in which case it is supplied to \code{\link{modelGeneVar}} to generate the required DataFrame.}
\item{var.field}{String specifying the column of \code{stats} containing the relevant metric of variation.}
\item{n}{Integer scalar specifying the number of top HVGs to report.}
\item{prop}{Numeric scalar specifying the proportion of genes to report as HVGs.}
\item{var.threshold}{Numeric scalar specifying the minimum threshold on the metric of variation.}
\item{fdr.field}{String specifying the column of \code{stats} containing the adjusted p-values.
If \code{NULL}, no filtering is performed on the FDR.}
\item{fdr.threshold}{Numeric scalar specifying the FDR threshold.}
\item{row.names}{Logical scalar indicating whether row names should be reported.}
}
\value{
If \code{row.names=TRUE}, a character vector containing the row names of the most variable genes.
Otherwise, an integer vector containing the row indices of \code{stats} corresponding to the most variable genes.
}
\description{
Define a set of highly variable genes, based on variance modelling statistics
from \code{\link{modelGeneVar}} or related functions.
}
\details{
This function will identify all genes where the relevant metric of variation is greater than \code{var.threshold}.
By default, this means that we retain all genes with positive values in the \code{var.field} column of \code{stats}.
If \code{var.threshold=NULL}, the minimum threshold on the value of the metric is not applied.
If \code{fdr.threshold} is specified, we further subset to genes with an FDR less than or equal to \code{fdr.threshold}.
By default, FDR thresholding is turned off as \code{\link{modelGeneVar}} and related functions
determine significance of large variances \emph{relative} to other genes.
This can be overly conservative if many genes are highly variable.
If \code{n=NULL} and \code{prop=NULL}, the resulting subset of genes is directly returned.
Otherwise, only the genes in this subset with the largest values of the variance metric are returned,
where the number of genes to return is defined as the larger of \code{n} and \code{prop*nrow(stats)}.
}
\examples{
library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)
stats <- modelGeneVar(sce)
str(getTopHVGs(stats))
str(getTopHVGs(stats, fdr.threshold=0.05)) # more stringent
# Or directly pass in the SingleCellExperiment:
str(getTopHVGs(sce))
# Alternatively, use with the squared coefficient of variation:
stats2 <- modelGeneCV2(sce)
str(getTopHVGs(stats2, var.field="ratio"))
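# Illustrative sketch of the 'n', 'prop' and 'row.names' arguments
# described above. The values 100 and 0.1 are arbitrary choices for this
# mock dataset, not recommendations. As thresholding on 'var.threshold'
# still applies, fewer genes than requested may be returned.
top.n <- getTopHVGs(stats, n=100) # up to 100 genes
length(top.n)
top.prop <- getTopHVGs(stats, prop=0.1) # up to 10 percent of all genes
length(top.prop)
# Report row indices of 'stats' instead of gene names:
str(getTopHVGs(stats, n=100, row.names=FALSE))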
}
\seealso{
\code{\link{modelGeneVar}} and friends, to generate \code{stats}.
\code{\link{modelGeneCV2}} and friends, to also generate \code{stats}.
}
\author{
Aaron Lun
}