% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/coassignProb.R
\name{coassignProb}
\alias{coassignProb}
\title{Compute coassignment probabilities}
\usage{
coassignProb(ref, alt, summarize = FALSE)
}
\arguments{
\item{ref}{A character vector or factor containing one set of groupings, considered to be the reference.}
\item{alt}{A character vector or factor containing another set of groupings, to be compared to \code{ref}.}
\item{summarize}{Logical scalar indicating whether the output matrix should be converted into a per-label summary.}
}
\value{
If \code{summarize=FALSE}, a numeric matrix is returned with upper triangular entries filled with the coassignment probabilities for each pair of labels in \code{ref}.
Otherwise, a \linkS4class{DataFrame} is returned with one row per label in \code{ref} containing the \code{self} and \code{other} coassignment probabilities.
}
\description{
Compute coassignment probabilities for each label in a reference grouping when compared to an alternative grouping of samples.
This function is now deprecated in favor of \code{\link{pairwiseRand}}.
}
\details{
The coassignment probability for each pair of labels in \code{ref} is the probability that a randomly chosen cell from each of the two reference labels will have the same label in \code{alt}.
A high coassignment probability indicates that cells from a particular pair of labels in \code{ref} are frequently assigned to the same label in \code{alt}, which has some implications for cluster stability.
When \code{summarize=TRUE}, we summarize the matrix of coassignment probabilities into a set of per-label values.
The \dQuote{self} coassignment probability is simply the diagonal entry of the matrix, i.e., the probability that two cells from the same label in \code{ref} also have the same label in \code{alt}.
The \dQuote{other} coassignment probability is the maximum probability across all pairs involving that label and a different label.
% One might consider instead reporting the 'other' probability as the probability that a randomly chosen cell in the cluster and a randomly chosen cell in any other cluster belong in the same cluster.
% However, this results in very small probabilities in all cases, simply because most of the other clusters are well separated.
% Reporting the maximum is more useful as at least you can tell that a cluster is well-separated from _all_ other clusters if it has a low 'other' probability.
In general, \code{ref} is well-recapitulated by \code{alt} if the diagonal entries of the matrix are much higher than the sum of the off-diagonal entries.
This manifests as higher values for the self probabilities compared to the other probabilities.
Note that the coassignment probability is closely related to the Rand index-based ratios
broken down by cluster pair in \code{\link{pairwiseRand}} with \code{mode="ratio"} and \code{adjusted=FALSE}.
The off-diagonal coassignment probabilities are simply 1 minus the off-diagonal ratio,
while the on-diagonal values differ only by the lack of consideration of pairs of the same cell in \code{\link{pairwiseRand}}.
}
\examples{
library(scuttle)
sce <- mockSCE(ncells=200)
sce <- logNormCounts(sce)
clust1 <- kmeans(t(logcounts(sce)),3)$cluster
clust2 <- kmeans(t(logcounts(sce)),5)$cluster
coassignProb(clust1, clust2)
coassignProb(clust1, clust2, summarize=TRUE)
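
# A hand-rolled sketch of the definition given in the details, for illustration only.
# It is not taken from the package source. The names ref.toy, alt.toy, tab, prop,
# manual and off are made up for this example.
# Assumption: the two cells are drawn independently, one from each reference label,
# so the same cell may be drawn twice when both labels are the same.
ref.toy <- c("A", "A", "A", "A", "B", "B", "B", "B")
alt.toy <- c("x", "x", "x", "y", "y", "y", "y", "y")
tab <- table(ref.toy, alt.toy)

# Within each reference label, the proportion of cells falling in each alt label.
prop <- unclass(tab / rowSums(tab))

# Entry [a, b] is the sum over alt labels k of prop[a, k] * prop[b, k].
manual <- tcrossprod(prop)
manual

# Per-label summary, assuming that the other value is the maximum over pairs
# with a different reference label.
off <- manual
diag(off) <- NA
data.frame(self = diag(manual), other = apply(off, 1, max, na.rm = TRUE))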
}
\seealso{
\code{\link{bootstrapCluster}}, to compute coassignment probabilities across bootstrap replicates.
\code{\link{pairwiseRand}}, for another way to compare different clusterings.
}
\author{
Aaron Lun
}