% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/statistic-ora.R
\name{run_ora}
\alias{run_ora}
\title{Over Representation Analysis (ORA)}
\usage{
run_ora(
  mat,
  network,
  .source = source,
  .target = target,
  n_up = ceiling(0.05 * nrow(mat)),
  n_bottom = 0,
  n_background = 20000,
  with_ties = TRUE,
  seed = 42,
  minsize = 5,
  ...
)
}
\arguments{
\item{mat}{Matrix to evaluate (e.g. expression matrix).
Target nodes in rows and conditions in columns.
\code{rownames(mat)} must have at least one intersection with the elements
in \code{network} \code{.target} column.}

\item{network}{Tibble or data frame with edges and their associated metadata.}

\item{.source}{Column with source nodes.}

\item{.target}{Column with target nodes.}

\item{n_up}{Integer indicating the number of top-ranked targets (highest values)
to select from each column of \code{mat}.}

\item{n_bottom}{Integer indicating the number of bottom-ranked targets (lowest
values) to select from each column of \code{mat}.}

\item{n_background}{Integer indicating the size of the background (universe) of
targets used to build the contingency table. If not specified, the background
size is determined by the total number of unique targets in the union of
\code{mat} and \code{network}.}

\item{with_ties}{Should ties be kept together? The default, \code{TRUE},
may return more rows than you request. Use \code{FALSE} to ignore ties,
and return the first \code{n} rows.}

\item{seed}{A single value, interpreted as an integer, or NULL for random
number generation.}

\item{minsize}{Integer indicating the minimum number of targets per source.}

\item{...}{
  Arguments passed on to \code{\link[stats:fisher.test]{stats::fisher.test}}
  \describe{
    \item{\code{workspace}}{an integer specifying the size of the workspace
    used in the network algorithm.  In units of 4 bytes.  Only used for
    non-simulated p-values larger than \eqn{2 \times 2}{2 by 2} tables.
    Since \R version 3.5.0, this also increases the internal stack size
    which allows larger problems to be solved, however sometimes needing
    hours.  In such cases, \code{simulate.p.value=TRUE} may be more
    reasonable.}
    \item{\code{hybrid}}{a logical. Only used for larger than \eqn{2 \times 2}{2 by 2}
    tables, in which cases it indicates whether the exact probabilities
    (default) or a hybrid approximation thereof should be computed.}
    \item{\code{hybridPars}}{a numeric vector of length 3, by default describing
    \dQuote{Cochran's conditions} for the validity of the chi-squared
    approximation; see \sQuote{Details} in \code{\link[stats]{fisher.test}}.}
    \item{\code{control}}{a list with named components for low level algorithm
    control.  At present the only one used is \code{"mult"}, a positive
    integer \eqn{\ge 2} with default 30 used only for larger than
    \eqn{2 \times 2}{2 by 2} tables.  This says how many times as much
    space should be allocated to paths as to keys: see file
    \file{fexact.c} in the sources of the \pkg{stats} package.}
    \item{\code{or}}{the hypothesized odds ratio.  Only used in the
    \eqn{2 \times 2}{2 by 2} case.}
    \item{\code{alternative}}{indicates the alternative hypothesis and must be
    one of \code{"two.sided"}, \code{"greater"} or \code{"less"}.
    You can specify just the initial letter.  Only used in the
    \eqn{2 \times 2}{2 by 2} case.}
    \item{\code{conf.int}}{logical indicating if a confidence interval for the
    odds ratio in a \eqn{2 \times 2}{2 by 2} table should be
    computed (and returned).}
    \item{\code{conf.level}}{confidence level for the returned confidence
    interval.  Only used in the \eqn{2 \times 2}{2 by 2} case and if
    \code{conf.int = TRUE}.}
    \item{\code{simulate.p.value}}{a logical indicating whether to compute
    p-values by Monte Carlo simulation, in larger than
    \eqn{2 \times 2}{2 by 2} tables.}
    \item{\code{B}}{an integer specifying the number of replicates used in the
    Monte Carlo test.}
  }}
}
\value{
A long format tibble of the enrichment scores for each source
across the samples. Resulting tibble contains the following columns:
\enumerate{
\item \code{statistic}: Indicates which method is associated with which score.
\item \code{source}: Source nodes of \code{network}.
\item \code{condition}: Condition representing each column of \code{mat}.
\item \code{score}: Regulatory activity (enrichment score).
}
}
\description{
Calculates regulatory activities using ORA.
}
\details{
ORA measures the overlap between a regulator's target feature set and a list of
the most altered molecular features in \code{mat}. The most altered features
can be selected from the top and/or bottom of the molecular readout
distribution; by default, the top 5\% of features (highest values) are used
(\code{n_up = ceiling(0.05 * nrow(mat))}, \code{n_bottom = 0}). With these, a
contingency table is built and a one-tailed Fisher's exact test is computed to
determine whether a regulator's set of features is over-represented in the
selected features from the data. The resulting score, \code{ora}, is the
negative log10 of the obtained p-value.
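
The computation for a single source in a single condition can be sketched in
base R as below. This is an illustration only, not the internal code of the
package: it assumes the 2-by-2 table is filled with the overlap counts between
the selected features and the source targets, with the remaining cell derived
from \code{n_background}. The helper \code{ora_score()} and its arguments are
hypothetical.

\preformatted{
# Hypothetical helper (sketch): ORA score of one source in one condition.
# selected: features taken from the top/bottom of one column of mat
# targets:  target features of one source in the network
ora_score <- function(selected, targets, n_background) {
  tp <- length(intersect(selected, targets))  # selected and in the set
  fp <- length(setdiff(selected, targets))    # selected, not in the set
  fn <- length(setdiff(targets, selected))    # in the set, not selected
  tn <- n_background - tp - fp - fn           # neither (assumed >= 0)
  # Rows: selected / not selected; columns: in set / not in set
  tab <- matrix(c(tp, fn, fp, tn), nrow = 2)
  # One-tailed Fisher's exact test for over-representation
  p <- stats::fisher.test(tab, alternative = "greater")$p.value
  -log10(p)
}

ora_score(c("g1", "g2", "g3", "g4", "g5"), c("g1", "g2", "g3", "g9"),
          n_background = 100)
}

In this sketch, a source whose targets do not overlap the selected features
gets a p-value of 1 and therefore a score of 0.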
}
\examples{
inputs_dir <- system.file("testdata", "inputs", package = "decoupleR")

mat <- readRDS(file.path(inputs_dir, "mat.rds"))
net <- readRDS(file.path(inputs_dir, "net.rds"))

run_ora(mat, net, minsize=0)
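
## Illustrative sketch only (not the internal code of the package): one way
## the top `n_up` and bottom `n_bottom` features of a single condition could
## be selected, keeping ties together as with `with_ties = TRUE`. The helper
## `select_features()` is hypothetical and ignores the `seed` argument.
select_features <- function(x, n_up, n_bottom = 0, with_ties = TRUE) {
  pick <- function(v, n) {
    if (n <= 0) return(character(0))
    n <- min(n, length(v))
    ord <- order(v, decreasing = TRUE)
    if (with_ties) {
      cutoff <- v[ord[n]]          # value of the n-th ranked feature
      names(v)[v >= cutoff]        # keep every feature tied with it
    } else {
      names(v)[ord[seq_len(n)]]    # exactly n features
    }
  }
  # Bottom features are the top features of the negated values
  union(pick(x, n_up), pick(-x, n_bottom))
}

x <- c(a = 5, b = 3, c = 3, d = 1, e = -2)
select_features(x, n_up = 2)                    # "a" "b" "c" (tie kept)
select_features(x, n_up = 2, with_ties = FALSE) # two features only
select_features(x, n_up = 1, n_bottom = 1)      # "a" "e"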
}
\seealso{
Other decoupleR statistics: 
\code{\link{decouple}()},
\code{\link{run_aucell}()},
\code{\link{run_fgsea}()},
\code{\link{run_gsva}()},
\code{\link{run_mdt}()},
\code{\link{run_mlm}()},
\code{\link{run_udt}()},
\code{\link{run_ulm}()},
\code{\link{run_viper}()},
\code{\link{run_wmean}()},
\code{\link{run_wsum}()}
}
\concept{decoupleR statistics}