% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DeMixT_GS.R
\name{DeMixT_GS}
\alias{DeMixT_GS}
\title{Estimates the proportions of mixed samples for each mixing component using
profile likelihood gene selection}
\usage{
DeMixT_GS(
  data.Y,
  data.N1,
  data.N2 = NULL,
  niter = 10,
  nbin = 50,
  if.filter = TRUE,
  filter.sd = 0.5,
  ngene.Profile.selected = NA,
  ngene.selected.for.pi = NA,
  mean.diff.in.CM = 0.25,
  nspikein = min(200, ceiling(ncol(data.Y) * 0.3)),
  tol = 10^(-5),
  pi01 = NULL,
  pi02 = NULL,
  nthread = parallel::detectCores() - 1
)
}
\arguments{
\item{data.Y}{A SummarizedExperiment object of expression data from mixed
tumor samples. It is a \eqn{G} by \eqn{My} matrix where \eqn{G} is the
number of genes and \eqn{My} is the number of mixed samples. Samples with
the same tissue type should be placed together in columns.}

\item{data.N1}{A SummarizedExperiment object of expression data from
reference component 1 (e.g., normal). It is a \eqn{G} by \eqn{M1} matrix
where \eqn{G} is the number of genes and \eqn{M1} is the number of samples
for component 1.}

\item{data.N2}{A SummarizedExperiment object of expression data from
additional reference samples. It is a \eqn{G} by \eqn{M2} matrix where
\eqn{G} is the number of genes and \eqn{M2} is the number of samples for
component 2. Component 2 is needed only for running a three-component model.}

\item{niter}{The maximum number of iterations used in the iterated
conditional modes algorithm. A larger value makes convergence of the
estimates more likely but increases the running time. The default is 10.}

\item{nbin}{The number of bins used in numerical integration for computing
the complete likelihood. A larger value increases the accuracy of estimation
but also the running time, especially in a three-component deconvolution
problem. The default is 50.}

\item{if.filter}{The logical flag indicating whether a predetermined filter
rule is used to select genes for proportion estimation. The default is TRUE.}

\item{filter.sd}{The cut-off for the standard deviation of the lognormal
distribution. Genes whose log-transformed standard deviation is smaller than
the cut-off will be selected into the model. The default is 0.5.}

\item{ngene.Profile.selected}{The number of genes used for proportion
estimation ranked by profile likelihood. The default is
\eqn{min(1500,0.1*G)}, where \eqn{G} is the number of genes.}

\item{ngene.selected.for.pi}{The percentage or the number of genes used for
proportion estimation. The difference between the expression levels of the
mixed tumor samples and the known component(s) is evaluated, and the most
differentially expressed genes are selected, which is called DE. It is
enabled when if.filter = TRUE. The default is \eqn{min(1500, 0.3*G)}, where
\eqn{G} is the number of genes. Users can also try using more genes, ranging
from \eqn{0.3*G} to \eqn{0.5*G}, and evaluate the outcome.}

\item{mean.diff.in.CM}{Threshold of expression difference for selecting
genes in the component merging strategy. The three-component model is
reduced to a two-component model by selecting genes with similar expression
in the two known components. Genes whose mean difference is less than the
threshold are selected for component merging. It is used in the
three-component setting and is enabled when if.filter = TRUE. The default is
0.25.}

\item{nspikein}{The number of spike-ins from the normal reference used for
proportion estimation. The default value is \eqn{min(200, 0.3*My)}, where
\eqn{My} is the number of mixed samples. If it is set to 0, proportion
estimation is performed without any spike-in normal reference.}

\item{tol}{The convergence criterion. The default is 10^(-5).}

\item{pi01}{Initial proportion for the first known component. The default is
\code{NULL}, in which case pi01 is generated randomly from a uniform
distribution.}

\item{pi02}{Initial proportion for the second known component. pi02 is
needed only for running a three-component model. The default is \code{NULL},
in which case pi02 is generated randomly from a uniform distribution.}

\item{nthread}{The number of threads used for deconvolution when OpenMP is
available in the system. The default is the number of detected cores minus
one. In the version built without OpenMP, it is set to 1.}
}
\value{
\item{pi}{A matrix of estimated proportions. The first and second rows
correspond to the proportion estimates for the known components and the
unknown component, respectively, in the two- or three-component setting, and
each column corresponds to one sample.}
\item{pi.iter}{Estimated proportions in each iteration. It is a
\eqn{niter * My * p} array, where \eqn{p} is the number of components. This
is enabled only when output.more.info = TRUE.}
\item{gene.name}{The names of genes used in estimating the proportions. If
no gene names are provided in the original data set, the genes will be
automatically indexed.}
}
\description{
This function estimates the proportions of each mixing component in all
mixed samples using a profile-likelihood-based gene selection, which selects
the most identifiable genes as the reference gene set to achieve a better
model fit. The Hessian matrix of the parameter space is calculated first and
used to derive the confidence interval of the profile likelihood of each
gene. The length of this confidence interval then serves as a metric to rank
the identifiability of genes. As a result, the proposed gene selection
approach can improve the estimation of tumor-specific transcript proportions.
}
\note{
A Hessian matrix file will be created in the working directory, and the
corresponding Hessian matrix, with a name encoded from the mixed tumor
sample data, will be saved under this file. If a user reruns this function
with the same dataset, this Hessian matrix will be loaded in place of
rerunning the profile likelihood method, which reduces the running time.
}
\examples{


# Example 1: estimate proportions for simulated two-component data 
# with spike-in normal reference
  data(test.data.2comp)
# res.GS = DeMixT_GS(data.Y = test.data.2comp$data.Y, 
#                    data.N1 = test.data.2comp$data.N1,
#                    niter = 10, nbin = 50, nspikein = 50,
#                    if.filter = TRUE, ngene.Profile.selected = 150,
#                    mean.diff.in.CM = 0.25, ngene.selected.for.pi = 150,
#                    tol = 10^(-5))
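#
# Sketch (an illustration, not from the package authors): the arguments
# above describe data.Y and data.N1 as SummarizedExperiment objects, so a
# plain gene-by-sample matrix would need to be wrapped first. The toy
# matrix, its dimnames and the assay name "counts" are hypothetical.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  toy.mat <- matrix(rpois(40, lambda = 100), nrow = 10, ncol = 4,
                    dimnames = list(paste0("gene", 1:10),
                                    paste0("sample", 1:4)))
  toy.se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = toy.mat))
  # Rows are genes (G) and columns are samples (My), as data.Y expects.
  dim(toy.se)
}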
#
# Example 2: estimate proportions for simulated two-component data 
# without spike-in normal reference
# data(test.data.2comp)
# res.GS = DeMixT_GS(data.Y = test.data.2comp$data.Y, 
#                    data.N1 = test.data.2comp$data.N1,
#                    niter = 10, nbin = 50, nspikein = 0,
#                    if.filter = TRUE, ngene.Profile.selected = 150,
#                    mean.diff.in.CM = 0.25, ngene.selected.for.pi = 150,
#                    tol = 10^(-5))
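#
# Sketch (an illustration only): how the default nspikein expression shown
# in the usage section, min(200, ceiling(ncol(data.Y) * 0.3)), behaves for
# a small and a large cohort. The sample counts in n.mixed are hypothetical
# stand-ins for ncol(data.Y); pmin() is used only to vectorise min().
n.mixed <- c(50, 1000)
nspikein.default <- pmin(200, ceiling(n.mixed * 0.3))
# For the large cohort the value is capped at 200.
nspikein.default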



}
\references{
Gene Selection and Identifiability Analysis of RNA Deconvolution
Models using Profile Likelihood. Manuscript in preparation.
}
\seealso{
http://bioinformatics.mdanderson.org/main/DeMixT
}
\author{
Shaolong Cao, Zeya Wang, Wenyi Wang
}
\keyword{DeMixT_GS}