\name{qn.test.combined}
\alias{qn.test.combined}
\title{
Combined Rank Score k-Sample Tests
}
\description{
This function combines several independent rank score \eqn{k}-sample tests
into one overall test of the hypothesis that the independent samples within 
each block come from a common unspecified distribution, while the common
distributions may vary from block to block.  
}
\usage{
qn.test.combined(\dots, data = NULL, test = c("KW", "vdW", "NS"),
	method = c("asymptotic", "simulated", "exact"),
	dist = FALSE, Nsim = 10000)
}
\arguments{
  	\item{\dots}{
		Either a sequence of several lists, say \eqn{L_1, \ldots, L_M} (\eqn{M > 1})
		where list \eqn{L_i} contains \eqn{k_i > 1}  sample vectors of respective 
		sizes \eqn{n_{i1}, \ldots, n_{ik_i}}, 
		where \eqn{n_{ij} > 4} is recommended
		for reasonable asymptotic \eqn{P}-value calculation. 
		\eqn{N_i=n_{i1}+\ldots+n_{ik_i}}
		is the pooled sample size for block \eqn{i},

		or a list of such lists,

		or a formula, like \code{y ~ g | b}, where \code{y} is a numeric response vector,
		\code{g} is a factor with levels indicating different treatments and
		\code{b} is a factor indicating different blocks; \code{y}, \code{g} and \code{b}
		must have the same length. 
		\code{y} is split separately for each block level into separate samples 
		according to the \code{g} levels. The same \code{g} level may occur in different 
		blocks. The variable names may correspond to variables in an optionally 
		supplied data frame via the \code{data =} argument.
	}
    	\item{data}{= an optional data frame providing the variables in formula input
		}
	\item{test}{= \code{c("KW", "vdW", "NS")}, 

		where \code{"KW"} uses scores \code{1:N} (Kruskal-Wallis test)

		\code{"vdW"} uses van der Waerden scores, \code{qnorm( (1:N) / (N+1) )}

		\code{"NS"} uses normal scores, i.e., expected values of 
		standard normal order statistics,
		invoking function \code{normOrder} of \code{package SuppDists (>=1.1-9.4)}

		For the above scores \eqn{N} changes from block to block and represents the total 
		pooled sample size \eqn{N_i}{N.i} for block \eqn{i}. 
	}
    \item{method}{=\code{c("asymptotic","simulated","exact")}, where

		\code{"asymptotic"} uses only an asymptotic chi-square approximation for the  \eqn{P}-value.
		The adequacy of asymptotic \eqn{P}-values for use with moderate sample sizes
		may be checked with \code{method = "simulated"}.

		\code{"simulated"} uses simulation to get \code{Nsim} simulated \eqn{QN} statistics for each
		block of samples, adding them componentwise across blocks to get \code{Nsim} 
		combined values, and compares these with the observed combined value to 
		get the estimated \eqn{P}-value.

		\code{"exact"} uses full enumeration of the test statistic value 
		for all sample splits of the pooled samples within each block.
		The test statistic vectors for each block are added 
		(each component against each component, as in the R \code{outer(x,y,} \code{"+")} command) 
		to get the convolution enumeration for the combined test statistic.
		This "addition" is done one block at a time. 
		It is possible only for small problems, and is attempted only when \code{Nsim}
		is at least the (conservatively maximal) length 
		\deqn{\frac{N_1!}{n_{11}!\ldots n_{1k_1}!}\times\ldots\times \frac{N_M!}{n_{M1}!\ldots n_{Mk_M}!}}
		of the final distribution vector, where \eqn{N_i = n_{i1}+\ldots+n_{ik_i}} 
		is the sample size of the pooled samples for the i-th block. Otherwise, it reverts to the 
		simulation method using the provided \code{Nsim}.
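
		As a rough illustration of this rule, take the sample sizes used in the 
		Examples section, \eqn{(5, 6, 5)} in block 1 and \eqn{(6, 5)} in block 2. 
		The length of the final distribution vector would then be
		\deqn{\frac{16!}{5!\,6!\,5!}\times\frac{11!}{6!\,5!} = 2018016 \times 462 = 932323392,}{16!/(5! 6! 5!) * 11!/(6! 5!) = 2018016 * 462 = 932323392,}
		which far exceeds the default \code{Nsim = 10000}, so that \code{method = "exact"} 
		would be expected to revert to simulation for data of that size.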
	}
	\item{dist}{\code{FALSE} (default) or \code{TRUE}. If \code{TRUE}, the 
		simulated or fully enumerated convolution vector \code{null.dist} is returned for the 
		\eqn{QN} test statistic. 

		Otherwise, \code{NULL} is returned.
	}
	\item{Nsim}{\code{= 10000} (default), number of simulation splits to use within 
		each block of samples. It is only used when \code{method =} \code{"simulated"}
		or when \code{method =} \code{"exact"} reverts to \code{method =} \code{"simulated"}, 
		as previously explained.
		Simulations are independent across blocks, using \code{Nsim} for each block.
	}

}
\details{
The rank score \eqn{QN} criterion \eqn{QN_i}{QN.i} for the \eqn{i}-th block of 
\eqn{k_i}{k.i} samples, 
is used to test the hypothesis that the samples in the \eqn{i}-th block all come 
from the same but unspecified continuous distribution function \eqn{F_i(x)}{F.i(x)}.
See \code{\link{qn.test}} for the definition of the \eqn{QN} criterion
and the exact calculation of its null distribution.

The combined \eqn{QN} criterion \eqn{QN_{\rm comb} = QN_1 + \ldots + QN_M}{%
QN.comb = QN.1 + \ldots + QN.M}
is used to test simultaneously whether, for each block \eqn{i}, the samples 
in that block come from the same continuous distribution function \eqn{F_i(x)}{F.i(x)}. 
However, the unspecified common distribution function \eqn{F_i(x)}{F.i(x)} may change 
from block to block. 

The number \eqn{k_i}{k.i} of independent samples may change from block to block.

The asymptotic approximating chi-square distribution has 
\eqn{f = (k_1-1)+\ldots+(k_M-1)}{f = (k.1-1)+\ldots+(k.M-1)} degrees of freedom.
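
As a small worked case, for \eqn{M = 2} blocks with \eqn{k_1 = 3}{k.1 = 3} and 
\eqn{k_2 = 2}{k.2 = 2} samples (the configuration used in the Examples section) 
this gives
\deqn{f = (3-1)+(2-1) = 3.}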

NA values are removed and the user is alerted with the total NA count.
It is up to the user to judge whether the removal of NA's is appropriate.

The continuity assumption can be dispensed with if we deal with 
independent random samples, or if randomization was used in allocating
subjects to samples or treatments, independently from block to block, and if
the asymptotic, simulated or exact \eqn{P}-values are viewed conditionally, given the tie patterns
within each block. Under such randomization any conclusions 
are valid only with respect to the blocks of subjects that were randomly allocated.
In case of ties the average rank scores are used across tied observations within each block.
}
\value{
A list of class \code{kSamples} with components 
\item{test.name}{\code{"Kruskal-Wallis"}, \code{"van der Waerden scores"}, or

\code{"normal scores"}}
\item{M}{number of blocks of samples being compared}
\item{n.samples}{list of \code{M} vectors, each vector giving the sample sizes for 
each block of samples being compared}
\item{nt}{vector of length \code{M} of total sample sizes involved in each of the 
\code{M} comparisons of \eqn{k_i}{k.i} samples, respectively}
\item{n.ties}{vector giving the number of ties in each of the \code{M}
comparison blocks}
\item{qn.list}{list of \code{M} matrices giving the \code{qn} results 
from \code{qn.test}, applied to the samples in each of
the \code{M} blocks}
\item{qn.c}{vector of length 2 (or 3) containing the observed 
\eqn{QN_{\rm comb}}{QN.comb}, asymptotic \eqn{P}-value, 
(simulated or exact \eqn{P}-value).}
\item{warning}{logical indicator, \code{warning = TRUE} when at least one 
\eqn{n_{ij} < 5}{n.ij < 5}.}
\item{null.dist}{simulated or enumerated null distribution of the 
\eqn{QN_{\rm comb}}{QN.comb} statistic.
It is \code{NULL} when \code{dist = FALSE} or when \code{method = "asymptotic"}.}
\item{method}{The \code{method} used.}
\item{Nsim}{The number of simulations used for each block of samples.}
}
\references{
Lehmann, E.L. (2006), \emph{Nonparametrics: Statistical Methods Based on Ranks}, Springer Verlag, New York. Ch. 6, Sec. 5D.
}

\note{
These tests are useful in analyzing treatment effects of shift nature in randomized 
(incomplete) block experiments.
}
\seealso{
\code{\link{qn.test}}
}
\examples{
## Create two lists of sample vectors.
x1 <- list( c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11) )
x2 <- list( c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60) )
# and a corresponding data frame datx1x2
x1x2 <- c(unlist(x1),unlist(x2))
gx1x2 <- as.factor(c(rep(1,5),rep(2,6),rep(3,5),rep(1,6),rep(2,5)))
bx1x2 <- as.factor(c(rep(1,16),rep(2,11)))
datx1x2 <- data.frame(A = x1x2, G = gx1x2, B = bx1x2)

## Run qn.test.combined.
set.seed(2627)
qn.test.combined(x1, x2, method = "simulated", Nsim = 1000) 
# or with same seed
# qn.test.combined(list(x1, x2), method = "simulated", Nsim = 1000)
# or qn.test.combined(A~G|B,data=datx1x2,method="simulated",Nsim=1000)
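
## A hedged sketch in base R only, not part of the kSamples package:
## how large would the fully enumerated distribution be for the sample
## sizes above, and how many degrees of freedom does the chi-square
## approximation use? The names block.sizes and n.splits are illustrative
## assumptions, not kSamples objects.
block.sizes <- list(c(5, 6, 5), c(6, 5))   # sample sizes in x1 and x2

## Multinomial coefficient N!/(n_1! ... n_k!), computed on the log scale
## and rounded back to an integer-valued count.
n.splits <- function(n) round(exp(lfactorial(sum(n)) - sum(lfactorial(n))))

splits <- sapply(block.sizes, n.splits)    # sample splits per block
prod(splits)                               # (conservatively maximal) length
prod(splits) > 10000                       # compare with the default Nsim

## Degrees of freedom f = (k_1 - 1) + ... + (k_M - 1) from the Details.
f <- sum(sapply(block.sizes, length) - 1)
f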
}



\keyword{nonparametric}
\keyword{htest}
\keyword{design}