1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
|
% $Author: sinnwell $
% $Date: 2008/04/14 19:54:34 $
% $Header: /people/biostat3/sinnwell/Haplo/Make/RCS/haplo.score.slide.Rd,v 1.3 2008/04/14 19:54:34 sinnwell Exp $
% $Locker: $
% $Log: haplo.score.slide.Rd,v $
% Revision 1.3 2008/04/14 19:54:34 sinnwell
% change eps.svd note
%
% Revision 1.2 2008/04/10 14:36:49 sinnwell
% add eps.svd
%
% Revision 1.1 2008/01/09 19:38:58 sinnwell
% Initial revision
%
%Revision 1.11 2007/04/12 21:52:57 sinnwell
%add \s-arg
%Revision 1.10 2007/01/25 22:45:26 sinnwell
%add haplo.effect and min.count
%
%Revision 1.9 2005/03/03 20:50:55 sinnwell
%change default for skip.haplo
%
%Revision 1.8 2004/03/24 14:40:26 sinnwell
%comment out examples, except data setup, so users may uncomment and run
%
%Revision 1.7 2004/03/22 20:17:58 sinnwell
%shorten example
%
%Revision 1.6 2004/03/16 22:27:33 sinnwell
%put plot examples here to consolidate running the function
%
%Revision 1.5 2004/03/03 15:40:18 sinnwell
%add labels in example function calls
%
%Revision 1.4 2004/03/02 20:27:28 sinnwell
%binomial to ordinal in second example call
%Revision 1.3 2004/03/01 22:46:41 sinnwell
%fix examples
%
%Revision 1.2 2004/02/16 17:32:06 sinnwell
%change F to FALSE
%
%Revision 1.1 2003/08/22 20:04:58 sinnwell
%Initial revision
\name{haplo.score.slide}
\alias{haplo.score.slide}
\title{
Score Statistics for Association of Traits with Haplotypes
}
\description{
Used to identify sub-haplotypes from a group of loci. Run haplo.score
on all contiguous subsets of size n.slide from the loci in a genotype
matrix (geno). From each call to haplo.score, report the global score
statistic p-value. Can also report global and maximum score statistics
simulated p-values.
}
\usage{
haplo.score.slide(y, geno, trait.type="gaussian", n.slide=2,
offset = NA, x.adj = NA, min.count=5,
skip.haplo=min.count/(2*nrow(geno)),
locus.label=NA, miss.val=c(0,NA),
haplo.effect="additive", eps.svd=1e-5,
simulate=FALSE, sim.control=score.sim.control(),
em.control=haplo.em.control())
}
\arguments{
\item{y}{
Vector of trait values. For trait.type = "binomial", y must
have values of 1 for event, 0 for no event.
}
\item{geno}{
Matrix of alleles, such that each locus has a pair of
adjacent columns of alleles, and the order of columns
corresponds to the order of loci on a chromosome. If
there are K loci, then ncol(geno) = 2*K. Rows represent
alleles for each subject.
}
\item{trait.type }{
Character string defining type of trait, with values of
"gaussian", "binomial", "poisson", "ordinal".
}
\item{n.slide }{
Number of loci in each contiguous subset. The first subset is the ordered
loci numbered 1 to n.slide, the second subset is 2 through n.slide+1
and so on. If the total number of loci in geno is n.loci, then there
are n.loci - n.slide + 1 total subsets.
}
\item{offset}{
Vector of offset when trait.type = "poisson"
}
\item{x.adj }{
Matrix of non-genetic covariates used to adjust the score
statistics. Note that intercept should not be included,
as it will be added in this function.
}
\item{min.count }{
The minimum number of counts for a haplotype to be included in the
model. First, the haplotypes selected to score are chosen by minimum
frequency greater than skip.haplo (based on min.count, by default).
It is also used when haplo.effect is either dominant or
recessive. This is explained best in the recessive instance, where
only subjects who are homozygous for a haplotype will contribute
information to the score for that haplotype. If fewer than min.count
subjects are estimated to be affected by that haplotype, it is not
scored. A warning is issued if no haplotypes can be scored.
}
\item{skip.haplo }{
For haplotypes with frequencies < skip.haplo, categorize them into a common
group of rare haplotypes.
}
\item{locus.label }{
Vector of labels for loci, of length K (see definition of geno matrix).
}
\item{miss.val }{
Vector of codes for missing values of alleles.
}
\item{haplo.effect }{
The "effect" pattern of haplotypes on the response. This parameter
determines the coding for scoring the haplotypes.
Valid coding options for heterozygous and homozygous carriers of a
haplotype are "additive" (1, 2, respectively), "dominant" (1,1,
respectively), and "recessive" (0, 1, respectively).
}
\item{eps.svd}{
epsilon value for singular value cutoff; to be used in the generalized
inverse calculation on the variance matrix of the score vector.
}
\item{simulate}{
Logical, if [F]alse (default) no empirical p-values are computed.
If [T]rue simulations are performed. Specific simulation parameters
can be controlled in the sim.control parameter list.
}
\item{sim.control }{
A list of control parameters used to perform simulations for simulated
p-values in haplo.score. The list is created by the function
score.sim.control and the default values of this function can be
changed as desired.
}
\item{em.control }{
A list of control parameters used to perform the em algorithm for
estimating haplotype frequencies when phase is unknown. The list is
created by the function haplo.em.control and the default values of
this function can be changed as desired.
}
}
\value{
List with the following components:
\item{df}{
Data frame with start locus, global p-value, simulated global p-value,
and simulated maximum-score p-value.
}
\item{n.loci}{
Number of loci given in the genotype matrix.
}
\item{simulate}{
Same as parameter description above.
}
\item{haplo.effect}{
The haplotype effect model parameter that was selected for haplo.score.
}
\item{n.slide}{
Same as parameter description above.
}
\item{locus.label}{
Same as parameter description above.
}
\item{n.val.haplo}{
Vector containing the number of valid simulations used in the
maximum-score statistic p-value simulation. The number of valid simulations
can be less than the number of simulations requested (by sim.control)
if simulated data sets produce unstable variables of the score
statistics.
}
\item{n.val.global}{
Vector containing the number of valid simulations used in the
global score statistic p-value simulation.
}
}
\section{Side Effects}{
}
\details{
Haplo.score.slide is useful for a series of loci where little is
known of the association between a trait and haplotypes. Using a
range of n.slide values, the region with the strongest association
will consistently have low p-values for locus subsets containing the
associated haplotypes. The global p-value measures significance
of the entire set of haplotypes for the locus subset. Simulated
maximum score statistic p-values indicate when one or a few haplotypes
are associated with the trait.
}
\section{References}{
Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA.
"Score tests for association of traits with haplotypes when
linkage phase is ambiguous." Amer J Hum Genet. 70 (2002): 425-434.
}
\seealso{
\code{\link{haplo.score}},
\code{\link{plot.haplo.score.slide}},
\code{\link{score.sim.control}}
}
\examples{
setupData(hla.demo)
# Continuous trait slide by 2 loci on all 11 loci, uncomment to run it.
# Takes > 20 minutes to run
# geno.11 <- hla.demo[,-c(1:4)]
# label.11 <- c("DPB","DPA","DMA","DMB","TAP1","TAP2","DQB","DQA","DRB","B","A")
# slide.gaus <- haplo.score.slide(hla.demo$resp, geno.11, trait.type = "gaussian",
# locus.label=label.11, n.slide=2)
# print(slide.gaus)
# plot(slide.gaus)
# Run shortened example on 9 loci
# For an ordinal trait, slide by 3 loci, and simulate p-values:
# geno.9 <- hla.demo[,-c(1:6,15,16)]
# label.9 <- c("DPA","DMA","DMB","TAP1","DQB","DQA","DRB","B","A")
# y.ord <- as.numeric(hla.demo$resp.cat)
# data is set up, to run, run these lines of code on the data that was
# set up in this example. It takes > 15 minutes to run
# slide.ord.sim <- haplo.score.slide(y.ord, geno.9, trait.type = "ordinal",
# n.slide=3, locus.label=label.9, simulate=TRUE,
# sim.control=score.sim.control(min.sim=200, max.sim=500))
# note, results will vary due to simulations
# print(slide.ord.sim)
# plot(slide.ord.sim)
# plot(slide.ord.sim, pval="global.sim")
# plot(slide.ord.sim, pval="max.sim")
}
\keyword{}
% docclass is function
% Converted by Sd2Rd version 37351.
|