
=================================================================
# HAPLO STATS News File
#
# This file documents software changes up to version 1.3.8
#
# format is as follows
# 
# [change/add]: function.name title for issue
# explanation of issue, status, and recommendations
===================================================
### changes made between releases 1.3.8 and 1.4.1
===================================================

change: seqhap use precision threshold for permutation p-values
Adapt the permutation rules used in haplo.score's sim.control
parameter to ensure accuracy and precision thresholds for permutation
p-values. The permutations are carried out in seqhap.c, so the
parameters p.threshold, min.sim, and max.sim are passed to the C code,
which permutes the response until the precision criteria are met. The
n.sim parameter is no longer used; sim.control=score.sim.control() now
controls the permutations.
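
A rough sketch of a precision-controlled permutation loop follows. It is
written in Python purely for illustration and is not the code in seqhap.c.
Only the parameter names p.threshold, min.sim, and max.sim come from this
entry; the stopping rule (stop once the standard error of the estimated
p-value drops below p_threshold times the p-value), the function name, the
callback, and the default values are assumptions.

```python
import math
import random


def sequential_perm_pvalue(observed, draw_null_stat, p_threshold=0.25,
                           min_sim=1000, max_sim=20000, seed=0):
    """Estimate a permutation p-value, stopping once it is precise enough.

    draw_null_stat(rng) is a hypothetical callback that returns the test
    statistic computed on one permuted copy of the response.
    """
    rng = random.Random(seed)
    n_exceed = 0
    n_done = 0
    while n_done < max_sim:
        n_done += 1
        if draw_null_stat(rng) >= observed:
            n_exceed += 1
        if n_done >= min_sim and n_exceed > 0:
            p_hat = n_exceed / n_done
            std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_done)
            # Assumed rule: stop when the p-value is known to within a
            # fraction p_threshold of its own size.
            if std_err < p_threshold * p_hat:
                break
    return n_exceed / n_done, n_done
```

Under this rule a large p-value stops at min_sim permutations, while a
very small one keeps permuting until max_sim is reached.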

update: user manual
The user manual has been updated from version 1.3.1 to reflect all the
updates since then, and will be placed on Dan Schaid's software page, in
addition to its current location within the package.

update: help files for example datasets
The help files contained an \item keyword with no text, which failed
R CMD check on R 2.8.1. Now they pass.
===================================================
### changes made between releases 1.3.6 and 1.3.8
===================================================

change: plot.seqhap handle small p-values
Handle very small p-values better by enforcing a minimum allowable
asymptotic p-value of .Machine$double.eps and a minimum permutation
p-value of 1/(n.sim+1). A ylim value is also handled if passed.
More useful warning messages are given when p-values are fixed at
these minimums for plotting.
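
The two floors can be sketched as follows. This is an illustrative Python
version, not the plot.seqhap source; the function name and the use of
n_sim=None to mean "asymptotic p-values" are assumptions made for the
example. Python's sys.float_info.epsilon is the same double-precision
machine epsilon that R reports.

```python
import sys


def floor_pvalues(pvalues, n_sim=None):
    """Replace p-values below the plotting floor with the floor itself.

    n_sim=None means asymptotic p-values (floor = machine epsilon);
    otherwise the floor is 1/(n_sim + 1) for permutation p-values.
    """
    floor = sys.float_info.epsilon if n_sim is None else 1.0 / (n_sim + 1)
    return [max(p, floor) for p in pvalues]
```

A floor of this kind keeps a p-value of exactly zero from becoming
infinite if the values are later drawn on a log scale.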

change: haplo.score add eps.svd
In some association tests from haplo.score, we have observed extremely
significant values for the global association test statistic. The
degrees of freedom for the global test is the rank of the score
vector's variance matrix. We found the source of the problem was
having too low a cutoff (epsilon) for svd values when
determining the rank of the variance matrix. We increased the default for
the epsilon from 1e-6 to 1e-5 and allow it to be changed by the user
as the eps.svd parameter in any function that uses haplo.score
(haplo.score.slide, haplo.cc).
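
To see why the cutoff matters, here is a small Python sketch of counting
rank from singular values. It assumes the cutoff is applied relative to
the largest singular value, which this file does not state; the function
name and the example values are made up for illustration.

```python
def rank_from_singular_values(singular_values, eps=1e-5):
    """Count singular values that are not negligible relative to the largest."""
    largest = max(singular_values)
    if largest <= 0:
        return 0
    return sum(1 for d in singular_values if d / largest >= eps)


# Illustrative singular values; the third is tiny relative to the first.
svals = [10.0, 1.0, 5e-5]
print(rank_from_singular_values(svals, eps=1e-6))  # 3
print(rank_from_singular_values(svals, eps=1e-5))  # 2
```

With the smaller epsilon the near-zero singular value is counted, so the
test would use one more degree of freedom than the matrix supports.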

change: haplo.cc parameters
We remove haplo.min.count as a top-level parameter; it can only
be used in the control() function, just as in haplo.glm.
Note that haplo.freq.min can also be used.
The eps.svd parameter is also added, as noted for haplo.score.
===================================================
### changes made between releases 1.3.0 and 1.3.6
===================================================

add: haplo.power.qt and haplo.power.cc
Power and sample size calculations for haplotype association studies.
Calculations are performed given a set of haplotypes, their frequencies,
and their beta coefficients, which can be obtained as log(OR) for
case-control (cc) or calculated for a quantitative trait (qt) from the
R-squared variance explained by the gene association. For qt, use
find.haplo.beta.qt to get these beta coefficients.

added: dataset hapPower.demo
An example data set hapPower.demo is included in the package for
demonstrating the haplo.power.qt/cc functions in example() and in the manual.

change: haplo.em
In past versions, a change was made to pre-calculate how much memory
would be needed for all haplotype pairs, and a warning was issued if that
memory could not be allocated. This stopped calculations
that could have been completed by the progressive insertion and trimming
steps, because rare haplotypes are trimmed off and memory use rarely
reaches the maximum. The warning has therefore been removed.

change: haplo.em.control: min.posterior
The old default for min.posterior was 1e-7. In rare cases, for
datasets that had low LD and 10 or more markers, the trimming steps
actually trimmed away all haplotypes for a given person and the person
was removed. We have changed min.posterior to 1e-9 and added
warnings and a check for this occurring. Note that we have only observed
this in simulated data on very rare occasions.

change: haplo.glm remove allele.lev and miss.val parameters
We used to require the use of allele.lev as a parameter for haplo.glm,
and allow miss.val to specify codes for missing alleles in the
genotype matrix. However, we require using setupGeno to prepare the
genotype matrix to be used in haplo.glm, after it is added to the
data.frame to be passed to haplo.glm. miss.val is completely taken
care of there, and allele.lev is assigned as an attribute of geno.
We have reworked the formula and na.geno.keep to recognize these
values when it finds geno in the formula; therefore, these parameters
are not required in haplo.glm.

change: na.geno.keep
We used to keep all subjects who were missing any number of
alleles. However, subjects missing all alleles both slow
the calculations down and add no information to the analysis.
This function still removes subjects missing y or covariate values,
and now also removes subjects missing all their alleles. After the
removal, the attributes of the genotype matrix are recalculated and
retained for use in haplo.model.frame.
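
The allele-based part of that rule can be sketched in a few lines of
Python. This is an illustration only, not the na.geno.keep source: the
function name is hypothetical, missing alleles are assumed to be coded as
None, and the removal of subjects missing y or covariates is left out.

```python
def drop_all_missing(geno):
    """Drop rows of a genotype matrix in which every allele is missing.

    geno is a list of rows, one per subject, with None for a missing
    allele. Returns the kept rows and their original row indices.
    """
    kept = [(i, row) for i, row in enumerate(geno)
            if any(allele is not None for allele in row)]
    return [row for _, row in kept], [i for i, _ in kept]
```

A subject missing only some alleles is kept, since the remaining alleles
still carry information about that subject's haplotypes.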

changes: haplo.model.frame
Get allele.lev from geno in m[[]], not as a parameter passed from haplo.glm.

change: haplo.glm.control
Enforce the default settings for haplo.min.count and haplo.freq.min in
the function declaration. In the declaration they were NA, but a
default min.count of 5 was enforced. We have changed this so that the
haplo.freq.min default of .01 is enforced, and the declaration now
reflects the enforced default.

changes: Ginv.q and Ginv.R
Nothing has changed for R. Splus version 8.0.1 has a problem in its
use of the svd fortran function, as called by svd.Matrix. We
contacted Insightful and they fixed it for version 8.0.4. We include
the svd.Matrix function from version 7 and 8.0.4 in the Ginv.q file,
but only load it if the Splus version matches 8.0.1.

change: louis.info.c
Prior efforts to convert all long integer values to int were not completed
for this function. As a result, the package didn't work on 64-bit Linux
machines. It no longer uses long, and it should work on most
platforms.

change: louis.info.q
When the variance of a quantitative trait is so high that the
information matrix becomes ill-conditioned, Ginv determines the
information matrix to be singular, and the standard errors are
incorrect. Change the epsilon parameter for the generalized inverse to
about 1e-8, versus the old default in Ginv of 1e-6.
=========================================================
#### changes made between release 1.2.5 and 1.3.0 #####
=========================================================

seqhap: sequential haplotype selection in a set of loci
For choosing loci for haplotype associations, as
described in Yu and Schaid, 2007. The method performs three tests
for association of a binary trait over a set of biallelic loci.
When evaluating each locus, loci close to it are added in a sequential
manner based on the Mantel-Haenszel test.

geno1to2: convert geno from 1- to 2-column
Convert a 1-column minor-allele-count matrix to two-column
allele codes.
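
A minimal Python sketch of such a conversion is shown below. It is not
the geno1to2 source; the allele coding (1 = major allele, 2 = minor
allele, None = missing) and the function name are assumptions made for
the example.

```python
def geno_1to2(counts):
    """Expand minor-allele counts (0, 1, 2) into pairs of allele codes.

    Assumed coding: 1 = major allele, 2 = minor allele, None = missing.
    Each locus column becomes two adjacent columns.
    """
    pairs = {0: (1, 1), 1: (1, 2), 2: (2, 2), None: (None, None)}
    return [[allele for count in row for allele in pairs[count]]
            for row in counts]
```

For example, a subject with counts 0 and 1 at two loci becomes the four
allele codes 1, 1, 1, 2.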

plot.haplo.score.slide: handle near-zero p-values
For asymptotic p-values near zero, set to epsilon.
For simulated p-values, set to 0.5 divided by the number of simulations
performed.

haplo.design: create design matrix for haplotypes
In response to many requests made for getting columns for haplotype
effects to use in glm, survival, or other regression models, we
created a function to set up this kind of design matrix. There are
issues surrounding the use of these effect columns, as outlined in the
user manual.

Ginv: svd problems continue
The Matrix library svd function has changed for Splus 8.0.1.
Therefore, revert back to the default svd function in getting the
generalized inverse.
=========================================================
#### changes made between release 1.2.0 and 1.2.5 #####
=========================================================

haplo.glm: Iterative steps efficiency
In consecutive IRWLS steps in haplo.glm, the starting
values for refitting the glm model were not updated to the most
recent values. Updating them now saves about 20% of run time in
haplo.glm.

haplo.score: haplo.effect allow additive, dominant, recessive
A new option to make haplo.score more flexible. Previously the scores
for haplotypes were computed assuming an additive effect for all
haplotypes. A new parameter, haplo.effect, is in place to allow
either additive, dominant, or recessive effects.
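
The three codings can be illustrated with a short Python sketch. It
assumes a subject's haplotype pair is known, so the number of copies of
a haplotype is exactly 0, 1, or 2; haplo.score itself works with
haplotype pairs that are not known with certainty, which this example
ignores. The function name is hypothetical.

```python
def haplo_effect_score(copies, effect="additive"):
    """Code the number of copies of a haplotype (0, 1, or 2) for scoring."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2")
    if effect == "additive":
        return copies                   # each copy counts
    if effect == "dominant":
        return 1 if copies >= 1 else 0  # one copy is enough
    if effect == "recessive":
        return 1 if copies == 2 else 0  # both copies are required
    raise ValueError("unknown effect: " + effect)
```

Under the recessive coding only homozygous carriers score 1, so far
fewer subjects contribute to a rare haplotype's score than under the
additive coding.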

haplo.score: min.count parameter
The cutoff for selecting haplotypes to score is either by a minimum
frequency, skip.haplo, or a new option, min.count. The min.count is
based on the same idea as that used in haplo.glm, where the minimum
expected count of haplotypes in the population is enough such that
accurate estimates of parameters and standard errors are computed. The
min.count became needed when haplo.effect was added because under
the dominant or recessive models, the number of persons actually
having a haplotype effect could be fewer than the expected count
over the population (i.e., haplotype pair h1/h2 is coded as 0 for
both under recessive model, and h1/h1 is coded as 1 under dominant).

haplo.em: improved reliability of C routines
Previously problems had been observed with running haplo.em and
haplo.glm on 64-bit Linux machines, because of issues with the storage
of integers in R. In R, all integers are stored as int, which are
stored differently on 64-bit and 32-bit machines. We get around this
problem by using all int types for integers, which are only used for
indices of other data structures. We find out the max value for integers
on the system, and if the indices are going to exceed the max, issue a
warning from C.

haplo.glm and Ginv: improvement of standard error calculations
Under some extreme circumstances, such as haplo.glm modeling
haplotypes with rare frequencies, or a high amount of variance in the
response, the standard error estimates were unreliable.
The issue came out in the Ginv function in haplo.stats, which needed a
smaller epsilon to decide on the rank of the information matrix.
=========================================================
#### changes made between release 1.1.1 and 1.2.0 #####
=========================================================

haplo.em: fixed memory leak
Versions up to 1.1.1 had either one or two memory leaks in haplo.em.
They are fixed.

All .C functions: Long Integers warning for 64-bit machine
Due to problems with long integers between 32-bit and 64-bit machines
using R, all integers used in C functions will use unsigned integers.

haplo.glm: haplo.effect="recessive"
The estimation stops if no columns are left in the model.matrix for
homozygotes with the haplotype, and for haplotypes that do not have
any subjects with a posterior probability of being homozygous for the
haplotype, those subjects are grouped into the baseline effect.
Guidelines for rare haplotypes are explained further in the manual.

haplo.glm: na.action default
When not specified, na.action was set to something other than
the intended 'na.geno.keep'. Now the default setting works.

haplo.cc: New Function for Case-Control Analysis
New function added to combine methods of haplo.score,
haplo.group and haplo.glm into one set of output for case-control
data. Choose haplotypes for analysis by haplo.min.count only, not a
frequency cutoff.

haplo.score: skip.haplo new default
Default for skip.haplo is now 5/(nrow(geno)*2)
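
As a worked example of that formula, the Python sketch below computes
the default cutoff for a given number of subjects. The function name and
the min_count argument are introduced only for illustration;
n_subjects stands for nrow(geno).

```python
def default_skip_haplo(n_subjects, min_count=5):
    """Frequency at which a haplotype has an expected count of min_count.

    Each subject contributes two haplotypes, so the sample holds
    2 * n_subjects haplotypes in total.
    """
    return min_count / (2 * n_subjects)


print(default_skip_haplo(250))  # 0.01
```

With 250 subjects the cutoff is 0.01, and it shrinks as the sample grows.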

haplo.glm: haplo.freq.min and haplo.min.count control parameters
Haplotypes used in the glm are still chosen by haplo.freq.min, but
the default is based on a minimum expected count of 5 in the
sample. The better choice for selecting haplotypes is
haplo.min.count. The issue is documented in the manual and help files.

haplo.score: max-stat simulated p-value
A better description of this is included in the manual and help file.

haplo.em.control and haplo.em: defaults for control parameters changed
The defaults for these control parameters changed:
max.iter = 5000, changed from 500
insert.batch.size = 6, changed from 4

locus
The genetics package for R has a function named locus which does not
agree with locus from haplo.stats. We do not plan to change it, so be
aware of the possible clash if you use these two packages

haplo.scan: new function
For analyzing a genome region with case-control
data. Search for a trait-locus by sliding a fixed-width window over
each marker locus and scanning all possible haplotype lengths within
the window.
=================================================================
### changes made prior to release 1.1.1 #####
=================================================================

haplo.glm: Warnings for non-integer weights
glm.fit for R does not allow non-integer weights for subjects, whereas
Splus does. Use a glm.fit.nowarn function for R to ignore warnings.

haplo.glm: Character Alleles
Local settings for strings as factors cause confusion when keeping
original character allele values. To ensure consistency of allele
codes, use setupGeno() and then in the haplo.glm call, use allele.lev
as documented in the manual and help files.

haplo.score.slide: add to package
Run haplo.score on all contiguous subsets of size n.slide from the
loci in a genotype matrix (geno).
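
The set of windows this describes can be listed with a short Python
sketch. It only enumerates the contiguous subsets of loci; it does not
run any score test, and the function name is hypothetical.

```python
def sliding_subsets(loci, n_slide):
    """Return every run of n_slide adjacent loci, in order."""
    if n_slide < 1 or n_slide > len(loci):
        raise ValueError("n_slide must be between 1 and the number of loci")
    return [loci[i:i + n_slide] for i in range(len(loci) - n_slide + 1)]


print(sliding_subsets(["A", "B", "C", "D"], 2))
# [['A', 'B'], ['B', 'C'], ['C', 'D']]
```

For L loci there are L - n.slide + 1 such subsets, so one test is run
per window.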

haplo.score: simulations controlled for precision
Employ simulation precision criteria for p-values, adopted from
Besag and Clifford [1991]. Control simulations with
score.sim.control.
