1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258
|
-*- org -*- C-c C-o follows link[MM: grep-r -e '\(FIXME\|TODO\)']
* Before next release
** TODO glmrob() -- bug = R-forge bug : *warn* when family = "gaussian":
they should rather use lmrob() !!
==> ~/R/MM/Pkg-ex/robustbase/glmrob_ChrSchoetz-ex.R
** TODO when lmrob.S() detects exact fit (in C code), it *should* return it, incl scale = 0
*and* it should have correct rweights[] in {0,1} and residuals[];
(fitted[] computed in R code after .C() call);
==> ~/R/MM/Pkg-ex/robustbase/ThMang_lmrob.R
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and >>> tests/subsample.R (bottom) <<<-------
*** in src/lmrob.c: Currently there *two* such situations the first explicit when
#{zero resid} >= ~ (n + p)/2 {--> theory of *modified* MAD
==> ~/R/MM/STATISTICS/robust/tmad.R
*** lmrob() calling lmrob..M.. after that should "work", optionally/always ?? using a non-zero scale,
------- it might use Martin's tmad() {trimmed mean of absolute deviations from the median}
** DONE print(<covMcd>) and print(summary(<covMcd>)): 'method' almost twice; show iBest
for "deterministic"
** DONE covMcd(): allow 'hsets.ini' argument to covMcd(), and *return* them (optionally) as 'Hsubsets'
** DONE export rankMM, classSVD, .signflip [MM: repmat should not be needed]
** DONE 'scalefn' argument for covMcd() to be used for "detMCD"
** DONE adjOutlyingness():
*** DONE change defaults for clower & cupper (E-mail from P.Segaert).
*** DONE But there is more: +/- swap ==> results not back compatible
** DONE colMedians() -> ask Henrik/ <maintainer@bioconductor.org> about "License: Artistic-2.0"
** DONE splitFrame() [important for lmrob(.. method = "M-S") --> lmrob.M.S()]: *character* should be treated *as* factors
** TODO nlrob()
*** TODO summary(nlrob(*)) fails for new methods; better error message or *work*
**** TODO for "MM" we are close to done; ideally want '$ rweights' (robustness weights) for all meth
*** TODO residuals( nlrob(), type = "...") should provide types "as in the literature"
*** DONE nlrob(*, method = "...") should call methods "tau", "CM", "MTL", "MM"
by Eduardo Conceicao
**** DONE shouldn't we rename jde() to jdeopt() or even jdeoptim(), jDEoptim(), or JDEoptim()
R users already know optim() etc.. so the name seems more logical for them.
* Short Term
** DONE lmrob.tau.fast.coefs() calls `coef(lmrob( <y> ~ <x> -1))` ... should use `lmrob.fit()` !
** TODO lmrob.S() -> lmrob.control() should get nResample = "exact" -- as covMcd() / ltsReg() :
*** use Fortran routines rfncomb() and rfgenp() [in src/rf-common.f -> need F77_name(.) / F77_CALL(.)] from C
**** or even just right a version for C indices in {0, 1, ..., p-1} instead of Fortran {1, 2, ...., p}
** DONE sc.1) Qn(), Sn() should work with 'NA' (via 'na.rm=FALSE' argument, as mad())
** DONE sc.2) Qn(), Sn() should work with Inf and "large x", see example(mc)
** DONE sc.2) mc() now works better with "large x", see example(mc) --- BSc thesis by Lukas Graz
** DONE fixed: mc(x) can *fail* to converge: thesis Lukas Graz {and ~/R/MM/Pkg-ex/robust/Robnik-mc.R }
** DONE Now that mc() works, also define lmc() and rmc() (left and right tail measures).
** TODO Try scaleTau2(*, iterate=TRUE) or 'iter=TRUE' (which can have 'iter = 10' etc)
** TODO [Peter Filzmoser, Geneva 2016-07-07 talk]: covMcd() warns when n < 2*p .. should not *warn* but give message()
** TODO [Peter Filzmoser, Geneva 2016-07-07 talk]: solve.default(getCov(mcd)) error with CovControlOgk() init
*** INSTEAD it should report the (theoretical) breakdown point (p ...) / (n - h ... ) [from the MCD theory]
** TODO estimethod(): also for lmrob() and glmrob() models
** TODO VT implement .detMcd() in C
** TODO r6pack(milk, ..): *return* singularity message but do not signal an error
** TODO nlrob(*):
*** TODO for the "vector parameter" biomass example in tests/nlrob-tst.R: method = "MM"
As we do want the formula to work ==> we *must* allow 'lower' & 'upper' as list()s
in R/nlregrob.R, have 14 matches for "eval *( *formula\[\[3L?"
((and *org* shows the `[[3L].]` (no ".") as underscored 3L)) :
123: y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
127: y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
141: y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
175: res <- y - eval( formula[[3L]], c(data, initial$par) )
193: fit <- eval( formula[[3L]], c(data, coef) )
254: fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
300: fit <- eval( formula[[3L]], c(data, coef) )
355: fit <- eval( formula[[3L]], c(data, par) )
361: fit <- eval( formula[[3L]], c(data, par) )
366: fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
390: fit <- eval( formula[[3L]], c(data, coef) )
434: fit <- eval( formula[[3L]], c(data, par) )
442: fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
468: fit <- eval( formula[[3L]], c(data, coef) )
the same as in R/nlrob.R where we had eval(.., c(data, coef)) but now eval(.., c(data, start))
*** TODO nlrob(*, method=.) -- try at least *one* other optimizer than JDEoptim(),
since we provide already most of the needed "hooks".
*** DONE nlrob(*, method="M"): allow a "fixed initial sigma" (--> inference for "MM")
*** TODO confint.nlrob():, "Wald" works; naming follows confint.lmer() from lme4
**** TODO method = "profile" based on (TODO) profile.nlrob()
**** DONE method = "Wald" works
** TODO simulate() : implement for "nlrob" {and lmrob() ? and ..}
** glmrob
*** BYlogreg() [ R/BYlogreg.R ]
--> more tests in ./tests/glmrob-1.R
--> glm.fit() instead of glm()
--> vcov() instead of just std.err. {is already there}
*** glmrob(*, weights.on.x = "robCov") uses MASS::cov.rob(),
i.e. "MVE" and Andreas had a comment that "mcd" is worse.
"covMcd" has been available for a while; now via robXweights() in ./R/glmrobMqle.R
HOWEVER: Need something better when 'X' has (binary!) factors!
"hat" +- works, but needs more work
*** We now allow weights.on.x to be an arbitrary general wts(X, intercept)
function _or_ a list containing a robMcd()-like function.
Definitely need *testing* this situation!
*** glmrob(<Gamma>): anova() has three variants: "V1", "Eva1", "Andreas1"
--> ./R/glmrobMqle-DQD.R
- gives warning every time {-> easy to fix}
- Default is "V1" is that a good idea?
*** glmrob() needs a bit more tests in ./tests/
[also consider those from man/glmrob.Rd]
take those from Martin's old 'robGLM1' package (need more!)
*** --> first test already shows that Martin's tests for "huberC == Inf"
were *not* yet moved from robGLM1 to glmrob()...
(in other words: glmrob() should work
*** also, ni = 0 does not work quite as it should ( ./tests/binom-ni-small.R )
*** obj $ df ... maybe should be defined -- for "glm" methods to be
applicable --> e.g. for predict(<glmrob>, interval="..") !
*** summary.glmrob() should be better documented;
we should decide if the current return value is fine.
*** Eva's code (and MM's) also computed & returned the "asymptotic efficiency"!
*** anova.glmrob(): More modularization, allowing to provide own 'test' function.
Test if Huber's C are different. Need theory to compare different C's and
same model (which includes classical vs robust).
*** add1() and/or drop1() would be nice
** TODO scaleTau2(): Also do a cheap finite-sample correction [MM] !
[DONE partly; but undocumented, since bound to change -->
file:~/R/MM/STATISTICS/robust/1d-scale.R , 1d-scale-sim.R, etc --- unfinished!!
** TODO Psi/Rho/Chi/Wgt Functions
We have quite a few "partial" collections of rho/psi functions;
some are "sync"ed now, some not yet::
*** TODO 1) have the nice S4 class psi_func + psiFunc() and .defDwgt() functions
in file:R/psi-rho-funs.R with further explorations, ideas in
file:misc/experi-psi-rho-funs.R
**** TODO print/show of such psi_func should show more; at least the psi function
**** TODO str.psi_func() should be a bit nicer than the current default str()
**** TODO nlrob(): also allow psi to be a 'psiFunc':
--> ./R/nlrob.R ; consider even more *real* checks; now in tests/nlrob-tst.R
*** DONE 2) deprecated: "old" tukeyChi() & tukeyPsi1() originally called from lmrob() , in
./R/biweight-funs.R
*** DONE 3) psi.*(...., rho = FALSE/TRUE) functions from Andreas
([[file:.R/psi-funs-AR.R]]) replaced by using the new psi_func objects
**** DONE nlrob() changed: uses psi = .Mwgt.psi1("huber", cc=1.345) as default
*** TODO 4) have (C-based) functions Mpsi(), Mchi(), Mwgt(), etc, used from lmrob(),
in ./R/lmrob.MM.R
**** TODO provide Mpsi(psi = "t") etc; tuning parameter: 'nu' => MLE-t_\nu psi and psi'
**** TODO provide '1)'-i.e. psi_func versions of the Mpsi() etc
**** TODO Mpsi(*, "GGW") etc : have no (??) easy way to directly specify (a,b,c) tuning pars
**** TODO *New* Mwgt(*, deriv=1) would correspond to Dwgt in psiFunc() which Manuel needs
**** DONE now exported and documented in man/M.psi.Rd
Further files, illustrating features, differences, etc:
./vignettes/psi_functions.Rnw -- with quite a few FIXME
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./inst/xtraR/plot-psiFun.R chkPsiDeriv() {and plot utils}
./tests/psi-rho-etc.R compute asymp.efficiency and breakdown point !
./tests/lmrob-psifns.R plot and lmrob()-test them
**** DONE Deprecate* the "2)", tukeyChi() etc, making them call the M.*fun():
* Mid Term
** TODO R/lmrob.MM.R: Using lmrob.E( * )
*** TODO: numerically clearly better than these -- e.g. in summary.lmrob(): use integrate(), or
*** ---- allow to specify 'use.integrate=TRUE, tol = 1e-10' etc for *accurate* cor.factors
*** "hampel", "bisquare", "lqq" (polyn!): derive *exact* formula; others maybe, too <==> psi_func objects above?
** TODO New lmrob() features ("M-S" as option?):
*** Function names with "." (which are exported) are frowned upon e.g. lmrob.split()
*** checking .vcov.avar1() and its "posdefify" options [but "KS201x" uses .vcov.w() anyway]
*** TODO lmrob.mar() [file:inst/doc/estimating.functions.R]: Maronna & Yohai (2010) should
~~~~~~~~~~ become part of robustbase, maybe under a better name,
e.g. via lmrob( ... control ..) or directly.
It is much used in the simulations of Koller & Stahel (2011)
*** TODO Provide "simple M" estimator [so MASS :: rlm() is entirely superseeded]
Consider lmrob(*, method = "M") --> default init = "ls" (Least Sq; as MASS:::rlm.default)
which calls lmrob..M..fit() which is already documented as "simple"
M-estimator (though the *scale* is kept fixed; i.e., no 'proposal 2').
** TODO glmrob(), glmrobMqle(), etc : expression()s and eval() no longer "satisfactory",
e.g., see FIXME in ./R/glmrobMqle.R
** TODO covMcd(): pass k3 as argument; default=current ==> allow "formula" k3 = k3(n,p) !!
** covOGK():
The argument name 'weight.fn' is pretty ugly and the default function
name 'hard.rejection()' is just awful (we need a globally available
function as 'role model'.
- Could allow 'n.iter = 0' to simply compute Cov()_{ij} = rcov(X_i, X_j)
** rrcov etc
*** rrcov.control() __ NEEDS name change ! ______
probably use mcd.control() and lts.control()
or forget about *control() completely?
since there are only a few in each ??????/
*** TODO tolellipse() --> renamed to tolEllipsePlot()
**** maybe use cluster::ellipsoidPoints()
**** allow other percentiles than just 97.5%
**** maybe *return* something
*** plot(mcd. ) [ R/covPlot.R ] : should show the call
Default for 'ask' should be smarter: depend on
prod(par("mfrow")) < #{plots} (which depends on 'classic' and p=2)
*** ltsReg(): has undocumented '$resid'
in addition to '$residuals' and '$raw.residuals';
drop it or document it !
** More lmrob() considerations
*** DONE more tests in tests/
*** fully implement and test the multivariate case (y = matrix with > 1 col.)
*** src/lmrob.c :
does median() , MAD() instead of using R's sort() routines
* Long Term / Maybe
** inst/doc/lmrob_simulation.Rnw :
*** use hyperlinks {e.g. using jss docu.class!}
*** consider making parts available in (new) ./demo/lmrob...R
*** tau_i (p.5) is not clear for Joe Average.
..........................................
** Generalizing 'wgt.himedian': We'd want a C API on which R builds.
There are pure R implementations:
- 'weighted.median()' in limma
and I have generalized it ---> file:inst/xtraR/ex-funs.R
- more general code (different 'tie' strategies; weighted *quantile*s)
in file:/u/maechler/R/MM/STATISTICS/robust/weighted-median.R
- The 'Hmisc' package has wtd.quantile()
** Miscellaneous
*** Alternative version of covOGK() for correlation-only
using's Huber's correlation formula which ensures [-1,1] range
--> ~/R/MM/Pkg-ex/robustbase/robcorgroesser1.R
and ~/R/MM/STATISTICS/robust/pairwise-new.R
*** package 'riv' (author @ epfl.ch!) has 'slc()' ~= cov.S(.) -- in pure R code
doesn't Valentin have a version too?
otherwise: test this, ask author for "donation" to robustbase
*** adjOutlyingness() :
**** typo-bug is corrected; and I have made it more pretty.
Still a bit problematic when denominator = 0
Currently leave away all the c/0 = Inf and 0/0 = NaN values.
MM: Maybe, it's the fact that the coef = 1.5 should really depend on
the sample size n and will be too large for small n (??)
--> should ask Mia and maybe Guy Brys
**** For really small (n,p): Taking 250 random samples of size p; is non-sense when choose(n,p) <= 250
Rather then, take *all* sub-samples of size p ==> getting a non-random result.
*** Add data sets from the MMY-book -- mostly done {do we have *all* ?}
*** Data Sets --- Valentin Todorov has several of Rousseeuw's in the 'rrov' package
(and promised me "the rest" when needed)
Don't like the *.x, *.y sub datasets: They shouldn't be needed when use a *formula*
In his lts tests, he uses these "data sets from the literature":
(Note that 'stackloss' is already in "datasets") :
heart.x,heart.y, data(heart)
stars.x,stars.y, data(stars)
phosphor.x,phosphor.y, data(phosphor)
stack.x,stack.loss, data(stackloss)
coleman.x,coleman.y, data(coleman)
salinity.x,salinity.y, data(salinity)
aircraft.x,aircraft.y, data(aircraft)
delivery.x,delivery.y, data(delivery)
wood.x,wood.y, data(wood)
hbk.x,hbk.y, data(hbk)
|