File: TODO

package info (click to toggle)
robustbase 0.99-4-1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, trixie
  • size: 4,552 kB
  • sloc: fortran: 3,245; ansic: 3,243; sh: 15; makefile: 2
file content (258 lines) | stat: -rw-r--r-- 14,691 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
				-*- org -*- C-c C-o follows link[MM: grep-r -e '\(FIXME\|TODO\)']

* Before next release
** TODO  glmrob() -- bug = R-forge bug : *warn* when family = "gaussian":
     they should rather use  lmrob() !!
    ==> ~/R/MM/Pkg-ex/robustbase/glmrob_ChrSchoetz-ex.R

** TODO when lmrob.S() detects exact fit (in C code), it *should* return it, incl scale = 0
    *and* it should have correct rweights[] in {0,1} and residuals[];
    (fitted[] computed in R code after .C() call);
    ==> ~/R/MM/Pkg-ex/robustbase/ThMang_lmrob.R
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and >>> tests/subsample.R (bottom) <<<-------

*** in src/lmrob.c: Currently there *two* such situations the first explicit when
      #{zero resid} >=  ~ (n + p)/2 {--> theory of *modified* MAD
      ==> ~/R/MM/STATISTICS/robust/tmad.R
*** lmrob() calling lmrob..M.. after that should "work", optionally/always ?? using a non-zero scale,
    ------- it might use Martin's tmad() {trimmed mean of absolute deviations from the median}
** DONE print(<covMcd>) and print(summary(<covMcd>)): 'method' almost twice; show iBest
   for "deterministic"
** DONE covMcd(): allow 'hsets.ini' argument to covMcd(), and *return* them (optionally) as 'Hsubsets'
** DONE export rankMM, classSVD, .signflip [MM: repmat should not be needed]
** DONE 'scalefn' argument for covMcd() to be used for "detMCD"
** DONE adjOutlyingness():
*** DONE change defaults for clower & cupper (E-mail from P.Segaert).
*** DONE But there is more: +/- swap ==> results not back compatible
** DONE colMedians() -> ask Henrik/ <maintainer@bioconductor.org> about "License: Artistic-2.0"
** DONE splitFrame() [important for lmrob(.. method = "M-S") --> lmrob.M.S()]: *character* should be treated *as* factors
** TODO nlrob()
*** TODO summary(nlrob(*)) fails for new methods; better error message or *work*
**** TODO for "MM" we are close to done; ideally want '$ rweights' (robustness weights) for all meth
*** TODO residuals( nlrob(), type = "...") should provide types "as in the literature"
*** DONE nlrob(*, method = "...") should call methods "tau", "CM", "MTL", "MM"
    by Eduardo Conceicao
**** DONE shouldn't we rename jde() to jdeopt() or even jdeoptim(), jDEoptim(), or JDEoptim()
R users already know optim() etc.. so the name seems more logical for them.
* Short Term
** DONE lmrob.tau.fast.coefs() calls `coef(lmrob( <y> ~ <x> -1))` ... should use `lmrob.fit()` !
** TODO lmrob.S() -> lmrob.control() should get nResample = "exact" -- as covMcd() / ltsReg() :
***    use Fortran routines rfncomb() and rfgenp() [in src/rf-common.f -> need F77_name(.) / F77_CALL(.)] from C
****    or even just right a version for C indices in {0, 1, ..., p-1}  instead of Fortran {1, 2, ...., p}
** DONE sc.1) Qn(), Sn()  should work with 'NA' (via 'na.rm=FALSE' argument, as mad())
** DONE sc.2) Qn(), Sn()  should work with Inf and "large x", see example(mc)
** DONE sc.2) mc() now works better with "large x", see example(mc) --- BSc thesis by Lukas Graz
** DONE fixed: mc(x) can *fail* to converge: thesis Lukas Graz {and ~/R/MM/Pkg-ex/robust/Robnik-mc.R }
** DONE Now that mc() works, also define lmc() and rmc() (left and right tail measures).
** TODO Try  scaleTau2(*, iterate=TRUE) or 'iter=TRUE' (which can have 'iter = 10' etc)
** TODO [Peter Filzmoser, Geneva 2016-07-07 talk]: covMcd() warns when n < 2*p .. should not *warn* but give message()
** TODO [Peter Filzmoser, Geneva 2016-07-07 talk]: solve.default(getCov(mcd)) error  with CovControlOgk() init
*** INSTEAD it should report the (theoretical) breakdown point (p ...) / (n - h ... ) [from the MCD theory]
** TODO estimethod(): also for lmrob() and glmrob() models
** TODO VT implement .detMcd() in C
** TODO r6pack(milk, ..): *return* singularity message but do not signal an error
** TODO nlrob(*):
*** TODO for the "vector parameter" biomass example in tests/nlrob-tst.R:  method = "MM"
    As we do want the formula to work ==> we *must* allow 'lower' & 'upper' as list()s
    in R/nlregrob.R, have   14 matches for "eval *( *formula\[\[3L?"
    ((and *org* shows the `[[3L].]` (no ".") as underscored 3L)) :
       123:                   y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
       127:                   y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
       141:        y.hat <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
       175:    res <- y - eval( formula[[3L]], c(data, initial$par) )
       193:    fit <- eval( formula[[3L]], c(data, coef) )
       254:        fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
       300:    fit <- eval( formula[[3L]], c(data, coef) )
       355:	    fit <- eval( formula[[3L]], c(data, par) )
       361:	    fit <- eval( formula[[3L]], c(data, par) )
       366:	    fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
       390:    fit <- eval( formula[[3L]], c(data, coef) )
       434:	    fit <- eval( formula[[3L]], c(data, par) )
       442:	    fit <- eval( formula[[3L]], c(data, setNames(par, pnames)) )
       468:    fit <- eval( formula[[3L]], c(data, coef) )
    the same as in R/nlrob.R where we had  eval(.., c(data, coef)) but now eval(.., c(data, start))
*** TODO nlrob(*, method=.) -- try at least *one* other optimizer than JDEoptim(),
 since we provide already most of the needed "hooks".
*** DONE nlrob(*, method="M"): allow a "fixed initial sigma" (--> inference for "MM")
*** TODO confint.nlrob():, "Wald" works; naming follows confint.lmer() from lme4
**** TODO  method = "profile" based on (TODO) profile.nlrob()
**** DONE  method = "Wald" works
** TODO simulate() : implement for "nlrob" {and lmrob() ? and ..}
** glmrob
*** BYlogreg()  [ R/BYlogreg.R ]
    --> more tests in ./tests/glmrob-1.R
    --> glm.fit() instead of glm()
    --> vcov() instead of just std.err. {is already there}
*** glmrob(*, weights.on.x = "robCov")  uses  MASS::cov.rob(),
   i.e. "MVE" and Andreas had a comment that "mcd" is worse.
   "covMcd" has been available for a while; now via robXweights() in ./R/glmrobMqle.R
   HOWEVER: Need something better when 'X' has (binary!) factors!
   "hat" +- works, but needs more work
*** We now allow  weights.on.x  to be an arbitrary general  wts(X, intercept)
   function _or_ a list containing a  robMcd()-like function.
   Definitely need *testing* this situation!
*** glmrob(<Gamma>):  anova() has  three variants: "V1", "Eva1", "Andreas1"
    --> ./R/glmrobMqle-DQD.R
    - gives warning every time {-> easy to fix}
    -  Default is "V1" is that a good idea?

*** glmrob() needs a bit more tests in ./tests/
            [also consider those from man/glmrob.Rd]
    take those from Martin's old 'robGLM1' package (need more!)
***  --> first test already shows that Martin's tests for "huberC == Inf"
       were *not* yet moved from robGLM1 to glmrob()...
    (in other words:  glmrob() should work

*** also, ni = 0 does not work quite as it should ( ./tests/binom-ni-small.R )

*** obj $ df ...  maybe should be defined -- for "glm" methods to be
   applicable  --> e.g. for predict(<glmrob>, interval="..") !

*** summary.glmrob() should be better documented;
   we should decide if the current return value is fine.

*** Eva's code (and MM's) also computed & returned the "asymptotic efficiency"!

*** anova.glmrob(): More modularization, allowing to provide own 'test' function.
   Test if Huber's C are different. Need theory to compare different C's and
   same model (which includes classical vs robust).

*** add1() and/or drop1() would be nice

** TODO scaleTau2():  Also do a cheap finite-sample correction [MM] !
   [DONE partly; but undocumented, since bound to change -->
   file:~/R/MM/STATISTICS/robust/1d-scale.R , 1d-scale-sim.R, etc --- unfinished!!

** TODO Psi/Rho/Chi/Wgt Functions
    We have quite a few "partial" collections of rho/psi functions;
    some are "sync"ed now, some not yet::
*** TODO 1) have the nice S4 class psi_func  + psiFunc() and .defDwgt() functions
    in file:R/psi-rho-funs.R  with further explorations, ideas in
    file:misc/experi-psi-rho-funs.R
**** TODO print/show of such psi_func should show more; at least the psi function
**** TODO str.psi_func() should be a bit nicer than the current default str()
**** TODO nlrob(): also allow psi to be a 'psiFunc':
       --> ./R/nlrob.R ; consider even more *real* checks; now in tests/nlrob-tst.R
*** DONE 2) deprecated: "old" tukeyChi() & tukeyPsi1()  originally called from lmrob() , in
       ./R/biweight-funs.R
*** DONE 3) psi.*(....,  rho = FALSE/TRUE)   functions from Andreas
       ([[file:.R/psi-funs-AR.R]]) replaced by using the new  psi_func  objects
**** DONE nlrob() changed: uses psi = .Mwgt.psi1("huber", cc=1.345) as default
*** TODO 4) have (C-based) functions Mpsi(), Mchi(), Mwgt(), etc, used from lmrob(),
       in ./R/lmrob.MM.R
**** TODO provide Mpsi(psi = "t") etc; tuning parameter: 'nu' => MLE-t_\nu psi and psi'
**** TODO provide '1)'-i.e. psi_func  versions of the  Mpsi() etc
**** TODO Mpsi(*, "GGW") etc :  have no (??) easy way to directly specify (a,b,c) tuning pars
**** TODO *New* Mwgt(*,  deriv=1) would correspond to Dwgt in psiFunc() which Manuel needs
**** DONE now exported and documented in man/M.psi.Rd
     Further files, illustrating features, differences, etc:
	 ./vignettes/psi_functions.Rnw -- with quite a few FIXME
	  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	 ./inst/xtraR/plot-psiFun.R	chkPsiDeriv() {and plot utils}
	 ./tests/psi-rho-etc.R   	compute asymp.efficiency and breakdown point !
	 ./tests/lmrob-psifns.R 	   	plot and lmrob()-test them

**** DONE Deprecate* the "2)", tukeyChi() etc, making them call the M.*fun():

* Mid Term
** TODO R/lmrob.MM.R: Using lmrob.E( * )
*** TODO: numerically clearly better than these -- e.g. in summary.lmrob(): use integrate(), or
*** ---- allow to specify 'use.integrate=TRUE, tol = 1e-10' etc for *accurate* cor.factors
*** "hampel", "bisquare", "lqq" (polyn!): derive *exact* formula; others maybe, too <==> psi_func objects above?
** TODO New  lmrob() features ("M-S" as option?):
*** Function names with "." (which are exported) are frowned upon e.g.  lmrob.split()
*** checking .vcov.avar1() and its "posdefify" options [but "KS201x" uses .vcov.w() anyway]
*** TODO lmrob.mar() [file:inst/doc/estimating.functions.R]: Maronna & Yohai (2010) should
         ~~~~~~~~~~  become part of robustbase, maybe under a better name,
    e.g. via lmrob( ... control ..) or directly.
    It is much used in the simulations of Koller & Stahel (2011)
*** TODO Provide "simple M" estimator [so MASS :: rlm() is entirely superseeded]
    Consider lmrob(*,  method = "M") --> default init = "ls" (Least Sq; as MASS:::rlm.default)
    which calls  lmrob..M..fit() which is already documented as "simple"
    M-estimator (though the *scale* is kept fixed; i.e., no  'proposal 2').
** TODO glmrob(), glmrobMqle(), etc : expression()s and eval() no longer "satisfactory",
   e.g., see FIXME in ./R/glmrobMqle.R
** TODO covMcd(): pass k3 as argument; default=current ==> allow "formula"  k3 = k3(n,p) !!
** covOGK():
   The argument name 'weight.fn' is pretty ugly and the default function
   name 'hard.rejection()' is just awful (we need a globally available
   function as 'role model'.
   - Could allow 'n.iter = 0' to simply compute Cov()_{ij} = rcov(X_i, X_j)
** rrcov etc
*** rrcov.control() __ NEEDS  name change ! ______
    probably use  mcd.control() and lts.control()

    or forget about *control() completely?
    since there are only a few in each ??????/

*** TODO tolellipse() --> renamed to tolEllipsePlot()
**** maybe use  cluster::ellipsoidPoints()
**** allow other percentiles than just  97.5%
**** maybe *return* something

*** plot(mcd. ) [ R/covPlot.R ] : should show the call
	      Default for 'ask' should be smarter: depend on
	      prod(par("mfrow")) < #{plots} (which depends on 'classic' and p=2)
*** ltsReg():  has  undocumented '$resid'
	      in addition to '$residuals' and '$raw.residuals';
	      drop it or document it !
** More lmrob() considerations
*** DONE more tests in	tests/

*** fully implement and test the multivariate case (y = matrix with > 1 col.)
*** src/lmrob.c :
     does median() , MAD() instead of using R's  sort() routines

* Long Term / Maybe
** inst/doc/lmrob_simulation.Rnw :
*** use hyperlinks {e.g. using jss docu.class!}
*** consider making parts available in (new) ./demo/lmrob...R
*** tau_i (p.5)  is not clear for  Joe Average.

..........................................

** Generalizing  'wgt.himedian':  We'd want a C API on which R builds.
    There are pure R implementations:
     - 'weighted.median()' in limma
    	  and I have generalized it ---> file:inst/xtraR/ex-funs.R
     - more general code (different 'tie' strategies; weighted *quantile*s)
       in file:/u/maechler/R/MM/STATISTICS/robust/weighted-median.R
     - The 'Hmisc' package has wtd.quantile()
** Miscellaneous
*** Alternative version of covOGK() for correlation-only
   using's Huber's correlation formula which ensures [-1,1] range
   --> ~/R/MM/Pkg-ex/robustbase/robcorgroesser1.R
   and ~/R/MM/STATISTICS/robust/pairwise-new.R

*** package 'riv' (author @ epfl.ch!) has 'slc()'  ~=  cov.S(.)  -- in pure R code
   doesn't Valentin have a version too?
   otherwise: test this, ask author for "donation" to robustbase

*** adjOutlyingness() :
**** typo-bug is corrected; and I have made it more pretty.
     Still a bit problematic when denominator = 0
     Currently leave away all the c/0 = Inf and 0/0 = NaN values.

     MM: Maybe, it's the fact that the   coef = 1.5  should really depend on
         the sample size  n   and will be too large for small n (??)
   --> should ask Mia and maybe Guy Brys
**** For really small (n,p):  Taking 250 random samples of size p; is non-sense when choose(n,p) <= 250
  Rather then, take *all* sub-samples of size p ==> getting a non-random result.

*** Add data sets from the MMY-book -- mostly done {do we have *all* ?}

*** Data Sets --- Valentin Todorov has several of Rousseeuw's in the 'rrov' package
    (and promised me "the rest" when needed)
   Don't like the *.x, *.y  sub datasets: They shouldn't be needed when use a *formula*
   In his lts tests, he uses these "data sets from the literature":
   (Note that 'stackloss' is already in "datasets") :
    heart.x,heart.y,	   data(heart)
    stars.x,stars.y,	   data(stars)
    phosphor.x,phosphor.y, data(phosphor)
    stack.x,stack.loss,	   data(stackloss)
    coleman.x,coleman.y,   data(coleman)
    salinity.x,salinity.y, data(salinity)
    aircraft.x,aircraft.y, data(aircraft)
    delivery.x,delivery.y, data(delivery)
    wood.x,wood.y,	   data(wood)
    hbk.x,hbk.y,	   data(hbk)