|
This file is large. Many a thought here has come to a quiet rest.
Please do not disrupt the peace.
=============================================================================
mcl loop.mci -L 10 -c 0
results in perceived convergence, after which cluster interpretation
fails miserably, so much so that it tries to allocate a
3x0 matrix, which crashes.
add extra 8532c levels (given Thomas's experiences)?
add
(mclclusterinfo
..
)
to output clusterings.
http://www.wkap.nl/journalhome.htm/0167-8019
How to do assertions? Just use them?
======================================================================
::mcl::util::lib
o double/real
o collapse the duplicate recovery code? There are pros and cons.
The con is that the current setup seems easier for keeping track of things.
o make simple distribution, with one script moving makefiles etc.
o sparse matrix storage (using col and row characteristic vector).
o go from general representation to canonical representation (0..N-1, 0..M-1),
and return the permutations needed.
o make poolWalk routine.
o mclMatrixBinary probably refuses to write negative numbers. Remove.
o mcx interface to pruning routines. right now I have .. trim primitive.
? with -fkeep-inline-functions, can I use function pointers (gnu specific anyway)?
o impala io should consider ON_FAIL directive.
o make fibonacci like pool growth algorithm.
o mcx-ize taurus (renaming), cleaning up a lot in taurus.
o streamOpen contains raw exits but should consider ON_FAIL directive.
o Current hash interface is absent/dangerous.
(load is ok when touched (except when negative!!), others not).
o do something like an options string (??) -- all the sprintf stuff
is very irksome, so perhaps think of something else.
o search for memory leaks in impala,mcl libs.
o registering streams like verbose, track, log etc.
o check header files includes.
o make arguments const where appropriate.
o make functions static where appropriate.
o wrap hard mallocs and reallocs.
o insert mcxiostreamopenfailure calls where appropriate.
o propagate hard exits in mcxIOstream library where appropriate.
o audit vector.[ch] on NULL consistency (should be ok).
? enable precise memory allocation in txt.
? replace mcxTingAlert by mcxTingExit vararg function.
===============================================================================
la.c:genclustering speaks of a garbage cluster vector. Ughh.
(?) IOReadInteger should return bool, I think; it could have failed.
(?) IO stuff -> mclReadInteger.
(?) remove exits from ilist.c (e.g. ilinvert -> return NULL for fail).
mcxIO section. streamOpen contains raw exits but should consider
ON_FAIL directive. Different modes of failure seem to be present.
Introduce them in file.h?
mcxVectorWriteAscii
mcxVectorDump
not yet ported to mcxIOstream and ACTION_ON_FAIL.
dump interface is ugly as a nightmare in hell (or would that
be dreaming of heaven?). Remedy?
mempool growing scheme:
1 1 2 3 5 8 13 21 34 55 89
1 2 4 7 12 20 33 54 88 143
8 4 6 9 14 20 30
8 12 18 27 41 61 90
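A minimal sketch of the Fibonacci-like growth from the first row above (next block size is the sum of the previous two). The type and function names are hypothetical, not mcl code. A side effect worth noting: the running total of the block sizes reproduces the second row (1 2 4 7 12 20 33 54 88 143), so that row reads as total pool capacity after each growth step.

```c
#include <stddef.h>

/* Hypothetical growth state: the two most recent block sizes. */
typedef struct
{
    size_t prev;   /* size of the block before the current one */
    size_t cur;    /* size of the most recent block            */
} fib_growth;

/* Prime the state so that the first call yields 1. */
void fib_growth_init(fib_growth *g)
{
    g->prev = 0;
    g->cur  = 1;
}

/* Return the size of the next block to allocate and advance.
 * Successive calls yield 1, 1, 2, 3, 5, 8, 13, ... */
size_t fib_growth_next(fib_growth *g)
{
    size_t next = g->cur;
    size_t sum  = g->prev + g->cur;

    g->prev = g->cur;
    g->cur  = sum;
    return next;
}
```

Growth per step tends to the golden ratio (about 1.618), which sits between the factor-1.5 rows above and plain doubling.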
all the statics in shmcl/mcl.c
at various places, if I use maxval, must consider using maxabsval
instead (if e.g. used for printing).
readFile can do a system call to find out the size of the
file (if it's not stdin). mmm.
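A sketch of the size query using POSIX stat(), assuming "-" is the stdin token (as mcxIOstream uses elsewhere in these notes). The function name is hypothetical. Only regular files get a size; pipes and devices report -1 because their st_size says nothing about how much data will arrive.

```c
#include <sys/stat.h>
#include <string.h>

/* Return the size in bytes of the named file, or -1 when no reliable
 * size is available: "-" (stdin), non-regular files, or stat() failure. */
long long file_size_hint(const char *name)
{
    struct stat st;

    if (!strcmp(name, "-"))          /* stdin: size unknown in advance */
        return -1;
    if (stat(name, &st) != 0)        /* missing or inaccessible file */
        return -1;
    if (!S_ISREG(st.st_mode))        /* pipe, device, ...: st_size meaningless */
        return -1;
    return (long long) st.st_size;
}
```

A reader could use a non-negative result to allocate its buffer once and fall back to incremental growth otherwise.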
is it ok to make only release and init routines take void arg?
can make void free routines as inlines around true free routines.
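A sketch of a void-argument inline wrapper around a typed free routine, so the routine can be passed wherever a generic destructor callback of type void (*)(void*) is expected. It assumes the typed routine takes the address of the caller's pointer and resets it to NULL; demo_vec and both function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical typed object standing in for e.g. a vector. */
typedef struct
{
    double *vals;
    int     n;
} demo_vec;

/* The "true" free routine: typed, takes the address of the pointer
 * so it can reset the caller's variable to NULL. */
void demo_vec_free(demo_vec **vpp)
{
    if (*vpp)
    {
        free((*vpp)->vals);
        free(*vpp);
        *vpp = NULL;
    }
}

/* Inline void-argument wrapper: same shape as a generic destructor
 * callback, forwards to the typed routine. */
static inline void demo_vec_free_v(void *vpp)
{
    demo_vec_free((demo_vec **) vpp);
}
```

The typed routine keeps compile-time checking for direct callers; only generic containers see the void* version.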
option strings parsing. return values probably mcxTing.
or another solution. Perhaps something with hashes, you know.
insert mcxIOstreamOpenFailure(stderr, "mcxMatrixMaskedRead", xfIn)
etc in the right places.
; if (!xfIn->fp && mcxIOstreamOpen(xfIn, ON_FAIL) != STATUS_OK)
{ mcxIOstreamOpenFailure(stderr, "mcxMatrixMaskedRead", xfIn)
; return NULL
; }
Cool idiom (consider the case where ON_FAIL is EXIT_ON_FAIL).
Check all ON_FAIL instances, see whether idiom can be shortened.
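One way the idiom above might be shortened is to fold "open if not yet open, report on failure" into a single helper. This is a sketch with hypothetical stand-in types and names, not the mcxIOstream API. With EXIT_ON_FAIL the open routine never returns a failure, so the reporting branch only matters for RETURN_ON_FAIL, which is the point of the remark about the idiom.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the stream type and its failure modes. */
typedef enum { RETURN_ON_FAIL, EXIT_ON_FAIL } on_fail_mode;
typedef enum { STATUS_OK, STATUS_FAIL } demo_status;

typedef struct
{
    const char *name;
    const char *mode;
    FILE       *fp;
} demo_stream;

/* Open the stream; with EXIT_ON_FAIL a failure terminates the program. */
demo_status demo_stream_open(demo_stream *xf, on_fail_mode on_fail)
{
    xf->fp = fopen(xf->name, xf->mode);
    if (xf->fp)
        return STATUS_OK;
    if (on_fail == EXIT_ON_FAIL)
    {
        fprintf(stderr, "cannot open <%s>, exiting\n", xf->name);
        exit(EXIT_FAILURE);
    }
    return STATUS_FAIL;
}

/* Open xf unless already open; on failure report in the name of caller. */
demo_status demo_stream_ensure_open
(  demo_stream *xf
,  on_fail_mode on_fail
,  const char *caller
)
{
    if (xf->fp)
        return STATUS_OK;
    if (demo_stream_open(xf, on_fail) == STATUS_OK)
        return STATUS_OK;
    fprintf(stderr, "%s: cannot open stream <%s>\n", caller, xf->name);
    return STATUS_FAIL;
}
```

A reader routine then needs one line: `if (demo_stream_ensure_open(xfIn, RETURN_ON_FAIL, "mcxMatrixMaskedRead") != STATUS_OK) return NULL;`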
mcx-ize taurus library. Dust on this section, inspection severely
needed. Consistency: list==NULL, n=0. Permutation functionality.
general audit.
ilStore has very strange functionality. ilResize is funnily
implemented. Resize should probably not try to be smart and look at
sizes. It's not like mcxTingShrink & Ensure, it's like vectorResize.
ilCon should become ilAppend, which should wrap mcxSplice, as should ilInsert
etcetera.
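A sketch of append and insert as thin wrappers around one splice routine. mcxSplice's real signature is not given here, so demo_il_splice and the list type are hypothetical; the member is named `ints`, following the rename idea in these notes. The splice builds a fresh array, which keeps the list unchanged if allocation fails.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int *ints; int n; } demo_ilist;   /* hypothetical */

/* At `offset', delete `n_delete' elements and insert the `n_insert'
 * elements of src. Returns 0 on success, -1 on bad arguments or
 * allocation failure, in which case the list is unchanged. */
int demo_il_splice
(  demo_ilist *il, int offset, int n_delete, const int *src, int n_insert
)
{
    int n_new, tail;
    int *ints;

    if (offset < 0 || n_delete < 0 || n_insert < 0 || offset + n_delete > il->n)
        return -1;

    n_new = il->n - n_delete + n_insert;
    tail  = il->n - offset - n_delete;      /* elements after deleted part */

    ints = n_new ? malloc((size_t) n_new * sizeof *ints) : NULL;
    if (n_new && !ints)
        return -1;

    if (offset)     /* head */
        memcpy(ints, il->ints, (size_t) offset * sizeof *ints);
    if (n_insert)   /* inserted part */
        memcpy(ints + offset, src, (size_t) n_insert * sizeof *ints);
    if (tail)       /* tail, shifted */
        memcpy
        (  ints + offset + n_insert
        ,  il->ints + offset + n_delete
        ,  (size_t) tail * sizeof *ints
        );

    free(il->ints);
    il->ints = ints;
    il->n    = n_new;
    return 0;
}

/* Append and insert become one-liners. */
int demo_il_append(demo_ilist *il, const int *src, int n)
{  return demo_il_splice(il, il->n, 0, src, n);
}

int demo_il_insert(demo_ilist *il, int offset, const int *src, int n)
{  return demo_il_splice(il, offset, 0, src, n);
}
```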
check whether ascii input column is not sorted or contains duplicates.
Only in those cases sort and merge.
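A sketch of "check first, sort and merge only if needed": one linear pass detects the common already-sorted, duplicate-free case; only otherwise is the column sorted with qsort and duplicates merged. Summing duplicate values is an assumption of this sketch; the ivp type and names are hypothetical.

```c
#include <stdlib.h>

typedef struct { int idx; double val; } demo_ivp;   /* hypothetical ivp */

static int demo_idx_cmp(const void *a, const void *b)
{
    int x = ((const demo_ivp *) a)->idx;
    int y = ((const demo_ivp *) b)->idx;
    return (x > y) - (x < y);
}

/* Leave a strictly increasing column untouched; otherwise sort by idx
 * and merge equal indices (summing values, an assumption).
 * Returns the new number of entries. */
int demo_column_normalize(demo_ivp *ivps, int n)
{
    int i, k;

    for (i = 1; i < n; i++)            /* sorted and unique? */
        if (ivps[i-1].idx >= ivps[i].idx)
            break;
    if (i >= n)
        return n;                       /* common case: nothing to do */

    qsort(ivps, (size_t) n, sizeof *ivps, demo_idx_cmp);

    for (k = 0, i = 1; i < n; i++)     /* k: last kept entry */
    {
        if (ivps[i].idx == ivps[k].idx)
            ivps[k].val += ivps[i].val;
        else
            ivps[++k] = ivps[i];
    }
    return k + 1;
}
```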
behaviour of kbarselect when the requested number is larger than the vector size:
it currently returns a bar of -1.0.
being able to read in a tagged matrix. requires probably just a
branch in mcxmatrixreadascii (or whatever it is called).
implement mcxVectorFromIlist and mcxIlistFromVector?
I believe I do not want mutual dependencies between the archives.
Should both go in a separate archive, which must always be listed
before the other two?
Right now I will just put them in taurus/la.[ch]
mcxIOstreamRewind suggests something it does not (it only resets
counters to initial state). perhaps add wrappers around fseek and
ftell that preserve mcxIOstream counter state.
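A sketch of tell/seek wrappers that keep stream counters consistent with the file offset: tell snapshots position and counters together, seek restores both. The stream type, its `bytes`/`lines` fields and all function names are hypothetical, not mcxIOstream's.

```c
#include <stdio.h>

typedef struct
{
    FILE *fp;
    long  bytes;   /* bytes consumed so far (illustrative) */
    long  lines;   /* newlines consumed so far (illustrative) */
} demo_stream;

typedef struct { long pos; long bytes; long lines; } demo_mark;

/* Read one character and keep the counters up to date. */
int demo_stream_getc(demo_stream *xf)
{
    int c = fgetc(xf->fp);
    if (c != EOF)
    {
        xf->bytes++;
        if (c == '\n')
            xf->lines++;
    }
    return c;
}

/* Like ftell(), but also snapshot the counters. Returns 0 on success. */
int demo_stream_tell(demo_stream *xf, demo_mark *m)
{
    long pos = ftell(xf->fp);
    if (pos < 0)
        return -1;
    m->pos   = pos;
    m->bytes = xf->bytes;
    m->lines = xf->lines;
    return 0;
}

/* Like fseek(SEEK_SET), but restore the counters that belong to the
 * saved position. Returns 0 on success. */
int demo_stream_seek(demo_stream *xf, const demo_mark *m)
{
    if (fseek(xf->fp, m->pos, SEEK_SET) != 0)
        return -1;
    xf->bytes = m->bytes;
    xf->lines = m->lines;
    return 0;
}
```

Unlike a bare counter reset, this works only for positions obtained through the wrapper, which is exactly what keeps the counters honest.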
(?) AllocZero(N_cols, N_rows) interface (change arguments)
Submatrix
MatrixComplete
MatrixPermute?
mcxMatrixDiag should take vector as argument.
Did remove ugly 0,0,0,0 (defining window) from mcxMatrixNrofEntries,
because we now have mcxSubMatrixNrofEntries.
Ugly windows are still present in mcxMatrixList and mcxVectorList
taurus/la.c idxShare. Strange location.
define mcxTingAppend in terms of mcxTingNAppend (really?)
mcxMatrixWriteTable
Default ivp indices will not be printed; vector indices probably will
be. No EOV character. Column separator optionally yes.
mcltype table?
Intention: create something that is easily parseable for unknown purposes.
How big can inhomogeneity get? If it is N times some maximum, it could
get very big if there is a column (0.3,0.7). (-> 0.7 - 0.58 = 0.12, where 0.58 = 0.3^2 + 0.7^2)
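A sketch of the per-column quantity in the arithmetic above, assuming (as the 0.58 = 0.3^2 + 0.7^2 term suggests) that it is the largest entry minus the sum of squared entries, for a nonnegative column summing to 1. Any scaling by N is left out; the function name is hypothetical. Since the sum of squares is at most max times the column sum, the result is never negative, and it is zero exactly when all nonzero entries are equal.

```c
#include <stddef.h>

/* Largest entry minus sum of squares of a stochastic column.
 * For (0.3, 0.7): 0.7 - (0.09 + 0.49) = 0.12. */
double demo_column_inhomogeneity(const double *vals, size_t n)
{
    double max = 0.0, sumsq = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (vals[i] > max)
            max = vals[i];
        sumsq += vals[i] * vals[i];
    }
    return max - sumsq;
}
```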
If vector I/O is ever seriously re-implemented,
consider making them matrices. (or do I want to be implicit about range?)
......................................................................
;;thoughts;;
vectorBinary and other vector routines: confusion over src and dst arguments.
There seems to be a split between create routines and input/output
routines, which is perhaps fine.
implement mcxbools where used. Tiresome.
(DONE) util/types.h, not iface.h
Should I also prefix my files? There are other unprefixed types out
there I believe. Should I prefix EXIT_ON_FAIL etcetera like mcxFALSE
and mcxTRUE? the mcxTRUE looks stupid, I do this because of potential
crosslinking difficulties. Dumb?
What about doing setMerge, setMinus, setMeet also for integer lists.
Does this mean I should do it in general?
If there are two ordered lists of the same type which have
a mapping property like mcxVector this can be useful.
Requires comparison test on the mapping domain.
and a binary function on the mapping range.
There is a reason why this is overkill. Thought of it the other day.
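For the integer-list case alone, the setMerge/setMinus/setMeet idea above reduces to linear merges over strictly increasing arrays. A sketch with hypothetical names; the general mcxVector-like version would also combine values with a binary function on the mapping range, which is not shown.

```c
/* Set operations on strictly increasing int arrays. `out' needs room
 * for na + nb (merge), na (minus) or the smaller of na, nb (meet).
 * Each returns the number of elements written. */
int demo_set_merge(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)
    {
        if (a[i] < b[j])        out[k++] = a[i++];
        else if (a[i] > b[j])   out[k++] = b[j++];
        else                  { out[k++] = a[i++]; j++; }
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;
}

int demo_set_minus(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na)
    {
        while (j < nb && b[j] < a[i])   /* skip smaller elements of b */
            j++;
        if (j < nb && b[j] == a[i])
            i++;                        /* present in b: drop */
        else
            out[k++] = a[i++];
    }
    return k;
}

int demo_set_meet(const int *a, int na, const int *b, int nb, int *out)
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb)
    {
        if (a[i] < b[j])        i++;
        else if (a[i] > b[j])   j++;
        else                  { out[k++] = a[i++]; j++; }
    }
    return k;
}
```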
using an ivp pool for matrices. (means cols can no longer be
alloc'ed). Difficult: mclMatrixVectorCompose writes directly
in destination vector, and reallocs its ivps (vectorInstantiate).
changing member name 'list' of ilist to 'ints'
ilRange(left, right), ilInterval(left, right) {non-inclusive}
Some standard notation for
*addresses of variables containing pointers to memory on the heap*
The variables themselves are usually structure members.
mempptr currently. memptraddr would be ugly
readascii contains/contained check on return value of instantiate.
Sigh, errorchecking, how and why.
I tend to do checking whenever there is interaction with the environment,
mainly IO, not malloc.
xfStdout
xfStderr
xfStdin
xfVerbose
xfTrack
in util/iface.[ch]
pooling, registering, managing, whatever. Think of something clueful.
rethink structure mcl section. dpsd, interpret, clm. some legacy
routines in there.
Which routines take the name of their caller as argument?
Right now it is difficult to work with induced submatrices; these can
only exist as the full thing; this is a waste of memory and results
in things like a lot of singleton clusters if you start clustering
the induced subgraph.
It may be an idea to include two maps (ilist permutations) in the
definition of a matrix, resulting in the idea of a mapped matrix. This
This also requires members mapped_N_rows and mapped_N_cols.
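A sketch of the mapped-matrix idea: a view that keeps the base matrix and two maps instead of copying the induced submatrix. Here the column map goes local to base and the row map goes base to local (-1 for excluded rows), which is one possible choice; all types and names are hypothetical, and values are omitted.

```c
#include <stddef.h>

/* Hypothetical column-major pattern matrix: column c holds n[c]
 * row indices in rows[c][0 .. n[c]-1]. */
typedef struct
{
    int          N_cols;
    int          N_rows;
    const int   *n;
    const int  **rows;
} demo_matrix;

/* View on an induced submatrix; nothing is copied from base.
 * col_map[j]: base column of local column j.
 * row_inv[r]: local row of base row r, or -1 if r is excluded. */
typedef struct
{
    const demo_matrix *base;
    const int         *col_map;
    const int         *row_inv;
    int                mapped_N_cols;
    int                mapped_N_rows;
} demo_mapped_matrix;

/* Number of entries of local column j that survive the row map. */
int demo_mapped_col_size(const demo_mapped_matrix *mm, int j)
{
    const demo_matrix *mx = mm->base;
    int c = mm->col_map[j];
    int k, count = 0;

    for (k = 0; k < mx->n[c]; k++)
        if (mm->row_inv[mx->rows[c][k]] >= 0)
            count++;
    return count;
}
```

Clustering the induced subgraph through such a view would not see the excluded nodes at all, avoiding the spurious singletons mentioned above.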
Which routines suffer from complexity? Compose routines for sure ..
Add routines also. Vector compose routine.
Idx inquiry, but this is simple.
mcxIOstream: wrap bc,lc,lo in struct. introduce data/text dichotomy.
VectorCountCmpBar should use bsearch with idxcmp argument.
This option is currently nowhere used. Needs idx sorted cols then.
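A sketch of an idx comparison usable with both qsort and bsearch, and a lookup on an idx-sorted column. The ivp type and names are hypothetical. bsearch requires the array to be sorted by the same comparison, which is the "needs idx sorted cols" condition above.

```c
#include <stdlib.h>

typedef struct { int idx; double val; } demo_ivp;   /* hypothetical ivp */

/* Order on idx only; works for qsort and bsearch alike. */
int demo_idx_cmp(const void *a, const void *b)
{
    int x = ((const demo_ivp *) a)->idx;
    int y = ((const demo_ivp *) b)->idx;
    return (x > y) - (x < y);
}

/* Entry with index idx in an idx-sorted column, or NULL. O(log n). */
const demo_ivp *demo_vec_lookup(const demo_ivp *ivps, int n, int idx)
{
    demo_ivp key;

    if (n <= 0)                       /* empty column: ivps may be NULL */
        return NULL;
    key.idx = idx;
    key.val = 0.0;
    return bsearch(&key, ivps, (size_t) n, sizeof *ivps, demo_idx_cmp);
}
```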
......................................................................
;;audit;;
Check vector.[ch] on the condition n_ivps==0 and ivps==NULL.
Check ilist.[ch] idem.
Check code for malloc(0) (if semantics ok, change to rqRealloc(0)).
What happens when vectorresize(0)?
Allow this everywhere.
Sum of a vector with ivps==NULL is simply 0.0, this is consistent
with the MCL sparse matrix/vector paradigm.
At some places a condition like while(ivp<ivpmax).
Translates to while(0<0) according to the standard.
so I guess this is ok. Still scrutinize for trying to access
ivps->0
mcxVectorCreate(0) induces rqRealloc(0) call, and that is ok.
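A sketch of a vector sum that handles the empty vector (n_ivps == 0, ivps == NULL) without a special case. It drives the loop by the count rather than by a pointer pair: strictly read, ISO C only defines pointer arithmetic and relational comparison for pointers into the same array object, so `ivp < ivpmax` with both derived from NULL is not something the standard itself guarantees. Types and names are hypothetical.

```c
#include <stddef.h>

typedef struct { int idx; double val; } demo_ivp;
typedef struct { demo_ivp *ivps; int n_ivps; } demo_vector;

/* Sum of values; the empty vector yields 0.0. ivps is only indexed
 * when n_ivps > 0, so NULL is never dereferenced or offset. */
double demo_vector_sum(const demo_vector *vec)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < vec->n_ivps; i++)
        sum += vec->ivps[i].val;
    return sum;
}
```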
Redundant and missing includes in header files and source files.
What about these .a libraries? It seems you can't have mutual
dependencies (remember having this problem once).
return values and interaction of routines in impala/io.c, parse.c.
==============================================================================
=done
noted
mcxIOstream now only accepts `-' as token for both stdin and stdout.
Name of stream is changed appropriately only after a call to Open,
meaning that Open sort of has a delayed side effect relative to New.
If a file name is inquired in between these two calls, the result
is inconsistent. Inquirers for file names do have to think of the
`std stream' possibility though, so maybe this is not entirely an
issue.
Of course, inquiry should be a method in this case, and callers
should in critical cases not poll the streamname directly.
considered
mcxTingEnsure returns new mcxTing struct if given a null mcxTing.
Possibility that callers forget to use return value when passing
text argument that equals NULL is quite real.
Solution: define wrapper for functionality "gimme a new txt with that
capacity". Name? mcxTingNewCapacity
Did this, then removed it. Confusing that NewCapacity also takes
a txt as argument. Had something to do with EmptyString I believe,
which now uses mcxTingEnsure, which is a lousy name also by the way.
considered
giving mcxBuf structure a pointer to the number of elements
currently in use. This should always be possible, as caller
must keep track of this.
Did not do this because .. the advantages are not clear.
What if people accidentally fiddle with it?
Right now finalization yields the final count.
@
vectorGetNrofEntries
overlapping functionality with vectorCountCmpBar,
except that the latter only accepts half-open
ranges.
Perhaps vectorSelectCmpBar should become a vectorSelectCmpBars,
with a cmp that takes three arguments.
Possibility to cut out a `window'.
Only, how do you avoid a proliferation of cmps?
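One possible way to sidestep the proliferation question in the window note above: make the bounds and their inclusiveness data rather than separate cmp functions, so half-open, closed and open windows share one predicate. This is an alternative sketch, not the three-argument cmp the note proposes; all names are hypothetical.

```c
/* A value window; inclusiveness of each bound is a flag. */
typedef struct
{
    double lo, hi;
    int    lo_incl;   /* nonzero: lo belongs to the window */
    int    hi_incl;   /* nonzero: hi belongs to the window */
} demo_window;

int demo_in_window(double v, const demo_window *w)
{
    int above = w->lo_incl ? v >= w->lo : v > w->lo;
    int below = w->hi_incl ? v <= w->hi : v < w->hi;
    return above && below;
}

/* Count the values that fall inside the window. */
int demo_count_in_window(const double *vals, int n, const demo_window *w)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        if (demo_in_window(vals[i], w))
            count++;
    return count;
}
```

The half-open case that vectorCountCmpBar handles corresponds to `lo_incl = 1, hi_incl = 0`.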
==============================================================================
=totodo (old todos not yet checked and converted to new format)
warning_guts
_E_GutsShouldNotHappen
a warning in the low-level routines
that should really have been caught earlier.
NULL arguments do not in principle fall under this?
WARNING_NEWCODE
_E_NewCodeShouldNotHappen
(safety checks for new code that still has to be tested
along many computation paths)
WARNING_IFACE_VIOLATION
_E_NullArgument
_E_ArgumentOutOfDomain
_E_ArgumentsConflict
@
inhomogeneity is now hardwired at 0.001e
hook.
@
-C 1:1 analogous to -A 1:1
Requires that mclCenterMatrix take an mxDiag as argument.
The -A and -C flags are awkward anyway, because you do not want to do this
on a large scale on the command line. An input vector should really serve
as the argument here.
With -a 1 -A 3:10 I would want the right-hand value to prevail.
This cannot be defined entirely logically, since the role of the value 0.0
is unclear: what does -A 0:0 mean, and how is it distinguished from
*unspecified* -A x:? values?
@
mclCluster removes the input matrix from memory.
Not ok in future scenarios.
@
Then: decouple -dump cls from the mclVerbose flag.
(?)
@
ascii interface for permutations?
(mclheader
mcltype permutation
)
(mclpermutation
0 1 2 3 4 5
)
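A sketch of a writer for the proposed ASCII permutation format, with a validity check first (each of 0..n-1 exactly once). The layout is copied from the proposal above; the function names are hypothetical and no matching reader is implied.

```c
#include <stdio.h>
#include <stdlib.h>

/* Nonzero iff perm[0..n-1] contains each of 0..n-1 exactly once.
 * Also returns 0 if the scratch allocation fails. */
int demo_is_permutation(const int *perm, int n)
{
    char *seen;
    int i, ok = 1;

    if (n == 0)
        return 1;
    seen = calloc((size_t) n, 1);
    if (!seen)
        return 0;
    for (i = 0; i < n && ok; i++)
    {
        if (perm[i] < 0 || perm[i] >= n || seen[perm[i]])
            ok = 0;                     /* out of range or duplicate */
        else
            seen[perm[i]] = 1;
    }
    free(seen);
    return ok;
}

/* Write perm in the proposed format. Returns 0 on success, -1 if perm
 * is not a permutation or a write error occurred. */
int demo_write_permutation(FILE *fp, const int *perm, int n)
{
    int i;

    if (!demo_is_permutation(perm, n))
        return -1;
    fputs("(mclheader\nmcltype permutation\n)\n(mclpermutation\n", fp);
    for (i = 0; i < n; i++)
        fprintf(fp, "%s%d", i ? " " : "", perm[i]);
    fputs("\n)\n", fp);
    return ferror(fp) ? -1 : 0;
}
```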
======================================================================
|