num-utils - Goals for this project
-----------------------------------
(NOTE: This document does not represent a current feature set; please read
the README file and man pages instead.)
This document outlines certain criteria that the num-utils should eventually
meet. It outlines a general interface that each utility should follow and
guides the project to bring the num-utils to version 1.0.
The initial releases of the num-utils will be small and much less featured
than what is shown below. I originally wrote the num-utils with many of
the features shown below, but everything was rather unorganized and some of
the programs simply processed the data incorrectly. I decided that I would
go back to the beginning, write down all the features that I would like to
have and think are useful, and create a development cycle. I'll release
each version to the public for review, comments and contribution. Anyone who
is interested in helping out in any way, please contact me at suso@suso.org.
Any contributions are welcome.
I've put an 'X' at the beginning of the line for features that have been
completed.
-= All numeric utilities =-
1. All num utils should process data on either STDIN or from a file or files
specified from command line arguments.
Examples of STDIN usage:
cat file | num-util
num-util < file
num-util
[Program will wait for data on STDIN until Ctrl-D is pressed]
Examples of file argument usage:
num-util file
num-util file1 file2 file3 ... fileN
num-util -[options] file1 file2 file3
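As a rough sketch of the input rule above (files if any are named, otherwise STDIN), here is how it could look in Python. The read_numbers helper and the NUMBER pattern are made up for illustration and are not part of num-utils.

```python
import fileinput
import re

# Hypothetical pattern: optionally signed integers and decimals.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def read_numbers(paths):
    """Yield every number found in the named files, or on STDIN if paths is empty."""
    # fileinput falls back to STDIN when the list of files is empty.
    with fileinput.input(files=paths) as lines:
        for line in lines:
            for token in NUMBER.findall(line):
                yield float(token)
```

Calling read_numbers([]) would wait for data on STDIN until end of file, which mirrors the Ctrl-D behaviour described above.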
2. All num utils should use the following options that are standard among
all the utilities:
X -h -- provide helpful usage information
X -V -- Be verbose. Send verbose information to STDERR.
X -d -- Debug information for developers. This implies verbose and more.
-q -- Quiet mode. Don't print out any warnings about file read errors.
-i -- Output numeric values as their integer only.
-I -- Output numeric values as their decimal portion only.
-a -- Don't treat '-' characters as negative signs.
Think of -a as 'absolute value'.
-d -- Don't treat the '.' character as a decimal point.
-N -- Don't process complete numbers, process each numeric character
individually.
-C -- Commify numbers in output. Instead of printing out '1000000', it
should print out '1,000,000'.
-b <n>
-- Use the following base <n> when dealing with numbers.
-- -- Don't process any more arguments as options.
3. All num utils should provide user documentation in the form of man pages.
- While the programs are written in perl I plan on using POD information in
the programs themselves and then that documentation can be exported to man
pages, info docs or HTML format.
- Eventually I would like to write the documentation in SGML so that it can
be easily converted into many different formats.
4. All num utils should provide summarized help with the -h option similar to
the following:
----------------------------------------------
numutil: process numbers in such a way......
----------------------------------------------
Usage: numutil [options] [file args]
STDIN | numutil [options]
numutil [options] < file
Options:
-h Help: You're looking at it.
-V Increase verbosity.
-d Don't treat the '.' character as a decimal point.
5. All numeric utilities should abstract their functionality as much as
possible so that a numeric processing module can be written to make things
consistent among all numeric utility programs.
6. The num-utils set of utilities should be available in the following forms
for download:
o tar format, both gz and bz2
o rpm package
o gentoo ebuild tree
o deb package
7. The Makefile should be set up to be able to create the different package
formats listed above as well as the man pages from the perldoc information
in each utility's source code.
8. Numeric expressions (regular expressions, plus operations for numbers)
.. -- range operator
i -- increment operator
f -- factor operator
m -- multiple operator
, -- expression separator
[] -- character grouping
+ -- quantifier (1 or more)
* -- quantifier (0 or more)
? -- Match the preceding character 0 or 1 times.
{} -- for matching specific number of times.
-- Some goals sent in by areiner@tph.tuwien.ac.at (thanks) --
- Support for locale changes between . and , usage. In other parts of
the world a period is used as the thousands separator and a comma as
the decimal point (i.e. 1.000.000,50 means one million and five tenths).
- Some of the utilities should be able to recurse through subdirectories,
where appropriate.
- Support for output and input of numbers in scientific notation.
(i.e. 1.3e-7 for 0.00000013)
- A utility for sorting numbers in mixed notations. For instance, it
is hard to sort numbers that are in scientific notation or commified
alongside ones that aren't. Maybe this is a call for a numsort utility.
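To make the locale and mixed-notation goals a bit more concrete, here is a minimal Python sketch. The parse_number function and its two separator arguments are assumptions for illustration; a real utility would more likely take them from the locale.

```python
def parse_number(text, decimal_point=".", thousands_sep=","):
    """Convert text such as '1,000,000.50' or '1.3e-7' to a float.

    The separator arguments are hypothetical, not existing num-utils options.
    """
    # Drop the grouping character first, then normalise the decimal point.
    cleaned = text.replace(thousands_sep, "").replace(decimal_point, ".")
    return float(cleaned)

# Sorting numbers written in mixed notations by their numeric value.
mixed = ["1,000", "1.3e-7", "250", "2.5e3"]
print(sorted(mixed, key=parse_number))
```

With decimal_point="," and thousands_sep=".", the same function reads 1.000.000,50 as one million and five tenths.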
-= Individual program goals =-
-- numsum --
This program adds up all numbers encountered. In its basic operation, it
simply adds up every number that it encounters. The numbers can be one
per line, multiple per line separated by spaces, or multiple numbers separated
by anything.
Eventually, it would be nice if numsum could do things like sum up individual
rows or columns of input separately. There might be other types of summing
functions that would be useful when dealing with textual input.
Examples
$ cat numbers
1
2
3
4
$ cat numbers | numsum
10
$ numsum numbers
10
Advanced Examples
$ cat columns
1 6 11 16 21
2 7 12 17 22
3 8 13 18 23
4 9 14 19 24
5 10 15 20 25
$ cat columns | numsum -c
15 40 65 90 115
$ cat columns | numsum -r (add up the rows)
55
60
65
70
75
$
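The column and row sums in the advanced examples can be sketched in Python as follows. The function names and the sep argument (standing in for -s) are made up for illustration, and the sketch only handles integers.

```python
def sum_columns(lines, sep=None):
    """Sum each column; sep=None splits on runs of white space."""
    totals = []
    for line in lines:
        for i, field in enumerate(line.split(sep)):
            if i == len(totals):
                totals.append(0)  # first time this column is seen
            totals[i] += int(field)
    return totals

def sum_rows(lines, sep=None):
    """Return one sum per input line."""
    return [sum(int(field) for field in line.split(sep)) for line in lines]

columns = ["1 6 11 16 21", "2 7 12 17 22", "3 8 13 18 23",
           "4 9 14 19 24", "5 10 15 20 25"]
print(sum_columns(columns))
print(sum_rows(columns))
```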
Usage options (in addition to the standard options)
-a -- Add all numbers in the file, not just the first ones found on each
line.
X -c -- Treat each line as a set of columns separated by white space, or a
string if the -s option is used. Sum up the values in each column
and print out the result of each column separated by the separation
character. This is shown above in the advanced examples section.
X -s <string> [for columns]
-- Use <string> as the separator between each column. This is allowed
to be more than one character and possibly even a number.
X -r -- Treat each line (by default) as a row of numbers to sum up. The
results of the sums of each row will be printed on separated lines.
-s <string> [for rows]
-- When used with the -r flag, this will specify the separator for
rows. By default it is the new line character. It could be a
character, set of characters or even a number.
-t <x>
-- When used, it will add up every <x>th number and at the end
print out <x> rows showing the sums of each <x>th number. This
would be useful, for example, if you had data from each day of the
month and you wanted to sum up the data for each weekday.
So you might do something like this
$ cat monthly-data | numsum -t 7
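One possible reading of the -t idea, sketched in Python: the 1st, 8th, 15th... numbers go into the first sum, the 2nd, 9th, 16th... into the second, and so on. The function name is hypothetical.

```python
def sum_every_nth(numbers, period):
    """Return `period` sums, one for each position within the cycle."""
    totals = [0] * period
    for index, value in enumerate(numbers):
        totals[index % period] += value
    return totals

# Two weeks of daily values, summed per weekday.
print(sum_every_nth(list(range(1, 15)), 7))
```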
Options that might be included eventually.
X -x <n> -- Where <n> is some number. This would be a shortcut for adding up
all the numbers in the <n>th column of the input. By default, the
columns would be determined by white space, but could also be
determined by the -s flag. This must be used with the -c or -r flag.
So you can do something like:
$ numsum -c -x 10 access_log
To get the total bytes transferred in an access_log.
Maybe this could also be able to handle comma separated values, so
1,5,10 would sum up the 1st, 5th and 10th columns.
-- numgrep --
This program is the numeric equivalent of the Unix grep utility. numgrep
will search textual input for numbers matching the expression specified from
the command line. The main power of numgrep is in being able to search for
ranges of numbers, such as all numbers between 1 and 100.
Normal Unix grep and regular expressions do not make this simple task easy,
but with numgrep's numeric matching expressions it is possible to match
numbers in ways not previously possible.
A few examples of numgrep's usage:
X o Search for all numbers between 1 and 100 in the file data.txt.
numgrep /1..100/ data.txt
X o search for numbers between 1 and 37, the even numbers from 52 to 58, as
well as the numbers 79, 86 and 94.
numgrep /1..37,5[2468],79,86,94/ data.txt
X o search for numbers from -10 to 10.
numgrep /-10..10/ data.txt
X o search for numbers that are multiples of 7
numgrep /m7/ data.txt
o search for numbers that are multiples of 7, 12 and 22
numgrep /m7m12m22/ data.txt
o search for numbers that are factors of 1024 and multiples of 12 or numbers
that are factors of 2333 and multiples of 9
numgrep /f1024m12,f2333m9/ data.txt
o search for numbers that are in the set 1, 4, 7, 10, 13 and 16
numgrep /1..16i3/ data.txt
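Here is a minimal Python sketch of how a subset of these expressions could be evaluated against a single integer. It only covers plain numbers, ranges with an optional increment, and a single multiple term; character groups, the factor operator and chained terms like m7m12m22 are left out. The matches function is made up for illustration and is not how numgrep is implemented.

```python
import re

# a..b with an optional iN increment, e.g. 1..16i3 or -10..10
RANGE = re.compile(r"^(-?\d+)\.\.(-?\d+)(?:i(\d+))?$")

def matches(number, expression):
    """Return True if the integer matches any comma-separated term."""
    for term in expression.split(","):
        found = RANGE.match(term)
        if found:
            low, high = int(found.group(1)), int(found.group(2))
            step = int(found.group(3) or 1)
            if low <= number <= high and (number - low) % step == 0:
                return True
        elif term.startswith("m"):
            # mN: multiples of N
            if number % int(term[1:]) == 0:
                return True
        elif number == int(term):
            return True
    return False
```

For example, matches(7, "1..16i3") is true because 7 is two increments of 3 above 1, while matches(8, "1..16i3") is false.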
Usage options (in addition to the standard options)
-l -- Instead of keeping the numbers in their textual context, print
them out one number per line.
-R -- Invert the sense of matching, to select non-matching numbers.
-r -- Recurse subdirectories.
-p <file>
-- Obtain the patterns from file <file>
-f -- Suppress normal output and just print the names of the files
that contain a match one per line.
-F -- Suppress normal output and print the names of the files for which
no output would have been printed.
-n -- Print the line number in front of each line that contains a match.
-c -- Don't save non-numeric context. This will cause numgrep to dump all
non-matching/non-numeric values around the numbers that match.
Eventual options
-A [n] -- Print [n] lines of context out after the matching line.
The default is 2.
-B [n] -- Print [n] lines of context out before the matching line.
The default is 2.
Feature submitted by areiner@tph.tuwien.ac.at:
- Intervals: I may be biased due to my work, but usually when you
determine a quantity with only finite precision, people write
something like 1.234 +/- 0.056 or 1.234(56) (the last form is more
common, at least in physics journals; the interpretation is that the
digits in brackets are taken to line up with the last digit shown,
left-padded with zeroes and a decimal point in the correct place). The
natural representation of both of these is as intervals, i.e. as sets
of two numbers representing lower and upper bounds. Thus, the given
number should be turned into something like 1.178..1.290, and there
should be a conversion function that turns it back into either of the
other two.
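The conversion from either notation to an interval could look something like this Python sketch. It uses Decimal to avoid floating point noise; the to_interval name is an assumption, and exponents or other notations are not handled.

```python
from decimal import Decimal
import re

def to_interval(text):
    """Turn '1.234(56)' or '1.234 +/- 0.056' into (lower, upper) Decimals."""
    compact = re.fullmatch(r"\s*(-?\d+(?:\.(\d+))?)\((\d+)\)\s*", text)
    if compact:
        value = Decimal(compact.group(1))
        places = len(compact.group(2) or "")
        # The bracketed digits line up with the last digit shown.
        error = Decimal(compact.group(3)).scaleb(-places)
    else:
        left, right = text.split("+/-")
        value, error = Decimal(left.strip()), Decimal(right.strip())
    return value - error, value + error

lower, upper = to_interval("1.234(56)")
print(f"{lower}..{upper}")
```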
-- numaverage --
This program finds the average of all numbers encountered. By default it will
find the mean average of all the numbers. Meaning (;-) that it will add up all
the values and divide that sum by the number of values encountered. It should
eventually offer more sophisticated calculations like finding the median and
mode values.
Examples of usage:
$ cat numbers
4
8
9
20
99
$ numaverage numbers
28
Usage options:
X -m -- Print out the mode value of all the numbers entered. The mode is
the most frequently occurring value in the set.
X -M -- Print out the median of the set of numbers entered. The median is
the middle value of all numbers encountered. So if the numbers
88, 12, 2, 1, 9, 100 and 1000 are encountered, the median of that
set is 12. Illustrated:
1 2 9 12 88 100 1000
^^
X -l -- Use the lower number of the median on even counted sets.
-a -- average all numbers in the file, not just the first ones found on
each line.
-c -- Treat each line as a set of columns separated by white space, or a
string if the -s option is used. Average the values in each column
and print out the result of each column separated by the separation
character. This is shown above in the advanced examples section.
-s <string> [for columns]
-- Use <string> as the separator between each column. This is allowed
to be more than one character and possibly even a number.
-r -- Treat each line (by default) as a row of numbers to average. The
results of the averages of each row will be printed on separate
lines.
-s <string> [for rows]
-- When used with the -r flag, this will specify the separator for
rows. By default it is the new line character. It could be a
character, set of characters or even a number.
Options that might be included eventually.
-n <n> -- Where <n> is some number. This would be a shortcut for averaging
all the numbers in the <n>th column of the input. By default, the
columns would be determined by white space, but could also be
determined by the -s flag. This must be used with the -c or -r flags.
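The mean, median and mode calculations described for numaverage map directly onto Python's statistics module. In this sketch the kind and lower arguments are hypothetical stand-ins for the -M, -m and -l flags.

```python
import statistics

def average(numbers, kind="mean", lower=False):
    """Return the mean, median or mode of a list of numbers."""
    if kind == "median":
        # median_low picks the lower of the two middle values on even-sized sets.
        if lower:
            return statistics.median_low(numbers)
        return statistics.median(numbers)
    if kind == "mode":
        return statistics.mode(numbers)
    return statistics.mean(numbers)

print(average([4, 8, 9, 20, 99]))
print(average([88, 12, 2, 1, 9, 100, 1000], kind="median"))
```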
Usage options (in addition to the standard options)
-- numnormalize --
This program will distribute a group of numbers between 0 and 1 by default according
to their initial value. You can change the range using the -R option.
Usage options:
X -R <range> -- This is for specifying a range to normalize for instead of 0..1
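The description does not say which formula is used, so this Python sketch assumes plain min-max scaling: the smallest input maps to the bottom of the range and the largest to the top. The low and high arguments stand in for the -R range.

```python
def normalize(numbers, low=0.0, high=1.0):
    """Scale numbers linearly so that min -> low and max -> high."""
    smallest, largest = min(numbers), max(numbers)
    span = largest - smallest
    if span == 0:
        # All inputs are equal; there is nothing to spread out.
        return [low for _ in numbers]
    return [low + (n - smallest) * (high - low) / span for n in numbers]

print(normalize([0, 5, 10]))
```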
-- numround --
This utility will round each number encountered up or down depending on its
decimal value or its relation to a number. It will probably also deal with
finding the floor or ceiling values of numbers. It should also be able to
round to multiples of certain numbers.
Options:
X -n <n> -- Round to the nearest multiple of <n>. Instead of just rounding all
decimal numbers, you can also round to a multiple of any number. So
if you set <n> to 1000 and you encounter the number 6777, it will
round that number to 7000. If you set <n> to 3 and encounter the
number 7, it will round it to 6.
X -c -- Find the ceiling of each number encountered. Round up.
X -f -- Find the floor of each number. Round down.
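Rounding to the nearest multiple of <n>, as in the 6777 to 7000 example, can be sketched in Python like this. The function name is made up, and the choice to round exact halves upward is an assumption.

```python
import math

def round_to_multiple(value, n=1):
    """Round value to the nearest multiple of n (halves round up)."""
    return math.floor(value / n + 0.5) * n

print(round_to_multiple(6777, 1000))
print(round_to_multiple(7, 3))
print(math.ceil(2.1), math.floor(2.9))  # the -c and -f behaviour
```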
-- numrange --
Print out a range of numbers for use in for loops and such.
o Do zero or character padding with the -p option. So 1 becomes 001 if the
range has an upper limit of 3 digits.
o Accept ranges in the following formats:
n1..n2 (ex. 1..100 ; All integers from 1 to 100)
n1..n2,n3..n4 (ex. 1..10,50..100 ; All integers from 1 to 10 and
from 50 to 100)
n1..n2i2 (ex. 2..20i2 ; All even numbers from 2 to 20)
(ex. 1..19i2 ; All odd numbers from 1 to 19)
(ex. 3..21i3 ; 3, 6, 9, 12, 15, 18, 21)
n1.d..n2.di0.1
(ex. 1.1..2.0i0.1 ; 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8,
1.9, 2.0)
Or any combination of the above.
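The range formats above can be expanded with a few lines of Python. This is only a sketch: the expand function is hypothetical, it assumes a positive increment, and it uses Decimal so that steps like 0.1 do not pick up floating point error.

```python
from decimal import Decimal

def expand(expression):
    """Expand e.g. '1..10,50..100' or '1.1..2.0i0.1' into a list of Decimals."""
    numbers = []
    for term in expression.split(","):
        if ".." not in term:
            numbers.append(Decimal(term))
            continue
        start, _, rest = term.partition("..")
        end, _, step = rest.partition("i")
        current, stop = Decimal(start), Decimal(end)
        increment = Decimal(step or "1")
        while current <= stop:
            numbers.append(current)
            current += increment
    return numbers

print(" ".join(str(n) for n in expand("3..21i3")))
```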
Usage options (in addition to the standard options)
-p <prefix> -- Put the following prefix before each number.
-s <suffix> -- Put the following suffix after each number.
X -n <separator> -- Use <separator> as a character to separate each number.
By default, a space is used. Use the sequence \n to specify
a newline.
X -N -- Shortcut for using a newline separator.
X -e <set>
-- Exclude the numbers in <set> from the output. This is so that if
you want to do a complex range without including certain numbers.
<set> is a list of numbers separated by a ','.
-- numrandom --
Print out a random number or numbers. This program will take a range of
numbers much like the numrange command does, except that instead of printing
out all the numbers in that range, it will print out a random number from that
range.
Ex:
$ numrandom 1..100
37
$ numrandom 1.0..100.0
56.9
$ numrandom 1.000..100.000
42.397
$ numrandom 2..100i2 [only pick among the even integers from 2 to 100]
68
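A minimal Python sketch of picking one value from such a range. It assumes a uniform distribution and, when no increment is given, steps by one unit of the last decimal place shown (so 1.0..100.0 yields one decimal place). The function name and the rng argument are made up for illustration.

```python
import random
from decimal import Decimal

def random_in_range(expression, rng=random):
    """Pick one value from a range such as '1..100' or '2..100i2'."""
    start, _, rest = expression.partition("..")
    end, _, step = rest.partition("i")
    low, high = Decimal(start), Decimal(end)
    if step:
        increment = Decimal(step)
    else:
        # One unit of the last decimal place written in the lower bound.
        increment = Decimal(1).scaleb(low.as_tuple().exponent)
    slots = int((high - low) / increment)
    return low + increment * rng.randint(0, slots)

print(random_in_range("1.000..100.000"))
```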
Usage options (in addition to the standard options)
-n <n> -- Generate <n> random numbers separated by a space.
-s <c> -- Use <c> as the separation character.
-N -- Shortcut for using a newline separation character.
Feature submitted by areiner@tph.tuwien.ac.at:
- You should specify the distribution you are generating;
probably, this is just an equal distribution. What would come in handy
quite often is to have a way of producing pseudo-random numbers that
realize a chosen distribution with a given set of parameters. But that
should probably be a different program than numrandom, e.g. distribution,
as the name is already taken. So, e.g., ``distribution --gaussian 3.0
4.0 100'' should produce 100 numbers distributed according to a gauss
distribution with mean 3 and variance 4.
-- numbound --
This program is for finding the maximum, minimum and surrounding numbers in a
set. You can use different options to specify how many numbers are returned,
so as to show lists of the top maximum and minimum values. You can also show
the numbers that occur around the boundary numbers.
Ex:
$ cat numbers
1 10 8 100 15 1000
2 9 5 15 27 12 136
$ numbound -u numbers (upper)
1000
$ numbound -l numbers (lower)
1
Top 4 upper numbers sorted
$ numbound -u -n 4 numbers
1000
136
100
27
Bottom 4 lower numbers sorted
$ numbound -l -n 4 numbers
1
2
5
8
Context around lowest number. Show 2 numbers of context.
$ numbound -l -c 2 numbers
1 10 8
Find the 6 closest numbers to the number 75. They will print out
in closeness order.
$ numbound -f 75 -n 6 numbers
100
27
15
15
136
12
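The upper, lower and closest-number lookups can be sketched in Python with sorting. The function names are hypothetical; ties in closeness keep their input order here, which is an assumption about the intended behaviour.

```python
def top(numbers, count=1):
    """The `count` largest numbers, largest first."""
    return sorted(numbers, reverse=True)[:count]

def bottom(numbers, count=1):
    """The `count` smallest numbers, smallest first."""
    return sorted(numbers)[:count]

def closest(numbers, target, count=1):
    """The `count` numbers nearest to target, nearest first."""
    return sorted(numbers, key=lambda n: abs(n - target))[:count]

numbers = [1, 10, 8, 100, 15, 1000, 2, 9, 5, 15, 27, 12, 136]
print(closest(numbers, 75, 6))
```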
Usage options (in addition to the standard options)
-u -- Return the upper bound number in the set (the maximum number)
-l -- Return the lower bound number in the set (the minimum number)
-n <n> -- Return the top or bottom <n> numbers or <n> numbers around
a number.
-c <n> -- In addition to the number returned, show <n> numbers on both sides
of the number.
-f <n> -- Find the number <n> or the closest number to <n>.
-- numprocess -- (maybe a name like mutate, alter, or process would be better)
This program mutates numbers as it encounters them. It should do the
following operations:
o Add/Subtract a value to a number
o Multiply/Divide a number by a factor
o Raise a number to a power (includes the concept of roots)
o Round a number up or down
o Do any mathematical function to a number.
o Quantify a number. So 1024 bytes is 1KB and so on.
Ex:
Add 1 to each number
$ numprocess /+1/
Multiply each number by 8 and then divide by 5
$ numprocess /*8,%5/
Convert from Fahrenheit to Celsius
$ numprocess /-32,*5,%9/
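A small Python sketch of how such an expression could be applied to one number, step by step. It follows the examples in treating % as division; the process function and the OPERATIONS table are made up for illustration, and only the four basic operations are covered.

```python
import operator

# Operator characters as used in the examples; '%' means divide here.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "%": operator.truediv,
}

def process(number, expression):
    """Apply each comma-separated step, left to right, to number."""
    for step in expression.split(","):
        number = OPERATIONS[step[0]](number, float(step[1:]))
    return number

print(process(212, "-32,*5,%9"))
```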
Usage options (in addition to the standard options)
-- numinterval --
This program calculates and displays the interval between one number
and the next.
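In Python terms, the idea is just the difference between each pair of neighbouring numbers. The function name is made up for illustration.

```python
def intervals(numbers):
    """Return the difference between each number and the one after it."""
    return [b - a for a, b in zip(numbers, numbers[1:])]

print(intervals([1, 4, 9, 16]))
```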