File: NEWS

* Noteworthy changes in release 1.9 (2025-04-05) [stable]

** Changes in Behavior

  datamash(1), decorate(1): Add short options -h and -V for --help and --version
  respectively.

  datamash(1): the rand operation now uses getrandom(2) for generating a random
  seed, instead of relying on date/time/pid mixing.

** New Features

  datamash(1): add operation dotprod for calculating the scalar product of two
  columns.
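
  As a rough illustration of what the scalar (dot) product of two columns
  means, the Python sketch below sums the element-wise products. The name
  dot_product is made up for this note; it says nothing about how datamash
  implements dotprod internally.

```python
def dot_product(xs, ys):
    """Return the sum of element-wise products of two equal-length columns."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    return sum(x * y for x, y in zip(xs, ys))


col_a = [1, 2, 3]
col_b = [4, 5, 6]
print(dot_product(col_a, col_b))  # 1*4 + 2*5 + 3*6 = 32
```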

  datamash(1): Add option -S/--seed to set a specific seed for pseudo-random
  number generation.

  datamash(1): Add option --vnlog to enable experimental support for the vnlog
  format. More about vnlog is at https://github.com/dkogan/vnlog.

  datamash(1): -g/groupby takes ranges of columns (e.g. 1-4)

** Bug Fixes

  datamash(1) now correctly calculates the "antimode" for a sequence
  of numbers.  Problem reported by Kingsley G. Morse Jr. in
  <https://lists.gnu.org/archive/html/bug-datamash/2023-12/msg00003.html>.

  When using the locale's decimal separator as field separator, numeric
  datamash(1) operations now work correctly.  Problem reported by Jérémie
  Roquet in
  <https://lists.gnu.org/archive/html/bug-datamash/2018-09/msg00000.html>
  and by Jeroen Hoek in
  <https://lists.gnu.org/archive/html/bug-datamash/2023-11/msg00000.html>.

  datamash(1): The "getnum" operation now stays inside the specified field.


* Noteworthy changes in release 1.8 (2022-07-23) [stable]

** Changes in Behavior

  Schedule -f/--full combined with non-linewise operations for deprecation.
  In a future release, -f/--full will only be usable with operations where
  it makes sense. For now, datamash still accepts -f/--full with
  non-linewise operations but prints a warning to stderr.

  The bin operation now uses more intuitive bins. Previously, a command
  such as `datamash bin 1 <<< -0` would output -100; and -100 did not fall
  in its own bin. We now require all bins to take the form `[nx,(n+1)x)`
  with integer n and bin width x. We discard the sign on -0 and gate such
  inputs into the [0,x) bin.
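
  The bin rule above can be sketched in Python as follows. This is only an
  illustration under assumptions: the helper name bin_start is hypothetical,
  and the default width of 100 is inferred from the -100 output quoted
  above. Each value maps to the lower edge n*x of the bin [nx,(n+1)x).

```python
import math


def bin_start(value, width=100):
    """Return the lower edge n*width of the bin [n*width, (n+1)*width)."""
    n = math.floor(value / width)  # integer bin index; floor(-0.0) is 0
    return n * width


print(bin_start(-0.0))   # 0: the sign of -0 is discarded
print(bin_start(-100))   # -100: falls in its own bin [-100, 0)
print(bin_start(-1))     # -100
print(bin_start(250))    # 200
```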

  Operations taking more than one argument now provide more complete output
  with --header-out. Previously, an operation such as `pcov x:y` would
  produce an output header like `pcov(y)`, discarding the `x`. The new
  behavior will output header `pcov(x,y)`.

  datamash(1) no longer ignores --output-delimiter with the rmdup operation.

** New Features

  New datamash option --sort-cmd argument to specify the program used
  by the -s option to sort input, plus enhancements to the security and
  portability of building sort command lines.

  New datamash option -c/--collapse-delimiter=X argument uses character
  X instead of comma between values in collapse and unique lists.

  New datamash operations: mean square (ms) and root mean square (rms).
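
  For reference, the two quantities can be written out in a few lines of
  Python. The function names are illustrative only and are not datamash
  identifiers.

```python
import math


def mean_square(values):
    """Mean of the squared values."""
    return sum(v * v for v in values) / len(values)


def root_mean_square(values):
    """Square root of the mean square."""
    return math.sqrt(mean_square(values))


data = [1, 7]
print(mean_square(data))       # (1 + 49) / 2 = 25.0
print(root_mean_square(data))  # sqrt(25.0) = 5.0
```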

  Decorate now supports sorting IP addresses of both versions 4 and 6
  together. IPv4 addresses are logically converted to IPv6 addresses,
  either as IPv4-Mapped (ipv6v4map) or IPv4-Compatible (ipv6v4comp)
  addresses.
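
  The idea of sorting both address families together can be sketched with
  Python's standard ipaddress module. This is not decorate's code; the
  helper as_ipv6 and its mapped parameter are assumptions made for the
  example. An IPv4-Mapped address places the 32-bit IPv4 value under the
  ::ffff:0:0/96 prefix, while an IPv4-Compatible address places it under
  ::/96.

```python
import ipaddress


def as_ipv6(text, mapped=True):
    """Convert an address string to an IPv6Address usable as a sort key."""
    addr = ipaddress.ip_address(text)
    if addr.version == 6:
        return addr
    # IPv4-Mapped uses the ::ffff:0:0/96 prefix, IPv4-Compatible uses ::/96.
    prefix = (0xFFFF << 32) if mapped else 0
    return ipaddress.IPv6Address(prefix | int(addr))


mixed = ["2001:db8::1", "192.0.2.10", "192.0.2.9", "::1"]
print(sorted(mixed, key=as_ipv6))
# ['::1', '192.0.2.9', '192.0.2.10', '2001:db8::1']
```

  A plain string sort would place 192.0.2.10 before 192.0.2.9; comparing the
  converted addresses numerically avoids that.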

  Add two command aliases:
    'echo' may now be used instead of 'cut'.
    'uniq' may now be used instead of 'unique'.

** Improvements

  Updated the bash completion script to reflect recent additions.

** Bug Fixes

  Datamash now passes the -z/--zero-terminated flag to the sort(1) child
  process when used with "--sort --zero-terminated". Additionally,
  if the system's sort(1) does not support -z, datamash reports the error
  and exits. Previously it would omit the "-z" when running sort(1),
  resulting in incorrect results.

  Documentation fixes and spelling corrections.

  Fixed an incorrect format in a decorate(1) error, which broke compilation
  on some systems.

  datamash(1), decorate(1): Fix some minor memory leaks.

  datamash(1) no longer crashes when the unique or countunique operations
  are used with input data containing NUL bytes.  The problem was reported
  in https://lists.gnu.org/archive/html/bug-datamash/2020-11/msg00001.html
  by Catalin Patulea.

  datamash(1) no longer crashes when crosstab with --header-in is called
  by field name instead of index. I.e. `datamash --header-in ct x,y` now
  works as expected.


* Noteworthy changes in release 1.7 (2020-04-23) [testing]

** New Features

  decorate(1): new program - sorts input in non-standard ordering, e.g.
  IPv4, IPv6, roman numerals.

  New operations: sha224/sha384.

  New operations: geomean (Geometric mean) and harmmean (Harmonic mean).
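
  For readers unfamiliar with the two means, here is a short Python sketch
  of the textbook definitions (both assume strictly positive inputs). The
  function names are illustrative and do not describe datamash internals.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


print(geometric_mean([2, 8]))    # approximately 4
print(harmonic_mean([40, 60]))   # approximately 48
```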


* Noteworthy changes in release 1.6 (2020-02-24) [stable]

** Bug Fixes

  The 'getnum' operation (introduced in version 1.5) now correctly
  prints detected numbers without truncating them.


* Noteworthy changes in release 1.5 (2019-09-17) [stable]

** New Features

  Datamash now accepts backslash-escaped characters in field names.
  This allows working with named fields containing dashes/minus signs,
  colons, or commas, and with field names starting with digits (note the
  interplay between backslash and shell quoting). The following are
  equivalent, and sum an input field named 'FOO-BAR':
      datamash -H sum FOO\\-BAR < input.txt
      datamash -H sum 'FOO\-BAR' < input.txt
      datamash -H sum "FOO\\-BAR" < input.txt

  New operations: dirname, basename
  These behave just like dirname(1) and basename(1):
     $ echo /home/foo/bar.txt | datamash dirname 1 basename 1
     /home/foo    bar.txt

  New operations: extname, barename
  'extname' extracts the extension of the file name.
  'barename' (not to be confused with 'basename') extracts the basename
  without the extension.
  Example:
     $ echo /home/foo/bar.tar.gz | datamash barename 1 extname 1
     bar         tar.gz

  New operation: getnum
  This operation extracts a number from a string.
  'getnum' accepts an optional single letter option:
     getnum:n - natural numbers (non-negative integers)
     getnum:i - integers
     getnum:d - decimal point numbers
     getnum:p - positive decimal point numbers (this is the default)
     getnum:h - hex numbers
     getnum:o - octal numbers
   Examples:
     $ echo foo-42.0-bar | datamash getnum 1
     42.0
     $ echo foo-42.0-bar | datamash getnum:n 1
     42
     $ echo foo-42.0-bar | datamash getnum:i 1
     -42
     $ echo foo-42.0-bar | datamash getnum:d 1
     -42.0
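
  The four examples above can be imitated with regular expressions, as in
  the Python sketch below. The patterns are guesses that happen to
  reproduce those outputs; they are not taken from datamash's parser, and
  the hex and octal variants are left out. Returning None when nothing
  matches is likewise an assumption of this sketch.

```python
import re

# Hypothetical patterns, one per option letter used in the examples above.
PATTERNS = {
    "n": r"\d+",               # natural numbers
    "i": r"-?\d+",             # integers
    "d": r"-?\d+(?:\.\d+)?",   # decimal point numbers
    "p": r"\d+(?:\.\d+)?",     # positive decimal point numbers (default)
}


def get_num(text, kind="p"):
    """Return the first number of the requested kind found in text."""
    match = re.search(PATTERNS[kind], text)
    return match.group(0) if match else None


for kind in ("p", "n", "i", "d"):
    print(kind, get_num("foo-42.0-bar", kind))
# p 42.0
# n 42
# i -42
# d -42.0
```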

  New operation: cut
  Similar to cut(1), it copies the input field to the output as-is.
  The advantage over cut(1) is that combined with datamash's other features,
  input fields can be specified by name instead of column number, and
  output fields can be re-ordered and duplicated.
  Example:
    $ printf "a b c\n1 X 6\n" | datamash -W -H cut c,a,c
    cut(c)  cut(a)  cut(c)
    6       1       6

** Bug fixes

  Datamash now correctly calculates mode/antimode for negative values.
  In version 1.4 and earlier, the following produced incorrect results:
    $ echo -1 | datamash-1.4 mode 1
    1.844674407371e+19



* Noteworthy changes in release 1.4 (2018-12-22) [stable]

** New Features

  New option: -C/--skip-comments to skip comment lines (lines starting
  with '#' or ';', optionally preceded by whitespace).


* Noteworthy changes in release 1.3 (2018-03-16) [stable]

** New Features

  New option: --format=FMT sets printf style floating-point format.
  Example:
     $ echo '50.5' | datamash --format "%07.3f" sum 1
     050.500
     $ echo '50.5' | datamash --format "%07.3e" sum 1
     5.050e+01

  New option: -R/--round=N rounds numeric values to N decimal places.

  New option: --output-delimiter=X overrides -t/-W.

  New operation: trimmean (trimmed mean value).
  To calculate 20% trimmed mean:
     $ printf "%s\n" 13 3 7 33 3 9 | datamash  trimmean:0.2  1
     8
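
  The arithmetic behind this example can be checked with a small Python
  sketch. It assumes that the number of values dropped from each end is the
  trim fraction times the count, rounded down; that assumption reproduces
  the result above but is not confirmed by this file. The name
  trimmed_mean is illustrative.

```python
def trimmed_mean(values, fraction):
    """Mean after dropping a fraction of the values from each sorted end."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # values dropped per end, rounded down
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trim fraction removes every value")
    return sum(kept) / len(kept)


# Sorted input: 3 3 7 9 13 33; one value is dropped from each end.
print(trimmed_mean([13, 3, 7, 33, 3, 9], 0.2))  # (3 + 7 + 9 + 13) / 4 = 8.0
```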


** Bug fixes

  Datamash now builds correctly with external OpenSSL libraries
  (./configure --with-openssl=yes). The 'configure' script now reports
  whether internal or external libraries are used:

     $ ./configure [OPTIONS]
     [...]
     Configuration summary for datamash
         md5/sha*: internal (gnulib)
  OR
         md5/sha*: external (-lcrypto)


* Noteworthy changes in release 1.2 (2017-08-22) [stable]

** New Features

  New operations:
    perc (percentile),
    range (max-min of values in group/column)

  Improved 'check' operation:
    Expected number of lines/fields can be specified as parameter.

** Improvements

  Improved bash-completion script installation path (see README for details).


* Noteworthy changes in release 1.1.1 (2017-01-19) [stable]

** Bug fixes

  'check' command correctly counts a trailing delimiter at end of lines.

  'transpose' command correctly handles missing fields on the last line.


* Noteworthy changes in release 1.1.0 (2016-01-16) [stable]

** New Features

  Bumped version to 1.1.0 to better comply with semver.

  New operations:
   crosstab (cross-tabulation / pivot-tables),
   check (verify tabular structure),
   bin (bin numeric values),
   strbin (bin string values),
   Pearson correlation,
   covariance,
   rounding functions: round, floor, ceil, trunc, frac.

** Improvements

  Speed, Portability, Tests, Coverage improvements.


* Noteworthy changes in release 1.0.7 (2015-06-29) [stable]

** New Features

  New operations: md5, sha1/256/512, base64, rmdup.

  New option --narm to ignore NaN/NA values.

  New feature: ability to specify fields by name instead of by number
  (requires --header-in or -H).

  New translations added.

** Improvements

  Speed, Portability, Coverage improvements.


* Noteworthy changes in release 1.0.6 (2014-07-29) [stable]

** New Features

  New operations: transpose, reverse.

** Improvements

  Tests: improve portability, add I/O error tests, add a few edge-case tests.

  Build: improve man-page generation, cross-compiling, auxiliary build scripts.

  Documentation: expand and fix man-page (and shorten --help screen).


* Noteworthy changes in release 1.0.5 (2014-07-15) [stable]

First release as GNU Datamash.