File: datamash.texi

\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename datamash.info
@include version.texi
@settitle GNU Datamash @value{VERSION}

@c Define a new index for options.
@defcodeindex op
@c Combine everything into one index (arbitrarily chosen to be the
@c concept index).
@syncodeindex op cp
@syncodeindex vr cp
@c %**end of header

@copying
This manual is for GNU Datamash (version @value{VERSION}, @value{UPDATED}),
which provides command-line computations on input files.

Copyright @copyright{} 2014--2021 Assaf Gordon.
Copyright @copyright{} 2022--2025 Timothy Rice.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.  A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying
@c If your manual is published on paper by the FSF, it should include
@c the standard FSF Front-Cover and Back-Cover Texts, as given in
@c maintain.texi.

@dircategory Basics
@direntry
* Datamash: (datamash).               Command-line computations on input files.
@end direntry

@titlepage
@title GNU Datamash
@subtitle for version @value{VERSION}, @value{UPDATED}
@author GNU Datamash Developers (@email{assafgordon@@gmail.com})
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents


@ifnottex
@node Top
@top Datamash

This manual is for GNU Datamash (version @value{VERSION}, @value{UPDATED}),
which provides command-line computations on input files.
@end ifnottex

@menu
* Overview::		General purpose and information.
* Invoking datamash::	How to run @command{datamash}.
* Available Operations::        Available operations in @command{datamash}.
* Statistical Operations::	Statistical operations in @command{datamash}.
* Usage Examples::      Usage Examples.
* Reporting bugs::	Sending bug reports and feature suggestions.
* GNU Free Documentation License:: Copying and sharing this documentation.
* Concept index::	Index of concepts.


@end menu

@node Overview
@chapter Overview



@cindex overview

The @command{datamash} program
(@url{https://www.gnu.org/software/datamash}) performs calculations (e.g.
@emph{sum}, @emph{count}, @emph{min}, @emph{max}, @emph{skewness},
@emph{standard deviation}) on input files.

Example: sum up the values in the first column of the input:

@example
@cindex example, sum
$ seq 10 | datamash sum 1
55
@end example

@command{datamash} can group input data and perform operations on each group.
It can sort the file and read header lines.

Example: Given a file with three fields (name, subject, score),
find the average score in each subject:

@example
$ cat scores.txt
Name        Subject          Score
Bryan       Arts             68
Isaiah      Arts             80
Gabriel     Health-Medicine  100
Tysza       Business         92
Zackery     Engineering      54
...

@cindex sorting
@cindex grouping
@cindex example, sorting
@cindex example, grouping
$ datamash --sort --headers --group 2 mean 3 sstdev 3 < scores.txt
GroupBy(Subject)   mean(Score)   sstdev(Score)
Arts               68.9474       10.4215
Business           87.3636       5.18214
Engineering        66.5385       19.8814
Health-Medicine    90.6154       9.22441
Life-Sciences      55.3333       20.606
Social-Sciences    60.2667       17.2273
@end example


@command{datamash} is designed for interactive exploration of textual data
and for automating tasks in shell scripts.

@command{datamash} has a rich set of statistical functions to quickly assess
information in textual input files. An example of calculating basic statistics
(mean, 1st quartile, median, 3rd quartile, IQR, sample standard deviation,
and p-value of the Jarque-Bera test for normal distribution):

@cindex example, statistics
@example
$ datamash -H mean 1 q1 1 median 1 q3 1 iqr 1 sstdev 1 jarque 1 < FILE
mean(x)   q1(x)  median(x)  q3(x)   iqr(x)  sstdev(x)  jarque(x)
45.32     23     37         61.5    38.5    30.4487    8.0113e-09
@end example



@node Invoking datamash
@chapter Invoking @command{datamash}

@cindex invoking
@cindex options
@cindex usage
@cindex help

The format for running the @command{datamash} program is:

@example
datamash [@var{option}]@dots{} @var{op1} @var{column1} @
[@var{op2} @var{column2} @dots{}]
@end example

Where @var{op1} is the operation to perform on the values in @var{column1}.
@command{datamash} reads input from stdin and performs one or more operations
on the input data. If @option{--group} is used, each operation is performed
on every group. If @option{--group} is not used, each operation is performed on
all the values in the input file.

@vindex LC_NUMERIC
The @env{LC_NUMERIC} locale specifies the decimal-point character and the
thousands separator.

@exdent @command{datamash} supports the following operations:

@table @asis
@item Primary operations:
@code{groupby}, @code{crosstab}, @code{transpose}, @code{reverse},
@code{check}

@item Line-Filtering operations:
@code{rmdup}

@item Per-Line operations:
@code{base64}, @code{debase64}, @code{md5}, @code{sha1},
@code{sha224}, @code{sha256}, @code{sha384}, @code{sha512}, @code{bin},
@code{strbin}, @code{round}, @code{floor}, @code{ceil}, @code{trunc},
@code{frac}, @code{dirname}, @code{basename}, @code{extname}, @code{barename},
@code{getnum}, @code{cut}, @code{echo}

@item Group-by Numeric operations:
@code{sum}, @code{min}, @code{max}, @code{absmin}, @code{absmax}, @code{range}

@item Group-by Textual/Numeric operations:
@code{count}, @code{first}, @code{last}, @code{rand},
@code{unique}, @code{uniq},
@code{collapse}, @code{countunique}

@item Group-by Statistical operations:
@code{mean}, @code{geomean}, @code{harmmean}, @code{mode},
@code{median}, @code{q1}, @code{q3}, @code{iqr}, @code{perc},
@code{antimode}, @code{pstdev}, @code{sstdev}, @code{pvar}, @code{svar},
@code{ms}, @code{rms}, @code{mad}, @code{madraw}, @code{sskew},
@code{pskew}, @code{skurt}, @code{pkurt}, @code{jarque}, @code{dpo},
@code{scov}, @code{pcov}, @code{spearson}, @code{ppearson},
@code{dotprod}

@end table

@exdent Grouping options:

@table @option
@item --skip-comments
@itemx -C
@opindex --skip-comments
@opindex -C
Skip comment lines (starting with '#' or ';' and optional whitespace).

@item --full
@itemx -f
@opindex --full
@opindex -f
Print entire input line before op results (default: print only the grouped
keys).
While using this option with non-linewise operations was historically permitted,
it never produced very sensible output. Such usage has been deprecated, and in a
future release it will result in an error.

@item --group=@var{X[,Y,Z]}
@itemx -g @var{X[,Y,Z]}
@opindex --group
@opindex -g
@cindex grouping
Group input via fields @var{X[,Y,Z]}. By default, fields are separated by TABs.
Use @option{--field-separator} to change the delimiter character. Input file
must be sorted by the same fields @var{X[,Y,Z]}. Use @option{--sort}
to automatically sort the input.
If @option{--group} is not specified, each operation is performed
on the entire input file.
Ranges of field numbers like @var{X-Z} are also supported.

@item --header-in
@opindex --header-in
Indicates the first input line is column headers, and should not be used for
any calculations.

@item --header-out
@opindex --header-out
Print column headers as the first line. If the column header names are known
(i.e. the input file had a header line, and @command{datamash} was invoked with
@option{--header-in}, @option{-H} or @option{--headers}), prints the operation
and the name of the field (e.g. @samp{mean(X)}). Otherwise, prints the
operation and the field number (e.g. @samp{mean(field-3)}).

@item --headers
@itemx -H
@opindex --headers
@opindex -H
Same as @samp{--header-in --header-out}. A short option indicating the input
file has a header line, and the output should contain a header line as well.

@item --vnlog
@opindex --vnlog
Enable experimental support for the vnlog data file format for both input
and output.  This format is explained at @url{https://github.com/dkogan/vnlog}.

@item --ignore-case
@itemx -i
@opindex --ignore-case
@opindex -i
Ignore upper/lower case when comparing text for grouping, sorting, and comparing
unique values in the @samp{countunique} and @samp{unique}
(or @samp{uniq}) operations.

@item --sort
@itemx -s
@opindex --sort
@opindex -s
@cindex sorting
Sort the input before grouping. Grouping requires sorted input: if the input
is not sorted, using @option{--sort} will automatically sort it before
processing it further. Sorting is performed based on the specified
@option{--group} parameter, and respects the @option{--ignore-case} option
(if used). The following commands are equivalent:
@example
$ cat FILE | sort -k1,1 | datamash --group 1 sum 1
$ cat FILE | datamash --sort --group 1 sum 1
@end example

@item --sort-cmd=@var{PATH}
@opindex --sort-cmd
@cindex sorting
Use the given program to sort instead of the system @command{sort}.

@end table


@exdent File Operation options:

@table @option

@item --no-strict
@opindex --no-strict
Allow lines with varying number of fields. By default, @option{transpose} and
@option{reverse} will fail with an error message unless all input lines have
the same number of fields.

@item --filler=@var{x}
@opindex --filler
When the @option{--no-strict} option is used, missing fields will be filled
with this value.
@end table

@exdent General options:

@table @option

@item --format=@var{FORMAT}
@opindex --format
Print numeric values with printf-style floating-point @var{FORMAT}.


@item --field-separator=@var{x}
@itemx -t @var{x}
@opindex --field-separator
@opindex -t
Use character @var{x} instead of TAB as input and output field delimiter.
If @option{--output-delimiter} is also used, it will override the output
field delimiter.

@item --narm
@opindex --narm
Skip @var{NA} or @var{NaN} values.

@item --output-delimiter=@var{x}
@opindex --output-delimiter
Use character @var{x} as the output field delimiter.
This option overrides @option{--field-separator}/@option{-t}/
@option{--whitespace}/@option{-W}.

@item --collapse-delimiter=@var{x}
@itemx -c @var{x}
@opindex --collapse-delimiter
@opindex -c

Use character @var{X} instead of comma to delimit items in a
@samp{collapse} or @samp{unique} (aka @samp{uniq}) list.

@item --round=@var{N}
@itemx -R @var{N}
@opindex --round
@opindex -R
Round numeric output to @var{N} decimal places.

@item --whitespace
@itemx -W
@opindex --whitespace
@opindex -W
Use whitespace (one or more spaces and/or tabs) for field delimiters.
Leading whitespace is ignored, trailing whitespace results in an empty field.
TAB character will be used as output field separator.
If @option{--output-delimiter} is also used, it will override the output
field delimiter.

@item --seed
@itemx -S
@opindex --seed
@opindex -S
Select a specific random seed. By default, GNU Datamash uses getrandom(2),
which should be suitable for most purposes. You may wish to force a specific
seed either to draw on a specific entropy source or to ensure the
reproducibility of a specific test.

@item --zero-terminated
@itemx -z
@opindex --zero-terminated
@opindex -z
End lines with a 0 byte, not newline.

@item --help
@itemx -h
@opindex --help
@opindex -h
Print an informative help message on standard output and exit
successfully.

@item --version
@itemx -V
@opindex --version
@opindex -V
Print the version number and licensing information of Datamash on
standard output and then exit successfully.

@end table

@node Available Operations
@chapter Available operations in @command{datamash}

@table @asis
@item Primary operations:
@cindex primary operations
@cindex operations, primary

@table @option
@item groupby
alternative syntax for @option{--group}
@item crosstab
cross-tabulate two fields (also known as 'pivot-tables')
@item transpose
transpose rows, columns of a text file
@item reverse
reverse fields in each line of a text file
@item check
verify tabular structure of input (ensure same number of fields in all lines)
@end table

@item Line-Filtering operation:
@cindex line filtering operation
@cindex operations, line filtering

@table @option
@item rmdup
remove lines with duplicated key value
@end table

@item Per-Line operations:
@cindex Per-Line operations
@cindex operations, per-line

@table @option
@item base64
encode the field as base64
@item debase64
decode the field as base64. Exits with an error if the field contains an
invalid base64 value which cannot be decoded.
@item md5
calculates md5 hash of the field
@item sha1
calculates sha1 hash of the field
@item sha224
calculates sha224 hash of the field
@item sha256
calculates sha256 hash of the field
@item sha384
calculates sha384 hash of the field
@item sha512
calculates sha512 hash of the field
@item dirname
extracts the directory name of the field (assuming the field is a file name).
Similar to @command{dirname(1)}.
@item basename
extracts the base file name of the field (assuming the field is a file name).
Similar to @command{basename(1)}.
@item extname
extracts the extension of the file name of the field (assuming the field is a
file name).
@item barename
extracts the base file name of the field without the extension (assuming the
field is a file name).
@item getnum
extract a number from the field. @code{getnum} accepts an optional single
letter option @samp{n/i/d/p/h/o} affecting the detected value.
@item cut
copy input field to output field (similar to @command{cut(1)}).
When the @code{cut} operation is given a list of fields, the fields are copied
in the given order (in contrast to @command{cut(1)}).
@item echo
an alias for @code{cut}.
@end table

@item Group-by Numeric operations:
@cindex numeric operations
@cindex operations, numeric

@table @option
@item sum
sum of the values
@item min
minimum value
@item max
maximum value
@item absmin
minimum of the absolute values
@item absmax
maximum of the absolute values
@item range
range of values (maximum - minimum)
@end table

@item Group-By Textual/Numeric operations:
@cindex Textual operations
@cindex operations, textual

@table @option
@item count
count number of elements in the group
@item first
the first value of the group
@item last
the last value of the group
@item rand
one random value from the group
@item unique
comma-separated sorted list of unique values
@item uniq
an alias for @code{unique}.

@option{--collapse-delimiter} selects a delimiter character other than
the default comma.

@item collapse
comma-separated list of all input values

@option{--collapse-delimiter} selects a delimiter character other than
the default comma.

@item countunique
number of unique/distinct values
@end table

@item Group-By Statistical operations:
@cindex Statistical operations
@cindex operations, statistical

@table @option
@item mean
mean of the values
@item geomean
geometric mean of the values
@item harmmean
harmonic mean of the values
@item trimmean
trimmed mean of the values
@item ms
mean square of the values
@item rms
root mean square of the values
@item median
median value
@item q1
1st quartile value
@item q3
3rd quartile value
@item iqr
inter-quartile range
@item perc
percentile value
@item mode
mode value (most common value)
@item antimode
anti-mode value (least common value)
@item pstdev
population standard deviation
@item sstdev
sample standard deviation
@item pvar
population variance
@item svar
sample variance
@item mad
Median Absolute Deviation,
scaled by a constant 1.4826 for normal distributions
@item madraw
Median Absolute Deviation, unscaled
@item sskew
skewness of the (sample) group
@item pskew
skewness of the (population) group
@item skurt
Excess Kurtosis of the (sample) group
@item pkurt
Excess Kurtosis of the (population) group
@item jarque
p-value of the Jarque-Bera test for normality
@item dpo
p-value of the D'Agostino-Pearson Omnibus test for normality.
@end table

@end table


@node Statistical Operations
@chapter Statistical Operations

@cindex statistics
@cindex operations, statistical
@cindex statistical operations

@unnumberedsec Equivalent R functions
GNU Datamash is designed to closely follow the R project's
(@url{https://www.r-project.org/}) statistical functions.
See the @file{files/operators.R} file
for the R equivalent code for each of datamash's operators.
When building @command{datamash} from source code on your local computer,
operators are compared to known results of the equivalent R functions.



@node Usage Examples
@chapter Usage Examples
@cindex usage examples
@cindex examples, usage

@menu
* Summary Statistics::		  count,min,max,mean,stdev,median,quartiles
* Header Lines and Column Names:: Using files with header lines
* Field Delimiters::              Tabs, Whitespace, other delimiters
* Column Ranges::                 Operating on multiple columns
* Reverse and Transpose::         swapping and transposing rows, columns
* Groupby on @file{/etc/passwd}:: Groupby, count, collapse
* Check::                         Validate tabular structure
* Crosstab::                      Cross-tabulation (pivot-tables)
* Rounding numbers::              round, ceil, floor, trunc, frac
* Binning numbers::               assigning numbers into fixed number of buckets
* Binning strings::               assigning strings into fixed number of buckets
* Extracting numeric values::     using getnum
@end menu


@node Summary Statistics
@section Summary Statistics
@cindex summary statistics example
@cindex examples, summary statistics

The following are examples of using @command{datamash} to quickly
calculate summary statistics. The examples will use a file with three
fields (name, subject, score) representing grades of students:

@example
$ cat scores.txt
Shawn     Arts  65
Marques   Arts  58
Fernando  Arts  78
Paul      Arts  63
Walter    Arts  75
...
@end example

Counting how many students study each subject (@emph{subject} is the
second field in the input file, thus @option{groupby 2}):

@example
$ datamash --sort groupby 2 @option{count} 2 < scores.txt
Arts            19
Business        11
Engineering     13
Health-Medicine 13
Life-Sciences   12
Social-Sciences 15
@end example

@cindex min, examples
@cindex max, examples
@cindex examples, min
@cindex examples, max
Similarly, find the minimum and maximum score in each subject:

@example
$ datamash --sort groupby 2 @option{min} 3 @option{max} 3 < scores.txt
Arts             46      88
Business         79      94
Engineering      39      99
Health-Medicine  72     100
Life-Sciences    14      91
Social-Sciences  27      90
@end example

@cindex mean, examples
@cindex standard deviation, examples
@cindex examples, mean
@cindex examples, standard deviation
find the mean and (population) standard deviation in each subject:

@example
$ datamash --sort groupby 2 @option{mean} 3 @option{pstdev} 3 < scores.txt
Arts              68.947  10.143
Business          87.363   4.940
Engineering       66.538  19.101
Health-Medicine   90.615   8.862
Life-Sciences     55.333  19.728
Social-Sciences   60.266  16.643
@end example


@cindex median, examples
@cindex examples, median
@cindex quartiles, examples
@cindex examples, quartiles
Find the median, first, third quartiles and the inter-quartile range in
each subject:

@example
$ datamash --sort groupby 2 @option{median} 3 @option{q1} 3 @option{q3} @
3 @option{iqr} 3  < scores.txt
Arts              71      61.5      75.5     14
Business          87      83        92        9
Engineering       56      51        83       32
Health-Medicine   91      84       100       16
Life-Sciences     58.5    44.25     67.75    23.5
Social-Sciences   62      55        70.5     15.5
@end example


@xref{Header Lines and Column Names}, for examples of dealing with
header lines.

@node Header Lines and Column Names
@section Header Lines and Column Names

@opindex --header-out
@cindex examples, header
@cindex examples, header-out
@cindex header, examples
@cindex header-out, examples
@unnumberedsubsec Output Header Lines

If the input does @emph{not} have a header line, use
@option{--header-out} to add a header in the first line of the output,
indicating which operation was performed:

@example
$ datamash --sort @option{--header-out} groupby 2 @option{min} @
3 @option{max} 3 < scores.txt
GroupBy(field-2)  min(field-3)  max(field-3)
Arts              46            88
Business          79            94
Engineering       39            99
Health-Medicine   72           100
Life-Sciences     14            91
Social-Sciences   27            90
@end example


@unnumberedsubsec Skipping Input Header Lines

@opindex --header-in
@cindex examples, header
@cindex examples, header-in
@cindex header-in, examples

If the input has a header line (first line containing column names),
use @option{--header-in} to skip the line:

@example
$ cat scores_h.txt
Name      Major   Score
Shawn     Arts    65
Marques   Arts    58
Fernando  Arts    78
Paul      Arts    63
...


$ datamash --sort @option{--header-in} groupby 2 mean 3 < scores_h.txt
Arts             68.947
Business         87.363
Engineering      66.538
Health-Medicine  90.615
Life-Sciences    55.333
Social-Sciences  60.266
@end example

If the header line is not skipped, @command{datamash} will show an error
(due to strict input validation):

@example
$ datamash groupby 2 mean 3 < scores_h.txt
datamash: invalid numeric value in line 1 field 3: 'Score'
@end example


@unnumberedsubsec Using Header Lines

@opindex --headers
@opindex -H
@cindex examples, headers
@cindex headers, examples

Column names in the input header lines can be printed
in the output header lines by using @option{--headers}
(or @option{-H}, both are equivalent to @option{--header-in --header-out}):

@example
$ datamash --sort @option{--headers} groupby 2 mean 3 < scores_h.txt
GroupBy(Major)    mean(Score)
Arts              68.947
Business          87.363
Engineering       66.538
Health-Medicine   90.615
Life-Sciences     55.333
Social-Sciences   60.266
@end example

Or in short form (@option{-sH} instead of @option{--sort --headers}),
equivalent to the above command:

@example
$ datamash @option{-sH} groupby 2 mean 3
@end example


@unnumberedsubsec Column Names
@cindex column names
@cindex field names

When the input file has a header line, column names can be used
instead of column numbers. In the example below, @var{Major}
is used instead of the value 2, and @var{Score} is used
instead of the value 3:

@example
$ datamash --sort --headers groupby Major mean Score < scores_h.txt
GroupBy(Major)    mean(Score)
Arts              68.947
Business          87.363
Engineering       66.538
Health-Medicine   90.615
Life-Sciences     55.333
Social-Sciences   60.266
@end example

@command{datamash} will read the first line of the input, and deduce
the correct column number based on the given name. If the column name
is not found, an error will be printed:

@example
$ datamash --sort --headers groupby 2 mean @option{Foo}  < scores_h.txt
datamash: column name 'Foo' not found in input file
@end example


Field names must be escaped with a backslash if they start with a digit
or contain special characters (dash/minus, colons, commas).
Note the interplay between escaping with backslash and shell quoting.
The following equivalent commands sum the values of a field named @samp{FOO-BAR}:

@example
$ datamash -H sum FOO\\-BAR < input.txt
$ datamash -H sum 'FOO\-BAR' < input.txt
$ datamash -H sum "FOO\\-BAR" < input.txt
@end example



@node Field Delimiters
@section Field Delimiters
@cindex field delimiters
@cindex whitespace delimiters
@cindex delimiters, whitespace
@cindex tab delimiters
@cindex delimiters, tabs

@command{datamash} uses tabs (ASCII character 0x09) as default field
delimiters.  Use @option{-W} to treat one or more consecutive
whitespace characters as field delimiters. Use @option{-t},
@option{--field-separator} to set a custom field delimiter.

The following examples illustrate the various options.

By default, fields are separated by a single tab. Multiple consecutive tabs
denote multiple fields (this is consistent with GNU coreutils'
@command{cut}):

@example
$ printf '1\t\t2\n' | datamash sum 3
2
$ printf '1\t\t2\n' | cut -f3
2
@end example

Every tab separates two fields.  A line starting with a tab thus starts
with an empty field, and a line ending with a tab ends with an empty field.

Using @option{-W}, one or more consecutive whitespace characters
are treated as a single field delimiter:

@example
$ printf '1  \t  2\n' | datamash -W sum 2
2
$ printf '1  \t  2\n' | datamash -W sum 3
datamash: invalid input: field 3 requested, line 1 has only 2 fields
@end example

With @option{-W}, leading whitespace is ignored, but trailing whitespace
is significant.  A line starting with one or more consecutive whitespace
characters followed by a non-whitespace character starts with a non-empty
field.  A line ending with one or more consecutive whitespace characters
ends with an empty field.
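This counting rule can be emulated with @command{awk} (an illustrative
sketch only, not how datamash is implemented): strip leading
whitespace, then split on runs of whitespace, keeping the empty field
produced by a trailing run:

```shell
# Emulate datamash -W field counting (illustrative approximation):
# leading whitespace is stripped; awk's split() keeps a trailing
# empty field when the line ends with a separator run.
nfields() { awk '{ sub(/^[ \t]+/, ""); print split($0, a, /[ \t]+/) }'; }

printf '  1 \t 2\n' | nfields   # 2: leading whitespace is ignored
printf '1 2  \n'    | nfields   # 3: trailing whitespace adds an empty field
```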

Using @option{-t}, a custom field delimiter character can be specified.
Multiple consecutive delimiters are treated as multiple fields:

@example
$ printf '1,10,,100\n' | datamash -t, sum 4
100
@end example



@node Column Ranges
@section Column Ranges
@cindex column ranges
@cindex ranges, columns
@cindex multiple columns

@command{datamash} accepts column ranges such as @var{1,2,3} and @var{1-3}.


Simulating input with multiple columns:

@example
$ seq 100 | paste - - - -
1    2    3    4
5    6    7    8
9   10   11   12
13  14   15   16
17  18   19   20
...
@end example

The following are equivalent:

@example
$ seq 100 | paste - - - - | datamash sum 1 sum 2 sum 3 sum 4
1225  1250   1275   1300

$ seq 100 | paste - - - - | datamash sum 1,2,3,4
1225  1250   1275   1300

$ seq 100 | paste - - - - | datamash sum 1-4
1225  1250   1275   1300

$ seq 100 | paste - - - - | datamash sum 1-3,4
1225  1250   1275   1300
@end example

Ranges can be used with multiple operations:

@example
$ seq 100 | paste - - - - | datamash sum 1-4 mean 1-4
1225  1250   1275   1300   49   50   51   52
@end example




@node Reverse and Transpose
@section Reverse and Transpose

@unnumberedsubsec Transpose
@cindex transpose
@cindex swap rows, columns

Use @option{transpose} to swap rows and columns in a file:

@example
$ cat input.txt
Sample   Year   Count
A        2014   1002
B        2013    990
C        2014   2030
D        2014    599

$ datamash transpose < input.txt
Sample  A       B       C       D
Year    2014    2013    2014    2014
Count   1002    990     2030    599
@end example


@cindex strict mode
@cindex input validation, transpose
@cindex transpose, input validation
By default, @option{transpose} verifies the input has the same number
of fields in each line, and fails with an error otherwise:

@example
$ cat input1.txt
Sample   Year   Count
A        2014   1002
B        2013
C        2014   2030
D        2014    599


$ datamash transpose < input1.txt
datamash: transpose input error: line 3 has 2 fields (previous lines had 3);
see --help to disable strict mode
@end example

Use @option{--no-strict} to allow missing values:

@opindex --no-strict
@cindex strict, transpose
@cindex transpose, strict
@example
$ datamash --no-strict transpose < input1.txt
Sample  A       B        C        D
Year    2014    2013     2014     2014
Count   1002    N/A      2030     599
@end example

@opindex --filler
@cindex missing values, transpose
@cindex transpose, missing values
@cindex transpose, filler value
Use @option{--filler} to set the missing-field filler value:

@example
$ datamash --no-strict --filler XYZ transpose < input1.txt
Sample  A       B        C        D
Year    2014    2013     2014     2014
Count   1002    XYZ      2030     599
@end example



@unnumberedsubsec Reverse
@cindex reverse columns
@cindex columns, reverse

Use @option{reverse} to reverse the field order in a file:

@example
$ cat input.txt
Sample   Year   Count
A        2014   1002
B        2013    990
C        2014   2030
D        2014    599

$ datamash reverse < input.txt
Count   Year    Sample
1002    2014    A
990     2013    B
2030    2014    C
599     2014    D
@end example

@cindex reverse, strict
@cindex strict, reverse
By default, reverse verifies the input has the same number of fields
in each line, and fails with an error otherwise. Use
@option{--no-strict} to disable this behavior (see section
above for an example).



@unnumberedsubsec Combining Reverse and Transpose

@cindex tac
@cindex reversing lines
@cindex reverse, and transpose
@cindex transpose, and reverse
Reverse and Transpose can be combined to achieve various manipulations.
(reminder: @url{https://www.gnu.org/software/coreutils/tac,tac} can be
used to reverse lines in a file):

@example
$ cat input.txt
A       1       xx
B       2       yy
C       3       zz


$ tac input.txt
C       3       zz
B       2       yy
A       1       xx


$ tac input.txt | datamash reverse
zz      3       C
yy      2       B
xx      1       A


$ cat input.txt | datamash reverse | datamash transpose
xx      yy      zz
1       2       3
A       B       C

$ tac input.txt | datamash reverse | datamash transpose
zz      yy      xx
3       2       1
C       B       A
@end example



@node Groupby on /etc/passwd
@section Groupby on @file{/etc/passwd}
@cindex groupby
@cindex @file{/etc/passwd}, examples
@cindex examples, @file{/etc/passwd}

@command{datamash} with the @option{groupby} operation mode
can be used to aggregate information.

Using this simulated @file{/etc/passwd} file as input:

@example
$ cat passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
mysql:x:115:124:MySQL Server,,,:/var/lib/mysql:/bin/false
sshd:x:116:65534::/var/run/sshd:/usr/sbin/nologin
guest:x:118:125:Guest,,,:/tmp/guest-home.phc17z:/bin/bash
gordon:x:1004:1000:Assaf Gordon,,,,:/home/gordon:/bin/bash
charles:x:1005:1000:Charles,,,,:/home/charles:/bin/bash
alice:x:1006:1000:Alice,,,,:/home/alice:/bin/bash
bob:x:1007:1000:Bob,,,,:/home/bob:/bin/bash
postgres:x:119:126:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
rabbitmq:x:125:138:RabbitMQ messaging server,,,:/var/lib/rabbitmq:/bin/false
redis:x:126:140:redis server,,,:/var/lib/redis:/bin/false
postfix:x:127:141::/var/spool/postfix:/bin/false
@end example

@opindex -t
@opindex --field-separator
The @option{-t} option sets the field separator to @var{:}
(instead of the default @var{tab}).

@cindex groupby, and count
@cindex count
@cindex login shell, examples
Aggregate (@option{groupby}) login shells (column 7) and
@option{count} how many users use each:

@example
$ datamash -t: --sort groupby 7 count 7 < passwd
/bin/bash:7
/bin/false:4
/bin/sync:1
/usr/sbin/nologin:14
@end example

@cindex groupby, and collapse
@cindex collapse
Aggregate (@option{groupby}) login shells (column 7) and print
comma-separated list of users (column 1) for each shell
(@option{collapse}):

@example
$ cat passwd | datamash -t: --sort groupby 7 collapse 1
/bin/bash:root,guest,gordon,charles,alice,bob,postgres
/bin/false:mysql,rabbitmq,redis,postfix
/bin/sync:sync
/usr/sbin/nologin:daemon,bin,sys,games,man,lp,mail,news,uucp,proxy@
,www-data,backup,list,sshd
@end example

Aggregate unix-groups (column 4) and print a
comma-separated list of users (column 1) in each group:

@example
$ datamash -t: --sort groupby 4 collapse 1 < passwd
0:root
1:daemon
10:uucp
1000:gordon,charles,alice,bob
12:man
124:mysql
125:guest
126:postgres
13:proxy
138:rabbitmq
140:redis
141:postfix
2:bin
3:sys
33:www-data
34:backup
38:list
60:games
65534:sync,sshd
7:lp
8:mail
9:news
@end example



@node Check
@section Check - checking tabular structure
@cindex check
@cindex checking tabular structure

@command{datamash} @option{check} validates the tabular structure of a
file, ensuring all lines have the same number of
fields. @option{check} is meant to be used in scripting and automation
pipelines, as it terminates with a non-zero exit code if the file is
not well structured, while also printing detailed context information
about the offending lines:

@example
$ cat good.txt
A    1    ww
B    2    xx
C    3    yy
D    4    zz


$ cat bad.txt
A    1    ww
B    2    xx
C    3
D    4    zz


$ datamash check < good.txt && echo ok || echo fail
4 lines, 3 fields
ok


$ datamash check < bad.txt && echo ok || echo fail
line 2 (3 fields):
  B  2 xx
line 3 (2 fields):
  C  3
datamash: check failed: line 3 has 2 fields (previous line had 3)
fail
@end example

@subsection Expected number of lines/fields

@option{check} accepts optional @var{lines} and @var{fields} and will
return failure if the input does not have the requested number of lines/fields.

@exdent The syntax is:

@example
datamash check [@var{N} lines] [@var{N} fields]
@end example

@exdent Usage examples:

@example
$ cat file.txt
A    1    ww
B    2    xx
C    3    yy
D    4    zz

$ datamash check 4 lines < file.txt && echo ok
4 lines, 3 fields
ok

$ datamash check 3 fields < file.txt && echo ok
4 lines, 3 fields
ok

$ datamash check 4 lines 3 fields < file.txt && echo ok
4 lines, 3 fields
ok

$ datamash check 7 fields < file.txt && echo ok
line 1 (3 fields):
  A    1    ww
datamash: check failed: line 1 has 3 fields (expecting 7)

$ datamash check 10 lines < file.txt && echo ok
datamash: check failed: input had 4 lines (expecting 10)
@end example

For convenience, @var{line}, @var{row}, @var{rows}
can be used instead of @var{lines};
@var{field}, @var{columns}, @var{column}, @var{col} can be used
instead of @var{fields}.
The following are all equivalent:

@example
datamash check 4 lines 10 fields < file.txt
datamash check 4 rows  10 columns < file.txt
datamash check 10 col 4 row < file.txt
@end example


@subsection Checks in automation scripts

@cindex fail fast
@cindex shell scripts, check
@cindex check, in automation and shell scripts
In pipeline/automation context, it is often beneficial to validate
files as early as possible (immediately after file is created, as in
@url{https://en.wikipedia.org/wiki/Fail-fast, fail-fast methodology}).
A typical usage in a shell script would be:

@example
@verbatim
#!/bin/sh

die()
{
    base=$(basename "$0")
    echo "$base: error: $@" >&2
    exit 1
}

custom pipeline-or-program > output.txt \
    || die "program failed"

datamash check < output.txt \
    || die "'output.txt' has invalid structure (missing fields)"
@end verbatim
@end example

If the generated @file{output.txt} file has invalid structure
(i.e. missing fields), @command{datamash} will print to @file{stderr}
enough details to help in troubleshooting (line numbers and the
offending line's content).

@node Crosstab
@section Crosstab - Cross-Tabulation (pivot-tables)
@cindex crosstab
@cindex pivot tables
@cindex cross tabulation

Cross-tabulation compares the relationship between two fields.
Given the following input file:

@example
$ cat input.txt
a    x    3
a    y    7
b    x    21
a    x    40
@end example

@opindex count
@cindex count, crosstab and
Show cross-tabulation between the first field (a/b) and the second
field (x/y) - counting how many times each pair appears (note: sorting
is required):

@example
$ datamash -s crosstab 1,2 < input.txt
     x    y
a    2    1
b    1    N/A
@end example

The default operation is @option{count} - in the above example,
@var{a} and @var{x} appear twice in the input file, while @var{b} and @var{y}
never appear together.

An optional grouping operation can be used instead of counting.

@opindex sum
@cindex sum, crosstab and
@cindex crosstab and sum
For each pair, @option{sum} the values in the third column:

@example
$ datamash -s crosstab 1,2 sum 3 < input.txt
     x    y
a    43   7
b    21   N/A
@end example

@opindex unique
@cindex unique, crosstab and
@cindex crosstab and unique
For each pair, list all @option{unique} values in the third column:

@example
$ datamash -s crosstab 1,2 unique 3 < input.txt
     x    y
a    3,40 7
b    21   N/A
@end example

@opindex --header-out
@cindex --header-out, crosstab and
@cindex crosstab and --header-out
Note that using @option{--header-out} with crosstab prints a line showing
how to interpret the rows and columns, and what operation was used.

@example
$ datamash -s --header-in --header-out crosstab 1,2 < input.txt
GroupBy(a) GroupBy(x) count(a)
     x    y
a    1    1
b    1    N/A
@end example

@node Rounding numbers
@section Rounding numbers

@cindex rounding numbers
@opindex round
@opindex ceil
@opindex floor
@opindex trunc
@opindex frac

The following example demonstrates the different rounding operations:

@example
$ ( echo X ; seq -1.25 0.25 1.25 ) \
      | datamash --full -H round 1 ceil 1 floor 1 trunc 1 frac 1

  X     round(X)  ceil(X)  floor(X)  trunc(X)   frac(X)
-1.25   -1        -1       -2        -1         -0.25
-1.00   -1        -1       -1        -1          0
-0.75   -1         0       -1         0         -0.75
-0.50   -1         0       -1         0         -0.5
-0.25    0         0       -1         0         -0.25
 0.00    0         0        0         0          0
 0.25    0         1        0         0          0.25
 0.50    1         1        0         0          0.5
 0.75    1         1        0         0          0.75
 1.00    1         1        1         1          0
 1.25    1         2        1         1          0.25
@end example


@node Binning numbers
@section Binning numbers
@opindex bin
@cindex buckets, binning numbers
@cindex binning numbers

Bin input values into buckets of size 5:

@example
$ ( echo X ; seq -10 2.5 10 ) \
      | datamash -H --full bin:5 1
    X  bin(X)
-10.0    -10
 -7.5    -10
 -5.0     -5
 -2.5     -5
  0.0      0
  2.5      0
  5.0      5
  7.5      5
 10.0     10
@end example
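The bucket for each value is floor(X/size) * size. This can be
emulated in @command{awk} (an illustrative sketch; datamash implements
binning internally):

```shell
# Emulate bin:5 (illustrative): bucket = floor(X/5)*5.
# awk's int() truncates toward zero, so adjust for negative values.
bin5() { awk -v w=5 '{ b = int($1/w); if ($1 < 0 && b*w != $1) b--; print b*w }'; }

echo '-7.5' | bin5    # -10
echo '-5'   | bin5    # -5
echo '7.5'  | bin5    # 5
```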

@node Binning strings
@section Binning strings
@opindex strbin
@cindex buckets, binning strings
@cindex binning strings

The @option{strbin} operation hashes any input string into a numeric integer.
A typical usage would be to split an input file
into @var{N} chunks, ensuring that all values of a certain key will
be stored in the same chunk:

@example
$ cat input.txt
PatientA   10
PatientB   11
PatientC   12
PatientA   14
PatientC   15
@end example

Each patient ID is hashed into a bin between 0 and 9
and printed in the last field:

@example
$ datamash --full strbin 1 < input.txt
PatientA   10    5
PatientB   11    6
PatientC   12    7
PatientA   14    5
PatientC   15    7
@end example

Splitting the input into chunks can be done with awk:

@example
@verbatim
$ cat input.txt | datamash --full strbin 1 \
    | awk '{print > $NF ".txt"}'
@end verbatim
@end example


@node Extracting numeric values
@section Extracting numeric values - using getnum
@opindex getnum
@cindex numbers, extracting from a field

The @code{getnum} operation extracts a numeric value from the field:

@example
@verbatim
$ echo zoom-123.45xyz | datamash getnum 1
123.45
@end verbatim
@end example


@code{getnum} accepts an optional single-letter @var{TYPE} option:

@table @option
@item getnum:n
natural numbers (positive integers, including zero)
@item getnum:i
integers
@item getnum:d
decimal point numbers
@item getnum:p
positive decimal point numbers (this is the default)
@item getnum:h
hex numbers
@item getnum:o
octal numbers
@end table

Examples:

@example
@verbatim
$ echo zoom-123.45xyz | datamash getnum 1
123.45

$ echo zoom-123.45xyz | datamash getnum:n 1
123

$ echo zoom-123.45xyz | datamash getnum:i 1
-123

$ echo zoom-123.45xyz | datamash getnum:d 1
-123.45

$ echo zoom-123.45xyz | datamash getnum:p 1
123.45

# Hex 0x123 = 291 Decimal
$ echo zoom-123.45xyz | datamash getnum:h 1
291

# Octal 0123 = 83 Decimal
$ echo zoom-123.45xyz | datamash getnum:o 1
83
@end verbatim
@end example
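The @var{TYPE} extractions can be approximated with @command{grep}
regular expressions, following the descriptions in the table above (an
illustrative sketch, not datamash's actual parser):

```shell
s='zoom-123.45xyz'
echo "$s" | grep -oE '[0-9]+' | head -1                  # natural: 123
echo "$s" | grep -oE -- '-?[0-9]+' | head -1             # integer: -123
echo "$s" | grep -oE -- '-?[0-9]+(\.[0-9]+)?' | head -1  # decimal: -123.45
echo "$s" | grep -oE '[0-9]+(\.[0-9]+)?' | head -1       # positive decimal: 123.45

# hex digits, converted to decimal (0x123 = 291)
printf '%d\n' "0x$(echo "$s" | grep -oE '[0-9a-fA-F]+' | head -1)"
```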




@node Reporting bugs
@chapter Reporting bugs

@cindex bug reporting
@cindex problems
@cindex reporting bugs

To report bugs, suggest enhancements or otherwise discuss GNU Datamash,
please send electronic mail to @email{bug-datamash@@gnu.org}.

@cindex checklist for bug reports
For bug reports, please include enough information for the maintainers
to reproduce the problem.  Generally speaking, that means:

@itemize @bullet
@item The version numbers of Datamash (which you can find by running
      @w{@samp{datamash --version}}) and any other program(s) or
      manual(s) involved.
@item Hardware and operating system names and versions.
@item The contents of any input files necessary to reproduce the bug.
@item The expected behavior and/or output.
@item A description of the problem and samples of any erroneous output.
@item Options you gave to @command{configure} other than specifying
      installation directories.
@item Anything else that you think would be helpful.
@end itemize

When in doubt whether something is needed or not, include it.  It's
better to include too much than to leave out something important.

@cindex patches, contributing
Patches are welcome; if possible, please make them with @samp{@w{diff
-u}} (@pxref{Top,, Overview, diff, Comparing and Merging Files}) and
include @file{ChangeLog} entries (@pxref{Change Log,,, emacs, The GNU
Emacs Manual}).  Please follow the existing coding style.


@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi


@node Concept index
@unnumbered Concept index

@printindex cp

@bye