File: msort.1

\" This file was partially generated by help2man 1.33.
.TH MSORT "1" "January 2010" "msort " "User Commands"
.SH NAME
msort \- sort records in complex ways
.SH SYNOPSIS
.B msort
<options> [<input file>]
.SH DESCRIPTION
.PP
.I msort
is a program for sorting text files in sophisticated ways.
It was initially developed for alphabetizing dictionaries of languages whose
ordering may differ considerably from that of English, but it has many other uses.
.PP
.I msort
allows you to sort blocks of text delimited in a number of ways, rather than just lines,
and to specify particular fields of a record as sort keys, either by their position,
counted from either end, or by matching regular expressions against their tags.
.PP
.I msort 
is capable of sorting on multiple keys, so that when two records tie on one
key, the tie may be broken on another. Any or all keys may be optional.
How absent optional keys are ordered with respect to present keys may be set
separately for each key.
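.PP
The tie-breaking behaviour described above can be pictured with a short Python sketch.
It is only an illustration and not the code used by
.I msort.
The records, the field names, and the helper sort_key are hypothetical, and an absent
optional key is assumed here to sort before any present key.
.PP
.nf
```python
# Hypothetical records: "year" is an optional secondary key.
records = [
    {"name": "beta", "year": 2001},
    {"name": "alpha", "year": 1999},
    {"name": "beta"},
    {"name": "alpha", "year": 1987},
]


def sort_key(record):
    """Primary key: name. Secondary key: year, with absent years first."""
    year = record.get("year")
    # (0,) sorts before (1, year), so a missing year precedes any present year.
    secondary = (0,) if year is None else (1, year)
    return (record["name"], secondary)


ordered = sorted(records, key=sort_key)
for record in ordered:
    print(record)
```
.fi
.PP
Swapping the tuples (0,) and (1, year) would place absent keys after present ones instead.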
.PP
.I msort
allows you to specify arbitrary sort orders and to define
virtually unlimited numbers of multigraphs of effectively unlimited length.
The sort order and multigraphs are defined separately for each key. If your
system has locale support, you can also use locale collation rules instead
of specifying your own sort order.

.PP
.I msort
provides twelve types of key comparison:
lexicographic,
numeric,
numeric string,
hybrid,
by string length,
by angle,
by date,
by domain name,
by time,
by ISO8601 date/time stamp,
by month name,
and random.

.PP
Which month names are recognized depends on several factors. If the
.I -s
flag is used on the same key and its argument is the name of a file,
the month names are read from the file, which should be in the same
format as a sort order definition file. If the
.I -s
flag is used and its argument is a locale name, the month names recognized
will be the month names and abbreviations associated with the specified
locale. If the
.I -s
flag is not used the month names recognized will be the month names and
abbreviations associated with the current locale. If your system does
not have locale support and you do not use the
.I -s
flag to read the month names from a file, the month names recognized
will be the English month names and abbreviations.

.PP
.I msort
can reverse the characters in a key, allowing it
to be used to generate reverse dictionaries.
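.PP
The idea behind a reverse dictionary can be sketched in Python as follows.
This is an illustration of the concept only, with a hypothetical word list.
It reverses code points naively, whereas
.I msort
applies its own key processing.
.PP
.nf
```python
# Hypothetical word list.
words = ["walking", "talked", "sing", "walked", "ring"]

# Compare words by their characters read from right to left,
# so that words sharing an ending are grouped together.
reverse_sorted = sorted(words, key=lambda word: word[::-1])
print(reverse_sorted)
```
.fi
.PP
Words ending in "ed" are grouped together, as are those ending in "ing".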
.PP
A choice of sorting algorithms is provided.
.PP
.I msort
fully supports Unicode. The text to be sorted, and all specifications, should be in
UTF-8 Unicode. (If you have plain ASCII text, this is not a problem as ASCII is a
subset of Unicode.) Full Unicode case-folding is available, in Turkic and non-Turkic
variants. Unicode normalization is performed before sorting.
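.PP
Why normalization matters before comparison can be seen in the following Python sketch.
It uses the standard unicodedata module purely for illustration and makes no claim about how
.I msort
performs normalization internally.
.PP
.nf
```python
import unicodedata

# The same visible character in two encodings.
composed = chr(0x00E9)          # U+00E9, e with acute as a single code point
decomposed = "e" + chr(0x0301)  # U+0065 followed by U+0301, combining acute

# Without normalization the two strings are unequal.
raw_equal = composed == decomposed

# After normalizing both to the same form they compare equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))
nfd_equal = (unicodedata.normalize("NFD", composed)
             == unicodedata.normalize("NFD", decomposed))

print(raw_equal, nfc_equal, nfd_equal)
```
.fi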
.PP
For usage information, execute
.I msort
with no arguments.
.PP
Full information about
.B msort
is currently to be found in the reference manual,
which is distributed as a PDF (Portable Document Format) file. If a
copy is not available locally, you can download it from msort's
home page: 
.br
http://billposer.org/Software/msort.html
.sp 1
.SH OPTIONS
.SS "Informational options"
.TP
\fB\-h,--help\fR
Print usage message
.TP
\fB\-v,--version\fR
Print version message
.TP
\fB\-D,--defaults\fR
List defaults
.TP
\fB\-F,--general-options\fR
List general command line options
.TP
\fB\-G,--gnu-equivalences\fR
List equivalents for GNU sort command line options.
.TP
\fB\-H,--informational-options\fR
List informational command line options
.TP
\fB\-K,--key-specific-options\fR
List key-specific command line options
.TP
\fB\-L,--limits\fR
List limits
.TP
\fB\-N,--number-systems\fR
List the supported number systems.
.SS "General options"
.TP
\fB\-b,--block\fR
A record is terminated by two or more newlines
.TP
\fB\-l,--line\fR
A record consists of a single line
.TP
\fB\-r,--record-separator\fR <separator>
A record is terminated by the specified separator character
.TP
\fB\-O,--fixed-size-record\fR <bytes>
A record consists of the specified number of bytes.
.TP
\fB\-d,--field-separators\fR <character>+
Fields are delimited by the named character(s)
.TP
\fB\-w,--whole\fR
Sort on the entire text of the record
.TP
\fB\-a,--algorithm\fR <algorithm>
Use the specified sort algorithm. The choices are:
I(nsertionSort), M(ergeSort), Q(uickSort), and S(hellSort).
Note that InsertionSort and MergeSort are stable, while
QuickSort and ShellSort are unstable. The default is QuickSort.
.TP
\fB\-M,--initial-maximum-records\fR <records>
Set initial maximum number of records
.TP
\fB\-m,--line-end-carriage-return\fR
End-of-line in the input data is marked by Carriage Return (0x0D) as on the
Macintosh rather than by Line Feed (0x0A) as on Unix systems.
.TP
\fB\-I,--invert-globally\fR
Invert sense of comparisons globally
.TP
\fB\-B,--BMP\fR
Assume that no characters fall outside the Basic Multilingual Plane (that is, that
no character has a value greater than 0xFFFF).
.TP
\fB\-Z,--skip-first-record\fR
Copy the first record in the input to the output without sorting it. This is useful
for sorting files with a header.
.TP
\fB\-p,--reserve-private-use-area\fR
Do not make internal use of the Private Use areas. By default, multigraphs are
assigned internally to codepoints in the Supplementary Private Use areas
if full Unicode is in use or to codepoints in the Private Use area if
input is restricted to the Basic Multilingual Plane by means of the \fI\-B\fR
option. If your input makes use of the Private Use areas, this option prevents
interference with your input. In this case, multigraphs will be assigned
to the Low and High Surrogate areas (0xD800-0xDFFF). Note that this
limits the number of multigraphs to 2,048.
.TP
\fB\-P,--random-seed\fR <seed>
Set the seed for the random number generator. If not set here, it is set to a value
determined by the time. The seed used is reported in the log. This option allows
runs to be replicated. 
.TP
\fB\-Q,--check-only\fR
Check whether the input is already sorted. Do not generate any output.
Exit status is 0 if input is already sorted, 11 if not sorted.
.TP
\fB\-1,--in\fR <input file name>
Read the input from the named file.
.TP
\fB\-2,--out\fR <output file name>
Write the output to the named file.
If the output file is the same as the input file, the input file will be
overwritten. The input file will not be overwritten if the run is unsuccessful.
.TP
\fB\-j,--suppress-log\fR
Suppress output to the log. If this flag is given before there is any output to the
log from a command line flag, nothing will be written to the log and the log file
will not be created. If a command line flag generates a log message before this flag
is processed, the log file will be created but no log messages will be written to
it once this flag is processed. To guarantee that no attempt will be made to open
a log file, give this flag first.
.TP
\fB\-q,--quiet\fR
Be quiet - do not chat while working
.TP
\fB\-u,--unicode-normalization\fR <mode>
Select Unicode normalization mode. The choices of mode are:
\fIc\fR for normalization form C (NFC),
\fId\fR for normalization form D (NFD),
\fIC\fR for normalization form KC (NFKC),
\fID\fR for normalization form KD (NFKD),
and \fIn\fR for no normalization. The default is NFC.
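.PP
The distinction between stable and unstable algorithms noted under \fI\-a\fR can be
illustrated with a small Python sketch.
The built-in sorted function, which is documented to be stable, stands in here for a stable
algorithm such as MergeSort. The records are hypothetical and this is not the code used by
.I msort.
.PP
.nf
```python
# Hypothetical records of the form (key, position in the input).
records = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]

# Sort on the key alone. Because sorted() is stable, records whose keys
# tie keep the relative order they had in the input.
by_key = sorted(records, key=lambda record: record[0])
print(by_key)
```
.fi
.PP
An unstable algorithm would be free to emit ("a", 4) before ("a", 2), since the two
records tie on the key.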
.SS "Key specific options"
.TP
\fB\-e,--character-range\fR <m,n>
Sort on characters m through n. Positive indices start from one.
Negative indices indicate position with respect to the end of the record.
For example, the range 
\fI3,-2\fR
consists of the third character through the next-to-last character.
.TP
\fB\-n,--position\fR <POS>(,<POS>)
Sort on the specified POS or contiguous range of POSs, where a POS is of the
form <field number>(.<character number>). Both counts begin at one.
Field numbers but not character numbers may be negative, in which case they are
counted from the right. Thus, 1.2 is the second character of the
first field; -2.1 is the first character of the next to last field.
.TP
\fB\-t,--tag\fR <tag regexp>
Sort on the field with the specified tag
.TP
\fB\-o,--optional\fR <comparison>
The key is optional. The argument (<, =, or >) specifies how a record in which the
key is absent compares to a record in which it is present.
.TP
\fB\-C,--fold-case\fR
Fold case
.TP
\fB\-z,--fold-case-turkic\fR
Fold case with additional Turkic conversions.
.TP
\fB\-c,--comparison-type\fR <comparison type>
Use the specified comparison type. The choices are:
a(ngle), l(exicographic), i(so8601 date/time), t(ime), D(omain name/email address),
d(ate), m(onth name), n(umeric), N(umeric string), s(ize), h(ybrid), and r(andom).
.TP
\fB\-y,--number-system\fR <number system>
Specifies the number system expected for this key. This affects only numeric
and numeric string keys. There are two
special values. If the number system is "all", records may contain any number system
that msort can interpret. Different records may contain different number systems.
If the number system is "any", records may contain any number system that msort can
interpret, but all records must make use of the same number system.
.I msort
sets the number system on the basis of the first record.
.TP
\fB\-f,--date-format\fR <date format>
A permutation of ymd with separators, e.g. y-m-d for international date format or
m/d/y for American date format, or a permutation of yd with separators, e.g. y-d,
for day-of-year dates. All three components may be numbers in any available number
system. The month field may also be a month name, determined in the same way as for
independent month name fields.
.TP
\fB\-W,--sort-order-file-separators\fR <file name>
Read from the named file the list of characters to be treated as separators in the
sort order definition file.
.TP
\fB\-S,--substitutions\fR <file name>
Read substitutions from named file
.TP
\fB\-s,--sort-order\fR <file name>|<locale name>|"locale"
If the argument is a file name, it is taken to be a sort order file and the sort
order for the key is read from the file. If the argument is a locale name, the collation
rules for that locale are used. If the argument is "locale", the collation rules for
the current locale are used.
.TP
\fB\-T,--transformations\fR <(d)(e)(s)>
Apply the specified transformations.
.I d
specifies that diacritics are to be stripped. Separately encoded combining diacritics
are removed. Characters with diacritics represented by single codepoints are
replaced with the corresponding ASCII character without the diacritics, if there is one.
.I e
specifies that enclosed characters, that is, characters within circles or parentheses, are
to be replaced with the corresponding plain ASCII character if there is one.
.I s
specifies that characters in special styles are to be replaced with the corresponding
plain ASCII character if there is one. Stylistic equivalents include:
small capitals (e.g. U+1D04),
script forms (e.g. U+212C),
black letter forms (e.g. U+212D),
Arabic presentation forms (e.g. U+FE81),
Hebrew presentation forms (e.g. U+FB1D),
fullwidth forms (e.g. U+FF01),
halfwidth forms (e.g. U+FF7B),
and the mathematical alphanumeric symbols (e.g. U+1D400).
.TP
\fB\-x,--exclusion-file\fR <file name>
Read exclusions from named file
.TP
\fB\-X,--exclude-characters\fR <exclusions>
Exclude specified characters
.TP
\fB\-i,--invert-locally\fR
Invert sense of comparisons
.TP
\fB\-R,--reverse-key\fR
Reverse characters of key
.TP
\fB\-A,--first-character-only\fR
Ignore all but the first character of the field, after substitutions, exclusions, etc.
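.PP
The index convention described under \fI\-e\fR can be sketched in Python as follows.
The function character_range is hypothetical and only mirrors the description given there:
one-based positive indices, negative indices counted from the end, and both endpoints
included. It works on code points and does not handle an index of zero.
.PP
.nf
```python
def character_range(text, m, n):
    """Return characters m through n of text, both endpoints included.

    Positive indices count from one at the start; negative indices count
    from the end, so that -1 is the last character.
    """
    def to_offset(index):
        # Convert a one-based or negative index to a zero-based offset.
        return index - 1 if index > 0 else len(text) + index

    start, end = to_offset(m), to_offset(n)
    return text[start:end + 1]


# The range 3,-2: third character through the next-to-last character.
print(character_range("abcdefg", 3, -2))
```
.fi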
.PP
Note: long options may not be available on your system.
.SH "SEE ALSO"
sort(1), uninum(3)
.sp 1
.SH AUTHOR
Bill Poser (billposer@alum.mit.edu)
.SH LICENSE
GNU General Public License (http://www.gnu.org/licenses/gpl.html), version 3.