<!DOCTYPE reference PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
  <!ENTITY datapacker "<application>datapacker</application>">
]>
<!--      "file:///usr/share/sgml/docbook/dtd/xml/4.2/docbookx.dtd"> -->

<reference>
  <title>datapacker Manual</title>

  <refentry>
    <refentryinfo>
      <address><email>jgoerzen@complete.org</email></address>
      <author><firstname>John</firstname><surname>Goerzen</surname></author>
    </refentryinfo>

    <refmeta>
      <refentrytitle>datapacker</refentrytitle>
      <manvolnum>1</manvolnum>
      <refmiscinfo>John Goerzen</refmiscinfo>
    </refmeta>

    <refnamediv>
      <refname>datapacker</refname>
      <refpurpose>Tool to pack files into the minimum number
        of bins</refpurpose>
    </refnamediv>

    <refsynopsisdiv>
      <cmdsynopsis>
        <command>datapacker</command>
        <arg>-0</arg>
        <arg>-a <replaceable>ACTION</replaceable></arg>
        <arg>-b <replaceable>FORMAT</replaceable></arg>
        <arg>-d</arg>
        <arg>-D</arg>
        <arg>-p</arg>
        <arg>-S <replaceable>SIZE</replaceable></arg>
        <arg>--sort</arg>
        <arg choice="plain">-s <replaceable>SIZE</replaceable></arg>
        <arg choice="plain" rep="repeat"><replaceable>FILE</replaceable></arg>
      </cmdsynopsis>
      <cmdsynopsis>
	<command>datapacker</command>
	<group choice="plain"><arg>-h</arg><arg>--help</arg></group>
      </cmdsynopsis>
    </refsynopsisdiv>

    <refsect1>
      <title>Description</title>

      <para>&datapacker; is a tool to group files by size.  It is
        designed to group files such that they fill fixed-size
        containers (called "bins") using the minimum number of
        containers.  This is useful, for instance, if you want to
        archive a number of files to CD or DVD, and want to organize
        them such that you use the minimum possible number of CDs or
        DVDs.
      </para>

      <para>
        In many cases, &datapacker; executes almost instantaneously.
        Of particular note, the <literal>hardlink</literal> action
        (see OPTIONS below) can be used to effectively copy data into
        bins without having to actually copy the data at all.
      </para>
      
      <para>
        &datapacker; is a tool in the traditional Unix style; it can
        be used in pipes and call other tools.
      </para>
    </refsect1>

    <refsect1>
      <title>Options</title>
      <para>
        Here are the command-line options you may set for
        &datapacker;.  Please note that <option>-s</option> and at
        least one file (see FILE SPECIFICATION below) are mandatory.
      </para>

      <variablelist>
        <varlistentry>
          <term>-0</term>
          <term>--null</term>
          <listitem><para>
              When reading a list of files from standard input (see
              FILE SPECIFICATION below), expect the filenames to be
              separated by NULL (ASCII 0) characters instead of
              appearing one per line.  Especially useful with <command>find
                -print0</command>.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>-a <replaceable>ACTION</replaceable></term>
          <term>--action=<replaceable>ACTION</replaceable></term>
          <listitem>
            <para>
              Defines what action to take with the matches.  Please
              note that, with any action, the output will be sorted by
              bin, with bin 1 first.  Possible
              actions include:
            </para>
            <variablelist>
              <varlistentry>
                <term>print</term>
                <listitem><para>Print one human-readable line per
                    file.  Each line contains the bin number (in the
                    format given by <option>-b</option>), an ASCII tab
                    character, then the filename.
                  </para>
                </listitem>
              </varlistentry>

              <varlistentry>
                <term>printfull</term>
                <listitem><para>Print one semi-human-readable line per
                    bin.  Each line contains the bin number, then a list
                    of filenames to place in that bin,
                    with an ASCII tab character after the
                    bin number and between each filename.
                  </para>
                </listitem>
              </varlistentry>

              <varlistentry>
                <term>print0</term>
                <listitem>
                  <para>
                    For each file, output the bin number (according to the
                    format given by <option>-b</option>), an ASCII
                    NULL character, the filename, and another ASCII
                    NULL character.  Ideal for use with <literal>xargs
                      -0 -L 2</literal>.
                  </para>
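                  <para>
                    As a rough sketch of consuming this layout, the
                    pipeline below feeds hand-written sample data (not
                    real &datapacker; output) to
                    <command>xargs</command>.  It uses <literal>-n
                    2</literal> instead of <literal>-L 2</literal>
                    purely because grouping by argument count is easy
                    to follow; the bin numbers and filenames are made
                    up for illustration.
                  </para>
                  <programlisting>
# Sample data only: two bin/filename pairs, every field NUL-terminated.
pairs=$(printf '%s\0' 001 a.jpg 002 b.jpg | xargs -0 -n 2 echo)
echo "$pairs"
                  </programlisting>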
                </listitem>
              </varlistentry>

              <varlistentry>
                <term>exec:<replaceable>COMMAND</replaceable></term>
                <listitem>
                <para>
                    For each bin, execute the specified COMMAND via
                    the shell.  The program COMMAND will be passed
                    information on its command line as indicated below.
                  </para>
                  <para>
                    It is an error if the generated command line for a
                    given bin is too large for the system.
                  </para>
                  <para>
                    A nonzero exit code from any COMMAND will cause
                    &datapacker; to terminate.  If COMMAND contains
                    quotes, don't forget to quote the entire command,
                    as in:
                  </para>
                  <programlisting>
datapacker '--action=exec:echo "Bin: $1"; shift; ls "$@"'
                  </programlisting>
                  <para>
                    The arguments to the given command will be:
                  </para>
                  <itemizedlist>
                    <listitem><para>
                        <literal>argv[0]</literal> ($0 in shell) will
                        be the name of the shell used to invoke the
                        command -- <literal>$SHELL</literal> or
                        <literal>/bin/sh</literal>.
                      </para>
                    </listitem>
                    <listitem><para>
                        <literal>argv[1]</literal> ($1 in shell) will
                        be the bin number, formatted according to
                        <option>-b</option>.</para>
                    </listitem>
                    <listitem><para>
                        <literal>argv[2]</literal> and on ($2 and on
                        in shell) will be the files to place in that
                        bin.
                      </para>
                    </listitem>
                  </itemizedlist>
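                  <para>
                    To see how these arguments reach COMMAND, the
                    following sketch calls <command>sh -c</command>
                    by hand with a made-up bin number and made-up
                    filenames.  It only imitates the argument layout
                    listed above and assumes nothing about how
                    &datapacker; itself spawns the shell.
                  </para>
                  <programlisting>
# "sh" becomes $0, "001" becomes $1, and the remaining words are the files.
out=$(sh -c 'echo "Bin: $1"; shift; echo "Files: $*"' sh 001 a.jpg b.jpg)
echo "$out"
                  </programlisting>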
                </listitem>
              </varlistentry>

              <varlistentry>
                <term>hardlink</term>
                <listitem>
                  <para>
                    For each file, create a hardlink at
                    <replaceable>bin</replaceable>/<replaceable>filename</replaceable>
                    pointing to the original input filename.  Creates
                    the directory <replaceable>bin</replaceable> as
                    necessary.  Alternative locations and formats for
                    <replaceable>bin</replaceable> can be specified
                    with <option>-b</option>.  All bin directories and
                    all input must reside on the same filesystem.
                  </para>
                  <para>
                    After you are done processing the results of the
                    bin, you may safely delete the bins without
                    deleting original data.  Alternatively, you could
                    leave the bins and delete the original data.
                    Either approach will be workable.
                  </para>
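                  <para>
                    Both approaches work because a hard link is simply
                    a second name for the same data.  A minimal sketch
                    using plain <command>ln</command> in a scratch
                    directory (the names <literal>photo.jpg</literal>
                    and <literal>001</literal> are illustrative only):
                  </para>
                  <programlisting>
dir=$(mktemp -d)                          # scratch area on one filesystem
mkdir "$dir/001"                          # stands in for a bin directory
echo data > "$dir/photo.jpg"              # stands in for an original file
ln "$dir/photo.jpg" "$dir/001/photo.jpg"  # hard link inside the bin
rm "$dir/photo.jpg"                       # remove the original name
kept=$(cat "$dir/001/photo.jpg")          # the data is still reachable
echo "$kept"
rm -r "$dir"
                  </programlisting>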
                  <para>
                    It is an error to attempt to make a hard link
                    across filesystems, or to have two input files
                    with the same filename in different paths.
                    &datapacker; will exit in either of these situations.
                  </para>
                  <para>
                    See also <option>--deep-links</option>.
                  </para>
                </listitem>
              </varlistentry>

              <varlistentry>
                <term>symlink</term>
                <listitem>
                  <para>Like <option>hardlink</option>, but create
                  symlinks instead.  Symlinks can span filesystems,
                  but you will lose information if you remove the
                  original (pre-bin) data.  Like
                  <option>hardlink</option>, it is an error to have a
                  single filename occur in multiple input directories
                  with this option.
                  </para>
                  <para>
                    See also <option>--deep-links</option>.
                  </para>
                </listitem>
              </varlistentry>
            </variablelist>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>-b <replaceable>FORMAT</replaceable></term>
          <term>--binfmt=<replaceable>FORMAT</replaceable></term>
          <listitem>
            <para>
              Defines the output format for the bin name.  This format
              is interpreted as <application>printf</application>(3)
              would interpret it, with the bin number supplied as the
              argument for a single <literal>%d</literal> conversion.
              This can be useful both to define the name and the
              location of your bins.  When running &datapacker; with
              certain arguments, the bin format can be taken to be a
              directory in which files in that bin are linked.  The
              default is <literal>%03d</literal>, which outputs
              integers with leading zeros to make all bin names at
              least three characters wide.
            </para>
            <para>
              Other useful variants could include
              <literal>destdir/%d</literal> to put the string
              <literal>"destdir/"</literal> in front of the bin number,
              which is rendered without leading zeros.
            </para>
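            <para>
              One way to preview a format before running &datapacker;
              is the shell's <application>printf</application>(1),
              which should treat <literal>%d</literal> the same way
              in simple cases like these.  The bin number 7 is an
              arbitrary example.
            </para>
            <programlisting>
padded=$(printf '%03d' 7)        # the default format
plain=$(printf 'destdir/%d' 7)   # directory prefix, no zero padding
echo "$padded $plain"
            </programlisting>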
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>-d</term>
          <term>--debug</term>
          <listitem>
            <para>
              Enable debug mode.  This is here for future expansion
              and does not currently have any effect.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>-D</term>
          <term>--deep-links</term>
          <listitem>
            <para>
              When used with the symlink or hardlink action, instead
              of making all links in a single flat directory under the
              bin, mimic the source directory structure under the
              bin.  Makes most sense when used with
              <option>-p</option>, but could also be useful without it
              if there are files with the same name in different
              source directories.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>--help</term>
          <listitem>
            <para>
              Display brief usage information and exit.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>-p</term>
          <term>--preserve-order</term>
          <listitem>
            <para>
              Normally, &datapacker; uses an efficient algorithm that
              tries to rearrange files such that the number of bins
              required is minimized.  Sometimes you may instead wish
              to preserve the ordering of files at the expense of
              potentially using more bins.  In these cases, you would
              want to use this option.
            </para>
            <para>
              As an example of such a situation: perhaps you have
              taken one photo a day for several years.  You would like
              to archive these photos to CD, but you want them to be
              stored in chronological order.  You have named the files
              such that the names indicate order, so you can pass the
              file list to &datapacker; using <option>-p</option> to
              preserve the ordering in your bins.  Thus, bin 1 will
              contain the oldest files, bin 2 the second-oldest, and
              so on.  If <option>-p</option> were not used, you might
              use fewer CDs, but the photos would be spread out across
              all CDs without preserving your chronological order.
            </para>
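            <para>
              The cost of preserving order can be illustrated with a
              simple "next-fit" pass written in
              <application>awk</application>(1).  This is only a
              sketch of the general idea and is not &datapacker;'s
              actual algorithm; the file sizes and the capacity of
              600 are made up.
            </para>
            <programlisting>
# Next-fit sketch: keep the input order and open a new bin whenever the
# next size would overflow the capacity.  Illustration only.
bins=$(printf '%s\n' 400 400 200 200 |
  awk -v cap=600 'BEGIN { bin = 1 }
    { if (used + $1 > cap) { bin++; used = 0 }
      used += $1
      printf "%03d\t%s\n", bin, $1 }')
echo "$bins"
            </programlisting>
            <para>
              Here the ordered pass needs three bins.  Without the
              ordering constraint, each 400 could share a bin with a
              200, so two bins would suffice.
            </para>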
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>-s <replaceable>SIZE</replaceable></term>
          <term>--size=<replaceable>SIZE</replaceable></term>
          <listitem>
            <para>
              Gives the size of each bin in bytes.  Suffixes such as
              "k", "m", "g", etc. may be used to indicate kilobytes,
              megabytes, gigabytes, and so forth.  Numbers such as
              <literal>1.5g</literal> are valid, and if needed, will
              be rounded to the nearest possible integer value.
            </para>
            <para>
              The size of the first bin may be overridden with
              <option>-S</option>.
            </para>
            <para>
              Here are the sizes of some commonly-used bins.  For each
              item, I have provided you with both the underlying
              recording capacity of the disc and a suggested value for
              <option>-s</option>.  The suggested value for
              <option>-s</option> is lower than the underlying
              capacity because there is overhead imposed by the
              filesystem stored on the disc.  You will perhaps find
              that the suggested value for <option>-s</option> is
              lower than optimal for discs that contain few large
              files, and higher than desired for discs that contain
              vast numbers of small files.
            </para>
            <itemizedlist>
              <listitem><para>CD-ROM, 74-minute (standard): 650m / 600m</para>
              </listitem>
              <listitem><para>CD-ROM, 80-minute: 703m / 650m</para>
              </listitem>
              <listitem><para>CD-ROM, 90-minute: 790m / 740m</para>
              </listitem>
              <listitem><para>CD-ROM, 99-minute: 870m / 820m</para>
              </listitem>
              <listitem><para>DVD+-R: 4.377g / 4g</para>
              </listitem>
              <listitem><para>DVD+R, dual layer: 8.5g / 8g</para>
              </listitem>
            </itemizedlist>
          </listitem>
        </varlistentry>
        
        <varlistentry>
          <term>-S <replaceable>SIZE</replaceable></term>
          <term>--size-first=<replaceable>SIZE</replaceable></term>
          <listitem>
            <para>
              The size of the first bin.  If not given, defaults to
              the value given with <option>-s</option>.  This may be
              useful if you will be using a mechanism outside
              &datapacker; to add additional information to the first
              bin: perhaps an index of which bin has which file, the
              information necessary to make a CD bootable, etc.  You
              may use the same suffixes as with <option>-s</option>
              with this option.
            </para>
          </listitem>
        </varlistentry>

        <varlistentry>
          <term>--sort</term>
          <listitem>
            <para>
              Sorts the list of files to process before acting upon
              them.  When combined with <option>-p</option>, causes
              the output to be sorted.  This option has no effect save
              increasing CPU usage when not combined with
              <option>-p</option>.
            </para>
          </listitem>
        </varlistentry>

      </variablelist>
      <refsect2>
        <title>File Specification</title>
        <para>
          After the options, you must supply one or more files to
          consider for packing into bins.  Alternatively, instead of
          listing files on the command line, you may list a single
          hyphen (<literal>-</literal>), which tells &datapacker; to
          read the list of files from standard input (stdin).
        </para>
        <para>
          &datapacker; never recurses into subdirectories.  If you
          want a recursive search -- finding all files in a given
          directory and all its subdirectories -- see the second
          example in the EXAMPLES section below.  &datapacker; is
          designed to integrate with
          <application>find</application>(1) in this situation to
          let you take advantage of find's built-in powerful recursion
          and filtering features.
        </para>
        <para>
          When reading files from standard input, it is assumed that
          the list contains one distinct filename per line.  Seasoned
          POSIX veterans will recognize the inherent limitations in
          this format.  For that reason, when given
          <option>-0</option> in conjunction with the single file
          <literal>-</literal>, &datapacker; will instead expect, on
          standard input, a list of files, each one terminated by an
          ASCII NULL character.  Such a list can be easily generated
          with <application>find</application>(1) using its
          <option>-print0</option> option.
        </para>
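        <para>
          The limitation can be seen without &datapacker; at all: a
          filename may legally contain a newline, so a line-based list
          reports one file as two entries, while a NULL-terminated
          list stays unambiguous.  The sketch below uses a scratch
          directory and a made-up filename.
        </para>
        <programlisting>
dir=$(mktemp -d)
touch "$dir/a
b.jpg"                                   # one file with a newline in its name
lines=$(find "$dir" -type f | wc -l)     # line-based list: 2 apparent entries
nuls=$(find "$dir" -type f -print0 | tr -cd '\000' | wc -c)  # 1 terminator
echo "$((lines)) $((nuls))"
rm -r "$dir"
        </programlisting>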
      </refsect2>
    </refsect1>

    <refsect1>
      <title>Examples</title>
      <itemizedlist>
        <listitem>
          <para>
            Put all JPEG images in <literal>~/Pictures</literal>
            into bins (using hardlinks) under the pre-existing directory
            <literal>~/bins</literal>, no more than 600MB per bin:
          </para>
          <programlisting>
datapacker -b ~/bins/%03d -s 600m -a hardlink ~/Pictures/*.jpg
          </programlisting>
        </listitem>

        <listitem>
          <para>
            Put all files in <literal>~/Pictures</literal> or any
            subdirectory thereof into 600MB bins under ~/bins, using
            hardlinking.  This is a simple example to follow if you
            simply want a recursive search of all files.
          </para>
          <programlisting>
find ~/Pictures -type f -print0 | \
  datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
          </programlisting>
        </listitem>
        <listitem>
          <para>
            Find all JPEG images in <literal>~/Pictures</literal> or
            any subdirectory thereof, put them into bins (using
            hardlinks) under the pre-existing directory
            <literal>~/bins</literal>, no more than 600MB per bin:
          </para>
          <programlisting>
find ~/Pictures -name "*.jpg" -print0 | \
  datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
          </programlisting>
        </listitem>

        <listitem>
          <para>
            Find all JPEG images as above, put them in 4GB bins,
            but instead of putting them
            anywhere, calculate the size of each bin and display it.
          </para>
          <programlisting>
find ~/Pictures -name "*.jpg" -print0 | \
  datapacker -0 -b ~/bins/%03d -s 4g \
  '--action=exec:echo -n "$1: "; shift; du -ch "$@" | grep total' \
  -
          </programlisting>
          <para>
            This will display output like so:
          </para>
          <programlisting>
/home/jgoerzen/bins/001: 4.0G   total
/home/jgoerzen/bins/002: 4.0G   total
/home/jgoerzen/bins/003: 4.0G   total
/home/jgoerzen/bins/004: 992M   total
          </programlisting>
          <para>
            Note: the <literal>grep</literal> pattern in this example
            is simple, but will cause unexpected results if any
            matching filename contains the word "total".
          </para>
        </listitem>

        <listitem>
          <para>
            Find all JPEG images as above, and generate 4GB ISO
            images of them in ~/bins.  This will generate the ISO
            images directly without ever hardlinking files into
            ~/bins.
          </para>
          <programlisting>
find ~/Pictures -name "*.jpg" -print0 | \
  datapacker -0 -b ~/bins/%03d.iso -s 4g \
  '--action=exec:BIN="$1"; shift; mkisofs -r -J -o "$BIN" "$@"' \
  -
          </programlisting>
          <para>
            You could, if you so desired, pipe this result directly
            into a DVD-burning application.  Or, you could use
            <literal>growisofs</literal> to burn a DVD+R in a single
            step.
          </para>
        </listitem>
      </itemizedlist>
    </refsect1>
      
    <refsect1>
      <title>Errors</title>
      <para>
        It is an error if any specified file exceeds the value given
        with <literal>-s</literal> or <literal>-S</literal>.
      </para>
      <para>
        It is also an error if any specified files disappear while
        &datapacker; is running.
      </para>
    </refsect1>
    <refsect1>
	<title>Bugs</title>
	<para>
          Bugs should be reported online at the
          &datapacker; homepage.
          Debian users are encouraged to instead use the
          Debian bug-tracking system.
	</para>
    </refsect1>


    <refsect1>
      <title>Copyright</title>
      <para>&datapacker;, and this manual, are Copyright &copy; 2008 John Goerzen.</para>
      <para>
        All code, documentation, and build scripts are under the following
        license unless otherwise noted:
      </para>
      <para>
    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
      </para>
      <para>
    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
      </para>
      <para>
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see
    <ulink url="http://www.gnu.org/licenses/"></ulink>.
      </para>
      <para>
The GNU General Public License is available in the file COPYING in the source
distribution.  Debian GNU/Linux users may find this in
/usr/share/common-licenses/GPL-3.
      </para>
      <para>
If the GPL is unacceptable for your uses, please e-mail me; alternative
terms can be negotiated for your project.
      </para>
    </refsect1>

    <refsect1>
      <title>Author</title>
      <para>&datapacker;, its libraries, documentation, and all included files, except where
	noted, were written by John Goerzen <email>jgoerzen@complete.org</email> and
	copyright is held as stated in the COPYRIGHT section.
      </para>

      <para>
	&datapacker; may be downloaded, and information found, from its
	<ulink url="http://software.complete.org/datapacker">homepage</ulink>.
      </para>

    </refsect1>

    <refsect1>
      <title>See Also</title>
      <para><application>mkisofs</application>(1),
	<application>genisoimage</application>(1)
      </para>
    </refsect1>
  </refentry>
</reference>

<!--
Local Variables:
mode: sgml
sgml-set-face: T
End:
-->