1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610
|
<!DOCTYPE reference PUBLIC "-//OASIS//DTD DocBook V4.1//EN" [
<!ENTITY datapacker "<application>datapacker</application>">
]>
<!-- "file:///usr/share/sgml/docbook/dtd/xml/4.2/docbookx.dtd"> -->
<reference>
<title>datapacker Manual</title>
<refentry>
<refentryinfo>
<address><email>jgoerzen@complete.org</email></address>
<author><firstname>John</firstname><surname>Goerzen</surname></author>
</refentryinfo>
<refmeta>
<refentrytitle>datapacker</refentrytitle>
<manvolnum>1</manvolnum>
<refmiscinfo>John Goerzen</refmiscinfo>
</refmeta>
<refnamediv>
<refname>datapacker</refname>
<refpurpose>Tool to pack files into the minimum number
of bins</refpurpose>
</refnamediv>
<refsynopsisdiv>
<cmdsynopsis>
<command>datapacker</command>
<arg>-0</arg>
<arg>-a <replaceable>ACTION</replaceable></arg>
<arg>-b <replaceable>FORMAT</replaceable></arg>
<arg>-d</arg>
<arg>-p</arg>
<arg>-S <replaceable>SIZE</replaceable></arg>
<arg choice="plain">-s <replaceable>SIZE</replaceable></arg>
<arg choice="plain" rep="repeat"><replaceable>FILE</replaceable></arg>
</cmdsynopsis>
<cmdsynopsis>
<command>datapacker</command>
<group choice="plain"><arg>-h</arg><arg>--help</arg></group>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
<title>Description</title>
<para>&datapacker; is a tool to group files by size. It is
designed to group files such that they fill fixed-size
containers (called "bins") using the minimum number of
containers. This is useful, for instance, if you want to
archive a number of files to CD or DVD, and want to organize
them such that you use the minimum possible number of CDs or
DVDs.
</para>
<para>
In many cases, &datapacker; executes almost instantaneously.
Of particular note, the <literal>hardlink</literal> action
(see OPTIONS below) can be used to effectively copy data into
bins without having to actually copy the data at all.
</para>
<para>
&datapacker; is a tool in the traditional Unix style; it can
be used in pipes and call other tools.
</para>
<refsect1>
<title>Options</title>
<para>
Here are the command-line options you may set for
&datapacker;. Please note that <option>-s</option> and at
least one file (see FILE SPECIFICATION below) is mandatory.
</para>
<variablelist>
<varlistentry>
<term>-0</term>
<term>--null</term>
<listitem><para>
When reading a list of files from standard input (see
FILE SPECIFICATION below), expect the input to be
separated by NULL (ASCII 0) characters instead of one
per line. Especially useful with <command>find
-print0</command>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-a <replaceable>ACTION</replaceable></term>
<term>--action=<replaceable>ACTION</replaceable></term>
<listitem>
<para>
Defines what action to take with the matches. Please
note that, with any action, the output will be sorted by
bin, with bin 1 first. Possible
actions include:
</para>
<variablelist>
<varlistentry>
<term>print</term>
<listitem><para>Print one human-readable line per
file. Each line contains the bin number (in the
format given by <option>-b</option>), an ASCII tab
character, then the filename.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>printfull</term>
<listitem><para>Print one semi-human-readable line per
bin. Each line contains the bin number, then a list
of filenames to place in that bin,
with an ASCII tab character after the
bin number and between each filename.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>print0</term>
<listitem>
<para>
For each file, output the bin number (according to the
format given by <option>-b</option>), an ASCII
NULL character, the filename, and another ASCII
NULL character. Ideal for use with <literal>xargs
-0 -L 2</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>exec:<replaceable>COMMAND</replaceable></term>
<listitem>
<para>
For each file, execute the specified COMMAND via
the shell. The program COMMAND will be passed
information on its command line as indicated below.
</para>
<para>
It is an error if the generated command line for a
given bin is too large for the system.
</para>
<para>
A nonzero exit code from any COMMAND will cause
&datapacker; to terminate. If COMMAND contains
quotes, don't forget to quote the entire command,
as in:
</para>
<programlisting>
datapacker '--action=exec:echo "Bin: $1"; shift; ls "$@"'
</programlisting>
<para>
The arguments to the given command will be:
</para>
<itemizedlist>
<listitem><para>
<literal>argv[0]</literal> ($0 in shell) will
be the name of the shell used to invoke the
command -- <literal>$SHELL</literal> or
<literal>/bin/sh</literal>.
</para>
</listitem>
<listitem><para>
<literal>argv[1]</literal> ($1 in shell) will
be the bin number, formatted according to
<option>-b</option>.</para>
</listitem>
<listitem><para>
<literal>argv[2]</literal> and on ($2 and on
in shell) will be the files to place in that
bin
</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>hardlink</term>
<listitem>
<para>
For each file, create a hardlink at
<replaceable>bin</replaceable>/<replaceable>filename</replaceable>
pointing to the original input filename. Creates
the directory <replaceable>bin</replaceable> as
necessary. Alternative locations and formats for
<replaceable>bin</replaceable> can be specified
with <option>-b</option>. All bin directories and
all input must reside on the same filesystem.
</para>
<para>
After you are done processing the results of the
bin, you may safely delete the bins without
deleting original data. Alternatively, you could
leave the bins and delete the original data.
Either approach will be workable.
</para>
<para>
It is an error to attempt to make a hard link
across filesystems, or to have two input files
with the same filename in different paths.
&datapacker; will exit on either of these situations.
</para>
<para>
See also <option>--deep-links</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>symlink</term>
<listitem>
<para>Like <option>hardlink</option>, but create
symlinks instead. Symlinks can span filesystems,
but you will lose information if you remove the
original (pre-bin) data. Like
<option>hardlink</option>, it is an error to have a
single filename occur in multiple input directories
with this option.
</para>
<para>
See also <option>--deep-links</option>.
</listitem>
</varlistentry>
</variablelist>
</listitem>
</varlistentry>
<varlistentry>
<term>-b <replaceable>FORMAT</replaceable></term>
<term>--binfmt=<replaceable>FORMAT</replaceable></term>
<listitem>
<para>
Defines the output format for the bin name. This format
is given as a <literal>%d</literal> input to a function
that interprets it as
<application>printf</application>(3) would.
This can be useful both to define the name and the
location of your bins. When running &datapacker; with
certain arguments, the bin format can be taken to be a
directory in which files in that bin are linked. The
default is <literal>%03d</literal>, which outputs
integers with leading zeros to make all bin names at
least three characters wide.
</para>
<para>
Other useful variants could include
<literal>destdir/%d</literal> to put the string
<literal>"destdir/"</literal> in front of the bin number,
which is rendered without leading zeros.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-d</term>
<term>--debug</term>
<listitem>
<para>
Enable debug mode. This is here for future expansion
and does not currently have any effect.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-D</term>
<term>--deep-links</term>
<listitem>
<para>
When used with the symlink or hardlink action, instead
of making all links in a single flat directory under the
bin, mimic the source directory structure under the
bin. Makes most sense when used with
<option>-p</option>, but could also be useful without it
if there are files with the same name in different
source directories.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>--help</term>
<listitem>
<para>
Display brief usage information and exit.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-p</term>
<term>--preserve-order</term>
<listitem>
<para>
Normally, &datapacker; uses an efficient algorithm that
tries to rearrange files such that the number of bins
required is minimized. Sometimes you may instead wish
to preserve the ordering of files at the expense of
potentially using more bins. In these cases, you would
want to use this option.
</para>
<para>
As an example of such a situation: perhaps you have
taken one photo a day for several years. You would like
to archive these photos to CD, but you want them to be
stored in chronological order. You have named the files
such that the names indicate order, so you can pass the
file list to &datapacker; using <option>-p</option> to
preserve the ordering in your bins. Thus, bin 1 will
contain the oldest files, bin 2 the second-oldest, and
so on. If <option>-p</option> wasn't used, you might
use fewer CDs, but the photos would be spread out across
all CDs without preserving your chronological order.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>-s <replaceable>SIZE</replaceable></term>
<term>--size=<replaceable>SIZE</replaceable></term>
<listitem>
<para>
Gives the size of each bin in bytes. Suffixes such as
"k", "m", "g", etc. may be used to indicate kilobytes,
megabytes, gigabytes, and so forth. Numbers such as
<literal>1.5g</literal> are valid, and if needed, will
be rounded to the nearest possible integer value.
</para>
<para>
The size of the first bin may be overridden with
<option>-S</option>.
</para>
<para>
Here are the sizes of some commonly-used bins. For each
item, I have provided you with both the underlying
recording capacity of the disc and a suggested value for
<option>-s</option>. The suggested value for
<option>-s</option> is lower than the underlying
capacity because there is overhead imposed by the
filesystem stored on the disc. You will perhaps find
that the suggested value for <option>-s</option> is
lower than optimal for discs that contain few large
files, and higher than desired for discs that contain
vast amounts of small files.
</para>
<itemizedlist>
<listitem><para>CD-ROM, 74-minute (standard): 650m / 600m</para>
</listitem>
<listitem><para>CD-ROM, 80-minute: 703m / 650m</para>
</listitem>
<listitem><para>CD-ROM, 90-minute: 790m / 740m</para>
</listitem>
<listitem><para>CD-ROM, 99-minute: 870m / 820m</para>
</listitem>
<listitem><para>DVD+-R: 4.377g / 4g</para>
</listitem>
<listitem><para>DVD+R, dual layer: 8.5g / 8g</para>
</listitem>
</itemizedlist>
</listitem>
</varlistentry>
<varlistentry>
<term>-S</term>
<term>--size-first</term>
<listitem>
<para>
The size of the first bin. If not given, defaults to
the value given with <option>-s</option>. This may be
useful if you will be using a mechanism outside
&datapacker; to add additional information to the first
bin: perhaps an index of which bin has which file, the
information necessary to make a CD bootable, etc. You
may use the same suffixes as with <option>-s</option>
with this option.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>--sort</term>
<listitem>
<para>
Sorts the list of files to process before acting upon
them. When combined with <option>-p</option>, causes
the output to be sorted. This option has no effect save
increasing CPU usage when not combined with
<option>-p</option>.
</para>
</listitem>
</varlistentry>
</variablelist>
<refsect2>
<title>File Specification</title>
<para>
After the options, you must supply one or more files to
consider for packing into bins. Alternatively, instead of
listing files on the command line, you may list a single
hyphen (<literal>-</literal>), which tells &datapacker; to
read the list of files from standard input (stdin).
</para>
<para>
&datapacker; never recurses into subdirectories. If you
want a recursive search -- finding all files in a given
directory and all its subdirectories -- see the second
example in the EXAMPLES section below. &datapacker; is
designed to integrate with
<application>find</application>(1) in this situation to
let you take advantage of find's built-in powerful recursion
and filtering features.
</para>
<para>
When reading files from standard input, it is assumed that
the list contains one distinct filename per line. Seasoned
POSIX veterans will recognize the inherent limitations in
this format. For that reason, when given
<option>-0</option> in conjunction with the single file
<literal>-</literal>, &datapacker; will instead expect, on
standard input, a list of files, each one terminated by an
ASCII NULL character. Such a list can be easily generated
with <application>find</application>(1) using its
<option>-print0</option> option.
</para>
</refsect2>
</refsect1>
<refsect1>
<title>Examples</title>
<itemizedlist>
<listitem>
<para>
Put all JPEG images in <literal>~/Pictures</literal>
into bins (using hardlinks) under the pre-existing directory
<literal>~/bins</literal>, no more than 600MB per bin:
</para>
<programlisting>
datapacker -b ~/bins/%03d -s 600m -a hardlink ~/Pictures/*.jpg
</programlisting>
</listitem>
<listitem>
<para>
Put all files in <literal>~/Pictures</literal> or any
subdirectory thereof into 600MB bins under ~/bins, using
hardlinking. This is a simple example to follow if you
simply want a recursive search of all files.
</para>
<programlisting>
find ~/Pictures -type f -print0 | \
datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
</programlisting>
</listitem>
<listitem>
<para>
Find all JPEG images in <literal>~/Pictures</literal> or
any subdirectory thereof, put them into bins (using
hardlinks) under the pre-existing directory
<literal>~/bins</literal>, no more than 600MB per bin:
</para>
<programlisting>
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
</programlisting>
</listitem>
<listitem>
<para>
Find all JPEG images as above, put them in 4GB bins,
but instead of putting them
anywhere, calculate the size of each bin and display it.
</para>
<programlisting>
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d -s 4g \
'--action=exec:echo -n "$1: "; shift; du -ch "$@" | grep total' \
-
</programlisting>
<para>
This will display output like so:
</para>
<programlisting>
/home/jgoerzen/bins/001: 4.0G total
/home/jgoerzen/bins/002: 4.0G total
/home/jgoerzen/bins/003: 4.0G total
/home/jgoerzen/bins/004: 992M total
</programlisting>
<para>
Note: the <literal>grep</literal> pattern in this example
is simple, but will cause unexpected results if any
matching file contains the word "total".
</para>
</listitem>
<listitem>
<para>
Find all JPEG images as above, and generate 600MB ISO
images of them in ~/bins. This will generate the ISO
images directly without ever hardlinking files into
~/bins.
</para>
<programlisting>
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d.iso -s 4g \
'--action=exec:BIN="$1"; shift; mkisofs -r -J -o "$BIN" "$@"' \
-
</programlisting>
<para>
You could, if you so desired, pipe this result directly
into a DVD-burning application. Or, you could use
<literal>growisofs</literal> to burn a DVD+R in a single
step.
</para>
</listitem>
</itemizedlist>
</refsect1>
<refsect1>
<title>Errors</title>
<para>
It is an error if any specified file exceeds the value given
with <literal>-s</literal> or <literal>-S</literal>.
</para>
<para>
It is also an error if any specified files disappear while
&datapacker; is running.
</para>
</refsect1>
<refsect1>
<title>Bugs</title>
<para>
Reports of bugs should be reported online at the
&datapacker; homepage.
Debian users are encouraged to instead use the
Debian
bug-tracking system.
</para>
</refsect1>
<refsect1>
<title>Copyright</title>
<para>&datapacker;, and this manual, are Copyright © 2008 John Goerzen.</para>
<para>
All code, documentation, and build scripts are under the following
license unless otherwise noted:
</para>
<para>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
</para>
<para>
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public License
along with this program. If not, see
<ulink url="http://www.gnu.org/licenses/"></ulink>.
</para>
<para>
The GNU General Public License is available in the file COPYING in the source
distribution. Debian GNU/Linux users may find this in
/usr/share/common-licenses/GPL-3.
</para>
<para>
If the GPL is unacceptable for your uses, please e-mail me; alternative
terms can be negotiated for your project.
</para>
</refsect1>
<refsect1>
<title>Author</title>
<para>&datapacker;, its libraries, documentation, and all included files, except where
noted, was written by John Goerzen <email>jgoerzen@complete.org</email> and
copyright is held as stated in the COPYRIGHT section.
</para>
<para>
&datapacker; may be downloaded, and information found, from its
<ulink url="http://software.complete.org/datapacker">homepage</ulink>.
</para>
</refsect1>
<refsect1>
<title>See Also</title>
<para><application>mkisofs</application>(1),
<application>genisoimage</application>(1)
</para>
</refsect1>
</refentry>
</reference>
<!--
Local Variables:
mode: sgml
sgml-set-face: T
End:
-->
|