.TH K2pdfopt 1 "2017-12-07" "User Commands"
.SH NAME
K2pdfopt \- PDF Reflow tool
.SH SYNOPSIS
.B k2pdfopt
.RI [ opts "] " "<input pdf/djvu | folder>"
.SH DESCRIPTION
.B K2pdfopt
attempts to optimize PDF (or DJVU) files (especially two\-column ones) for
display on the Kindle (or other mobile readers/smartphones) by looking for
rectangular regions in the file and re\-paginating them without margins and
excess white space. Works on any PDF or DJVU (.djvu) file, but assumes it
has a mostly\-white background. Native PDF files (not scanned) work best.
If given a folder, \fBk2pdfopt\fR first looks for bitmaps in the folder and if
any are found, converts those bitmaps to a PDF as if they were pages of a
PDF file. If there are no bitmaps in the folder and if PDF files are in
the folder, then each PDF file will be converted in sequence.
Output files are always .pdf and have _k2opt added to the source name by
default (see \fB\-o\fR option to specify alternate output name.)
.SH ENVIRONMENT
You can supply command\-line options via the environment variable \fBK2PDFOPT\fR,
for example,
.RS
.EX
set K2PDFOPT=-ui- -x -j 0 -m 0.25
.EE
.RE
Options given on the command line take precedence over those in
the environment variable \fBK2PDFOPT\fR.
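.PP
The \fBset\fR form shown above is Windows command\-prompt syntax. As a sketch for
POSIX shells, the same defaults could be exported and then overridden for a
single run. The file name \fIpaper.pdf\fR is a placeholder:
.RS
.EX
export K2PDFOPT="\-ui\- \-x \-j 0 \-m 0.25"
k2pdfopt \-m 0 paper.pdf
.EE
.RE
Here \fB\-m\fR 0 on the command line should win over \fB\-m\fR 0.25 from the
environment, following the precedence rule stated above.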
.SH OPTIONS
.PP
You may not need to read this manual: run
.B k2pdfopt
without any options and use the interactive menu to select the desired options.
.TP
.BR \-?[\-] " [\fIpattern\fR]"
Show [don't show] usage only (no file processing).
If pattern is specified, only options with text matching
the pattern are shown. The pattern can use * as a wild
card, e.g. \fB\-? \-col\fR. Use \fB\-?\-\fR to turn off usage.
Combine with \fB\-ui\-\fR to get something you can redirect
to a file.
.TP
.B \-a[\-]
Turn on [off] text coloring (use of ANSI color codes) on
the screen output. Default is on.
.TP
.BR \-ac[\-] " [\fI<aggressiveness>\fR]"
Auto crop. For books or papers that have dark edges
due to copying artifacts, this option will attempt to
automatically crop out those dark regions so that k2pdfopt
can correctly process the source file. The \fI<aggressiveness>\fR
factor is from 0 to 1. Higher is more aggressive cropping.
Default if not specified is 0.1. See also \fB\-m\fR.
Default value is off (\fB\-ac\-\fR).
Note that autocropping does not work on cropped regions
created with \fB\-cbox\fR. See \fB\-dw\fR for a discussion about this.
.TP
.BR \-as[\-] " [\fI<maxdeg>\fR]"
Attempt to automatically straighten tilted source pages.
Will rotate up to +/\-\fI<maxdeg>\fR degrees if a value is
specified, otherwise defaults to 4 degrees max. Use -1 to
turn off. Default is off (\fB\-as\fR -1 or \fB\-as\-\fR).
Note that autostraighten does not work on cropped regions.
See \fB\-dw\fR for a discussion about this.
.TP
.BI \-author " <author>"
Set the author metadata / property of the PDF output
file(s). Default is to use the author of the source document
(\fB\-author\fR "").
.TP
.BI \-bmp[\-] " <pageno>"
Generate [do not generate] a bitmap rendering of converted
page number <pageno> and write it to file k2pdfopt_out.png.
If this option is used, no other files are written, i.e. the
complete conversion is NOT done\-\-ONLY the bitmap file is
written. If \fB\-sm\fR is also specified, then the bitmap is of
marked source page \fI<pageno>\fR. If \fB\-bmp\-\fR, then \fI<pageno>\fR is not
necessary. Default is \fB\-bmp\-\fR.
.TP
.BR \-bp[+|\-|\-\-] " [m|\fI<inches>\fP] "
Break [do not break] output pages at end of each input
page.
Default is \fB\-bp\-\fR. If a numeric value is put after \fB\-bp\fR,
then rather than breaking the output page at the end of each
input page, a gap is inserted of that many inches, e.g.
\fB\-bp\fR 1 will insert a 1\-inch gap between contents of each
input page. Special option \fB\-bp+\fR will break the pages at
the green boundaries between regions as marked by the \fB\-sm\fR
option (see \fB\-sm\fR). If bookmark information is available
and \fB\-toc\fR is specified (on by default) page breaks will be
inserted in the converted file at each bookmark unless \fB\-bp\-\-\fR
is specified. If "\fB\-bp m\fR" is specified, then a page break
is inserted after each major (red\-box) section. This can
help prevent text selection overlap problems in native output
mode. See also \fB\-toc\fR, \fB\-bpl\fR.
.TP
.BI \-bpc " <nn>"
Set the bits per color plane on the output device to \fI<nn>\fR.
The value of \fI<nn>\fR can be 1, 2, 4, or 8. The default is 4
to match the kindle's display capability. This is ignored
if the \fB\-jpg\fR option is specified.
.TP
.BI \-bpl " <srcpagelist>"
Insert page break in destination file before each source
file page listed in \fI<srcpagelist>\fR. This has the same format
as the \fB\-p\fR option. See also \fB\-p\fR, \fB\-bp\fR, \fB\-toc\fR, \fB\-toclist\fR. Default
is no page list. Example: \fB\-bpl\fR 10,25,50,70,93,117,143.
This automatically sets \fB\-bp\fR to its default value (\fB\-bp\-\fR).
.TP
.BI \-bpm[\fI<type>\fB] " <color>"
Set a page break mark type and color. This option allows
you to put colored marks in the PDF file to specify where to
break pages or where to avoid page breaks. \fI<type>\fR is either
1 to force a page break or 2 to prevent a page break until
next mark. \fI<color>\fR is an R,G,B triplet, 0\-1 for each color
component, no spaces. For example, to break the page
wherever the source file has a green dot or short green
horizontal line: \fB\-bpm1\fR 0,1,0. Use \fI<color>\fR = -1 to clear.
If you omit the \fI<type>\fR, 1 is assumed.
.TP
.B \-c[\-]
Output in color [grayscale]. Default is grayscale.
.TP
.BI \-cbox[\fI<pagelist>\fB|u|\-] " <cropbox>"
Similar to the \fB\-grid\fR option, but allows you to
specify exact crop boxes from the source page which will
then be processed as major (red\-box) regions. These regions
can then become individual output pages or can be processed
further (searched for columns, re\-flowed, etc.) depending on
what other options are selected. By default, they are
processed further, like every other major region.
You may specify the \fB\-cbox\fR option multiple times to crop out
different parts of each source page, each crop being treated
as a major region. See the \fB\-mode\fR command. To have each
crop box become a new page in the output file, for example,
use \fB\-mode\fR crop, e.g.
.RS 14
.EX
k2pdfopt myfile.pdf \-mode crop \-cbox 2in,3in
.EE
.RE
.IP
\fI<cropbox>\fR has the format \fI<left>\fR,\fI<top>\fR,\fI<width>\fR,\fI<height>\fR
where all values are specified from the upper\-left corner of
the source page, with units, like the \fB\-w\fR and \fB\-h\fR options,
except that the default units for \fB\-cbox\fR are inches. If only
\fI<left>\fR and \fI<top>\fR are specified, then \fI<width>\fR and \fI<height>\fR
extend to the edge of the page.
Example:
.EX
\fB\-cbox\fR 1in,1in,6in,9in
.EE
(same as \fB\-cbox\fR 1,1,6,9).
.RS 14
This specifies a crop box that is 6 x 9 inches and which
has an upper left corner which is 1 inch from the left
and top of the source page.
.RE
.IP
Use \fB\-cbox\-\fR to clear all cropboxes, which defaults back to
processing every page without any crop boxes.
You can use a page list, \fI<pagelist>\fR, to specify on which
pages to apply the cropboxes.
Examples:
.RS 14
.TP
\fB\-cbox\fR5\-51o
applies the cropbox on pages 5,7,9,...,51.
('o' = odd. Use 'e' for even.)
.TP
\fB\-cbox\fR1,2\-5,13,15
applies the cropbox on pages 1,2,3,4,5,13, and 15.
.TP
\fB\-cbox\fRc \fI<cropbox>\fR
applies \fI<cropbox>\fR to the cover image.
(see \fB\-ci\fR option.)
.RE
.IP
Be sure not to put a space between \fB\-cbox\fR and the page list.
Use \fB\-cboxu\fR to set a crop box for all unspecified pages.
E.g. \fB\-cbox1\-10 \fI<cbox1>\fR \fB\-cboxu \fI<cbox2>\fR will apply \fI<cbox1>\fR to
all pages 1 to 10 and \fI<cbox2>\fR to all other pages.
The default is no crop boxes (\fB\-cbox\-\fR). See also \fB\-m\fR, \fB\-ac\fR.
USAGE NOTE: Once you specify \fB\-cbox\fR at least one time, only
the crop boxes you specify (and any associated page ranges)
are processed/converted by k2pdfopt. No other pages or
regions are processed. So if you want to specify a special
cropbox for the first page, for example, but then have all
remaining pages processed in full, you must specify this:
.RS 14
\fB\-cbox1\fR ... \fB\-cboxu\fR 0,0
(\fB\-cboxu\fR 0,0 applies a full\-page cropbox to all other
pages. u = unspecified.)
.RE
.IP
Alternatively, \fB\-cbox2\-\fR 0,0 sets the cropbox for pages 2 and beyond
to the full page size.
See also: \fB\-ibox\fR.
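.IP
As an illustrative sketch (the crop values and file name are placeholders, not
recommendations), the following would crop page 1 to a 6 x 9 inch box and
process every other page in full:
.RS 14
.EX
k2pdfopt \-cbox1 1in,1in,6in,9in \-cboxu 0,0 myfile.pdf
.EE
.RE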
.TP
.BI \-cg " <inches>"
Minimum column gap width in inches for detecting multiple
columns. Default = 0.1 inches. Setting this too large
will give very poor results for multicolumn files. See also
\fB\-cgmax\fR.
.TP
.BI \-cgmax " <inches>"
Max allowed gap between columns in inches. If the gap
between two regions exceeds this value, they will not be
considered as separate columns. Default = 1.5. Use -1 for
no limit (disable). See also \fB\-cg\fR.
.TP
.BI \-cgr " <range>"
Set column\-gap range, 0 \- 1. This is the horizontal range
over which k2pdfopt will search for a column gap, as a
fraction of the page width. E.g. \fB\-cgr\fR 0.5 will search
from 0.25 to 0.75 of the page width for a column gap.
Set this to a small value, e.g. 0.05, to only search for
column breaks in the middle of the page. Default = 0.33.
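.IP
Assuming the search window is centered on the page, as the example above
suggests, it runs from 0.5 \- \fI<range>\fR/2 to 0.5 + \fI<range>\fR/2 of the page
width. For the default of 0.33 this works out to roughly 0.335 to 0.665.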
.TP
.BI \-ch " <inches>"
Minimum column height in inches for detecting multiple
columns. Default = 1.5 inches.
.TP
.BI \-ci[\-] " <imagefile>"
Specify a cover image for the first page of the converted
PDF. \fI<imagefile>\fR can be a bitmap file (png or jpg) or can be
a page from a PDF file, e.g. myfile.pdf[34] would use page 34
of myfile.pdf. You can just specify an integer, e.g. \fB\-ci\fR 50
to use page 50 of the source file being converted as the
cover page. Default is \fB\-ci\-\fR, which is no cover image.
.RS
.HP
NOTE: \fB\-ci\fR only works with bitmapped output\-\-it does not
(yet) work with native PDF output.
.RE
.TP
.BI \-cmax " <max>"
Set max contrast increase on source pages. 1.0 keeps
contrast from being adjusted. Use a negative value to
specify a fixed contrast adjustment. Def = 2.0.
See also \fB\-er\fR.
.TP
.BI \-col " <maxcol>"
Set max number of columns. \fI<maxcol>\fR can be 1, 2, or 4.
Default is \fB\-col\fR 2.
\fB\-col\fR 1 disables column searching.
.TP
.BR \-colorbg " (or " \-colorfg ") \fI<hexcolor>\fP|\fI<bitmap>\fP[,\fI<hexcolor>\fP|\fI<bitmap>\fP[,...]]"
Map the color white (background color, for \fB\-colorbg\fR) or the
color black (text color, for \fB\-colorfg\fR) to \fI<hexcolor>\fR,
where \fI<hexcolor>\fR is a 6\-digit hex RRGGBB representation of a
color, e.g. ffffff for all white, 000000 for all black,
ff0000 for bright red, etc. If \fI<hexcolor>\fR is not a grayscale
color, the \fB\-c\fR (color output) option will be turned on
automatically. This option only works with bitmapped output
(not native\-\-see \fB\-n\fR). Grayscale colors between black and
white will be linearly interpolated between the specified
\fB\-colorbg\fR and \fB\-colorfg\fR colors. If the source document has
colors, only (mostly) grayscale pixels are affected if ! is
put before the color, e.g. \fB\-colorbg\fR !ffffd0
A bitmap can also be specified, e.g. \fB\-colorbg\fR myfile.jpg.
In this case, the bitmap gets tiled in as the background.
If you specify a comma delimited list of colors (or bitmaps),
then consecutive rows of text are colored with the
consecutive colors. This is a possible way to make the
rows of text easier to follow, e.g. \fB\-colorfg\fR ff0000,00 will
color alternate rows of text red and black.
Default is \fB\-colorbg\fR "" and \fB\-colorfg\fR "" (no mappings).
.TP
.BI \-comax " <range>"
Stands for Column Offset Maximum. The \fI<range>\fR given is as a
fraction of the width of a single column, and it specifies
how much the column divider can move around and still have
the columns considered contiguous. Set to -1 to revert back
to how columns were treated in k2pdfopt v1.34 and before.
Default = 0.3.
.TP
.BI \-crgh " <inches>"
Set the min height of the blank area that separates regions
with different numbers of columns.
Default = 1/72 inch.
.TP
.B \-d[\-]
Turn on [off] dithering for bpc values < 8. See \fB\-bpc\fR.
Default is on.
.TP
.BI \-de " <size>"
Defect size in points. For scanned documents, marks
or defects smaller than this size are ignored when bounding
rectangular regions. The period at the end of a sentence is
typically over 1 point in size. The default is 1.0.
.TP
.BI \-dev " <name>"
Select device profile (sets width, height, dpi, and corner
marking for selected devices). Currently the selection is
limited. \fI<name>\fR just has to have enough characters to
uniquely pick the device. Use \fB\-dev\fR ? to list the devices.
Default is \fB\-dev\fR kindle2.
.TP
.BI \-dpi " <dpival>"
Same as \fB\-odpi\fR.
.TP
.BI \-dr " <value>"
Display resolution multiplier. Default = 1.0. Using a
value greater than 1 should improve the resolution of the
output file (but will make it larger in file size).
E.g. \fB\-dr\fR 2 will double the output DPI, the device width
(in pixels), and the device height (in pixels).
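.IP
As a worked example, taking the default device height of 735 pixels
(see \fB\-h\fR), \fB\-dr\fR 2 would produce output pages 1470 pixels high, with the
device width and output DPI doubled in the same way.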
.TP
.BI \-ds " <factor>"
Override the document size with a scale factor. E.g. if
your PDF reader says the PDF file is 17 x 22 inches and
it should actually be 8.5 x 11 inches, use \fB\-ds\fR 0.5. Default
is 1.0.
.TP
.BR \-dw[\-]\fR " [\fI<fitorder>\fR]"
De\-warp [do not de\-warp] pages (uses Leptonica de\-warp
algorithms). Default is not to de\-warp. Does not work
for native mode output. Optional \fI<fitorder>\fR specifies the
fit order for the dewarping curves. Can be 2, 3, or 4.
Default is 4.
[Advanced: You can actually make the fit order a two\-digit
code. E.g. \fB\-dw\fR 24 will use 4th\-order on each row of text
but only 2nd\-order for columns of displacement (see
leptonica dewarpFindVertDisparity() in dewarp2.c)]
Note: de\-warping, like auto\-straighten and auto\-crop, is
intended for entire pages. It does not work on cropped areas.
If you want it to work on cropped areas, you should run
k2pdfopt in two passes\-\-first to create selected crop
areas (e.g. \fB\-mode\fR crop), then to apply dewarping.
Requires k2pdfopt to be built with Leptonica.
.TP
.BI \-ehl " <n>"
Same as \fB\-evl\fR, except erases horizontal lines instead of
vertical lines. See \fB\-evl\fR. Default is \fB\-ehl\fR 0.
.TP
.BI \-er " <n>"
Use erosion filter on source bitmaps. Makes the text look
darker. A larger value of \fI<n>\fR makes the text thicker/darker.
Try \fB\-er\fR 1 or \fB\-er\fR 2. Default is 0 (no erosion filtering).
Use a negative value for \fI<n>\fR to do the erosion before the
contrast adjustment is applied. Use a positive value to
do the erosion after the contrast adjustment is applied.
This option may magnify scanning defects, so you might want
to combine with the \fB\-de\fR (defect removal) option.
Has no effect in native mode output. See also \fB\-de\fR, \fB\-g\fR, \fB\-cmax\fR.
.TP
.BI \-evl " <n>"
Detects and erases vertical lines in the source document
which may be keeping k2pdfopt from correctly separating
columns or wrapping text, e.g. column dividers. If \fI<n>\fR is
zero, this is turned off (the default). If \fI<n>\fR is 1, only
free\-standing vertical lines are removed. If \fI<n>\fR is 2,
vertical lines are erased even if they are the sides of
an enclosed rectangle or figure, for example.
.TP
.BI \-f2p " <val>"
Fit\-to\-page option. The quantity \fI<val>\fR controls fitting
tall or small contiguous objects (like figures or
photographs) to the device screen. Normally these are fit
to the width of the device, but if they are too small or
too tall, then if \fI<val>\fR=10, for example, they are allowed
to be 10% wider (if too small) or narrower (if too tall)
than the screen in order to fit better. Use -1 to fit the
object no matter what. Use -2 as a special case\-\-all
"red\-boxed" regions (see \fB\-sm\fR option) are placed one per
page.
Use \fB\-f2p\fR -3 to fit as many "red\-boxed" regions as
possible on each page without breaking them across pages.
(see \fB\-mode\fR concat).
Default is \fB\-f2p\fR 0. See also \fB\-jf\fR, \fB\-fr\fR.
Note: \fB\-f2p\fR -2 will automatically also set \fB\-vb\fR -2 to
exactly preserve the spacing in the red\-boxed region. If
you want to compress the vertical spacing in the red\-boxed
region, use \fB\-f2p\fR -2 \fB\-vb\fR -1.
.TP
.B \-fc[\-]
For multiple column documents, fit [don't fit] columns to
the width of the reader screen regardless of \fB\-odpi\fR.
Default is to fit the columns to the reader.
.TP
.B \-fr[\-]
Figure rotate\-\-rotates wide\-aspect\-ratio figures to landscape
so that they best fit on the reader page. Default is not
to rotate. See also \fB\-f2p\fR.
.TP
.BI \-fs " <points>\fR[+]\fP"
The output document is scaled so that the median font size in
the converted file is \fI<points>\fR points. If the \fI<points>\fR value
is followed by a '+', the scaling is adjusted for every
source page, otherwise the font size is only adjusted once,
based on the median font size for the entire source document.
The default is \fB\-fs\fR 0, which turns off scaling based on font
size. The use of \fB\-fs\fR overrides the \fB\-mag\fR setting.
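.IP
As an illustrative sketch (the 12\-point target and file name are placeholders),
the first command below scales once from the document\-wide median font size,
while the second re\-evaluates the scaling on every source page:
.RS
.EX
k2pdfopt \-fs 12 myfile.pdf
k2pdfopt \-fs 12+ myfile.pdf
.EE
.RE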
.TP
.BI \-g " <gamma>"
Set gamma value of output bitmaps. A value less than 1.0
makes the page darker and may make the font more readable.
Default is 0.5. Has no effect with native\-mode output.
See also \fB\-er\fR, \fB\-cmax\fR.
.TP
.BI \-grid " <C>\fRx\fI<R>\fR[x\fI<O>\fR][+]"
Grid the source page into \fI<C>\fR columns by \fI<R>\fR rows with
\fI<O>\fR percent overlap. No attempt is made
to break the page between columns or rows of text. If a +
is specified, the destination page order will go across and
then down, otherwise it will go down and then across. To
turn off gridding, specify a zero value for the columns or
for the rows. Default is no gridding. The default overlap
is 2%. Example: \fB\-grid\fR 2x2x5. By default, gridding also
sets the following options, which can be overridden by
following the grid option with other command options:
\fB\-n\fR \fB\-wrap\-\fR \fB\-f2p\fR -2 \fB\-vb\fR -2 \fB\-col\fR 1. For example, if you want
a column search done on each grid piece, you can put this:
\fB\-grid\fR 2x2 \fB\-col\fR 2. See also \fB\-cbox\fR.
.TP
.B \-gs[\-][\-]
Force use of Ghostscript instead of MuPDF to read PDFs.
K2pdfopt has built\-in PDF translation (via the MuPDF
library) but will try to use Ghostscript if Ghostscript
is available and the internal (MuPDF) translation fails
(virtually never happens). You can force Ghostscript to
be used with this \fB\-gs\fR option. Use \fB\-gs\-\fR to use Ghostscript
only if MuPDF fails. Use \fB\-gs\-\-\fR to never use Ghostscript.
Download ghostscript at http://www.ghostscript.com.
.TP
.BI \-gtc " <inches>"
Threshold value for detecting column gaps (expert mode).
Sets how many of the pixels in the column shaft can be
non\-white (total height of a line crossing the shaft in
inches). See also \fB\-gtr\fR. Default = .005.
.TP
.BI \-gtr " <inches>"
Threshold for detecting gaps between rows (expert mode).
This sets the maximum total black pixels, in inches, on
average, that can be in each row of pixels before the gap is
no longer considered a gap. A higher value makes it easier
to detect gaps between rows of text. Too high of a value
may inadvertently split figures and other graphics.
Default = 0.006. See also \fB\-rsf\fR.
.TP
.BI \-gtw " <inches>"
Threshold for detecting word gaps (expert mode).
See \fB\-gtr\fR. Default = .0015.
.TP
.B \-gui[\-]
Use [don't use] graphical user interface (MS Windows only).
If k2pdfopt is started from a console (command\-line), the
default is not to launch the GUI unless there are no command\-line
options given. If k2pdfopt is launched via its icon,
then the default is to launch the GUI.
.TP
.B \-guimin[\-]
Start the k2pdfopt GUI minimized. Def = not minimized.
.TP
.BI \-h " <height>\fR[in|cm|s|t|p|x]"
Set the height of the output device. Units can be inches (in), cm,
multiples of the source page size (s) or trimmed source region size (t),
pixels (p), or multiples of the height of the OCR text layer (x).
The default units are pixels (p), and the default value
is 735 (the height of the Kindle 2 screen in pixels).
Examples:
.RS 17
.TP
.BR \-h " 6.5in"
Sets the device height to 6.5 in
(using the output dpi to convert to
pixels\-\-see \fB\-dpi\fR).
.TP
.BR \-h " 1.5s"
Sets the device height to 1.5 times the
source page height (same as \fB\-h\fR -1.5).
.TP
.BR \-h " 1t"
Sets the device height to whatever the
trimmed page height is (you can follow
\fB\-mode\fR copy with \fB\-h\fR 1t to make the output
page height equal to the crop box height).
.TP
.BR \-h " 0.5x"
Sets the device height to half of the
height of the box exactly surrounding
the OCR text layer on the source page.
.RE
.IP
See also \fB\-w\fR, \fB\-dpi\fR, \fB\-dr\fR.
.TP
.B \-hy[\-]
Turn on [off] hyphen detection/elimination when wrapping
text. Default is on.
.TP
.B \-i
Echo information about the source file (PDF only).
Disables all other processing.
.TP
.BI \-ibox[ <pagelist> |\-|u\fB] " <cropbox>"
Same as \fB\-cbox\fR (see \fB\-cbox\fR), except that these
boxes are ignored by k2pdfopt. This is done by whiting out
the boxes in the source bitmap. For native output, the
area in the \fB\-ibox\fR will not affect the parsing of the source
file, but it may still be visible in the output file.
Default is no iboxes (\fB\-ibox\-\fR). See also \fB\-cbox\fR.
.TP
.BI \-idpi " <dpi>"
Set pixels per inch for input file. Use a negative value
as a multiplier on the output dpi (e.g. -2 will set the
input file dpi to twice the output file dpi; see \fB\-odpi\fR).
Default is -2.0.
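.IP
For example, assuming the output resolution is set to 200 dpi (an arbitrary
value chosen for illustration), the default multiplier of -2 would render the
source at 400 dpi:
.RS
.EX
k2pdfopt \-odpi 200 \-idpi \-2 myfile.pdf
.EE
.RE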
.TP
.BR \-j " -1|0|1|2[+/\-]"
Set output text justification. 0 = left, 1 = center,
2 = right. Add a + to attempt full justification or a \-
to explicitly turn it off. The default is -1, which tells
k2pdfopt to try and maintain the justification of the
document as it is. See also \fB\-wrap\fR.
.TP
.BR \-jf " 0|1|2 [\fI<inches>\fR]"
Set figure (tall region) justification. If a figure
has left or right margins available, this option allows
you to set the justification differently than the text.
E.g. you can center figures with \fB\-jf\fR 1. If you want to
specify a minimum height for figures (e.g. minimum region
height where this justification applies), you can tack it
on at the end, e.g. \fB\-jf\fR 1 1.5 to center any region taller
than 1.5 inches. Default is 0.75 inches for the minimum
height and to use the same justification on figures as
the rest of the document (\fB\-jf\fR -1). See also \fB\-f2p\fR to fit
small or tall figures to the page.
.TP
.BI \-jfc[\-|+]
Attempt [do not attempt] to keep figure captions joined
with their figures. If you specify \fB\-jfc\fR+, k2pdfopt will
also try to detect figure captions in multi\-column documents.
This is not done by default because k2pdfopt will sometimes
(more often than not, in my experience) incorrectly choose
the multi\-column layout if it is also trying to detect what
is a figure caption. See also \fB\-cg\fR, \fB\-cgmax\fR, \fB\-cgr\fR, \fB\-crgh\fR.
Default = \fB\-jfc\fR.
.TP
.BR \-jpg " [\fI<quality>\fR]"
Use JPEG compression in PDF file with quality level
\fI<quality>\fR (def=90). A lower quality value will make your
file smaller. See also \fB\-png\fR. Use of \fB\-jpg\fR is incompatible
with the \fB\-bpc\fR option.
.TP
.BI \-l " <lang>"
See \fB\-ocrlang\fR.
.TP
.BI \-lang " <lang>"
See \fB\-ocrlang\fR.
.TP
.BI \-ls[\-][ pagelist ]
Set output to be in landscape [portrait] mode. The default
is \fB\-ls\-\fR (portrait). If an optional pagelist is specified,
only those pages are affected\-\-any other pages are done
oppositely. E.g. \fB\-ls\fR1,3,5\-10 would make source pages 1, 3
and 5 through 10 landscape.
.TP
.BI \-m[l|t|r|b] " <val>\fR[\fI<units>\fR][,\fI<val>\fR[\fI<units>\fR][,...]]"
Set global crop margins for
every page. If more than one value is given (comma\-delimited
with no spaces in between), the order is left, top, right,
bottom, e.g. \fB\-m\fR \fI<left>\fR,\fI<top>\fR,\fI<right>\fR,\fI<bottom>\fR. You can also
use the more powerful \fB\-cbox\fR option to do this same thing.
The default units are inches. For available units and their
descriptions, see \fB\-h\fR.
Examples:
.RS
.TP
.BR \-m " 0.5cm"
Sets all margins to 0.5 cm.
.TP
.BR \-m " 0.5cm,1.0cm"
Sets the left margin to 0.5 cm and all the other
margins to 1.0 cm.
.TP
.BR \-m " 0.2in,0.5in,0.2in,0.5in"
Sets the left and right crop margins to
0.2 inches and the top and bottom to 0.5 inches.
.TP
.BR \-mt " 1cm"
Sets the top margin to 1 cm.
.TP
.BR \-m " \-0.1x,\-0.1x,1.1x,1.1x"
With the 'x' unit, the behavior is a little
different. Rather than specifying the widths
of each margin, you specify the position of
the crop box relative to the OCR text layer
in the source file, where 0x,0x,1x,1x would
exactly bound the OCR text layer.
.RE
.IP
The default crop margins are 0 inches.
.RS
.HP
[NOTE: The default was 0.25 inches for all margins before
v1.65.]
.RE
.IP
See also \fB\-cbox\fR and \fB\-ac\fR to autocrop scanning artifacts.
.TP
.BI \-mag " <value>"
Magnify the converted document (text) size by \fI<value>\fR.
Default is \fB\-mag\fR 1 (no magnification). See also \fB\-fs\fR.
.TP
.B \-mc[\-]
Mark [don't mark] corners of the output bitmaps with a
small dot to prevent the reading device from re\-scaling.
Default = mark.
.TP
.BI \-mode " <mode>"
Shortcut for setting multiple options at once which
determine the basic way in which k2pdfopt will behave.
Available modes are:
.RS
.TP
copy
Same as \fB\-n\-\fR \fB\-wrap\-\fR \fB\-col\fR 1 \fB\-vb\fR -2 \fB\-w\fR 1s \fB\-h\fR 1s
\fB\-dpi\fR 150 \fB\-rt\fR 0 \fB\-c\fR \fB\-t\-\fR \fB\-f2p\fR -2 \fB\-m\fR 0 \fB\-om\fR 0 \fB\-pl\fR 0
\fB\-pr\fR 0 \fB\-pt\fR 0 \fB\-pb\fR 0 \fB\-mc\-\fR. Makes k2pdfopt
behave exactly like my pdfr program\-\-source
pages are simply copied to the output file, but
rendered as bitmaps. No trimming or re\-sizing
is done. Can also use \fB\-mode\fR pdfr.
.RS
.HP
Note 1: Use \fB\-mode\fR copy \fB\-n\fR if you want an exact
copy (output in native mode).
.HP
Note 2: The default gamma and contrast settings
are not reset by \fB\-mode\fR copy. If you
want a perfect copy, do this:
.RS 14
.EX
\-mode copy \-gamma 1 \-cmax 1
.EE
.RE
.RE
.TP
fp
Also can use fitpage. Same as \fB\-n\fR \fB\-wrap\-\fR \fB\-col\fR 1
\fB\-vb\fR -2 \fB\-f2p\fR -2 \fB\-t\fR.
.TP
fw
Same as \fB\-n\fR \fB\-wrap\-\fR \fB\-col\fR 1 \fB\-vb\fR -2 \fB\-t\fR \fB\-ls\fR. Makes
k2pdfopt behave like sopdf's "fit width"
option. Can also use \fB\-mode\fR sopdf.
.TP
2col
Same as \fB\-n\fR \fB\-wrap\-\fR \fB\-col\fR 2 \fB\-vb\fR -2 \fB\-t\fR.
Optimizes for a 2\-column scientific article with
native PDF output.
.TP
tm
Trim margins\-\-same as \fB\-mode\fR copy, but sets the
output to be trimmed to the margins and the width
and height of the output to match the trimmed
source pages. Also uses native mode. Equivalent
to \fB\-n\fR \fB\-wrap\-\fR \fB\-col\fR 1 \fB\-vb\fR -2 \fB\-f2p\fR -2 \fB\-t\fR \fB\-w\fR 1t \fB\-h\fR 1t
\fB\-rt\fR 0 \fB\-c\fR \fB\-m\fR 0 \fB\-om\fR 0 \fB\-pl\fR 0 \fB\-pr\fR 0 \fB\-pt\fR 0 \fB\-pb\fR 0 \fB\-mc\-\fR.
Can also use \fB\-mode\fR trim.
.TP
crop
Used with \fB\-cbox\fR option, puts each cropped area
on a separate page, untrimmed, and sizes the
page to the cropped region. Same as \fB\-wrap\-\fR
\fB\-col\fR 1 \fB\-vb\fR -2 \fB\-w\fR 1t \fB\-h\fR 1t \fB\-t\-\fR \fB\-rt\fR 0 \fB\-c\fR \fB\-f2p\fR -2
\fB\-m\fR 0 \fB\-om\fR 0 \fB\-pad\fR 0 \fB\-mc\-\fR \fB\-n\fR
.TP
concat
Keeping the output pages the same size as the
source pages, fit as many crop\-boxed regions on
the output pages as possible without breaking
them across pages. Equivalent to: \fB\-n\fR \fB\-wrap\-\fR
\fB\-col\fR 1 \fB\-vb\fR \-2 \fB\-t\-\fR \fB\-f2p\fR \-3 \fB\-fc\-\fR \fB\-w\fR 1s \fB\-h\fR 1s \fB\-ocr\-\fR
.TP
def
Default k2pdfopt mode: \fB\-wrap\fR \fB\-n\-\fR \fB\-col\fR 2 \fB\-vb\fR 1.75
\fB\-dev\fR k2 \fB\-rt\fR auto \fB\-c\-\fR \fB\-t\fR \fB\-f2p\fR 0 \fB\-m\fR 0 \fB\-om\fR 0.02
\fB\-ls\-\fR.
.RE
.IP
You can modify modes by overriding their options after
specifying the mode, e.g. \fB\-mode\fR fw \fB\-vb\fR -1.
.TP
.B \-n[\-]
Use "native" PDF output format. NOTE: if you want native
PDF output, it's probably best to use a \fB\-mode\fR option like
\fB\-mode\fR fitwidth or \fB\-mode\fR 2col, both of which automatically
turn on native PDF output and optimize other settings for it.
Native PDF output preserves the native source PDF contents,
i.e. the output PDF file is not rendered as a sequence of
bitmapped pages like in the default k2pdfopt output mode.
Instead, the source PDF's native content is used along with
additional PDF instructions to translate, scale, and crop
the source content. With native PDF output, if the source
file has selectable text, the text remains selectable in
the output file. The output file can also be zoomed
without loss of fidelity. This may also result in a
smaller output file (but not always). By default, native
PDF output format is turned off. See also \fB\-mode\fR.
NOTES:
.RS
.IP 1.
Native PDF output cannot be used with text wrapping
on (see \fB\-wrap\fR option).
Turning on native PDF output will disable text wrapping.
.IP 2.
Native PDF output is not recommended for source
files which are scanned (there is no benefit unless
the scanned document includes a layer of OCR text).
.IP 3.
Native PDF output is incompatible with OCR (see \fB\-ocr\fR),
though OCR is typically not necessary if the native PDF
contents are kept. Turning on native PDF output will
disable OCR.
.IP 4.
Native PDF output can only be used with PDF source
files (it does not work with DJVU source files).
.IP 5.
Contrast adjust, gamma correction, and sharpening
are disabled with native PDF output.
.IP 6.
It is recommended that you use \fB\-vb\fR \-2 with native PDF
output, particularly if you are having difficulty
selecting/searching text in the output PDF file.
.IP 7.
This option works well with \fB\-mode\fR fw, \fB\-mode\fR 2col, or
with the \fB\-grid\fR option. It is used by default in those
cases.
.RE
.TP
.B \-neg[\-|+]
Invert [don't invert] the output images (white letters
on black background, or "night mode"). If \fB\-neg\fR+, inverts
all graphics no matter what. If just \fB\-neg\fR, attempts to
invert text only and not figures. Default = \fB\-neg\-\fR.
See also \fB\-colorbg\fR and \fB\-colorfg\fR.
.TP
.BI \-ng " <gap>"
Set gap between notes and main text in the output document.
The \fI<gap>\fR defaults to inches but can have other units (see
\fB\-h\fR, for example). See \fB\-nl\fR and \fB\-nr\fR for how to turn on notes
processing. Default is \fB\-ng\fR 0.2.
.TP
.BR \-nl[\fI<pages>\fB] " [\fI<leftbound>\fR,\fI<rightbound>\fR]"
.TP
.BR \-nr[\fI<pages>\fB] " [\fI<leftbound>\fR,\fI<rightbound>\fR]"
The source document has notes in the left (\fB\-nl\fR) or right
(\fB\-nr\fR) margins. Specific pages can be specified for the
notes using \fI<pages>\fR (same format as \fB\-cbox\fR or \fB\-p\fR). If
\fI<leftbound>\fR,\fI<rightbound>\fR are specified, they specify the
fraction of the page width where to look for the break
between the notes and the main page. E.g.
\fB\-nl\fR 0.15,0.25 will look for the boundary between the notes
and the text between 15% and 25% of the way across the
source page. Use \fB\-nl\-\fR to turn off all processing of notes
in the margins (default). Default values for \fI<leftbound>\fR
and \fI<rightbound>\fR are 0.05 to 0.35 for \fB\-nl\fR and 0.65 to 0.95
for \fB\-nr\fR.
Notes in the margins are treated differently than other
"columns" of text. They will be interspersed with the
text in the adjacent column of main text.
Note that \fB\-nr\fR... or \fB\-nl\fR... will also set \fB\-cg\fR to 0.05.
.TP
.BI \-nt " <nthreads>"
Use \fI<nthreads>\fR parallel threads when OCR\-ing a document
with the Tesseract OCR engine (GOCR is not thread safe).
This may provide a significant processing speed improvement
when using Tesseract OCR. Note that a higher number is not
always faster. You should experiment with your system to
find the optimum. A negative value is interpreted as a
percentage of available CPUs. The default is \fB\-50\fR, which
tells k2pdfopt to use half of the available CPU threads.
Some speed improvements I measured:
.EX
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
                                            OCR Speed
O/S          CPU             Nthreads       improvement
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
Win 10 x64   Core i5         2 (default)    1.5x
Win 10 x64   Core i5         3              1.6x
Win 10 x64   Core i5         4              1.8x
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
Win 10 x64   Core i7         2              1.8x
Win 10 x64   Core i7         3              2.4x
Win 10 x64   Core i7         4 (default)    2.5x
Win 10 x64   Core i7         5              2.8x
Win 10 x64   Core i7         6              2.7x
Win 10 x64   Core i7         7              2.7x
Win 10 x64   Core i7         8              2.6x
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
Linux x64    Core i5         2 (default)    1.9x
Linux x64    Core i5         3              2.6x
Linux x64    Core i5         4              2.7x
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
Linux x64    Xeon E52690v2   2              1.9x
Linux x64    Xeon E52690v2   4              3.5x
Linux x64    Xeon E52690v2   6              5.1x
Linux x64    Xeon E52690v2   8              6.6x
Linux x64    Xeon E52690v2   10 (default)   8.7x
Linux x64    Xeon E52690v2   14             9.5x
Linux x64    Xeon E52690v2   20             10.2x
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
.EE
Interestingly, Linux seems to have much better multithreading
performance than Windows. I suspect the OS/X results are
similar to the Linux results.
NOTE: \fB\-nt\fR has no effect if you select \fB\-ocrd\fR c or \fB\-ocrd\fR p.
See \fB\-ocrd\fR.
Requires k2pdfopt built with an OCR library.
.TP
.BI \-o " <namefmt>"
Set the output file name using \fI<namefmt>\fR. %s will be
replaced with the full name of the source file minus the
extension. %b will be replaced by the base name of the
source file minus the extension. %f will be replaced with
the folder name of the source file. %d will be replaced with
the source file count (starting with 1). The .pdf extension
will be appended if you don't specify an extension.
E.g. \fB\-o\fR out%04d.pdf will result in output files out0001.pdf,
out0002.pdf, ... for the converted files. Def = %s_k2opt
.IP
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
BITMAP OUTPUT: For output to bitmaps, you can put \fB\-o\fR .png
or \fB\-o\fR .jpg (see \fB\-jpeg\fR for quality setting).
.IP
\fB\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\fR
MORE DETAIL: If \fI<namefmt>\fR ends in .jpg or .png, the output
will be in the JPEG or PNG bitmap format, respectively, one
bitmap per page. If your \fI<namefmt>\fR has no %d in it, then
%04d will be appended. If \fI<namefmt>\fR has only one %d, it will
get substituted with the page number. If it has two %d's,
the first will get the file count and the second will get the
page number. Example: if the source PDF is myfile.pdf, then
\fB\-o\fR %s%03d.png would create myfile001.png, myfile002.png,
etc., for each page of the PDF.
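.IP
As a rough illustration of the two\-%d form described above (the file
name report.pdf and the prefix "out" are hypothetical), a command along
these lines would be expected to write one PNG per page, with the file
count in the first field and the page number in the second, e.g.
out01_001.png, out01_002.png, and so on:
.EX
k2pdfopt \-o out%02d_%03d.png report.pdf
.EE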
.TP
.B \-ocr[\-][g|t|m]
Attempt [don't attempt] to use optical character
recognition (OCR) in order to embed searchable text into
the output PDF document. If followed by t or g, specifies
the OCR engine to use (tesseract or gocr). If followed by
m, and if the PDF document has text in it, then the MuPDF
engine is used to extract the text (sort of a virtual OCR).
If \fB\-ocr\fR is specified with no argument, tesseract is used.
If tesseract fails (e.g. no language files found), GOCR
is used. The overall default operation of k2pdfopt is
\fB\-ocr\fR m. See also \fB\-ocrvis\fR and \fB\-ocrhmax\fR.
NOTE: Turning on OCR will disable native PDF output.
.RS
.HP 4
DISCLAIMER: The main intent of OCR isn't to improve the
visual quality of the text at all\-\-at least not the way
k2pdfopt does it. OCR is most useful on scanned PDFs
that don't have selectable text to begin with, but using
OCR with k2pdfopt on such documents doesn't change the
look of the output PDF file at all. The OCR text is
simply placed invisibly over the scanned text so that
you appear to be able to select the scanned text (when,
in fact, you are selecting the invisibly placed OCR
text). So the only time you will even notice the OCR
errors is if you try to search for a word and can't find
that word because the OCR of that word is incorrect, or
if you copy a selection of the OCR text and paste it
into something else so that you can actually see it.
.RE
.TP
.BI \-ocrcol " <n>"
If you are simply processing a PDF to OCR it (e.g. if you
are using the \fB\-mode\fR copy option) and the source document has
multiple columns of text, set this value to the number of
columns to process (up to 4). Default is to use the same
value as \fB\-col\fR.
.TP
.BR \-ocrd " w|l|c|p"
Set OCR detection type for k2pdfopt and Tesseract. \fI<type>\fR
can be word (w), line (l), columns (c), or page (p). Default
is line.
For \fB\-ocrd\fR w, k2pdfopt locates each word in the scanned
document and passes individual words to Tesseract for
OCR conversion. This was the only type of detection before
v2.42 but is not an optimal OCR conversion method when
using Tesseract.
For \fB\-ocrd\fR l, k2pdfopt passes each line of the converted
file to Tesseract for conversion. This typically gives
better results than \fB\-ocrd\fR w since Tesseract can better
determine the text baseline position with a full line.
For \fB\-ocrd\fR c, k2pdfopt detects each column of the converted
file and passes that to Tesseract for conversion.
For \fB\-ocrd\fR p, k2pdfopt passes the entire output page of text
to Tesseract and lets Tesseract parse it for word positions.
Tesseract has done considerable code development for
detecting words on pages (more than k2pdfopt), so this
should also be a reliable way to create the OCR layer.
One drawback to \fB\-ocrd\fR c or \fB\-ocrd\fR p is that there is no benefit
to using the OCR multithreading option (see \fB\-nt\fR).
Requires k2pdfopt built with leptonica.
.TP
.BI \-ocrhmax " <in>"
Set max height for an OCR'd word in inches. Any graphic
exceeding this height will not be processed with the OCR
engine. Default = 1.5. See \fB\-ocr\fR.
.TP
.BI \-ocrlang " <lang>\fR|?\fI"
Select the Tesseract OCR Engine language. This is the
root name of the training data, e.g. \fB\-ocrlang\fR eng for English,
\fB\-ocrlang\fR fra for French, \fB\-ocrlang\fR chi_sim for simplified
Chinese. You can also use \fB\-l\fR. The default language is
whatever is in your Tesseract trained data folder. If you
have more than one .traineddata file in that folder, the
one with the most recent time stamp is used.
NOTE 1: Use \fB\-ocrlang\fR ? to see the list of Tesseract language
files in your Tesseract data folder.
NOTE 2: Using the \fB\-ocrvis\fR t option will not show the OCR text
correctly for any character above unicode value 255 since
k2pdfopt does not use any embedded fonts, but the text
will convert to the correct Unicode values when copy /
pasted.
NOTE 3: Tesseract allows the specification of multiple
language training files, e.g. \fB\-ocrlang\fR eng+fra would
specify English as the primary and French as the secondary
OCR language. In practice I have not found this to work
very well. Try multiple languages in different orders.
Requires k2pdfopt built with leptonica.
.TP
.BI \-ocrdpi " <dpi>"
Set the desired dpi of the bitmaps passed to the OCR engine
OR set the desired height of a lower case letter (e.g. 'e')
in pixels. If \fI<dpi>\fR is positive, it is interpreted as dpi.
If \fI<dpi>\fR is negative, the absolute value is interpreted as
a lowercase letter height in pixels. Any bitmapped text sent
to the OCR engine will be downsampled (if too large) so that
the appropriate dpi or lowercase letter size is achieved.
The default is 300 because I've found this works best
empirically for Tesseract v4.0.0 English OCR with font sizes
in the range 8 - 15 pts. Use a lower value if the font size
in your document is larger than 15 - 20 pts. Or use
\fB\-ocrdpi\fR -24 if you have a wide range of font sizes.
Use \fB\-ocrdpi\fR 0 to disable any downsampling.
.TP
.BI \-ocrout[\-] " <namefmt>"
Write [don't write] UTF\-8 OCR text output to file
\fI<namefmt>\fR. See the \fB\-o\fR option for more about how
\fI<namefmt>\fR works. Default extension is .txt. Default is
no output.
.TP
.B \-ocrsort[\-]
When a PDF document has its own OCR/Text layer, this option
orders the OCR text layer by its position on the page.
This
should not be necessary unless the OCR layer was very poorly
generated. Default is \fB\-ocrsort\-\fR (off).
.TP
.B \-ocrsp[+|\-]
When generating the OCR layer, do an entire row of text at
once, with spaces between the words. By default (\fB\-ocrsp\-\fR),
each word is placed separately in the PDF document's OCR
layer. This causes problems with text selection in some
readers (for example, individual words cannot be selected).
Using \fB\-ocrsp\fR may fix behavior like this, but will result in
less accurate word placement since k2pdfopt does not try to
exactly match the font used by the document. Use \fB\-ocrsp\fR+
to allow more than one space between each word in the row
of text in order to optimize the selection position.
.TP
.BR \-ocrvis " <s|t|b>"
Set OCR visibility flags. Put 's' to show the source doc,
\&'t' to show the OCR text, and/or 'b' to put a box around
each word. Default is \fB\-ocrvis\fR s. To show both the source
document and the OCR text overlaid on top: \fB\-ocrvis\fR st.
See also \fB\-ocr\fR. See also \fB\-ocrlang\fR (the note about \fB\-ocrvis\fR t).
.TP
.BI \-odpi " <dpi>"
Set pixels per inch of output screen (def=167). See also
\fB\-dr\fR, \fB\-w\fR, \fB\-h\fR, \fB\-fc\fR. You can also use \fB\-dpi\fR for this.
See also \fB\-fs\fR, \fB\-mag\fR.
.TP
.BI \-om[b|l|r|t] " <val>\fR[\fP<units>\fR][,\fP<val>\fR[\fP<units>\fR][,...]]"
Set the blank area margins
on the output device. Works very much like the \fB\-m\fR option.
See \fB\-m\fR for more about the syntax. Default = 0.02 inches.
Note that the 's', 't', and 'x' units for \fB\-om\fR all behave
the same and scale to the device size. E.g. \fB\-om\fR 0.1s will
make the device screen margins 0.1 times the device width
(for the left and right margins) or height (for the top and
bottom margins) of the output device screen.
.TP
.BR \-ow[\-] " [\fI<mb>\fP]"
Set the minimum file size (in MB) where overwriting the
file will not be done without prompting. Set to -1 (or
just \fB\-ow\fR with no value) to overwrite all files with no
prompting. Set to 0 (or just \fB\-ow\-\fR) to prompt for any
overwritten file. Def = \fB\-ow\fR 10 (any existing file
over 10 MB will not be overwritten without prompting).
See also \fB\-y\fR option.
.TP
.BI \-p " <pagelist>"
Specify pages to convert. \fI<pagelist>\fR must not have any
spaces. E.g. \fB\-p\fR 1\-3,5,9,10\- would do pages 1 through 3,
page 5, page 9, and pages 10 through the end. The letters
\&'e' and 'o' can be used to denote even and odd pages, e.g.
.RS
.TP
.BR \-p " o,e"
Process all odd pages, then all even ones.
.TP
.BR \-p " 2\-52e,3\-33o"
Process 2,4,6,...,52,3,5,7,...,33.
.RE
.IP
Overridden by \fB\-px\fR option. See \fB\-px\fR.
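.IP
A hedged sketch of combining \fB\-p\fR with \fB\-px\fR (the file name
book.pdf is hypothetical). Since \fB\-px\fR overrides \fB\-p\fR, this
would be expected to convert the odd pages from 1 through 19, skipping
page 7:
.EX
k2pdfopt \-p 1\-19o \-px 7 book.pdf
.EE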
.TP
.BI \-pad " <padlist>"
A shortcut for \fB\-pl\fR, \fB\-pt\fR, \fB\-pr\fR, \fB\-pb\fR. E.g. \fB\-pad\fR 15,10,13,20
is the same as \fB\-pl\fR 15 \fB\-pt\fR 10 \fB\-pr\fR 13 \fB\-pb\fR 20. Also, using
\fB\-pad\fR 15 will set all pads to 15, for example.
.TP
.BI \-p[b|l|r|t] " <nn>"
Pad [bottom|left|right|top] side of destination bitmap with
\fI<nn>\fR rows. Defaults = 4 (bottom), 0 (left), 3 (right), and
0 (top). Example: \fB\-pb\fR 10. This is typically only used on
certain devices to get the page to come out just right. For
setting margins on the output device, use \fB\-om\fR. See also \fB\-pad\fR.
.TP
.B \-png
(Default) Use PNG compression in PDF file. See also \fB\-jpeg\fR.
.TP
.B \-ppgs[\-]
Post process [do not post process] with ghostscript. This
will take the final PDF output and process it using
ghostscript's pdfwrite device (assuming ghostscript is
available). A benefit to doing this is that all "invisible"
and/or overlapping text regions (outside cropping areas) get
completely removed, so that text selection capability is
improved. The actual ghostscript command used is:
.EX
gs \fB\-dSAFER\fR \fB\-dBATCH\fR \fB\-q\fR \fB\-dNOPAUSE\fR \fB\-sDEVICE\fR=\fI\,pdfwrite\/\fR
\fB\-dPDFSETTINGS=\fR/prepress \fB\-sOutputFile=\fR\fI<outfile>\fR
\fI<srcfile>\fR
.EE
The default is not to post process with ghostscript.
.TP
.BI \-px " <pagelist>"
Exclude pages from \fI<pagelist>\fR. Overrides \fB\-p\fR option. Default
is no excluded pages (\fB\-px\fR -1).
.TP
.B \-r[\-]
Right\-to\-left [left\-to\-right] page scans. Default is
left to right.
.TP
.BI \-rls[+|\-]
Restore [+] or don't restore [\-] the last command\-line
settings from the environment variable \fBK2PDFOPT_CUSTOM0\fR.
The default (\fB\-rls\fR) is to restore the settings if there are no
other command\-line options specified when running (from
either the command line or the \fBK2PDFOPT\fR env var.), unless
those options are "\-gui" or specify a file name.
.TP
.BI \-rsf " <val>"
Row Split Figure of merit (expert mode). After k2pdfopt has
looked for gaps between rows of text, it will check to see
if there appear to be missed gaps (e.g. if one row is twice
the height of all the others). Increasing this value makes
it harder for k2pdfopt to split a row. Lowering it makes it
easier. Default value = 20.
.TP
.BI \-rt " <deg>\fR|auto[+]|aep"
Rotate source page counterclockwise by \fI<deg>\fR degrees.
NOTE: If you're trying to get "landscape" output so that
you can turn your reader on its side, use \fB\-ls\fR instead of
\fB\-rt\fR. The \fB\-rt\fR option is intended to be used for when your
source PDF is incorrectly rotated\-\-e.g. if you view it on
a standard PC reader and it comes up sideways.
\fI<deg>\fR can be 90, 180, 270. Or use "\-rt auto" to examine up
to 10 pages of each file to determine the orientation used
on the entire file (this is the default). Or use "\-rt aep"
to auto\-detect the rotation of every page. If you have
different pages that are rotated differently from each other
within one file, you can use this option to try to autorotate
each source page. Use \fB\-rt\fR auto+ to turn on autodetect even
in preview mode (otherwise it is off).
See also \fB\-ls\fR.
.TP
.B \-s[\-]
Sharpen [don't sharpen] images. Default is to sharpen.
.TP
.B \-sm[\-]
Show [don't show] marked source. This is a debugging tool
where k2pdfopt will mark the source file with the regions it
finds on them and the order in which it processes them and
save it as \fI<srcfile>\fR_marked.pdf. Default is not to show
marked source. Red regions are found on the first pass
(use \fB\-f2p\fR -2 to put each red region on a separate page).
Green lines mark vertical regions affected by \fB\-vb\fR and \fB\-vs\fR.
Gray lines mark individual rows of text (top, bottom, and
baseline). Blue boxes show individual words (passed to OCR
if \fB\-ocr\fR is specified).
.TP
.B \-sp[\-]
For each file on the command\-line, just echo the number
of pages\-\-don't process.
Default = off (\fB\-sp\-\fR).
.TP
.B \-t[\-]
Trim [don't trim] the white space from around the edges of
any output region.
Default is to trim. Using \fB\-t\-\fR is not
recommended unless you want to exactly duplicate the source
document.
.TP
.BI \-title " <title>"
Set the title metadata / property of the PDF output file(s).
Default is to use the title of the source document
(\fB\-title\fR ""). The \fI<title>\fR string will be parsed for
special characters that allow you to substitute the file
name. See the \fB\-o\fR option for a description of these
substitutions.
.TP
.B \-to[\-]
Text only output. Remove figures from output. Figures are
determined empirically as any contiguous region taller than
0.75 inches (or you can specify this using the \fB\-jf\fR option).
Use \fB\-to\-\fR to turn off (default).
.TP
.B \-toc[\-]
Include [don't include] table of contents / outline /
bookmark information in the PDF output if it is available
in the source file (works only for PDF source files and
only if MuPDF is compiled in). By default, a new destination
page is started at each bookmark location. To disable this,
see the \fB\-bp\fR option. If \fB\-toc\-\fR is specified, bookmark
information from the source file is ignored. See also
\fB\-toclist\fR. Default is \fB\-toc\fR.
.TP
.BI \-toclist " <pagelist>\fR|\fI<file>"
Override the PDF source file's outline information
(bookmarks / table of contents) with either a list of source
pages or a file describing the table of contents. If you
specify a list of pages, e.g. \fB\-toclist\fR 5,10,20,40,100
then those pages are marked as Chapter 1, 2, etc.,
respectively. If you specify a file name, the file should be
a text file formatted like this example:
.RS 14
.EX
1 Introduction
10 Chapter 1
+10 Chapter 1, Part A
+25 Chapter 1, Part B
++25 Chapter 1, Part B, Subsection 1
++27 Chapter 1, Part B, Subsection 2
+30 Chapter 1, Part C
50 Chapter 2
70 Chapter 3
.EE
.RE
.IP
The '+' indicates a sub\-level heading (multiple +'s for
multiple sub\-levels). The first number on the line is the
source page reference number. The rest of the text on the
line is the name of the chapter / subheading.
Note: This option overrides \fB\-toc\fR. To get a template from
an existing PDF file, see the \fB\-tocsave\fR option.
.TP
.BI \-tocsave " <file>"
If an outline exists in the PDF file (and \fB\-toc\fR is specified),
write that outline to text file \fI<file>\fR in the format required
by \fB\-toclist\fR. See \fB\-toc\fR, \fB\-toclist\fR.
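.IP
One possible workflow, sketched with hypothetical file names (book.pdf,
toc.txt): save the existing outline, edit the text file by hand, then
pass it back with \fB\-toclist\fR. This assumes the source PDF already
contains an outline.
.EX
k2pdfopt \-toc \-tocsave toc.txt book.pdf
k2pdfopt \-toclist toc.txt book.pdf
.EE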
.TP
.B \-ui[\-]
User input query turned on [off]. Default = on for Linux or
if not run from command line in Windows.
.TP
.B \-v
Verbose output.
.TP
.BI \-vb " <thresh>"
Set gap\-size vertical\-break threshold between regions that
cause them to be treated as separate regions. E.g. \fB\-vb\fR 2
will break the document into separate regions anywhere
there is a vertical gap that exceeds 2 times the median
gap between lines of text. These separate regions may
then be scaled and aligned independently.
Special values: Use \fB\-vb\fR -1 to preserve all horizontal
alignment and scaling across entire regions (vertical
spacing may still be adjusted). Use \fB\-vb\fR -2 to exactly
preserve each region (both horizontal alignment and
vertical spacing\-\-this is the value used by \fB\-mode\fR fw, for
example). The default is \fB\-vb\fR 1.75.
.TP
.BI \-vls " <spacing>"
Set vertical line spacing as a fraction of the text size.
This can be used to override the line spacing in a document.
If 1, then single spacing is used. 2 = double spacing.
If negative, then the absolute value acts as the limiting
case. E.g., if you set \fB\-vls\fR -1.5, then the line
spacing of the original document is preserved unless it
exceeds 1.5 (times single spacing). Default = -1.2.
See also \fB\-vs\fR.
.TP
.BI \-vs " <maxgap>"
Preserve up to \fI<maxgap>\fR inches of vertical spacing between
regions in the document (marked in green when using \fB\-sm\fR
option). This value has no effect if you use a negative
value for \fB\-vb\fR. The default value is 0.25.
See also \fB\-vls\fR, \fB\-vb\fR.
.TP
.BI \-w " <width>\fR[in|cm|s|t|p]"
Set width of output device. Default is 560. See \fB\-h\fR.
.TP
.B \-wrap[\-|+]
Enable [disable] text wrapping. Default = enabled. If
\fB\-wrap\fR+, regions of text with lines shorter than the mobile
device screen are re\-flowed to fit the screen width. If
you use \fB\-wrap\fR+, you may want to also specify \fB\-fc\-\fR so that
narrow columns of text are not magnified to fit your device.
Text wrapping disables native PDF output (see \fB\-n\fR option).
See also \fB\-ws\fR, \fB\-j\fR, \fB\-fc\fR, \fB\-n\fR.
.TP
.BI \-ws " <spacing>"
Set minimum word spacing for line breaking as a fraction of
the height of a lowercase 'o'. Use a larger value to make it
harder to break lines. If negative, automatic word spacing
is turned on. The automatic spacing leans toward breaking
long words between letters to be sure to fit text to the
device display. Def = -0.20. The absolute value of the
setting, if negative, is used as a minimum allowed value.
If you want k2pdfopt to aggressively break lines (e.g. break
apart long words if they don't fit on a line), use a smaller
absolute value, e.g. \fB\-ws\fR -0.01. A positive value works as
it did in v2.18 and before. The default value was changed
from 0.375 in v2.18 to -0.20 in v2.20. See also \fB\-wrap\fR.
.TP
.BI \-wt[+] " <thresh>"
Any pixels whiter than \fI<thresh>\fR (0\-255) are treated
as "white". Setting this lower can help k2pdfopt better
process some poor\-quality scanned pages or pages with
watermarks. Note that pixels above the \fI<thresh>\fR
threshold value, which are therefore treated as white, are not
actually changed to pure white (255) unless the '+' is also
included. Otherwise, this only sets a threshold.
The default value for \fB\-wt\fR is -1, which tells k2pdfopt to pick
the optimum value. See also \fB\-cmax\fR, \fB\-colorfg\fR, \fB\-colorbg\fR.
.TP
.B \-x[\-]
Exit [don't exit\-\-wait for <Enter>] after completion.
.TP
.B \-y[\-]
Assume [don't assume] "yes" to queries, such as whether
to overwrite a file. See also \fB\-ow\fR. Also turns off any
warning messages.
.SH SEE ALSO
http://www.willus.com/k2pdfopt/
.SH AUTHOR
.PP
.B K2pdfopt
is written by Willus.
.PP
This manual page was written by
.MT mmyangfl@\:gmail.com
Yangfl
.ME
for the Debian Project (but may be used by others).