<h2><font color="darkgreen">DiffPDF</font></h2>
<ul>
<li><a href="#bas">Basic Usage</a></li>
<li><a href="#cmp">The Compare Button</a></li>
<li><a href="#txt">Words Comparison Mode</a></li>
<li><a href="#chr">Characters Comparison Mode</a></li>
<li><a href="#vis">Appearance Comparison Mode</a></li>
<li><a href="#zon">Zoning</a></li>
<li><a href="#ran">Page Ranges</a></li>
<li><a href="#mar">Margins</a></li>
<li><a href="#sav">Saving</a></li>
<li><a href="#opt">The Options Dialog</a></li>
<li><a href="#dock">Dock Windows</a></li>
<li><a href="#cli">Command Line Usage</a></li>
</ul>
<h3 id="bas">Basic Usage</h3>
<p>Click the <b>File #1</b> button to choose one PDF file and then
the <b>File #2</b> button to choose another (ideally very similar) PDF
file, then click the <b>Compare</b> button to perform the comparison,
and when that's finished, navigate through the pairs of differing pages
using the <b>View</b> combobox or using the <b>Previous</b> and
<b>Next</b> buttons. Alternatively, drag two files—either
separately or together—and drop them onto <font
color="darkgreen">DiffPDF</font>'s view panels, then click the
<b>Compare</b> button.
<h3 id="cmp">The Compare Button</h3>
<p>When the <b>Compare</b> button is pressed, <font
color="darkgreen">DiffPDF</font> does a high-speed scan of every pair of
pages (~100 pairs of pages per second on the author's machine). To make
the scan as fast as possible <font color="darkgreen">DiffPDF</font> does
a very rough check of each pair of pages—so it is possible that it
identifies some false positives (i.e., page pairs that are really the
same). False positives are quite rare. (There are no false
negatives—differences are never missed.)
<h3 id="txt">Words Comparison Mode</h3>
<p>The default comparison mode is Words, which does a smart text
comparison word by word for each pair of pages. This mode is
fairly liberal regarding
whitespace and tries to ignore layout changes (within a page) insofar
as possible. It also treats all hyphens (soft hyphen, minus sign, etc.)
the same, that is, as a plain hyphen.
This mode is best for alphabetic languages like English.
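<p>The following Python sketch illustrates the idea of a whitespace-tolerant,
hyphen-normalizing word comparison. It is not DiffPDF's own code: the set of
hyphen-like characters, the function names, and the use of Python's
<tt>difflib.SequenceMatcher</tt> as the sequence matcher are all assumptions
made for illustration.

```python
import difflib

# Characters treated as a plain hyphen in this sketch (an assumed set):
# soft hyphen, Unicode hyphen, non-breaking hyphen, minus sign.
HYPHENS = "\u00ad\u2010\u2011\u2212"


def normalize_words(text):
    """Map hyphen-like characters to '-' and split on any run of whitespace."""
    for ch in HYPHENS:
        text = text.replace(ch, "-")
    return text.split()


def differing_words(left, right):
    """Return (left_words, right_words) for each run of words that differs."""
    a, b = normalize_words(left), normalize_words(right)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
```

<p>Because the text is split on any whitespace, a line break moving from one
place to another does not register as a difference in this sketch.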
<h3 id="chr">Characters Comparison Mode</h3>
<p>The Characters comparison mode does a smart text
comparison character by character for each pair of pages. This mode is
liberal regarding whitespace at the ends of lines and tries to
ignore layout changes (within a page) insofar as possible.
It also treats all hyphens (soft hyphen, minus sign, etc.)
the same, that is, as a plain hyphen.
This mode is best for logographic languages like Chinese and Japanese.
<h3 id="vis">Appearance Comparison Mode</h3>
<p>The Appearance comparison mode
can be used to detect changes in fonts, diagrams, or any other visual
aspects. This mode is absolutely strict and compares each pair of
pages pixel
for pixel. By default this mode shows differences using highlighting
just like the Words and Characters modes do. However, it is also
possible to compare using
composition modes which can be useful to detect very small and subtle
differences that aren't immediately apparent.
<h3 id="zon">Zoning</h3>
<p>Zoning is an experimental feature designed to produce more accurate
results (i.e., fewer false positives). Its main use is for pages that
have tables or that mix alphabetic and logographic text, since these can
cause the underlying Poppler PDF library to provide the page's words
mixed up. <font color="red">Warning:</font> using zoning for large
complex pages (bigger than A4, multiple columns, tables) in Characters
mode can be very slow. (The current focus for the zoning code is
functionality not efficiency.) Furthermore, in some cases zoning can
cause an <i>increase</i> in false positives—this can occur because
the zoning code reorders the text that is fed to the sequence matcher
and sometimes the reordering is wrong. Getting this right is
non-trivial; changing the tolerances may help.
<p>The Tolerance/R value is the maximum distance between text (i.e., word)
rectangles for the rectangles to be placed in the same zone. Lower
values create more zones; higher values create fewer zones. More
zones are expensive to compute but can produce more accurate
results; fewer zones may reduce false positives. The Tolerance/Y value
is used for rounding <i>y</i> coordinates to the nearest multiple of
this value. For example, if Tolerance/Y is 10
and a word at position (452,137) is followed by a superscript at
(468,140), both will be treated as having a <i>y</i> coordinate of 140.
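<p>Rounding to the nearest multiple can be sketched in Python as below. This
is an illustration only: the function name is hypothetical, and the handling
of exact half-way values (rounded up here) and of non-positive tolerances is
an assumption, since it is not specified above.

```python
def round_y(y, tolerance):
    """Round an integer y to the nearest multiple of tolerance.

    Exact half-way values round up (an assumption of this sketch).
    """
    if tolerance <= 0:
        return y  # assumed: a non-positive tolerance disables rounding
    return ((y + tolerance // 2) // tolerance) * tolerance
```

<p>With a tolerance of 10, <i>y</i> values of 137 and 140 both become 140, so
a word and a slightly offset superscript end up on the same line.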
<h3 id="ran">Page Ranges</h3>
<p>By default <font color="darkgreen">DiffPDF</font> compares every pair
of pages in the two PDFs (or as many pairs of pages as the number of
pages in the shorter PDF). It is also possible to compare
particular pages or page ranges. For example, if there are two versions
of a PDF file, one with pages 1-12 and the other with pages 1-13 because
of an extra page having been added as page 4, they can be compared by
specifying two page ranges, 1-12 for the first and 1-3, 5-13 for the
second. This will make DiffPDF compare pages in the pairs (1, 1), (2,
2), (3, 3), (4, 5), (5, 6), and so on, to (12, 13).
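<p>The pairing of page ranges can be sketched in Python as follows. The
function names and the exact range syntax accepted (comma-separated pages and
<tt>first-last</tt> ranges) are assumptions for illustration, not DiffPDF's
own parser.

```python
def parse_pages(spec):
    """Expand a range string such as '1-3, 5-13' into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            pages.extend(range(first, last + 1))
        else:
            pages.append(int(part))
    return pages


def page_pairs(spec1, spec2):
    """Pair pages by position; zip() stops at the end of the shorter list."""
    return list(zip(parse_pages(spec1), parse_pages(spec2)))
```

<p>For example, <tt>page_pairs("1-12", "1-3, 5-13")</tt> yields twelve pairs
starting (1, 1), (2, 2), (3, 3), (4, 5) and ending (12, 13).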
<h3 id="mar">Margins</h3>
<p>It is possible to make <font color="darkgreen">DiffPDF</font> ignore
any text that is above a specified top margin, below a specified bottom
margin, left of a specified left margin, and right of a specified right
margin. To specify one or more of these margins, first check the
<b>Exclude Margins</b> checkbox and then set
any of the margins. Margins can be set by clicking on a page view or by
using the margin spinboxes.
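<p>A margin filter of this kind can be sketched in Python as below. This is
only an illustration under stated assumptions: the <tt>Word</tt> type and
function names are hypothetical, margins are given as the page coordinates of
the four margin lines, <i>y</i> grows downward, and a word is kept only if its
rectangle lies entirely inside the margins (how DiffPDF treats words that
straddle a margin is not stated here).

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge
    y: float  # top edge (y grows downward in this sketch)
    w: float  # width
    h: float  # height


def inside_margins(word, left, top, right, bottom):
    """True if the word's rectangle lies wholly within the four margin lines."""
    return (word.x >= left and word.y >= top
            and word.x + word.w <= right
            and word.y + word.h <= bottom)


def words_to_compare(words, left, top, right, bottom):
    """Drop every word that falls outside the margins before comparing."""
    return [w for w in words if inside_margins(w, left, top, right, bottom)]
```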
<h3 id="sav">Saving</h3>
<p>Use the <b>Save As</b> button to pop up a Save dialog. This dialog
lets you save a <tt>.pdf</tt> file with the highlighted changes, or
individual image files (e.g., in <tt>.png</tt> or various other common
image formats). The dialog supports saving the current or all left
pages, right pages, or both pages.
<h3 id="opt">The Options Dialog</h3>
<p>This dialog is invoked by clicking the <b>Options</b> button.
The dialog supports changing the highlighting color, whether to use
a pen or fill or both, and the fill's opacity. The Square Size is used
when doing Appearance mode comparisons: the smaller the size the more
fine-grained the highlighting is—and the slower to compute.
The Rule width determines the thickness of the margin rules which are
used to indicate the vertical position of differences; the rules can
be switched off using a Rule width of 0.
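<p>The effect of the Square Size can be illustrated with a small Python
sketch that compares two images tile by tile. This is not DiffPDF's code: the
images are plain lists of pixel rows, both are assumed to be non-empty and of
identical dimensions, and the function name is hypothetical.

```python
def differing_squares(left, right, size):
    """Return the (top, left) origin of every size-by-size tile that differs.

    Tiles at the right and bottom edges may be smaller than size.
    """
    height, width = len(left), len(left[0])
    squares = []
    for top in range(0, height, size):
        for x0 in range(0, width, size):
            a = [row[x0:x0 + size] for row in left[top:top + size]]
            b = [row[x0:x0 + size] for row in right[top:top + size]]
            if a != b:
                squares.append((top, x0))
    return squares
```

<p>A smaller size localizes a changed pixel more precisely but means more
tiles to examine, which matches the trade-off described above.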
<h3 id="dock">Dock Windows</h3>
<p>The Controls, Actions, Margins, Zoning, and Log views are in dock
widgets—these can be dragged into other dock areas (in which case
they will reshape themselves as necessary), or dragged to float free.
The Margins, Zoning, and Log views can also be closed; right click a
dock area splitter and check their checkbox to open them again. These
views may be shown tabbed: if there is enough space they can be dragged
out of their tabs and all shown in full.
<h3 id="cli">Command Line Usage</h3>
<p>Although <font color="darkgreen">DiffPDF</font> is a GUI program, if run from a console with two PDF
files listed on the command line,
<font color="darkgreen">DiffPDF</font> will start up and
immediately compare them in Words mode, or in Appearance mode
if their names are preceded with <tt>-a</tt> or
<tt>--appearance</tt> on the command line,
or in Characters mode if their names are preceded with <tt>-c</tt> or
<tt>--characters</tt> on the command line. Run
<font color="darkgreen">DiffPDF</font> with <tt>--help</tt> to see all
the command line options. (The <tt>--help</tt> option won't work on
Windows, although the other command line options will.) Here is the
<tt>--help</tt> output:
<pre>
usage: diffpdf [options] [file1.pdf [file2.pdf]]
A GUI program that compares two PDF files and shows
their differences.
The files are optional and are normally set through
the user interface.
options:
--help              show this usage text and terminate (run the
                    program without this option and press F1 for
                    online help)
--appearance -a     set the initial comparison mode to Appearance
--characters -c     set the initial comparison mode to Characters
--words -w          set the initial comparison mode to Words
--language=xx       set the program to use the given translation
                    language, e.g., en for English, cz for Czech;
                    English will be used if there is no translation
                    available
--debug=2           write the text fed to the sequence matcher into
                    temporary files (e.g., /tmp/page1.txt etc.)
--debug=3           as --debug=2 but also includes coordinates in
                    y, x order
</pre>
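<p>When launching the program from a script, the command line can be built
as in the Python sketch below. The option names come from the usage text
above. The function names are hypothetical, and the sketch assumes the
executable is called <tt>diffpdf</tt> and is on the <tt>PATH</tt>.

```python
import subprocess

# Long option for each initial comparison mode, as listed in the usage text.
MODE_FLAGS = {
    "words": "--words",
    "characters": "--characters",
    "appearance": "--appearance",
}


def diffpdf_command(file1, file2, mode="words", program="diffpdf"):
    """Build the argument list: program, mode option, then the two files."""
    return [program, MODE_FLAGS[mode], file1, file2]


def launch(file1, file2, mode="words"):
    """Start the GUI without waiting for it to exit."""
    return subprocess.Popen(diffpdf_command(file1, file2, mode))
```

<p>For example, <tt>launch("old.pdf", "new.pdf", "appearance")</tt> opens the
window with the two files already compared in Appearance mode.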
<p>
The text reordering is done by the
<tt>TextItems::columnZoneYxOrder()</tt> method in the
<tt>textitem.cpp</tt> file: suggestions for improvement are welcome!
(Note that when using <tt>--debug=3</tt> coordinates are output in
<i>y</i>, <i>x</i> order.)
<p>If you're specifically looking for a command line PDF comparison
tool, e.g., for automated testing, try
<a href="http://www.qtrac.eu/comparepdf.html">comparepdf</a>.