# Extract PDFmark

Extract page mode and named destinations as PDFmark from PDF

https://github.com/trueroad/extractpdfmark  
http://www.ctan.org/pkg/extractpdfmark

When you create a PDF document using something like a TeX system
you may include many small PDF files in the main PDF file.
It is common for each of the small PDF files to use the same fonts.

If the small PDF files contain embedded font subsets,
the TeX system includes them as-is in the main PDF.
As a result,
several subsets of the same font are embedded in the main PDF.
It is not possible to remove the duplicates since they are different subsets.
This vastly increases the size of the main PDF file.

On the other hand,
if the small PDF files contain embedded full font sets,
the TeX system also includes all of them in the main PDF.
This time, the main PDF contains duplicates of the same full sets of fonts.
Therefore, Ghostscript can remove the duplicates.
This may considerably reduce the size of the main PDF file.
(Note: Ghostscript 9.17 - 9.21 needs the `-dPDFDontUseFontObjectNum`
command-line option to remove duplicate fonts.
If you use Ghostscript 9.22+, you cannot use this "full set embedding" method
since it cannot remove duplicate fonts.
See https://ghostscript.com/pipermail/gs-devel/2017-September/date.html
and http://lists.gnu.org/archive/html/lilypond-devel/2017-09/index.html .
In this case, you can use the "*not* embedding" method described next.)

Finally,
if the small PDF files contain some fonts that are *not* embedded,
the TeX system outputs the main PDF file with some fonts missing.
In this case, Ghostscript can embed the necessary fonts.
This can significantly reduce the required disk space.
(Note: If you use Ghostscript 9.26+ and want to embed CID fonts,
see https://bugs.ghostscript.com/show_bug.cgi?id=700367
and https://bugs.ghostscript.com/show_bug.cgi?id=700436 .)

Either way,
when Ghostscript reads the main PDF produced by the TeX system
and outputs the final PDF,
it does not preserve the PDF page mode, named destinations, etc.
As a result,
when you open the final PDF,
it is not displayed correctly.
Also, remote PDF links will not work correctly.

http://bugs.ghostscript.com/show_bug.cgi?id=696943  
http://bugs.ghostscript.com/show_bug.cgi?id=695760

This program extracts page mode and named destinations
as PDFmark from a PDF.
Using it, you can get small final PDF files
that preserve them.

## Usage

    $ extractpdfmark TeX-System-Outputted.pdf > Extracted-PDFmark.ps
    $ gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
         -dPDFDontUseFontObjectNum -dPrinted=false \
         -sOutputFile=Final.pdf TeX-System-Outputted.pdf Extracted-PDFmark.ps

(Note: Ghostscript 9.26+ needs the `-dPrinted=false` command-line option.
See https://bugs.ghostscript.com/show_bug.cgi?id=699830 .)
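
The version notes in this README can be collected into a small helper. The following is a minimal sketch, not part of Extract PDFmark: the function name `gs_extra_flags` is hypothetical, and it only encodes the version ranges stated in the notes (9.17 - 9.21 for `-dPDFDontUseFontObjectNum`, 9.26+ for `-dPrinted=false`).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Extract PDFmark): print the extra
# Ghostscript option that the notes in this README tie to a version string.
gs_extra_flags() {
    ver=$1
    major=${ver%%.*}             # "9.26"  -> "9"
    rest=${ver#*.}               # "9.26"  -> "26", "10.02.1" -> "02.1"
    minor=${rest%%.*}            # "02.1"  -> "02"
    minor=${minor#0}             # drop one leading zero to avoid octal parsing
    minor=${minor:-0}            # "0" became "", restore it
    num=$((major * 100 + minor)) # 9.26 -> 926, 10.02 -> 1002

    flags=""
    # 9.17 - 9.21: option needed for removing duplicate fonts
    if [ "$num" -ge 917 ] && [ "$num" -le 921 ]; then
        flags="-dPDFDontUseFontObjectNum"
    fi
    # 9.26+: option needed according to the note above
    if [ "$num" -ge 926 ]; then
        flags="-dPrinted=false"
    fi
    printf '%s\n' "$flags"
}
```

Assuming `gs --version` prints a plain version number such as `9.26`, the result could be passed to Ghostscript as `$(gs_extra_flags "$(gs --version)")` in place of the fixed options in the usage example.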

## Install from binary package

Some distributions provide an `extractpdfmark` package.

* Debian:
[9 stretch](https://packages.debian.org/stretch/extractpdfmark).
* Ubuntu:
17.04 Zesty Zapus,
17.10 Artful Aardvark,
[18.04 LTS Bionic Beaver](https://packages.ubuntu.com/bionic/extractpdfmark),
[18.10 Cosmic Cuttlefish](https://packages.ubuntu.com/cosmic/extractpdfmark).
* Fedora:
[29](https://apps.fedoraproject.org/packages/extractpdfmark).
* Cygwin:
[2017-05](https://sourceware.org/ml/cygwin-announce/2017-05/msg00030.html).

## Install from [source tarball](https://github.com/trueroad/extractpdfmark/releases/download/v1.1.0/extractpdfmark-1.1.0.tar.gz)

### Required

Extract PDFmark requires one of the two interfaces of poppler.
Please choose which to use when building Extract PDFmark.

#### poppler-cpp I/F (recommended)

[poppler](https://poppler.freedesktop.org/) 0.74.0+ is required.
Extract PDFmark's configure script selects poppler-cpp I/F
if pkg-config finds poppler-cpp >= 0.74.0.

The configure script option `--with-poppler=cpp`
explicitly selects this interface.

If you would like to install the required library from packages,
the following may be convenient.

* Debian / Ubuntu
    + libpoppler-cpp-dev
* Fedora
    + poppler-cpp-devel
* Cygwin
    + libpoppler-cpp-devel

#### poppler-core I/F

[poppler](https://poppler.freedesktop.org/) 0.13.3+
built with one of the following options is required
(poppler 0.48.0+ is recommended).
If you have poppler 0.74.0+, the poppler-cpp I/F is recommended.

* --enable-xpdf-headers (poppler 0.59.0 and before)
* -DENABLE_XPDF_HEADERS=ON (poppler 0.60.0 - 0.72.0)
* -DENABLE_UNSTABLE_API_ABI_HEADERS=ON (poppler 0.73.0 and after)

Extract PDFmark's configure script selects poppler-core I/F
if pkg-config does not find poppler-cpp >= 0.74.0
and finds poppler >= 0.24.4.
There are two versions of this interface, private and normal.
For poppler 0.24.4 - 0.47.0, the private version is selected.
For poppler 0.48.0+, the normal version is selected.

The configure script option `--with-poppler=core-private`
explicitly selects the private version (for poppler 0.13.3+).
The configure script option `--with-poppler=core`
explicitly selects the normal version (for poppler 0.48.0+).
If you would like to use poppler 0.13.3 - 0.24.3,
you must explicitly specify the configure script option
`--with-poppler=core-private`.
However, Extract PDFmark built with these versions of poppler fails
some tests in `make check`.
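
The automatic selection rules described above can be summarized as a short shell sketch. This is only an illustration of the rules as written in this README, not the actual configure script: the function names `ver_num` and `select_poppler_if` are hypothetical, and the `none` result is a made-up label for the case where no interface is selected automatically.

```shell
#!/bin/sh
# Hypothetical illustration of the selection rules in this README;
# the real configure script may work differently.

# ver_num "X.Y.Z": print a comparable integer; missing parts count as 0.
ver_num() {
    IFS=. read -r a b c _rest <<EOF
$1
EOF
    # Drop one leading zero per part ("08" -> "8") to avoid octal parsing.
    a=${a#0}; b=${b#0}; c=${c#0}
    echo $(( ${a:-0} * 1000000 + ${b:-0} * 1000 + ${c:-0} ))
}

# select_poppler_if CPP_VERSION CORE_VERSION
# Pass an empty string for a module that pkg-config does not find.
select_poppler_if() {
    cpp=$(ver_num "$1")
    core=$(ver_num "$2")
    if [ "$cpp" -ge 74000 ]; then       # poppler-cpp >= 0.74.0
        echo cpp
    elif [ "$core" -ge 48000 ]; then    # poppler >= 0.48.0
        echo core
    elif [ "$core" -ge 24004 ]; then    # poppler 0.24.4 - 0.47.0
        echo core-private
    else
        echo none                       # explicit --with-poppler= needed
    fi
}
```

The installed versions could be obtained with `pkg-config --modversion poppler-cpp` and `pkg-config --modversion poppler`, and a result other than `none` corresponds to a value accepted by `--with-poppler=`.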

If you would like to install the required library from packages,
the following may be convenient.

* Debian / Ubuntu
    + libpoppler-private-dev
    + libpoppler-dev
* Fedora
    + poppler-devel
* Cygwin
    + libpoppler-devel

### Build & install

    $ ./configure
    $ make
    $ make install

If you have `Ghostscript` 9.14+ and `diff` etc.,
you can run tests before installation as follows.

    $ ./configure
    $ make
    $ make check
    $ make install

## Install from [Git repository](https://github.com/trueroad/extractpdfmark)

In addition to the requirements for building from the source tarball,
the following are necessary.

### Additional required

Autoconf 2.69+  
Automake  
autopoint 0.19.6+ (gettext 0.19.6+)

### Additional recommended

pdfTeX (for generating test PDFs)  
Ghostscript 9.14+ (for `make check`)

### Build & install

    $ git clone https://github.com/trueroad/extractpdfmark.git
    $ cd extractpdfmark
    $ ./autogen.sh
    $ mkdir build
    $ cd build
    $ ../configure
    $ make
    $ make check
    $ make install

## News

[News](./NEWS)

## Licence

Copyright (C) 2016-2019 Masamichi Hosoda

Extract PDFmark is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Extract PDFmark is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Extract PDFmark.  If not, see <http://www.gnu.org/licenses/>.