File: frmt_pdf.html

package info (click to toggle)
gdal 1.10.1+dfsg-8
  • links: PTS, VCS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 84,320 kB
  • ctags: 74,726
  • sloc: cpp: 677,199; ansic: 162,820; python: 13,816; cs: 11,163; sh: 10,446; java: 5,279; perl: 4,429; php: 2,971; xml: 1,500; yacc: 934; makefile: 494; sql: 112
file content (323 lines) | stat: -rw-r--r-- 16,548 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
<html>
<head>
<title>Geospatial PDF</title>
</head>

<body bgcolor="#ffffff">

<h1>Geospatial PDF</h1>

(Available for GDAL &gt;= 1.8.0)
<p>
GDAL supports reading Geospatial PDF documents, by extracting georeferencing information and rasterizing the data.
Non-geospatial PDF documents will also be recognized by the driver.
<p>
Starting with GDAL &gt;= 1.10.0, PDF documents can be created from other GDAL raster datasets, and OGR datasources can
also optionally be drawn on top of the raster layer (see OGR_* creation options in the below section).
<p>
GDAL must be compiled with libpoppler support (GPL-licensed), and libpoppler itself must have been configured with
--enable-xpdf-headers so that the xpdf C++ headers are available. Note: the poppler C++ API isn't
stable, so the driver compilation may fail with too old or too recent poppler versions.
Successfully tested versions are poppler &gt;= 0.12.X and &lt;= 0.22.0.
<p>
Starting with GDAL 1.9.0, as an alternative, the PDF driver can be compiled against libpodofo (LGPL-licensed)
to avoid the libpoppler dependency. This is sufficient to get the georeferencing information. However, for
getting the imagery, the pdftoppm utility that comes with the poppler distribution must be available in the system PATH.
A temporary file will be generated in a directory determined by the following configuration options : CPL_TMPDIR,
TMPDIR or TEMP (in that order). If none are defined, the current directory will be used.
Successfully tested versions are libpodofo 0.8.4 and 0.9.1.
<p>
The driver supports reading georeferencing encoded in either of the 2 current existing ways : according to the OGC
encoding best practice, or according to the Adobe Supplement to ISO 32000.
<p>
Multipage documents are exposed as subdatasets, one subdataset par page of the document.
<p>
The neatline (for OGC best practice) or the bounding box (Adobe style) will be reported as a NEATLINE metadata item,
so that it can be later used as a cutline for the warping algorithm.
<p>
Starting with GDAL 1.9.0, XMP metadata can be extracted from the file, and will be
stored as XML raw content in the xml:XMP metadata domain.
<p>
Starting with GDAL 1.10.0, additional metadata, such as found in USGS Topo PDF can be extracted from the file, and will be
stored as XML raw content in the EMBEDDED_METADATA metadata domain.
<p>

<h2>Configuration options</h2>

<ul>
<li><i>GDAL_PDF_DPI</i> : To control the dimensions of the raster by specifying the DPI of the
rasterization with the Its default value is 150. Starting with GDAL 1.10, the driver
will make some effort to guess the DPI value either from a specific metadata item contained in some PDF files, or
from the raster images inside the PDF (in simple cases).</li>
<li><i>GDAL_PDF_NEATLINE</i> : (GDAL &gt;= 1.10.0 ) The name of the neatline to select (only available for geospatial PDF, encoded
according to OGC Best Practice). This defaults to "Map Layers" for USGS Topo PDF. If not found, the neatline that
covers the largest area.</li>

<p>
Starting with GDAL &gt;= 1.10.0 and when GDAL is compiled against libpoppler, the following options are also
available :

<li><i>GDAL_PDF_RENDERING_OPTIONS</i> : a combination of VECTOR, RASTER and TEXT separated
by comma, to select whether vector, raster or text features should be rendered. If the option is not
specified, all features are rendered.</li>
<li><i>GDAL_PDF_BANDS</i> = 3 or 4 : whether the PDF should be rendered as a RGB (3) or RGBA (4) image.
Defaults to 3.</li>
<li><i>GDAL_PDF_LAYERS</i> = list of layers (comma separated) to turn ON (or "ALL" to turn all layers ON).
The layer names can be obtained by querying the LAYERS metadata domain. When this option is specified,
layers not explicitly listed will be turned off.</li>
<li><i>GDAL_PDF_LAYERS_OFF</i> = list of layers (comma separated) to turn OFF. The layer names can be obtained by
querying the LAYERS metadata domain. When this option is specified, layers not explicitly listed will be turned
off.</li>
</ul>

<h2>LAYERS Metadata domain</h2>

Starting with GDAL &gt;= 1.10.0 and when GDAL is compiled against libpoppler, the LAYERS metadata domain can be queried
to retrieve layer names that can be turned ON or OFF. This is useful to know which values to specify for the
<i>GDAL_PDF_LAYERS</i> or <i>GDAL_PDF_LAYERS_OFF</i> configuration options.<p>

For example :
<pre>
$ gdalinfo ../autotest/gdrivers/data/adobe_style_geospatial.pdf -mdd LAYERS

Driver: PDF/Geospatial PDF
Files: ../autotest/gdrivers/data/adobe_style_geospatial.pdf
[...]
Metadata (LAYERS):
  LAYER_00_NAME=New_Data_Frame
  LAYER_01_NAME=New_Data_Frame.Graticule
  LAYER_02_NAME=Layers
  LAYER_03_NAME=Layers.Measured_Grid
  LAYER_04_NAME=Layers.Graticule
[...]

$ gdal_translate ../autotest/gdrivers/data/adobe_style_geospatial.pdf out.tif --config GDAL_PDF_LAYERS_OFF "New_Data_Frame"

</pre>

<h2>Restrictions</h2>

The opening of a PDF document (to get the georeferencing) is fast, but at the first access to a raster block,
the whole page will be rasterized, which can be a slow operation.
<p>
Note: starting with GDAL 1.10, some raster-only PDF files (such as some USGS GeoPDF files), that are regularly
tiled are exposed as tiled dataset by the GDAL PDF driver, and can be rendered with either the Poppler or the
Podofo backends.
<p>
Only a few of the possible Datums available in the OGC best practice spec have been currently mapped
in the driver. Unrecognized datums will be considered as being based on the WGS84 ellipsoid.
<p>
For documents that contain several neatlines in a page (insets), the georeferencing will be
extracted from the inset that has the largest area (in term of screen points).
<p>

<h2>Creation Issues (GDAL &gt;= 1.10.0)</h2>

<p>PDF documents can be created from other GDAL raster datasets, that
have 1 band (graylevel or with color table), 3 bands (RGB) or 4 bands (RGBA).</p>

<p>Georeferencing information will be written by default according to
the ISO32000 specification. It is also possible to write it according to the
OGC Best Practice conventions (but limited to a few datum and projection types).</p>

<p>Note: PDF write support does not require linking to poppler or podofo.</p>

<h3>Creation Options</h3>

<ul>
<li><p><b>COMPRESS=[NONE/DEFLATE/JPEG/JPEG2000]</b>:
Set the compression to use for raster data. DEFLATE is the default.</p></li>

<li><p><b>STREAM_COMPRESS=[NONE/DEFLATE]</b>:
Set the compression to use for stream objects. DEFLATE is the default.</p></li>

<li><p><b>DPI=value</b>:
Set the DPI to use. Default to 72.</p></li>

<li><p><b>PREDICTOR=[1/2]</b>:
Only for DEFLATE compression. Might be set to 2 to use horizontal predictor that can make files smaller (but not always!).
1 is the default.</p></li>

<li><p><b>JPEG_QUALITY=[1-100]</b>:  Set the JPEG quality when using JPEG compression.
A value of 100 is best quality (least compression), and 1 is worst quality (best compression). The default is 75.</p></li>

<li><p><b>JPEG2000_DRIVER=[JP2KAK/JP2ECW/JP2OpenJPEG/JPEG2000]</b>:
Set the JPEG2000 driver to use. If not specified, it will be searched in the previous list.</p></li>

<li><p><b>TILED=YES</b>: By default monoblock files are created.  This
option can be used to force creation of tiled PDF files.</p></li>

<li><p><b>BLOCKXSIZE=n</b>: Sets tile width, defaults to 256.</p></li>

<li><p><b>BLOCKYSIZE=n</b>: Set tile height, defaults to 256.</p></li>

<li><p><b>CLIPPING_EXTENT=xmin,ymin,xmax,ymax</b>: Set the clipping extent for the main source dataset and for the
optional extra rasters. The coordinates are expressed in the units of the SRS of the dataset.
If not specified, the clipping extent is set to the extent of the main source dataset.</p></li>

<li><p><b>LAYER_NAME=name</b>: Name for layer where the raster is placed. If specified, the raster will be be placed
into a layer that can be toggled/un-toggled in the "Layer tree" of the PDF reader.</p></li>

<li><p><b>EXTRA_RASTERS=dataset_ids</b>: Comma separated list of georeferenced rasters to insert into
the page. Those rasters are displayed on top of the main source raster. They must be georeferenced in the same projection,
and they will be clipped to CLIPPING_EXTENT if it is specified (otherwise to the extent of the main source raster).</p></li>

<li><p><b>EXTRA_RASTERS_LAYER_NAME=dataset_names</b>: Comma separated list of name for each raster
specified in EXTRA_RASTERS. If specified, each extra raster will be be placed
into a layer, named with the specified value, that can be toggled/un-toggled in the "Layer tree" of the PDF reader. If not
specified, all the extra rasters will be placed in the default layer.</p></li>

<li><p><b>EXTRA_STREAM=content</b>: A PDF content stream to draw after the imagery, typically to add some text. It may refer
the /FTimesRoman and /FTimesBold fonts.</p></li>

<li><p><b>EXTRA_IMAGES=image_file_name,x,y,scale[,link=some_url] (possibly repeated)</b>: A list of (ungeoreferenced) images to insert into
the page as extra content. This is useful to insert logos, legends, etc...
x and y are in user units from the lower left corner of the page, and the anchor point is the lower left pixel of the image.
scale is a magnifying ratio (use 1 if unsure). If link=some_url is specified, the image will be selectable and its selection
will cause a web browser to be opened on the specified URL.</p></li>

<li><p><b>EXTRA_LAYER_NAME=name</b>: Name for layer where the extra content specified with EXTRA_STREAM or EXTRA_IMAGES is placed.
If specified, the extra content will be be placed into a layer that can be toggled/un-toggled in the "Layer tree" of the PDF reader.</p></li>

<li><p><b>MARGIN/LEFT_MARGIN/RIGHT_MARGIN/TOP_MARGIN/BOTTOM_MARGIN=value</b>: Margin around image in user units.</p></li>

<li><p><b>GEO_ENCODING=[NONE/ISO32000/OGC_BP/BOTH]</b>:
Set the Geo encoding method to use. ISO32000 is the default.</p></li>

<li><p><b>NEATLINE=polygon_definition_in_wkt</b>:
Set the NEATLINE to use.</p></li>

<li><p><b>XMP=[NONE/xml_xmp_content]</b>:
By default, if the source dataset has data in the 'xml:XMP' metadata domain, this
data will be copied to the output PDF, unless this option is set to NONE. The XMP
xml string can also be directly set to this option.</p></li>

<li><p><b>WRITE_INFO=[YES/NO]</b>:
By default, the AUTHOR, CREATOR, CREATION_DATE, KEYWORDS, PRODUCER, SUBJECT and TITLE information
will be written into the PDF Info block from the corresponding metadata item from the
source dataset, or if not set, from the corresponding creation option.
If this option is set to NO, no information will be written.</p></li>

<li><p><b>AUTHOR</b>, <b>CREATOR</b>, <b>CREATION_DATE</b>, <b>KEYWORDS</b>, <b>PRODUCER</b>,
    <b>SUBJECT</b>, <b>TITLE</b> : metadata that can be written into the PDF Info block.
Note: the format of the value for CREATION_DATE must be D:YYYYMMDDHHmmSSOHH'mm'
(e.g. D:20121122132447+02'00' for 22 nov 2012 13:24:47 GMT+02)
(see <a href="http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf">PDF Reference, version 1.7</a>, page 160) </p></li>

<li><p><b>OGR_DATASOURCE=name</b> : Name of the OGR datasource to display on top of the raster layer.</p></li>

<li><p><b>OGR_DISPLAY_FIELD=name</b> : Name of the field (matching the name of a field from the OGR layer definition)
to use to build the label of features that appear in the "Model Tree" UI component of a well-known PDF viewer.
For example, if the OGR layer has a field called "ID", this can be used as the value for that option : features
in the "Model Tree" will be labelled from their value for the "ID" field.
If not specified, sequential generic labels will be used ("feature1", "feature2", etc... ).</p></li>

<li><p><b>OGR_DISPLAY_LAYER_NAMES=names</b> : Comma separated list of names to display for the OGR layers in the "Model Tree".
This option is useful to provide custom names, instead of OGR layer name that are used when this option is not specified.
When specified, the number of names should be the same as the number of OGR layers in the datasource (and in the order they appear when
listed by ogrinfo for example). </p></li>

<li><p><b>OGR_WRITE_ATTRIBUTES=YES/NO</b> : Whether to write attributes of OGR features. Defaults to YES</p></li>

<li><p><b>OGR_LINK_FIELD=name</b> : Name of the field (matching the name of a field from the OGR layer definition)
to use to cause clicks on OGR features to open a web browser on the URL specified by the field value.</p></li>

<li><p><b>OFF_LAYERS=names</b>: Comma separated list of layer names that should be initially hidden.
By default, all layers are visible. The layer names can come from LAYER_NAME (main raster layer name), EXTRA_RASTERS_LAYER_NAME,
EXTRA_LAYER_NAME and OGR_DISPLAY_LAYER_NAMES.</p></li>

<li><p><b>EXCLUSIVE_LAYERS=names</b>:
Comma separated list of layer names, such that only one of those layers can be visible at a time.
This is the behaviour of radio-buttons in a graphical user interface.
 he layer names can come from LAYER_NAME (main raster layer name), EXTRA_RASTERS_LAYER_NAME,
EXTRA_LAYER_NAME and OGR_DISPLAY_LAYER_NAMES.
</p></li>

<li><p><b>JAVASCRIPT=script</b>: Javascript content to run at document opening. See
<a href="http://partners.adobe.com/public/developer/en/acrobat/sdk/AcroJS.pdf">Acrobat(R) JavaScript Scripting Reference</a>.</p></li>

<li><p><b>JAVASCRIPT_FILE=script_filename</b>: Name of Javascript file to embed and run at document opening. See
<a href="http://partners.adobe.com/public/developer/en/acrobat/sdk/AcroJS.pdf">Acrobat(R) JavaScript Scripting Reference</a>.</p></li>

</ul>

<h2>Update of existing files (GDAL &gt;= 1.10.0)</h2>

Existing PDF files (created or not with GDAL) can be opened in update mode in order to set or
update the following elements :
<ul>
    <li>Geotransform and associated projection (with SetGeoTransform() and SetProjection()) </li>
    <li>GCPs (with SetGCPs())</li>
    <li>Neatline (with SetMetadataItem("NEATLINE", polygon_definition_in_wkt))</li>
    <li>Content of Info object (with SetMetadataItem(key, value) where key is one of AUTHOR, CREATOR, CREATION_DATE, KEYWORDS, PRODUCER, SUBJECT and TITLE)</li>
    <li>xml:XMP metadata (with SetMetadata(md, "xml:XMP"))</li>
</ul>

For geotransform or GCPs, the Geo encoding method used by default is ISO32000. OGC_BP can be
selected by setting the GDAL_PDF_GEO_ENCODING configuration option to OGC_BP.<p>

Updated elements are written at the end of the file, following the incremental update method described in
the PDF specification.<p>

<h2>Examples</h2>

<ul>
<li>
Create a PDF from 2 rasters (main_raster and another_raster), such that main_raster is initially displayed, and they are exclusively displayed :
<pre>
gdal_translate -of PDF main_raster.tif my.pdf -co LAYER_NAME=main_raster
               -co EXTRA_RASTERS=another_raster.tif -co EXTRA_RASTERS_LAYER_NAME=another_raster
               -co OFF_LAYERS=another_raster -co EXCLUSIVE_LAYERS=main_raster,another_raster
</pre>
</li>

<li>
Create of PDF with some JavaScript :
<pre>
gdal_translate -of PDF my.tif my.pdf -co JAVASCRIPT_FILE=script.js
</pre>

where script.js is :
<pre>
button = app.alert({cMsg: 'This file was generated by GDAL. Do you want to visit its website ?', cTitle: 'Question', nIcon:2, nType:2});
if (button == 4) app.launchURL('http://gdal.org/');
</pre>
</li>
</ul>

<h2>See also</h2>

Other drivers :
<ul>
<li><a href="../ogr/drv_pdf.html">OGR PDF driver</a></li>
</ul>
<p>

Specifications :

<ul>
<li><a href="http://portal.opengeospatial.org/files/?artifact_id=40537">OGC GeoPDF Encoding Best Practice Version 2.2 (08-139r3)</a></li>
<li><a href="http://www.adobe.com/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf">Adobe Supplement to ISO 32000</a></li>
<li><a href="http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf">PDF Reference, version 1.7</a></li>
<li><a href="http://partners.adobe.com/public/developer/en/acrobat/sdk/AcroJS.pdf">Acrobat(R) JavaScript Scripting Reference</a></li>
</ul>

<p>
Libraries :

<ul>
<li><a href="http://poppler.freedesktop.org/">Poppler homepage</a></li>
<li><a href="http://podofo.sourceforge.net/">PoDoFo homepage</a></li>
</ul>

<p>
Samples :

<ul>
<li><a href="http://acrobatusers.com/gallery/geospatial">A few Geospatial PDF samples</a></li>
<li><a href="http://www.agc.army.mil/geopdf_gallery.html">Another set of Geospatial PDF samples</a></li>
<li><a href="http://latuviitta.org/documents/Geospatial_PDF_maps_from_OSM_with_GDAL.pdf">Tutorial to generate Geospatial PDF maps from OSM data</a></li>
</ul>

</body>
</html>