File: archive.rst

Archive (http, zip)
===================

This command instructs pgloader to load data from one or more files contained
in an archive. Currently the only supported archive format is *ZIP*; the
archive may be downloaded from an *HTTP* URL.
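As a minimal sketch, a load command fetching a ZIP archive over HTTP could
look like the following. The URL, database name, file name, and column list
here are all placeholders, not part of any real dataset:

::

    LOAD ARCHIVE
       FROM http://example.com/data.zip
       INTO postgresql:///mydb

       LOAD CSV
            FROM FILENAME MATCHING ~/data.csv/
                 WITH ENCODING utf-8
                 (
                    id, name
                 )
            INTO postgresql:///mydb?public.data
            WITH fields terminated by ',';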

Using advanced options and a load command file
----------------------------------------------

The command then would be:

::

   $ pgloader archive.load

And the contents of the ``archive.load`` file could be inspired by the
following:

::

    LOAD ARCHIVE
       FROM /Users/dim/Downloads/GeoLiteCity-latest.zip
       INTO postgresql:///ip4r

       BEFORE LOAD
         DO $$ create extension if not exists ip4r; $$,
            $$ create schema if not exists geolite; $$,

         EXECUTE 'geolite.sql'

       LOAD CSV
            FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
                 WITH ENCODING iso-8859-1
                 (
                    locId,
                    country,
                    region     null if blanks,
                    city       null if blanks,
                    postalCode null if blanks,
                    latitude,
                    longitude,
                    metroCode  null if blanks,
                    areaCode   null if blanks
                 )
            INTO postgresql:///ip4r?geolite.location
                 (
                    locid,country,region,city,postalCode,
                    location point using (format nil "(~a,~a)" longitude latitude),
                    metroCode,areaCode
                 )
            WITH skip header = 2,
                 fields optionally enclosed by '"',
                 fields escaped by double-quote,
                 fields terminated by ','

      AND LOAD CSV
            FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
                 WITH ENCODING iso-8859-1
                 (
                    startIpNum, endIpNum, locId
                 )
            INTO postgresql:///ip4r?geolite.blocks
                 (
                    iprange ip4r using (ip-range startIpNum endIpNum),
                    locId
                 )
            WITH skip header = 2,
                 fields optionally enclosed by '"',
                 fields escaped by double-quote,
                 fields terminated by ','

       FINALLY DO
         $$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;

Common Clauses
--------------

Please refer to :ref:`common_clauses` for documentation about common
clauses.

Archive Source Specification: FROM
----------------------------------

The filename or HTTP URI to load the data from. When given an HTTP URL, the
linked file is downloaded locally before processing.

If the file is a ``zip`` file, the command line utility ``unzip`` is used to
expand the archive into files in ``$TMPDIR``, or ``/tmp`` if ``$TMPDIR`` is
unset or set to a non-existing directory.

The following commands are then run from the top-level directory where the
archive has been expanded.

Archive Sub Commands
--------------------

  - command [ *AND* command ... ]

    A series of commands run against the contents of the archive; at the
    moment only ``CSV``, ``FIXED`` and ``DBF`` commands are supported.

    Note that commands support the *FROM FILENAME MATCHING* clause, which
    allows the pgloader command not to depend on the exact names of the
    archive directories.

    The same clause can also be applied to several files by using the
    spelling *FROM ALL FILENAMES MATCHING* and a regular expression.

    The whole *matching* clause must follow this rule::

      FROM [ ALL FILENAMES | [ FIRST ] FILENAME ] MATCHING
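    As an illustration, a hypothetical clause such as the following would
    load every file in the archive whose name ends in ``.csv`` into the same
    target table (the database, table, and field names are placeholders):

    ::

      LOAD CSV
           FROM ALL FILENAMES MATCHING ~/[.]csv$/
                WITH ENCODING utf-8
                (
                   id, name
                )
           INTO postgresql:///mydb?public.data
           WITH fields terminated by ','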

Archive Final SQL Commands
--------------------------
      
  - *FINALLY DO*

    SQL queries to run once the data is loaded, such as ``CREATE INDEX``.
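
    For example, a *FINALLY DO* clause might create an index and then
    refresh the planner statistics once all files are loaded. The table and
    index names below are illustrative only:

    ::

      FINALLY DO
        $$ create index idx_data_name on public.data (name); $$,
        $$ analyze public.data; $$;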