1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
|
Introduction to pgcopydb
========================
pgcopydb is a tool that automates copying a PostgreSQL database to another
server. Main use case for pgcopydb is migration to a new Postgres system,
either for new hardware, new architecture, or new Postgres major version.
The idea would be to run ``pg_dump -jN | pg_restore -jN`` between two
running Postgres servers. To make a copy of a database to another server as
quickly as possible, one would like to use the parallel options of
``pg_dump`` and still be able to stream the data to as many ``pg_restore``
jobs. Unfortunately, this approach cannot be implemented by using ``pg_dump`` and
``pg_restore`` directly, see :ref:`bypass_intermediate_files`.
When using ``pgcopydb`` it is possible to achieve both concurrency and
streaming with this simple command line::
$ export PGCOPYDB_SOURCE_PGURI="postgres://user@source.host.dev/dbname"
$ export PGCOPYDB_TARGET_PGURI="postgres://role@target.host.dev/dbname"
$ pgcopydb clone --table-jobs 4 --index-jobs 4
See the manual page for :ref:`pgcopydb_clone` for detailed information about
how the command is implemented along with many other supported options.
Feature Matrix
--------------
Here is a comparison of the features available when using ``pg_dump`` and
``pg_restore`` directly versus when using ``pgcopydb`` to handle the database copying:
============================== ======== =====================
Feature pgcopydb pg_dump ; pg_restore
============================== ======== =====================
Single-command operation ✓ ✗
Snapshot consistency ✓ ✓
Ability to resume partial run ✓ ✗
Advanced filtering ✓ ✓
Tables concurrency ✓ ✓
Same-table concurrency ✓ ✗
Index concurrency ✓ ✓
Constraint index concurrency ✓ ✗
Schema ✓ ✓
Large Objects ✓ ✓
Vacuum Analyze ✓ ✗
Copy Freeze ✓ ✗
Roles ✓ ✗ (needs pg_dumpall)
Tablespaces ✗ ✗ (needs pg_dumpall)
Follow changes ✓ ✗
============================== ======== =====================
Refer to the documentation about :ref:`config` for its *Advanced filtering*
capabilities.
pgcopydb uses pg_dump and pg_restore
------------------------------------
The implementation of ``pgcopydb`` actually calls into the ``pg_dump`` and
``pg_restore`` binaries to handle a large part of the work, such as the pre-data
and post-data sections. Refer to `pg_dump docs`__ for more information about the
three sections supported.
__ https://www.postgresql.org/docs/current/app-pgdump.html
After using ``pg_dump`` to obtain the pre-data and the post-data parts, then
``pgcopydb`` restores the pre-data parts to the target Postgres instance using
``pg_restore``.
``pgcopydb`` then uses SQL commands and the `COPY streaming protocol`__ to
migrate the table contents, the large objects data, and to VACUUM ANALYZE
tables as soon as the data becomes available on the target instance.
__ https://www.postgresql.org/docs/current/sql-copy.html
Then ``pgcopydb`` uses SQL commands to build the indexes on the target Postgres
instance, as detailed in the design doc :ref:`index_concurrency`. This
allows to include *constraint indexes* such as Primary Keys in the list of
indexes built at the same time.
Change Data Capture, or fork and follow
---------------------------------------
It is also possible with ``pgcopydb`` to implement Change Data Capture and
replay data modifications happening on the source database to the target
database. See the :ref:`pgcopydb_follow` command and the ``pgcopydb clone
--follow`` command line option at :ref:`pgcopydb_clone` in the manual.
The simplest possible implementation of *online migration* with pgcopydb,
where changes being made to the source Postgres instance database are
replayed on the target system, looks like the following:
.. code-block:: bash
:linenos:
$ pgcopydb clone --follow &
# later when the application is ready to make the switch
$ pgcopydb stream sentinel set endpos --current
# later when the migration is finished, clean-up both source and target
$ pgcopydb stream cleanup
|