File: intro.rst

package info (click to toggle)
pgcopydb 0.17-1
links: PTS, VCS
area: main
in suites: sid, trixie
size: 30,636 kB
sloc: ansic: 217,474; sql: 1,654; sh: 812; makefile: 365; python: 94
file content (101 lines) | stat: -rw-r--r-- 4,472 bytes
Introduction to pgcopydb
========================

pgcopydb is a tool that automates copying a PostgreSQL database to another
server. Main use case for pgcopydb is migration to a new Postgres system,
either for new hardware, new architecture, or new Postgres major version.

The idea would be to run ``pg_dump -jN | pg_restore -jN`` between two
running Postgres servers. To make a copy of a database to another server as
quickly as possible, one would like to use the parallel options of
``pg_dump`` and still be able to stream the data to as many ``pg_restore``
jobs. Unfortunately, this approach cannot be implemented by using ``pg_dump`` and
``pg_restore`` directly, see :ref:`bypass_intermediate_files`.

When using ``pgcopydb`` it is possible to achieve both concurrency and
streaming with this simple command line::

  $ export PGCOPYDB_SOURCE_PGURI="postgres://user@source.host.dev/dbname"
  $ export PGCOPYDB_TARGET_PGURI="postgres://role@target.host.dev/dbname"

  $ pgcopydb clone --table-jobs 4 --index-jobs 4

See the manual page for :ref:`pgcopydb_clone` for detailed information about
how the command is implemented along with many other supported options.

Feature Matrix
--------------

Here is a comparison of the features available when using ``pg_dump`` and
``pg_restore`` directly versus when using ``pgcopydb`` to handle the database copying:

==============================   ========  =====================
Feature                          pgcopydb   pg_dump ; pg_restore
==============================   ========  =====================
Single-command operation          ✓         ✗
Snapshot consistency              ✓         ✓
Ability to resume partial run     ✓         ✗
Advanced filtering                ✓         ✓
Tables concurrency                ✓         ✓
Same-table concurrency            ✓         ✗
Index concurrency                 ✓         ✓
Constraint index concurrency      ✓         ✗
Schema                            ✓         ✓
Large Objects                     ✓         ✓
Vacuum Analyze                    ✓         ✗
Copy Freeze                       ✓         ✗
Roles                             ✓         ✗ (needs pg_dumpall)
Tablespaces                       ✗         ✗ (needs pg_dumpall)
Follow changes                    ✓         ✗
==============================   ========  =====================

Refer to the documentation about :ref:`config` for its *Advanced filtering*
capabilities.

pgcopydb uses pg_dump and pg_restore
------------------------------------

The implementation of ``pgcopydb`` actually calls into the ``pg_dump`` and
``pg_restore`` binaries to handle a large part of the work, such as the pre-data
and post-data sections. Refer to `pg_dump docs`__ for more information about the
three sections supported.

__ https://www.postgresql.org/docs/current/app-pgdump.html

After using ``pg_dump`` to obtain the pre-data and the post-data parts, then
``pgcopydb`` restores the pre-data parts to the target Postgres instance using
``pg_restore``.

``pgcopydb`` then uses SQL commands and the `COPY streaming protocol`__ to
migrate the table contents, the large objects data, and to VACUUM ANALYZE
tables as soon as the data becomes available on the target instance.

__ https://www.postgresql.org/docs/current/sql-copy.html

Then ``pgcopydb`` uses SQL commands to build the indexes on the target Postgres
instance, as detailed in the design doc :ref:`index_concurrency`. This
allows to include *constraint indexes* such as Primary Keys in the list of
indexes built at the same time.

Change Data Capture, or fork and follow
---------------------------------------

It is also possible with ``pgcopydb`` to implement Change Data Capture and
replay data modifications happening on the source database to the target
database. See the :ref:`pgcopydb_follow` command and the ``pgcopydb clone
--follow`` command line option at :ref:`pgcopydb_clone` in the manual.

The simplest possible implementation of *online migration* with pgcopydb,
where changes being made to the source Postgres instance database are
replayed on the target system, looks like the following:

.. code-block:: bash
  :linenos:

   $ pgcopydb clone --follow &

   # later when the application is ready to make the switch
   $ pgcopydb stream sentinel set endpos --current

   # later when the migration is finished, clean-up both source and target
   $ pgcopydb stream cleanup