Features Highlights
===================
The ``pgcopydb`` project was started to allow certain improvements that are
not possible to achieve directly with the ``pg_dump`` and ``pg_restore``
commands. Below are the details of what ``pgcopydb`` can achieve.
.. _bypass_intermediate_files:
Bypass intermediate files for the TABLE DATA
--------------------------------------------
The first aspect is that for ``pg_dump`` and ``pg_restore`` to implement
concurrency, they need to write to intermediate files first.
The `docs for pg_dump`__ say the following about the ``--jobs`` parameter:
__ https://www.postgresql.org/docs/current/app-pgdump.html
.. admonition:: From the PostgreSQL documentation
You can only use this option with the directory output format because this
is the only output format where multiple processes can write their data at
the same time.
The `docs for pg_restore`__ say the following about the ``--jobs``
parameter:
__ https://www.postgresql.org/docs/current/app-pgrestore.html
.. admonition:: From the PostgreSQL documentation
Only the custom and directory archive formats are supported with this
option. The input must be a regular file or directory (not, for example, a
pipe or standard input).
So the first idea with ``pgcopydb`` is to provide the ``--jobs`` concurrency and
bypass intermediate files (and directories) altogether, at least as far as
the actual TABLE DATA set is concerned.
The trick to achieve that is that ``pgcopydb`` must be able to connect to the
source database during the whole operation, whereas ``pg_restore`` may be used
from an on-disk export without any remaining connection to the source
database. In the context of ``pgcopydb``, requiring access to the source
database is fine. In the context of ``pg_restore``, it would not be
acceptable.
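As a sketch, and assuming the ``--table-jobs`` and ``--index-jobs`` options
and the connection string environment variables of recent pgcopydb releases
(check ``pgcopydb clone --help`` for your version), a concurrent migration of
the TABLE DATA could look like this::

    $ export PGCOPYDB_SOURCE_PGURI="postgres://user@source.host/dbname"
    $ export PGCOPYDB_TARGET_PGURI="postgres://user@target.host/dbname"

    # stream TABLE DATA directly from the source to the target,
    # with 4 concurrent COPY processes and 4 concurrent index builds
    $ pgcopydb clone --table-jobs 4 --index-jobs 4

No intermediate files are written for the table data itself: each COPY worker
reads from its source connection and writes to its target connection.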
Large-Objects Support
---------------------
The `Postgres Large-Objects`__ API is nobody's favorite, though the
trade-offs implemented in that API are found to be very useful by many
application developers. In the context of dump and restore, Postgres
separates the large objects metadata from the large objects contents.
__ https://www.postgresql.org/docs/current/largeobjects.html
Specifically, the metadata consists of a large-object OID and ACLs, and is
considered to be part of the pre-data section of a Postgres dump.
This means that pgcopydb relies on ``pg_dump`` to copy the large objects
metadata from the source to the target Postgres server, but then implements
its own logic to migrate the large objects contents, using several worker
processes depending on the setting of the ``--large-objects-jobs``
command-line option.
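As an illustration, and assuming the same connection string environment
variables as in the previous example, the number of large objects workers can
be raised for a database that holds many large objects::

    # use 8 worker processes to copy the large objects contents
    $ pgcopydb clone --large-objects-jobs 8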
Concurrency
-----------
A major feature of pgcopydb is how concurrency is implemented, including
options to obtain same-table COPY concurrency. See the
:ref:`pgcopydb_concurrency` chapter of the documentation for more information.
Change Data Capture
-------------------
pgcopydb implements a full Postgres replication solution based on the
lower-level API for `Postgres Logical Decoding`__. This allows pgcopydb to
be compatible with old versions of Postgres, starting with version 9.4.
__ https://www.postgresql.org/docs/current/logicaldecoding.html
Always do a test migration first without the ``--follow`` option to get an
idea of the downtime window needed for your own case. This informs your
decision about using the Change Data Capture mode, which makes a migration a
lot more complex to drive to success.
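A minimal sketch of that approach, re-using the connection setup from the
previous examples, could look like the following; the second command is only
run once the rehearsal shows that Change Data Capture is actually needed::

    # rehearsal: measure how long a plain clone takes, without CDC
    $ pgcopydb clone

    # actual migration: clone, then keep replaying changes from the source
    $ pgcopydb clone --follow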
PostgreSQL Logical Decoding Client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The replication client of pgcopydb has been designed to be able to fetch
changes from the source Postgres instance concurrently with the initial COPY
of the data. Three worker processes are created to handle the logical
decoding client:
- The **streaming** process fetches data from the Postgres replication slot
using the Postgres replication protocol.
- The **transform** process transforms the fetched data from an intermediate
  JSON format into a derivative of the SQL language. In *prefetch mode*
  this is implemented as a batch operation; in *replay mode* this is done
  in a streaming fashion, one line at a time, reading from a unix pipe.
- The **apply** process then applies the SQL script to the target Postgres
database system and uses Postgres APIs for `Replication Progress
Tracking`__.
__ https://www.postgresql.org/docs/current/replication-origins.html
During the initial COPY phase of operations, ``pgcopydb follow`` runs in
prefetch mode and does not apply changes yet. After the initial COPY is
done, the pgcopydb replication system enters a loop that switches between the
following two modes of operation:
1. In **prefetch mode**, changes are stored to JSON files on-disk, the
   transform process operates on those files when a SWITCH occurs, and the
   apply process catches up with the changes on-disk by applying one file at
   a time.

   When the next file to apply does not exist (yet), the three worker
   processes stop and the main follow supervisor process then switches to
   *replay mode*.
2. In **replay mode**, changes are streamed from the streaming worker
   process to the transform worker process using a Unix pipe, and the
   obtained SQL statements are sent to the replay worker process using
   another Unix pipe.

   Changes are then replayed in a streaming fashion, end-to-end, with
   transaction granularity.
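The switch between those two modes is driven by the main follow supervisor
process. To observe where the replication machinery currently stands, the
follow *sentinel* can be inspected; this assumes the ``pgcopydb stream
sentinel get`` sub-command found in recent pgcopydb versions::

    # show the current sentinel values: replication start and end positions,
    # whether apply is enabled, and the progress LSNs
    $ pgcopydb stream sentinel get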
The internal SQL-like script format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Postgres Logical Decoding API does not provide a CDC format; instead, it
allows Postgres extension developers to implement *logical decoding output
plugins*. The Postgres core distribution implements such an output plugin
named `test_decoding`__. Another commonly used output plugin is named
`wal2json`__.
__ https://www.postgresql.org/docs/16/test-decoding.html
__ https://github.com/eulerto/wal2json
pgcopydb is compatible with both the ``test_decoding`` and ``wal2json``
plugins. As a user, it's possible to choose an output plugin with the
``--plugin`` command-line option.
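For instance, picking ``wal2json`` for a migration that uses Change Data
Capture could look like the sketch below; the plugin must be installed on the
source Postgres instance for this to work::

    # use the wal2json output plugin for logical decoding
    $ pgcopydb clone --follow --plugin wal2json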
The output plugin compatibility means that pgcopydb has to implement code to
parse the output plugin syntax and make sense of it. Internally, the
messages from the output plugin are stored by pgcopydb in a `JSON Lines`__
formatted file, where each line is a JSON record with decoded metadata about
the changes and the output plugin message, as-is.
__ https://jsonlines.org
This JSON Lines format is transformed into SQL scripts. At first, pgcopydb
would just use SQL for the intermediate format, but then support for
`prepared statements`__ was added as an optimization. This means that our SQL
script uses commands such as the following examples::
PREPARE d33a643f AS INSERT INTO public.rental ("rental_id", "rental_date", "inventory_id", "customer_id", "return_date", "staff_id", "last_update") overriding system value VALUES ($1, $2, $3, $4, $5, $6, $7), ($8, $9, $10, $11, $12, $13, $14);
EXECUTE d33a643f["16050","2022-06-01 00:00:00+00","371","291",null,"1","2022-06-01 00:00:00+00","16051","2022-06-01 00:00:00+00","373","293",null,"2","2022-06-01 00:00:00+00"];
__ https://www.postgresql.org/docs/current/sql-prepare.html
As you can see in the example, pgcopydb is now able to use a single INSERT
statement with multiple VALUES, which is a huge performance boost. In order
to simplify pgcopydb's parsing of the SQL syntax, the choice was made to
format the EXECUTE argument list as a JSON array, which does not comply with
the actual SQL syntax, but is simple and fast to process.
Finally, it's not possible for the transform process to anticipate the
actual session management of the apply process, so SQL statements are always
included with both the PREPARE and the EXECUTE steps. The pgcopydb apply
code knows how to skip PREPARing again, of course.
Unfortunately that means that our SQL files are not actually using SQL
syntax and can't be processed as-is with any SQL client software. At the
moment either using :ref:`pgcopydb_stream_apply` or writing your own
processing code is required.
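As a sketch of how these files are meant to be consumed, and assuming the
``pgcopydb stream transform`` and ``pgcopydb stream apply`` sub-commands with
illustrative file names, the JSON Lines and SQL files can be processed like
this::

    # transform a JSON Lines file into the SQL-like script format
    $ pgcopydb stream transform ./cdc/000000010000000000000002.json \
                                ./cdc/000000010000000000000002.sql

    # apply the SQL-like script to the target database
    $ pgcopydb stream apply ./cdc/000000010000000000000002.sql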
.. _catalogs:
Internal Catalogs (SQLite)
--------------------------
To be able to implement pgcopydb operations, a list of SQL objects such as
tables, indexes, constraints and sequences is needed internally. While
pgcopydb used to handle such a list as an in-memory array, with a hash-table
for direct lookups (by OID and by *restore list name*), in some cases the
source database contains so many objects that these arrays do not fit in
memory.
As pgcopydb is written in C, the current best approach to handle an array of
objects that needs to spill to disk and support direct lookups is the SQLite
library, file format, and embedded database engine.
That's why the current version of pgcopydb uses SQLite to handle its
catalogs.
Internally pgcopydb stores metadata information in three different catalogs,
all found in the ``${TMPDIR}/pgcopydb/schema/`` directory by default, unless
using the recommended ``--dir`` option.
- The **source** catalog registers metadata about the source database, and
also some metadata about the pgcopydb context, consistency, and
progress.
- The **filters** catalog is only used when the ``--filters`` option is
provided, and it registers metadata about the objects in the source database
that are going to be skipped.
  This is necessary because the filtering is implemented using the
  ``pg_restore --list`` and ``pg_restore --use-list`` options. The
  Postgres archive Table Of Contents format contains an object OID and its
  *restore list name*, and pgcopydb needs to be able to look up that OID or
  name in its filtering catalogs.
- The **target** catalog registers metadata about the target database,
such as the list of roles, the list of schemas, or the list of already
existing constraints found on the target database.
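Since these catalogs are plain SQLite files, they can be inspected with the
``sqlite3`` shell after a pgcopydb command has run. The ``source.db`` file
name below is an assumption used for illustration; list the ``schema``
directory to find the actual file names in your version::

    # list the catalog files created by the previous pgcopydb command
    $ ls ${TMPDIR}/pgcopydb/schema/

    # peek at the tables of the source catalog (file name assumed here)
    $ sqlite3 ${TMPDIR}/pgcopydb/schema/source.db '.tables'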