File: implementing-tutorial.rst

package info (click to toggle)
postgresql-multicorn 1.4.0-3
links: PTS, VCS
area: main
in suites: bullseye
size: 1,244 kB
sloc: ansic: 3,324; python: 2,258; sql: 751; makefile: 259; sh: 81
file content (385 lines) | stat: -rw-r--r-- 11,575 bytes
parent folder | download | duplicates (3)
.. _tutorial:

*************************
Tutorial: Writing an FDW
*************************

Multicorn provides a simple interface for writing foreign data wrappers: the
``multicorn.ForeignDataWrapper`` interface.

Implementing a foreign data wrapper is as simple as inheriting from ``multicorn.ForeignDataWrapper`` and implemening the ``execute`` method.

What are we trying to achieve ?
===============================

Supposing we want to implement a foreign data wrapper which only returns a set
of 20 rows, containing in each column the name of the column itself concatenated
with the number of the line.

The goal of this tutorial is to be able to execute this:

.. code-block:: sql

    CREATE FOREIGN TABLE constanttable (
        test character varying,
        test2 character varying
    ) server multicorn_srv options (
        wrapper 'myfdw.ConstantForeignDataWrapper'
    )

    SELECT * from constanttable;

And obtain this as a result:

.. code-block:: bash

      test   |  test2   
    ---------+----------
     test 0  | test2 0
     test 1  | test2 1
     test 2  | test2 2
     test 3  | test2 3
     test 4  | test2 4
     test 5  | test2 5
     test 6  | test2 6
     test 7  | test2 7
     test 8  | test2 8
     test 9  | test2 9
     test 10 | test2 10
     test 11 | test2 11
     test 12 | test2 12
     test 13 | test2 13
     test 14 | test2 14
     test 15 | test2 15
     test 16 | test2 16
     test 17 | test2 17
     test 18 | test2 18
     test 19 | test2 19
    (20 lignes)

How do we do that ?
===================

The fdw described above is pretty simple, implementing it should be easy !

First things first, we have to create a new python module.

This can be achieved with the most simple ``setup.py`` file:

.. code-block:: python

  import subprocess
  from setuptools import setup, find_packages, Extension

  setup(
    name='myfdw',
    version='0.0.1',
    author='Ronan Dunklau',
    license='Postgresql',
    packages=['myfdw']
  )

But let's see the whole code. To be usable with the above ``CREATE FOREIGN
TABLE`` statement, this module should be named ``myfdw``.


.. code-block:: python

    from multicorn import ForeignDataWrapper

    class ConstantForeignDataWrapper(ForeignDataWrapper):

        def __init__(self, options, columns):
            super(ConstantForeignDataWrapper, self).__init__(options, columns)
            self.columns = columns

        def execute(self, quals, columns):
            for index in range(20):
                line = {}
                for column_name in self.columns:
                    line[column_name] = '%s %s' % (column_name, index)
                yield line

You should have the following directory structure:

.. code-block:: bash

  .
  |-- myfdw/ 
  |   `-- __init__.py
  `-- setup.py

To install it, just run ``python setup.py install``, and the file will be copied
to your global python installation, which should be the one your PostgreSQL
instance is using.

And that's it !
You just created your first foreign data wrapper. But let's look a bit more
thoroughly to the class...

The first thing to do (although optional, since you can implement the interface
via duck-typing), is to import the base class and subclass it:

.. code-block:: python

    from multicorn import ForeignDataWrapper

    class ConstantForeignDataWrapper(ForeignDataWrapper):

The init method must accept two arguments

``options``
    A dictionary of options given in the ``OPTIONS`` clause of the 
    ``CREATE FOREIGN TABLE`` statement, minus the wrapper option.

``columns``
    A mapping of the columns names given during the table creation, associated
    to their types. 
    Ex: {'test': 'character varying'}


Our access point do not need any options, thus we will only need to keep a
reference to the columns:

.. code-block:: python
   
    def __init__(self, options, columns):
        super(ConstantForeignDataWrapper, self).__init__(options, columns)
        self.columns = columns


The execute method is the core of the API.
It is called with a list of ``Qual`` objects, and a list column names, which we will ignore 
for now but more on that `later <#optimizations>`_.

This method must return an iterable of the resulting lines.
Each line can be either a list containing an item by column,
or a dictonary mappning the column names to their value.

For this example, we chose to build a dictionary.
Each column contains the concatenation of the column name and
the line index.

.. code-block:: python

        def execute(self, quals):
            for index in range(20):
                line = {}
                for column_name in self.columns:
                    line[column_name] = '%s %s' % (column_name, index)
                yield line


And that's it !


Write API
=========

Since PostgreSQL 9.3, foreign data wrappers can implement a write API.

In multicorn, this involves defining which column will be used as a primary key (mandatory) and implementing the following methods at your
discretion:

.. code-block:: python

  def insert(self, new_values)
  def update(self, old_values, new_values)
  def delete(self, old_values)

Each of these arguments will be dictionaries, containing at least the column you
defined as a primary key, and the values to insert or those which have changed
(for an update). In addition, other values may be present depending on the query
involved.

These methods should return a dictionary containing the new values (after
insertion or update). This will be used in the case of RETURNING clauses of the
form:

.. code-block:: sql

  INSERT INTO my_ft VALUES (some_value) RETURNING *;

You can return new values if the values that were given in sql are not the ones
that are actually stored (think about default values, triggers...).

The row_id_column attribute must be set to the name of a column acting as a
primary key. For example:

.. code-block:: python

  class MyFDW(ForeignDataWrapper):

    def __init__(self, fdw_options, fdw_columns):
      self.row_id_column = fdw_columns.keys()[0]

If you want to handle transaction hooks, you can implement the following
methods:

.. code-block:: python

  def commit(self)
  def rollback(self)
  def pre_commit(self)

The pre_commit method will be called just before the local transaction commits.
You can raise an exception here to abort the current transaction were your
remote commit to fail.

The commit method will be called just at commit time, while the rollback method
will be called whenever the local transaction is rollbacked.


Optimizations
=============

As was noted in the code commentaries, the execute methods accept a ``quals`` argument.
This argument is a list of quals object, which are defined in `multicorn/__init__.py`_.
A Qual object defines a simple condition wich can be used by the foreign data
wrapper to restrict the number of the results.
The Qual class defines three instance's attributes:

- field_name: the name of the column concerned by the condition.
- operator: the name of the operator.
- value: the value expressed in the condition.

Let's suppose we write the following query:

.. code-block:: sql

    SELECT * from constanttable where test = 'test 2' and test2 like '%3%';

The method execute would be called with the following quals:

.. code-block:: python 
    
    [Qual('test', '=', 'test 2'), Qual('test', '~~', '3')]

Now you can use this information to reduce the set of results to return to the
postgresql server.

.. note:: 

    You don't HAVE to enforce those quals, Postgresql will check them anyway.
    It's nonetheless useful to reduce the amount of results you fetch over the
    network, for example.


.. _multicorn/__init__.py: https://github.com/Kozea/Multicorn/blob/master/python/multicorn/__init__.py

Similarly, the columns argument contains the list of needed columns.
You can use this information to reduce the amount of data that has to be
fetched.

For example, the following query:

.. code-block:: sql

    select test, test2 from constanttable;

would result in the following columns argument:

.. code-block:: python

    ['test', 'test2']

Once again, if you returns more than these columns everything should be fine.

Parameterized paths
-------------------

The python FDW implementor can affect the planner by implementing the
get_path_keys and get_rel_size methods.


.. code-block:: python

    def get_rel_size(self, quals, columns):

This method must return a tuple of the form (expected_number_of_row,
expected_mean_width_of_a_row (in bytes)).

The quals and columns arguments can be used to compute those estimates.

For example, the imapfdw computes a huge width whenever the payload column is
requested.

.. code-block:: python

    def get_path_keys(self):

This method must return a list of tuple of the form (column_name,
expected_number_of_row).

The expected_number_of_row must be computed as if a "where column_name =
some_value" filter were applied.

This helps the planner to estimate parameterized paths cost, and change the plan
accordingly.

For example, informing the planner that a filter on a column may return exactly
one row, instead of the full billion, may help it on deciding to use a
nested-loop instead of a full sequential scan.

Error reporting
===============

In the `multicorn.utils`_ module lies a simple utility function,
``log_to_postgres``.


.. _multicorn.utils: https://github.com/Kozea/Multicorn/blob/master/python/multicorn/utils.py

This function is mapped to the Postgresql function erreport.

It accepts three arguments:

``message`` (required)
    A python string containing the message to report.

``level`` (optional, defaults to ``logging.INFO``)
    The severity of the message. The following values are accepted:
        ``logging.DEBUG``
            Maps to a postgresql DEBUG1 message. In most configurations, it won't
            show at all.
        ``logging.INFO``
            Maps to a postgresql NOTICE message. A NOTICE message is passed to the
            client, as well as in the server logs.
        ``logging.WARNING``
            Maps to a postgresql WARNING message. A WARNING message is passed to the
            client, as well as in the server logs.
        ``logging.ERROR``
            Maps to a postgresql ERROR message. An ERROR message is passed to the
            client, as well as in the server logs.
            
            .. important:: 
                
                An ERROR message results in the current transaction being aborted.
                Think about the consequences when you use it !

        ``logging.CRITICAL``
            Maps to a postgresql FATAL message. Causes the current server process
            to abort.

            .. important:: 
                
                A CRITICAL message results in the current server process to be aborted
                Think about the consequences when you use it !

``hint`` (optional)
    An hint given to the user to resolve the cause of the message (ex:``Try
    adding the missing option in the table creation statement``) 


Foreign Data Wrapper lifecycle
==============================

The foreign data wrapper associated to a table is instantiated on a per-process
basis, and it happens when the first query is run against it.

Usually, postgresql server processes are spawned on a per-connection basis.

During the life time of a server process, the instance is cached.
That means that if you have to keep references to resources such as connections,
you should establish them in the ``__init__`` method and cache them as instance
attributes.