File: implementing-tutorial.rst

package info (click to toggle)
postgresql-multicorn 1.4.0-3
  • links: PTS, VCS
  • area: main
  • in suites: bullseye
  • size: 1,244 kB
  • sloc: ansic: 3,324; python: 2,258; sql: 751; makefile: 259; sh: 81
file content (385 lines) | stat: -rw-r--r-- 11,575 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
.. _tutorial:

*************************
Tutorial: Writing an FDW
*************************

Multicorn provides a simple interface for writing foreign data wrappers: the
``multicorn.ForeignDataWrapper`` interface.

Implementing a foreign data wrapper is as simple as inheriting from ``multicorn.ForeignDataWrapper`` and implemening the ``execute`` method.

What are we trying to achieve ?
===============================

Supposing we want to implement a foreign data wrapper which only returns a set
of 20 rows, containing in each column the name of the column itself concatenated
with the number of the line.

The goal of this tutorial is to be able to execute this:

.. code-block:: sql

    CREATE FOREIGN TABLE constanttable (
        test character varying,
        test2 character varying
    ) server multicorn_srv options (
        wrapper 'myfdw.ConstantForeignDataWrapper'
    )

    SELECT * from constanttable;

And obtain this as a result:

.. code-block:: bash

      test   |  test2   
    ---------+----------
     test 0  | test2 0
     test 1  | test2 1
     test 2  | test2 2
     test 3  | test2 3
     test 4  | test2 4
     test 5  | test2 5
     test 6  | test2 6
     test 7  | test2 7
     test 8  | test2 8
     test 9  | test2 9
     test 10 | test2 10
     test 11 | test2 11
     test 12 | test2 12
     test 13 | test2 13
     test 14 | test2 14
     test 15 | test2 15
     test 16 | test2 16
     test 17 | test2 17
     test 18 | test2 18
     test 19 | test2 19
    (20 lignes)

How do we do that ?
===================

The fdw described above is pretty simple, implementing it should be easy !

First things first, we have to create a new python module.

This can be achieved with the most simple ``setup.py`` file:

.. code-block:: python

  import subprocess
  from setuptools import setup, find_packages, Extension

  setup(
    name='myfdw',
    version='0.0.1',
    author='Ronan Dunklau',
    license='Postgresql',
    packages=['myfdw']
  )

But let's see the whole code. To be usable with the above ``CREATE FOREIGN
TABLE`` statement, this module should be named ``myfdw``.


.. code-block:: python

    from multicorn import ForeignDataWrapper

    class ConstantForeignDataWrapper(ForeignDataWrapper):

        def __init__(self, options, columns):
            super(ConstantForeignDataWrapper, self).__init__(options, columns)
            self.columns = columns

        def execute(self, quals, columns):
            for index in range(20):
                line = {}
                for column_name in self.columns:
                    line[column_name] = '%s %s' % (column_name, index)
                yield line

You should have the following directory structure:

.. code-block:: bash

  .
  |-- myfdw/ 
  |   `-- __init__.py
  `-- setup.py

To install it, just run ``python setup.py install``, and the file will be copied
to your global python installation, which should be the one your PostgreSQL
instance is using.

And that's it !
You just created your first foreign data wrapper. But let's look a bit more
thoroughly to the class...

The first thing to do (although optional, since you can implement the interface
via duck-typing), is to import the base class and subclass it:

.. code-block:: python

    from multicorn import ForeignDataWrapper

    class ConstantForeignDataWrapper(ForeignDataWrapper):

The init method must accept two arguments

``options``
    A dictionary of options given in the ``OPTIONS`` clause of the 
    ``CREATE FOREIGN TABLE`` statement, minus the wrapper option.

``columns``
    A mapping of the columns names given during the table creation, associated
    to their types. 
    Ex: {'test': 'character varying'}


Our access point do not need any options, thus we will only need to keep a
reference to the columns:

.. code-block:: python
   
    def __init__(self, options, columns):
        super(ConstantForeignDataWrapper, self).__init__(options, columns)
        self.columns = columns


The execute method is the core of the API.
It is called with a list of ``Qual`` objects, and a list column names, which we will ignore 
for now but more on that `later <#optimizations>`_.

This method must return an iterable of the resulting lines.
Each line can be either a list containing an item by column,
or a dictonary mappning the column names to their value.

For this example, we chose to build a dictionary.
Each column contains the concatenation of the column name and
the line index.

.. code-block:: python

        def execute(self, quals):
            for index in range(20):
                line = {}
                for column_name in self.columns:
                    line[column_name] = '%s %s' % (column_name, index)
                yield line


And that's it !


Write API
=========

Since PostgreSQL 9.3, foreign data wrappers can implement a write API.

In multicorn, this involves defining which column will be used as a primary key (mandatory) and implementing the following methods at your
discretion:

.. code-block:: python

  def insert(self, new_values)
  def update(self, old_values, new_values)
  def delete(self, old_values)

Each of these arguments will be dictionaries, containing at least the column you
defined as a primary key, and the values to insert or those which have changed
(for an update). In addition, other values may be present depending on the query
involved.

These methods should return a dictionary containing the new values (after
insertion or update). This will be used in the case of RETURNING clauses of the
form:

.. code-block:: sql

  INSERT INTO my_ft VALUES (some_value) RETURNING *;

You can return new values if the values that were given in sql are not the ones
that are actually stored (think about default values, triggers...).

The row_id_column attribute must be set to the name of a column acting as a
primary key. For example:

.. code-block:: python

  class MyFDW(ForeignDataWrapper):

    def __init__(self, fdw_options, fdw_columns):
      self.row_id_column = fdw_columns.keys()[0]

If you want to handle transaction hooks, you can implement the following
methods:

.. code-block:: python

  def commit(self)
  def rollback(self)
  def pre_commit(self)

The pre_commit method will be called just before the local transaction commits.
You can raise an exception here to abort the current transaction were your
remote commit to fail.

The commit method will be called just at commit time, while the rollback method
will be called whenever the local transaction is rollbacked.


Optimizations
=============

As was noted in the code commentaries, the execute methods accept a ``quals`` argument.
This argument is a list of quals object, which are defined in `multicorn/__init__.py`_.
A Qual object defines a simple condition wich can be used by the foreign data
wrapper to restrict the number of the results.
The Qual class defines three instance's attributes:

- field_name: the name of the column concerned by the condition.
- operator: the name of the operator.
- value: the value expressed in the condition.

Let's suppose we write the following query:

.. code-block:: sql

    SELECT * from constanttable where test = 'test 2' and test2 like '%3%';

The method execute would be called with the following quals:

.. code-block:: python 
    
    [Qual('test', '=', 'test 2'), Qual('test', '~~', '3')]

Now you can use this information to reduce the set of results to return to the
postgresql server.

.. note:: 

    You don't HAVE to enforce those quals, Postgresql will check them anyway.
    It's nonetheless useful to reduce the amount of results you fetch over the
    network, for example.


.. _multicorn/__init__.py: https://github.com/Kozea/Multicorn/blob/master/python/multicorn/__init__.py

Similarly, the columns argument contains the list of needed columns.
You can use this information to reduce the amount of data that has to be
fetched.

For example, the following query:

.. code-block:: sql

    select test, test2 from constanttable;

would result in the following columns argument:

.. code-block:: python

    ['test', 'test2']

Once again, if you returns more than these columns everything should be fine.

Parameterized paths
-------------------

The python FDW implementor can affect the planner by implementing the
get_path_keys and get_rel_size methods.


.. code-block:: python

    def get_rel_size(self, quals, columns):

This method must return a tuple of the form (expected_number_of_row,
expected_mean_width_of_a_row (in bytes)).

The quals and columns arguments can be used to compute those estimates.

For example, the imapfdw computes a huge width whenever the payload column is
requested.

.. code-block:: python

    def get_path_keys(self):

This method must return a list of tuple of the form (column_name,
expected_number_of_row).

The expected_number_of_row must be computed as if a "where column_name =
some_value" filter were applied.

This helps the planner to estimate parameterized paths cost, and change the plan
accordingly.

For example, informing the planner that a filter on a column may return exactly
one row, instead of the full billion, may help it on deciding to use a
nested-loop instead of a full sequential scan.

Error reporting
===============

In the `multicorn.utils`_ module lies a simple utility function,
``log_to_postgres``.


.. _multicorn.utils: https://github.com/Kozea/Multicorn/blob/master/python/multicorn/utils.py

This function is mapped to the Postgresql function erreport.

It accepts three arguments:

``message`` (required)
    A python string containing the message to report.

``level`` (optional, defaults to ``logging.INFO``)
    The severity of the message. The following values are accepted:
        ``logging.DEBUG``
            Maps to a postgresql DEBUG1 message. In most configurations, it won't
            show at all.
        ``logging.INFO``
            Maps to a postgresql NOTICE message. A NOTICE message is passed to the
            client, as well as in the server logs.
        ``logging.WARNING``
            Maps to a postgresql WARNING message. A WARNING message is passed to the
            client, as well as in the server logs.
        ``logging.ERROR``
            Maps to a postgresql ERROR message. An ERROR message is passed to the
            client, as well as in the server logs.
            
            .. important:: 
                
                An ERROR message results in the current transaction being aborted.
                Think about the consequences when you use it !

        ``logging.CRITICAL``
            Maps to a postgresql FATAL message. Causes the current server process
            to abort.

            .. important:: 
                
                A CRITICAL message results in the current server process to be aborted
                Think about the consequences when you use it !

``hint`` (optional)
    An hint given to the user to resolve the cause of the message (ex:``Try
    adding the missing option in the table creation statement``) 


Foreign Data Wrapper lifecycle
==============================

The foreign data wrapper associated to a table is instantiated on a per-process
basis, and it happens when the first query is run against it.

Usually, postgresql server processes are spawned on a per-connection basis.

During the life time of a server process, the instance is cached.
That means that if you have to keep references to resources such as connections,
you should establish them in the ``__init__`` method and cache them as instance
attributes.