1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385
|
.. _tutorial:
*************************
Tutorial: Writing an FDW
*************************
Multicorn provides a simple interface for writing foreign data wrappers: the
``multicorn.ForeignDataWrapper`` interface.
Implementing a foreign data wrapper is as simple as inheriting from ``multicorn.ForeignDataWrapper`` and implemening the ``execute`` method.
What are we trying to achieve ?
===============================
Supposing we want to implement a foreign data wrapper which only returns a set
of 20 rows, containing in each column the name of the column itself concatenated
with the number of the line.
The goal of this tutorial is to be able to execute this:
.. code-block:: sql
CREATE FOREIGN TABLE constanttable (
test character varying,
test2 character varying
) server multicorn_srv options (
wrapper 'myfdw.ConstantForeignDataWrapper'
)
SELECT * from constanttable;
And obtain this as a result:
.. code-block:: bash
test | test2
---------+----------
test 0 | test2 0
test 1 | test2 1
test 2 | test2 2
test 3 | test2 3
test 4 | test2 4
test 5 | test2 5
test 6 | test2 6
test 7 | test2 7
test 8 | test2 8
test 9 | test2 9
test 10 | test2 10
test 11 | test2 11
test 12 | test2 12
test 13 | test2 13
test 14 | test2 14
test 15 | test2 15
test 16 | test2 16
test 17 | test2 17
test 18 | test2 18
test 19 | test2 19
(20 lignes)
How do we do that ?
===================
The fdw described above is pretty simple, implementing it should be easy !
First things first, we have to create a new python module.
This can be achieved with the most simple ``setup.py`` file:
.. code-block:: python
import subprocess
from setuptools import setup, find_packages, Extension
setup(
name='myfdw',
version='0.0.1',
author='Ronan Dunklau',
license='Postgresql',
packages=['myfdw']
)
But let's see the whole code. To be usable with the above ``CREATE FOREIGN
TABLE`` statement, this module should be named ``myfdw``.
.. code-block:: python
from multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
def __init__(self, options, columns):
super(ConstantForeignDataWrapper, self).__init__(options, columns)
self.columns = columns
def execute(self, quals, columns):
for index in range(20):
line = {}
for column_name in self.columns:
line[column_name] = '%s %s' % (column_name, index)
yield line
You should have the following directory structure:
.. code-block:: bash
.
|-- myfdw/
| `-- __init__.py
`-- setup.py
To install it, just run ``python setup.py install``, and the file will be copied
to your global python installation, which should be the one your PostgreSQL
instance is using.
And that's it !
You just created your first foreign data wrapper. But let's look a bit more
thoroughly to the class...
The first thing to do (although optional, since you can implement the interface
via duck-typing), is to import the base class and subclass it:
.. code-block:: python
from multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
The init method must accept two arguments
``options``
A dictionary of options given in the ``OPTIONS`` clause of the
``CREATE FOREIGN TABLE`` statement, minus the wrapper option.
``columns``
A mapping of the columns names given during the table creation, associated
to their types.
Ex: {'test': 'character varying'}
Our access point do not need any options, thus we will only need to keep a
reference to the columns:
.. code-block:: python
def __init__(self, options, columns):
super(ConstantForeignDataWrapper, self).__init__(options, columns)
self.columns = columns
The execute method is the core of the API.
It is called with a list of ``Qual`` objects, and a list column names, which we will ignore
for now but more on that `later <#optimizations>`_.
This method must return an iterable of the resulting lines.
Each line can be either a list containing an item by column,
or a dictonary mappning the column names to their value.
For this example, we chose to build a dictionary.
Each column contains the concatenation of the column name and
the line index.
.. code-block:: python
def execute(self, quals):
for index in range(20):
line = {}
for column_name in self.columns:
line[column_name] = '%s %s' % (column_name, index)
yield line
And that's it !
Write API
=========
Since PostgreSQL 9.3, foreign data wrappers can implement a write API.
In multicorn, this involves defining which column will be used as a primary key (mandatory) and implementing the following methods at your
discretion:
.. code-block:: python
def insert(self, new_values)
def update(self, old_values, new_values)
def delete(self, old_values)
Each of these arguments will be dictionaries, containing at least the column you
defined as a primary key, and the values to insert or those which have changed
(for an update). In addition, other values may be present depending on the query
involved.
These methods should return a dictionary containing the new values (after
insertion or update). This will be used in the case of RETURNING clauses of the
form:
.. code-block:: sql
INSERT INTO my_ft VALUES (some_value) RETURNING *;
You can return new values if the values that were given in sql are not the ones
that are actually stored (think about default values, triggers...).
The row_id_column attribute must be set to the name of a column acting as a
primary key. For example:
.. code-block:: python
class MyFDW(ForeignDataWrapper):
def __init__(self, fdw_options, fdw_columns):
self.row_id_column = fdw_columns.keys()[0]
If you want to handle transaction hooks, you can implement the following
methods:
.. code-block:: python
def commit(self)
def rollback(self)
def pre_commit(self)
The pre_commit method will be called just before the local transaction commits.
You can raise an exception here to abort the current transaction were your
remote commit to fail.
The commit method will be called just at commit time, while the rollback method
will be called whenever the local transaction is rollbacked.
Optimizations
=============
As was noted in the code commentaries, the execute methods accept a ``quals`` argument.
This argument is a list of quals object, which are defined in `multicorn/__init__.py`_.
A Qual object defines a simple condition wich can be used by the foreign data
wrapper to restrict the number of the results.
The Qual class defines three instance's attributes:
- field_name: the name of the column concerned by the condition.
- operator: the name of the operator.
- value: the value expressed in the condition.
Let's suppose we write the following query:
.. code-block:: sql
SELECT * from constanttable where test = 'test 2' and test2 like '%3%';
The method execute would be called with the following quals:
.. code-block:: python
[Qual('test', '=', 'test 2'), Qual('test', '~~', '3')]
Now you can use this information to reduce the set of results to return to the
postgresql server.
.. note::
You don't HAVE to enforce those quals, Postgresql will check them anyway.
It's nonetheless useful to reduce the amount of results you fetch over the
network, for example.
.. _multicorn/__init__.py: https://github.com/Kozea/Multicorn/blob/master/python/multicorn/__init__.py
Similarly, the columns argument contains the list of needed columns.
You can use this information to reduce the amount of data that has to be
fetched.
For example, the following query:
.. code-block:: sql
select test, test2 from constanttable;
would result in the following columns argument:
.. code-block:: python
['test', 'test2']
Once again, if you returns more than these columns everything should be fine.
Parameterized paths
-------------------
The python FDW implementor can affect the planner by implementing the
get_path_keys and get_rel_size methods.
.. code-block:: python
def get_rel_size(self, quals, columns):
This method must return a tuple of the form (expected_number_of_row,
expected_mean_width_of_a_row (in bytes)).
The quals and columns arguments can be used to compute those estimates.
For example, the imapfdw computes a huge width whenever the payload column is
requested.
.. code-block:: python
def get_path_keys(self):
This method must return a list of tuple of the form (column_name,
expected_number_of_row).
The expected_number_of_row must be computed as if a "where column_name =
some_value" filter were applied.
This helps the planner to estimate parameterized paths cost, and change the plan
accordingly.
For example, informing the planner that a filter on a column may return exactly
one row, instead of the full billion, may help it on deciding to use a
nested-loop instead of a full sequential scan.
Error reporting
===============
In the `multicorn.utils`_ module lies a simple utility function,
``log_to_postgres``.
.. _multicorn.utils: https://github.com/Kozea/Multicorn/blob/master/python/multicorn/utils.py
This function is mapped to the Postgresql function erreport.
It accepts three arguments:
``message`` (required)
A python string containing the message to report.
``level`` (optional, defaults to ``logging.INFO``)
The severity of the message. The following values are accepted:
``logging.DEBUG``
Maps to a postgresql DEBUG1 message. In most configurations, it won't
show at all.
``logging.INFO``
Maps to a postgresql NOTICE message. A NOTICE message is passed to the
client, as well as in the server logs.
``logging.WARNING``
Maps to a postgresql WARNING message. A WARNING message is passed to the
client, as well as in the server logs.
``logging.ERROR``
Maps to a postgresql ERROR message. An ERROR message is passed to the
client, as well as in the server logs.
.. important::
An ERROR message results in the current transaction being aborted.
Think about the consequences when you use it !
``logging.CRITICAL``
Maps to a postgresql FATAL message. Causes the current server process
to abort.
.. important::
A CRITICAL message results in the current server process to be aborted
Think about the consequences when you use it !
``hint`` (optional)
An hint given to the user to resolve the cause of the message (ex:``Try
adding the missing option in the table creation statement``)
Foreign Data Wrapper lifecycle
==============================
The foreign data wrapper associated to a table is instantiated on a per-process
basis, and it happens when the first query is run against it.
Usually, postgresql server processes are spawned on a per-connection basis.
During the life time of a server process, the instance is cached.
That means that if you have to keep references to resources such as connections,
you should establish them in the ``__init__`` method and cache them as instance
attributes.
|