File: conflict_handling.rst

package info (click to toggle)
python-django-postgres-extra 2.0.9-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 1,096 kB
  • sloc: python: 9,057; makefile: 17; sh: 7; sql: 1
file content (387 lines) | stat: -rw-r--r-- 13,020 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
.. include:: ./snippets/postgres_doc_links.rst
.. include:: ./snippets/manager_model_warning.rst

.. _conflict_handling_page:

Conflict handling
=================

The :class:`~psqlextra.manager.PostgresManager` comes with full support for PostgreSQL's `ON CONFLICT`_ clause.

This is an extremely useful feature for doing concurrency safe inserts. Often, when you want to insert a row, you want to overwrite it already exists, or simply leave the existing data there. This would require a ``SELECT`` first and then possibly a ``INSERT``. Within those two queries, another process might make a change to the row.

The alternative of trying to insert, ignoring the error and then doing a ``UPDATE`` is also not good. That would result in a lot of write overhead (due to logging).

.. code-block:: python

   from django.db import models
   from psqlextra.models import PostgresModel
   from psqlextra.query import ConflictAction

   class MyModel(PostgresModel):
       myfield = models.CharField(max_length=255, unique=True)

   # insert or update if already exists, then fetch, all in a single query
   obj2 = (
       MyModel.objects
       .on_conflict(['myfield'], ConflictAction.UPDATE)
       .insert_and_get(myfield='beer')
   )

   # insert, or do nothing if it already exists, then fetch
   obj1 = (
       MyModel.objects
       .on_conflict(['myfield'], ConflictAction.NOTHING)
       .insert_and_get(myfield='beer')
   )

   # insert or update if already exists, then fetch only the primary key
   id = (
       MyModel.objects
       .on_conflict(['myfield'], ConflictAction.UPDATE)
       .insert(myfield='beer')
   )

.. warning::

   The standard Django methods for inserting/updating are not affected by :meth:`~psqlextra.query.PostgresQuerySet.on_conflict`. It was a conscious decision to not override or change their behavior. The following completely ignores the :meth:`~psqlextra.query.PostgresQuerySet.on_conflict`:

   .. code-block:: python

      obj = (
          MyModel.objects
          .on_conflict(['first_name', 'last_name'], ConflictAction.UPDATE)
          .create(first_name='Henk', last_name='Jansen')
      )

   The same applies to methods such as :meth:`~django:django.db.models.query.QuerySet.update`, :meth:`~django:django.db.models.query.QuerySet.get_or_create` or :meth:`~django:django.db.models.query.QuerySet.update_or_create` etc.


Constraint specification
------------------------

The :meth:`~psqlextra.query.PostgresQuerySet.on_conflict` function's first parameter denotes the name of the column(s) in which the conflict might occur. Although you can specify multiple columns, these columns must somehow have a single constraint. For example, in case of a :attr:`~django:django.db.models.Options.unique_together` constraint.


Multiple columns
****************

Specifying multiple columns is necessary in case of a constraint that spans multiple columns, such as when using Django's :attr:`~django:django.db.models.Options.unique_together`.

.. code-block:: python

   from django.db import models
   from psqlextra.models import PostgresModel

   class MyModel(PostgresModel)
       class Meta:
           unique_together = ('first_name', 'last_name',)

       first_name = models.CharField(max_length=255)
       last_name = models.CharField(max_length=255)

   obj = (
       MyModel.objects
       .on_conflict(['first_name', 'last_name'], ConflictAction.UPDATE)
       .insert_and_get(first_name='Henk', last_name='Jansen')
   )


Specific constraint
*******************

Alternatively, instead of specifying the columns the constraint you're targetting applies to, you can also specify the exact constraint to use:

.. code-block:: python

   from django.db import models
   from psqlextra.models import PostgresModel

   class MyModel(PostgresModel)
       class Meta:
           constraints = [
               models.UniqueConstraint(
                   name="myconstraint",
                   fields=["first_name", "last_name"]
               ),
           ]

       first_name = models.CharField(max_length=255)
       last_name = models.CharField(max_length=255)

   constraint = next(
       constraint
       for constraint in MyModel._meta.constraints
       if constraint.name == "myconstraint"
    ), None)

   obj = (
       MyModel.objects
       .on_conflict(constraint, ConflictAction.UPDATE)
       .insert_and_get(first_name='Henk', last_name='Jansen')
   )


HStore keys
***********
Catching conflicts in columns with a ``UNIQUE`` constraint on a :class:`~psqlextra.fields.HStoreField` key is also supported:

.. code-block:: python

   from django.db import models
   from psqlextra.models import PostgresModel
   from psqlextra.fields import HStoreField

   class MyModel(PostgresModel)
       name = HStoreField(uniqueness=['en'])

   id = (
       MyModel.objects
       .on_conflict([('name', 'en')], ConflictAction.NOTHING)
       .insert(name={'en': 'Swen'})
   )

This also applies to "unique together" constraints in a :class:`~psqlextra.fields.HStoreField` field:

.. code-block:: python

   class MyModel(PostgresModel)
       name = HStoreField(uniqueness=[('en', 'ar')])

   id = (
       MyModel.objects
       .on_conflict([('name', 'en'), ('name', 'ar')], ConflictAction.NOTHING)
       .insert(name={'en': 'Swen'})
   )


insert vs insert_and_get
------------------------

After specifying :meth:`~psqlextra.query.PostgresQuerySet.on_conflict` you can use either :meth:`~psqlextra.query.PostgresQuerySet.insert` or :meth:`~psqlextra.query.PostgresQuerySet.insert_and_get` to perform the insert.


Conflict actions
----------------
There's currently two actions that can be taken when encountering a conflict. The second parameter of :meth:`~psqlextra.query.PostgresQuerySet.on_conflict` allows you to specify that should happen.


ConflictAction.UPDATE
*********************

:attr:`psqlextra.types.ConflictAction.UPDATE`

* If the row does **not exist**, insert a new one.
* If the row **exists**, update it.

This is also known as a "upsert".

Condition
"""""""""

Optionally, a condition can be added. PostgreSQL will then only apply the update if the condition holds true. A condition is specified as a custom expression.

A row level lock is acquired before evaluating the condition and proceeding with the update.

.. note::

    The update condition is translated as a condition for `ON CONFLICT`_. The PostgreSQL documentation states the following:

        An expression that returns a value of type boolean. Only rows for which this expression returns true will be updated, although all rows will be locked when the ON CONFLICT DO UPDATE action is taken. Note that condition is evaluated last, after a conflict has been identified as a candidate to update.


.. code-block:: python

    from psqlextra.expressions import CombinedExpression, ExcludedCol

    pk = (
        MyModel
        .objects
        .on_conflict(
            ['name'],
            ConflictAction.UPDATE,
            update_condition=CombinedExpression(
                MyModel._meta.get_field('priority').get_col(MyModel._meta.db_table),
                '>',
                ExcludedCol('priority'),
            )
        )
        .insert(
            name='henk',
            priority=1,
        )
    )

    if pk:
        print('update applied or inserted')
    else:
        print('condition was false-ish and no changes were made')


When writing expressions, refer to the data you're trying to upsert with the :class:`psqlextra.expressions.ExcludedCol` expression.

Alternatively, with Django 3.1 or newer, :class:`~django:django.db.models.Q` objects can be used instead:

.. code-block:: python

    from django.db.models import Q
    from psqlextra.expressions import ExcludedCol

    Q(name=ExcludedCol('name'))
    Q(name__isnull=True)
    Q(name__gt=ExcludedCol('priority'))


Update values
"""""""""""""

Optionally, the fields to update can be overriden. The default is to update the same fields that were specified in the rows to insert.

Refer to the insert values using the :class:`psqlextra.expressions.ExcludedCol` expression which translates to PostgreSQL's ``EXCLUDED.<column>`` expression. All expressions and features that can be used with Django's :meth:`~django:django.db.models.query.QuerySet.update` can be used here.

.. warning::

   Specifying an empty ``update_values`` (``{}``) will transform the query into :attr:`~psqlextra.types.ConflictAction.NOTHING`. Only ``None`` makes the default behaviour kick in of updating all fields that were specified.

.. code-block:: python

    from django.db.models import F

    from psqlextra.expressions import ExcludedCol

    (
        MyModel
        .objects
        .on_conflict(
            ['name'],
            ConflictAction.UPDATE,
            update_values=dict(
                name=ExcludedCol('name'),
                count=F('count') + 1,
            ),
        )
        .insert(
            name='henk',
            count=0,
        )
    )



ConflictAction.NOTHING
**********************

:attr:`psqlextra.types.ConflictAction.NOTHING`

* If the row does **not exist**, insert a new one.
* If the row **exists**, do nothing.

This is preferable when the data you're about to insert is the same as the one that already exists. This is more performant because it avoids a write in case the row already exists.

.. warning::

   When using :attr:`~psqlextra.types.ConflictAction.NOTHING`, PostgreSQL only returns the row(s) that were created. Conflicting rows are not returned. See example below:

   .. code-block:: python

      # obj1 is _not_ none
      obj1 = MyModel.objects.on_conflict(['name'], ConflictAction.NOTHING).insert(name="me")

      # obj2 is none! object alreaddy exists
      obj2 = MyModel.objects.on_conflict(['name'], ConflictAction.NOTHING).insert(name="me")

   This applies all methods: :meth:`~psqlextra.query.PostgresQuerySet.insert`, :meth:`~psqlextra.query.PostgresQuerySet.insert_and_get`, :meth:`~psqlextra.query.PostgresQuerySet.bulk_insert`


Bulk
----

:meth:`~psqlextra.query.PostgresQuerySet.bulk_insert` allows your to use conflict resolution for bulk inserts:

.. code-block:: python

   from django.db import models
   from psqlextra.models import PostgresModel

   class MyModel(PostgresModel):
       name = models.CharField(max_length=255, unique=True)

   obj = (
       MyModel.objects
       .on_conflict(['name'], ConflictAction.UPDATE)
       .bulk_insert([
           dict(name='swen'),
           dict(name='henk'),
           dict(name='adela')
       ])
   )

:meth:`~psqlextra.query.PostgresQuerySet.bulk_insert` uses a single query to insert all specified rows at once. It returns a ``list`` of ``dict`` with each ``dict`` being a merge of the ``dict`` passed in along with any index returned from Postgres.

.. note::

   In order to stick to the "everything in one query" principle, various, more advanced usages of :meth:`~psqlextra.query.PostgresQuerySet.bulk_insert` are impossible. It is not possible to have different rows specify different amounts of columns. The following example does **not work**:

   .. code-block:: python

      from django.db import models
      from psqlextra.models import PostgresModel

      class MyModel(PostgresModel):
          first_name = models.CharField(max_length=255, unique=True)
          last_name = models.CharField(max_length=255, default='kooij')

      obj = (
          MyModel.objects
          .on_conflict(['name'], ConflictAction.UPDATE)
          .bulk_insert([
              dict(name='swen'),
              dict(name='henk', last_name='poepjes'), # invalid, different column configuration
              dict(name='adela')
          ])
      )

   An exception is thrown if this behavior is detected.


Shorthands
----------

The :meth:`~psqlextra.query.PostgresQuerySet.on_conflict`, :meth:`~psqlextra.query.PostgresQuerySet.insert` and :meth:`~psqlextra.query.PostgresQuerySet.insert_or_create` methods were only added in v1.6. Before that, only :attr:`~psqlextra.types.ConflictAction.UPDATE` was supported in the following form:

.. code-block:: python

   from django.db import models
   from psqlextra.models import PostgresModel

   class MyModel(PostgresModel):
       myfield = models.CharField(max_length=255, unique=True)

   obj = (
       MyModel.objects
       .upsert_and_get(
           conflict_target=['myfield']
           fields=dict(myfield='beer')
       )
   )

   id = (
       MyModel.objects
       .upsert(
           conflict_target=['myfield']
           fields=dict(myfield='beer')
       )
   )

   (
       MyModel.objects
       .bulk_upsert(
           conflict_target=['myfield']
           rows=[
               dict(myfield='beer'),
               dict(myfield='wine')
           ]
       )
   )

These two short hands still exist and **are not** deprecated. They behave exactly the same as :attr:`~psqlextra.types.ConflictAction.UPDATE` and are there for convenience. It is up to you to decide what to use.