File: guide.md

package info (click to toggle)
python-django-pgbulk 3.2.4-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 556 kB
  • sloc: python: 1,815; makefile: 101; sh: 9
file content (230 lines) | stat: -rw-r--r-- 5,732 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
# User Guide

`django-pgbulk` comes with the following functions:

- Use [pgbulk.upsert][] to do an `INSERT ON CONFLICT` statement.
- Use [pgbulk.update][] to do a bulk `UPDATE` statement.
- Use [pgbulk.copy][] to do a `COPY FROM` statement.
- Use [pgbulk.aupsert][], [pgbulk.aupdate][], or [pgbulk.acopy][] for async versions.

Below we show examples and advanced functionality.

## Using `pgbulk.upsert`

[pgbulk.upsert][] allows for updating or inserting rows atomically and returning results based on inserts or updates. Update fields, returned values, and ignoring unchanged rows can be configured.

See [the Postgres INSERT docs](https://www.postgresql.org/docs/current/sql-insert.html) for more information on how `ON CONFLICT` works.

#### A basic bulk upsert on a model

```python
import pgbulk

pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr="some_val1"),
        MyModel(int_field=2, some_attr="some_val2"),
    ],
    # These are the fields that identify the uniqueness constraint.
    ["int_field"],
    # These are the fields that will be updated if the row already
    # exists. If not provided, all fields will be updated
    ["some_attr"]
)
```

#### Return the results of an upsert

```python
results = pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr="some_val1"),
        MyModel(int_field=2, some_attr="some_val2"),
    ],
    ["int_field"],
    ["some_attr"],
    # `True` will return all columns. One can also explicitly
    # list which columns will be returned
    returning=True
)

# Print which results were created
print(results.created)

# Print which results were updated.
# By default, if an update results in no changes, it will not
# be updated and will not be returned.
print(results.updated)
```

#### Use an expression for updates

In this example, we increment `some_int_field` by one whenever an update happens. Otherwise it defaults to zero:

```python
pgbulk.upsert(
    MyModel,
    [
        MyModel(some_int_field=0, some_key="a"),
        MyModel(some_int_field=0, some_key="b")
    ],
    ["some_key"],
    [
        # Use UpdateField to specify an expression for the update.
        pgbulk.UpdateField(
            "some_int_field",
            expression=models.F("some_int_field") + 1
        )
    ],
)
```

#### Ignore updates to unchanged rows

```python
pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr="some_val1"),
        MyModel(int_field=2, some_attr="some_val2"),
    ],
    ["int_field"],
    ["some_attr"],
    ignore_unchanged=True
)
```

!!! warning

    Triggers and auto-generated fields not in the update won't be applied. Unchanged rows also won't be returned if using `returning=True`.

## Using `pgbulk.update`

[pgbulk.update][] issues updates to multiple rows with an `UPDATE SET ... FROM VALUES` statement. Update fields, returned values, and ignoring unchanged rows can be configured.

#### Update an attribute of multiple models in bulk

```python
import pgbulk

pgbulk.update(
    MyModel,
    [
        MyModel(id=1, some_attr='some_val1'),
        MyModel(id=2, some_attr='some_val2')
    ],
    # These are the fields that will be updated. If not provided,
    # all fields will be updated
    ['some_attr']
)
```

#### Use an expression in an update

In the example, we increment `some_int_field` by one:

```python
pgbulk.update(
    MyModel,
    [
        MyModel(some_int_field=0, some_key="a"),
        MyModel(some_int_field=0, some_key="b")
    ],
    [
        # Use UpdateField to specify an expression for the update.
        pgbulk.UpdateField(
            "some_int_field",
            expression=models.F("some_int_field") + 1
        )
    ],
)
```

#### Return the results of an update

```python
results = pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr="some_val1"),
        MyModel(int_field=2, some_attr="some_val2"),
    ],
    ["int_field", "some_attr"],
    # `True` will return all columns. One can also explicitly
    # list which columns will be returned.
    returning=True
)

# Results can be accessed as a tuple
print(results[0].int_field)
```

#### Ignore updates to unchanged rows

```python
pgbulk.upsert(
    MyModel,
    [
        MyModel(int_field=1, some_attr="some_val1"),
        MyModel(int_field=2, some_attr="some_val2"),
    ],
    ["int_field"],
    ignore_unchanged=True
)
```

!!! warning

    Triggers and auto-generated fields not in the update won't be applied. Unchanged rows also won't be returned if using `returning=True`.

## Using `pgbulk.copy`

Using `pgbulk.copy` issues a `COPY ... FROM STDIN` statement to insert rows, which can be substantially faster than bulk `INSERT` statement or Django's `bulk_create`. Unlike `bulk_create`, `pgbulk.copy` cannot return inserted results.

!!! note

    `pgbulk.copy` is not only available when using psycopg2.

#### Inserting Rows

```python
import pgbulk

pgbulk.copy(
    models.TestModel,
    [
        models.TestModel(int_field=5, float_field=1),
        models.TestModel(int_field=6, float_field=2),
        models.TestModel(int_field=7, float_field=3),
    ],
)
```

#### Inserting Specific Columns

Specify columns or use `exclude` to configure which columns are copied:

```python
pgbulk.copy(
    models.TestModel,
    [
        models.TestModel(int_field=5, float_field=1),
        models.TestModel(int_field=6, float_field=2),
        models.TestModel(int_field=7, float_field=3),
    ],
    ["int_field"]  # Only copy the int_field
)
```

```python
pgbulk.copy(
    ...,
    exclude=["generated_field"]  # Exclude only this field
)
```

!!! note

    Columns that are excluded from the copy must be generated, nullable, or have database defaults.