File: quickstart.rst

.. _quickstart:

Quickstart
==========

This page gives a good introduction to clickhouse-driver.
It assumes you already have clickhouse-driver installed.
If you do not, head over to the :ref:`installation` section.

A minimal working example looks like this:

    .. code-block:: python

        >>> from clickhouse_driver import Client
        >>>
        >>> client = Client(host='localhost')
        >>>
        >>> client.execute('SHOW DATABASES')
        [('default',)]


This code shows all databases available on the server.

There are two conceptual types of queries:

- Read-only queries: SELECT, SHOW, etc.
- Read-write queries: INSERT.


Every query should be executed by calling one of the client's execute
methods: `execute`, `execute_with_progress`, or `execute_iter`.

- SELECT queries can use the `execute`, `execute_with_progress`, or
  `execute_iter` methods.
- INSERT queries can only use the `execute` method.

Selecting data
--------------

A simple SELECT query looks like this:

    .. code-block:: python

        >>> client.execute('SELECT * FROM system.numbers LIMIT 5')
        [(0,), (1,), (2,), (3,), (4,)]


Of course, queries can and should be parameterized to avoid SQL injection:

    .. code-block:: python

        >>> from datetime import date
        >>> client.execute(
        ...     'SELECT %(date)s, %(a)s + %(b)s',
        ...     {'date': date.today(), 'a': 1, 'b': 2}
        ... )
        [('2018-10-21', 3)]

Percent symbols in inlined constants must be doubled if you mix constants
containing a ``%`` symbol with ``%(myvar)s`` parameters:

    .. code-block:: python

        >>> client.execute(
        ...     "SELECT 'test' like '%%es%%', %(myvar)s",
        ...     {'myvar': 1}
        ... )

NOTE: formatting queries with Python's f-strings or string concatenation can
lead to SQL injection. Use ``%(myvar)s`` parameters instead.
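
For illustration, the hypothetical snippet below builds the query text with an
f-string and is vulnerable to SQL injection, while the parameterized version is
safe (``user_input``, the ``search_log`` table and its ``term`` column are only
examples, not part of the driver's API):

    .. code-block:: python

        >>> user_input = "google'; DROP TABLE search_log; --"
        >>>
        >>> # Dangerous: the user-supplied string becomes part of the SQL text.
        >>> client.execute(
        ...     f"SELECT count() FROM search_log WHERE term = '{user_input}'"
        ... )
        >>>
        >>> # Safe: the value is passed as a parameter and escaped by the driver.
        >>> client.execute(
        ...     'SELECT count() FROM search_log WHERE term = %(term)s',
        ...     {'term': user_input}
        ... )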

Customising ``SELECT`` output with the ``FORMAT`` clause is not supported.

.. _execute-with-progress:

Selecting data with progress statistics
---------------------------------------

You can get query progress statistics by using `execute_with_progress`. This
can be useful for cancelling long-running queries.

    .. code-block:: python

        >>> from datetime import datetime
        >>>
        >>> progress = client.execute_with_progress(
        ...     'LONG AND COMPLICATED QUERY'
        ... )
        >>>
        >>> timeout = 20
        >>> started_at = datetime.now()
        >>>
        >>> for num_rows, total_rows in progress:
        ...     if total_rows:
        ...         done = float(num_rows) / total_rows
        ...     else:
        ...         done = total_rows
        ...
        ...     now = datetime.now()
        ...     elapsed = (now - started_at).total_seconds()
        ...     # Cancel query if it takes more than 20 seconds
        ...     # to process 50% of rows.
        ...     if elapsed > timeout and done < 0.5:
        ...         client.cancel()
        ...         break
        ... else:
        ...     rv = progress.get_result()
        ...     print(rv)
        ...
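
If you do not need the cancellation logic, a minimal sketch is to simply drain
the progress iterator and then fetch the result (the query and the printed
percentage are only illustrative):

    .. code-block:: python

        >>> progress = client.execute_with_progress(
        ...     'SELECT sum(number) FROM numbers(1000000)'
        ... )
        >>>
        >>> for num_rows, total_rows in progress:
        ...     if total_rows:
        ...         # Print a rough completion percentage per progress packet.
        ...         print('%.2f%%' % (100.0 * num_rows / total_rows))
        ...
        >>> result = progress.get_result()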


.. _execute-iter:

Streaming results
-----------------

When you are dealing with large datasets, streaming results block by block may
be useful:

    .. code-block:: python

        >>> settings = {'max_block_size': 100000}
        >>> rows_gen = client.execute_iter(
        ...     'QUERY WITH MANY ROWS', settings=settings
        ... )
        >>>
        >>> for row in rows_gen:
        ...     print(row)
        ...
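
When the streamed rows feed batch-oriented processing, grouping them with
``itertools.islice`` is one simple pattern (a sketch; ``handle_batch`` is a
hypothetical function of your own):

    .. code-block:: python

        >>> from itertools import islice
        >>>
        >>> rows_gen = client.execute_iter(
        ...     'SELECT number FROM system.numbers LIMIT 1000'
        ... )
        >>>
        >>> while True:
        ...     # Take at most 100 rows from the generator at a time.
        ...     batch = list(islice(rows_gen, 100))
        ...     if not batch:
        ...         break
        ...     handle_batch(batch)
        ...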


Inserting data
--------------

Insert queries in the `Native protocol <https://clickhouse.com/docs/en/interfaces/tcp/>`_
are a little bit tricky because of ClickHouse's columnar nature, and because
we're using Python.

An INSERT query consists of two parts: the query statement and the query
values. The values are split into chunks called blocks, and each block is sent
in binary columnar form.

Since the data in each block is sent in binary form, we should not serialize it
into a string with ``%(a)s`` substitution and then deserialize it back into
Python types.

This INSERT would be extremely slow if executed with thousands of rows of data:

    .. code-block:: python

        >>> client.execute(
        ...     'INSERT INTO test (x) VALUES (%(a)s), (%(b)s), ...',
        ...     {'a': 1, 'b': 2, ...}
        ... )


To insert data efficiently, provide data separately, and end your statement with a ``VALUES`` clause:

    .. code-block:: python

        >>> client.execute(
        ...     'INSERT INTO test (x) VALUES',
        ...     [{'x': 1}, {'x': 2}, {'x': 3}, {'x': 100}]
        ... )
        4
        >>> client.execute(
        ...     'INSERT INTO test (x) VALUES',
        ...     [[200]]
        ... )
        1
        >>> client.execute(
        ...     'INSERT INTO test (x) VALUES',
        ...     ((x, ) for x in range(5))
        ... )
        5

You can use any iterable yielding lists, tuples or dicts.
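
If your data is already organized by columns rather than by rows, recent
versions of the driver can usually insert it without transposing it first; a
minimal sketch using the ``columnar=True`` argument (assuming the same ``test``
table as above):

    .. code-block:: python

        >>> client.execute(
        ...     'INSERT INTO test (x) VALUES',
        ...     [(0, 1, 2, 3, 4)],  # one sequence per column, not per row
        ...     columnar=True
        ... )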

If data is not passed, the connection will be terminated after a timeout:

    .. code-block:: python

        >>> client.execute('INSERT INTO test (x) VALUES')  # will hang

The following WILL NOT work:

    .. code-block:: python

        >>> client.execute(
        ...     'INSERT INTO test (x) VALUES (%(a)s), (%(b)s)',
        ...     {'a': 1, 'b': 2}
        ... )


Of course, for ``INSERT ... SELECT`` queries no data is needed:

    .. code-block:: python

        >>> client.execute(
        ...     'INSERT INTO test (x) '
        ...     'SELECT * FROM system.numbers LIMIT %(limit)s',
        ...     {'limit': 5}
        ... )
        []

ClickHouse will execute this query like a usual ``SELECT`` query.

Inserting data in different formats with the ``FORMAT`` clause is not supported.

See :ref:`insert-from-csv-file` if you need to insert data in a custom format.

DDL
---

DDL queries can be executed in the same way SELECT queries are executed:

    .. code-block:: python

        >>> client.execute('DROP TABLE IF EXISTS test')
        []
        >>> client.execute('CREATE TABLE test (x Int32) ENGINE = Memory')
        []


Async and multithreading
------------------------

Every ClickHouse query is assigned an identifier to enable request execution
tracking. However, the ClickHouse native protocol is synchronous: all incoming
queries on a single connection are executed consecutively. Clickhouse-driver
does not yet implement a connection pool.

To utilize ClickHouse's asynchronous capability you should either use multiple
Client instances or implement a queue.

The same applies to multithreading. Queries from different threads can't share
one Client instance with a single connection. You should use a separate client
for each thread.
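
A minimal sketch of the one-client-per-thread approach (the worker function and
queries are only illustrative):

    .. code-block:: python

        >>> import threading
        >>>
        >>> def worker(query):
        ...     # Each thread gets its own Client, and thus its own connection.
        ...     thread_client = Client(host='localhost')
        ...     print(thread_client.execute(query))
        ...     thread_client.disconnect()
        ...
        >>> threads = [
        ...     threading.Thread(target=worker, args=('SELECT 1', ))
        ...     for _ in range(3)
        ... ]
        >>> for t in threads:
        ...     t.start()
        ...
        >>> for t in threads:
        ...     t.join()
        ...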

However, if you are using the DB API for communication with the server, each
cursor creates its own Client instance. This makes communication thread-safe.
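
For reference, a minimal DB API sketch looks roughly like this (the DSN below
assumes a local server with default credentials):

    .. code-block:: python

        >>> from clickhouse_driver import connect
        >>>
        >>> conn = connect('clickhouse://localhost')
        >>> cursor = conn.cursor()
        >>>
        >>> cursor.execute('SHOW DATABASES')
        >>> cursor.fetchall()
        [('default',)]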