.. -*- coding: utf-8 -*-
.. :Project:   pglast — Parser module
.. :Created:   Thu 10 Aug 2017 10:19:26 CEST
.. :Author:    Lele Gaifax <lele@metapensiero.it>
.. :License:   GNU General Public License version 3 or later
.. :Copyright: © 2017, 2018, 2021, 2023, 2024 Lele Gaifax
..

==========================================================
 :mod:`pglast.parser` --- The interface with libpg_query
==========================================================

.. module:: pglast.parser
   :synopsis: The interface with libpg_query

This module is a C extension written in Cython__ that exposes a few functions from the
underlying ``libpg_query`` library it links against.

.. data:: LONG_MAX

   The highest integer that can be stored in a C ``long`` variable. It is used as a marker,
   for example in PG's ``FetchStmt.howMany``, where it represents the ``FETCH_ALL`` constant.
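
As a rough illustration of where such a value comes from, the following sketch derives the
maximum of a signed C ``long`` on the current platform using the standard ``ctypes`` module.
The helper name ``c_long_max`` is hypothetical and not part of pglast, and it is only an
assumption that the result coincides with ``LONG_MAX`` as exposed by this module.

```python
import ctypes


def c_long_max():
    """Return the largest value a signed C ``long`` can hold on this platform."""
    # sizeof() reports bytes; one bit is reserved for the sign.
    bits = 8 * ctypes.sizeof(ctypes.c_long)
    return 2 ** (bits - 1) - 1


# Typically 2**63 - 1 on 64-bit Linux and macOS, 2**31 - 1 on Windows.
print(c_long_max())
```

The value depends on the platform's ABI, which is why it is better read from the module
than hard-coded.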

.. exception:: ParseError

   Exception representing the error state returned by the parser.

.. exception:: DeparseError

   Exception representing the error state returned by the deparser.

.. class:: Displacements(string)

   Helper class used to find the index of a Unicode character from its offset in the
   corresponding UTF-8 encoded byte string.

   Example:

   .. doctest::

      >>> from pglast.parser import Displacements
      >>> unicode = '€ 0.01'
      >>> utf8 = unicode.encode('utf-8')
      >>> d = Displacements(unicode)
      >>> for offset in range(len(utf8)):
      ...   idx = d(offset)
      ...   print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
      ...
      0 [e2] -> 0 [€]
      1 [82] -> 0 [€]
      2 [ac] -> 0 [€]
      3 [20] -> 1 [ ]
      4 [30] -> 2 [0]
      5 [2e] -> 3 [.]
      6 [30] -> 4 [0]
      7 [31] -> 5 [1]

   The underlying ``libpg_query`` library operates on ``UTF-8`` strings: its parser functions
   emit tokens with a ``location`` that is actually the byte offset within the ``UTF-8``
   representation of the statement. With this class you can fix up those offsets, as in the
   following example:

   .. doctest::

      >>> import json
      >>> from pglast.parser import parse_sql_json
      >>> stmt = 'select alias.bar as alìbàbà from foo as alias'
      >>> parsed = json.loads(parse_sql_json(stmt))
      >>> select = parsed['stmts'][0]['stmt']['SelectStmt']
      >>> rangevar = select['fromClause'][0]['RangeVar']
      >>> loc = rangevar['location']
      >>> print(stmt[loc:loc+3])
       as
      >>> d = Displacements(stmt)
      >>> adjloc = d(loc)
      >>> print(stmt[adjloc:adjloc+3])
      foo
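
To make the offset translation performed by ``Displacements`` more concrete, here is a
minimal pure-Python sketch of the same idea. The class name ``ByteToCharIndex`` is
hypothetical, and this is not how the extension is implemented: it simply records the byte
offset at which each character starts and then uses a binary search.

```python
from bisect import bisect_right


class ByteToCharIndex:
    """Hypothetical stand-in: map a UTF-8 byte offset to a character index."""

    def __init__(self, string):
        # starts[i] is the UTF-8 byte offset at which character i begins.
        self.starts = []
        offset = 0
        for char in string:
            self.starts.append(offset)
            offset += len(char.encode('utf-8'))

    def __call__(self, offset):
        # The character containing `offset` is the last one that starts
        # at or before it.
        return bisect_right(self.starts, offset) - 1


text = '€ 0.01'
to_index = ByteToCharIndex(text)
print([to_index(o) for o in range(len(text.encode('utf-8')))])
# → [0, 0, 0, 1, 2, 3, 4, 5]
```

The three bytes of ``€`` all map to character index 0, matching the first doctest above.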

.. function:: deparse_protobuf(buffer)

   :param bytes buffer: a ``Protobuf`` buffer
   :returns: str

   Return the ``SQL`` statement from the given `buffer` argument, typically one generated by
   :func:`parse_sql_protobuf()`.

.. function:: fingerprint(query)

   :param str query: The SQL statement
   :returns: str

   Fingerprint the given `query`, a string with the ``SQL`` statement(s), and return a
   hash digest that can identify similar queries. Queries that differ only in the queried
   object or in formatting yield the same digest.

.. function:: get_postgresql_version()

   :returns: a tuple

   Return the PostgreSQL version as a tuple (`major`, `minor`, `patch`).

.. function:: parse_sql(query)

   :param str query: The SQL statement
   :returns: tuple

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   corresponding *parse tree* as a tuple of :class:`pglast.ast.RawStmt` instances.

.. function:: parse_sql_json(query)

   :param str query: The SQL statement
   :returns: str

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   ``libpg_query``\ 's ``JSON``\ -serialized parse tree.

.. function:: parse_sql_protobuf(query)

   :param str query: The SQL statement
   :returns: bytes

   Parse the given `query`, a string with the ``SQL`` statement(s), and return the
   ``libpg_query``\ 's ``Protobuf``\ -serialized parse tree.

.. function:: parse_plpgsql_json(query)

   :param str query: The PL/pgSQL statement
   :returns: str

   Parse the given `query`, a string with the ``PL/pgSQL`` statement(s), and return the
   ``libpg_query``\ 's ``JSON``\ -serialized parse tree.

.. function:: scan(query)

   :param str query: The SQL statement
   :returns: sequence of tuples

   Split the given `query` into its *tokens*. Each token is a `namedtuple` with the following
   slots:

   start : int
     the index of the first character of the token

   end : int
     the index of the last character of the token (inclusive)

   name : str
     the name of the token

   kind : str
     the kind of the token

   Example:

   .. doctest::

      >>> from pglast.parser import scan
      >>> stmt = 'select bar as alìbàbà from foo'
      >>> tokens = scan(stmt)
      >>> print(tokens[0])
      Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
      >>> print([stmt[t.start:t.end+1] for t in tokens])
      ['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']
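
Since the ``end`` slot in the example above points at the last character of the token,
slicing the statement requires ``end + 1``. The sketch below wraps that detail in a small
helper. Both ``Token`` and ``token_text`` are hypothetical names defined here for
illustration, and the token values are written by hand rather than produced by ``scan()``.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the slots described above.
Token = namedtuple('Token', 'start end name kind')


def token_text(stmt, token):
    """Return the source text covered by `token`; its `end` is inclusive."""
    return stmt[token.start:token.end + 1]


stmt = 'select bar'
# Hand-written tokens, assumed to resemble what a scanner would emit.
tokens = [Token(0, 5, 'SELECT', 'RESERVED_KEYWORD'),
          Token(7, 9, 'IDENT', 'NO_KEYWORD')]
print([token_text(stmt, t) for t in tokens])
# → ['select', 'bar']
```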

.. function:: split(query, with_parser=True, only_slices=False)

   :param str query: The SQL statement
   :param bool with_parser: Whether to use the parser or the scanner
   :param bool only_slices: Return slices instead of the statements' text
   :returns: tuple

   Split the given `query` string into a sequence of single ``SQL`` statements.

   By default this uses the *parser* to perform the job; when `with_parser` is ``False``
   the *scanner* variant is used instead, which is preferable when the statements may contain
   parse errors.

   When `only_slices` is ``True``, return a sequence of :class:`slice` instances, one for each
   statement, instead of the statements' text.

   .. note:: Leading and trailing whitespace is removed from the statements.

   Example:

   .. doctest::

      >>> from pglast.parser import split
      >>> split('select 1 for; select 2')
      Traceback (most recent call last):
        ...
      pglast.parser.ParseError: syntax error at or near ";", at index 12
      >>> split('select 1 for; select 2', with_parser=False)
      ('select 1 for', 'select 2')
      >>> stmts = "select 'fòò'; select 'bàr'"
      >>> print([stmts[r] for r in split(stmts, only_slices=True)])
      ["select 'fòò'", "select 'bàr'"]
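
One possible use of the ``only_slices`` variant of ``split()`` is to keep track of where
each statement sits in the original text, for instance to report line numbers. The
following sketch assumes the slices index the Python string, as the last doctest suggests.
The helper ``statement_lines`` is hypothetical, and the slices are written by hand instead
of being obtained from ``split()``.

```python
def statement_lines(text, slices):
    """Pair each statement's text with the 1-based line on which it starts."""
    result = []
    for s in slices:
        # Count the newlines that precede the start of the statement.
        line = text.count('\n', 0, s.start) + 1
        result.append((line, text[s]))
    return result


stmts = "select 'fòò';\nselect 'bàr'"
# Hand-written slices, standing in for split(stmts, only_slices=True).
slices = (slice(0, 12), slice(14, 26))
print(statement_lines(stmts, slices))
# → [(1, "select 'fòò'"), (2, "select 'bàr'")]
```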

__ http://cython.org/