.. -*- coding: utf-8 -*-
.. :Project: pglast — Parser module
.. :Created: Thu 10 Aug 2017 10:19:26 CEST
.. :Author: Lele Gaifax <lele@metapensiero.it>
.. :License: GNU General Public License version 3 or later
.. :Copyright: © 2017, 2018, 2021, 2023, 2024 Lele Gaifax
..
==========================================================
:mod:`pglast.parser` --- The interface with libpg_query
==========================================================
.. module:: pglast.parser
:synopsis: The interface with libpg_query
This module is a C extension written in Cython__ that exposes a few functions from the
underlying ``libpg_query`` library it links against.
.. data:: LONG_MAX
The highest integer that can be stored in a C ``long`` variable. It is used as a marker, for
example in PostgreSQL's ``FetchStmt.howMany``, where it is known as the constant ``FETCH_ALL``.
.. exception:: ParseError
Exception representing the error state returned by the parser.
.. exception:: DeparseError
Exception representing the error state returned by the deparser.
.. class:: Displacements(string)
Helper class used to find the index of a Unicode character from its offset in the
corresponding UTF-8 encoded array.
Example:
.. doctest::
>>> from pglast.parser import Displacements
>>> unicode = '€ 0.01'
>>> utf8 = unicode.encode('utf-8')
>>> d = Displacements(unicode)
>>> for offset in range(len(utf8)):
... idx = d(offset)
... print(f'{offset} [{utf8[offset]:2x}] -> {idx} [{unicode[idx]}]')
...
0 [e2] -> 0 [€]
1 [82] -> 0 [€]
2 [ac] -> 0 [€]
3 [20] -> 1 [ ]
4 [30] -> 2 [0]
5 [2e] -> 3 [.]
6 [30] -> 4 [0]
7 [31] -> 5 [1]
The underlying ``libpg_query`` library operates on ``UTF-8`` strings: its parser functions
emit tokens with a ``location``, which is actually the offset within the ``UTF-8``
representation of the statement. With this class you can fix up those offsets, as in the
following example:
.. doctest::
>>> import json
>>> from pglast.parser import parse_sql_json
>>> stmt = 'select alias.bar as alìbàbà from foo as alias'
>>> parsed = json.loads(parse_sql_json(stmt))
>>> select = parsed['stmts'][0]['stmt']['SelectStmt']
>>> rangevar = select['fromClause'][0]['RangeVar']
>>> loc = rangevar['location']
>>> print(stmt[loc:loc+3])
as
>>> d = Displacements(stmt)
>>> adjloc = d(loc)
>>> print(stmt[adjloc:adjloc+3])
foo
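To make the mapping concrete, here is a pure-Python sketch of the same idea: record the byte offset at which each character starts, then use a binary search to find the character that owns a given byte. The ``make_offset_mapper`` function is hypothetical and is not how ``Displacements`` is implemented; it only illustrates the offset-to-index translation.

```python
from bisect import bisect_right


def make_offset_mapper(text):
    """Return a function mapping a UTF-8 byte offset to a character index.

    Hypothetical helper written for illustration; it is not pglast code.
    """
    # starts[i] is the byte offset at which character i begins
    starts = []
    offset = 0
    for char in text:
        starts.append(offset)
        offset += len(char.encode('utf-8'))

    def to_index(byte_offset):
        # The last character starting at or before byte_offset owns that byte
        return bisect_right(starts, byte_offset) - 1

    return to_index


stmt = 'select alias.bar as alìbàbà from foo as alias'
to_index = make_offset_mapper(stmt)

# Byte offset of "foo" in the UTF-8 encoding, computed here without the parser
loc = stmt.encode('utf-8').index(b'foo')
idx = to_index(loc)
print(stmt[idx:idx + 3])
```

Building the table once and searching it keeps each lookup logarithmic in the length of the statement, which matters when many token locations have to be adjusted.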
.. function:: deparse_protobuf(buffer)
:param bytes buffer: a ``Protobuf`` buffer
:returns: str
Return the ``SQL`` statement corresponding to the given `buffer` argument, typically the
output of :func:`parse_sql_protobuf()`.
.. function:: fingerprint(query)
:param str query: The SQL statement
:returns: str
Fingerprint the given `query`, a string with the ``SQL`` statement(s), and return a
hash digest that can identify similar queries. For similar queries that are different
only because of the queried object or formatting, the returned digest will be the same.
.. function:: get_postgresql_version()
:returns: a tuple
Return the PostgreSQL version as a tuple (`major`, `minor`, `patch`).
.. function:: parse_sql(query)
:param str query: The SQL statement
:returns: tuple
Parse the given `query`, a string with the ``SQL`` statement(s), and return the
corresponding *parse tree* as a tuple of :class:`pglast.ast.RawStmt` instances.
.. function:: parse_sql_json(query)
:param str query: The SQL statement
:returns: str
Parse the given `query`, a string with the ``SQL`` statement(s), and return the
``libpg_query``\ 's ``JSON``\ -serialized parse tree.
.. function:: parse_sql_protobuf(query)
:param str query: The SQL statement
:returns: bytes
Parse the given `query`, a string with the ``SQL`` statement(s), and return the
``libpg_query``\ 's ``Protobuf``\ -serialized parse tree.
.. function:: parse_plpgsql_json(query)
:param str query: The PL/pgSQL statement
:returns: str
Parse the given `query`, a string with the ``plpgsql`` statement(s), and return the
``libpg_query``\ 's ``JSON``\ -serialized parse tree.
.. function:: scan(query)
:param str query: The SQL statement
:returns: sequence of tuples
Split the given `query` into its *tokens*. Each token is a `namedtuple` with the following
slots:
start : int
the index of the first character of the token
end : int
the index of the last character of the token (inclusive)
name : str
the name of the token
kind : str
the kind of the token
Example:
.. doctest::
>>> from pglast.parser import scan
>>> stmt = 'select bar as alìbàbà from foo'
>>> tokens = scan(stmt)
>>> print(tokens[0])
Token(start=0, end=5, name='SELECT', kind='RESERVED_KEYWORD')
>>> print([stmt[t.start:t.end+1] for t in tokens])
['select', 'bar', 'as', 'alìbàbà', 'from', 'foo']
.. function:: split(query, with_parser=True, only_slices=False)
:param str query: The SQL statement
:param bool with_parser: Whether to use the parser or the scanner
:param bool only_slices: Return slices instead of statement's text
:returns: tuple
Split the given `query` string into a sequence of its single ``SQL`` statements.
By default this uses the *parser* to perform the job; when `with_parser` is ``False``
the *scanner* variant is used, which is preferable when the statements may contain parse errors.
When `only_slices` is ``True``, return a sequence of :class:`slice` instances, one for each
statement, instead of the statements' text.
.. note:: Leading and trailing whitespace is removed from the statements.
Example:
.. doctest::
>>> from pglast.parser import split
>>> split('select 1 for; select 2')
Traceback (most recent call last):
...
pglast.parser.ParseError: syntax error at or near ";", at index 12
>>> split('select 1 for; select 2', with_parser=False)
('select 1 for', 'select 2')
>>> stmts = "select 'fòò'; select 'bàr'"
>>> print([stmts[r] for r in split(stmts, only_slices=True)])
["select 'fòò'", "select 'bàr'"]
__ http://cython.org/