File: grammar.rst

package info (click to toggle)
python-omegaconf 2.3.0-3
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 5,056 kB
  • sloc: python: 26,412; makefile: 42; sh: 11
file content (250 lines) | stat: -rw-r--r-- 9,854 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
=====================
The OmegaConf grammar
=====================

.. contents::
   :local:

.. testsetup:: *

    from omegaconf import OmegaConf

OmegaConf uses an `ANTLR <https://www.antlr.org/>`_-based grammar to parse string expressions,
where the `lexer rules <https://github.com/omry/omegaconf/blob/master/omegaconf/grammar/OmegaConfGrammarLexer.g4>`_
rules define the tokens used by the `parser rules <https://github.com/omry/omegaconf/blob/master/omegaconf/grammar/OmegaConfGrammarParser.g4>`_.
Currently this grammar's main usage is in the parsing of :ref:`interpolations<interpolation>`, detailed below.


.. _interpolation-strings:

Interpolation strings
^^^^^^^^^^^^^^^^^^^^^

An interpolation string is any string containing the ``${`` character sequence (denoting the start of an interpolation),
and is parsed using the ``text`` rule of the grammar:

    .. code-block:: antlr

        text: (interpolation |
               ANY_STR | ESC | ESC_INTER | TOP_ESC | QUOTED_ESC)+;

Such a string can either be a single interpolation, or the concatenation of multiple fragments
that can either be interpolations or regular strings
(with a special handling of escaped characters, see :ref:`escaping-in-interpolation-strings` below).
These are all examples of interpolation strings:

    - ``${foo.bar}``
    - ``https://${host}:${port}``
    - ``Hello ${name}``
    - ``${a}${oc.env:B}${c}``


Interpolation types
^^^^^^^^^^^^^^^^^^^

An ``interpolation`` as found in the rule above can either be a :ref:`config-node-interpolation`
(e.g., ``${host}``) or a call to a :ref:`resolver<resolvers>` (e.g., ``${oc.env:B}``).
This is reflected in the following parser rules:

    .. code-block:: antlr

        interpolation: interpolationNode | interpolationResolver;

        interpolationNode:
            INTER_OPEN  // ${
            DOT* 
            (configKey | BRACKET_OPEN configKey BRACKET_CLOSE)
            (DOT configKey | BRACKET_OPEN configKey BRACKET_CLOSE)*
            INTER_CLOSE;  // }

        interpolationResolver:
            INTER_OPEN  // ${
            resolverName COLON sequence?
            BRACE_CLOSE;  // }

The following are all valid examples of config node interpolations according to the ``interpolationNode`` rule
(note in particular that it supports both dot and bracket notations to access child nodes):

    - ``${host}``
    - ``${.sibling}``
    - ``${..uncle.cousin}``
    - ``${some_list[3]}``
    - ``${some_deep_dict[key1][subkey2].subsubkey3}``

Here are also examples of resolver calls from the ``interpolationResolver`` rule:

    - ``${oc.env:B}``
    - ``${my_resolver_without_args:}``
    - ``${oc.select: missing, default}``

Resolver arguments must be provided in a comma-separated list as per the following
``sequence`` parser rule:

    .. code-block:: antlr

        sequence: (element (COMMA element?)*) | (COMMA element?)+;

*Note that this rule currently supports empty arguments to preserve backward compatibility
with OmegaConf 2.0, but this has been deprecated (see* `#572 <https://github.com/omry/omegaconf/issues/572>`_ *).*


.. _element-types:

Element types
^^^^^^^^^^^^^

As seen in the ``sequence`` rule above, each resolver argument is parsed by an ``element`` rule,
which currently supports four main types of arguments:

    .. code-block:: antlr

        element:
            quotedValue
            | listContainer
            | dictContainer
            | primitive
        ;

A ``quotedValue`` is a quoted string that may contain basically anything in-between either double or single quotes
(including interpolations, which will be resolved at evaluation time).
For instance:

    - ``"Hello World!"``
    - ``'Hello ${name}!'``
    - ``"I ${can: ${nest}, ${interpolations}, 'and quotes'}"``

The ``quotedValue`` parser rule is formally defined as:

    .. code-block:: antlr

        quotedValue:
            (QUOTE_OPEN_SINGLE | QUOTE_OPEN_DOUBLE)
            text?
            MATCHING_QUOTE_CLOSE;


``listContainer`` and ``dictContainer`` are respectively lists and dictionaries, using a familiar syntax:

    - List examples: ``[]``, ``[1, 2, 3]``, ``[${a}, ${oc.env:B}, c]``
    - Dict examples: ``{}``, ``{a: 1, b: 2}``, ``{a: ${a}, b: ${oc.env:B}}``

Their corresponding parser rules are:

    .. code-block:: antlr

        listContainer: BRACKET_OPEN sequence? BRACKET_CLOSE;
        dictContainer: BRACE_OPEN
                       (dictKeyValuePair (COMMA dictKeyValuePair)*)?
                       BRACE_CLOSE;

Regarding dictionaries, note that although values can be any ``element``, keys are more
restricted, and in particular quoted strings and interpolations are currently *not* allowed as
dictionary keys (see the definition of ``dictKey`` in the `grammar <https://github.com/omry/omegaconf/blob/master/omegaconf/grammar/OmegaConfGrammarParser.g4>`_).

Finally, a ``primitive`` is everything else that is allowed, including in particular (see the `full grammar <https://github.com/omry/omegaconf/blob/master/omegaconf/grammar/OmegaConfGrammarParser.g4>`_
for details):

    - Unquoted strings (that support only a subset of characters, contrary to quoted ones): ``foo``, ``foo_bar``, ``hello world 123``
    - Integer numbers: ``123``, ``-5``, ``+1_000_000``
    - Floating point numbers (with special case-independent keywords for infinity and NaN): ``0.1``, ``1e-3``, ``inf``, ``-INF``, ``nan``
    - Other special keywords (also case-independent): ``null``, ``true``, ``false``, ``NULL``, ``True``, ``fAlSe``.
      **IMPORTANT**: ``None`` is *not* a special keyword and will be parsed as an unquoted string, you must
      use the ``null`` keyword instead (as in YAML).
    - Interpolations (thus allowing for nested interpolations)


Escaped characters
^^^^^^^^^^^^^^^^^^

Some characters need to be escaped, with varying escaping requirements depending on the situation.
In general, however, you can use the following rule of thumb:
*you only need to escape characters that otherwise have a special meaning in the current context*.

.. _escaping-in-interpolation-strings:

Escaping in interpolation strings
+++++++++++++++++++++++++++++++++

In order to define fields whose value is an interpolation-like string, interpolations can be escaped with ``\${``.
For instance:

.. doctest::

    >>> c = OmegaConf.create({"path": r"\${dir}", "dir": "tmp"})
    >>> print(c.path)  # does *not* interpolate into the `dir` node
    ${dir}

If you actually want to follow a ``\`` with a resolved interpolation, this backslash
needs to be escaped into ``\\`` to differentiate it from an escaped interpolation:

.. doctest::

    >>> c = OmegaConf.create({"path": r"C:\\${dir}", "dir": "tmp"})
    >>> print(c.path)  # *does* interpolate into the `dir` node
    C:\tmp

Note that we use Python raw strings here to make code
more readable -- otherwise all ``\`` characters would need be duplicated due to how Python handles
escaping in regular string literals.

Finally, since the ``\`` character has no special meaning unless followed by ``${``,
it does *not* need to be escaped anywhere else:

.. doctest::

    >>> c = OmegaConf.create({"path": r"C:\foo_${dir}", "dir": "tmp"})
    >>> print(c.path)  # a single \ is preserved...
    C:\foo_tmp
    >>> c = OmegaConf.create({"path": r"C:\\foo_${dir}", "dir": "tmp"})
    >>> print(c.path)  # ... and multiple \\ too (no escape sequence)
    C:\\foo_tmp

Escaping in unquoted strings
++++++++++++++++++++++++++++

Unquoted strings can be found in a number of contexts, including dictionary keys/values,
list elements, etc. As a result, the  escape sequences are used for some
special characters
(``\\``, ``\[``, ``\]``, ``\{``, ``\}``, ``\(``, ``\)``, ``\:``, ``\=``, ``\,``),
for instance:

    - ``C\:\\$\{dir\}`` resolves to the string ``"C:\${dir}"``
    - ``\[a\, b\, c\]`` resolves to the string ``"[a, b, c]"``

In addition, leading and trailing whitespaces must be escaped in unquoted strings
if we do not want them to be stripped (while inner whitespaces are always preserved):

.. doctest::

    >>> c = OmegaConf.create({"esc": r"${oc.decode: \ hi u \  }"})
    >>> c.esc  # one leading whitespace and two trailing ones
    ' hi u  '
    >>> # Tabs are handled similarly (NB: r-strings can't be used below)
    >>> c = OmegaConf.create({"esc": "${oc.decode:\t\\\thi u\t\\\t\t}"})
    >>> c.esc  # one leading tab and two trailing ones
    '\thi u\t\t'

Escaping in unquoted strings can lead to hard-to-read expressions, and it is recommended
to switch to quoted strings instead of relying heavily on the above escape sequences.

Escaping in quoted strings
++++++++++++++++++++++++++

As can be seen from the definition of the ``quotedValue`` parser rule above, quoted strings
are just ``text`` fragments surrounded by quotes, and are thus very similar to :ref:`interpolation-strings`.
As a result, the ``\${`` escape sequence can also be used to escape interpolations
in quoted strings (as described in :ref:`escaping-in-interpolation-strings`):

    - ``"\${dir}"`` resolves to the string ``"${dir}"``
    - ``"C:\\${dir}"`` resolves to the string ``"C:\<value of dir>"``

However, one key difference with interpolation strings is that quotes of the same type
as the enclosing quotes must be escaped, unless they are within a nested interpolation.
For instance:

    - ``'\'Hi you\', I said'`` resolves to the string ``"'Hi you', I said"``
    - ``"'Hi ${concat: 'y', "o", u}', I said"`` also resolves to the string ``"'Hi you', I said"``
      if ``concat`` is a :doc:`custom resolver<custom_resolvers>` concatenating its inputs.
      The main point to pay attention to in this example is that the quoted strings ``'y'`` and
      ``"o"`` found within the resolver interpolation ``${concat: ...}`` do *not* need to be
      escaped, regardless of existing quotes outside of this interpolation.