.. image:: https://github.com/volans-/gjson-py/actions/workflows/run-tox.yaml/badge.svg
   :alt: CI results
   :target: https://github.com/volans-/gjson-py/actions/workflows/run-tox.yaml

Introduction
============

gjson-py is a Python package that provides a simple way to filter and extract data from JSON-like objects or JSON
files, using the `GJSON`_ syntax.

Within the limits imposed by the differences between the two languages, and with some limitations, it is the Python
equivalent of the Go `GJSON`_ package.
The main difference from GJSON is that gjson-py doesn't work directly on JSON strings but on
JSON-like Python objects: either the object returned by ``json.load()`` or ``json.loads()``,
or any Python object that is JSON-serializable.
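
As a rough illustration of what "JSON-serializable" means here, the following sketch uses only the standard library to
check whether an object can be encoded by ``json.dumps()``. The helper name ``is_json_serializable`` is hypothetical
and not part of gjson-py, which may apply its own checks:

.. code-block:: python

    import json


    def is_json_serializable(obj):
        """Return True if json.dumps() can encode obj (hypothetical helper)."""
        try:
            json.dumps(obj)
        except (TypeError, ValueError):
            return False
        return True


    # The result of json.loads() is always made of JSON-compatible types.
    parsed = json.loads('{"name": {"first": "Tom"}, "age": 37}')
    print(is_json_serializable(parsed))  # True

    # Plain dicts, lists, strings, numbers, booleans and None qualify too.
    print(is_json_serializable({'tags': ['a', 'b'], 'active': True}))  # True

    # A set is not JSON-serializable by default.
    print(is_json_serializable({'tags': {'a', 'b'}}))  # False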

A detailed list of the GJSON features supported by gjson-py is provided below.

See also the full `gjson-py documentation`_.

Installation
------------

gjson-py is available on the `Python Package Index`_ (PyPI) and can be easily installed with::

    pip install gjson

It's also available as a Debian package (`python3-gjson`_) on Debian systems starting from Debian 12 (*bookworm*) and
can be installed with::

    apt-get install python3-gjson

A ``.deb`` package for the current stable and unstable Debian versions is also available for download on the
`releases page on GitHub`_.

How to use the library
----------------------

gjson-py provides different ways to perform queries on JSON-like objects.

``gjson.get()``
^^^^^^^^^^^^^^^

A quick accessor to GJSON functionalities exposed for simplicity of use. Particularly useful to perform a single
query on a given object::

    >>> import gjson
    >>> data = {'name': {'first': 'Tom', 'last': 'Anderson'}, 'age': 37}
    >>> gjson.get(data, 'name.first')
    'Tom'

It's also possible to make it return a JSON-encoded string, and to choose whether on failure it should raise an
exception or return ``None``. See the full API documentation for more details.

``GJSON`` class
^^^^^^^^^^^^^^^

The ``GJSON`` class provides full access to the gjson-py API, allowing multiple queries to be performed on the same
object::

    >>> import gjson
    >>> data = {'name': {'first': 'Tom', 'last': 'Anderson'}, 'age': 37}
    >>> source = gjson.GJSON(data)
    >>> source.get('name.first')
    'Tom'
    >>> str(source)
    '{"name": {"first": "Tom", "last": "Anderson"}, "age": 37}'
    >>> source.getj('name.first')
    '"Tom"'
    >>> name = source.get_gjson('name')
    >>> name.get('first')
    'Tom'
    >>> name
    <gjson.GJSON object at 0x102735b20>

See the full API documentation for more details.

How to use the CLI
------------------

gjson-py also provides a command line interface (CLI) for ease of use:

.. code-block:: console

    $ echo '{"name": {"first": "Tom", "last": "Anderson"}, "age": 37}' > test.json
    $ cat test.json | gjson 'name.first'  # Read from stdin
    "Tom"
    $ gjson test.json 'age'  # Read from a file
    37
    $ cat test.json | gjson - 'name.first'  # Explicitly read from stdin
    "Tom"

JSON Lines
^^^^^^^^^^

JSON Lines support in the CLI allows for different use cases. All the examples in this section operate on a
``test.json`` file generated with:

.. code-block:: console

    $ echo -e '{"name": "Gilbert", "age": 61}\n{"name": "Alexa", "age": 34}\n{"name": "May", "age": 57}' > test.json

Apply the same query to each line
"""""""""""""""""""""""""""""""""

Using the ``-l/--lines`` CLI argument, for each input line gjson-py applies the query and filters the data according
to it. Lines are read one by one so there is no memory overhead for the processing. It can be used while tailing log
files in JSON format for example.


.. code-block:: console

    $ gjson --lines test.json 'age'
    61
    34
    57
    $ tail -f log.json | gjson --lines 'bytes_sent'  # Dummy example

Encapsulate all lines in an array, then apply the query
"""""""""""""""""""""""""""""""""""""""""""""""""""""""

Using the special query prefix syntax ``..``, as described in GJSON's documentation for `JSON Lines`_, gjson-py will
read all lines from the input and encapsulate them into an array. This approach has of course the memory overhead of
loading the whole input to perform the query.

.. code-block:: console

    $ gjson test.json '..#.name'
    ["Gilbert", "Alexa", "May"]

Filter lines based on their values
""""""""""""""""""""""""""""""""""

By combining the ``-l/--lines`` CLI argument with the special query prefix ``..`` described above, it's possible to
filter input lines based on their values. In this case gjson-py encapsulates each line in an array so that it is
possible to use the `Queries`_ GJSON syntax to filter them. As the encapsulation is performed on each line, there is no
memory overhead. A line is filtered out when the query doesn't match it, which technically is a failed query on that
line: for this reason the final exit code will be ``1`` if any line is filtered out.

.. code-block:: console

    $ gjson --lines test.json '..#(age>40).name'
    "Gilbert"
    "May"

Filter lines and apply query to the result
""""""""""""""""""""""""""""""""""""""""""

By combining the methods above it is possible, for example, to filter/extract data from the lines first and then apply
a query to the aggregated result. The memory overhead in this case depends on the amount of data resulting from the
first filtering/extraction.

.. code-block:: console

    $ gjson --lines test.json 'age' | gjson '..@sort'
    [34, 57, 61]
    $ gjson --lines test.json '..#(age>40).age' | gjson '..@sort'
    [57, 61]
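
For library users, the two approaches above (processing line by line versus encapsulating all lines in a list) can be
approximated in plain Python before handing the data to gjson-py. This is only a sketch using the standard library:
``iter_json_lines`` is a hypothetical helper, not part of gjson-py, and the list comprehensions merely stand in for
the actual queries:

.. code-block:: python

    import io
    import json


    def iter_json_lines(stream):
        """Yield one parsed object per non-empty line of a JSON Lines stream."""
        for line in stream:
            line = line.strip()
            if line:
                yield json.loads(line)


    raw = (
        '{"name": "Gilbert", "age": 61}\n'
        '{"name": "Alexa", "age": 34}\n'
        '{"name": "May", "age": 57}\n'
    )

    # Line by line: each line is parsed and processed on its own.
    ages = [row['age'] for row in iter_json_lines(io.StringIO(raw))]
    print(ages)  # [61, 34, 57]

    # Encapsulated: the whole input is loaded into a single list first.
    rows = list(iter_json_lines(io.StringIO(raw)))
    names = [row['name'] for row in rows if row['age'] > 40]
    print(names)  # ['Gilbert', 'May']

Each object yielded by ``iter_json_lines``, as well as the ``rows`` list as a whole, is a JSON-like Python object and
could therefore be passed to ``gjson.get()``.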

Query syntax
------------

For the generic query syntax refer to the original `GJSON Path Syntax`_ documentation.

Supported GJSON features
^^^^^^^^^^^^^^^^^^^^^^^^

This is the list of GJSON features and how they are supported by gjson-py:


+------------------------+------------------------+------------------------------------------------------+
| GJSON feature          | Supported by gjson-py  | Notes                                                |
+========================+========================+======================================================+
| `Path Structure`_      | YES                    |                                                      |
+------------------------+------------------------+------------------------------------------------------+
| `Basic`_               | YES                    |                                                      |
+------------------------+------------------------+------------------------------------------------------+
| `Wildcards`_           | YES                    |                                                      |
+------------------------+------------------------+------------------------------------------------------+
| `Escape Character`_    | YES                    |                                                      |
+------------------------+------------------------+------------------------------------------------------+
| `Arrays`_              | YES                    |                                                      |
+------------------------+------------------------+------------------------------------------------------+
| `Queries`_             | YES                    | Using Python's operators [#]_ [#]_                   |
+------------------------+------------------------+------------------------------------------------------+
| `Dot vs Pipe`_         | YES                    |                                                      |
+------------------------+------------------------+------------------------------------------------------+
| `Modifiers`_           | YES                    | See the table below for all the details              |
+------------------------+------------------------+------------------------------------------------------+
| `Modifier arguments`_  | YES                    | Only a JSON object is accepted as argument           |
+------------------------+------------------------+------------------------------------------------------+
| `Custom modifiers`_    | YES                    | Only a JSON object is accepted as argument [#]_      |
+------------------------+------------------------+------------------------------------------------------+
| `Multipaths`_          | YES                    | Object keys, if specified, must be JSON strings [#]_ |
+------------------------+------------------------+------------------------------------------------------+
| `Literals`_            | YES                    | Including infinite and NaN values [#]_               |
+------------------------+------------------------+------------------------------------------------------+
| `JSON Lines`_          | YES                    | CLI support [#]_ [#]_                                |
+------------------------+------------------------+------------------------------------------------------+

.. [#] The matching of queries is based on Python's operators and as such the results might differ from those of
   the Go GJSON package, in particular for the ``~`` operator that checks the truthiness of objects.
.. [#] When using nested queries, only the outermost one controls whether to return only the first item or all items.
.. [#] Custom modifiers names cannot contain reserved characters used by the GJSON grammar.
.. [#] For example ``{"years":age}`` is valid while ``{years:age}`` is not, although that's valid in GJSON.
.. [#] Those special cases are handled according to `Python's JSON documentation`_.
.. [#] Both for applying the same query to each line using the ``-l/--lines`` argument and to automatically encapsulate
   the input lines in a list and apply the query to the list using the ``..`` special query prefix described in
   `JSON Lines`_.
.. [#] Library support is not currently present because gjson-py accepts only Python objects, making it impossible to
   pass JSON Lines directly. The caller is free to either call gjson-py for each line or encapsulate the lines in
   a list before calling gjson-py.
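
The behaviour of Python's ``json`` module for the infinite and NaN values mentioned in the footnote above can be seen
with the standard library alone. This sketch exercises only the ``json`` module, not gjson-py itself:

.. code-block:: python

    import json
    import math

    # Python's json module accepts the non-standard NaN/Infinity literals...
    values = json.loads('[NaN, Infinity, -Infinity]')
    print(math.isnan(values[0]), values[1], values[2])  # True inf -inf

    # ...and emits them again when serializing.
    encoded = json.dumps([float('nan'), float('inf'), float('-inf')])
    print(encoded)  # [NaN, Infinity, -Infinity]

    # Strictly compliant JSON output can be enforced with allow_nan=False.
    try:
        json.dumps(float('inf'), allow_nan=False)
    except ValueError as error:
        print('rejected:', error)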

This is the list of modifiers present in GJSON and how they are supported by gjson-py:

+----------------+-----------------------+------------------------------------------+
| GJSON Modifier | Supported by gjson-py | Notes                                    |
+================+=======================+==========================================+
| ``@reverse``   | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@ugly``      | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@pretty``    | PARTIALLY             | The ``width`` argument is not supported  |
+----------------+-----------------------+------------------------------------------+
| ``@this``      | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@valid``     | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@flatten``   | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@join``      | PARTIALLY             | Preserving duplicate keys not supported  |
+----------------+-----------------------+------------------------------------------+
| ``@keys``      | YES                   | Valid only on JSON objects (mappings)    |
+----------------+-----------------------+------------------------------------------+
| ``@values``    | YES                   | Valid only on JSON objects (mappings)    |
+----------------+-----------------------+------------------------------------------+
| ``@tostr``     | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@fromstr``   | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+
| ``@group``     | YES                   |                                          |
+----------------+-----------------------+------------------------------------------+


Additional features
^^^^^^^^^^^^^^^^^^^


Additional modifiers
""""""""""""""""""""

This is the list of additional modifiers specific to gjson-py not present in GJSON:

* ``@ascii``: escapes all non-ASCII characters when printing/returning the string representation of the object,
  ensuring that the output is made only of ASCII characters. It's implemented using the ``ensure_ascii`` argument of
  Python's ``json`` module. This modifier doesn't accept any arguments.
* ``@sort``: sorts a mapping object by its keys or a sequence object by its values. This modifier doesn't accept any
  arguments.
* ``@top_n``: given a sequence object, groups the items in the sequence, counting how many occurrences of each value
  are present. It returns a mapping object where the keys are the distinct values of the list and the values are the
  number of times the key was present in the list, ordered from the most common to the least common item. The items in
  the original sequence object must be Python hashable. This modifier accepts an optional argument ``n`` to return just
  the N items with the highest counts. When the ``n`` argument is not provided all items are returned. Example usage:

  .. code-block:: console

    $ echo '["a", "b", "c", "b", "c", "c"]' | gjson '@top_n'
    {"c": 3, "b": 2, "a": 1}
    $ echo '["a", "b", "c", "b", "c", "c"]' | gjson '@top_n:{"n":2}'
    {"c": 3, "b": 2}

* ``@sum_n``: given a sequence of objects, groups the items in the sequence using a grouping key and sums the values
  of the provided sum key. It returns a mapping object where the keys are the distinct values of the grouping key and
  the values are the sums of all the values of the sum key for each distinct grouped key, ordered from the highest sum
  to the lowest. The values of the grouping key must be Python hashable. The values of the sum key must be integers or
  floats. This modifier requires two mandatory arguments, ``group`` and ``sum``, that have as values the respective
  keys in the objects of the sequence. An optional ``n`` argument is also accepted to return just the top N items with
  the highest sum. Example usage:

  .. code-block:: console

    $ echo '[{"key": "a", "time": 1}, {"key": "b", "time": 2}, {"key": "c", "time": 3}, {"key": "a", "time": 4}]' > test.json
    $ gjson test.json '@sum_n:{"group": "key", "sum": "time"}'
    {"a": 5, "c": 3, "b": 2}
    $ gjson test.json '@sum_n:{"group": "key", "sum": "time", "n": 2}'
    {"a": 5, "c": 3}
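
The semantics of ``@top_n`` and ``@sum_n`` described above can be approximated in plain Python. This is a minimal
sketch of the documented behaviour, not gjson-py's actual implementation: the function names ``top_n`` and ``sum_n``
are hypothetical, and the ordering among items with equal counts or sums is an assumption of the sketch:

.. code-block:: python

    from collections import Counter


    def top_n(items, n=None):
        """Count the occurrences of each value, most common first."""
        return dict(Counter(items).most_common(n))


    def sum_n(items, group, sum_key, n=None):
        """Sum the sum_key values per distinct group value, highest sum first."""
        totals = Counter()
        for item in items:
            totals[item[group]] += item[sum_key]
        return dict(totals.most_common(n))


    letters = ['a', 'b', 'c', 'b', 'c', 'c']
    print(top_n(letters))       # {'c': 3, 'b': 2, 'a': 1}
    print(top_n(letters, n=2))  # {'c': 3, 'b': 2}

    records = [
        {'key': 'a', 'time': 1},
        {'key': 'b', 'time': 2},
        {'key': 'c', 'time': 3},
        {'key': 'a', 'time': 4},
    ]
    print(sum_n(records, 'key', 'time'))       # {'a': 5, 'c': 3, 'b': 2}
    print(sum_n(records, 'key', 'time', n=2))  # {'a': 5, 'c': 3}

Both helpers rely on ``Counter.most_common()``, which returns the entries sorted from the highest to the lowest value,
and on regular dictionaries preserving insertion order.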

.. _`GJSON`: https://github.com/tidwall/gjson
.. _`Python Package Index`: https://pypi.org/project/gjson/
.. _`GJSON Path Syntax`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md
.. _`gjson-py documentation`: https://volans-.github.io/gjson-py/index.html
.. _`releases page on GitHub`: https://github.com/volans-/gjson-py/releases
.. _`Python's JSON documentation`: https://docs.python.org/3/library/json.html#infinite-and-nan-number-values
.. _`python3-gjson`: https://packages.debian.org/sid/python3-gjson

.. _`Path Structure`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#path-structure
.. _`Basic`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#basic
.. _`Wildcards`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#wildcards
.. _`Escape Character`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#escape-character
.. _`Arrays`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#arrays
.. _`Queries`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#queries
.. _`Dot vs Pipe`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#dot-vs-pipe
.. _`Modifiers`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#modifiers
.. _`Modifier arguments`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#modifiers
.. _`Custom modifiers`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#custom-modifiers
.. _`Multipaths`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#multipaths
.. _`Literals`: https://github.com/tidwall/gjson/blob/master/SYNTAX.md#literals
.. _`JSON Lines`: https://github.com/tidwall/gjson#json-lines