========================================
Tutorial - Adding a GBytes Marshaler
========================================

This tutorial is based on a request for adding a Python bytes to GBytes argument
marshaler for introspected functions (`GnomeBug 729541 <http://bugzilla.gnome.org/show_bug.cgi?id=729541>`_).

Working with GI Interactively
=============================

To get started, we can interactively inspect a GI function which takes a GBytes argument
to determine what the argument is in terms of GI (ipython is very helpful here).
PyGI exposes GI functions as custom callable objects which also implement the
`GIFunctionInfo API <https://docs.gtk.org/girepository/class.FunctionInfo.html>`_ and its base classes:

.. code-block:: python

    >>> from gi.repository import Gio
    >>> Gio.MemoryInputStream.new_from_bytes
    gi.FunctionInfo(new_from_bytes)

Get the GIArgInfo:

.. code-block:: python

    >>> Gio.MemoryInputStream.new_from_bytes.get_arguments()
    (gi.ArgInfo(bytes),)
    >>> arg, = _
    >>> arg
    gi.ArgInfo(bytes)

Determine the argument type:

.. code-block:: python

    >>> ty = arg.get_type()
    >>> ty
    gi.TypeInfo(type_type_instance)
    >>> ty.get_tag_as_string()
    'interface'

At this point we know the argument type tag is "interface"
(`GI_TYPE_TAG_INTERFACE <https://docs.gtk.org/girepository/enum.TypeTag.html>`_),
so `gi_type_info_get_interface <https://docs.gtk.org/girepository/method.TypeInfo.get_interface.html>`_
should be valid for the `GITypeInfo <https://docs.gtk.org/girepository/class.TypeInfo.html>`_.

.. code-block:: python

    >>> iface = ty.get_interface()
    >>> iface
    StructInfo(Bytes)

In this case get_interface() gives us a
`GIStructInfo <https://docs.gtk.org/girepository/class.StructInfo.html>`_.
We can then verify that a valid GType is available using
`gi_registered_type_info_get_g_type() <https://docs.gtk.org/girepository/method.RegisteredTypeInfo.get_g_type.html>`_
(GIRegisteredTypeInfo is a base class of GIStructInfo):

.. code-block:: python

    >>> gtype = iface.get_g_type()
    >>> gtype
    <GType GBytes (12737104)>

We also have to find the fundamental GType for GBytes:

.. code-block:: python

    >>> gtype.fundamental
    <GType GBoxed (72)>
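
The interactive steps above can be collected into a small helper. This is only a
sketch: describe_first_arg is a hypothetical name, and it relies on nothing beyond
the methods already called in the session above, so it accepts any object that
provides them.

.. code-block:: python

    def describe_first_arg(func_info):
        """Summarize the first argument of an introspected callable.

        Hypothetical helper: it only calls the methods used in the
        interactive session above.
        """
        arg = func_info.get_arguments()[0]
        ty = arg.get_type()
        summary = {"tag": ty.get_tag_as_string(), "gtype": None, "fundamental": None}
        # get_interface() is only meaningful for the "interface" type tag.
        if summary["tag"] == "interface":
            gtype = ty.get_interface().get_g_type()
            summary["gtype"] = gtype
            summary["fundamental"] = gtype.fundamental
        return summary

With PyGObject installed, describe_first_arg(Gio.MemoryInputStream.new_from_bytes)
should report the "interface" tag together with the GBytes and GBoxed GTypes seen above.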

Mapping out the C Code
======================

We now have enough information to correlate to the various switch statements in
the PyGI caching system which will help us place our new marshaling code. Starting
with `pygi-cache.c:pygi_arg_cache_new <https://git.gnome.org/browse/pygobject/tree/gi/pygi-cache.c?id=3.13.1#n345>`_
you can trace through `_arg_cache_new_for_interface <https://git.gnome.org/browse/pygobject/tree/gi/pygi-cache.c?id=3.13.1#n291>`_
and finally land in `pygi-struct-marshal.c:pygi_arg_struct_new_from_info <https://git.gnome.org/browse/pygobject/tree/gi/pygi-struct-marshal.c?id=3.13.1#n489>`_.

For this bug, we are looking to add a "from py" marshaling convenience.
So we could add a new conditional in `_pygi_marshal_from_py_interface_struct <https://git.gnome.org/browse/pygobject/tree/gi/pygi-struct-marshal.c?id=3.13.1#n191>`_ within the G_TYPE_BOXED conditional. However, note this text in the function:

.. code-block:: c

    /* FIXME: handle this large if statement in the cache
     *        and set the correct marshaller
     */

What this means is that _pygi_marshal_from_py_interface_struct dispatches on the
sub-type of GIStructInfo at runtime, for every argument of every call. That is far from
ideal considering we have a whole caching system for marshaling arguments.

Instead, we should create a new from_py_marshaller and from_py_cleanup callback pair
specifically for GBytes arguments, baked in at cache setup time.
This means specializing GBytes as early as possible in
`_arg_cache_from_py_interface_struct_setup <https://git.gnome.org/browse/pygobject/tree/gi/pygi-struct-marshal.c?id=3.13.1#n434>`_
by setting arg_cache->from_py_marshaller and arg_cache->from_py_cleanup.
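
The idea of choosing the callback at cache setup time can be modelled in a few lines
of Python. This is a toy model, not PyGI code: ArgCache, setup_cache, invoke, and the
two marshaler functions are hypothetical stand-ins for the C structures named above.
The type check runs once, when the cache is built, and every later call goes straight
to the selected callback.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Callable, Optional


    def marshal_generic_struct(py_arg):
        # Stand-in for the generic boxed/struct marshaler.
        return ("struct", py_arg)


    def marshal_gbytes(py_arg):
        # Stand-in for a GBytes-specific marshaler.
        return ("gbytes", bytes(py_arg))


    @dataclass
    class ArgCache:
        type_name: str
        from_py_marshaller: Optional[Callable] = None


    def setup_cache(type_name):
        """Pick the marshaler once, at cache setup time."""
        cache = ArgCache(type_name)
        if type_name == "GBytes":
            cache.from_py_marshaller = marshal_gbytes
        else:
            cache.from_py_marshaller = marshal_generic_struct
        return cache


    def invoke(cache, py_arg):
        # No type dispatch here: the callback was chosen during setup.
        return cache.from_py_marshaller(py_arg)

In this model invoke(setup_cache("GBytes"), b"abc") reaches marshal_gbytes directly,
without re-examining the argument type on each call.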

Marshaler Callbacks
===================

The relevant marshaler callbacks are declared in `pygi-cache.h <https://git.gnome.org/browse/pygobject/tree/gi/pygi-cache.h?id=3.13.1#n35>`_,
and we need to implement both PyGIMarshalFromPyFunc and PyGIMarshalCleanupFunc.

.. code-block:: c

    typedef gboolean (*PyGIMarshalFromPyFunc) (PyGIInvokeState   *state,
                                               PyGICallableCache *callable_cache,
                                               PyGIArgCache      *arg_cache,
                                               PyObject          *py_arg,
                                               GIArgument        *arg,
                                               gpointer          *cleanup_data);

    typedef void (*PyGIMarshalCleanupFunc) (PyGIInvokeState *state,
                                            PyGIArgCache    *arg_cache,
                                            PyObject        *py_arg, /* always NULL for to_py cleanup */
                                            gpointer         data,
                                            gboolean         was_processed);

PyGIMarshalFromPyFunc is called for each argument prior to executing the callee. The relevant bits are as follows:

* py_arg - This is the input PyObject the Python caller is passing to the GI function.
  We need to type check this and do a mini dispatch depending on the type
  (PyBytes or buffer protocol check, PyGIBoxed, and Py_None).
* arg - This is the target memory area the marshaler will fill out. In this case arg->v_pointer
  will be assigned a pointer to a GBytes object.
* arg_cache->allow_none - If TRUE, py_arg can be Py_None and arg->v_pointer should be set to
  NULL, returning TRUE from the marshaling callback.
* arg_cache->transfer - Determines how memory should be managed for the argument.
* cleanup_data - This is an output argument that can be set to custom data which is passed back
  to us in the cleanup callback as "data", used for freeing relevant memory after the callee
  returns. In our case this will either be NULL or a GBytes pointer, in which case we should
  call g_bytes_unref() on the data.

PyGIMarshalCleanupFunc is called after the callee finishes, to clean up any temporary data
we created for the duration of the call.
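
The calling convention described above can be sketched as a toy model in Python.
Nothing here is PyGI API: marshal_from_py, cleanup, call_with_marshaling, and the
released list are hypothetical names used only for illustration. The point is the
ordering: marshal the argument, run the callee, then hand cleanup_data back to the
cleanup callback.

.. code-block:: python

    released = []  # records what the cleanup callback "freed"


    def marshal_from_py(py_arg, allow_none):
        """Return (ok, value, cleanup_data), mirroring the C callback's outputs."""
        if py_arg is None:
            # None is only acceptable when the argument allows it.
            return (allow_none, None, None)
        if isinstance(py_arg, bytes):
            temp = ("temp-gbytes", py_arg)  # temporary wrapper for this call
            return (True, temp, temp)       # cleanup must release the wrapper
        return (False, None, None)          # unsupported type


    def cleanup(data, was_processed):
        # Only release data for arguments that were actually marshaled.
        if was_processed and data is not None:
            released.append(data)


    def call_with_marshaling(callee, py_arg, allow_none=False):
        ok, value, cleanup_data = marshal_from_py(py_arg, allow_none)
        if not ok:
            raise TypeError("cannot marshal argument")
        try:
            return callee(value)
        finally:
            # Runs after the callee returns, even if it raised.
            cleanup(cleanup_data, was_processed=True)

Passing a bytes object creates a temporary that is released once the callee returns,
while passing None with allow_none set leaves nothing to clean up.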

Transfer Semantics
==================

A py_arg input of type PyGIBoxed is a direct wrapping of an existing GBoxed. This is a fairly
simple case to deal with: we just need to extract the boxed pointer (pyg_boxed_get) and assign
it to arg->v_pointer. For GI_TRANSFER_EVERYTHING we also need to add a reference that the callee
can own by calling g_bytes_ref on this pointer.

In the case where we are passed a PyBytes (or Python object implementing the buffer protocol),
we need to create a new GBytes which holds a pointer to the PyBytes data. Zero copy can easily
be achieved when transfer is GI_TRANSFER_NOTHING because a read-only buffer can be retrieved
from Python and passed to the GBytes constructor (without a free_func). We know the lifetime
of the PyBytes is valid at least until the callee completes. The trick here is we also need to
set *cleanup_data* to the newly created GBytes so our cleanup callback can free the GBytes.
Since we didn't set a free_func when constructing the GBytes, calling g_bytes_unref will not
touch our Python owned data.

For converting a PyBytes with transfer mode GI_TRANSFER_EVERYTHING, we follow the same
approach as above with some extra steps. Since the callee intends to own the GBytes
we pass it, we must pass something which is guaranteed to survive after our Python function
returns (it must still exist after our cleanup callback). The easiest technique is to memcpy
the PyBytes data into a new buffer and construct a GBytes using g_bytes_new_with_free_func, with
a free_func of g_free and user_data pointing at the copied buffer. There is no need to set
cleanup_data because the C callee owns everything.
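
The cases discussed so far can be summarized as a small decision function. This is a
sketch in Python rather than the C implementation, and plan_gbytes_marshal and its string
labels are hypothetical. Only the GLib function names and the transfer modes come from
the text above.

.. code-block:: python

    def plan_gbytes_marshal(py_kind, transfer):
        """Return (action, needs_cleanup) for one input kind and transfer mode.

        py_kind: "boxed" for an existing GBytes wrapper, "bytes" for a PyBytes.
        transfer: "nothing" or "everything" (the GI_TRANSFER_* modes discussed).
        """
        if transfer not in ("nothing", "everything"):
            raise ValueError("unsupported transfer mode: %r" % (transfer,))
        if py_kind == "boxed":
            if transfer == "everything":
                # The callee takes ownership, so give it its own reference.
                return ("g_bytes_ref the wrapped pointer", False)
            return ("pass the wrapped pointer", False)
        if py_kind == "bytes":
            if transfer == "nothing":
                # Zero copy: the PyBytes outlives the call and the cleanup
                # callback unrefs the temporary GBytes.
                return ("wrap the buffer without a free_func", True)
            # The data must outlive the Python call, so hand over a copy.
            return ("copy the buffer and free it with g_free", False)
        raise TypeError("unsupported argument kind: %r" % (py_kind,))

Only the zero copy PyBytes case with GI_TRANSFER_NOTHING needs cleanup_data. In every
other case either nothing temporary is created or the callee owns the result.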

It is also possible to achieve zero copy with PyBytes and GI_TRANSFER_EVERYTHING by creating a custom
free_func which calls Py_DECREF. The marshaler must take a reference on the PyBytes for the GBytes to
hold, and because the free_func may run on any thread, it must wrap any Python API calls in a
PyGILState_Ensure/Release pair:

.. code-block:: c

    static void
    threaded_py_bytes_free (gpointer user_data)
    {
        /* May run on any thread, so take the GIL before touching Python. */
        PyGILState_STATE state = PyGILState_Ensure ();
        Py_DECREF ((PyObject *) user_data);
        PyGILState_Release (state);
    }

    gboolean marshal (...)
    {
        /* ... py_arg type and transfer checks ... */
        char *buf = NULL;
        Py_ssize_t length;
        if (PyBytes_AsStringAndSize (py_arg, &buf, &length) < 0)
            return FALSE;
        /* The GBytes keeps py_arg alive until its free_func runs. */
        Py_INCREF (py_arg);
        arg->v_pointer = g_bytes_new_with_free_func (buf, length, threaded_py_bytes_free, py_arg);
        *cleanup_data = NULL;
        return TRUE;
    }

The zero copy implementation above could possibly also use memoryviews to access a Py_buffer,
instead of requiring a PyBytes as input.
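
The buffer protocol side of this idea can be explored from Python itself, since a
memoryview exposes an object's buffer without copying it. A short illustration using
only the standard library (the variable names are arbitrary):

.. code-block:: python

    data = b"hello"
    view = memoryview(data)

    # The view references the original object instead of copying it.
    assert view.obj is data
    # bytes exports a read-only buffer, which suits an immutable GBytes.
    assert view.readonly
    assert view.nbytes == len(data)

    # A bytearray exports a writable buffer, so its contents can change
    # after the view is taken. Zero copy is only safe if nothing mutates it.
    mutable = bytearray(b"hello")
    mview = memoryview(mutable)
    assert not mview.readonly
    mutable[0] = ord("J")
    assert bytes(mview) == b"Jello"

The second half shows why accepting arbitrary buffer objects needs more care than
accepting PyBytes: a writable exporter can change the data that a zero copy GBytes
would be pointing at.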

Marshaler Implementation
========================

This section is left up to the reader as an exercise. Remember to write tests!