DataStax Graph Fluent API
=========================

The fluent API adds graph features to the core driver:

* A TinkerPop GraphTraversalSource builder to execute traversals on a DSE cluster
* The ability to execute traversal queries explicitly using execute_graph
* GraphSON serializers for all DSE Graph types
* DSE Search predicates

The Graph fluent API depends on Apache TinkerPop and is not installed by default. Make sure
the Graph requirements are properly :ref:`installed <installation-datastax-graph>`.
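
One way to confirm that the optional dependency is present before importing the fluent API
is to probe for the packages. The following is a minimal sketch and not part of the driver:
the helper names `modules_available` and `fluent_api_available` are hypothetical, and the check
only verifies that the top-level `cassandra` and `gremlin_python` packages can be located, not
that their versions are compatible.

```python
import importlib.util


def modules_available(names):
    """Return True when every top-level package in `names` can be located."""
    return all(importlib.util.find_spec(name) is not None for name in names)


def fluent_api_available():
    # The fluent API needs both the driver and gremlin_python (Apache TinkerPop).
    return modules_available(("cassandra", "gremlin_python"))


if not fluent_api_available():
    print("Graph fluent API requirements are missing; see the installation notes.")
```

Probing with `find_spec` avoids importing the packages, so the check stays cheap and has no side effects.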

You might be interested in reading the :doc:`DataStax Graph Getting Started documentation <graph>` to
understand the basics of creating a graph and its schema.

Graph Traversal Queries
~~~~~~~~~~~~~~~~~~~~~~~

The driver provides :meth:`.Session.execute_graph`, which allows users to execute traversal
query strings. Here is a simple example::

    session.execute_graph("g.addV('genre').property('genreId', 1).property('name', 'Action').next();")

Since graph queries can be very complex, working with strings is not very convenient and is
hard to maintain. This fluent API allows you to build Gremlin traversals and write your graph
queries directly in Python. These native traversal queries can be executed explicitly, with
a `Session` object, or implicitly::

    from cassandra.cluster import Cluster, EXEC_PROFILE_GRAPH_DEFAULT
    from cassandra.datastax.graph import GraphProtocol
    from cassandra.datastax.graph.fluent import DseGraph

    # Create an execution profile, using GraphSON3 for Core graphs
    ep_graphson3 = DseGraph.create_execution_profile(
        'my_core_graph_name',
        graph_protocol=GraphProtocol.GRAPHSON_3_0)
    cluster = Cluster(execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: ep_graphson3})
    session = cluster.connect()

    # Execute a fluent graph query
    g = DseGraph.traversal_source(session=session)
    g.addV('genre').property('genreId', 1).property('name', 'Action').next()

    # implicit execution caused by iterating over results
    for v in g.V().has('genre', 'name', 'Drama').in_('belongsTo').valueMap():
        print(v)

These :ref:`Python types <graph-types>` are also supported transparently::

    g.addV('person').property('name', 'Mike').property('birthday', datetime(1984, 3, 11)). \
        property('house_yard', Polygon(((30, 10), (40, 40), (20, 40), (10, 20), (30, 10))))

Further reading about Gremlin:

* `DataStax Drivers Fluent API <https://www.datastax.com/dev/blog/datastax-drivers-fluent-apis-for-dse-graph-are-out>`_
* `gremlin-python documentation <http://tinkerpop.apache.org/docs/current/reference/#gremlin-python>`_

Configuring a Traversal Execution Profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The fluent API takes advantage of *configuration profiles* to allow
different execution configurations for the various query handlers. Graph traversal
execution requires a custom execution profile to enable Gremlin bytecode as
the query language. With Core graphs, it is important to use GraphSON3. Here is how
to accomplish this configuration:

.. code-block:: python

    from cassandra.cluster import Cluster, EXEC_PROFILE_GRAPH_DEFAULT
    from cassandra.datastax.graph import GraphProtocol
    from cassandra.datastax.graph.fluent import DseGraph

    # Using GraphSON3 as graph protocol is a requirement with Core graphs.
    ep = DseGraph.create_execution_profile(
        'graph_name',
        graph_protocol=GraphProtocol.GRAPHSON_3_0)

    # For Classic graphs, GraphSON1, GraphSON2 and GraphSON3 (DSE 6.8+) are supported.
    ep_classic = DseGraph.create_execution_profile('classic_graph_name')  # default is GraphSON2

    cluster = Cluster(execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: ep, 'classic': ep_classic})
    session = cluster.connect()

    g = DseGraph.traversal_source(session)  # Build the GraphTraversalSource
    print(g.V().toList())  # Traverse the Graph

Note that the execution profile created with :meth:`DseGraph.create_execution_profile <.datastax.graph.fluent.DseGraph.create_execution_profile>` cannot
be used for any Groovy string queries.

If you want to change execution property defaults, please see the :doc:`Execution Profile documentation <execution_profiles>`
for a more generalized discussion of the API. Graph traversal queries use the same execution profile defined for DSE graph. If you
need to change the default properties, please refer to the :doc:`DSE Graph query documentation page <graph>`.

Explicit Graph Traversal Execution with a DSE Session
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Traversal queries can be executed explicitly using `session.execute_graph` or `session.execute_graph_async`. These functions
return results as DSE graph types. If you are familiar with DSE queries or need async execution, you might prefer this approach.
Below is an example of explicit execution. For this example, assume the schema has been generated as above:

.. code-block:: python

    from cassandra.cluster import Cluster, EXEC_PROFILE_GRAPH_DEFAULT
    from cassandra.datastax.graph import GraphProtocol
    from cassandra.datastax.graph.fluent import DseGraph
    from pprint import pprint

    ep = DseGraph.create_execution_profile(
        'graph_name',
        graph_protocol=GraphProtocol.GRAPHSON_3_0)
    cluster = Cluster(execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: ep})
    session = cluster.connect()

    g = DseGraph.traversal_source(session=session)

Convert a traversal to a bytecode query for classic graphs::

    addV_query = DseGraph.query_from_traversal(
        g.addV('genre').property('genreId', 1).property('name', 'Action'),
        graph_protocol=GraphProtocol.GRAPHSON_3_0
    )
    v_query = DseGraph.query_from_traversal(
        g.V(),
        graph_protocol=GraphProtocol.GRAPHSON_3_0)

    for result in session.execute_graph(addV_query):
        pprint(result.value)
    for result in session.execute_graph(v_query):
        pprint(result.value)

Converting a traversal to a bytecode query for core graphs requires some more work, because we
need the cluster context for UDT and tuple types:

.. code-block:: python

    context = {
        'cluster': cluster,
        'graph_name': 'the_graph_for_the_query'
    }
    addV_query = DseGraph.query_from_traversal(
        g.addV('genre').property('genreId', 1).property('name', 'Action'),
        graph_protocol=GraphProtocol.GRAPHSON_3_0,
        context=context
    )

    for result in session.execute_graph(addV_query):
        pprint(result.value)

Implicit Graph Traversal Execution with TinkerPop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the :class:`DseGraph <.datastax.graph.fluent.DseGraph>` class, you can build a GraphTraversalSource
that will execute queries on a DSE session without explicitly passing anything to
that session. We call this *implicit execution* because the `Session` is not
explicitly involved. Everything is managed internally by TinkerPop while
traversing the graph and the results are TinkerPop types as well.

Synchronous Example
-------------------

.. code-block:: python

    # Build the GraphTraversalSource
    g = DseGraph.traversal_source(session)
    # implicitly execute the query by traversing the TraversalSource
    g.addV('genre').property('genreId', 1).property('name', 'Action').next()

    # blocks until the query is completed and returns the results
    results = g.V().toList()
    pprint(results)

Asynchronous Example
--------------------

You can execute a graph traversal query asynchronously by using `.promise()`. It returns a
Python `Future <https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future>`_.

.. code-block:: python

    # Build the GraphTraversalSource
    g = DseGraph.traversal_source(session)
    # implicitly execute the query by traversing the TraversalSource
    g.addV('genre').property('genreId', 1).property('name', 'Action').next()  # not async

    # get a future and wait
    future = g.V().promise()
    results = list(future.result())
    pprint(results)

    # or set a callback
    def cb(f):
        results = list(f.result())
        pprint(results)
    future = g.V().promise()
    future.add_done_callback(cb)
    # do other stuff...
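
Because `.promise()` returns a standard `Future`, the usual `concurrent.futures` patterns
apply, such as waiting with a timeout and checking for an exception inside the callback.
The sketch below is self-contained and does not talk to a cluster: `fake_traversal` and the
thread pool are hypothetical stand-ins for the work behind `g.V().promise()`, used only to
illustrate the `Future` handling.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fake_traversal():
    # Hypothetical stand-in for the result of a traversal; not a driver call.
    return ["v1", "v2"]


collected = []


def on_done(f):
    # A callback receives the finished future; check for failure before reading it.
    exc = f.exception()
    if exc is not None:
        collected.append(("error", exc))
    else:
        collected.append(("ok", list(f.result())))


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fake_traversal)
    future.add_done_callback(on_done)
    try:
        # Bound the wait instead of blocking forever.
        results = list(future.result(timeout=5))
    except FutureTimeout:
        results = None

print(results)
print(collected)
```

With a real traversal, the same callback and timeout handling would be attached to the future returned by `.promise()`.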

Specify the Execution Profile explicitly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you don't want to change the default graph execution profile (`EXEC_PROFILE_GRAPH_DEFAULT`), you can register a new
one as usual and use it explicitly. Here is an example:

.. code-block:: python

    from cassandra.cluster import Cluster
    from cassandra.datastax.graph import GraphProtocol
    from cassandra.datastax.graph.fluent import DseGraph

    cluster = Cluster()
    ep = DseGraph.create_execution_profile('graph_name', graph_protocol=GraphProtocol.GRAPHSON_3_0)
    cluster.add_execution_profile('graph_traversal', ep)
    session = cluster.connect()

    g = DseGraph.traversal_source()
    query = DseGraph.query_from_traversal(g.V())
    session.execute_graph(query, execution_profile='graph_traversal')

You can also create multiple GraphTraversalSources and use them with
the same execution profile (for different graphs):

.. code-block:: python

    g_movies = DseGraph.traversal_source(session, graph_name='movies', execution_profile=ep)
    g_series = DseGraph.traversal_source(session, graph_name='series', execution_profile=ep)

    print(g_movies.V().toList())  # Traverse the movies Graph
    print(g_series.V().toList())  # Traverse the series Graph

Batch Queries
~~~~~~~~~~~~~

DSE Graph supports batch queries using a :class:`TraversalBatch <.datastax.graph.fluent.query.TraversalBatch>` object
instantiated with :meth:`DseGraph.batch <.datastax.graph.fluent.DseGraph.batch>`. A :class:`TraversalBatch <.datastax.graph.fluent.query.TraversalBatch>` allows
you to execute multiple graph traversals in a single atomic transaction. A
traversal batch is executed with :meth:`.Session.execute_graph` or using
:meth:`TraversalBatch.execute <.datastax.graph.fluent.query.TraversalBatch.execute>` if bound to a DSE session.

Whichever way you choose to execute the traversal batch, you need to configure
the execution profile accordingly. Here is an example::

    from cassandra.cluster import Cluster
    from cassandra.datastax.graph import GraphProtocol
    from cassandra.datastax.graph.fluent import DseGraph

    ep = DseGraph.create_execution_profile(
        'graph_name',
        graph_protocol=GraphProtocol.GRAPHSON_3_0)
    cluster = Cluster(execution_profiles={'graphson3': ep})
    session = cluster.connect()

    g = DseGraph.traversal_source()

To execute the batch using :meth:`.Session.execute_graph`, you need to convert
the batch to a GraphStatement::

    batch = DseGraph.batch()

    batch.add(
        g.addV('genre').property('genreId', 1).property('name', 'Action'))
    batch.add(
        g.addV('genre').property('genreId', 2).property('name', 'Drama'))  # Don't use `.next()` with a batch

    graph_statement = batch.as_graph_statement(graph_protocol=GraphProtocol.GRAPHSON_3_0)
    graph_statement.is_idempotent = True  # configure any Statement parameters if needed...
    session.execute_graph(graph_statement, execution_profile='graphson3')

To execute the batch using :meth:`TraversalBatch.execute <.datastax.graph.fluent.query.TraversalBatch.execute>`, you need to bind the batch to a DSE session::

    batch = DseGraph.batch(session, 'graphson3')  # bind the session and execution profile

    batch.add(
        g.addV('genre').property('genreId', 1).property('name', 'Action'))
    batch.add(
        g.addV('genre').property('genreId', 2).property('name', 'Drama'))  # Don't use `.next()` with a batch

    batch.execute()

DSL (Domain Specific Languages)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DSLs are very useful for writing better domain-specific APIs and avoiding
code duplication. Let's say we have a graph of `People` and we produce
a lot of statistics based on age. All graph traversal queries of our
application would look like::

  g.V().hasLabel("people").has("age", P.gt(21))...


which is rather verbose and quite annoying to repeat in a code base. Let's create a DSL::

  from gremlin_python.process.graph_traversal import GraphTraversal, GraphTraversalSource
  from gremlin_python.process.traversal import P

  class MyAppTraversal(GraphTraversal):

    def younger_than(self, age):
        return self.has("age", P.lt(age))

    def older_than(self, age):
        return self.has("age", P.gt(age))


  class MyAppTraversalSource(GraphTraversalSource):

    def __init__(self, *args, **kwargs):
        super(MyAppTraversalSource, self).__init__(*args, **kwargs)
        self.graph_traversal = MyAppTraversal

    def people(self):
        return self.get_graph_traversal().V().hasLabel("people")

Now we can use our DSL, which is a lot cleaner::

  from cassandra.datastax.graph.fluent import DseGraph

  # ...
  g = DseGraph.traversal_source(session=session, traversal_class=MyAppTraversalSource)

  g.people().younger_than(21)...
  g.people().older_than(30)...

For a more complete DSL example, see the `Python killrvideo DSL app <https://github.com/datastax/graph-examples/tree/master/killrvideo/dsl/python>`_.
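
To see how the chaining in such a DSL fits together without a running graph, here is a toy,
in-memory sketch. It is an illustration only: `ToyTraversal`, `ToyTraversalSource`, and the
sample data are hypothetical and unrelated to gremlin_python, and the filters run eagerly on
a Python list instead of building a traversal that the server evaluates.

```python
class ToyTraversal:
    """Toy stand-in for a traversal: wraps a list of dict 'vertices'."""

    def __init__(self, vertices):
        self.vertices = list(vertices)

    def has_label(self, label):
        # Each step returns a new traversal so that calls can be chained.
        return type(self)([v for v in self.vertices if v["label"] == label])

    def younger_than(self, age):
        return type(self)([v for v in self.vertices if v["age"] < age])

    def older_than(self, age):
        return type(self)([v for v in self.vertices if v["age"] > age])

    def to_list(self):
        return list(self.vertices)


class ToyTraversalSource:
    """Toy stand-in for a traversal source: the entry point of the DSL."""

    def __init__(self, vertices):
        self._vertices = vertices

    def people(self):
        return ToyTraversal(self._vertices).has_label("people")


data = [
    {"label": "people", "name": "Ana", "age": 19},
    {"label": "people", "name": "Bo", "age": 34},
    {"label": "genre", "name": "Action", "age": 0},
]
g = ToyTraversalSource(data)
young = [v["name"] for v in g.people().younger_than(21).to_list()]
old = [v["name"] for v in g.people().older_than(30).to_list()]
print(young, old)
```

The domain vocabulary (`people`, `younger_than`, `older_than`) lives in one place, so call sites stay short and a change to the underlying filter is made only once.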

Search
~~~~~~

DSE Graph can use search indexes that take advantage of DSE Search functionality for
efficient traversal queries. Here is the list of additional search predicates:

Text tokenization:

* :meth:`token <.datastax.graph.fluent.predicates.Search.token>`
* :meth:`token_prefix <.datastax.graph.fluent.predicates.Search.token_prefix>`
* :meth:`token_regex <.datastax.graph.fluent.predicates.Search.token_regex>`
* :meth:`token_fuzzy <.datastax.graph.fluent.predicates.Search.token_fuzzy>`

Text match:

* :meth:`prefix <.datastax.graph.fluent.predicates.Search.prefix>`
* :meth:`regex <.datastax.graph.fluent.predicates.Search.regex>`
* :meth:`fuzzy <.datastax.graph.fluent.predicates.Search.fuzzy>`
* :meth:`phrase <.datastax.graph.fluent.predicates.Search.phrase>`

Geo:

* :meth:`inside <.datastax.graph.fluent.predicates.Geo.inside>`

Create search indexes
---------------------

For text tokenization:

.. code-block:: python


    s.execute_graph("schema.vertexLabel('my_vertex_label').index('search').search().by('text_field').asText().add()")

For text match:

.. code-block:: python


    s.execute_graph("schema.vertexLabel('my_vertex_label').index('search').search().by('text_field').asString().add()")


For geospatial:

You can create a geospatial index on Point and LineString fields.

.. code-block:: python


    s.execute_graph("schema.vertexLabel('my_vertex_label').index('search').search().by('point_field').add()")


Using search indexes
--------------------

Token:

.. code-block:: python

    from cassandra.datastax.graph.fluent.predicates import Search
    # ...

    g = DseGraph.traversal_source()
    query = DseGraph.query_from_traversal(
        g.V().has('my_vertex_label','text_field', Search.token_regex('Hello.+World')).values('text_field'))
    session.execute_graph(query)

Text:

.. code-block:: python

    from cassandra.datastax.graph.fluent.predicates import Search
    # ...

    g = DseGraph.traversal_source()
    query = DseGraph.query_from_traversal(
        g.V().has('my_vertex_label','text_field', Search.prefix('Hello')).values('text_field'))
    session.execute_graph(query)

Geospatial:

.. code-block:: python

    from cassandra.datastax.graph.fluent.predicates import Geo
    from cassandra.util import Distance
    # ...

    g = DseGraph.traversal_source()
    query = DseGraph.query_from_traversal(
        g.V().has('my_vertex_label','point_field', Geo.inside(Distance(46, 71, 100))).values('point_field'))
    session.execute_graph(query)


For more details, please refer to the official `DSE Search Indexes Documentation <https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/search/searchReference.html>`_.