Streaming GPU DataFrames (cudf)
-------------------------------

The ``streamz.dataframe`` module provides a DataFrame-like interface
on streaming data as described in the ``dataframes`` documentation. It
provides support for dataframe-like libraries such as pandas and
cudf. This documentation is specific to streaming GPU dataframes using
cudf.

The example from the ``dataframes`` documentation carries over to cudf
dataframes by replacing the ``pandas`` module with ``cudf``:

.. code-block:: python

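   import cudf
   from streamz import Stream
   from streamz.dataframe import DataFrame

   stream = Stream()  # source stream into which cudf DataFrames are emitted
   example = cudf.DataFrame({'name': [], 'amount': []})  # empty frame describing the columns
   sdf = DataFrame(stream, example=example)

   # Running sum of Alice's amounts, updated as new batches arrive
   sdf[sdf.name == 'Alice'].amount.sum()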
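
To observe results, emit batches into the source stream and collect the
output stream. The sketch below assumes ``streamz`` is installed. As a
convenience for CPU-only machines, it falls back to pandas when cudf
cannot be imported, because ``streamz.dataframe`` accepts either library
with the same code. The ``xdf`` alias, the explicit dtypes on the example
frame, and the batch contents are illustrative assumptions.

.. code-block:: python

   # Prefer cudf (GPU); fall back to pandas so the sketch also runs without a GPU
   try:
       import cudf as xdf
   except ImportError:
       import pandas as xdf

   from streamz import Stream
   from streamz.dataframe import DataFrame

   stream = Stream()
   # Explicit dtypes keep the empty example's column types unambiguous
   example = xdf.DataFrame({'name': xdf.Series([], dtype='str'),
                            'amount': xdf.Series([], dtype='int64')})
   sdf = DataFrame(stream, example=example)

   # Running total of Alice's amounts; sink_to_list records every update
   alice_total = sdf[sdf['name'] == 'Alice'].amount.sum()
   results = alice_total.stream.sink_to_list()

   # Emit two hypothetical batches; each emit pushes data through synchronously
   stream.emit(xdf.DataFrame({'name': ['Alice', 'Bob', 'Alice'],
                              'amount': [100, 200, 50]}))
   stream.emit(xdf.DataFrame({'name': ['Alice'], 'amount': [25]}))

   print([int(v) for v in results])  # one running total per batch: [150, 175]

Each emitted batch updates the reduction incrementally, so the list holds
one value per batch rather than a single final total.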


Supported Operations
--------------------

Streaming cudf dataframes support the following classes of operations:

-  Elementwise operations like ``df.x + 1``
-  Filtering like ``df[df.name == 'Alice']``
-  Column addition like ``df['z'] = df.x + df.y``
-  Reductions like ``df.amount.mean()``
-  Windowed aggregations (fixed length) like ``df.window(n=100).amount.sum()``
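
A minimal sketch of two of the operations above, an elementwise operation
and a reduction, on a toy stream. It makes the same assumptions as
before: ``streamz`` is installed, and cudf is used when available with
pandas as a fallback. The ``x`` and ``y`` columns and their values are
illustrative.

.. code-block:: python

   try:
       import cudf as xdf
   except ImportError:
       import pandas as xdf

   from streamz import Stream
   from streamz.dataframe import DataFrame

   stream = Stream()
   example = xdf.DataFrame({'x': xdf.Series([], dtype='int64'),
                            'y': xdf.Series([], dtype='int64')})
   sdf = DataFrame(stream, example=example)

   # Elementwise operation: emits one transformed Series per incoming batch
   shifted = (sdf.x + 1).stream.sink_to_list()

   # Reduction over an elementwise result: emits the running total after each batch
   totals = (sdf.x + sdf.y).sum().stream.sink_to_list()

   stream.emit(xdf.DataFrame({'x': [1, 2], 'y': [10, 20]}))
   stream.emit(xdf.DataFrame({'x': [3], 'y': [30]}))

   print(shifted[0].to_numpy())       # transformed first batch: values 2 and 3
   print([int(t) for t in totals])    # [33, 66]

The elementwise result is emitted batch by batch, while the reduction
carries state across batches and emits an updated total each time.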

The following operations are not yet supported with cudf (as of cudf version 0.8):

-  Groupby-aggregations like ``df.groupby(df.name).amount.mean()``
-  Windowed aggregations (index valued) like ``df.window(value='2h').amount.sum()``
-  Windowed groupby aggregations like ``df.window(value='2h').groupby('name').amount.sum()``


Fixed-length window-based aggregations with cudf are supported as
explained in the ``dataframes`` documentation. Support for groupby
operations is expected to be added in the future.
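
A small sketch of a fixed-length window, using ``n=3`` instead of
``n=100`` so the window's effect is visible after only a few rows. As in
the sketches above, it assumes ``streamz`` is installed and falls back to
pandas when cudf is unavailable; the data are illustrative.

.. code-block:: python

   try:
       import cudf as xdf
   except ImportError:
       import pandas as xdf

   from streamz import Stream
   from streamz.dataframe import DataFrame

   stream = Stream()
   example = xdf.DataFrame({'name': xdf.Series([], dtype='str'),
                            'amount': xdf.Series([], dtype='int64')})
   sdf = DataFrame(stream, example=example)

   # Sum of 'amount' over the most recent 3 rows (a fixed-length window)
   window_sums = sdf.window(n=3).amount.sum().stream.sink_to_list()

   stream.emit(xdf.DataFrame({'name': ['a', 'b'], 'amount': [1, 2]}))
   stream.emit(xdf.DataFrame({'name': ['c', 'd'], 'amount': [3, 4]}))
   stream.emit(xdf.DataFrame({'name': ['e'], 'amount': [5]}))

   # Windows seen: [1, 2] -> 3, then [2, 3, 4] -> 9, then [3, 4, 5] -> 12
   print([int(s) for s in window_sums])

The window is defined by a row count, so rows that fall out of the window
are subtracted from the running sum regardless of how the batches are
split.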