Streaming GPU DataFrames (cudf)
-------------------------------
The ``streamz.dataframe`` module provides a DataFrame-like interface
to streaming data, as described in the ``dataframes`` documentation. It
supports dataframe-like libraries such as pandas and cudf. This page
covers streaming GPU dataframes with cudf.

The example from the ``dataframes`` documentation is rewritten below
for cudf dataframes simply by replacing the ``pandas`` module with
``cudf``:
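
.. code-block:: python

    import cudf
    from streamz import Stream
    from streamz.dataframe import DataFrame

    stream = Stream()  # source stream that will receive cudf DataFrames
    example = cudf.DataFrame({'name': [], 'amount': []})  # empty frame describing the columns
    sdf = DataFrame(stream, example=example)
    sdf[sdf.name == 'Alice'].amount.sum()
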
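
As a rough mental model of what the filter-and-sum expression above
computes, the following plain-Python sketch keeps a running total that is
updated once per incoming batch. It is an illustration only and does not
reflect how streamz or cudf implement the operation; the function name
``running_filtered_sum`` and the list-of-dicts batch format are assumptions
made for this example.

.. code-block:: python

    def running_filtered_sum(batches, name):
        """Yield the cumulative sum of 'amount' for rows matching ``name``.

        One value is produced per batch, mirroring a streaming reduction.
        """
        total = 0
        for batch in batches:
            # Filter the batch, then fold its partial sum into the state.
            total += sum(row["amount"] for row in batch if row["name"] == name)
            yield total


    # Three hypothetical batches arriving one after another.
    batches = [
        [{"name": "Alice", "amount": 100}, {"name": "Bob", "amount": 200}],
        [{"name": "Alice", "amount": 50}],
        [{"name": "Bob", "amount": 10}],
    ]
    totals = list(running_filtered_sum(batches, "Alice"))

Here ``totals`` is ``[100, 150, 150]``: the third batch contains no matching
rows, so the running total is unchanged.
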
Supported Operations
--------------------
Streaming cudf dataframes support the following classes of operations:

- Elementwise operations like ``df.x + 1``
- Filtering like ``df[df.name == 'Alice']``
- Column addition like ``df['z'] = df.x + df.y``
- Reductions like ``df.amount.mean()``
- Windowed aggregations (fixed length) like ``df.window(n=100).amount.sum()``

The following operations are not yet supported with cudf (as of version 0.8):

- Groupby-aggregations like ``df.groupby(df.name).amount.mean()``
- Windowed aggregations (index valued) like ``df.window(value='2h').amount.sum()``
- Windowed groupby aggregations like ``df.window(value='2h').groupby('name').amount.sum()``


Fixed-length window aggregations with cudf work as described in
the ``dataframes`` documentation. Support for groupby operations is
expected to be added in a future release.
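
To make the distinction between the two window kinds concrete, here is a
minimal plain-Python sketch. A fixed-length window keeps the last ``n``
values, while an index-valued window keeps every value whose timestamp falls
within a trailing span. The function names and the ``(timestamp, value)``
record format are assumptions for illustration; this is not the streamz
implementation.

.. code-block:: python

    from collections import deque


    def fixed_length_window_sums(values, n):
        """Sum over the most recent ``n`` values, one result per new value."""
        window = deque(maxlen=n)  # oldest value is dropped automatically
        for value in values:
            window.append(value)
            yield sum(window)


    def time_window_sums(records, span):
        """Sum over values with timestamps in the interval (ts - span, ts].

        ``records`` is an iterable of (timestamp, value) pairs in
        non-decreasing timestamp order.
        """
        window = deque()
        for ts, value in records:
            window.append((ts, value))
            # Evict entries that have fallen out of the trailing span.
            while window[0][0] <= ts - span:
                window.popleft()
            yield sum(v for _, v in window)

For example, ``list(fixed_length_window_sums([1, 2, 3, 4], n=2))`` gives
``[1, 3, 5, 7]``. The fixed-length variant needs only a row count, whereas the
index-valued variant depends on comparing index values.
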