.. _query-paging:

Paging Large Queries
====================
Cassandra 2.0+ supports automatic query paging.  Starting with
version 2.0 of the driver, if :attr:`~.Cluster.protocol_version` is
:const:`2` or greater (it is by default), queries returning large result sets
will be automatically paged.

Controlling the Page Size
-------------------------
:attr:`.Session.default_fetch_size` controls how many rows are fetched per
page.  Its default value is 5000, so each page contains at most 5000 rows.
To override the page size for a single query, set :attr:`~.fetch_size` on a
:class:`~.Statement`.
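
As a rough illustration of the two settings, the sketch below assigns the
session-wide default and a per-statement override.  The helper name
``set_page_sizes`` is hypothetical, and the ``SimpleNamespace`` objects are
stand-ins so the snippet runs without a cluster; in real code they would be a
connected ``Session`` and a ``Statement`` such as ``SimpleStatement``.

```python
from types import SimpleNamespace


def set_page_sizes(session, statement, session_default=5000, per_query=None):
    """Set the session-wide page size and, optionally, a per-statement override.

    Hypothetical helper: it only assigns the two attributes described above.
    """
    session.default_fetch_size = session_default
    if per_query is not None:
        statement.fetch_size = per_query
    return session, statement


# Stand-ins (assumptions) for a real Session and Statement.
session = SimpleNamespace(default_fetch_size=5000)
statement = SimpleNamespace(fetch_size=None)

# Every query on this session pages 1000 rows at a time, except this
# statement, which asks for 50 rows per page.
set_page_sizes(session, statement, session_default=1000, per_query=50)
```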

Handling Paged Results
----------------------
Whenever the number of result rows for a query exceeds the page size, an
instance of :class:`~.PagedResult` will be returned instead of a normal
list.  This class implements the iterator interface, so you can treat
it like a normal iterator over rows::

    from cassandra.query import SimpleStatement
    query = "SELECT * FROM users"  # users contains 100 rows
    statement = SimpleStatement(query, fetch_size=10)
    for user_row in session.execute(statement):
        process_user(user_row)

Whenever there are no more rows in the current page, the next page will
be fetched transparently.  However, note that it *is* possible for
an :class:`Exception` to be raised while fetching the next page, just
like you might see on a normal call to ``session.execute()``.
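
Because a page fetch can fail partway through iteration, it can help to keep
the rows already processed and handle the failure separately.  The following
is a minimal sketch of that idea, not part of the driver:
``iter_rows_safely`` and ``flaky_result`` are hypothetical names, and
``flaky_result`` merely imitates a result set whose second page fails to
load.

```python
def iter_rows_safely(result_set, on_error):
    """Yield rows from a paged result; hand page-fetch failures to on_error."""
    rows = iter(result_set)
    while True:
        try:
            row = next(rows)  # may trigger a fetch of the next page
        except StopIteration:
            return  # all pages consumed
        except Exception as exc:
            on_error(exc)  # report the failure and stop iterating
            return
        yield row


def flaky_result():
    """Hypothetical stand-in: two rows, then a failure on the 'next page'."""
    yield "row-1"
    yield "row-2"
    raise RuntimeError("timed out fetching next page")


errors = []
rows = list(iter_rows_safely(flaky_result(), errors.append))
```

With a real session, ``result_set`` would be the object returned by
``session.execute(statement)``.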

If you use :meth:`.Session.execute_async()` along with
:meth:`.ResponseFuture.result()`, the first page will be fetched before
:meth:`~.ResponseFuture.result()` returns, but later pages will be
transparently fetched synchronously while iterating the result.

Handling Paged Results with Callbacks
-------------------------------------
If callbacks are attached to a query that returns a paged result,
the callback will be called once per page with a normal list of rows.

Use :attr:`.ResponseFuture.has_more_pages` and
:meth:`.ResponseFuture.start_fetching_next_page()` to continue fetching
pages.  For example::

    from threading import Event

    class PagedResultHandler(object):

        def __init__(self, future):
            self.error = None
            self.finished_event = Event()
            self.future = future
            # handle_page runs once per page; handle_error runs on failure.
            self.future.add_callbacks(
                callback=self.handle_page,
                errback=self.handle_error)

        def handle_page(self, rows):
            for row in rows:
                process_row(row)

            if self.future.has_more_pages:
                self.future.start_fetching_next_page()
            else:
                self.finished_event.set()

        def handle_error(self, exc):
            self.error = exc
            self.finished_event.set()

    future = session.execute_async("SELECT * FROM users")
    handler = PagedResultHandler(future)
    handler.finished_event.wait()
    if handler.error:
        raise handler.error
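
To see the once-per-page callback flow without a cluster, the sketch below
drives a similar handler with a fake future.  ``FakeFuture`` is a
hypothetical stand-in that imitates only ``add_callbacks()``,
``has_more_pages`` and ``start_fetching_next_page()``, and it delivers pages
synchronously, whereas the real driver delivers them asynchronously.
``RowCollector`` is likewise an illustrative name.

```python
from threading import Event


class RowCollector:
    """Accumulate rows from every page delivered to the callback."""

    def __init__(self, future):
        self.rows = []
        self.error = None
        self.finished_event = Event()
        self.future = future
        future.add_callbacks(callback=self.handle_page,
                             errback=self.handle_error)

    def handle_page(self, rows):
        self.rows.extend(rows)
        if self.future.has_more_pages:
            self.future.start_fetching_next_page()
        else:
            self.finished_event.set()

    def handle_error(self, exc):
        self.error = exc
        self.finished_event.set()


class FakeFuture:
    """Hypothetical stand-in for a response future; needs at least one page."""

    def __init__(self, pages):
        self._pages = list(pages)
        self._index = 0
        self._callback = None

    @property
    def has_more_pages(self):
        return self._index < len(self._pages) - 1

    def add_callbacks(self, callback, errback):
        self._callback = callback
        self._deliver()  # first page is "ready" immediately

    def start_fetching_next_page(self):
        self._index += 1
        self._deliver()

    def _deliver(self):
        self._callback(self._pages[self._index])


collector = RowCollector(FakeFuture([[1, 2], [3, 4], [5]]))
collector.finished_event.wait(timeout=1)
```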

Resume Paged Results
--------------------

You can resume pagination when executing a new query by using
:attr:`.ResultSet.paging_state`.  This can be useful if you want to provide
stateless pagination in your application (e.g. over HTTP).  For example::

    from cassandra.query import SimpleStatement
    query = "SELECT * FROM users"
    statement = SimpleStatement(query, fetch_size=10)
    results = session.execute(statement)

    # save the paging_state somewhere and return current results
    web_session['paging_state'] = results.paging_state


    # resume the pagination sometime later...
    statement = SimpleStatement(query, fetch_size=10)
    ps = web_session['paging_state']
    results = session.execute(statement, paging_state=ps)
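
If the paging state has to travel through a URL or a form field, it needs a
text-safe encoding.  The sketch below assumes ``paging_state`` is an opaque
``bytes`` value, or ``None`` when there are no further pages; the helper
names are hypothetical and URL-safe Base64 is only one possible encoding.

```python
import base64


def encode_paging_state(paging_state):
    """Turn an opaque paging state (bytes or None) into a URL-safe string."""
    if paging_state is None:
        return None  # nothing left to resume
    return base64.urlsafe_b64encode(paging_state).decode("ascii")


def decode_paging_state(token):
    """Reverse encode_paging_state; an empty or missing token means 'start over'."""
    if not token:
        return None
    return base64.urlsafe_b64decode(token.encode("ascii"))


# Made-up bytes standing in for a real paging state.
example_state = b"\x00\x10\xff-opaque-state"
token = encode_paging_state(example_state)
restored = decode_paging_state(token)
```

A token that comes back from a client is untrusted input, so an application
may want to validate or sign it before passing the decoded value to
``session.execute()``.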