1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419
|
==============
Making Queries
==============
.. module:: cqlengine.queryset
Retrieving objects
==================
Once you've populated Cassandra with data, you'll probably want to retrieve some of it. This is accomplished with QuerySet objects. This section will describe how to use QuerySet objects to retrieve the data you're looking for.
Retrieving all objects
----------------------
The simplest query you can make is to return all objects from a table.
This is accomplished with the ``.all()`` method, which returns a QuerySet of all objects in a table
Using the Person example model, we would get all Person objects like this:
.. code-block:: python
all_objects = Person.objects.all()
.. _retrieving-objects-with-filters:
Retrieving objects with filters
-------------------------------
Typically, you'll want to query only a subset of the records in your database.
That can be accomplished with the QuerySet's ``.filter(\*\*)`` method.
For example, given the model definition:
.. code-block:: python
class Automobile(Model):
manufacturer = columns.Text(primary_key=True)
year = columns.Integer(primary_key=True)
model = columns.Text()
price = columns.Decimal()
options = columns.Set(columns.Text)
...and assuming the Automobile table contains a record of every car model manufactured in the last 20 years or so, we can retrieve only the cars made by a single manufacturer like this:
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
You can also use the more convenient syntax:
.. code-block:: python
q = Automobile.objects(Automobile.manufacturer == 'Tesla')
We can then further filter our query with another call to **.filter**
.. code-block:: python
q = q.filter(year=2012)
*Note: all queries involving any filtering MUST define either an '=' or an 'in' relation to either a primary key column, or an indexed column.*
Accessing objects in a QuerySet
===============================
There are several methods for getting objects out of a queryset
* iterating over the queryset
.. code-block:: python
for car in Automobile.objects.all():
#...do something to the car instance
pass
* list index
.. code-block:: python
q = Automobile.objects.all()
q[0] #returns the first result
q[1] #returns the second result
.. note::
* CQL does not support specifying a start position in it's queries. Therefore, accessing elements using array indexing will load every result up to the index value requested
* Using negative indices requires a "SELECT COUNT()" to be executed. This has a performance cost on large datasets.
* list slicing
.. code-block:: python
q = Automobile.objects.all()
q[1:] #returns all results except the first
q[1:9] #returns a slice of the results
.. note::
* CQL does not support specifying a start position in it's queries. Therefore, accessing elements using array slicing will load every result up to the index value requested
* Using negative indices requires a "SELECT COUNT()" to be executed. This has a performance cost on large datasets.
* calling :attr:`get() <query.QuerySet.get>` on the queryset
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year=2012)
car = q.get()
this returns the object matching the queryset
* calling :attr:`first() <query.QuerySet.first>` on the queryset
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year=2012)
car = q.first()
this returns the first value in the queryset
.. _query-filtering-operators:
Filtering Operators
===================
:attr:`Equal To <query.QueryOperator.EqualsOperator>`
The default filtering operator.
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year=2012) #year == 2012
In addition to simple equal to queries, cqlengine also supports querying with other operators by appending a ``__<op>`` to the field name on the filtering call
:attr:`in (__in) <query.QueryOperator.InOperator>`
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year__in=[2011, 2012])
:attr:`> (__gt) <query.QueryOperator.GreaterThanOperator>`
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year__gt=2010) # year > 2010
# or the nicer syntax
q.filter(Automobile.year > 2010)
:attr:`>= (__gte) <query.QueryOperator.GreaterThanOrEqualOperator>`
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year__gte=2010) # year >= 2010
# or the nicer syntax
q.filter(Automobile.year >= 2010)
:attr:`< (__lt) <query.QueryOperator.LessThanOperator>`
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year__lt=2012) # year < 2012
# or...
q.filter(Automobile.year < 2012)
:attr:`<= (__lte) <query.QueryOperator.LessThanOrEqualOperator>`
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q = q.filter(year__lte=2012) # year <= 2012
q.filter(Automobile.year <= 2012)
:attr:`CONTAINS (__contains) <query.QueryOperator.ContainsOperator>`
The CONTAINS operator is available for all collection types (List, Set, Map).
.. code-block:: python
q = Automobile.objects.filter(manufacturer='Tesla')
q.filter(options__contains='backup camera').allow_filtering()
Note that we need to use allow_filtering() since the *options* column has no secondary index.
:attr:`LIKE (__like) <query.QueryOperator.LikeOperator>`
The LIKE operator is available for text columns that have a SASI secondary index.
.. code-block:: python
q = Automobile.objects.filter(model__like='%Civic%').allow_filtering()
:attr:`IS NOT NULL (IsNotNull(column_name)) <statement.IsNotNull>`
The IS NOT NULL operator is not yet supported for C*.
.. code-block:: python
q = Automobile.objects.filter(IsNotNull('model'))
Limitations:
- Currently, cqlengine does not support SASI index creation. To use this feature, you need to create the SASI index using the core driver.
- Queries using LIKE must use allow_filtering() since the *model* column has no standard secondary index. Note that the server will use the SASI index properly when executing the query.
TimeUUID Functions
==================
In addition to querying using regular values, there are two functions you can pass in when querying TimeUUID columns to help make filtering by them easier. Note that these functions don't actually return a value, but instruct the cql interpreter to use the functions in it's query.
.. class:: MinTimeUUID(datetime)
returns the minimum time uuid value possible for the given datetime
.. class:: MaxTimeUUID(datetime)
returns the maximum time uuid value possible for the given datetime
*Example*
.. code-block:: python
class DataStream(Model):
id = columns.UUID(partition_key=True)
time = columns.TimeUUID(primary_key=True)
data = columns.Bytes()
min_time = datetime(1982, 1, 1)
max_time = datetime(1982, 3, 9)
DataStream.filter(time__gt=functions.MinTimeUUID(min_time), time__lt=functions.MaxTimeUUID(max_time))
Token Function
==============
Token functon may be used only on special, virtual column pk__token, representing token of partition key (it also works for composite partition keys).
Cassandra orders returned items by value of partition key token, so using cqlengine.Token we can easy paginate through all table rows.
See http://cassandra.apache.org/doc/cql3/CQL-3.0.html#tokenFun
*Example*
.. code-block:: python
class Items(Model):
id = columns.Text(primary_key=True)
data = columns.Bytes()
query = Items.objects.all().limit(10)
first_page = list(query);
last = first_page[-1]
next_page = list(query.filter(pk__token__gt=cqlengine.Token(last.pk)))
QuerySets are immutable
=======================
When calling any method that changes a queryset, the method does not actually change the queryset object it's called on, but returns a new queryset object with the attributes of the original queryset, plus the attributes added in the method call.
*Example*
.. code-block:: python
#this produces 3 different querysets
#q does not change after it's initial definition
q = Automobiles.objects.filter(year=2012)
tesla2012 = q.filter(manufacturer='Tesla')
honda2012 = q.filter(manufacturer='Honda')
Ordering QuerySets
==================
Since Cassandra is essentially a distributed hash table on steroids, the order you get records back in will not be particularly predictable.
However, you can set a column to order on with the ``.order_by(column_name)`` method.
*Example*
.. code-block:: python
#sort ascending
q = Automobiles.objects.all().order_by('year')
#sort descending
q = Automobiles.objects.all().order_by('-year')
*Note: Cassandra only supports ordering on a clustering key. In other words, to support ordering results, your model must have more than one primary key, and you must order on a primary key, excluding the first one.*
*For instance, given our Automobile model, year is the only column we can order on.*
Values Lists
============
There is a special QuerySet's method ``.values_list()`` - when called, QuerySet returns lists of values instead of model instances. It may significantly speedup things with lower memory footprint for large responses.
Each tuple contains the value from the respective field passed into the ``values_list()`` call — so the first item is the first field, etc. For example:
.. code-block:: python
items = list(range(20))
random.shuffle(items)
for i in items:
TestModel.create(id=1, clustering_key=i)
values = list(TestModel.objects.values_list('clustering_key', flat=True))
# [19L, 18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L]
Per Query Timeouts
===================
By default all queries are executed with the timeout defined in `~cqlengine.connection.setup()`
The examples below show how to specify a per-query timeout.
A timeout is specified in seconds and can be an int, float or None.
None means no timeout.
.. code-block:: python
class Row(Model):
id = columns.Integer(primary_key=True)
name = columns.Text()
Fetch all objects with a timeout of 5 seconds
.. code-block:: python
Row.objects().timeout(5).all()
Create a single row with a 50ms timeout
.. code-block:: python
Row(id=1, name='Jon').timeout(0.05).create()
Delete a single row with no timeout
.. code-block:: python
Row(id=1).timeout(None).delete()
Update a single row with no timeout
.. code-block:: python
Row(id=1).timeout(None).update(name='Blake')
Batch query timeouts
.. code-block:: python
with BatchQuery(timeout=10) as b:
Row(id=1, name='Jon').create()
NOTE: You cannot set both timeout and batch at the same time, batch will use the timeout defined in it's constructor.
Setting the timeout on the model is meaningless and will raise an AssertionError.
.. _ttl-change:
Default TTL and Per Query TTL
=============================
Model default TTL now relies on the *default_time_to_live* feature, introduced in Cassandra 2.0. It is not handled anymore in the CQLEngine Model (cassandra-driver >=3.6). You can set the default TTL of a table like this:
Example:
.. code-block:: python
class User(Model):
__options__ = {'default_time_to_live': 20}
user_id = columns.UUID(primary_key=True)
...
You can set TTL per-query if needed. Here are a some examples:
Example:
.. code-block:: python
class User(Model):
__options__ = {'default_time_to_live': 20}
user_id = columns.UUID(primary_key=True)
...
user = User.objects.create(user_id=1) # Default TTL 20 will be set automatically on the server
user.ttl(30).update(age=21) # Update the TTL to 30
User.objects.ttl(10).create(user_id=1) # TTL 10
User(user_id=1, age=21).ttl(10).save() # TTL 10
Named Tables
===================
Named tables are a way of querying a table without creating an class. They're useful for querying system tables or exploring an unfamiliar database.
.. code-block:: python
from cassandra.cqlengine.connection import setup
setup("127.0.0.1", "cqlengine_test")
from cassandra.cqlengine.named import NamedTable
user = NamedTable("cqlengine_test", "user")
user.objects()
user.objects()[0]
# {u'pk': 1, u't': datetime.datetime(2014, 6, 26, 17, 10, 31, 774000)}
|