# Understanding Queries

Putting your data into a `maggma` `Store` gives you powerful search, summary,
and analytical capabilities. All are based on "queries", which specify how
you want to search your data, and which parts of it you want to get in return.

`maggma` query syntax closely follows [MongoDB Query syntax](https://www.mongodb.com/docs/manual/tutorial/query-documents/). In this tutorial, we'll cover the syntax of the most common query operations. You can refer to the
[MongoDB](https://www.mongodb.com/docs/manual/tutorial/query-documents/) or [pymongo](https://pymongo.readthedocs.io/en/stable/tutorial.html) (Python interface to MongoDB) documentation for examples of more advanced use cases.

Let's create an example dataset describing the [Teenage Mutant Ninja Turtles](https://en.wikipedia.org/wiki/Teenage_Mutant_Ninja_Turtles).

```python
>>> turtles = [{"name": "Leonardo",
                "color": "blue",
                "tool": "sword",
                "occupation": "ninja"
                },
               {"name": "Donatello",
                "color": "purple",
                "tool": "staff",
                "occupation": "ninja"
                },
               {"name": "Michelangelo",
                "color": "orange",
                "tool": "nunchuks",
                "occupation": "ninja"
                },
               {"name": "Raphael",
                "color": "red",
                "tool": "sai",
                "occupation": "ninja"
                },
               {"name": "Splinter",
                "occupation": "sensei"
                }
              ]
```

Notice how this data follows the principles described in [Structuring `Store` data](stores.md/#structuring-store-data):

- Every document (`dict`) has a `name` key with a unique value.
- Every document has a common set of keys (`name`, `occupation`).
- Some documents also share the keys `tool` and `color`, but not all. This is OK.

For the rest of this tutorial, we will assume that this data has already been
added to a `Store` called `tmnt_store`, which we are going to query.

## The `query` method

`Store.query()` is the primary method you will use to search your data.

- `query` always returns a generator yielding any and all documents that match the query you provide.
- There are no mandatory arguments. If you run `query()`, you will get a generator yielding all documents in the `Store`.
- The first (optional) argument is `criteria`, which is a query formatted as a `dict` as described in the next section.
- You can also specify `properties`, which is a list of fields from the documents you want to return. This is useful when working with large documents because you only have to download the data you need rather than the entire document.
- You can also `skip` the first N documents, `limit` the number of documents returned, and `sort` the result by some field.

Since `query` returns a generator, you will typically want to turn the results into a list, or use them in a `for` loop.

Turn into a list
```python
results = [d for d in store.query()]
```

Use in a `for` loop
```python
for doc in store.query():
    print(doc)
```
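
The optional arguments described above can be pictured with a small plain-Python model. This is only a sketch: `sketch_query` and its `sort_key` parameter are hypothetical names invented here, it is not how `maggma` implements `Store.query()`, and it supports equality matching only. It shows how filtering, sorting, skipping, limiting, and field selection combine into a generator.

```python
from itertools import islice

turtles = [
    {"name": "Leonardo", "color": "blue", "tool": "sword", "occupation": "ninja"},
    {"name": "Donatello", "color": "purple", "tool": "staff", "occupation": "ninja"},
    {"name": "Michelangelo", "color": "orange", "tool": "nunchuks", "occupation": "ninja"},
    {"name": "Raphael", "color": "red", "tool": "sai", "occupation": "ninja"},
    {"name": "Splinter", "occupation": "sensei"},
]


def sketch_query(docs, criteria=None, properties=None, sort_key=None, skip=0, limit=0):
    """Hypothetical stand-in for Store.query(); not maggma code."""
    criteria = criteria or {}
    # Keep documents whose fields equal every value given in `criteria`.
    matched = [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    if sort_key is not None:
        matched.sort(key=lambda d: d[sort_key])
    # Drop the first `skip` matches, then stop after `limit` (0 means no limit here).
    stop = skip + limit if limit else None
    for doc in islice(matched, skip, stop):
        # Yield whole documents, or only the requested fields.
        if properties is None:
            yield dict(doc)
        else:
            yield {k: doc[k] for k in properties if k in doc}


page = list(sketch_query(turtles, {"occupation": "ninja"},
                         properties=["name"], sort_key="name", skip=1, limit=2))
print(page)
```

Because the function is a generator, nothing is produced until the caller iterates over it, which is why the result is wrapped in `list()`.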

## The structure of a query

A query is also a `dict`. Each key in the dict corresponds to a field in the
documents you want to query (such as `name`, `color`, etc.), and the value
is the value of that field that you want to match. For example, a query to
select all documents where `occupation` is `ninja` would look like

```python
{"occupation": "ninja"}
```

This query will be passed as an argument to `Store` methods like `query_one`,
`query`, and `count`, as demonstrated next.


## Example queries

### Match a single value

To select all records where a field matches a single value, set the key to
the field you want to match and its value to the value you are looking for.

Return all records where 'occupation' is 'ninja'
```python
>>> with tmnt_store as store:
...     results = list(store.query({"occupation": "ninja"}))
>>> len(results)
4
```

Return all records where 'name' is 'Splinter'

```python
>>> with tmnt_store as store:
...     results = list(store.query({"name": "Splinter"}))
>>> len(results)
1
```
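
The equality matching used in the two examples above can be modelled in plain Python. The helpers `matches` and `count` below are hypothetical names for illustration, not part of `maggma`. The last line relies on MongoDB's rule that several keys in one query `dict` are combined with a logical AND, which the examples above do not show.

```python
turtles = [
    {"name": "Leonardo", "color": "blue", "tool": "sword", "occupation": "ninja"},
    {"name": "Donatello", "color": "purple", "tool": "staff", "occupation": "ninja"},
    {"name": "Michelangelo", "color": "orange", "tool": "nunchuks", "occupation": "ninja"},
    {"name": "Raphael", "color": "red", "tool": "sai", "occupation": "ninja"},
    {"name": "Splinter", "occupation": "sensei"},
]


def matches(doc, criteria):
    """True if every key in `criteria` is present in `doc` with an equal value."""
    return all(key in doc and doc[key] == value for key, value in criteria.items())


def count(docs, criteria):
    """Number of documents that satisfy `criteria` (illustrative helper)."""
    return sum(1 for doc in docs if matches(doc, criteria))


ninjas = count(turtles, {"occupation": "ninja"})
splinters = count(turtles, {"name": "Splinter"})
# Two keys in one query: both conditions must hold for a document to match.
red_ninjas = [d["name"] for d in turtles
              if matches(d, {"occupation": "ninja", "color": "red"})]
print(ninjas, splinters, red_ninjas)
```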

### Match any value in a list: `$in`

To find all documents where a field matches one of several different
values, use `$in` with a list of the values you want to search for.

```python
>>> with tmnt_store as store:
...     results = list(store.query({"color": {"$in": ["red", "blue"]}}))
>>> len(results)
2
```

`$in` is an example of a "query operator". Others include:

- `$nin`: a value is NOT in a list (the inverse of the above example)
- `$gt`, `$gte`: greater than, greater than or equal to a value
- `$lt`, `$lte`: less than, less than or equal to a value
- `$ne`: not equal to a value
- `$not`: inverts the effect of a query expression, returning results that
    do NOT match.

See the [MongoDB docs](https://www.mongodb.com/docs/manual/reference/operator/query/#query-selectors) for a complete list.

!!! Note

    When using query operators like `$in`, you must include a nested `dict` in
    your query, where the operator is the key and the search parameters are
    the value, e.g., the dictionary `{"$in": ["red", "blue"]}` is the **value**
    associated with the search field (`color`) in the parent dictionary.
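
To make the nested-`dict` shape concrete, here is a minimal plain-Python sketch of how an operator query could be evaluated. The `OPERATORS` table and the `matches` and `names` helpers are hypothetical and cover only `$in`, `$nin`, and `$ne`; the real evaluation is done by the database backend, not by code like this. The sketch follows MongoDB's documented behaviour that `$nin` and `$ne` also select documents that lack the field.

```python
turtles = [
    {"name": "Leonardo", "color": "blue", "tool": "sword", "occupation": "ninja"},
    {"name": "Donatello", "color": "purple", "tool": "staff", "occupation": "ninja"},
    {"name": "Michelangelo", "color": "orange", "tool": "nunchuks", "occupation": "ninja"},
    {"name": "Raphael", "color": "red", "tool": "sai", "occupation": "ninja"},
    {"name": "Splinter", "occupation": "sensei"},
]

_MISSING = object()  # marks a field that is absent from a document

# Each operator is a predicate over (field value, operand taken from the query).
OPERATORS = {
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
    "$ne": lambda value, operand: value != operand,
}


def matches(doc, criteria):
    for field, condition in criteria.items():
        value = doc.get(field, _MISSING)
        if isinstance(condition, dict):
            # Nested dict: every operator in it must accept the field value.
            if not all(OPERATORS[op](value, operand) for op, operand in condition.items()):
                return False
        elif value != condition:
            # Plain value: fall back to equality matching.
            return False
    return True


def names(criteria):
    return [d["name"] for d in turtles if matches(d, criteria)]


red_or_blue = names({"color": {"$in": ["red", "blue"]}})
others = names({"color": {"$nin": ["red", "blue"]}})
print(red_or_blue, others)
```

Splinter appears in `others` because his document has no `color` field at all.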

### Nested fields

Suppose that our documents had a nested structure, for example, by having
separate fields for first and last name:

```python
>>> turtles = [{"name":
                    {"first": "Leonardo",
                     "last": "turtle"
                     },
                "color": "blue",
                "tool": "sword",
                "occupation": "ninja"
                },
                ...
                ]
```

You can query nested fields by placing a period `.` between each level in the
hierarchy. For example:

```python
>>> with tmnt_store as store:
...     results = list(store.query({"name.first": "Splinter"}))
>>> len(results)
1
```

### Numerical Values

You can query numerical values in the same way as in the examples given above.

!!! Note
    When querying on numerical values, be mindful of the `type` of the data.
    Data stored in `json` format is often converted entirely to `str`, so if
    you use a numerical query operator like `$gte`, you might not get the
    results you expect unless you first verify that the numerical data
    in the `Store` is a `float` or `int`.
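
One way to check for this before loading data into a `Store` is to count the Python types found in a field and convert numeric strings. The documents and the `age` field below are invented for illustration, and `coerce_number` is a hypothetical helper, not a `maggma` function. The sketch also shows why strings misbehave: they compare character by character, not by numeric value. MongoDB comparison operators generally only compare values of matching types, so a numeric `$gte` would not be expected to match string values.

```python
from collections import Counter

# Hypothetical documents: the "age" field is invented for this illustration.
docs = [
    {"name": "sample-1", "age": 15},
    {"name": "sample-2", "age": "15"},
    {"name": "sample-3", "age": "9"},
]

# Count which Python types occur in the field.
type_counts = Counter(type(d["age"]).__name__ for d in docs)

# String comparison is lexicographic, so "9" sorts after "15".
lexicographic = "9" >= "15"
numeric = 9 >= 15


def coerce_number(value):
    """Convert numeric strings to int or float; leave other values unchanged."""
    if isinstance(value, str):
        for converter in (int, float):
            try:
                return converter(value)
            except ValueError:
                continue
    return value


cleaned = [{**d, "age": coerce_number(d["age"])} for d in docs]
at_least_10 = [d["name"] for d in cleaned if d["age"] >= 10]
print(type_counts, lexicographic, numeric, at_least_10)
```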