File: sql.rst

package info (click to toggle)
python-agate 1.13.0-1~exp1
  • links: PTS, VCS
  • area: main
  • in suites: experimental
  • size: 2,008 kB
  • sloc: python: 8,578; makefile: 126
file content (205 lines) | stat: -rw-r--r-- 4,550 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
===========
Emulate SQL
===========

agate's command structure is very similar to SQL. The primary difference between agate and SQL is that commands like :code:`SELECT` and :code:`WHERE` explicitly create new tables. You can chain them together as you would with SQL, but be aware each command is actually creating a new table.

.. note::

    All examples in this section use the `PostgreSQL <https://www.postgresql.org/>`_ dialect for comparison.

If you want to read and write data from SQL, see :ref:`load_a_table_from_a_sql_database`.

SELECT
======

SQL:

.. code-block:: postgres

    SELECT state, total FROM table;

agate:

.. code-block:: python

    new_table = table.select(['state', 'total'])

WHERE
=====

SQL:

.. code-block:: postgres

    SELECT * FROM table WHERE LOWER(state) = 'california';

agate:

.. code-block:: python

    new_table = table.where(lambda row: row['state'].lower() == 'california')

ORDER BY
========

SQL:

.. code-block:: postgres

    SELECT * FROM table ORDER BY total DESC;

agate:

.. code-block:: python

    new_table = table.order_by(lambda row: row['total'], reverse=True)

DISTINCT
========

SQL:

.. code-block:: postgres

    SELECT DISTINCT ON (state) * FROM table;

agate:

.. code-block:: python

    new_table = table.distinct('state')

.. note::

    Unlike most SQL implementations, agate always returns the full row. Use :meth:`.Table.select` if you want to filter the columns first.

INNER JOIN
==========

SQL (two ways):

.. code-block:: postgres

    SELECT * FROM patient, doctor WHERE patient.doctor = doctor.id;

    SELECT * FROM patient INNER JOIN doctor ON (patient.doctor = doctor.id);

agate:

.. code-block:: python

    joined = patients.join(doctors, 'doctor', 'id', inner=True)

LEFT OUTER JOIN
===============

SQL:

.. code-block:: postgres

    SELECT * FROM patient LEFT OUTER JOIN doctor ON (patient.doctor = doctor.id);

agate:

.. code-block:: python

    joined = patients.join(doctors, 'doctor', 'id')

FULL OUTER JOIN
===============

SQL:

.. code-block:: postgres

    SELECT * FROM patient FULL OUTER JOIN doctor ON (patient.doctor = doctor.id);

agate:

.. code-block:: python

    joined = patients.join(doctors, 'doctor', 'id', full_outer=True)

GROUP BY
========

agate's :meth:`.Table.group_by` works slightly different than SQLs. It does not require an aggregate function. Instead it returns :py:class:`.TableSet`. To see how to perform the equivalent of a SQL aggregate, see below.

.. code-block:: python

    doctors = patients.group_by('doctor')

You can group by two or more columns by chaining the command.

.. code-block:: python

    doctors_by_state = patients.group_by('state').group_by('doctor')

HAVING
======

agate's :meth:`.TableSet.having` works very similar to SQL's keyword of the same name.

.. code-block:: python

    doctors = patients.group_by('doctor')
    popular_doctors = doctors.having([
        ('patient_count', Count())
    ], lambda t: t['patient_count'] > 100)

This filters to only those doctors whose table includes at least 100 results. Can add as many aggregations as you want to the list and each will be available, by name in the test function you pass.

For example, here we filter to popular doctors with more an average review of at least three stars:

.. code-block:: python

    doctors = patients.group_by('doctor')
    popular_doctors = doctors.having([
        ('patient_count', Count()),
        ('average_stars', Average('stars'))
    ], lambda t: t['patient_count'] > 100 and t['average_stars'] >= 3)

Chain commands together
=======================

SQL:

.. code-block:: postgres

    SELECT state, total FROM table WHERE LOWER(state) = 'california' ORDER BY total DESC;

agate:

.. code-block:: python

    new_table = table \
        .select(['state', 'total']) \
        .where(lambda row: row['state'].lower() == 'california') \
        .order_by('total', reverse=True)

.. note::

    Chaining commands in this way is sometimes not a good idea. Being explicit about each step can lead to clearer code.

Aggregate functions
===================

SQL:

.. code-block:: postgres

    SELECT mean(age), median(age) FROM patients GROUP BY doctor;

agate:

.. code-block:: python

    doctors = patients.group_by('doctor')
    patient_ages = doctors.aggregate([
        ('patient_count', agate.Count()),
        ('age_mean', agate.Mean('age')),
        ('age_median', agate.Median('age'))
    ])

The resulting table will have four columns: ``doctor``, ``patient_count``, ``age_mean`` and ``age_median``.