File: 01_table_oriented.rst

package info (click to toggle)
pandas 1.5.3%2Bdfsg-2
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 56,516 kB
  • sloc: python: 382,477; ansic: 8,695; sh: 119; xml: 102; makefile: 97
file content (222 lines) | stat: -rw-r--r-- 5,782 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
.. _10min_tut_01_tableoriented:

{{ header }}

What kind of data does pandas handle?
=====================================

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to start using pandas

.. ipython:: python

    import pandas as pd

To load the pandas package and start working with it, import the
package. The community agreed alias for pandas is ``pd``, so loading
pandas as ``pd`` is assumed standard practice for all of the pandas
documentation.

.. raw:: html

        </li>
    </ul>

pandas data table representation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../../_static/schemas/01_table_dataframe.svg
   :align: center

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data.

.. ipython:: python

    df = pd.DataFrame(
        {
            "Name": [
                "Braund, Mr. Owen Harris",
                "Allen, Mr. William Henry",
                "Bonnell, Miss. Elizabeth",
            ],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    df

To manually store data in a table, create a ``DataFrame``. When using a Python dictionary of lists, the dictionary keys will be used as column headers and
the values in each list as columns of the ``DataFrame``.

.. raw:: html

        </li>
    </ul>

A :class:`DataFrame` is a 2-dimensional data structure that can store data of
different types (including characters, integers, floating point values,
categorical data and more) in columns. It is similar to a spreadsheet, a
SQL table or the ``data.frame`` in R.

-  The table has 3 columns, each of them with a column label. The column
   labels are respectively ``Name``, ``Age`` and ``Sex``.
-  The column ``Name`` consists of textual data with each value a
   string, the column ``Age`` are numbers and the column ``Sex`` is
   textual data.

In spreadsheet software, the table representation of our data would look
very similar:

.. image:: ../../_static/schemas/01_table_spreadsheet.png
   :align: center

Each column in a ``DataFrame`` is a ``Series``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../../_static/schemas/01_table_series.svg
   :align: center

.. raw:: html

    <ul class="task-bullet">
        <li>

I’m just interested in working with the data in the column ``Age``

.. ipython:: python

    df["Age"]

When selecting a single column of a pandas :class:`DataFrame`, the result is
a pandas :class:`Series`. To select the column, use the column label in
between square brackets ``[]``.

.. raw:: html

        </li>
    </ul>

.. note::
    If you are familiar to Python
    :ref:`dictionaries <python:tut-dictionaries>`, the selection of a
    single column is very similar to selection of dictionary values based on
    the key.

You can create a ``Series`` from scratch as well:

.. ipython:: python

    ages = pd.Series([22, 35, 58], name="Age")
    ages

A pandas ``Series`` has no column labels, as it is just a single column
of a ``DataFrame``. A Series does have row labels.

Do something with a DataFrame or Series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to know the maximum Age of the passengers

We can do this on the ``DataFrame`` by selecting the ``Age`` column and
applying ``max()``:

.. ipython:: python

    df["Age"].max()

Or to the ``Series``:

.. ipython:: python

    ages.max()

.. raw:: html

        </li>
    </ul>

As illustrated by the ``max()`` method, you can *do* things with a
``DataFrame`` or ``Series``. pandas provides a lot of functionalities,
each of them a *method* you can apply to a ``DataFrame`` or ``Series``.
As methods are functions, do not forget to use parentheses ``()``.

.. raw:: html

    <ul class="task-bullet">
        <li>

I’m interested in some basic statistics of the numerical data of my data table

.. ipython:: python

    df.describe()

The :func:`~DataFrame.describe` method provides a quick overview of the numerical data in
a ``DataFrame``. As the ``Name`` and ``Sex`` columns are textual data,
these are by default not taken into account by the :func:`~DataFrame.describe` method.

.. raw:: html

        </li>
    </ul>

Many pandas operations return a ``DataFrame`` or a ``Series``. The
:func:`~DataFrame.describe` method is an example of a pandas operation returning a
pandas ``Series`` or a pandas ``DataFrame``.

.. raw:: html

    <div class="d-flex flex-row gs-torefguide">
        <span class="badge badge-info">To user guide</span>

Check more options on ``describe`` in the user guide section about :ref:`aggregations with describe <basics.describe>`

.. raw:: html

    </div>

.. note::
    This is just a starting point. Similar to spreadsheet
    software, pandas represents data as a table with columns and rows. Apart
    from the representation, also the data manipulations and calculations
    you would do in spreadsheet software are supported by pandas. Continue
    reading the next tutorials to get started!

.. raw:: html

    <div class="shadow gs-callout gs-callout-remember">
        <h4>REMEMBER</h4>

-  Import the package, aka ``import pandas as pd``
-  A table of data is stored as a pandas ``DataFrame``
-  Each column in a ``DataFrame`` is a ``Series``
-  You can do things by applying a method to a ``DataFrame`` or ``Series``

.. raw:: html

    </div>

.. raw:: html

    <div class="d-flex flex-row gs-torefguide">
        <span class="badge badge-info">To user guide</span>

A more extended explanation to ``DataFrame`` and ``Series`` is provided in the :ref:`introduction to data structures <dsintro>`.

.. raw:: html

    </div>