File: 02_read_write.rst

package info (click to toggle)
pandas 2.2.3%2Bdfsg-9
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 66,784 kB
  • sloc: python: 422,228; ansic: 9,190; sh: 270; xml: 102; makefile: 83
file content (208 lines) | stat: -rw-r--r-- 5,482 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
.. _10min_tut_02_read_write:

{{ header }}

.. ipython:: python

    import pandas as pd

.. raw:: html

    <div class="card gs-data">
        <div class="card-header gs-data-header">
            <div class="gs-data-title">
                Data used for this tutorial:
            </div>
        </div>
        <ul class="list-group list-group-flush">
            <li class="list-group-item gs-data-list">

.. include:: includes/titanic.rst

.. raw:: html

            </li>
        </ul>
    </div>

How do I read and write tabular data?
=====================================

.. image:: ../../_static/schemas/02_io_readwrite.svg
   :align: center

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to analyze the Titanic passenger data, available as a CSV file.

.. ipython:: python

    titanic = pd.read_csv("data/titanic.csv")

pandas provides the :func:`read_csv` function to read data stored as a csv
file into a pandas ``DataFrame``. pandas supports many different file
formats or data sources out of the box (csv, excel, sql, json, parquet,
…), each of them with the prefix ``read_*``.

.. raw:: html

        </li>
    </ul>

Make sure to always have a check on the data after reading in the
data. When displaying a ``DataFrame``, the first and last 5 rows will be
shown by default:

.. ipython:: python

    titanic

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to see the first 8 rows of a pandas DataFrame.

.. ipython:: python

    titanic.head(8)

To see the first N rows of a ``DataFrame``, use the :meth:`~DataFrame.head` method with
the required number of rows (in this case 8) as argument.

.. raw:: html

        </li>
    </ul>

.. note::

    Interested in the last N rows instead? pandas also provides a
    :meth:`~DataFrame.tail` method. For example, ``titanic.tail(10)`` will return the last
    10 rows of the DataFrame.

A check on how pandas interpreted each of the column data types can be
done by requesting the pandas ``dtypes`` attribute:

.. ipython:: python

    titanic.dtypes

For each of the columns, the used data type is enlisted. The data types
in this ``DataFrame`` are integers (``int64``), floats (``float64``) and
strings (``object``).

.. note::
    When asking for the ``dtypes``, no brackets are used!
    ``dtypes`` is an attribute of a ``DataFrame`` and ``Series``. Attributes
    of a ``DataFrame`` or ``Series`` do not need brackets. Attributes
    represent a characteristic of a ``DataFrame``/``Series``, whereas
    methods (which require brackets) *do* something with the
    ``DataFrame``/``Series`` as introduced in the :ref:`first tutorial <10min_tut_01_tableoriented>`.

.. raw:: html

    <ul class="task-bullet">
        <li>

My colleague requested the Titanic data as a spreadsheet.

.. ipython:: python

    titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

Whereas ``read_*`` functions are used to read data to pandas, the
``to_*`` methods are used to store data. The :meth:`~DataFrame.to_excel` method stores
the data as an excel file. In the example here, the ``sheet_name`` is
named *passengers* instead of the default *Sheet1*. By setting
``index=False`` the row index labels are not saved in the spreadsheet.

.. raw:: html

        </li>
    </ul>

The equivalent read function :meth:`~DataFrame.read_excel` will reload the data to a
``DataFrame``:

.. ipython:: python

    titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")

.. ipython:: python

    titanic.head()

.. ipython:: python
   :suppress:

   import os

   os.remove("titanic.xlsx")

.. raw:: html

    <ul class="task-bullet">
        <li>

I’m interested in a technical summary of a ``DataFrame``

.. ipython:: python

    titanic.info()


The method :meth:`~DataFrame.info` provides technical information about a
``DataFrame``, so let’s explain the output in more detail:

-  It is indeed a :class:`DataFrame`.
-  There are 891 entries, i.e. 891 rows.
-  Each row has a row label (aka the ``index``) with values ranging from
   0 to 890.
-  The table has 12 columns. Most columns have a value for each of the
   rows (all 891 values are ``non-null``). Some columns do have missing
   values and less than 891 ``non-null`` values.
-  The columns ``Name``, ``Sex``, ``Cabin`` and ``Embarked`` consists of
   textual data (strings, aka ``object``). The other columns are
   numerical data with some of them whole numbers (aka ``integer``) and
   others are real numbers (aka ``float``).
-  The kind of data (characters, integers,…) in the different columns
   are summarized by listing the ``dtypes``.
-  The approximate amount of RAM used to hold the DataFrame is provided
   as well.

.. raw:: html

        </li>
    </ul>

.. raw:: html

    <div class="shadow gs-callout gs-callout-remember">
        <h4>REMEMBER</h4>

-  Getting data in to pandas from many different file formats or data
   sources is supported by ``read_*`` functions.
-  Exporting data out of pandas is provided by different
   ``to_*``\ methods.
-  The ``head``/``tail``/``info`` methods and the ``dtypes`` attribute
   are convenient for a first check.

.. raw:: html

    </div>

.. raw:: html

    <div class="d-flex flex-row gs-torefguide">
        <span class="badge badge-info">To user guide</span>

For a complete overview of the input and output possibilities from and to pandas, see the user guide section about :ref:`reader and writer functions <io>`.

.. raw:: html

    </div>