File: create.rst

package info (click to toggle)
python-agate 1.13.0-1~exp1
  • links: PTS, VCS
  • area: main
  • in suites: experimental
  • size: 2,008 kB
  • sloc: python: 8,578; makefile: 126
file content (192 lines) | stat: -rw-r--r-- 4,974 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
===============
Creating tables
===============

From data in memory
===================

From a list of lists.

.. code-block:: python

    column_names = ['letter', 'number']
    column_types = [agate.Text(), agate.Number()]

    rows = [
        ('a', 1),
        ('b', 2),
        ('c', None)
    ]

    table = agate.Table(rows, column_names, column_types)

From a list of dictionaries.

.. code-block:: python

    rows = [
        dict(letter='a', number=1),
        dict(letter='b', number=2),
        dict(letter='c', number=None)
    ]

    table = agate.Table.from_object(rows)


From a CSV
==========

By default, loading a table from a CSV will use agate's builtin :class:`.TypeTester` to infer column types:

.. code-block:: python

    table = agate.Table.from_csv('filename.csv')

Override type inference
=======================

In some cases agate's :class:`.TypeTester` may guess incorrectly. To override the type for some columns and use TypeTester for the rest, pass a dictionary to the ``column_types`` argument.

.. code-block:: python

    specified_types = {
        'column_name_one': agate.Text(),
        'column_name_two': agate.Number()
    }

    table = agate.Table.from_csv('filename.csv', column_types=specified_types)

This will use a generic TypeTester and override your specified columns with ``TypeTester.force``.

Limit type inference
====================

For large datasets :class:`.TypeTester` may be unreasonably slow. In order to limit the amount of data it uses you can specify the ``limit`` argument. Note that if data after the limit invalidates the TypeTester's inference you may get errors when the data is loaded.

.. code-block:: python

    tester = agate.TypeTester(limit=100)

    table = agate.Table.from_csv('filename.csv', column_types=tester)

Manually specify columns
========================

If you know the types of your data you may find it more efficient to manually specify the names and types of your columns. This also gives you an opportunity to rename columns when you load them.

.. code-block:: python

    text_type = agate.Text()
    number_type = agate.Number()

    column_names = ['city', 'area', 'population']
    column_types = [text_type, number_type, number_type]

    table = agate.Table.from_csv('population.csv', column_names, column_types)

Or, you can use this method to load data from a file that does not have a header row:

.. code-block:: python

    table = agate.Table.from_csv('population.csv', column_names, column_types, header=False)

From a unicode CSV
==================

You don't have to do anything special. It just works!

From a latin1 CSV
=================

.. code-block:: python

    table = agate.Table.from_csv('census.csv', encoding='latin1')

From a semicolon delimited CSV
==============================

Normally, agate will automatically guess the delimiter of your CSV, but if that guess fails you can specify it manually:

.. code-block:: python

    table = agate.Table.from_csv('filename.csv', delimiter=';')

From a TSV (tab-delimited CSV)
==============================

This is the same as the previous example, but in this case we specify that the delimiter is a tab:

.. code-block:: python

    table = agate.Table.from_csv('filename.csv', delimiter='\t')

From JSON
=========

.. code-block:: python

    table = agate.Table.from_json('filename.json')

From newline-delimited JSON
===========================

.. code-block:: python

    table = agate.Table.from_json('filename.json', newline=True)

.. _load_a_table_from_a_sql_database:

From a SQL database
===================

Use the `agate-sql <https://agate-sql.readthedocs.org/>`_ extension.

.. code-block:: python

    import agatesql

    table = agate.Table.from_sql('postgresql:///database', 'input_table')

From an Excel spreadsheet
=========================

Use the `agate-excel <https://agate-excel.readthedocs.org/>`_ extension. It supports both .xls and .xlsx files.

.. code-block:: python

    import agateexcel

    table = agate.Table.from_xls('test.xls', sheet='data')

    table2 = agate.Table.from_xlsx('test.xlsx', sheet='data')

From a DBF table
================

DBF is the file format used to hold tabular data for ArcGIS shapefiles. `agate-dbf <https://agate-dbf.readthedocs.org/>`_ extension.

.. code-block:: python

    import agatedbf

    table = agate.Table.from_dbf('test.dbf')

From a remote file
==================

Use the `agate-remote <https://agate-remote.readthedocs.org/>`_ extension.


.. code-block:: python

    import agateremote

    table = agate.Table.from_url('https://raw.githubusercontent.com/wireservice/agate/master/examples/test.csv')

agate-remote also let’s you create an Archive, which is a reference to a group of tables with a known path structure.

.. code-block:: python

    archive = agateremote.Archive('https://github.com/vincentarelbundock/Rdatasets/raw/master/csv/')

    table = archive.get_table('sandwich/PublicSchools.csv')