===============
Creating tables
===============

From data in memory
===================

From a list of lists:

.. code-block:: python

    column_names = ['letter', 'number']
    column_types = [agate.Text(), agate.Number()]

    rows = [
        ('a', 1),
        ('b', 2),
        ('c', None)
    ]

    table = agate.Table(rows, column_names, column_types)

From a list of dictionaries:

.. code-block:: python

    rows = [
        dict(letter='a', number=1),
        dict(letter='b', number=2),
        dict(letter='c', number=None)
    ]

    table = agate.Table.from_object(rows)

From a CSV
==========

By default, loading a table from a CSV will use agate's builtin :class:`.TypeTester` to infer column types:

.. code-block:: python

    table = agate.Table.from_csv('filename.csv')

Override type inference
=======================

In some cases agate's :class:`.TypeTester` may guess incorrectly. To override the type of some columns while letting the TypeTester infer the rest, pass a dictionary to the ``column_types`` argument.

.. code-block:: python

    specified_types = {
        'column_name_one': agate.Text(),
        'column_name_two': agate.Number()
    }

    table = agate.Table.from_csv('filename.csv', column_types=specified_types)

Internally, this creates a generic :class:`.TypeTester` and passes your specified columns to its ``force`` argument.
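For comparison, here is the explicit form of the same override (a sketch: the throwaway CSV and its column names are made up for illustration). Constructing the :class:`.TypeTester` yourself and passing your overrides to its ``force`` argument is equivalent to passing a dictionary to ``column_types``:

.. code-block:: python

    import csv
    import tempfile

    import agate

    # Write a tiny example CSV. Note the leading zero in the id.
    path = tempfile.mkstemp(suffix='.csv')[1]

    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'count'])
        writer.writerow(['001', '5'])

    # Force ``id`` to remain Text so the leading zero survives;
    # ``count`` is still inferred normally (here, as Number).
    tester = agate.TypeTester(force={'id': agate.Text()})

    table = agate.Table.from_csv(path, column_types=tester)

Without the ``force`` override, ``id`` would be inferred as a Number and ``'001'`` would load as ``1``.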

Limit type inference
====================

For large datasets :class:`.TypeTester` may be unreasonably slow. To limit the amount of data it inspects, specify the ``limit`` argument. Note that if data beyond the limit invalidates the TypeTester's inference, you may get errors when the full dataset is loaded.

.. code-block:: python

    tester = agate.TypeTester(limit=100)

    table = agate.Table.from_csv('filename.csv', column_types=tester)
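The failure mode described above can be demonstrated (a sketch using synthetic data): the first 100 rows of the column below all look numeric, so a limited :class:`.TypeTester` infers Number, but a later row contains text and the cast fails when the table loads.

.. code-block:: python

    import csv
    import tempfile

    import agate

    # Build a CSV whose first 200 values are numeric, followed by one that isn't.
    path = tempfile.mkstemp(suffix='.csv')[1]

    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['value'])
        writer.writerows([[i] for i in range(200)])
        writer.writerow(['not a number'])

    tester = agate.TypeTester(limit=100)

    try:
        table = agate.Table.from_csv(path, column_types=tester)
    except agate.CastError:
        # Inference based on the first 100 rows chose Number,
        # which the final row cannot be cast to.
        table = None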

Manually specify columns
========================

If you already know the types of your data, you may find it more efficient to specify the names and types of your columns manually. This also gives you an opportunity to rename columns as you load them.

.. code-block:: python

    text_type = agate.Text()
    number_type = agate.Number()

    column_names = ['city', 'area', 'population']
    column_types = [text_type, number_type, number_type]

    table = agate.Table.from_csv('population.csv', column_names, column_types)

You can also use this method to load data from a file that does not have a header row:

.. code-block:: python

    table = agate.Table.from_csv('population.csv', column_names, column_types, header=False)

From a unicode CSV
==================

You don't have to do anything special. It just works!
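A quick round-trip shows this (a sketch with made-up data): write a UTF-8 CSV with Python's standard ``csv`` module, then load it without specifying any encoding.

.. code-block:: python

    import csv
    import tempfile

    import agate

    path = tempfile.mkstemp(suffix='.csv')[1]

    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['city', 'population'])
        writer.writerow(['Zürich', '415367'])

    # No encoding argument needed; agate reads UTF-8 by default.
    table = agate.Table.from_csv(path)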

From a latin1 CSV
=================

.. code-block:: python

    table = agate.Table.from_csv('census.csv', encoding='latin1')

From a semicolon delimited CSV
==============================

Normally, agate will automatically guess the delimiter of your CSV, but if that guess fails you can specify it manually:

.. code-block:: python

    table = agate.Table.from_csv('filename.csv', delimiter=';')

From a TSV (tab-delimited CSV)
==============================

This is the same as the previous example, but in this case we specify that the delimiter is a tab:

.. code-block:: python

    table = agate.Table.from_csv('filename.csv', delimiter='\t')

From JSON
=========

.. code-block:: python

    table = agate.Table.from_json('filename.json')

From newline-delimited JSON
===========================

.. code-block:: python

    table = agate.Table.from_json('filename.json', newline=True)

.. _load_a_table_from_a_sql_database:

From a SQL database
===================

Use the `agate-sql <https://agate-sql.readthedocs.org/>`_ extension.

.. code-block:: python

    import agatesql

    table = agate.Table.from_sql('postgresql:///database', 'input_table')

From an Excel spreadsheet
=========================

Use the `agate-excel <https://agate-excel.readthedocs.org/>`_ extension. It supports both .xls and .xlsx files.

.. code-block:: python

    import agateexcel

    table = agate.Table.from_xls('test.xls', sheet='data')

    table2 = agate.Table.from_xlsx('test.xlsx', sheet='data')

From a DBF table
================

DBF is the file format used to hold tabular data for ArcGIS shapefiles. Use the `agate-dbf <https://agate-dbf.readthedocs.org/>`_ extension.

.. code-block:: python

    import agatedbf

    table = agate.Table.from_dbf('test.dbf')

From a remote file
==================

Use the `agate-remote <https://agate-remote.readthedocs.org/>`_ extension.

.. code-block:: python

    import agateremote

    table = agate.Table.from_url('https://raw.githubusercontent.com/wireservice/agate/master/examples/test.csv')

agate-remote also lets you create an Archive, which is a reference to a group of tables with a known path structure.

.. code-block:: python

    archive = agateremote.Archive('https://github.com/vincentarelbundock/Rdatasets/raw/master/csv/')

    table = archive.get_table('sandwich/PublicSchools.csv')