==============================
Going elsewhere with your data
==============================

csvjson: going online
=====================

Very frequently one of the last steps in any data analysis is to get the data onto the web for display as a table, map or chart. CSV is rarely the ideal format for this. More often than not what you want is JSON and that's where :doc:`/scripts/csvjson` comes in. :doc:`/scripts/csvjson` takes an input CSV and outputs neatly formatted JSON. For the sake of illustration, let's use :doc:`/scripts/csvcut` and :doc:`/scripts/csvgrep` to convert just a small slice of our data:

.. code-block:: bash

   csvcut -c county,item_name data.csv | csvgrep -c county -m "GREELEY" | csvjson --indent 4

.. code-block:: json

   [
       {
           "county": "GREELEY",
           "item_name": "RIFLE,7.62 MILLIMETER"
       },
       {
           "county": "GREELEY",
           "item_name": "RIFLE,7.62 MILLIMETER"
       },
       {
           "county": "GREELEY",
           "item_name": "RIFLE,7.62 MILLIMETER"
       }
   ]
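
To make the pipeline a little less magical, here is a minimal Python sketch of roughly what the three commands do together, using only the standard library. It is not how csvkit implements them. The helper name ``slice_to_json``, the ``quantity`` column and the ``ADAMS`` row are made up for illustration, and the plain substring test is a stand-in for ``csvgrep -m``:

.. code-block:: python

   import csv
   import io
   import json

   # Sample input. The GREELEY rows come from the output above; the
   # "quantity" column and the ADAMS row are hypothetical filler.
   SAMPLE_CSV = (
       "county,item_name,quantity\n"
       'GREELEY,"RIFLE,7.62 MILLIMETER",1\n'
       'ADAMS,"RIFLE,7.62 MILLIMETER",1\n'
       'GREELEY,"RIFLE,7.62 MILLIMETER",1\n'
   )


   def slice_to_json(csv_text, columns, match_column, pattern):
       """Keep `columns` of the rows whose `match_column` contains `pattern`."""
       reader = csv.DictReader(io.StringIO(csv_text))
       selected = [
           {name: row[name] for name in columns}  # the csvcut step
           for row in reader
           if pattern in row[match_column]        # the csvgrep step
       ]
       return json.dumps(selected, indent=4)      # the csvjson step


   result = slice_to_json(SAMPLE_CSV, ["county", "item_name"], "county", "GREELEY")
   print(result)

The point of the sketch is the shape of the work: select columns, filter rows, serialize. The command-line version does the same thing without you having to write or maintain any of it.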

A common reason to turn a CSV into a JSON file is to use it as a lookup table in the browser. This can be illustrated with the ACS data we looked at earlier, which contains a unique ``fips`` code for each county:

.. code-block:: bash

   csvjson --indent 4 --key fips acs2012_5yr_population.csv | head

.. code-block:: json

   {
       "31001": {
           "fips": "31001",
           "name": "Adams County, NE",
           "total_population": "31299",
           "margin_of_error": "0"
       },
       "31003": {
           "fips": "31003",
           "name": "Antelope County, NE",
           "...": "..."
       }
   }
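
The idea behind a keyed lookup table can be sketched in a few lines of Python: each row is stored under the value of its key column, so a record can be fetched directly instead of by scanning a list. This is a standard-library illustration, not csvkit's implementation. The helper name ``keyed_lookup`` is hypothetical, the sample holds only the one complete row shown above, and raising an error on a repeated key is a choice made for this sketch:

.. code-block:: python

   import csv
   import io
   import json

   # Only the complete row from the output above is used as sample data.
   SAMPLE_CSV = (
       "fips,name,total_population,margin_of_error\n"
       '31001,"Adams County, NE",31299,0\n'
   )


   def keyed_lookup(csv_text, key):
       """Return a dict mapping each row's `key` value to the whole row."""
       lookup = {}
       for row in csv.DictReader(io.StringIO(csv_text)):
           if row[key] in lookup:
               # A lookup table only makes sense if the key is unique.
               raise ValueError(f"duplicate key value: {row[key]}")
           lookup[row[key]] = row
       return lookup


   lookup = keyed_lookup(SAMPLE_CSV, "fips")
   print(json.dumps(lookup, indent=4))
   print(lookup["31001"]["name"])

In the browser, the JSON produced this way is used the same way as the dictionary here: index it by ``fips`` code to get the matching record.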

For making maps, :doc:`/scripts/csvjson` can also output GeoJSON; see :doc:`its documentation </scripts/csvjson>` for more details.

csvpy: going into code
======================

For the programmers out there, the command line is rarely as functional as just writing a little bit of code. :doc:`/scripts/csvpy` exists just to make a programmer's life easier. Invoking it simply launches a Python interactive terminal, with the data preloaded into a CSV reader:

.. code-block:: console

   $ csvpy data.csv
   Welcome! "data.csv" has been loaded in a reader object named "reader".
   >>> print(len(list(reader)))
   1037
   >>> quit()

Besides saving you time, the reader is Unicode-aware, because it comes from agate.
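
Once you are at the prompt, ``reader`` can be looped over like any other iterable of rows. As a minimal sketch of the kind of thing you might type next, the snippet below tallies the ``county`` column. So that it runs on its own, it builds a standard-library ``csv.reader`` over three sample rows as a stand-in for the reader that csvpy provides; that substitution is an assumption of the sketch:

.. code-block:: python

   import csv
   import io
   from collections import Counter

   # Stand-in for the "reader" object that csvpy preloads; the rows are
   # the three GREELEY records shown earlier in this tutorial.
   SAMPLE_CSV = (
       "county,item_name\n"
       'GREELEY,"RIFLE,7.62 MILLIMETER"\n'
       'GREELEY,"RIFLE,7.62 MILLIMETER"\n'
       'GREELEY,"RIFLE,7.62 MILLIMETER"\n'
   )
   reader = csv.reader(io.StringIO(SAMPLE_CSV))

   header = next(reader)                  # the first row holds the column names
   county_index = header.index("county")  # find the column by name

   # Count how many rows belong to each county.
   counts = Counter(row[county_index] for row in reader)
   print(counts.most_common(1))

Note that a reader can only be consumed once: after the loop above (or after ``list(reader)`` in the session shown earlier), it is exhausted and yields no more rows.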

csvformat: for legacy systems
=============================

It is a foundational principle of csvkit that it always outputs cleanly formatted CSV data. None of the normal csvkit tools can be forced to produce pipe- or tab-delimited output, despite these being common formats. This principle is what allows the csvkit tools to chain together so easily, and hopefully it also reduces the number of crummy, non-standard CSV files in the world. However, sometimes a legacy system just has to have a pipe-delimited file, and it would be crazy to make you use another tool to create it. That's why we've got :doc:`/scripts/csvformat`.

Pipe-delimited:

.. code-block:: bash

   csvformat -D \| data.csv

Tab-delimited:

.. code-block:: bash

   csvformat -T data.csv

Quote every cell:

.. code-block:: bash

   csvformat -U 1 data.csv

Ampersand-delimited, dollar-signs for quotes, quote all strings, and asterisk for line endings:

.. code-block:: bash

   csvformat -D \& -Q \$ -U 2 -M \* data.csv
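
If you know Python's ``csv`` module, these options will look familiar. The sketch below shows the analogous writer parameters: ``delimiter`` for ``-D``, ``quotechar`` for ``-Q``, ``quoting`` for ``-U`` and ``lineterminator`` for ``-M``. Treat the one-to-one mapping as an assumption of this illustration rather than a description of csvformat's internals. The helper name ``format_rows`` is hypothetical:

.. code-block:: python

   import csv
   import io


   def format_rows(rows, **dialect):
       """Write `rows` with the given csv.writer formatting parameters."""
       dialect.setdefault("lineterminator", "\n")
       buffer = io.StringIO()
       csv.writer(buffer, **dialect).writerows(rows)
       return buffer.getvalue()


   rows = [
       ["county", "item_name"],
       ["GREELEY", "RIFLE,7.62 MILLIMETER"],
   ]

   # Pipe-delimited: the comma is no longer special, so nothing is quoted.
   pipe_text = format_rows(rows, delimiter="|")

   # Quote every cell.
   quoted_text = format_rows(rows, quoting=csv.QUOTE_ALL)

   # Ampersand delimiter, dollar-sign quotes, quote non-numeric values,
   # asterisk line endings.
   odd_text = format_rows(
       rows,
       delimiter="&",
       quotechar="$",
       quoting=csv.QUOTE_NONNUMERIC,
       lineterminator="*",
   )

   print(pipe_text)
   print(quoted_text)
   print(odd_text)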

You get the picture.

Summing up
==========

Thus concludes the csvkit tutorial. At this point, I hope, you have a sense of the breadth of possibilities that this relatively small number of command-line tools opens up. Of course, this tutorial has only scratched the surface of the available options, so remember to check the :doc:`/cli` documentation for each tool as well.

So armed, go forth and expand the empire of the king of tabular file formats.