# README [Build Status](https://travis-ci.org/ozataman/csv-conduit)
## CSV Files and Haskell
CSV files are the de facto standard for data transfer in many settings,
particularly when dealing with enterprise applications or disparate database
systems.
While there are a number of CSV libraries in Haskell, at the time this
project started, there wasn't one that provided all of the
following:
* Full flexibility in quote characters, separators, input/output
* Constant space operation
* Robust parsing and error resiliency
* Battle-tested reliability in real-world datasets
* Fast operation
* Convenient interface that supports a variety of use cases
Over time, other capable CSV packages such as cassava have appeared.
The major benefits of this library remain:
* Direct participation in the conduit ecosystem, which is now quite
large, and all the benefits that come with it.
* Flexibility in CSV format definition.
* Resiliency to errors in the input data.
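As a small sketch of the format flexibility: settings are a plain record you can override. The field names below assume the `CSVSettings` record currently exported by `Data.CSV.Conduit`:

```haskell
import Data.CSV.Conduit

-- A sketch: tab-separated values with quoting disabled, derived from
-- the library's defaults. Field names assume csv-conduit's CSVSettings.
tsvSettings :: CSVSettings
tsvSettings = defCSVSettings { csvSep = '\t', csvQuoteChar = Nothing }
```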
## This package
csv-conduit is a conduit-based CSV parsing library that is easy to
use, flexible, and fast. It leverages the conduit infrastructure to
provide constant-space operation, which is critical in many real-world
use cases.
For example, you can use http-conduit to download a CSV file from the
internet and plug its source into `intoCSV` to stream-convert the
download into the `Row` data type, processing the records as they
stream in, without having to download the entire file to disk first.
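A rough, untested sketch of that idea, assuming `Network.HTTP.Simple.httpSource` from http-conduit, the modern `.|` operator, and a hypothetical URL:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Conduit (liftIO, mapM_C, runConduitRes, (.|))
import           Data.CSV.Conduit
import           Data.Text (Text)
import           Network.HTTP.Simple (getResponseBody, httpSource, parseRequest)

-- Pinning the row type here selects the list-of-fields representation.
printRow :: Row Text -> IO ()
printRow = print

-- Stream-parse a remote CSV; no intermediate file is written and only
-- a constant amount of the download is held in memory at a time.
streamRemoteCSV :: IO ()
streamRemoteCSV = do
  req <- parseRequest "https://example.com/data.csv"  -- hypothetical URL
  runConduitRes $
       httpSource req getResponseBody
    .| intoCSV defCSVSettings
    .| mapM_C (liftIO . printRow)
```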
## Author & Contributors
- Ozgun Ataman (@ozataman)
- Daniel Bergey (@bergey)
- BJTerry (@BJTerry)
- Mike Craig (@mkscrg)
- Daniel Corson (@dancor)
- Dmitry Dzhus (@dzhus)
- Niklas Hambüchen (@nh2)
- Facundo Domínguez (@facundominguez)
### Introduction
* The `CSVeable` typeclass implements the key operations.
* `CSVeable` is parameterized over both a stream type and a target CSV row type.
* There are two basic row types, and they implement *exactly* the same operations,
  so you can choose the right one for the job at hand:
    - `type MapRow t = Map t t`
    - `type Row t = [t]`
* You use the conduits defined in this library to parse from a CSV
  stream and to render rows back into a CSV stream.
* Use the full flexibility and modularity of conduits for sources and sinks.
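For instance, the same parsing conduit can produce either row type; which one you get is selected purely by the type annotation. A sketch (untested, assuming csv-conduit provides `CSV ByteString (Row Text)` and `CSV ByteString (MapRow Text)` instances):

```haskell
import           Control.Monad.Catch (MonadThrow)
import           Data.ByteString (ByteString)
import           Data.Conduit (ConduitM)
import           Data.CSV.Conduit
import           Data.Text (Text)

-- Positional rows: each record is a list of fields.
listRows :: MonadThrow m => ConduitM ByteString (Row Text) m ()
listRows = intoCSV defCSVSettings

-- Named rows: each record is a Map keyed by the header line.
namedRows :: MonadThrow m => ConduitM ByteString (MapRow Text) m ()
namedRows = intoCSV defCSVSettings
```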
### Speed
While fast operation is of concern, I have so far cared more about correct
operation and a flexible API. Please let me know if you notice any performance
regressions or optimization opportunities.
### Usage Examples
#### Example #1: Basics Using Convenience API
```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import Data.Conduit.List as CL
import Data.CSV.Conduit
import Data.Text (Text)

-- Just reverse the columns
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = CL.map reverse

test :: IO ()
test = runResourceT $
  transformCSV defCSVSettings
               (sourceFile "input.csv")
               myProcessor
               (sinkFile "output.csv")
```
#### Example #2: Basics Using Conduit API
```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import Data.CSV.Conduit
import Data.Text (Text)

myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = awaitForever yield

-- Let's simply stream from a file, parse the CSV, reserialize it,
-- and push it back into another file.
test :: IO ()
test = runResourceT $
  sourceFile "test/BigFile.csv" $=
  intoCSV defCSVSettings $=
  myProcessor $=
  fromCSV defCSVSettings $$
  sinkFile "test/BigFileOut.csv"
```
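Note that recent conduit releases deprecate the `$=` and `$$` operators in favor of `runConduit` and `.|`. The same pipeline in the newer style (a sketch, reusing the imports and `myProcessor` from the example above):

```haskell
test :: IO ()
test = runResourceT $ runConduit $
     sourceFile "test/BigFile.csv"
  .| intoCSV defCSVSettings
  .| myProcessor
  .| fromCSV defCSVSettings
  .| sinkFile "test/BigFileOut.csv"
```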