# README [![Build Status](https://travis-ci.org/ozataman/csv-conduit.svg?branch=master)](https://travis-ci.org/ozataman/csv-conduit)

## CSV Files and Haskell

CSV files are the de facto standard for data transfer in many settings,
particularly when dealing with enterprise applications or disparate database
systems.

While there are a number of CSV libraries in Haskell, at the time this
project started, none of them provided all of the following:

* Full flexibility in quote characters, separators, input/output
* Constant space operation
* Robust parsing and error resiliency
* Battle-tested reliability in real-world datasets
* Fast operation
* Convenient interface that supports a variety of use cases

Over time, other solid CSV packages, such as cassava, have appeared.
The major benefits of this library remain:

* Direct participation in the conduit ecosystem, which is now quite
  large, and all the benefits that come with it.
* Flexibility in CSV format definition.
* Resiliency to errors in the input data.
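
As a rough illustration of the format flexibility mentioned above, the sketch below parses semicolon-separated input by overriding one field of `defCSVSettings`. It assumes that `CSVSettings` exposes a `csvSep` record field, as in the 0.7 series, and that your conduit version provides `runConduit` and `.|`. The names `semicolonSettings` and `parseSemicolon` are made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Conduit (runConduit, yield, (.|))
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit (CSVSettings (..), Row, defCSVSettings, intoCSV)
import Data.Text (Text)

-- Hypothetical settings value: the defaults, but with ';' as separator.
-- Assumes the `csvSep` field of CSVSettings (csv-conduit 0.7.x).
semicolonSettings :: CSVSettings
semicolonSettings = defCSVSettings { csvSep = ';' }

-- Parse an in-memory ByteString into a list of rows.
-- Collecting into a list is only sensible for small inputs.
parseSemicolon :: ByteString -> IO [Row Text]
parseSemicolon input =
  runConduit $
    yield input
      .| intoCSV semicolonSettings
      .| CL.consume
```

In GHCi, `parseSemicolon "a;b\n1;2\n"` should produce one list of fields per input line.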


## This package

csv-conduit is a conduit-based CSV parsing library that is easy to
use, flexible and fast. It leverages the conduit infrastructure to
provide constant-space operation, which is critical in many
real-world use cases.

For example, you can use http-conduit to download a CSV file from the
internet and plug its Source into intoCSV. The download is then
stream-converted into the Row data type, and you can process rows as
the data arrives, without first saving the entire file to disk.
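
The following is a minimal sketch of that idea, not a definitive recipe. It assumes `httpSource`, `getResponseBody` and `parseRequest` from http-conduit's `Network.HTTP.Simple` module, plus the `.|` operator from conduit 1.3. The names `countRows`, `countRemoteRows` and `countSampleRows` are hypothetical helpers, and the URL you pass in is up to you.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Trans.Resource (ResourceT, runResourceT)
import Data.ByteString (ByteString)
import Data.Conduit (ConduitT, runConduit, yield, (.|))
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit (Row, defCSVSettings, intoCSV)
import Data.Text (Text)
import Network.HTTP.Simple (getResponseBody, httpSource, parseRequest)

-- Count the rows of any CSV byte stream. Rows are parsed and
-- discarded one at a time, so the input is never held in memory.
countRows :: ConduitT () ByteString (ResourceT IO) () -> IO Int
countRows src =
  runResourceT . runConduit $
    src .| parseRows .| CL.fold (\n _ -> n + 1) 0
  where
    parseRows :: ConduitT ByteString (Row Text) (ResourceT IO) ()
    parseRows = intoCSV defCSVSettings

-- Stream a remote CSV file straight into the parser.
-- The response body is consumed as it arrives; nothing touches disk.
countRemoteRows :: String -> IO Int
countRemoteRows url = do
  req <- parseRequest url
  countRows (httpSource req getResponseBody)

-- The same pipeline fed from an in-memory source, for offline checks.
countSampleRows :: IO Int
countSampleRows = countRows (yield "a,b\n1,2\n")
```

Because `countRows` accepts any byte source, the HTTP download and the in-memory sample go through exactly the same parsing stage.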


## Author & Contributors

- Ozgun Ataman (@ozataman)
- Daniel Bergey (@bergey)
- BJTerry (@BJTerry)
- Mike Craig (@mkscrg)
- Daniel Corson (@dancor)
- Dmitry Dzhus (@dzhus)
- Niklas Hambüchen (@nh2)
- Facundo Domínguez (@facundominguez)


### Introduction

* The CSVeable typeclass implements the key operations.
* CSVeable is parameterized on both a stream type and a target CSV row type.
* There are 2 basic row types and they implement *exactly* the same operations,
  so you can choose the right one for the job at hand:
  - `type MapRow t = Map t t`
  - `type Row t = [t]`
* You basically use the Conduits defined in this library to do the
  parsing from a CSV stream and rendering back into a CSV stream.
* Use the full flexibility and modularity of conduits for sources and sinks.
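
To make the two row types concrete, here is a small sketch that parses the same input once as `Row Text` and once as `MapRow Text`. It assumes that the `MapRow` instance takes its keys from the first line of the input, and that your conduit version provides `runConduit` and `.|`. The names `sample`, `asRows`, `asMapRows` and `nameColumn` are invented for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Conduit (runConduit, yield, (.|))
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit (MapRow, Row, defCSVSettings, intoCSV)
import qualified Data.Map as Map
import Data.Text (Text)

-- A tiny in-memory CSV document with a header line.
sample :: ByteString
sample = "name,age\nalice,30\nbob,25\n"

-- Positional rows: every line, header included, becomes a list of fields.
asRows :: IO [Row Text]
asRows = runConduit $ yield sample .| intoCSV defCSVSettings .| CL.consume

-- Keyed rows: the only change is the requested result type.
-- Assumption: keys come from the first line of the input.
asMapRows :: IO [MapRow Text]
asMapRows = runConduit $ yield sample .| intoCSV defCSVSettings .| CL.consume

-- Look up one column by name in every keyed row.
nameColumn :: IO [Maybe Text]
nameColumn = map (Map.lookup "name") <$> asMapRows
```

The pipelines are identical; only the type annotation selects which row representation `intoCSV` produces.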

### Speed

While fast operation is of concern, I have so far cared more about correct
operation and a flexible API. Please let me know if you notice any performance
regressions or optimization opportunities.


### Usage Examples


#### Example #1: Basics Using Convenience API

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import qualified Data.Conduit.List as CL
import Data.CSV.Conduit
import Data.Text (Text)

-- Just reverse the columns
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = CL.map reverse

test :: IO ()
test = runResourceT $ 
  transformCSV defCSVSettings 
               (sourceFile "input.csv") 
               myProcessor
               (sinkFile "output.csv")
```

#### Example #2: Basics Using Conduit API

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit
import Data.Conduit.Binary
import Data.CSV.Conduit
import Data.Text (Text)

-- Pass every row through unchanged
myProcessor :: Monad m => Conduit (Row Text) m (Row Text)
myProcessor = awaitForever yield

-- Let's simply stream from a file, parse the CSV, reserialize it
-- and push back into another file.
test :: IO ()
test = runResourceT $ 
  sourceFile "test/BigFile.csv" $= 
  intoCSV defCSVSettings $=
  myProcessor $=
  fromCSV defCSVSettings $$
  sinkFile "test/BigFileOut.csv"
```