<pre>Network Working Group Y. Shafranovich
Request for Comments: 4180 SolidMatrix Technologies, Inc.
Category: Informational October 2005
<span class="h1">Common Format and MIME Type for Comma-Separated Values (CSV) Files</span>
Status of This Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This RFC documents the format used for Comma-Separated Values (CSV)
files and registers the associated MIME type "text/csv".
Table of Contents
<a href="#section-1">1</a>. Introduction ....................................................<a href="#page-2">2</a>
<a href="#section-2">2</a>. Definition of the CSV Format ....................................<a href="#page-2">2</a>
<a href="#section-3">3</a>. MIME Type Registration of text/csv ..............................<a href="#page-4">4</a>
<a href="#section-4">4</a>. IANA Considerations .............................................<a href="#page-5">5</a>
<a href="#section-5">5</a>. Security Considerations .........................................<a href="#page-5">5</a>
<a href="#section-6">6</a>. Acknowledgments .................................................<a href="#page-6">6</a>
<a href="#section-7">7</a>. References ......................................................<a href="#page-6">6</a>
<a href="#section-7.1">7.1</a>. Normative References .......................................<a href="#page-6">6</a>
<a href="#section-7.2">7.2</a>. Informative References .....................................<a href="#page-6">6</a>
<span class="grey">Shafranovich Informational [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc4180">RFC 4180</a> Common Format and MIME Type for CSV Files October 2005</span>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
The comma separated values format (CSV) has been used for exchanging
and converting data between various spreadsheet programs for quite
some time. Surprisingly, while this format is very common, it has
never been formally documented. Additionally, while the IANA MIME
registration tree includes a registration for
"text/tab-separated-values" type, no MIME types have ever been
registered with IANA for CSV. At the same time, various programs and
operating systems have begun to use different MIME types for this
format. This RFC documents the format of comma separated values
(CSV) files and formally registers the "text/csv" MIME type for CSV
in accordance with <a href="./rfc2048">RFC 2048</a> [<a href="#ref-1" title=""Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures"">1</a>].
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Definition of the CSV Format</span>
While there are various specifications and implementations for the
CSV format (for example, [<a href="#ref-4" title=""HOW-TO: The Comma Separated Value (CSV) File Format"">4</a>], [<a href="#ref-5" title=""CSV Standard File Format"">5</a>], [<a href="#ref-6" title=""Documentation for Ricebridge CSV Manager"">6</a>] and [<a href="#ref-7" title=""The Art of Unix Programming, Chapter 5"">7</a>]), there is no formal
specification in existence, which allows for a wide variety of
interpretations of CSV files. This section documents the format that
seems to be followed by most implementations:
1. Each record is located on a separate line, delimited by a line
break (CRLF). For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending line
break. For example:
aaa,bbb,ccc CRLF
zzz,yyy,xxx
3. There may be an optional header line appearing as the first line
of the file with the same format as normal record lines. This
header will contain names corresponding to the fields in the file
and should contain the same number of fields as the records in
the rest of the file (the presence or absence of the header line
should be indicated via the optional "header" parameter of this
MIME type). For example:
field_name,field_name,field_name CRLF
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
<span class="grey">Shafranovich Informational [Page 2]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
4. Within the header and each record, there may be one or more
fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part
of a field and should not be ignored. The last field in the
record must not be followed by a comma. For example:
aaa,bbb,ccc
5. Each field may or may not be enclosed in double quotes (however
some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then
double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
6. Fields containing line breaks (CRLF), double quotes, or commas
should be enclosed in double quotes. For example:
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
The ABNF grammar [<a href="#ref-2" title=""Augmented BNF for Syntax Specifications: ABNF"">2</a>] appears as follows:
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per <a href="./rfc2234#section-6.1">section 6.1 of RFC 2234</a> [<a href="#ref-2" title=""Augmented BNF for Syntax Specifications: ABNF"">2</a>]
<span class="grey">Shafranovich Informational [Page 3]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
DQUOTE = %x22 ;as per <a href="./rfc2234#section-6.1">section 6.1 of RFC 2234</a> [<a href="#ref-2" title=""Augmented BNF for Syntax Specifications: ABNF"">2</a>]
LF = %x0A ;as per <a href="./rfc2234#section-6.1">section 6.1 of RFC 2234</a> [<a href="#ref-2" title=""Augmented BNF for Syntax Specifications: ABNF"">2</a>]
CRLF = CR LF ;as per <a href="./rfc2234#section-6.1">section 6.1 of RFC 2234</a> [<a href="#ref-2" title=""Augmented BNF for Syntax Specifications: ABNF"">2</a>]
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
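
The rules and grammar above can be sketched as a small writer and
reader. This is an illustrative sketch only, not part of the format
definition: the names format_csv and parse_csv are hypothetical, and
the reader is deliberately lenient (it keeps stray double quotes and
bare CR or LF characters in unquoted fields rather than rejecting
them, which the grammar would not allow).

```python
def format_csv(records):
    """Serialize records (lists of strings) following rules 1, 4, 6 and 7."""
    def quote(field):
        # Rule 6: quote a field holding a comma, double quote, CR, or LF.
        # Rule 7: double every embedded double quote.
        if any(ch in field for ch in ',"\r\n'):
            return '"' + field.replace('"', '""') + '"'
        return field
    # Rule 1: CRLF ends each record. Rule 2 makes the final CRLF optional;
    # this sketch always writes it.
    return "".join(",".join(quote(f) for f in rec) + "\r\n" for rec in records)


def parse_csv(text):
    """Split CSV text into a list of records, each a list of field strings."""
    records, record, field = [], [], []
    in_quotes = False
    i, n = 0, len(text)
    while i != n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if text[i + 1:i + 2] == '"':
                    field.append('"')   # 2DQUOTE: an escaped double quote
                    i += 1
                else:
                    in_quotes = False   # closing DQUOTE
            else:
                field.append(ch)        # COMMA, CR and LF are literal here
        elif ch == '"' and not field:
            in_quotes = True            # opening DQUOTE of an escaped field
        elif ch == ",":
            record.append("".join(field))
            field = []
        elif text[i:i + 2] == "\r\n":
            record.append("".join(field))
            records.append(record)
            record, field = [], []
            i += 1                      # skip the LF of the CRLF pair
        else:
            field.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    # Rule 2: a final record without a trailing CRLF is still a record.
    if field or record:
        record.append("".join(field))
        records.append(record)
    return records
```

The grammar cannot distinguish a header line from an ordinary record,
so a caller of parse_csv would have to decide separately whether the
first record returned is a header.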
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. MIME Type Registration of text/csv</span>
This section provides the media-type registration application (as per
<a href="./rfc2048">RFC 2048</a> [<a href="#ref-1" title=""Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures"">1</a>]).
To: ietf-types@iana.org
Subject: Registration of MIME media type text/csv
MIME media type name: text
MIME subtype name: csv
Required parameters: none
Optional parameters: charset, header
Common usage of CSV is US-ASCII, but other character sets defined
by IANA for the "text" tree may be used in conjunction with the
"charset" parameter.
The "header" parameter indicates the presence or absence of the
header line. Valid values are "present" or "absent".
Implementors choosing not to use this parameter must make their
own decisions as to whether the header line is present or absent.
Encoding considerations:
As per <a href="./rfc2046#section-4.1.1">section 4.1.1. of RFC 2046</a> [<a href="#ref-3" title=""Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types"">3</a>], this media type uses CRLF
to denote line breaks. However, implementors should be aware that
some implementations may use other values.
Security considerations:
CSV files contain passive text data that should not pose any
risks. However, it is possible in theory that malicious binary
data may be included in order to exploit potential buffer overruns
in the program processing CSV data. Additionally, private data
may be shared via this format (which of course applies to any text
data).
<span class="grey">Shafranovich Informational [Page 4]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
Interoperability considerations:
Due to lack of a single specification, there are considerable
differences among implementations. Implementors should "be
conservative in what you do, be liberal in what you accept from
others" (<a href="./rfc793">RFC 793</a> [<a href="#ref-8" title=""Transmission Control Protocol"">8</a>]) when processing CSV files. An attempt at a
common definition can be found in <a href="#section-2">Section 2</a>.
Implementations deciding not to use the optional "header"
parameter must make their own decision as to whether the header is
absent or present.
Published specification:
While numerous private specifications exist for various programs
and systems, there is no single "master" specification for this
format. An attempt at a common definition can be found in <a href="#section-2">Section</a>
<a href="#section-2">2</a>.
Applications that use this media type:
Spreadsheet programs and various data conversion utilities
Additional information:
Magic number(s): none
File extension(s): CSV
Macintosh File Type Code(s): TEXT
Person & email address to contact for further information:
Yakov Shafranovich <ietf@shaftek.org>
Intended usage: COMMON
Author/Change controller: IESG
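
As an illustration of the optional "header" parameter described in
the registration above, a receiver might inspect a Content-Type value
as follows. This is a minimal sketch using the Python standard
library; the function name header_flag is hypothetical, and matching
the values "present" and "absent" exactly (case-sensitively) is an
assumption of the sketch, not something this registration states.

```python
from email.message import Message


def header_flag(content_type):
    """Return True, False, or None for header=present, absent, or omitted."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/csv":
        raise ValueError("not a text/csv media type")
    value = msg.get_param("header")
    if value is None:
        return None          # parameter omitted: the receiver must decide
    if value not in ("present", "absent"):
        raise ValueError("invalid header parameter: " + value)
    return value == "present"
```

For example, header_flag("text/csv; charset=utf-8; header=present")
evaluates to True, while header_flag("text/csv") evaluates to None,
leaving the decision about the first line to the implementation.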
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. IANA Considerations</span>
The IANA has registered the MIME type "text/csv" using the
application provided in <a href="#section-3">Section 3</a> of this document.
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Security Considerations</span>
See discussion above in <a href="#section-3">section 3</a>.
<span class="grey">Shafranovich Informational [Page 5]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Acknowledgments</span>
The author would like to thank Dave Crocker, Martin Duerst, Joel M.
Halpern, Clyde Ingram, Graham Klyne, Bruce Lilly, Chris Lilley, and
members of the IESG for their helpful suggestions. A special word of
thanks goes to Dave for helping with the ABNF grammar.
The author would also like to thank Henrik Levkowetz, Marshall Rose,
and the folks at xml.resource.org for providing many of the tools
used for preparing RFCs and Internet drafts.
A special thank you goes to L.T.S.
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. References</span>
<span class="h3"><a class="selflink" id="section-7.1" href="#section-7.1">7.1</a>. Normative References</span>
[<a id="ref-1">1</a>] Freed, N., Klensin, J., and J. Postel, "Multipurpose Internet
Mail Extensions (MIME) Part Four: Registration Procedures", <a href="https://www.rfc-editor.org/bcp/bcp13">BCP</a>
<a href="https://www.rfc-editor.org/bcp/bcp13">13</a>, <a href="./rfc2048">RFC 2048</a>, November 1996.
[<a id="ref-2">2</a>] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", <a href="./rfc2234">RFC 2234</a>, November 1997.
[<a id="ref-3">3</a>] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", <a href="./rfc2046">RFC 2046</a>, November
1996.
<span class="h3"><a class="selflink" id="section-7.2" href="#section-7.2">7.2</a>. Informative References</span>
[<a id="ref-4">4</a>] Repici, J., "HOW-TO: The Comma Separated Value (CSV) File
Format", 2004,
<<a href="http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm">http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm</a>>.
[<a id="ref-5">5</a>] Edoceo, Inc., "CSV Standard File Format", 2004,
<<a href="http://www.edoceo.com/utilis/csv-file-format.php">http://www.edoceo.com/utilis/csv-file-format.php</a>>.
[<a id="ref-6">6</a>] Rodger, R. and O. Shanaghy, "Documentation for Ricebridge CSV
Manager", February 2005,
<<a href="http://www.ricebridge.com/products/csvman/reference.htm">http://www.ricebridge.com/products/csvman/reference.htm</a>>.
[<a id="ref-7">7</a>] Raymond, E., "The Art of Unix Programming, Chapter 5", September
2003,
<<a href="http://www.catb.org/~esr/writings/taoup/html/ch05s02.html">http://www.catb.org/~esr/writings/taoup/html/ch05s02.html</a>>.
[<a id="ref-8">8</a>] Postel, J., "Transmission Control Protocol", STD 7, <a href="./rfc793">RFC 793</a>,
September 1981.
<span class="grey">Shafranovich Informational [Page 6]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
Author's Address
Yakov Shafranovich
SolidMatrix Technologies, Inc.
EMail: ietf@shaftek.org
URI: <a href="http://www.shaftek.org">http://www.shaftek.org</a>
<span class="grey">Shafranovich Informational [Page 7]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a>, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and <a href="https://www.rfc-editor.org/bcp/bcp79">BCP 79</a>.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
<a href="http://www.ietf.org/ipr">http://www.ietf.org/ipr</a>.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
Shafranovich Informational [Page 8]
</pre>