---
layout: default
title: CSV and BOM character
---

# Managing the BOM character

## Detecting the CSV BOM character

To improve interoperability with programs that consume CSV, you can manage the presence of a <abbr title="Byte Order Mark">BOM</abbr> character in your CSV content. <a href="http://en.wikipedia.org/wiki/Endianness" target="_blank">The character signals the endianness</a> of the CSV, and its value depends on the CSV character encoding. To help you work with `BOM`, the following constants are available on the `Reader` and `Writer` classes:

- `BOM_UTF8` : `UTF-8` `BOM`;
- `BOM_UTF16_BE` : `UTF-16` `BOM` with Big-Endian;
- `BOM_UTF16_LE` : `UTF-16` `BOM` with Little-Endian;
- `BOM_UTF32_BE` : `UTF-32` `BOM` with Big-Endian;
- `BOM_UTF32_LE` : `UTF-32` `BOM` with Little-Endian;

Each constant holds the `BOM` character for the corresponding character encoding.
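For reference, the sketch below lists the standard Unicode byte sequences behind each BOM. The `$bomSequences` array is a hypothetical local variable used only for illustration; it is assumed, not confirmed here, that the library constants hold the same bytes (the `UTF-16 LE` value matches the `"\xFF\xFE"` shown further down this page).

```php
// Standard Unicode BOM byte sequences, keyed by names mirroring the constants above.
// $bomSequences is a hypothetical local array, not part of League\Csv.
$bomSequences = [
    'BOM_UTF8'     => "\xEF\xBB\xBF",
    'BOM_UTF16_BE' => "\xFE\xFF",
    'BOM_UTF16_LE' => "\xFF\xFE",
    'BOM_UTF32_BE' => "\x00\x00\xFE\xFF",
    'BOM_UTF32_LE' => "\xFF\xFE\x00\x00",
];

foreach ($bomSequences as $name => $bytes) {
    // bin2hex() shows the raw bytes, strlen() counts bytes (not characters)
    printf("%-13s %s (%d bytes)\n", $name, bin2hex($bytes), strlen($bytes));
}
```

Note that the `UTF-32 LE` sequence starts with the same two bytes as the `UTF-16 LE` one, which matters when detecting a BOM by hand.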

### getInputBOM()

This method will detect and return the `BOM` character used in your CSV if any.

```php
use League\Csv\Reader;

$reader = Reader::createFromPath('/path/to/your/file.csv', 'r');
$res = $reader->getInputBOM(); //$res equals null if no BOM is found

$reader = Reader::createFromPath('path/to/your/msexcel.csv');
if (Reader::BOM_UTF16_LE === $reader->getInputBOM()) {
    //the CSV file is encoded using UTF-16 LE
}
```
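To make the idea of BOM detection concrete, here is a minimal sketch in plain PHP that inspects the first bytes of a string. The `detect_bom()` function is a hypothetical helper written for this illustration; it is not the library's implementation and only shows one way such a check could work.

```php
/**
 * Hypothetical helper (not part of League\Csv):
 * returns the BOM found at the start of $content, or null if there is none.
 */
function detect_bom(string $content): ?string
{
    // Longest sequences are tested first because the UTF-32 LE BOM
    // begins with the same two bytes as the UTF-16 LE BOM.
    $candidates = [
        "\x00\x00\xFE\xFF", // UTF-32 BE
        "\xFF\xFE\x00\x00", // UTF-32 LE
        "\xEF\xBB\xBF",     // UTF-8
        "\xFE\xFF",         // UTF-16 BE
        "\xFF\xFE",         // UTF-16 LE
    ];

    foreach ($candidates as $bom) {
        // strncmp() is binary safe, so NUL bytes in the BOM are compared correctly
        if (strncmp($content, $bom, strlen($bom)) === 0) {
            return $bom;
        }
    }

    return null;
}

var_dump(detect_bom("\xEF\xBB\xBFname,age\n") === "\xEF\xBB\xBF"); // bool(true)
var_dump(detect_bom("name,age\n"));                                // NULL
```

Testing the four-byte sequences before the two-byte ones avoids reporting `UTF-16 LE` for content that actually starts with the `UTF-32 LE` BOM.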

If you wish to remove the BOM character while processing your data, you can rely on the [query filters](/7.0/query-filtering/#stripbomstatus) to do so.

## Adding the BOM character to your CSV

### setOutputBOM($bom = null)

This method will manage the addition of a BOM character in front of your outputted CSV when you are:

- downloading a file using the `output` method
- outputting the CSV directly using the `__toString()` method

`$bom` is a string representing the BOM character. To remove the `BOM` character just set `$bom` to an empty value like `null` or an empty string.

<p class="message-info">To ease writing the sequence you should use the <code>BOM_*</code> constants.</p>
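As a rough illustration of what "adding a BOM in front of the output" means at the byte level, the sketch below prepends a BOM to CSV content in plain PHP. The `with_output_bom()` function is a hypothetical helper, not a library method; it mirrors the rule above that `null` or an empty string means no BOM.

```php
/**
 * Hypothetical helper (not part of League\Csv): prepends $bom to $csv.
 * A null or empty $bom leaves the content untouched.
 */
function with_output_bom(string $csv, ?string $bom = null): string
{
    if ($bom === null || $bom === '') {
        return $csv;
    }

    return $bom . $csv;
}

$csv = "name,age\nZo\xC3\xA9,7\n"; // UTF-8 content, "\xC3\xA9" is "é"
$utf8Bom = "\xEF\xBB\xBF";         // UTF-8 BOM byte sequence

$withBom = with_output_bom($csv, $utf8Bom);
$withoutBom = with_output_bom($csv, null);

echo bin2hex(substr($withBom, 0, 3)), "\n";          // efbbbf
echo strlen($withBom) - strlen($withoutBom), "\n";   // 3
```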

### getOutputBOM()

This method will tell you at any given time what `BOM` character will be prepended to the CSV content.

<p class="message-info">For backward compatibility, <code>getOutputBOM</code> returns <code>null</code> by default.</p>

```php
use League\Csv\Reader;

$reader = Reader::createFromPath('/path/to/your/file.csv', 'r');
$res = $reader->getOutputBOM(); //$res equals null
$reader->setOutputBOM(Reader::BOM_UTF16_LE);
$res = $reader->getOutputBOM(); //$res equals "\xFF\xFE"
echo $reader; //the BOM sequence is prepended to the CSV
```

## Software dependency

Depending on your operating system and on the software you use to read or import your CSV, you may need to adjust the character encoding and add the corresponding BOM character to your CSV.

<p class="message-warning">Out of the box, <code>League\Csv</code> assumes that you are using a <code>UTF-8</code> encoded CSV without any <code>BOM</code> character.</p>

In the examples below we will be using an existing CSV as a starting point. The code may vary if you are creating the CSV from scratch.

### MS Excel on Windows

On Windows, MS Excel expects a `UTF-8` encoded CSV with its corresponding `BOM` character. To fulfill this requirement, you simply need to add the `UTF-8` `BOM` character, as shown below:

```php
use League\Csv\Reader;

require '../vendor/autoload.php';

$reader = Reader::createFromPath('/path/to/my/file.csv', 'r');
//the UTF-8 BOM will be prepended to the output
$reader->setOutputBOM(Reader::BOM_UTF8);
echo $reader->__toString();
```

### MS Excel on MacOS

On a MacOS system, MS Excel requires a CSV encoded in `UTF-16 LE` using the `tab` character as delimiter. Here's an example of how to meet those requirements using the `League\Csv` package.

```php
use League\Csv\Reader;
use League\Csv\Writer;
use lib\FilterTranscode;

require '../vendor/autoload.php';

//the current CSV is UTF-8 encoded with a ";" delimiter
$origin = Reader::createFromPath(__DIR__.'/data/prenoms.csv');

//let's convert the CSV to use a tab delimiter.

//we must use a real temp file to be able to rewind the file cursor
//without losing the modifications
$writer = Writer::createFromPath('/tmp/toto.csv', 'w');

//we set the tab as the delimiter character
$writer->setDelimiter("\t");

//we insert csv data
$writer->insertAll($origin);

//let's switch to the Reader object
//Writer::output will fail because of the open mode
$csv = $writer->newReader();

//we register a Stream Filter class to convert the CSV into the UTF-16 LE
stream_filter_register(FilterTranscode::FILTER_NAME."*", "\lib\FilterTranscode");
$csv->appendStreamFilter(FilterTranscode::FILTER_NAME."UTF-8:UTF-16LE");

//we set the output BOM to match the UTF-16 LE encoding
$csv->setOutputBOM(Reader::BOM_UTF16_LE);
//all is good let's output the results
$csv->output('mycsvfile.csv');
```

Of note, we used the [filtering capability](/7.0/filtering) of the library to convert the CSV character encoding from `UTF-8` to `UTF-16 LE`.

You can find the code and the associated filter class in the [examples directory](https://github.com/thephpleague/csv/tree/master/examples).
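The `FilterTranscode` class is not reproduced on this page. To show the kind of conversion it is used for, here is a minimal sketch in plain PHP that transcodes a small tab-delimited CSV from `UTF-8` to `UTF-16 LE` and prepends the matching BOM. It assumes the `iconv` extension is available and swaps the stream filter for a direct `iconv()` call on a string, so it illustrates the result rather than the filter class itself.

```php
// Tab-delimited UTF-8 content; "\xC3\xA9" is the two-byte UTF-8 form of "é"
$utf8 = "name\tcity\nZo\xC3\xA9\tParis\n";

// Transcode to UTF-16 LE (assumes the iconv extension is enabled).
// Naming the byte order explicitly means iconv adds no BOM of its own.
$utf16 = iconv('UTF-8', 'UTF-16LE', $utf8);

// Prepend the UTF-16 LE BOM so the reading software can identify the encoding
$output = "\xFF\xFE" . $utf16;

// Every character in this sample takes exactly two bytes once transcoded
echo strlen($utf8), " bytes in UTF-8\n";
echo strlen($utf16), " bytes in UTF-16 LE\n";
echo bin2hex(substr($output, 0, 4)), "\n"; // BOM followed by "n" as 6e00
```

For large files a stream filter remains the better fit, since it converts the data chunk by chunk instead of loading the whole CSV into memory.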