1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
|
# NAME
utf8::all - turn on Unicode - all of it
# VERSION
version 0.024
# SYNOPSIS
use utf8::all; # Turn on UTF-8, all of it.
open my $in, '<', 'contains-utf8'; # UTF-8 already turned on here
print length 'føø bār'; # 7 UTF-8 characters
my $utf8_arg = shift @ARGV; # @ARGV is UTF-8 too (only for main)
# DESCRIPTION
The `use utf8` pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope. This also means that you
can now use literal Unicode characters as part of strings, variable
names, and regular expressions.
`utf8::all` goes further:
- [`charnames`](https://metacpan.org/pod/charnames) are imported so `\N{...}` sequences can be
used to compile Unicode characters based on names.
- On Perl `v5.11.0` or higher, the `use feature 'unicode_strings'` is
enabled.
- `use feature fc` and `use feature unicode_eval` are enabled on Perl
`5.16.0` and higher.
- Filehandles are opened with UTF-8 encoding turned on by default
(including `STDIN`, `STDOUT`, and `STDERR` when `utf8::all` is
used from the `main` package). Meaning that they automatically
convert UTF-8 octets to characters and vice versa. If you _don't_
want UTF-8 for a particular filehandle, you'll have to set `binmode
$filehandle`.
- `@ARGV` gets converted from UTF-8 octets to Unicode characters (when
`utf8::all` is used from the `main` package). This is similar to the
behaviour of the `-CA` perl command-line switch (see [perlrun](https://metacpan.org/pod/perlrun)).
- `readdir`, `readlink`, `readpipe` (including the `qx//` and
backtick operators), and [`glob`](https://metacpan.org/pod/perlfunc#glob) (including the `<>` operator) now all work with and return Unicode characters
instead of (UTF-8) octets (again only when `utf8::all` is used from
the `main` package).
## Lexical Scope
The pragma is lexically-scoped, so you can do the following if you had
some reason to:
{
use utf8::all;
open my $out, '>', 'outfile';
my $utf8_str = 'føø bār';
print length $utf8_str, "\n"; # 7
print $out $utf8_str; # out as utf8
}
open my $in, '<', 'outfile'; # in as raw
my $text = do { local $/; <$in>};
print length $text, "\n"; # 10, not 7!
Instead of lexical scoping, you can also use `no utf8::all` to turn
off the effects.
Note that the effect on `@ARGV` and the `STDIN`, `STDOUT`, and
`STDERR` file handles is always global and can not be undone!
## Enabling/Disabling Global Features
As described above, the default behaviour of `utf8::all` is to
convert `@ARGV` and to open the `STDIN`, `STDOUT`, and `STDERR`
file handles with UTF-8 encoding, and override the `readlink` and
`readdir` functions and `glob` operators when `utf8::all` is used
from the `main` package.
If you want to disable these features even when `utf8::all` is used
from the `main` package, add the option `NO-GLOBAL` (or
`LEXICAL-ONLY`) to the use line. E.g.:
use utf8::all 'NO-GLOBAL';
If on the other hand you want to enable these global effects even when
`utf8::all` was used from another package than `main`, use the
option `GLOBAL` on the use line:
use utf8::all 'GLOBAL';
## UTF-8 Errors
`utf8::all` will handle invalid code points (i.e., utf-8 that does
not map to a valid unicode "character"), as a fatal error.
For `glob`, `readdir`, and `readlink`, one can change this
behaviour by setting the attribute ["$utf8::all::UTF8\_CHECK"](#utf8-all-utf8_check).
# ATTRIBUTES
## $utf8::all::UTF8\_CHECK
By default `utf8::all` marks decoding errors as fatal (default value
for this setting is `Encode::FB_CROAK`). If you want, you can change this by
setting `$utf8::all::UTF8_CHECK`. The value `Encode::FB_WARN` reports
the encoding errors as warnings, and `Encode::FB_DEFAULT` will completely
ignore them. Please see [Encode](https://metacpan.org/pod/Encode) for details. Note: `Encode::LEAVE_SRC` is
_always_ enforced.
Important: Only controls the handling of decoding errors in `glob`,
`readdir`, and `readlink`.
# INTERACTION WITH AUTODIE
If you use [autodie](https://metacpan.org/pod/autodie), which is a great idea, you need to use at least
version **2.12**, released on [June 26,
2012](https://metacpan.org/source/PJF/autodie-2.12/Changes#L3).
Otherwise, autodie obliterates the IO layers set by the [open](https://metacpan.org/pod/open)
pragma. See [RT
\#54777](https://rt.cpan.org/Ticket/Display.html?id=54777) and [GH
\#7](https://github.com/doherty/utf8-all/issues/7).
# BUGS
Please report any bugs or feature requests on the bugtracker
[website](https://github.com/doherty/utf8-all/issues).
When submitting a bug or request, please include a test-file or a
patch to an existing test-file that illustrates the bug or desired
feature.
# COMPATIBILITY
The filesystems of Dos, Windows, and OS/2 do not (fully) support
UTF-8. The `readlink` and `readdir` functions and `glob` operators
will therefore not be replaced on these systems.
# SEE ALSO
- [File::Find::utf8](https://metacpan.org/pod/File::Find::utf8) for fully utf-8 aware File::Find functions.
- [Cwd::utf8](https://metacpan.org/pod/Cwd::utf8) for fully utf-8 aware Cwd functions.
# AUTHORS
- Michael Schwern <mschwern@cpan.org>
- Mike Doherty <doherty@cpan.org>
- Hayo Baan <info@hayobaan.com>
# COPYRIGHT AND LICENSE
This software is copyright (c) 2009 by Michael Schwern <mschwern@cpan.org>; he originated it.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.
|