# NAME

utf8::all - turn on Unicode - all of it

# VERSION

version 0.024

# SYNOPSIS

    use utf8::all;                      # Turn on UTF-8, all of it.

    open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
    print length 'føø bār';             # 7 UTF-8 characters
    my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)

# DESCRIPTION

The `use utf8` pragma tells the Perl parser to allow UTF-8 in the
program text in the current lexical scope. This also means that you
can now use literal Unicode characters as part of strings, variable
names, and regular expressions.

`utf8::all` goes further:

- [`charnames`](https://metacpan.org/pod/charnames) is imported so `\N{...}` sequences can be
used to specify Unicode characters by name.
- On Perl `v5.11.0` or higher, `use feature 'unicode_strings'` is
enabled.
- `use feature 'fc'` and `use feature 'unicode_eval'` are enabled on Perl
`v5.16.0` and higher.
- Filehandles are opened with UTF-8 encoding turned on by default
(including `STDIN`, `STDOUT`, and `STDERR` when `utf8::all` is
used from the `main` package), meaning that they automatically
convert UTF-8 octets to characters and vice versa. If you _don't_
want UTF-8 for a particular filehandle, you'll have to set
`binmode $filehandle`.
- `@ARGV` gets converted from UTF-8 octets to Unicode characters (when
`utf8::all` is used from the `main` package). This is similar to the
behaviour of the `-CA` perl command-line switch (see [perlrun](https://metacpan.org/pod/perlrun)).
- `readdir`, `readlink`, `readpipe` (including the `qx//` and
backtick operators), and [`glob`](https://metacpan.org/pod/perlfunc#glob) (including the `<>` operator) now all work with and return Unicode characters
instead of (UTF-8) octets (again only when `utf8::all` is used from
the `main` package).
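
As a rough illustration of the language-level items in the list above, the following sketch turns on the same pragmas by hand, using only core modules. It is not how `utf8::all` is implemented, the variable names are made up for the example, and it assumes Perl 5.16 or later so that `fc` is available.

```perl
use strict;
use warnings;
use utf8;                               # source text is UTF-8
use charnames ':full';                  # \N{NAME} escapes
use feature qw(unicode_strings fc);     # fc needs Perl 5.16+

# \N{...} builds a character from its Unicode name.
my $word   = "caf\N{LATIN SMALL LETTER E WITH ACUTE}";
my $length = length $word;              # counted in characters

# fc() returns a case-folded form for caseless comparison.
my $sharp_s     = "stra\N{LATIN SMALL LETTER SHARP S}e";
my $folds_equal = fc($sharp_s) eq fc('STRASSE') ? 1 : 0;

# unicode_strings: a one-byte string such as "\xE9" gets Unicode rules.
my $is_word_char = "\xE9" =~ /\w/ ? 1 : 0;

print "$length $folds_equal $is_word_char\n";
```

With `utf8::all`, the four `use` lines for `utf8`, `charnames`, and `feature` collapse into a single `use utf8::all;`.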

## Lexical Scope

The pragma is lexically-scoped, so you can do the following if you had
some reason to:

    {
        use utf8::all;
        open my $out, '>', 'outfile';
        my $utf8_str = 'føø bār';
        print length $utf8_str, "\n"; # 7
        print $out $utf8_str;         # out as utf8
    }
    open my $in, '<', 'outfile';      # in as raw
    my $text = do { local $/; <$in>};
    print length $text, "\n";         # 10, not 7!

Instead of lexical scoping, you can also use `no utf8::all` to turn
off the effects.

Note that the effect on `@ARGV` and the `STDIN`, `STDOUT`, and
`STDERR` file handles is always global and cannot be undone!
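
The 7-versus-10 difference in the example above is the difference between characters and UTF-8 octets. The sketch below reproduces it without `utf8::all`, using explicit I/O layers and a temporary file from the core `File::Temp` module; the variable names are illustrative only. It also shows the plain `binmode $filehandle` call that opts a single handle out of decoding.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $str = "f\x{F8}\x{F8} b\x{101}r";        # 'føø bār', 7 characters

# Write the string through an explicit UTF-8 layer.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} $str;
close $out or die "close: $!";

# Read it back decoded: lengths are counted in characters.
open my $chars, '<:encoding(UTF-8)', $path or die "open: $!";
my $decoded = do { local $/; <$chars> };
close $chars;

# Read it back raw: lengths are counted in octets.
open my $raw, '<', $path or die "open: $!";
binmode $raw;
my $octets = do { local $/; <$raw> };
close $raw;

my ($n_chars, $n_octets) = (length $decoded, length $octets);
print "$n_chars characters, $n_octets octets\n";
```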

## Enabling/Disabling Global Features

As described above, the default behaviour of `utf8::all` is to
convert `@ARGV` and to open the `STDIN`, `STDOUT`, and `STDERR`
file handles with UTF-8 encoding, and override the `readlink` and
`readdir` functions and `glob` operators when `utf8::all` is used
from the `main` package.

If you want to disable these features even when `utf8::all` is used
from the `main` package, add the option `NO-GLOBAL` (or
`LEXICAL-ONLY`) to the use line. E.g.:

    use utf8::all 'NO-GLOBAL';

If, on the other hand, you want to enable these global effects even when
`utf8::all` is used from a package other than `main`, use the
option `GLOBAL` on the use line:

    use utf8::all 'GLOBAL';
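
With `NO-GLOBAL`, `@ARGV` is left as UTF-8 octets, so a program that still wants characters has to decode the arguments itself. The following is a minimal sketch of that step using the core `Encode` module; `@raw_args` is a hypothetical stand-in for `@ARGV`, and the code is not taken from `utf8::all`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for @ARGV as the shell delivers it: UTF-8 octets, not characters.
my @raw_args = ("f\xC3\xB8\xC3\xB8", "plain");

# Decode each argument; LEAVE_SRC keeps decode() from modifying its input.
my @args = map {
    decode('UTF-8', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC())
} @raw_args;

my @before = map { length $_ } @raw_args;   # octet counts
my @after  = map { length $_ } @args;       # character counts
print "@before -> @after\n";
```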

## UTF-8 Errors

`utf8::all` treats invalid code points (i.e., UTF-8 that does
not map to a valid Unicode "character") as a fatal error.

For `glob`, `readdir`, and `readlink`, one can change this
behaviour by setting the attribute ["$utf8::all::UTF8\_CHECK"](#utf8-all-utf8_check).

# ATTRIBUTES

## $utf8::all::UTF8\_CHECK

By default `utf8::all` marks decoding errors as fatal (default value
for this setting is `Encode::FB_CROAK`). If you want, you can change this by
setting `$utf8::all::UTF8_CHECK`. The value `Encode::FB_WARN` reports
the encoding errors as warnings, and `Encode::FB_DEFAULT` will completely
ignore them. Please see [Encode](https://metacpan.org/pod/Encode) for details. Note: `Encode::LEAVE_SRC` is
_always_ enforced.

Important: this setting only controls the handling of decoding errors in
`glob`, `readdir`, and `readlink`.
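
To make the check values concrete, here is a small sketch of how `Encode::decode` itself behaves with `Encode::FB_CROAK` and `Encode::FB_DEFAULT` on malformed input. It uses plain `Encode` rather than `utf8::all`, so it only approximates what the overridden functions do, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bad = "ab\xFFcd";                    # \xFF can never occur in UTF-8

# FB_CROAK: decode() dies on the bad octet; eval catches the exception.
my $strict = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $croaked = defined $strict ? 0 : 1;

# FB_DEFAULT: no error is raised; the bad octet becomes U+FFFD.
my $lenient = decode('UTF-8', $bad, Encode::FB_DEFAULT() | Encode::LEAVE_SRC());
my $has_replacement = $lenient =~ /\x{FFFD}/ ? 1 : 0;

# With utf8::all loaded, the corresponding switch is the documented
# package variable, for example:
#   $utf8::all::UTF8_CHECK = Encode::FB_WARN;

print "croaked=$croaked replaced=$has_replacement\n";
```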

# INTERACTION WITH AUTODIE

If you use [autodie](https://metacpan.org/pod/autodie), which is a great idea, you need to use at least
version **2.12**, released on [June 26,
2012](https://metacpan.org/source/PJF/autodie-2.12/Changes#L3).
Otherwise, autodie obliterates the IO layers set by the [open](https://metacpan.org/pod/open)
pragma. See [RT
\#54777](https://rt.cpan.org/Ticket/Display.html?id=54777) and [GH
\#7](https://github.com/doherty/utf8-all/issues/7).
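
One way to enforce the minimum version is to state it on the `use` line, so that an older autodie fails at compile time. This is a minimal sketch; `$recent_enough` is a hypothetical name, and `use utf8::all;` would sit alongside these lines in a real program.

```perl
use strict;
use warnings;
use autodie 2.12;      # refuses to compile against an older autodie

# The same minimum can be checked at run time, e.g. in a test script.
my $recent_enough = eval { autodie->VERSION('2.12'); 1 } ? 1 : 0;
print "autodie $autodie::VERSION\n";
```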

# BUGS

Please report any bugs or feature requests on the bugtracker
[website](https://github.com/doherty/utf8-all/issues).

When submitting a bug or request, please include a test-file or a
patch to an existing test-file that illustrates the bug or desired
feature.

# COMPATIBILITY

The filesystems of DOS, Windows, and OS/2 do not (fully) support
UTF-8. The `readlink` and `readdir` functions and `glob` operators
will therefore not be replaced on these systems.

# SEE ALSO

- [File::Find::utf8](https://metacpan.org/pod/File::Find::utf8) for fully utf-8 aware File::Find functions.
- [Cwd::utf8](https://metacpan.org/pod/Cwd::utf8) for fully utf-8 aware Cwd functions.

# AUTHORS

- Michael Schwern <mschwern@cpan.org>
- Mike Doherty <doherty@cpan.org>
- Hayo Baan <info@hayobaan.com>

# COPYRIGHT AND LICENSE

This software is copyright (c) 2009 by Michael Schwern <mschwern@cpan.org>; he originated it.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.