NAME

    utf8::all - turn on Unicode - all of it

VERSION

    version 0.024

SYNOPSIS

        use utf8::all;                      # Turn on UTF-8, all of it.
    
        open my $in, '<', 'contains-utf8';  # UTF-8 already turned on here
        print length 'føø bār';             # 7 UTF-8 characters
        my $utf8_arg = shift @ARGV;         # @ARGV is UTF-8 too (only for main)

DESCRIPTION

    The use utf8 pragma tells the Perl parser to allow UTF-8 in the program
    text in the current lexical scope. This also means that you can now use
    literal Unicode characters as part of strings, variable names, and
    regular expressions.

    utf8::all goes further:

      * charnames are imported so \N{...} sequences can be used to specify
      Unicode characters by name.

      * On Perl v5.11.0 or higher, the use feature 'unicode_strings' is
      enabled.

      * use feature fc and use feature unicode_eval are enabled on Perl
      5.16.0 and higher.

      * Filehandles are opened with UTF-8 encoding turned on by default
      (including STDIN, STDOUT, and STDERR when utf8::all is used from the
      main package). This means that they automatically convert UTF-8
      octets to characters and vice versa. If you don't want UTF-8 for a
      particular filehandle, you'll have to call binmode $filehandle.

      * @ARGV gets converted from UTF-8 octets to Unicode characters (when
      utf8::all is used from the main package). This is similar to the
      behaviour of the -CA perl command-line switch (see perlrun).

      * readdir, readlink, readpipe (including the qx// and backtick
      operators), and glob (including the <> operator) now all work with
      and return Unicode characters instead of (UTF-8) octets (again only
      when utf8::all is used from the main package).
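
    The following sketch illustrates a few of the features listed above.
    It loads the underlying core pragmas directly rather than utf8::all
    itself (an assumption made so that it runs without the module
    installed). The variable names are illustrative, and fc requires Perl
    5.16.0 or higher.

```perl
use strict;
use warnings;
use utf8;                              # source text below is UTF-8
use charnames ':full';                 # \N{...} lookups by character name
use feature qw(unicode_strings fc);    # fc needs Perl 5.16.0 or higher

# A character written by name and the same character written literally.
my $by_name = "\N{LATIN SMALL LETTER O WITH STROKE}";
my $literal = 'ø';
my $same    = $by_name eq $literal ? 1 : 0;

# Full Unicode case folding: the sharp s folds to "ss".
my $fold_equal = fc('STRASSE') eq fc('straße') ? 1 : 0;

print "named character equals literal: $same\n";
print "case-folded strings are equal:  $fold_equal\n";
```

    Save the file as UTF-8 before running it, since use utf8 tells the
    parser to read the program text that way.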

 Lexical Scope

    The pragma is lexically-scoped, so you can do the following if you had
    some reason to:

        {
            use utf8::all;
            open my $out, '>', 'outfile';
            my $utf8_str = 'føø bār';
            print length $utf8_str, "\n"; # 7
            print $out $utf8_str;         # out as utf8
        }
        open my $in, '<', 'outfile';      # in as raw
        my $text = do { local $/; <$in>};
        print length $text, "\n";         # 10, not 7!

    Instead of lexical scoping, you can also use no utf8::all to turn off
    the effects.

    Note that the effect on @ARGV and the STDIN, STDOUT, and STDERR file
    handles is always global and cannot be undone!
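
    The same character/octet difference shows up when a single filehandle
    is switched back to raw mode with binmode. The sketch below is only an
    approximation: it uses the core open pragma with an :encoding(UTF-8)
    layer as a stand-in for the layers utf8::all sets up, and the
    temporary file and variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;
use open IO => ':encoding(UTF-8)';   # stand-in for utf8::all's default layers
use File::Temp qw(tempfile);

# Create a scratch file; only its path is needed here.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Written through the default UTF-8 layer: characters become octets.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} 'føø bār';
close $out or die "Cannot close $path: $!";

# Read through the default layer: octets become characters again.
open my $as_chars, '<', $path or die "Cannot read $path: $!";
my $decoded = do { local $/; <$as_chars> };
close $as_chars;

# binmode without a layer argument switches this one handle to raw octets.
open my $as_octets, '<', $path or die "Cannot read $path: $!";
binmode $as_octets;
my $raw = do { local $/; <$as_octets> };
close $as_octets;

my ($n_chars, $n_octets) = (length $decoded, length $raw);
print "$n_chars characters, $n_octets octets\n";   # 7 characters, 10 octets
```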

 Enabling/Disabling Global Features

    As described above, the default behaviour of utf8::all is to convert
    @ARGV and to open the STDIN, STDOUT, and STDERR file handles with UTF-8
    encoding, and override the readlink and readdir functions and glob
    operators when utf8::all is used from the main package.

    If you want to disable these features even when utf8::all is used from
    the main package, add the option NO-GLOBAL (or LEXICAL-ONLY) to the use
    line. E.g.:

        use utf8::all 'NO-GLOBAL';

    If, on the other hand, you want to enable these global effects even
    when utf8::all is used from a package other than main, use the option
    GLOBAL on the use line:

        use utf8::all 'GLOBAL';
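
    To make the @ARGV part of these global effects more concrete, the
    conversion amounts to decoding each argument from UTF-8 octets into
    characters. This is a minimal sketch using the core Encode module, not
    the module's actual implementation; @octet_args is a hypothetical
    stand-in for @ARGV so that the example is self-contained.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for @ARGV: the octets a shell would pass for "føø".
my @octet_args = ("f\xC3\xB8\xC3\xB8");

# Decode every argument; FB_CROAK makes malformed UTF-8 a fatal error and
# LEAVE_SRC keeps the original octet strings untouched.
my @char_args;
push @char_args,
    Encode::decode('UTF-8', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC())
    for @octet_args;

my $octets = length $octet_args[0];
my $chars  = length $char_args[0];
print "$octets octets -> $chars characters\n";   # 5 octets -> 3 characters
```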

 UTF-8 Errors

    utf8::all treats invalid code points (i.e., UTF-8 that does not map to
    a valid Unicode "character") as a fatal error.

    For glob, readdir, and readlink, one can change this behaviour by
    setting the attribute "$utf8::all::UTF8_CHECK".

ATTRIBUTES

 $utf8::all::UTF8_CHECK

    By default utf8::all marks decoding errors as fatal (default value for
    this setting is Encode::FB_CROAK). If you want, you can change this by
    setting $utf8::all::UTF8_CHECK. The value Encode::FB_WARN reports the
    encoding errors as warnings, and Encode::FB_DEFAULT will completely
    ignore them. Please see Encode for details. Note: Encode::LEAVE_SRC is
    always enforced.

    Important: This setting only controls the handling of decoding errors
    in glob, readdir, and readlink.
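
    The difference between these check values can be seen by calling
    Encode directly. The sketch below does not go through utf8::all; it
    only shows how Encode::FB_CROAK and Encode::FB_DEFAULT behave on a
    malformed string. The sample string and variable names are
    illustrative.

```perl
use strict;
use warnings;
use Encode ();

# "\xE9" followed by "!" is not a valid UTF-8 sequence.
my $bad = "caf\xE9!";

# FB_CROAK: decoding dies, so the eval block yields undef.
my $strict = eval {
    Encode::decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $croaked = defined $strict ? 0 : 1;

# FB_DEFAULT: the malformed octet is replaced by U+FFFD and decoding goes on.
my $lenient = Encode::decode('UTF-8', $bad,
    Encode::FB_DEFAULT() | Encode::LEAVE_SRC());
my $replaced = $lenient =~ /\x{FFFD}/ ? 1 : 0;

print "FB_CROAK died: $croaked\n";
print "FB_DEFAULT substituted U+FFFD: $replaced\n";
```

    With utf8::all loaded, the corresponding change would be an assignment
    such as $utf8::all::UTF8_CHECK = Encode::FB_WARN; before calling glob,
    readdir, or readlink.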

INTERACTION WITH AUTODIE

    If you use autodie, which is a great idea, you need to use at least
    version 2.12, released on June 26, 2012
    <https://metacpan.org/source/PJF/autodie-2.12/Changes#L3>. Otherwise,
    autodie obliterates the IO layers set by the open pragma. See RT #54777
    <https://rt.cpan.org/Ticket/Display.html?id=54777> and GH #7
    <https://github.com/doherty/utf8-all/issues/7>.
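
    One way to enforce the minimum version is to state it on the use line,
    so that compilation fails early with an older autodie. This is a small
    sketch of that idea; it leaves out utf8::all itself so that it runs
    with core modules only, and $autodie_version is an illustrative name.

```perl
use strict;
use warnings;
use autodie 2.12;   # compilation fails if the installed autodie is older

# Report which autodie version was actually loaded.
my $autodie_version = autodie->VERSION;
print "autodie $autodie_version loaded\n";
```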

BUGS

    Please report any bugs or feature requests on the bugtracker website
    <https://github.com/doherty/utf8-all/issues>.

    When submitting a bug or request, please include a test-file or a patch
    to an existing test-file that illustrates the bug or desired feature.

COMPATIBILITY

    The filesystems of DOS, Windows, and OS/2 do not (fully) support UTF-8.
    The readlink and readdir functions and glob operators will therefore
    not be replaced on these systems.

SEE ALSO

      * File::Find::utf8 for fully UTF-8 aware File::Find functions.

      * Cwd::utf8 for fully UTF-8 aware Cwd functions.

AUTHORS

      * Michael Schwern <mschwern@cpan.org>

      * Mike Doherty <doherty@cpan.org>

      * Hayo Baan <info@hayobaan.com>

COPYRIGHT AND LICENSE

    This software is copyright (c) 2009 by Michael Schwern
    <mschwern@cpan.org>; he originated it.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.