# $Id: UTF8.pm,v 1.21 2002/02/27 09:26:48 m-kasahr Exp $
#
# Copyright (c) 2000 Japan Network Information Center. All rights reserved.
#
# By using this file, you agree to the terms and conditions set forth bellow.
#
# LICENSE TERMS AND CONDITIONS
#
# The following License Terms and Conditions apply, unless a different
# license is obtained from Japan Network Information Center ("JPNIC"),
# a Japanese association, Fuundo Bldg., 1-2 Kanda Ogawamachi, Chiyoda-ku,
# Tokyo, Japan.
#
# 1. Use, Modification and Redistribution (including distribution of any
# modified or derived work) in source and/or binary forms is permitted
# under this License Terms and Conditions.
#
# 2. Redistribution of source code must retain the copyright notices as they
# appear in each source code file, this License Terms and Conditions.
#
# 3. Redistribution in binary form must reproduce the Copyright Notice,
# this License Terms and Conditions, in the documentation and/or other
# materials provided with the distribution. For the purposes of binary
# distribution the "Copyright Notice" refers to the following language:
# "Copyright (c) Japan Network Information Center. All rights reserved."
#
# 4. Neither the name of JPNIC may be used to endorse or promote products
# derived from this Software without specific prior written approval of
# JPNIC.
#
# 5. Disclaimer/Limitation of Liability: THIS SOFTWARE IS PROVIDED BY JPNIC
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL JPNIC BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
# BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
#
# 6. Indemnification by Licensee
# Any person or entities using and/or redistributing this Software under
# this License Terms and Conditions shall defend indemnify and hold
# harmless JPNIC from and against any and all judgements damages,
# expenses, settlement liabilities, cost and other liabilities of any
# kind as a result of use and redistribution of this Software or any
# claim, suite, action, litigation or proceeding by any third party
# arising out of or relates to this License Terms and Conditions.
#
# 7. Governing Law, Jurisdiction and Venue
# This License Terms and Conditions shall be governed by and and
# construed in accordance with the law of Japan. Any person or entities
# using and/or redistributing this Software under this License Terms and
# Conditions hereby agrees and consent to the personal and exclusive
# jurisdiction and venue of Tokyo District Court of Japan.
#
package MDN::UTF8;
use strict;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
# Items to export into the caller's namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.
@EXPORT = qw();
$VERSION = '2.4';
bootstrap MDN::UTF8 $VERSION;
# Preloaded methods go here.
sub mblen {
    my ($package_name, $string) = @_;
    my ($wc, $length);
    if (($wc, $length) = $package_name->getwc($string)) {
        return $length;
    }
    return 0;
}
# Autoload methods go after =cut, and are processed by the autosplit program.
1;
__END__
=head1 NAME

MDN::UTF8 - Perl extension for libmdn utf8 module.

=head1 SYNOPSIS

  use MDN::UTF8;
  $length = MDN::UTF8->mblen($utf8_string);
  @ucs4_characters = MDN::UTF8->unpack($utf8_string);
  $utf8_string = MDN::UTF8->pack(@ucs4_characters);
  die if (!MDN::UTF8->isvalid($utf8_string));

=head1 DESCRIPTION

C<MDN::UTF8> provides a Perl interface to the UTF-8 utility
module of the MDN library (a C library for handling
multilingual domain names) in the mDNkit.

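=head1 CLASS METHODS

Although this module does not provide an object interface,
all the functions should be called as class methods,
in order to be consistent with the other modules in C<MDN::>.
C<mblen>, for example, expects the class name as its first argument,
so a plain function call would take C<$string> for the package name.

  MDN::UTF8->mblen($string);     # OK
  MDN::UTF8::mblen($string);     # wrong

=over 4
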
=item mblen($utf8_string)

Returns the length (in bytes) of the first character of C<$utf8_string>.
If the string is empty or does not begin with a valid UTF-8 character,
this method returns 0.

=item getwc($utf8_string)

Inspects the first character of C<$utf8_string>, and returns the
result as a list with two elements.
The first element of the list is the UCS-4 code value of the character
as an integer, and the second is the length (in bytes) of the
character's UTF-8 encoding.

  ($wc, $length) = MDN::UTF8->getwc($string);

The second element is the same value that C<mblen()> returns.

If the character is not a valid UTF-8 character, this method returns
an empty list.
Note that it also returns an empty list for an empty UTF-8 string.

=item unpack($utf8_string)

Unpacks C<$utf8_string> into a list of UCS-4 characters, and
returns their integer code values as a list.
An empty list is returned if C<$utf8_string> contains an invalid
character or is empty.

=item pack(@ucs4_characters)

Packs a list of UCS-4 characters into a UTF-8 string, and returns
the string. This is the reverse of the C<unpack> method above.
If C<@ucs4_characters> contains an invalid UCS-4 character, it returns
C<undef>.

=item isvalid($utf8_string)

Checks if C<$utf8_string> is a valid UTF-8 encoded string.
Returns 1 if it is valid, 0 otherwise.

=back

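
The decoding behavior described above can be approximated in pure Perl.
The following is only an illustrative sketch, not the libmdn implementation:
the C<sketch_*> names are hypothetical, they are plain functions rather than
class methods, and they assume sequences of at most four bytes with overlong
forms rejected, which may differ from what the C library accepts.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getwc(): decode the first UTF-8 character of a
# byte string. Returns (code value, byte length), or an empty list when the
# input is empty or malformed.
sub sketch_getwc {
    my ($bytes) = @_;
    return () unless defined $bytes && length $bytes;

    my @b = unpack 'C4', $bytes;          # at most the first four bytes
    return ($b[0], 1) if $b[0] < 0x80;    # single-byte (ASCII) character

    # Lead byte gives the payload bits, the sequence length, and the
    # smallest code value that legitimately needs that length.
    my ($wc, $len, $min);
    if    (($b[0] & 0xE0) == 0xC0) { ($wc, $len, $min) = ($b[0] & 0x1F, 2, 0x80); }
    elsif (($b[0] & 0xF0) == 0xE0) { ($wc, $len, $min) = ($b[0] & 0x0F, 3, 0x800); }
    elsif (($b[0] & 0xF8) == 0xF0) { ($wc, $len, $min) = ($b[0] & 0x07, 4, 0x10000); }
    else                           { return (); }

    return () if @b < $len;               # truncated sequence
    for my $i (1 .. $len - 1) {
        return () unless ($b[$i] & 0xC0) == 0x80;   # not a continuation byte
        $wc = ($wc << 6) | ($b[$i] & 0x3F);
    }
    return () if $wc < $min;              # overlong encoding
    return ($wc, $len);
}

# Hypothetical stand-in for mblen(): same fallback to 0 as the module's sub.
sub sketch_mblen {
    my (undef, $len) = sketch_getwc($_[0]);
    return $len || 0;
}

# Hypothetical stand-in for unpack(): decode character by character and
# give up with an empty list at the first invalid character.
sub sketch_unpack {
    my ($bytes) = @_;
    my @out;
    while (length $bytes) {
        my ($wc, $len) = sketch_getwc($bytes) or return ();
        push @out, $wc;
        substr($bytes, 0, $len) = '';
    }
    return @out;
}

# U+304B (bytes E3 81 8B) followed by "a".
printf "U+%04X\n", $_ for sketch_unpack("\xE3\x81\x8Ba");
```

Run on the bytes E3 81 8B 61, the sketch prints U+304B and then U+0061;
C<sketch_mblen> reports 3 for the same string.
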
=head1 ISSUE OF HANDLING UNICODE CHARACTERS

Beginning with version 5.6, Perl supports Unicode characters, but the
implementation is incomplete and highly experimental.

Perl provides `character' and `byte' semantics.
In character semantics, a Unicode character is treated as a single
character even if it occupies two or more bytes.
In byte semantics, a Unicode character is treated as a sequence
of bytes.

Some Perl operators change their behavior according to the
semantics in effect, and Perl decides whether an operator uses character
or byte semantics based on whether its input is byte or character
data.
For example, a string literal which contains C<\x{304B}> (Unicode
character U+304B) is recognized as character data.
This also applies to the MDN modules that deal with UTF-8.

If you have no special reason to use character semantics, or
you aren't familiar with them, we recommend that you
use the C<bytes> pragma:

  use bytes;

That forces byte semantics for the rest of the enclosing lexical scope;
put it at the top of a file to cover the whole file.
See L<perlunicode> and L<bytes> for more details about this issue.
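
The difference between the two semantics can be seen with C<length>.
This is a minimal sketch, assuming Perl 5.6 or later; the variable names
are illustrative only.

```perl
use strict;
use warnings;

my $ka = "\x{304B}";            # character data: one Unicode character

my $char_len = length $ka;      # character semantics: counts characters

my $byte_len;
{
    use bytes;                  # byte semantics inside this block only
    $byte_len = length $ka;     # counts the bytes of the UTF-8 encoding
}

print "characters: $char_len, bytes: $byte_len\n";
```

This prints C<characters: 1, bytes: 3>, since U+304B is encoded in UTF-8 as
the three bytes E3 81 8B.
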
=head1 SEE ALSO

MDN library specification, L<perlunicode>, L<bytes>

=cut