File: README.multibyte

Wed Jun 18 16:47:31 IDT 2003
============================

Multibyte locales can occasionally cause surprising behavior, in
particular with ranges inside bracket expressions: /[....]/.  A program
that works fine with ASCII may fail in a multibyte locale such as
en_US.UTF-8.  One such program is test/gsubtst5.awk.

By default, the test suite runs with LC_ALL=C and LANG=C. You
can change this by doing (from a Bourne-style shell):

	$ GAWKLOCALE=some_locale make check

Then the test suite will set LC_ALL and LANG to the given locale.

As of this writing, this works for en_US.UTF-8, and all tests
pass except gsubtst5.

For the normal case of RS = "\n", the locale is largely irrelevant.
For other single-byte record separators, using LC_ALL=C will give you
much better performance when reading records.  Otherwise, gawk has to
make several function calls *per input character* to find the record
terminator.  You have been warned.