<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
Generated from r6rs-lib.tex by tex2page, v 20070803
(running on MzScheme 371, unix),
(c) Dorai Sitaram,
http://www.ccs.neu.edu/~dorai/tex2page/tex2page-doc.html
-->
<head>
<title>
r6rs-lib
</title>
<link rel="stylesheet" type="text/css" href="r6rs-lib-Z-S.css" title=default>
<meta name=robots content="index,follow">
</head>
<body>
<div id=slidecontent>
<div align=right class=navigation>[Go to <span><a href="r6rs-lib.html">first</a>, <a href="r6rs-lib-Z-H-1.html">previous</a></span><span>, <a href="r6rs-lib-Z-H-3.html">next</a></span> page<span>; </span><span><a href="r6rs-lib-Z-H-1.html#node_toc_start">contents</a></span><span><span>; </span><a href="r6rs-lib-Z-H-21.html#node_index_start">index</a></span>]</div>
<p></p>
<a name="node_chap_1"></a>
<h1 class=chapter>
<div class=chapterheading><a href="r6rs-lib-Z-H-1.html#node_toc_node_chap_1">Chapter 1</a></div><br>
<a href="r6rs-lib-Z-H-1.html#node_toc_node_chap_1">Unicode</a></h1>
<p></p>
<p>
</p>
<p>
The procedures exported by the <tt>(rnrs unicode (6))</tt><a name="node_idx_2"></a> library provide access to some aspects
of the Unicode semantics for characters and strings:
category information, case-independent comparisons,
case mappings, and normalization [<a href="r6rs-lib-Z-H-21.html#node_bib_12">12</a>].</p>
<p>
Some of the procedures that operate on characters or strings ignore the
difference between upper case and lower case. These procedures
have “<tt>-ci</tt>” (for “case
insensitive”) embedded in their names.</p>
<p>
</p>
<a name="node_sec_1.1"></a>
<h2 class=section><a href="r6rs-lib-Z-H-1.html#node_toc_node_sec_1.1">1.1 Characters</a></h2>
<p></p>
<p></p>
<div align=left><tt>(<a name="node_idx_4"></a>char-upcase<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_6"></a>char-downcase<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_8"></a>char-titlecase<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_10"></a>char-foldcase<i> char</i>)</tt> procedure </div>
<p>
These procedures take a character argument and return a character
result. If the argument is an upper-case or title-case character, and if
there is a single character that is its lower-case form, then
<tt>char-downcase</tt> returns that character. If the argument is a lower-case
or title-case character, and there is a single character that is
its upper-case form, then <tt>char-upcase</tt> returns that character.
If the argument is a lower-case
or upper-case character, and there is a single character that is
its title-case form, then <tt>char-titlecase</tt> returns that character.
If the argument is not a title-case character and there is no single
character that is its title-case form, then <tt>char-titlecase</tt>
returns the upper-case form of the argument.
Finally, if the character has a case-folded character,
then <tt>char-foldcase</tt> returns that character.
Otherwise the character returned is the same
as the argument.
For the Turkic characters İ (<tt>#<tt>\</tt>x130</tt>)
and ı (<tt>#<tt>\</tt>x131</tt>),
<tt>char-foldcase</tt> behaves as the identity function; otherwise
<tt>char-foldcase</tt> is the
same as <tt>char-downcase</tt> composed with <tt>char-upcase</tt>.</p>
<p>
</p>
<tt>(char-upcase <tt>#</tt><tt>\</tt>i) ⇒ <tt>#</tt><tt>\</tt>I<br>
(char-downcase <tt>#</tt><tt>\</tt>i) ⇒ <tt>#</tt><tt>\</tt>i<br>
(char-titlecase <tt>#</tt><tt>\</tt>i) ⇒ <tt>#</tt><tt>\</tt>I<br>
(char-foldcase <tt>#</tt><tt>\</tt>i) ⇒ <tt>#</tt><tt>\</tt>i<br>
<br>
(char-upcase <tt>#</tt><tt>\</tt>ß) ⇒ <tt>#</tt><tt>\</tt>ß<br>
(char-downcase <tt>#</tt><tt>\</tt>ß) ⇒ <tt>#</tt><tt>\</tt>ß<br>
(char-titlecase <tt>#</tt><tt>\</tt>ß) ⇒ <tt>#</tt><tt>\</tt>ß<br>
(char-foldcase <tt>#</tt><tt>\</tt>ß) ⇒ <tt>#</tt><tt>\</tt>ß<br>
<br>
(char-upcase <tt>#</tt><tt>\</tt>Σ) ⇒ <tt>#</tt><tt>\</tt>Σ<br>
(char-downcase <tt>#</tt><tt>\</tt>Σ) ⇒ <tt>#</tt><tt>\</tt>σ<br>
(char-titlecase <tt>#</tt><tt>\</tt>Σ) ⇒ <tt>#</tt><tt>\</tt>Σ<br>
(char-foldcase <tt>#</tt><tt>\</tt>Σ) ⇒ <tt>#</tt><tt>\</tt>σ<br>
<br>
(char-upcase <tt>#</tt><tt>\</tt>ς) ⇒ <tt>#</tt><tt>\</tt>Σ<br>
(char-downcase <tt>#</tt><tt>\</tt>ς) ⇒ <tt>#</tt><tt>\</tt>ς<br>
(char-titlecase <tt>#</tt><tt>\</tt>ς) ⇒ <tt>#</tt><tt>\</tt>Σ<br>
(char-foldcase <tt>#</tt><tt>\</tt>ς) ⇒ <tt>#</tt><tt>\</tt>σ<br>
<p></tt></p>
<p>
</p>
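<p>
The relationship between <tt>char-foldcase</tt> and the other character mappings can be sketched as below. This is only an illustration of the rule stated above, not how an implementation has to work; the name <tt>fold-by-composition</tt> is hypothetical and is not exported by <tt>(rnrs unicode (6))</tt>.</p>
<pre>
(import (rnrs))

;; Hypothetical helper (not part of the library): case folding written as
;; upcase followed by downcase, leaving the two Turkic characters unchanged.
(define (fold-by-composition c)
  (if (or (char=? c #\x130) (char=? c #\x131))
      c
      (char-downcase (char-upcase c))))

(fold-by-composition #\ς)    ; ⇒ #\σ  (upcase gives Σ, downcase gives σ)
(fold-by-composition #\Σ)    ; ⇒ #\σ
(fold-by-composition #\x130) ; ⇒ #\x130
</pre>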
<blockquote><em>Note: </em>
The <tt>char-titlecase</tt> procedure does not always return a title-case
character.
</blockquote><p>
</p>
<blockquote><em>Note: </em>
These procedures are consistent with
Unicode’s locale-independent mappings from scalar values to
scalar values for upcase, downcase, titlecase, and case-folding
operations. These mappings can be extracted from <tt>UnicodeData.txt</tt> and <tt>CaseFolding.txt</tt> from the Unicode
Consortium, ignoring Turkic mappings in the latter.<p>
Note that these character-based procedures are an incomplete
approximation to case conversion, even ignoring the user’s locale.
In general, case mappings require the context of a string, both in
arguments and in result. The <tt>string-upcase</tt>, <tt>string-downcase</tt>, <tt>string-titlecase</tt>, and <tt>string-foldcase</tt> procedures (section <a href="#node_sec_1.2">1.2</a>)
perform more general case conversion.
</p>
</blockquote>
<p></p>
<p>
</p>
<p></p>
<div align=left><tt>(<a name="node_idx_12"></a>char-ci=?<i> <i>char<sub>1</sub></i> <i>char<sub>2</sub></i> <i>char<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_14"></a>char-ci<?<i> <i>char<sub>1</sub></i> <i>char<sub>2</sub></i> <i>char<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_16"></a>char-ci>?<i> <i>char<sub>1</sub></i> <i>char<sub>2</sub></i> <i>char<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_18"></a>char-ci<=?<i> <i>char<sub>1</sub></i> <i>char<sub>2</sub></i> <i>char<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_20"></a>char-ci>=?<i> <i>char<sub>1</sub></i> <i>char<sub>2</sub></i> <i>char<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<p>
These procedures are similar to <tt>char=?</tt>, etc., but operate
on the case-folded versions of the characters.</p>
<p>
</p>
<tt>(char-ci<? <tt>#</tt><tt>\</tt>z <tt>#</tt><tt>\</tt>Z) ⇒ <tt>#f</tt><br>
(char-ci=? <tt>#</tt><tt>\</tt>z <tt>#</tt><tt>\</tt>Z) ⇒ <tt>#t</tt><br>
(char-ci=? <tt>#</tt><tt>\</tt>ς <tt>#</tt><tt>\</tt>σ) ⇒ <tt>#t</tt><p></tt>
</p>
<p></p>
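<p>
As a rough sketch of what "operate on the case-folded versions" means, a two-argument equality test could be written in terms of <tt>char-foldcase</tt>. The name <tt>my-char-ci=?</tt> is hypothetical, and the sketch handles only two arguments.</p>
<pre>
(import (rnrs))

;; Hypothetical two-argument helper: fold both characters, then compare.
(define (my-char-ci=? a b)
  (char=? (char-foldcase a) (char-foldcase b)))

(my-char-ci=? #\z #\Z) ; ⇒ #t
(my-char-ci=? #\ς #\σ) ; ⇒ #t  (both fold to σ)
</pre>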
<p>
</p>
<p></p>
<div align=left><tt>(<a name="node_idx_22"></a>char-alphabetic?<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_24"></a>char-numeric?<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_26"></a>char-whitespace?<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_28"></a>char-upper-case?<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_30"></a>char-lower-case?<i> char</i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_32"></a>char-title-case?<i> char</i>)</tt> procedure </div>
<p>
These procedures return <tt>#t</tt> if their arguments are alphabetic,
numeric, whitespace, upper-case, lower-case, or title-case characters,
respectively; otherwise they return <tt>#f</tt>.</p>
<p>
A character is alphabetic if it has the Unicode “Alphabetic”
property. A character is numeric if it has the Unicode “Numeric”
property. A character is whitespace if it has the Unicode
“White_Space” property.
A character is upper case if it has the Unicode
“Uppercase” property, lower case if it has the “Lowercase”
property, and title case if it is in the Lt general category.</p>
<p>
</p>
<tt>(char-alphabetic? <tt>#</tt><tt>\</tt>a) ⇒ <tt>#t</tt><br>
(char-numeric? <tt>#</tt><tt>\</tt>1) ⇒ <tt>#t</tt><br>
(char-whitespace? <tt>#</tt><tt>\</tt>space) ⇒ <tt>#t</tt><br>
(char-whitespace? <tt>#</tt><tt>\</tt>x00A0) ⇒ <tt>#t</tt><br>
(char-upper-case? <tt>#</tt><tt>\</tt>Σ) ⇒ <tt>#t</tt><br>
(char-lower-case? <tt>#</tt><tt>\</tt>σ) ⇒ <tt>#t</tt><br>
(char-lower-case? <tt>#</tt><tt>\</tt>x00AA) ⇒ <tt>#t</tt><br>
(char-title-case? <tt>#</tt><tt>\</tt>I) ⇒ <tt>#f</tt><br>
(char-title-case? <tt>#</tt><tt>\</tt>x01C5) ⇒ <tt>#t</tt><p></tt>
</p>
<p></p>
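<p>
The predicates are not mutually exclusive, so code that wants a single label per character has to pick an order of tests. The following sketch shows one possible order; the name <tt>char-class</tt> and the returned symbols are assumptions made for illustration.</p>
<pre>
(import (rnrs))

;; Hypothetical helper: report the first class that matches.
;; The order of the tests is an arbitrary choice for this sketch.
(define (char-class c)
  (cond ((char-whitespace? c) 'whitespace)
        ((char-numeric? c)    'numeric)
        ((char-title-case? c) 'title-case)
        ((char-upper-case? c) 'upper-case)
        ((char-lower-case? c) 'lower-case)
        ((char-alphabetic? c) 'alphabetic)
        (else                 'other)))

(char-class #\space)  ; ⇒ whitespace
(char-class #\1)      ; ⇒ numeric
(char-class #\Σ)      ; ⇒ upper-case
(char-class #\x01C5)  ; ⇒ title-case
</pre>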
<p>
</p>
<p></p>
<div align=left><tt>(<a name="node_idx_34"></a>char-general-category<i> char</i>)</tt> procedure </div>
<p>
Returns a symbol representing the
Unicode general category of <i>char</i>, one of <tt>Lu</tt>, <tt>Ll</tt>, <tt>Lt</tt>,
<tt>Lm</tt>, <tt>Lo</tt>, <tt>Mn</tt>, <tt>Mc</tt>, <tt>Me</tt>, <tt>Nd</tt>, <tt>Nl</tt>,
<tt>No</tt>, <tt>Ps</tt>, <tt>Pe</tt>, <tt>Pi</tt>, <tt>Pf</tt>, <tt>Pd</tt>, <tt>Pc</tt>,
<tt>Po</tt>, <tt>Sc</tt>, <tt>Sm</tt>, <tt>Sk</tt>, <tt>So</tt>, <tt>Zs</tt>, <tt>Zp</tt>,
<tt>Zl</tt>, <tt>Cc</tt>, <tt>Cf</tt>, <tt>Cs</tt>, <tt>Co</tt>, or <tt>Cn</tt>.</p>
<p>
</p>
<tt>(char-general-category #<tt>\</tt>a) ⇒ Ll<br>
(char-general-category #<tt>\</tt>space) <br> ⇒ Zs<br>
(char-general-category #<tt>\</tt>x10FFFF) <br> ⇒ Cn <br>
<p></tt>
</p>
<p></p>
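<p>
Because the result is an ordinary symbol, it can be compared with <tt>eq?</tt> or taken apart with <tt>symbol-&gt;string</tt>. The sketch below extracts the major class (the first letter of the category name); <tt>major-category</tt> is a hypothetical helper name.</p>
<pre>
(import (rnrs))

;; Hypothetical helper: the major class is the first letter of the
;; two-letter category name, e.g. #\L for Lu, Ll, Lt, Lm, and Lo.
(define (major-category c)
  (string-ref (symbol->string (char-general-category c)) 0))

(major-category #\a)                      ; ⇒ #\L  (category Ll)
(major-category #\space)                  ; ⇒ #\Z  (category Zs)
(eq? (char-general-category #\x01C5) 'Lt) ; ⇒ #t
</pre>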
<p>
</p>
<a name="node_sec_1.2"></a>
<h2 class=section><a href="r6rs-lib-Z-H-1.html#node_toc_node_sec_1.2">1.2 Strings</a></h2>
<p></p>
<p></p>
<div align=left><tt>(<a name="node_idx_36"></a>string-upcase<i> <i>string</i></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_38"></a>string-downcase<i> <i>string</i></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_40"></a>string-titlecase<i> <i>string</i></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_42"></a>string-foldcase<i> <i>string</i></i>)</tt> procedure </div>
<p>
These procedures take a string argument and return a string
result. They are defined in terms of Unicode’s locale-independent
case mappings from Unicode scalar-value sequences to scalar-value sequences.
In particular, the length of the result string can be different from
the length of the input string.
When the specified result is equal in the sense of <tt>string=?</tt> to the
argument, these procedures may return the argument instead of a newly
allocated string.</p>
<p>
The <tt>string-upcase</tt> procedure converts a string to upper case;
<tt>string-downcase</tt> converts a string to lower case. The <tt>string-foldcase</tt> procedure converts the string to its case-folded
counterpart, using the full case-folding mapping, but without the
special mappings for Turkic languages. The <tt>string-titlecase</tt>
procedure converts the first cased character of each word via <tt>char-titlecase</tt>, and downcases all other cased characters.</p>
<p>
</p>
<tt>(string-upcase "Hi") ⇒ "HI"<br>
(string-downcase "Hi") ⇒ "hi"<br>
(string-foldcase "Hi") ⇒ "hi"<br>
<br>
(string-upcase "Straße") ⇒ "STRASSE"<br>
(string-downcase "Straße") ⇒ "straße"<br>
(string-foldcase "Straße") ⇒ "strasse"<br>
(string-downcase "STRASSE") ⇒ "strasse"<br>
<br>
(string-downcase "Σ") ⇒ "σ"<br>
<br>
; Chi Alpha Omicron Sigma:<br>
(string-upcase "ΧΑΟΣ") ⇒ "ΧΑΟΣ"<br>
(string-downcase "ΧΑΟΣ") ⇒ "χαος"<br>
(string-downcase "ΧΑΟΣΣ") ⇒ "χαοσς"<br>
(string-downcase "ΧΑΟΣ Σ") ⇒ "χαος σ"<br>
(string-foldcase "ΧΑΟΣΣ") ⇒ "χαοσσ"<br>
(string-upcase "χαος") ⇒ "ΧΑΟΣ"<br>
(string-upcase "χαοσ") ⇒ "ΧΑΟΣ"<br>
<br>
(string-titlecase "kNock KNoCK")<br>
⇒ "Knock Knock"<br>
(string-titlecase "who’s there?")<br>
⇒ "Who’s There?"<br>
(string-titlecase "r6rs") ⇒ "R6Rs"<br>
(string-titlecase "R6RS") ⇒ "R6Rs"<p></tt></p>
<p>
</p>
<blockquote><em>Note: </em>
The case mappings needed for implementing these procedures
can be extracted from <tt>UnicodeData.txt</tt>, <tt>SpecialCasing.txt</tt>, <tt>WordBreakProperty.txt</tt>
(the “MidLetter” property partly defines case-ignorable characters),
and <tt>CaseFolding.txt</tt> from the Unicode Consortium.<p>
Since these procedures are locale-independent, they may not
be appropriate for some locales.
</p>
</blockquote><p>
</p>
<blockquote><em>Note: </em>
Word breaking, as needed for the correct casing of Σ and for
<tt>string-titlecase</tt>, is specified in Unicode Standard Annex
#29 [<a href="r6rs-lib-Z-H-21.html#node_bib_5">5</a>].
</blockquote><p>
</p>
<p></p>
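<p>
The difference between the string procedures and a character-by-character conversion can be sketched as follows. The helper <tt>upcase-per-char</tt> is hypothetical; it shows the approximation that the character procedures alone would give.</p>
<pre>
(import (rnrs))

;; Hypothetical helper: upcase every character independently.
;; The result always has the same length as the argument.
(define (upcase-per-char s)
  (list->string (map char-upcase (string->list s))))

(upcase-per-char "Straße")               ; ß is left unchanged by char-upcase
(string-upcase "Straße")                 ; ⇒ "STRASSE"
(string-length "Straße")                 ; ⇒ 6
(string-length (string-upcase "Straße")) ; ⇒ 7
</pre>
<p>
Code that assumes the result has the same length as the argument, for example when copying into a preallocated buffer, is therefore unsafe with the string procedures.</p>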
<p>
</p>
<p></p>
<div align=left><tt>(<a name="node_idx_44"></a>string-ci=?<i> <i>string<sub>1</sub></i> <i>string<sub>2</sub></i> <i>string<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_46"></a>string-ci<?<i> <i>string<sub>1</sub></i> <i>string<sub>2</sub></i> <i>string<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_48"></a>string-ci>?<i> <i>string<sub>1</sub></i> <i>string<sub>2</sub></i> <i>string<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_50"></a>string-ci<=?<i> <i>string<sub>1</sub></i> <i>string<sub>2</sub></i> <i>string<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_52"></a>string-ci>=?<i> <i>string<sub>1</sub></i> <i>string<sub>2</sub></i> <i>string<sub>3</sub></i> <tt>...</tt></i>)</tt> procedure </div>
<p>
These procedures are similar to <tt>string=?</tt>, etc., but
operate on the case-folded versions of the strings.</p>
<p>
</p>
<tt>(string-ci<? "z" "Z") ⇒ <tt>#f</tt><br>
(string-ci=? "z" "Z") ⇒ <tt>#t</tt><br>
(string-ci=? "Straße" "Strasse") <br>
⇒ <tt>#t</tt><br>
(string-ci=? "Straße" "STRASSE")<br>
⇒ <tt>#t</tt><br>
(string-ci=? "ΧΑΟΣ" "χαοσ")<br>
⇒ <tt>#t</tt><p></tt></p>
<p>
</p>
<p></p>
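<p>
A two-argument version of the equality test can be sketched directly from this description. The name <tt>my-string-ci=?</tt> is hypothetical, and the sketch does not handle more than two arguments.</p>
<pre>
(import (rnrs))

;; Hypothetical two-argument helper: fold both strings, then compare.
(define (my-string-ci=? a b)
  (string=? (string-foldcase a) (string-foldcase b)))

(my-string-ci=? "Straße" "STRASSE") ; ⇒ #t  (both fold to "strasse")
(my-string-ci=? "z" "Z")            ; ⇒ #t
</pre>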
<p>
</p>
<p></p>
<div align=left><tt>(<a name="node_idx_54"></a>string-normalize-nfd<i> <i>string</i></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_56"></a>string-normalize-nfkd<i> <i>string</i></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_58"></a>string-normalize-nfc<i> <i>string</i></i>)</tt> procedure </div>
<div align=left><tt>(<a name="node_idx_60"></a>string-normalize-nfkc<i> <i>string</i></i>)</tt> procedure </div>
<p>
These procedures take a string argument and return a string
result, which is the input string normalized
to Unicode normalization form D, KD, C, or KC, respectively.
When the specified result is equal in the sense of <tt>string=?</tt> to the
argument, these procedures may return the argument instead of a newly
allocated string.</p>
<p>
</p>
<tt>(string-normalize-nfd "<tt>\</tt>xE9;")<br>
⇒ "<tt>\</tt>x65;<tt>\</tt>x301;"<br>
(string-normalize-nfc "<tt>\</tt>xE9;")<br>
⇒ "<tt>\</tt>xE9;"<br>
(string-normalize-nfd "<tt>\</tt>x65;<tt>\</tt>x301;")<br>
⇒ "<tt>\</tt>x65;<tt>\</tt>x301;"<br>
(string-normalize-nfc "<tt>\</tt>x65;<tt>\</tt>x301;")<br>
⇒ "<tt>\</tt>xE9;"<p></tt>
</p>
<p></p>
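<p>
One common use of normalization is comparing strings that may spell the same text with different scalar-value sequences. The sketch below assumes that normalization form C is an acceptable basis for the comparison; <tt>canonically-equal?</tt> is a hypothetical helper name.</p>
<pre>
(import (rnrs))

;; Hypothetical helper: compare two strings after normalizing both to NFC.
(define (canonically-equal? a b)
  (string=? (string-normalize-nfc a) (string-normalize-nfc b)))

(string=? "\xE9;" "\x65;\x301;")               ; ⇒ #f
(canonically-equal? "\xE9;" "\x65;\x301;")     ; ⇒ #t
(string-length (string-normalize-nfd "\xE9;")) ; ⇒ 2
</pre>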
<p>
</p>
<p></p>
<p></p>
</div>
</body>
</html>