<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="generator" content="AsciiDoc 8.6.8">
<title>Unicode</title>
<link rel="stylesheet" href="./asciidoc.css" type="text/css">
<link rel="stylesheet" href="./pygments.css" type="text/css">
<script type="text/javascript" src="./asciidoc.js"></script>
<script type="text/javascript">
/*<![CDATA[*/
asciidoc.install();
/*]]>*/
</script>
<link rel="stylesheet" href="./mlton.css" type="text/css"/>
</head>
<body class="article">
<div id="banner">
<div id="banner-home">
<a href="./Home">MLton 20130715</a>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div id="preamble">
<div class="sectionbody">
<div class="paragraph"><p>The current release of MLton does not support Unicode. We are working
on adding support for the following:</p></div>
<div class="ulist"><ul>
<li>
<p>
<span class="monospaced">WideChar</span> structure.
</p>
</li>
<li>
<p>
UTF-8 encoded source files.
</p>
</li>
</ul></div>
<div class="paragraph"><p>There is no real support for Unicode in the <a href="DefinitionOfStandardML">Definition</a>;
there are only a few throw-away sentences along the lines of "ASCII
must be a subset of the character set in programs".</p></div>
<div class="paragraph"><p>Nor is there real support for Unicode in the <a href="BasisLibrary">Basis Library</a>.
The general consensus (shared by the editors of the Basis Library) is
that the <span class="monospaced">WideChar</span> structure is
insufficient for the purposes of Unicode. There is also no <span class="monospaced">LargeChar</span>
structure, which is itself a deficiency: a programmer cannot
program against the largest supported character size.</p></div>
<div class="paragraph"><p>MLton has some preliminary support for 16-bit and 32-bit characters and
strings. It is even possible to include arbitrary Unicode characters
in 32-bit strings using a <span class="monospaced">\Uxxxxxxxx</span> escape sequence. (This
longer escape sequence is a minor extension of the Definition, which
allows only <span class="monospaced">\uxxxx</span>.) This is by no means
satisfactory support for Unicode, but it is what is
currently available.</p></div>
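<div class="paragraph"><p>As a minimal sketch of the shorter escape, the following Standard ML fragment uses only the Definition's <span class="monospaced">\uxxxx</span> form in an ordinary 8-bit <span class="monospaced">string</span>, so the ordinal must fit in a <span class="monospaced">char</span>. The binding names <span class="monospaced">a</span> and <span class="monospaced">e</span> are illustrative. The <span class="monospaced">\Uxxxxxxxx</span> form is not shown, because it requires a 32-bit string type whose name this page does not give.</p></div>
<div class="listingblock">
<div class="content monospaced">
<pre>(* \u0041 is the character with ordinal 0x41, so a is the same string as "A". *)
val a : string = "\u0041"

(* The same escape works in a character constant; ord e is 0xE9 (233),
   which still fits in an 8-bit char. *)
val e : char = #"\u00E9"

(* Print the string and the ordinal of the character. *)
val () = print (a ^ " " ^ Int.toString (ord e) ^ "\n")</pre>
</div></div>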
<div class="paragraph"><p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML. In December 2004, there was a discussion that led to
some seemingly sound design decisions. The discussion started at:</p></div>
<div class="literalblock">
<div class="content monospaced">
<pre>http://www.mlton.org/pipermail/mlton/2004-December/026396.html</pre>
</div></div>
<div class="paragraph"><p>There is a good summary of points at:</p></div>
<div class="literalblock">
<div class="content monospaced">
<pre>http://www.mlton.org/pipermail/mlton/2004-December/026440.html</pre>
</div></div>
<div class="paragraph"><p>In November 2005, there was a follow-up discussion and the beginning of
some coding:</p></div>
<div class="literalblock">
<div class="content monospaced">
<pre>http://www.mlton.org/pipermail/mlton/2005-November/028300.html</pre>
</div></div>
<div class="paragraph"><p>We are optimistic that support will appear in the next MLton release.</p></div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph"><p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p></div>
</div>
</div>
</div>
<div id="footnotes"><hr></div>
<div id="footer">
<div id="footer-text">
</div>
<div id="footer-badges">
</div>
</div>
</body>
</html>