<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta name="robots" content="index,nofollow">



<title>Unicode - MLton Standard ML Compiler (SML Compiler)</title>
<link rel="stylesheet" type="text/css" charset="iso-8859-1" media="all" href="common.css">
<link rel="stylesheet" type="text/css" charset="iso-8859-1" media="screen" href="screen.css">
<link rel="stylesheet" type="text/css" charset="iso-8859-1" media="print" href="print.css">


<link rel="Start" href="Home">


</head>

<body lang="en" dir="ltr">

<table bgcolor = lightblue cellspacing = 0 style = "border: 0px;" width = 100%>
  <tr>
    <td style = "
		border: 0px;
		color: darkblue; 
		font-size: 150%;
		text-align: left;">
      <a class = mltona href="Home">MLton</a>
    <td style = "
		border: 0px;
		font-size: 150%;
		text-align: center;
		width: 50%;">
      Unicode
    <td style = "
		border: 0px;
		text-align: right;">
      <table cellspacing = 0 style = "border: 0px">
        <tr style = "vertical-align: middle;">
      </table>
  <tr style = "background-color: white;">
    <td colspan = 3
	style = "
		border: 0px;
		font-size:70%;
		text-align: right;">
      <a href = "Home">Home</a>
      &nbsp;<a href = "TitleIndex">Index</a>
      &nbsp;
</table>
<div id="content" lang="en" dir="ltr">
<p>
The current release of MLton does not support Unicode.  We are working on adding support for the following:
</p>
    <ul>

    <li>
<p>
 <tt>WideChar</tt> structure. 
</p>
</li>
    <li>
<p>
 UTF-8 encoded source files. 
</p>
</li>

    </ul>


<p>
There is no real support for Unicode in the <a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away sentences along the lines of "ASCII must be a subset of the character set in programs". 
</p>
<p>
The <a href="BasisLibrary">Basis Library</a> offers no real support for Unicode either. The general consensus (which includes the opinions of the editors of the Basis Library) is that the <tt>WideChar</tt> structure is insufficient for the purposes of Unicode.  There is also no <tt>LargeChar</tt> structure, which is itself a deficiency, since a programmer cannot program against the largest supported character size. 
</p>
<p>
MLton has some preliminary support for 16-bit and 32-bit characters and strings.  It is even possible to include arbitrary Unicode characters in 32-bit strings using a <tt>\Uxxxxxxxx</tt> escape sequence.  (This longer escape sequence is a minor extension to the Definition, which allows only <tt>\uxxxx</tt>.)  This is by no means completely satisfactory as Unicode support, but it is what is currently available. 
</p>
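<p>
As a rough illustration of why the longer escape matters, the sketch below uses Python rather than SML, only because Python happens to accept both a four-digit <tt>\uxxxx</tt> and an eight-digit <tt>\Uxxxxxxxx</tt> escape. Nothing here is MLton code, and the helper names <tt>utf16_code_units</tt> and <tt>utf32_code_units</tt> are hypothetical. It shows that a code point above U+FFFF cannot be written with four hex digits and does not fit in a single 16-bit unit, while it does fit in a single 32-bit unit.
</p>

```python
# Illustrative only: these are Python string escapes, not MLton or SML syntax.
# The helper names below are made up for this sketch.

def utf16_code_units(ch):
    """Number of 16-bit code units needed to encode one character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


def utf32_code_units(ch):
    """Number of 32-bit code units needed to encode one character in UTF-32."""
    return len(ch.encode("utf-32-be")) // 4


# U+03BB (Greek small letter lambda) fits in four hex digits.
lam = "\u03bb"

# U+1D11E (musical symbol G clef) lies above U+FFFF, so it needs the
# eight-digit escape form.
clef = "\U0001D11E"

for name, ch in [("lambda", lam), ("clef", clef)]:
    # Columns: name, code point, UTF-16 unit count, UTF-32 unit count.
    print(name, hex(ord(ch)), utf16_code_units(ch), utf32_code_units(ch))
```

<p>
Running the sketch reports one 16-bit unit for the lambda but two for the clef, and one 32-bit unit for each. 
</p>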
<p>
There are periodic flurries of questions and discussion about Unicode in MLton/SML.  In December 2004, there was a discussion that led to some seemingly sound design decisions.  The discussion started at: 
</p>

    <ul>

    <li>
<p>
 <a href="http://mlton.org/pipermail/mlton/2004-December/026396.html">http://mlton.org/pipermail/mlton/2004-December/026396.html</a> 
</p>
</li>

    </ul>


<p>
There is a good summary of points at: 
</p>

    <ul>

    <li>
<p>
 <a href="http://mlton.org/pipermail/mlton/2004-December/026440.html">http://mlton.org/pipermail/mlton/2004-December/026440.html</a> 
</p>
</li>

    </ul>


<p>
In November 2005, there was a follow-up discussion and the beginning of some coding: 
</p>

    <ul>

    <li>
<p>
 <a href="http://mlton.org/pipermail/mlton/2005-November/028300.html">http://mlton.org/pipermail/mlton/2005-November/028300.html</a> 
</p>
</li>

    </ul>


<p>
We are optimistic that support will appear in the next MLton release. 
</p>
<h2 id="head-a4bc8bf5caf54b18cea9f58e83dd4acb488deb17">Also see</h2>
<p>
The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode documents. 
</p>
</div>



<hr>
<p>
Last edited on 2007-08-15 22:07:35 by <span title="fenrir.uchicago.edu"><a href="MatthewFluet">MatthewFluet</a></span>.
</p>
</body></html>