
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<TITLE>AARM95 - Character Set</TITLE>
<META NAME="Author" CONTENT="JTC1/SC22/WG9/ARG, by Randall Brukardt, ARG Editor">
<META NAME="GENERATOR" CONTENT="Arm_Form.Exe, Ada Reference Manual generator">
<STYLE type="text/css">
DIV.paranum {position: absolute; font-family: Arial, Helvetica, sans-serif; left: 0.5em; top: auto}
TT {font-family: "Courier New", monospace}
DT {display: compact}
DIV.Normal {font-family: "Times New Roman", Times, serif; margin-bottom: 0.6em}
DIV.Wide {font-family: "Times New Roman", Times, serif; margin-top: 0.6em; margin-bottom: 0.6em}
DIV.Annotations {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-bottom: 0.6em}
DIV.WideAnnotations {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-top: 0.6em; margin-bottom: 0.6em}
DIV.Index {font-family: "Times New Roman", Times, serif}
DIV.SyntaxSummary {font-family: "Times New Roman", Times, serif; margin-left: 2.0em; margin-bottom: 0.4em}
DIV.Notes {font-family: "Times New Roman", Times, serif; margin-left: 2.0em; margin-bottom: 0.6em}
DIV.NotesHeader {font-family: "Times New Roman", Times, serif; margin-left: 2.0em}
DIV.SyntaxIndented {font-family: "Times New Roman", Times, serif; margin-left: 2.0em; margin-bottom: 0.4em}
DIV.Indented {font-family: "Times New Roman", Times, serif; margin-left: 6.0em; margin-bottom: 0.6em}
DIV.CodeIndented {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-bottom: 0.6em}
DIV.SmallIndented {font-family: "Times New Roman", Times, serif; margin-left: 10.0em; margin-bottom: 0.6em}
DIV.SmallCodeIndented {font-family: "Times New Roman", Times, serif; margin-left: 8.0em; margin-bottom: 0.6em}
DIV.Examples {font-family: "Courier New", monospace; margin-left: 2.0em; margin-bottom: 0.6em}
DIV.SmallExamples {font-family: "Courier New", monospace; font-size: 80%; margin-left: 7.5em; margin-bottom: 0.6em}
DIV.IndentedExamples {font-family: "Courier New", monospace; margin-left: 8.0em; margin-bottom: 0.6em}
DIV.SmallIndentedExamples {font-family: "Courier New", monospace; font-size: 80%; margin-left: 15.0em; margin-bottom: 0.6em}
UL.Bulleted {font-family: "Times New Roman", Times, serif; margin-left: 2.0em; margin-right: 2.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.SmallBulleted {font-family: "Times New Roman", Times, serif; margin-left: 6.0em; margin-right: 6.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.NestedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-right: 4.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.SmallNestedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 8.0em; margin-right: 8.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.IndentedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 8.0em; margin-right: 8.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.CodeIndentedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 6.0em; margin-right: 6.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.CodeIndentedNestedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 8.0em; margin-right: 8.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.SyntaxIndentedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-right: 4.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.NotesBulleted {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-right: 4.0em; margin-top: 0em; margin-bottom: 0.5em}
UL.NotesNestedBulleted {font-family: "Times New Roman", Times, serif; margin-left: 6.0em; margin-right: 6.0em; margin-top: 0em; margin-bottom: 0.5em}
DL.Hanging {font-family: "Times New Roman", Times, serif; margin-top: 0em; margin-bottom: 0.6em}
DD.Hanging {margin-left: 6.0em}
DL.IndentedHanging {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-top: 0em; margin-bottom: 0.6em}
DD.IndentedHanging {margin-left: 2.0em}
DL.HangingInBulleted {font-family: "Times New Roman", Times, serif; margin-left: 2.0em; margin-right: 2.0em; margin-top: 0em; margin-bottom: 0.5em}
DD.HangingInBulleted {margin-left: 4.0em}
DL.SmallHanging {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-top: 0em; margin-bottom: 0.6em}
DD.SmallHanging {margin-left: 7.5em}
DL.SmallIndentedHanging {font-family: "Times New Roman", Times, serif; margin-left: 8.0em; margin-top: 0em; margin-bottom: 0.6em}
DD.SmallIndentedHanging {margin-left: 2.0em}
DL.SmallHangingInBulleted {font-family: "Times New Roman", Times, serif; margin-left: 6.0em; margin-right: 6.0em; margin-top: 0em; margin-bottom: 0.5em}
DD.SmallHangingInBulleted {margin-left: 5.0em}
DL.Enumerated {font-family: "Times New Roman", Times, serif; margin-right: 0.0em; margin-top: 0em; margin-bottom: 0.5em}
DD.Enumerated {margin-left: 2.0em}
DL.SmallEnumerated {font-family: "Times New Roman", Times, serif; margin-left: 4.0em; margin-right: 4.0em; margin-top: 0em; margin-bottom: 0.5em}
DD.SmallEnumerated {margin-left: 2.5em}
DL.NestedEnumerated {font-family: "Times New Roman", Times, serif; margin-left: 2.0em; margin-right: 2.0em; margin-top: 0em; margin-bottom: 0.5em}
DL.SmallNestedEnumerated {font-family: "Times New Roman", Times, serif; margin-left: 6.0em; margin-right: 6.0em; margin-top: 0em; margin-bottom: 0.5em}
</STYLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFF0" LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
<P><A HREF="AA-TOC.html">Contents</A> <A HREF="AA-0-29.html">Index</A> <A HREF="AA-2.html">Previous</A> <A HREF="AA-2-2.html">Next</A></P>
<HR>
<H1> 2.1 Character Set</H1>
<DIV Class="Paranum"><FONT SIZE=-2>1</FONT></DIV>
<DIV Class="Normal"> <A NAME="I1143"></A>The only characters allowed
outside of <FONT FACE="Arial, Helvetica">comment</FONT>s are the <FONT FACE="Arial, Helvetica">graphic_character</FONT>s
and <FONT FACE="Arial, Helvetica">format_effector</FONT>s. </DIV>
<DIV Class="Paranum"><FONT SIZE=-2>1.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Ramification: </B>Any character,
including an <FONT FACE="Arial, Helvetica">other_control_function</FONT>,
is allowed in a comment.</FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>1.b</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1>Note that this rule doesn't really
have much force, since the implementation can represent characters in
the source in any way it sees fit. For example, an implementation could
simply define that what seems to be a non-graphic, non-format-effector
character is actually a representation of the space character. </FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>1.c</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Discussion: </B>It is our intent
to follow the terminology of ISO 10646 BMP where appropriate, and to
remain compatible with the character classifications defined in <A HREF="AA-A-3.html">A.3</A>,
``<A HREF="AA-A-3.html">Character Handling</A>''. Note that our definition
for <FONT FACE="Arial, Helvetica">graphic_character</FONT> is more inclusive
than that of ISO 10646-1. </FONT></DIV>
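<DIV Class="Annotations"><FONT SIZE=-1>The rule of paragraph 1 can be sketched as a check on one line of source text. The following Python fragment is only an illustration and not part of the Standard. It assumes that the line has already been decoded to BMP characters, that a comment starts at the first <TT>--</TT> outside a string literal, and that character literals need no special handling. The names <TT>is_non_graphic</TT> and <TT>illegal_positions</TT> are hypothetical.</FONT></DIV>

```python
# Format effectors of 2.1(13): HT, LF, VT, FF, CR.
FORMAT_EFFECTORS = frozenset({0x09, 0x0A, 0x0B, 0x0C, 0x0D})


def is_non_graphic(cp: int) -> bool:
    """True for the code positions that note 17 excludes from graphic_character."""
    return cp in range(0x00, 0x20) or cp in range(0x7F, 0xA0) or cp in (0xFFFE, 0xFFFF)


def illegal_positions(line: str) -> list:
    """Return the indexes of characters that are not allowed outside a comment.

    Simplification (assumption): only string literals are tracked, so a
    quotation mark inside a character literal would confuse this scan.
    """
    bad = []
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif not in_string and line.startswith("--", i):
            break  # the rest of the line is a comment, where any character is allowed
        cp = ord(ch)
        if is_non_graphic(cp) and cp not in FORMAT_EFFECTORS:
            bad.append(i)
    return bad
```

<DIV Class="Annotations"><FONT SIZE=-1>For example, a BEL character (code position 0007) is reported when it appears in a statement but not when it appears after <TT>--</TT>.</FONT></DIV>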
<H4 ALIGN=CENTER>Syntax</H4>
<DIV Class="Paranum"><FONT SIZE=-2>2</FONT></DIV>
<DIV Class="SyntaxIndented"><FONT FACE="Arial, Helvetica">character<A NAME="I1144"></A>
::= </FONT><A NAME="I1145"></A><FONT FACE="Arial, Helvetica">graphic_character</FONT> | <A NAME="I1146"></A><FONT FACE="Arial, Helvetica">format_effector</FONT> | <A NAME="I1147"></A><FONT FACE="Arial, Helvetica">other_control_function</FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>3</FONT></DIV>
<DIV Class="SyntaxIndented"><FONT FACE="Arial, Helvetica">graphic_character<A NAME="I1148"></A>
::= </FONT><A NAME="I1149"></A><FONT FACE="Arial, Helvetica">identifier_letter</FONT> | <A NAME="I1150"></A><FONT FACE="Arial, Helvetica">digit</FONT> | <A NAME="I1151"></A><FONT FACE="Arial, Helvetica">space_character</FONT> | <A NAME="I1152"></A><FONT FACE="Arial, Helvetica">special_character</FONT></DIV>
<H4 ALIGN=CENTER>Static Semantics</H4>
<DIV Class="Paranum"><FONT SIZE=-2>4</FONT></DIV>
<DIV Class="Normal"> The character repertoire for the text of an Ada
program consists of the collection of characters called the Basic Multilingual
Plane (BMP) of the ISO 10646 Universal Multiple-Octet Coded Character
Set, plus a set of <FONT FACE="Arial, Helvetica">format_effector</FONT>s
and, in comments only, a set of <FONT FACE="Arial, Helvetica">other_control_function</FONT>s;
the coded representation for these characters is implementation defined
[(it need not be a representation defined within ISO 10646-1)]. </DIV>
<DIV Class="Paranum"><FONT SIZE=-2>4.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Implementation defined: </B>The
coded representation for the text of an Ada program.</FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>5</FONT></DIV>
<DIV Class="Normal"> The description of the language definition in
this International Standard uses the graphic symbols defined for Row
00: Basic Latin and Row 00: Latin-1 Supplement of the ISO 10646 BMP;
these correspond to the graphic symbols of ISO 8859-1 (Latin-1); no graphic
symbols are used in this International Standard for characters outside
of Row 00 of the BMP. The actual set of graphic symbols used by an implementation
for the visual representation of the text of an Ada program is not specified.
<A NAME="I1153"></A></DIV>
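<DIV Class="Annotations"><FONT SIZE=-1>The correspondence between Row 00 and ISO 8859-1 stated above can be observed with a short Python sketch, which is an illustration and not part of the Standard. It relies on Python's built-in <TT>latin-1</TT> codec and checks only the graphic code positions 0020 to 007E and 00A0 to 00FF. The names <TT>ROW_00_GRAPHIC</TT> and <TT>row00_matches_latin1</TT> are hypothetical.</FONT></DIV>

```python
# Graphic code positions of Row 00: Basic Latin and Latin-1 Supplement.
ROW_00_GRAPHIC = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))


def row00_matches_latin1() -> bool:
    """Check that byte value N under ISO 8859-1 decodes to BMP code position N."""
    return all(bytes([n]).decode("latin-1") == chr(n) for n in ROW_00_GRAPHIC)
```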
<DIV Class="Paranum"><FONT SIZE=-2>6</FONT></DIV>
<DIV Class="Normal" Style="margin-bottom: 0.4em"> The categories of
characters are defined as follows: </DIV>
<DIV Class="Paranum"><FONT SIZE=-2>7</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1154"></A><FONT FACE="Arial, Helvetica">identifier_letter</FONT><DD Class="Hanging">
<FONT FACE="Arial, Helvetica">upper_case_identifier_letter</FONT> | <FONT FACE="Arial, Helvetica">lower_case_identifier_letter</FONT>
</DL>
<DIV Class="Paranum"><FONT SIZE=-2>7.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Discussion: </B>We use <FONT FACE="Arial, Helvetica">identifier_letter</FONT>
instead of simply <FONT FACE="Arial, Helvetica">letter</FONT> because
ISO 10646 BMP includes many other characters that would generally be
considered "letters." </FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>8</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1155"></A><FONT FACE="Arial, Helvetica">upper_case_identifier_letter</FONT><DD Class="Hanging">
Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Capital
Letter''.</DL>
<DIV Class="Paranum"><FONT SIZE=-2>9</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1156"></A><FONT FACE="Arial, Helvetica">lower_case_identifier_letter</FONT><DD Class="Hanging">
Any character of Row 00 of ISO 10646 BMP whose name begins ``Latin Small
Letter''. </DL>
<DIV Class="Paranum"><FONT SIZE=-2>9.a/1</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><FONT SIZE=-1><I>This paragraph
was deleted.</I></FONT><B>To be honest: </B>{<I><A HREF="defect1.html#8652/0001">8652/0001</A></I>}
<S>The above rules do not include the ligatures Æ and æ.
However, the intent is to include these characters as identifier letters.
This problem was pointed out by a comment from the Netherlands.</S> </FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>10</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1157"></A><FONT FACE="Arial, Helvetica">digit</FONT><DD Class="Hanging">
One of the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9.</DL>
<DIV Class="Paranum"><FONT SIZE=-2>11</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1158"></A><FONT FACE="Arial, Helvetica">space_character</FONT><DD Class="Hanging">
The character of ISO 10646 BMP named ``Space''.</DL>
<DIV Class="Paranum"><FONT SIZE=-2>12</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1159"></A><FONT FACE="Arial, Helvetica">special_character</FONT><DD Class="Hanging">
Any character of the ISO 10646 BMP that is not reserved for a control
function, and is not the <FONT FACE="Arial, Helvetica">space_character</FONT>,
an <FONT FACE="Arial, Helvetica">identifier_letter</FONT>, or a <FONT FACE="Arial, Helvetica">digit</FONT>.
</DL>
<DIV Class="Paranum"><FONT SIZE=-2>12.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Ramification: </B>Note that
the no-break space and soft hyphen are <FONT FACE="Arial, Helvetica">special_character</FONT>s,
and therefore <FONT FACE="Arial, Helvetica">graphic_character</FONT>s.
They are not the same characters as space and hyphen-minus. </FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>13</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1160"></A><FONT FACE="Arial, Helvetica">format_effector</FONT><DD Class="Hanging">
The control functions of ISO 6429 called character tabulation (HT), line
tabulation (VT), carriage return (CR), line feed (LF), and form feed
(FF). <A NAME="I1161"></A></DL>
<DIV Class="Paranum"><FONT SIZE=-2>14</FONT></DIV>
<DL Class="Hanging"><DT> <A NAME="I1162"></A><FONT FACE="Arial, Helvetica">other_control_function</FONT><DD Class="Hanging">
Any control function, other than a <FONT FACE="Arial, Helvetica">format_effector</FONT>,
that is allowed in a comment; the set of <FONT FACE="Arial, Helvetica">other_control_function</FONT>s
allowed in comments is implementation defined. <A NAME="I1163"></A></DL>
<DIV Class="Paranum"><FONT SIZE=-2>14.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Implementation defined: </B>The
control functions allowed in comments.</FONT></DIV>
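<DIV Class="Annotations"><FONT SIZE=-1>The category definitions of paragraphs 7 through 14 can be read as a decision procedure on a BMP code position. The Python sketch below is an illustration under stated assumptions and not part of the Standard. It uses the character names in Python's <TT>unicodedata</TT> module as a stand-in for the ISO 10646 names, it treats every control function other than a <FONT FACE="Arial, Helvetica">format_effector</FONT> as an allowed <FONT FACE="Arial, Helvetica">other_control_function</FONT> (the actual set is implementation defined), and it uses the hypothetical label <TT>not_a_character</TT> for FFFE and FFFF. In that name database the names of Æ and æ begin with ``Latin Capital Letter'' and ``Latin Small Letter'', so the sketch classifies them as identifier letters.</FONT></DIV>

```python
import unicodedata

# HT, LF, VT, FF, CR: the format effectors of 2.1(13).
FORMAT_EFFECTORS = frozenset({0x09, 0x0A, 0x0B, 0x0C, 0x0D})


def classify(cp: int) -> str:
    """Name the 2.1 category of a BMP code position (hypothetical helper)."""
    if cp not in range(0x10000):
        raise ValueError("not a BMP code position")
    if cp in FORMAT_EFFECTORS:
        return "format_effector"
    if cp in range(0x20) or cp in range(0x7F, 0xA0):
        # Assumption: every other control function is allowed in comments.
        return "other_control_function"
    if cp in (0xFFFE, 0xFFFF):
        return "not_a_character"
    ch = chr(cp)
    if ch == " ":
        return "space_character"
    if ch in "0123456789":
        return "digit"
    if cp in range(0x100):  # identifier letters come from Row 00 only
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN CAPITAL LETTER"):
            return "upper_case_identifier_letter"
        if name.startswith("LATIN SMALL LETTER"):
            return "lower_case_identifier_letter"
    # Everything else that is not reserved for a control function.
    return "special_character"
```

<DIV Class="Annotations"><FONT SIZE=-1>With these assumptions, the no-break space (00A0) and the soft hyphen (00AD) come out as <FONT FACE="Arial, Helvetica">special_character</FONT>s, in agreement with the ramification under paragraph 12.</FONT></DIV>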
<DIV Class="Paranum"><FONT SIZE=-2>15</FONT></DIV>
<DIV Class="Normal" Style="margin-bottom: 0.4em"> <A NAME="I1164"></A><A NAME="I1165"></A>The
following names are used when referring to certain <FONT FACE="Arial, Helvetica">special_character</FONT>s:
<A NAME="I1166"></A><A NAME="I1167"></A><A NAME="I1168"></A><A NAME="I1169"></A><A NAME="I1170"></A><A NAME="I1171"></A><A NAME="I1172"></A><A NAME="I1173"></A><A NAME="I1174"></A><A NAME="I1175"></A><A NAME="I1176"></A><A NAME="I1177"></A><A NAME="I1178"></A><A NAME="I1179"></A><A NAME="I1180"></A><A NAME="I1181"></A><A NAME="I1182"></A><A NAME="I1183"></A><A NAME="I1184"></A><A NAME="I1185"></A><A NAME="I1186"></A><A NAME="I1187"></A><A NAME="I1188"></A><A NAME="I1189"></A><A NAME="I1190"></A><A NAME="I1191"></A><A NAME="I1192"></A><A NAME="I1193"></A><A NAME="I1194"></A><A NAME="I1195"></A></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>15.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Discussion: </B>These are the
ones that play a special role in the syntax of Ada 95, or in the syntax
rules; we don't bother to define names for all characters. The first
name given is the name from ISO 10646-1; the subsequent names, if any,
are those used within the standard, depending on context. </FONT></DIV>
<DIV Class="CodeIndented"><TABLE Width="70%">
<TR><TD align="left"> symbol<TD align="left">name<TD align="left"> symbol<TD align="left">name<TD align="left">
<TR><TD align="left"> <TD align="left"> <TD align="left"> <TD align="left"> <TD align="left">
<TR><TD align="left"> "<TD align="left">quotation mark<TD align="left"> :<TD align="left">colon<TD align="left">
<TR><TD align="left"> #<TD align="left">number sign<TD align="left"> ;<TD align="left">semicolon<TD align="left">
<TR><TD align="left"> &amp;<TD align="left">ampersand<TD align="left"> &lt;<TD align="left">less-than sign<TD align="left">
<TR><TD align="left"> '<TD align="left">apostrophe, tick<TD align="left"> =<TD align="left">equals sign<TD align="left">
<TR><TD align="left"> (<TD align="left">left parenthesis<TD align="left"> &gt;<TD align="left">greater-than sign<TD align="left">
<TR><TD align="left"> )<TD align="left">right parenthesis<TD align="left"> _<TD align="left">low line, underline<TD align="left">
<TR><TD align="left"> *<TD align="left">asterisk, multiply<TD align="left"> |<TD align="left">vertical line<TD align="left">
<TR><TD align="left"> +<TD align="left">plus sign<TD align="left"> [<TD align="left">left square bracket<TD align="left">
<TR><TD align="left"> ,<TD align="left">comma<TD align="left"> ]<TD align="left">right square bracket<TD align="left">
<TR><TD align="left"> -<TD align="left">hyphen-minus, minus<TD align="left"> {<TD align="left">left curly bracket<TD align="left">
<TR><TD align="left"> .<TD align="left">full stop, dot, point<TD align="left"> } <TD align="left">right curly bracket <TD align="left">
<TR><TD align="left"> / <TD align="left">solidus, divide <TD align="left"> <TD align="left"> <TD align="left">
</TABLE></DIV>
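<DIV Class="Annotations"><FONT SIZE=-1>The first name in each row of the table above is the ISO 10646 character name. As an illustration that is not part of the Standard, the Python sketch below prints the names that Python's <TT>unicodedata</TT> module records for the same code positions; for these characters they coincide with the first names in the table. The names <TT>TABLE_CODE_POSITIONS</TT> and <TT>iso_name</TT> are hypothetical.</FONT></DIV>

```python
import unicodedata

# Code positions of the special characters named in the table above,
# listed in table order (left column first, then right column).
TABLE_CODE_POSITIONS = [
    0x22, 0x23, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
    0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x5F, 0x7C, 0x5B, 0x5D, 0x7B, 0x7D,
]


def iso_name(cp: int) -> str:
    """Return the character name in lower case, as the table spells it."""
    return unicodedata.name(chr(cp)).lower()


for cp in TABLE_CODE_POSITIONS:
    print(f"{cp:04X}  {iso_name(cp)}")
```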
<H4 ALIGN=CENTER>Implementation Permissions</H4>
<DIV Class="Paranum"><FONT SIZE=-2>16</FONT></DIV>
<DIV Class="Normal"> In a nonstandard mode, the implementation may
support a different character repertoire[; in particular, the set of
characters that are considered <FONT FACE="Arial, Helvetica">identifier_letter</FONT>s
can be extended or changed to conform to local conventions]. </DIV>
<DIV Class="Paranum"><FONT SIZE=-2>16.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Ramification: </B>If an implementation
supports other character sets, it defines which characters fall into
each category, such as ``<FONT FACE="Arial, Helvetica">identifier_letter</FONT>,''
and what the corresponding rules of this section are, such as which characters
are allowed in the text of a program. </FONT></DIV>
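<DIV Class="Annotations"><FONT SIZE=-1>One way to picture this permission is a standard-mode test for <FONT FACE="Arial, Helvetica">identifier_letter</FONT> that accepts an optional, implementation-chosen set of additional letters. The Python sketch below is an illustration only and not part of the Standard. The function <TT>is_identifier_letter</TT>, its <TT>extra_letters</TT> parameter, and the <TT>GREEK_MODE</TT> set are hypothetical, and Python's <TT>unicodedata</TT> names stand in for the ISO 10646 names.</FONT></DIV>

```python
import unicodedata


def is_identifier_letter(ch: str, extra_letters: frozenset = frozenset()) -> bool:
    """Standard-mode test of 2.1(7-9), plus optional nonstandard additions."""
    if ch in extra_letters:
        return True  # nonstandard mode: extension chosen by the implementation
    if ord(ch) not in range(0x100):
        return False  # standard mode admits letters of Row 00 only
    name = unicodedata.name(ch, "")
    return name.startswith(("LATIN CAPITAL LETTER", "LATIN SMALL LETTER"))


# Hypothetical local convention: also accept Greek small alpha and beta.
GREEK_MODE = frozenset("\u03b1\u03b2")
```

<DIV Class="Annotations"><FONT SIZE=-1>With the default argument the Greek letter alpha is rejected; with <TT>GREEK_MODE</TT> passed as <TT>extra_letters</TT> it is accepted.</FONT></DIV>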
<DIV Class="NotesHeader"><FONT SIZE=-1>NOTES</FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>17</FONT></DIV>
<DIV Class="Notes"><FONT SIZE=-1>1 Every code position of
ISO 10646 BMP that is not reserved for a control function is defined
to be a <FONT FACE="Arial, Helvetica">graphic_character</FONT> by this
International Standard. This includes all code positions other than 0000
- 001F, 007F - 009F, and FFFE - FFFF.</FONT></DIV>
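<DIV Class="Annotations"><FONT SIZE=-1>The ranges in this note translate directly into a membership test. The Python sketch below is an illustration and not part of the Standard; the names <TT>NON_GRAPHIC_RANGES</TT> and <TT>is_graphic</TT> are hypothetical. It also counts the code positions that remain, which is 65536 less the 32, 33, and 2 excluded positions.</FONT></DIV>

```python
# Ranges excluded by the note, written as inclusive (first, last) pairs.
NON_GRAPHIC_RANGES = [(0x0000, 0x001F), (0x007F, 0x009F), (0xFFFE, 0xFFFF)]


def is_graphic(cp: int) -> bool:
    """True if BMP code position cp is a graphic_character according to the note."""
    if cp not in range(0x10000):
        raise ValueError("not a BMP code position")
    return not any(cp in range(first, last + 1) for first, last in NON_GRAPHIC_RANGES)


graphic_count = sum(1 for cp in range(0x10000) if is_graphic(cp))
print(graphic_count)  # 65536 - 32 - 33 - 2 = 65469
```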
<DIV Class="Paranum"><FONT SIZE=-2>18</FONT></DIV>
<DIV Class="Notes"><FONT SIZE=-1>2 The language does not specify
the source representation of programs. </FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>18.a</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><B>Discussion: </B>Any source
representation is valid so long as the implementer can produce an (information-preserving)
algorithm for translating in both directions between the representation
and the standard character set. (For example, every character in the
standard character set has to be representable, even if the output devices
attached to a given computer cannot print all of those characters properly.)
From a practical point of view, every implementer will have to provide
some way to process the ACVC (Ada Compiler Validation Capability) test suite. It is the intent to allow source representations,
such as parse trees, that are not even linear sequences of characters.
It is also the intent to allow different fonts: reserved words might
be in bold face, and that should be irrelevant to the semantics. </FONT></DIV>
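<DIV Class="Annotations"><FONT SIZE=-1>The requirement for an information-preserving translation in both directions can be illustrated with a made-up source representation. The Python sketch below is not part of the Standard and does not describe any particular implementation. It assumes a hypothetical scheme in which printable ASCII characters other than the backslash stand for themselves and every other BMP character is written as a backslash, the letter u, and four hexadecimal digits. The names <TT>to_ascii_source</TT> and <TT>from_ascii_source</TT> are hypothetical.</FONT></DIV>

```python
import re


def to_ascii_source(text: str) -> str:
    """Encode BMP text using printable ASCII only (hypothetical escape scheme)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp not in range(0x10000):
            raise ValueError("not a BMP character")
        if cp in range(0x20, 0x7F) and ch != "\\":
            out.append(ch)  # printable ASCII stands for itself
        else:
            out.append("\\u%04X" % cp)  # everything else, including the backslash
    return "".join(out)


def from_ascii_source(rep: str) -> str:
    """Invert to_ascii_source: every backslash in rep starts a six-character escape."""
    return re.sub(r"\\u([0-9A-F]{4})", lambda m: chr(int(m.group(1), 16)), rep)
```

<DIV Class="Annotations"><FONT SIZE=-1>Because the backslash itself is always escaped, decoding is unambiguous, and <TT>from_ascii_source(to_ascii_source(s))</TT> returns <TT>s</TT> for any string of BMP characters. That round trip is the property the discussion above asks of a source representation.</FONT></DIV>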
<H4 ALIGN=CENTER>Extensions to Ada 83</H4>
<DIV Class="Paranum"><FONT SIZE=-2>18.b</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1><A NAME="I1196"></A>Ada 95 allows
8-bit and 16-bit characters, as well as implementation-specified character
sets. </FONT></DIV>
<H4 ALIGN=CENTER>Wording Changes from Ada 83</H4>
<DIV Class="Paranum"><FONT SIZE=-2>18.c</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1>The syntax rules in this clause
are modified to remove the emphasis on basic characters vs. others. (In
this day and age, there is no need to point out that you can write programs
without using (for example) lower case letters.) In particular, <FONT FACE="Arial, Helvetica">character</FONT>
(representing all characters usable outside comments) is added, and <FONT FACE="Arial, Helvetica">basic_graphic_character</FONT>,
<FONT FACE="Arial, Helvetica">other_special_character</FONT>, and <FONT FACE="Arial, Helvetica">basic_character</FONT>
are removed. <FONT FACE="Arial, Helvetica">Special_character</FONT> is
expanded to include Ada 83's <FONT FACE="Arial, Helvetica">other_special_character</FONT>,
as well as new 8-bit characters not present in Ada 83. Note that the
term ``basic letter'' is used in <A HREF="AA-A-3.html">A.3</A>, ``<A HREF="AA-A-3.html">Character
Handling</A>'' to refer to letters without diacritical marks.</FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>18.d</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1>Character names now come from
ISO 10646.</FONT></DIV>
<DIV Class="Paranum"><FONT SIZE=-2>18.e</FONT></DIV>
<DIV Class="Annotations"><FONT SIZE=-1>We use <FONT FACE="Arial, Helvetica">identifier_letter</FONT>
rather than <FONT FACE="Arial, Helvetica">letter</FONT> since ISO 10646
BMP includes many "letters" that are not permitted in identifiers
(in the standard mode). </FONT></DIV>
<HR>
<P><A HREF="AA-TOC.html">Contents</A> <A HREF="AA-0-29.html">Index</A> <A HREF="AA-2.html">Previous</A> <A HREF="AA-2-2.html">Next</A> <A HREF="AA-TTL.html">Legal</A></P>
</BODY>
</HTML>