<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>OpenSP - SGML declaration</TITLE>
</HEAD>
<BODY>
<H1>Handling of the SGML declaration in OpenSP</H1>
<H2>Extended Naming Rules</H2>
<P>
OpenSP supports the Extended Naming Rules as specified in Annex J
of ISO 8879:1986 (added by the 1996 technical corrigendum).
<H2>Web SGML Adaptations</H2>
<P>
OpenSP supports most of the Web SGML Adaptations as specified in
Annex K of ISO 8879:1986 (added by the second technical corrigendum, 1998).
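<P>
A declaration normally selects these revisions through the minimum literal that opens it. The two opening lines below are a sketch. They assume the literals <SAMP>"ISO 8879:1986 (ENR)"</SAMP> and <SAMP>"ISO 8879:1986 (WWW)"</SAMP> from the corrigenda, and the comments are illustrative. The rest of each declaration is omitted: CHARSET, CAPACITY, SCOPE, SYNTAX, FEATURES and APPINFO follow as usual.
<PRE>
&lt;!SGML "ISO 8879:1986 (ENR)"   -- Annex J: Extended Naming Rules --
&lt;!SGML "ISO 8879:1986 (WWW)"   -- Annex K: Web SGML Adaptations --
</PRE>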
<H2>Default SGML declaration</H2>
<P>
If the SGML declaration is omitted
and there is no applicable
<A HREF="catalog.htm#sgmldecl"><SAMP>SGMLDECL</SAMP></A>
or <A HREF="catalog.htm#dtddecl"><SAMP>DTDDECL</SAMP></A>
entry in a catalog,
the following declaration will be implied:
<PRE>
&lt;!SGML "ISO 8879:1986"
CHARSET
BASESET  "ISO 646-1983//CHARSET
          International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET    0  9 UNUSED
           9  2  9
          11  2 UNUSED
          13  1 13
          14 18 UNUSED
          32 95 32
         127  1 UNUSED
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
SCOPE    DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
         18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET  "ISO 646-1983//CHARSET International Reference Version
          (IRV)//ESC 2/5 4/0"
DESCSET  0 128 0
FUNCTION RE          13
         RS          10
         SPACE       32
         TAB SEPCHAR  9
NAMING   LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR "-."
         UCNMCHAR "-."
         NAMECASE GENERAL YES
                  ENTITY  NO
DELIM    GENERAL  SGMLREF
         SHORTREF SGMLREF
NAMES    SGMLREF
QUANTITY SGMLREF
         ATTCNT   99999999
         ATTSPLEN 99999999
         DTEMPLEN 24000
         ENTLVL   99999999
         GRPCNT   99999999
         GRPGTCNT 99999999
         GRPLVL   99999999
         LITLEN   24000
         NAMELEN  99999999
         PILEN    24000
         TAGLEN   99999999
         TAGLVL   99999999
FEATURES
MINIMIZE DATATAG  NO
         OMITTAG  YES
         RANK     YES
         SHORTTAG YES
LINK     SIMPLE   YES 1000
         IMPLICIT YES
         EXPLICIT YES 1
OTHER    CONCUR   NO
         SUBDOC   YES 99999999
         FORMAL   YES
APPINFO  NONE&gt;
</PRE>
<P>
with the exception that all characters that are neither significant
nor shunned will be assigned to DATACHAR.
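<P>
Catalog entries along the following lines supply a declaration explicitly, so nothing is implied. This is only a sketch. The file names <SAMP>mydecl.dcl</SAMP> and <SAMP>example.dcl</SAMP> are hypothetical, and so is the DTD public identifier. The catalog documentation is the authority on the exact form of these entries.
<PRE>
-- declaration for documents that have no SGML declaration of their own --
SGMLDECL "mydecl.dcl"
-- declaration for documents whose DTD has this public identifier --
DTDDECL "-//Example//DTD Hypothetical Document//EN" "example.dcl"
</PRE>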
<H2><A NAME="charset">Character sets</A></H2>
<P>
A character in a base character set is described either by giving its
number in the <i>universal</i> character set or by specifying a minimum
literal.
The first 65536 character numbers in the <i>universal</i> character
set are assumed to be the same as in Unicode 2.0 (ISO/IEC 10646).
The remaining character numbers can be assigned in any convenient way.
<P>
The public identifier of a base character set can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of the part of a character set description
that follows the DESCSET keyword;
that is, it must be a sequence of character descriptions,
each of which specifies a described character number,
a number of characters, and
either a character number in the universal character set, a minimum literal,
or the keyword
<SAMP>UNUSED</SAMP>.
Character numbers in the universal character set can be as large as
99999999.
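<P>
As a sketch, assume a hypothetical base character set whose public identifier is mapped to the hypothetical file <SAMP>example-charset.sgm</SAMP> by the catalog entry <SAMP>PUBLIC "-//Example//CHARSET Hypothetical 8-bit set//EN" "example-charset.sgm"</SAMP>. The file would contain only character descriptions. This one maps characters 0 to 127 and 160 to 255 to the same universal character numbers and leaves 128 to 159 unused:
<PRE>
  0 128   0
128  32 UNUSED
160  96 160
</PRE>
<P>
A declaration could then name the set with <SAMP>BASESET "-//Example//CHARSET Hypothetical 8-bit set//EN"</SAMP>, followed by its own DESCSET. If the declaration specifies <SAMP>FORMAL YES</SAMP>, the identifier must also satisfy the rules for formal public identifiers. This sketch has not been checked against those rules.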
<P>
In addition, OpenSP has built-in knowledge of many character sets.
These are identified by the designating sequence in the
public identifier. The following designating sequences are
recognized:
<DL>
<DT>
<SAMP>ESC 2/5 4/0</SAMP>
<DD>
The full set of ISO 646 IRV.
This is not a registered character set,
but is recommended by ISO 8879 (clause 10.2.2.4).
<DT>
<SAMP>ESC 2/8 4/0</SAMP>
<DD>
G0 set of ISO 646 IRV,
ISO Registration Number 2.
<DT>
<SAMP>ESC 2/8 4/2</SAMP>
<DD>
G0 set of ASCII,
ISO Registration Number 6.
<DT>
<SAMP>ESC 2/1 4/0</SAMP>
<DD>
C0 set of ISO 646,
ISO Registration Number 1.
<DT>
<SAMP>ESC 2/13 4/1</SAMP>
<DD>
G1 set of ISO 8859-1.
<DT>
<SAMP>ESC 2/13 4/2</SAMP>
<DD>
G1 set of ISO 8859-2.
<DT>
<SAMP>ESC 2/13 4/3</SAMP>
<DD>
G1 set of ISO 8859-3.
<DT>
<SAMP>ESC 2/13 4/4</SAMP>
<DD>
G1 set of ISO 8859-4.
<DT>
<SAMP>ESC 2/13 4/12</SAMP>
<DD>
G1 set of ISO 8859-5.
<DT>
<SAMP>ESC 2/13 4/7</SAMP>
<DD>
G1 set of ISO 8859-6.
<DT>
<SAMP>ESC 2/13 4/6</SAMP>
<DD>
G1 set of ISO 8859-7.
<DT>
<SAMP>ESC 2/13 4/8</SAMP>
<DD>
G1 set of ISO 8859-8.
<DT>
<SAMP>ESC 2/13 4/13</SAMP>
<DD>
G1 set of ISO 8859-9.
<DT>
<SAMP>ESC 2/8 4/10</SAMP>
<DD>
Roman set from JIS X 0201.
JIS version of ISO 646.
ISO Registration Number 14.
<DT>
<SAMP>ESC 2/8 4/9</SAMP>
<DD>
Katakana set from JIS X 0201.
ISO Registration Number 13.
<DT>
<SAMP>ESC 2/4 4/2</SAMP>
<DT>
<SAMP>ESC 2/6 4/0 ESC 2/4 4/2</SAMP>
<DD>
JIS X 0208-1990.
ISO Registration Numbers 87 and 168.
<DT>
<SAMP>ESC 2/4 2/8 4/4</SAMP>
<DD>
JIS X 0212-1990.
ISO Registration Number 159.
<DT>
<SAMP>ESC 2/4 4/1</SAMP>
<DD>
GB 2312-80.
ISO Registration Number 58.
<DT>
<SAMP>ESC 2/4 2/8 4/3</SAMP>
<DD>
KS C 5601-1992.
ISO Registration Number 149.
<DT>
<SAMP>ESC 2/5 2/15 4/0</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/3</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/5</SAMP>
<DD>
ISO/IEC 10646 UCS-2.
<DT>
<SAMP>ESC 2/5 2/15 4/1</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/4</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/6</SAMP>
<DD>
ISO/IEC 10646 UCS-4.
</DL>
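<P>
For instance, a document character set covering ISO 8859-1 can combine the ISO 646 IRV set with the G1 set of ISO 8859-1 (<SAMP>ESC 2/13 4/1</SAMP>). The CHARSET portion below is a sketch in the style of widely used declarations such as the one for HTML 3.2. Like those declarations, it assumes the G1 set is numbered 32 to 127, so document characters 160 to 255 map to those positions. The built-in sets are recognized by their designating sequences, so the descriptive text before each sequence does not select the set.
<PRE>
CHARSET
BASESET "ISO 646-1983//CHARSET
         International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET   0  9 UNUSED
          9  2  9
         11  2 UNUSED
         13  1 13
         14 18 UNUSED
         32 95 32
        127  1 UNUSED
BASESET "ISO Registration Number 100//CHARSET
         ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET 128 32 UNUSED
        160 96 32
</PRE>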
<H2>Concrete syntaxes</H2>
<P>
The public identifier for a public concrete syntax can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of a concrete syntax description
starting with the
<SAMP>SHUNCHAR</SAMP>
keyword
as in an SGML declaration.
The entity can also make use of the following extensions:
<UL>
<LI>
The Extended Naming Rules extensions can be used regardless of the minimum
literal used in the SGML declaration.
<LI>
An
<I>added function</I>
can be expressed as a parameter literal
instead of a name.
<LI>
The replacement for a reference reserved name
can be expressed as a parameter literal instead of a name.
<LI>
The total number of characters specified for
<SAMP>UCNMCHAR</SAMP>
or
<SAMP>UCNMSTRT</SAMP>
may exceed the total number of characters specified for
<SAMP>LCNMCHAR</SAMP>
or
<SAMP>LCNMSTRT</SAMP>
respectively.
Each character in
<SAMP>UCNMCHAR</SAMP>
or
<SAMP>UCNMSTRT</SAMP>
which does not have a corresponding character in the same position in
<SAMP>LCNMCHAR</SAMP>
or
<SAMP>LCNMSTRT</SAMP>
is simply assigned to <SAMP>UCNMCHAR</SAMP> or <SAMP>UCNMSTRT</SAMP>
without making it the upper-case form of any character.
<LI>
Within the specification of the short reference delimiters,
a parameter literal containing exactly one character
may be followed by the delimiter <SAMP>-</SAMP>
and another parameter literal containing exactly one character.
This has the same meaning as a sequence of parameter literals,
one for each character number that is greater than or equal
to the number of the character in the first parameter literal
and less than or equal to the number of the character in the
second parameter literal.
<LI>
A number may be used as a delimiter in the
<SAMP>DELIM</SAMP>
section with the same meaning as a parameter literal
containing just a numeric character reference with that number.
</UL>
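<P>
As a sketch of how these extensions might look together, the entity below reuses the syntax portion of the default declaration and adds three of them. The added function name <SAMP>"FF"</SAMP>, the <SAMP>REFC</SAMP> line and the short reference set are illustrative choices, not recommendations. The fragment has not been checked against OpenSP.
<PRE>
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
         18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET  "ISO 646-1983//CHARSET International Reference Version
          (IRV)//ESC 2/5 4/0"
DESCSET  0 128 0
FUNCTION RE          13
         RS          10
         SPACE       32
         TAB SEPCHAR  9
         "FF" SEPCHAR 12   -- added function named by a parameter literal --
NAMING   LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR "-."
         UCNMCHAR "-."
         NAMECASE GENERAL YES
                  ENTITY  NO
DELIM    GENERAL  SGMLREF
                  REFC 59      -- same as REFC "&amp;#59;" --
         SHORTREF NONE
                  "[" - "]"    -- same as "[" "\" "]" --
NAMES    SGMLREF
QUANTITY SGMLREF
</PRE>
<P>
Suppose this text is stored in the hypothetical file <SAMP>example-syntax.sgm</SAMP>. The catalog entry <SAMP>PUBLIC "-//Example//SYNTAX Hypothetical extended syntax//EN" "example-syntax.sgm"</SAMP> associates it with that public identifier. A declaration can then select it with <SAMP>SYNTAX PUBLIC "-//Example//SYNTAX Hypothetical extended syntax//EN"</SAMP>.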
<H2>Capacity sets</H2>
<P>
The public identifier for a public capacity set can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of a sequence of capacity names and numbers.
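<P>
For example, assume a catalog <SAMP>PUBLIC</SAMP> entry maps the hypothetical public identifier <SAMP>"-//Example//CAPACITY Hypothetical larger capacities//EN"</SAMP> to a file. That file could contain pairs like the following. The names are capacity names from ISO 8879, and the numbers are arbitrary illustrations. A declaration would refer to the set with <SAMP>CAPACITY PUBLIC</SAMP> followed by the same identifier.
<PRE>
TOTALCAP 2000000
ENTCAP    500000
ELEMCAP   500000
GRPCAP    500000
ATTCAP    500000
</PRE>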
</BODY>
</HTML>