1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385
|
<Chapter Label="HowEnter">
<Heading>How To Type a &GAPDoc; Document</Heading>
In this chapter we give a more formal description of what you need to start
to type documentation in &GAPDoc; XML format. Many details were already
explained by example in Section <Ref Sect="sec:3k+1expl"/> of the
introduction.<P/>
We do <E>not</E> answer the question <Q>How to <E>write</E> a &GAPDoc;
document?</Q> in this chapter. You can (hopefully) find an answer to
this question by studying the example in the introduction, see <Ref
Sect="sec:3k+1expl"/>, and learning about more details in the reference
Chapter <Ref Chap="DTD" />.<P/>
The definite source for all details of the official XML standard with useful
annotations is:<P/>
<URL>https://www.xml.com/axml/axml.html</URL><P/>
Although this document must be quite technical, it is surprisingly well
readable.<P/>
<Section Label="EnterXML">
<Heading>General XML Syntax</Heading>
We will now discuss the pieces of text which can occur in a general XML
document. We start with those pieces which do not contribute to the actual
content of the document.
<Subsection Label="XMLhead">
<Heading>Head of XML Document</Heading>
Each XML document should have a head which states that it is an XML document
in some encoding and which XML-defined language is used. In case of a
&GAPDoc; document this should always look as in the following example.
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd">]]>
</Listing>
See <Ref Subsect="XMLenc"/> for a remark on the <Q>encoding</Q>
statement.<P/>
(There may be local entity definitions inside the <C>DOCTYPE</C> statement,
see Subsection <Ref Subsect="GDent" /> below.)
</Subsection>
<Subsection Label="XMLcomment">
<Heading>Comments</Heading>
A <Q>comment</Q> in XML starts with the character sequence
<Q><C><!--</C></Q> and ends with the sequence <Q><C>--></C></Q>. Between
these sequences there must not be two adjacent dashes <Q><C>--</C></Q>.
</Subsection>
<Subsection Label="XMLprocinstr">
<Heading>Processing Instructions</Heading>
A <Q>processing instruction</Q> in XML starts with the character sequence
<Q><C><?</C></Q> followed by a name (<Q><C>xml</C></Q> is only allowed
at the very beginning of the document to declare it being an XML document,
see <Ref Subsect="XMLhead"/>). After that any characters may follow, except
that the ending sequence <Q><C>?></C></Q> must not occur within the
processing instruction.
</Subsection>
<P/>
And now we turn to those parts of the document which contribute to its
actual content.
<Subsection Label="XMLnames">
<Heading>Names in XML and Whitespace</Heading>
A <Q>name</Q> in XML (used for element and attribute identifiers, see below)
must start with a letter (in the encoding of the document) or with a
colon <Q><C>:</C></Q> or underscore <Q><C>_</C></Q> character. The
following characters may also be digits, dots <Q><C>.</C></Q> or dashes
<Q><C>-</C></Q>.<P/>
This is a simplified description of the rules in the standard, which are
concerned with lots of unicode ranges to specify what a <Q>letter</Q>
is.<P/>
Sequences only consisting of the following characters are considered as
<E>whitespace</E>: blanks, tabs, carriage return characters and new line
characters.
</Subsection>
<Subsection Label="XMLel">
<Heading>Elements</Heading>
The actual content of an XML document consists of <Q>elements</Q>.
An element has some <Q>content</Q> with a leading <Q>start tag</Q>
(<Ref Subsect="XMLstarttag"/>) and a trailing <Q>end tag</Q> (<Ref
Subsect="XMLendtag"/>). The content can contain further elements but they
must be properly nested. One can define elements whose content is always
empty, those elements can also be entered with a single combined tag (<Ref
Subsect="XMLcombtag"/>).
</Subsection>
<Subsection Label="XMLstarttag">
<Heading>Start Tags</Heading>
A <Q>start-tag</Q> consists of a less-than-character <Q><C><</C></Q>
directly followed (without whitespace) by an element name (see <Ref
Subsect="XMLnames"/>), optional attributes, optional whitespace, and a
greater-than-character <Q><C>></C></Q>.<P/>
An <Q>attribute</Q> consists of some whitespace and then its name
followed by an equal sign <Q><C>=</C></Q> which is optionally enclosed by
whitespace, and the attribute value, which is enclosed either in single
or double quotes. The attribute value may not contain the type of
quote used as a delimiter or the character <Q><C><</C></Q>, the
character <Q><C>&</C></Q> may only appear to start an entity,
see <Ref Subsect="XMLent"/>. We describe
in <Ref Subsect="AttrValRules"/> how
to enter special characters in attribute values.<P/>
Note especially that no whitespace is allowed between the starting
<Q><C><</C></Q> character and the element name. The quotes around an
attribute value cannot be omitted. The names of elements and attributes are
<E>case sensitive</E>.
</Subsection>
<Subsection Label="XMLendtag">
<Heading>End Tags</Heading>
An <Q>end tag</Q> consists of the two characters <Q><C></</C></Q>
directly followed by the element name, optional whitespace and a
greater-than-character <Q><C>></C></Q>.
</Subsection>
<Subsection Label="XMLcombtag">
<Heading>Combined Tags for Empty Elements</Heading>
Elements which always have empty content can be written with a single
tag. This looks like a start tag (see <Ref Subsect="XMLstarttag"/>)
<E>except</E> that the trailing greater-than-character <Q><C>></C></Q>
is substituted by the two character sequence <Q><C>/></C></Q>.
</Subsection>
<Subsection Label="XMLent">
<Heading>Entities</Heading>
An <Q>entity</Q> in XML is a macro for some substitution text. There are two
types of entities. <P/>
A <Q>character entity</Q> can be used to specify characters in the encoding
of the document (can be useful for entering non-ASCII characters which you
cannot manage to type in directly). They are entered with a sequence
<Q><C>&#</C></Q>, directly followed by either some decimal digits
or an <Q><C>x</C></Q> and some hexadecimal digits, directly followed by a
semicolon <Q><C>;</C></Q>. Using such a character entity is just equivalent
to typing the corresponding character directly.<P/>
Then there are references to <Q>named entities</Q>. They are entered with an
ampersand character <Q><C>&</C></Q> directly followed by a name which
is directly followed by a semicolon <Q><C>;</C></Q>. Such entities must be
declared somewhere by giving a substitution text. This text is included in
the document and the document is parsed again afterwards. The exact rules
are a bit subtle but you probably want to use this only in simple cases.
Predefined entities for &GAPDoc; are described in <Ref Subsect="XMLspchar"/>
and <Ref Subsect="GDent"/>.<P/>
</Subsection>
<Subsection Label="XMLspchar">
<Heading>Special Characters in XML</Heading>
We have seen that the less-than-character <Q><C><</C></Q> and the
ampersand character <Q><C>&</C></Q> start a tag or entity reference in
XML. To get these characters into the document text one has to use
entity references, namely <Q><C>&lt;</C></Q> to get <Q><C><</C></Q>
and <Q><C>&amp;</C></Q> to get <Q><C>&</C></Q>. Furthermore
<Q><C>&gt;</C></Q> must be used to get <Q><C>></C></Q> when the string
<Q><C>]]></C></Q> appears in element content (and not as delimiter of a
<C>CDATA</C> section explained below).<P/>
Another possibility is to use a <C>CDATA</C> statement explained
in <Ref Subsect="XMLcdata"/>.
</Subsection>
<Subsection Label="AttrValRules">
<Heading>Rules for Attribute Values</Heading>
Attribute values can contain entities which are substituted recursively.
But except for the entities &lt; or a character entity it is not
allowed that a < character is introduced by the substitution (there is
no XML parsing for evaluating the attribute value, just entity substitutions).
</Subsection>
<Subsection Label="XMLcdata">
<Heading><C>CDATA</C></Heading>
Pieces of text which contain many characters which can be
misinterpreted as markup can be enclosed by the character sequences
<Q><C><![CDATA[<![CDATA[]]></C></Q> and <Q><C>]]></C></Q>. Everything
between these sequences is considered as content of the document and is not
further interpreted as XML text. All the rules explained so far in this
section do <E>not apply</E> to such a part of the document. The only
document content which cannot be entered directly inside a <C>CDATA</C>
statement is the sequence <Q><C>]]></C></Q>. This can be entered as
<Q><C>]]&gt;</C></Q> outside the <C>CDATA</C> statement.
<Listing Type="Example">
A nesting of tags like <![CDATA[<a> <b> </a> </b>]]> is not allowed.
</Listing>
</Subsection>
<Subsection Label="XMLenc">
<Heading>Encoding of an XML Document</Heading>
We suggest to use the UTF-8 encoding for writing &GAPDoc; XML documents.
But the tools described in Chapter <Ref Chap="ch:conv" /> also work
with ASCII or the various ISO-8859-X encodings (ISO-8859-1 is also
called latin1 and covers most special characters for western European
languages).
</Subsection>
<Subsection Label="XMLvalid">
<Heading>Well Formed and Valid XML Documents</Heading>
We want to mention two further important words which are often used in the
context of XML documents. A piece of text becomes a <Q>well formed</Q> XML
document if all the formal rules described in this section are fulfilled.
<P/>
But this says nothing about the content of the document. To give
this content a meaning one needs a declaration of the element and
corresponding attribute names as well as of named entities which are
allowed. Furthermore there may be restrictions how such elements can be
nested. This <E>definition of an XML based markup language</E> is done in a
<Q>document type definition</Q>. An XML document which contains only
elements and entities declared in such a document type definition and obeys
the rules given there is called <Q>valid (with respect to this document type
definition)</Q>.<P/>
The main file of the &GAPDoc; package is <F>gapdoc.dtd</F>. This contains
such a definition of a markup language. We are not going to explain the
formal syntax rules for document type definitions in this section. But in
Chapter <Ref Chap="DTD"/> we will explain enough about it to understand
the file <F>gapdoc.dtd</F> and so the markup language defined there.
</Subsection>
</Section>
<Section Label="EnterGD">
<Heading>Entering &GAPDoc; Documents</Heading>
Here are some additional rules for writing &GAPDoc; XML documents.
<Subsection Label="otherspecchar">
<Heading>Other special characters</Heading>
As &GAPDoc; documents are used to produce &LaTeX; and HTML
documents, the question arises how to deal with characters with a
special meaning for other applications (for example
<Q><C>&</C></Q>,
<Q><C>#</C></Q>,
<Q><C>$</C></Q>,
<Q><C>%</C></Q>,
<Q><C>~</C></Q>,
<Q><C>\</C></Q>,
<Q><C>{</C></Q>,
<Q><C>}</C></Q>,
<Q><C>_</C></Q>,
<Q><C>^</C></Q>,
<Q><C> </C></Q> (this is a non-breakable space,
<Q><C>~</C></Q> in &LaTeX;) have a special meaning for &LaTeX; and
<Q><C>&</C></Q>,
<Q><C><</C></Q>,
<Q><C>></C></Q> have a special meaning for HTML (and XML).
In &GAPDoc; you can usually just type these characters directly, it is
the task of the converter programs which translate to some output format
to take care of such special characters. The exceptions to this simple
rule are:
<List >
<Item>
& and < must be entered as <C>&amp;</C> and
<C>&lt;</C> as explained in <Ref Subsect="XMLspchar"/>.
</Item>
<Item>The content of the &GAPDoc; elements <C><M></C>,
<C><Math></C> and <C><Display></C> is &LaTeX; code,
see <Ref Sect="MathForm"/>.</Item>
<Item>The content of an <C><Alt></C> element with <C>Only</C>
attribute contains code for the specified output type, see
<Ref Subsect="Alt"/>.</Item>
</List>
Remark: In former versions of &GAPDoc; one had to use particular
entities for all the special characters mentioned above
(<C>&tamp;</C>, <C>&hash;</C>,
<C>&dollar;</C>, <C>&percent;</C>, <C>&tilde;</C>,
<C>&bslash;</C>, <C>&obrace;</C>, <C>&cbrace;</C>,
<C>&uscore;</C>, <C>&circum;</C>, <C>&tlt;</C>, <C>&tgt;</C>).
These are no longer needed, but they are still defined for backwards
compatibility with older &GAPDoc; documents.
</Subsection>
<Subsection Label="GDformulae">
<Heading>Mathematical Formulae</Heading>
Mathematical formulae in &GAPDoc; are typed as in &LaTeX;. They must be
the content of one of three types of &GAPDoc; elements concerned with
mathematical formulae: <Q><C>Math</C></Q>, <Q><C>Display</C></Q>, and
<Q><C>M</C></Q> (see Sections <Ref Subsect="Math"/> and <Ref
Subsect="M"/> for more details). The first two correspond to &LaTeX;'s
math mode and display math mode. The last one is a special form of the
<Q><C>Math</C></Q> element type, that imposes certain restrictions on
the content. On the other hand the content of an <Q><C>M</C></Q> element
is processed in a well defined way for text terminal or HTML output. The
<Q><C>Display</C></Q> element also has an attribute such that its
content is processed as in <Q><C>M</C></Q> elements.<P/>
Note that the content of these element is &LaTeX; code, but
the special characters
<Q><C><</C></Q> and <Q><C>&</C></Q> for XML must be entered via
the entities described in <Ref Subsect="XMLspchar"/> or by using a
<C>CDATA</C> statement, see <Ref Subsect="XMLcdata"/>.<P/>
</Subsection>
<Subsection Label="GDent">
<Heading>More Entities</Heading>
In &GAPDoc; there are some more predefined entities:
<Table Align="|l|l|">
<Caption>Predefined Entities in the &GAPDoc; system</Caption>
<HorLine/>
<Row> <Item><C>&GAP;</C></Item> <Item>&GAP;</Item> </Row>
<HorLine/>
<Row> <Item><C>&GAPDoc;</C></Item> <Item>&GAPDoc;</Item> </Row>
<HorLine/>
<Row> <Item><C>&TeX;</C></Item> <Item>&TeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&LaTeX;</C></Item> <Item>&LaTeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&BibTeX;</C></Item> <Item>&BibTeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&MeatAxe;</C></Item> <Item>&MeatAxe;</Item> </Row>
<HorLine/>
<Row> <Item><C>&XGAP;</C></Item> <Item>&XGAP;</Item> </Row>
<HorLine/>
<Row> <Item><C>&copyright;</C></Item> <Item>©right;</Item> </Row>
<HorLine/>
<Row> <Item><C>&nbsp;</C></Item> <Item><Q> </Q></Item> </Row>
<HorLine/>
<Row> <Item><C>&ndash;</C></Item> <Item>–</Item> </Row>
<HorLine/>
</Table>
Here <C>&nbsp;</C> is a non-breakable space character.
<P/>
Additional entities are defined for some mathematical symbols, see <Ref
Sect="MathForm"/> for more details.
<P/>
One can define further local entities right inside the head (see <Ref
Subsect="XMLhead"/>) of a &GAPDoc; XML document as in the following example.
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd"
[ <!ENTITY MyEntity "some longish <E>text</E> possibly with markup">
]>]]>
</Listing>
These additional definitions go into the <C><!DOCTYPE</C> tag in square
brackets. Such new entities are used like this: <C>&MyEntity;</C> <P/>
</Subsection>
</Section>
</Chapter>
|