<Chapter Label="HowEnter">
<Heading>How To Type a &GAPDoc; Document</Heading>

In this chapter we give a more formal description of what you need to know to
start typing documentation in &GAPDoc; XML format. Many details were already
explained  by  example  in Section&nbsp;<Ref  Sect="sec:3k+1expl"/>  of  the
introduction.<P/>

We  do <E>not</E>  answer the  question  <Q>How to  <E>write</E> a  &GAPDoc;
document?</Q>  in  this chapter.  You  can  (hopefully)  find an  answer  to
this question  by studying  the example  in the  introduction, see&nbsp;<Ref
Sect="sec:3k+1expl"/>,  and learning  about  more details  in the  reference
Chapter&nbsp;<Ref Chap="DTD" />.<P/>

The definitive source for all details of the official XML standard with useful
annotations is:<P/>

<URL>https://www.xml.com/axml/axml.html</URL><P/>

Although this document is necessarily quite technical, it is surprisingly
readable.<P/>

<Section Label="EnterXML">
<Heading>General XML Syntax</Heading>

We will  now discuss the  pieces of  text which can  occur in a  general XML
document. We start  with those pieces which do not  contribute to the actual
content of the document.

<Subsection Label="XMLhead">
<Heading>Head of XML Document</Heading>

Each XML document should have a head which states that it is an XML document
in  some encoding  and which  XML-defined  language is  used. In  case of  a
&GAPDoc; document this should always look as in the following example.

<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd">]]>
</Listing>

See&nbsp;<Ref  Subsect="XMLenc"/>  for  a   remark  on  the  <Q>encoding</Q>
statement.<P/>

(There may be local entity  definitions inside the <C>DOCTYPE</C> statement,
see Subsection&nbsp;<Ref Subsect="GDent" /> below.)
</Subsection>

<Subsection Label="XMLcomment">
<Heading>Comments</Heading>

A   <Q>comment</Q>    in   XML   starts   with    the   character   sequence
<Q><C>&lt;!--</C></Q> and ends with the sequence <Q><C>--></C></Q>. Between
these sequences there must not be two adjacent dashes <Q><C>--</C></Q>.
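
<P/>
As a small illustration (the comment text is made up for this example), a
comment may span several lines, and markup inside it is not interpreted:

<Listing Type="Example">
<![CDATA[<!-- This is a comment.
     It may span several lines, and markup such as <E>this</E>
     inside it is ignored. -->]]>
</Listing>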

</Subsection>

<Subsection Label="XMLprocinstr">
<Heading>Processing Instructions</Heading>

A <Q>processing  instruction</Q> in XML  starts with the  character sequence
<Q><C>&lt;?</C></Q> followed  by a name (<Q><C>xml</C></Q>  is only allowed
at the very beginning of the document to declare that it is an XML document,
see <Ref Subsect="XMLhead"/>). After that  any characters may follow, except
that  the ending  sequence <Q><C>?></C></Q>  must not  occur within  the
processing instruction.
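
<P/>
For illustration, a processing instruction might look as follows. The name
<C>mytool</C> and the text after it are hypothetical and not defined by
&GAPDoc;.

<Listing Type="Example">
<![CDATA[<?mytool any text except the closing sequence?>]]>
</Listing>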

</Subsection>

&nbsp;<P/>
And now  we turn  to those  parts of  the document  which contribute  to its
actual content.

<Subsection Label="XMLnames">
<Heading>Names in XML and Whitespace</Heading>

A <Q>name</Q> in XML (used for element and attribute identifiers, see below)
must  start with  a letter  (in  the encoding  of  the document)  or with  a
colon <Q><C>:</C></Q> or underscore <Q><C>_</C></Q> character. Subsequent
characters may also be digits, dots <Q><C>.</C></Q> or dashes
<Q><C>-</C></Q>.<P/>

This is  a simplified description  of the rules  in the standard,  which are
concerned  with lots  of  Unicode  ranges to  specify  what a  <Q>letter</Q>
is.<P/>

Sequences consisting only of the following characters are considered
<E>whitespace</E>:  blanks, tabs,  carriage return  characters and  new line
characters.
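
<P/>
A few examples under the simplified rule for names given above (all names
are made up for illustration):

<Listing Type="Example">
<![CDATA[Allowed:      Chapter   _note   sec.intro   my-element2
Not allowed:  2ndPart   -note   .intro]]>
</Listing>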

</Subsection>

<Subsection Label="XMLel">
<Heading>Elements</Heading>

The  actual  content  of  an   XML  document  consists  of  <Q>elements</Q>.
An  element  has  some  <Q>content</Q>   with  a  leading  <Q>start  tag</Q>
(<Ref  Subsect="XMLstarttag"/>)   and  a   trailing  <Q>end   tag</Q>  (<Ref
Subsect="XMLendtag"/>). The  content can  contain further elements  but they
must be properly nested. One can define elements whose content is always
empty; those elements can also be entered with a single combined tag (<Ref
Subsect="XMLcombtag"/>). 
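
<P/>
A sketch of properly nested elements, using element names that occur in
this chapter (the label <C>sec:demo</C> and the text content are invented):
each inner element is closed before the element enclosing it is closed.

<Listing Type="Example">
<![CDATA[<Section Label="sec:demo">
<Heading>A Heading</Heading>
Some text with an <E>emphasized</E> word.<P/>
</Section>]]>
</Listing>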
</Subsection>

<Subsection Label="XMLstarttag">
<Heading>Start Tags</Heading>

A  <Q>start tag</Q> consists  of  a less-than-character  <Q><C>&lt;</C></Q>
directly  followed (without  whitespace) by  an element  name (see&nbsp;<Ref
Subsect="XMLnames"/>),  optional  attributes,  optional  whitespace,  and  a
greater-than-character <Q><C>></C></Q>.<P/>

An  <Q>attribute</Q>  consists   of  some  whitespace  and   then  its  name
followed by  an equal sign  <Q><C>=</C></Q> which is optionally  enclosed by
whitespace,  and the  attribute value,  which is  enclosed either  in single
or double quotes. The attribute value must not contain the type of
quote used as a delimiter or the character <Q><C>&lt;</C></Q>. The
character <Q><C>&amp;</C></Q> may only appear as the start of an entity,
see&nbsp;<Ref Subsect="XMLent"/>. We describe 
in&nbsp;<Ref Subsect="AttrValRules"/>  how 
to enter special characters in attribute values.<P/>

Note  especially  that  no  whitespace   is  allowed  between  the  starting
<Q><C>&lt;</C></Q> character  and the  element name.  The quotes  around an
attribute value cannot be omitted. The  names of elements and attributes are
<E>case sensitive</E>.
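
<P/>
The following lines sketch the rules for start tags and attributes. The
label <C>sec:demo</C> is a made-up value, and the remarks on the right are
not part of the tags.

<Listing Type="Example">
<![CDATA[<Section Label="sec:demo">      double quotes around the value
<Section Label='sec:demo'>      single quotes work as well
<Section Label = "sec:demo" >   whitespace around = and before >

< Section Label="sec:demo">     wrong: whitespace after <
<Section Label=sec:demo>        wrong: quotes are missing
<section Label="sec:demo">      another name: names are case sensitive]]>
</Listing>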
</Subsection>

<Subsection Label="XMLendtag">
<Heading>End Tags</Heading>

An  <Q>end  tag</Q>  consists  of the  two  characters  <Q><C>&lt;/</C></Q>
directly  followed   by  the  element   name,  optional  whitespace   and  a
greater-than-character <Q><C>></C></Q>.
</Subsection>

<Subsection Label="XMLcombtag">
<Heading>Combined Tags for Empty Elements</Heading>

Elements  which always  have  empty content  can be  written  with a  single
tag.  This looks  like a  start tag  (see&nbsp;<Ref Subsect="XMLstarttag"/>)
<E>except</E> that  the trailing  greater-than-character <Q><C>></C></Q>
is replaced by the two-character sequence <Q><C>/></C></Q>.
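
<P/>
For example, the paragraph break used throughout this chapter is an empty
element. In XML the two spellings below are equivalent; the combined tag
is the shorter one.

<Listing Type="Example">
<![CDATA[<P/>
<P></P>]]>
</Listing>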

</Subsection>

<Subsection Label="XMLent">
<Heading>Entities</Heading>

An <Q>entity</Q> in XML is a macro for some substitution text. There are two
types of entities. <P/>

A <Q>character entity</Q> can be used  to specify characters in the encoding
of the document (this can be useful for entering non-ASCII characters which you
cannot  manage to  type  in  directly). They  are  entered  with a  sequence
<Q><C>&amp;#</C></Q>, directly followed by  either some decimal digits
or an  <Q><C>x</C></Q> and some  hexadecimal digits, directly followed  by a
semicolon <Q><C>;</C></Q>. Using such a  character entity is just equivalent
to typing the corresponding character directly.<P/>

Then there are references to <Q>named entities</Q>. They are entered with an
ampersand character  <Q><C>&amp;</C></Q> directly followed by  a name which
is directly followed  by a semicolon <Q><C>;</C></Q>. Such  entities must be
declared somewhere by  giving a substitution text. This text  is included in
the document  and the document is  parsed again afterwards. The  exact rules
are a  bit subtle but you  probably want to  use this only in  simple cases.
Predefined entities for &GAPDoc; are described in <Ref Subsect="XMLspchar"/>
and <Ref Subsect="GDent"/>.<P/>
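
As an illustration, the German letter a-umlaut has the code 228 (hexadecimal
E4), so each of the first two lines below is equivalent to typing this
letter directly. The third line is a reference to a named entity that is
predefined in &GAPDoc;.

<Listing Type="Example">
<![CDATA[&#228;      decimal character entity
&#xE4;      the same character in hexadecimal notation
&GAPDoc;    reference to a named entity]]>
</Listing>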

</Subsection>

<Subsection Label="XMLspchar">
<Heading>Special Characters in XML</Heading>

We  have  seen  that  the  less-than-character  <Q><C>&lt;</C></Q>  and  the
ampersand character <Q><C>&amp;</C></Q>  start a tag or  entity reference in
XML.  To  get  these characters  into  the  document  text  one has  to  use
entity references,  namely <Q><C>&amp;lt;</C></Q> to  get <Q><C>&lt;</C></Q>
and   <Q><C>&amp;amp;</C></Q>   to  get   <Q><C>&amp;</C></Q>.   Furthermore
<Q><C>&amp;gt;</C></Q> must be  used to get <Q><C>></C></Q>  when the string
<Q><C>]]&gt;</C></Q> appears in  element content (and not as  delimiter of a
<C>CDATA</C> section explained below).<P/>

Another   possibility  is   to  use   a  <C>CDATA</C>   statement  explained
in&nbsp;<Ref Subsect="XMLcdata"/>.
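
<P/>
As an illustration (the text is invented), to get 
<Q><C>a &lt; b &amp;&amp; c > d</C></Q> into the document one would type:

<Listing Type="Example">
<![CDATA[a &lt; b &amp;&amp; c > d]]>
</Listing>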

</Subsection>

<Subsection Label="AttrValRules">
<Heading>Rules for Attribute Values</Heading>

Attribute values can contain entities, which are substituted recursively.
However, a <C>&lt;</C> character must not be introduced by the substitution,
except via the entity <C>&amp;lt;</C> or a character entity (the attribute
value is not parsed as XML; only entity substitution takes place).
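
<P/>
A sketch with a hypothetical element <C>elt</C> and attribute <C>attr</C>
(neither is part of &GAPDoc;): the first two lines both yield the attribute
value <Q><C>a &lt; b</C></Q>, while the third is not well formed.

<Listing Type="Example">
<![CDATA[<elt attr="a &lt; b"/>
<elt attr="a &#60; b"/>
<elt attr="a < b"/>          wrong: literal < in an attribute value]]>
</Listing>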
</Subsection>

<Subsection Label="XMLcdata">
<Heading><C>CDATA</C></Heading>

Pieces   of   text   which   contain    many   characters   which   can   be
misinterpreted  as  markup  can  be  enclosed  by  the  character  sequences
<Q><C><![CDATA[<![CDATA[]]></C></Q>  and  <Q><C>]]&gt;</C></Q>.  Everything
between these sequences is considered as  content of the document and is not
further interpreted  as XML  text. All  the rules explained  so far  in this
section  do <E>not  apply</E>  to such  a  part of  the  document. The  only
document  content which  cannot be  entered directly  inside a  <C>CDATA</C>
statement  is the  sequence <Q><C>]]&gt;</C></Q>.  This can  be entered  as
<Q><C>]]&amp;gt;</C></Q> outside the <C>CDATA</C> statement.

<Listing Type="Example">
A nesting of tags like <![CDATA[<a> <b> </a> </b>]]> is not allowed.
</Listing>
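
<P/>
To get the text <Q><C>x]]&gt;y</C></Q> into a document whose other parts
are written as <C>CDATA</C>, one possible way (a sketch with invented
content) is to close the <C>CDATA</C> statement, enter the critical
sequence with an entity, and then open a new <C>CDATA</C> statement:

<Listing Type="Example">
&lt;![CDATA[x]]&gt;]]&amp;gt;&lt;![CDATA[y]]&gt;
</Listing>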

</Subsection> 

<Subsection Label="XMLenc">
<Heading>Encoding of an XML Document</Heading>

We suggest using the UTF-8 encoding for writing &GAPDoc; XML documents.
But  the tools  described in  Chapter <Ref  Chap="ch:conv" />  also work
with  ASCII or  the  various ISO-8859-X  encodings  (ISO-8859-1 is  also
called latin1  and covers most  special characters for  western European
languages).
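
<P/>
For a document stored in latin1, the head described 
in&nbsp;<Ref Subsect="XMLhead"/> would presumably differ only in the
encoding statement, as in this sketch:

<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd">]]>
</Listing>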

</Subsection>

<Subsection Label="XMLvalid">
<Heading>Well Formed and Valid XML Documents</Heading>

We want to mention  two further important words which are  often used in the
context of XML  documents. A piece of text becomes  a <Q>well formed</Q> XML
document if  all the formal rules  described in this section  are fulfilled.
<P/>

But  this  says  nothing  about  the   content  of  the  document.  To  give
this  content  a  meaning  one  needs  a  declaration  of  the  element  and
corresponding  attribute  names as  well  as  of  named entities  which  are
allowed. Furthermore there may be restrictions on how such elements can be
nested. This <E>definition of an XML  based markup language</E> is done in a
<Q>document  type  definition</Q>.  An  XML  document  which  contains  only
elements and entities declared in such  a document type definition and obeys
the rules given there is called <Q>valid (with respect to this document type
definition)</Q>.<P/>
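
As a sketch of the difference: the fragment below obeys all syntax rules of
this section, so it could be part of a well formed document. Assuming that
the document type definition declares no element named
<C>NoSuchElement</C> (a name invented for this example), a document
containing it would not be valid.

<Listing Type="Example">
<![CDATA[<Section>
<Heading>A Heading</Heading>
<NoSuchElement/>
</Section>]]>
</Listing>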

The main  file of the  &GAPDoc; package is <F>gapdoc.dtd</F>.  This contains
such a  definition of  a markup language.  We are not  going to  explain the
formal syntax  rules for document type  definitions in this section.  But in
Chapter&nbsp;<Ref Chap="DTD"/> we will explain enough about it to understand
the file <F>gapdoc.dtd</F> and so the markup language defined there.

</Subsection>
</Section>

<Section Label="EnterGD">
<Heading>Entering &GAPDoc; Documents</Heading>

Here are some additional rules for writing &GAPDoc; XML documents.

<Subsection Label="otherspecchar">
<Heading>Other special characters</Heading>
As &GAPDoc; documents are used to produce  &LaTeX; and HTML
documents, the question arises how to deal with characters with a
special meaning for other applications. For example, 
<Q><C>&amp;</C></Q>,
<Q><C>#</C></Q>,
<Q><C>$</C></Q>,
<Q><C>%</C></Q>,
<Q><C>~</C></Q>,
<Q><C>\</C></Q>,
<Q><C>{</C></Q>,
<Q><C>}</C></Q>,
<Q><C>_</C></Q>,
<Q><C>^</C></Q>,
<Q><C>&nbsp;</C></Q> (this is a non-breakable space, 
<Q><C>~</C></Q> in &LaTeX;) have a special meaning for &LaTeX; and
<Q><C>&amp;</C></Q>,
<Q><C>&lt;</C></Q>,
<Q><C>></C></Q> have a special meaning for HTML (and XML). 
In &GAPDoc; you can usually just type these characters directly; it is
the task of the converter programs which translate to some output format
to take care of such special characters. The exceptions to this simple
rule are: 
<List >
<Item>
&amp; and &lt; must be entered as <C>&amp;amp;</C> and 
<C>&amp;lt;</C> as explained in <Ref Subsect="XMLspchar"/>. 
</Item>
<Item>The content of the &GAPDoc; elements <C>&lt;M></C>, 
<C>&lt;Math></C> and <C>&lt;Display></C> is &LaTeX; code,
see <Ref  Sect="MathForm"/>.</Item>
<Item>The content of an <C>&lt;Alt></C> element with <C>Only</C>
attribute contains code for the specified output type, see 
<Ref Subsect="Alt"/>.</Item>
</List>
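
For example, the following sentence (invented for illustration) can be
typed as shown: only <C>&amp;</C> and <C>&lt;</C> are entered via entities,
all other special characters are typed directly.

<Listing Type="Example">
<![CDATA[Use 50% of file_1 &amp; pay $3 if a &lt; b.]]>
</Listing>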

Remark: In former versions of &GAPDoc; one had to use particular
entities for all the special characters mentioned above 
(<C>&amp;tamp;</C>, <C>&amp;hash;</C>, 
<C>&amp;dollar;</C>, <C>&amp;percent;</C>, <C>&amp;tilde;</C>, 
<C>&amp;bslash;</C>, <C>&amp;obrace;</C>, <C>&amp;cbrace;</C>, 
<C>&amp;uscore;</C>, <C>&amp;circum;</C>, <C>&amp;tlt;</C>, <C>&amp;tgt;</C>).
These are no longer needed, but they are still defined for backwards
compatibility with older &GAPDoc; documents.

</Subsection>

<Subsection Label="GDformulae">
<Heading>Mathematical Formulae</Heading>

Mathematical formulae in &GAPDoc; are typed  as in &LaTeX;. They must be
the content  of one of three  types of &GAPDoc; elements  concerned with
mathematical  formulae:  <Q><C>Math</C></Q>, <Q><C>Display</C></Q>,  and
<Q><C>M</C></Q>  (see Sections&nbsp;<Ref  Subsect="Math"/> and&nbsp;<Ref
Subsect="M"/> for more  details). The first two  correspond to &LaTeX;'s
math mode and display  math mode. The last one is a  special form of the
<Q><C>Math</C></Q> element  type, that  imposes certain  restrictions on
the content. On the other hand the content of an <Q><C>M</C></Q> element
is processed in a well defined way for text terminal or HTML output. The
<Q><C>Display</C></Q>  element  also  has  an attribute  such  that  its
content is processed as in <Q><C>M</C></Q> elements.<P/>

Note that the content of these elements is &LaTeX; code, but  
the special  characters
<Q><C>&lt;</C></Q>  and <Q><C>&amp;</C></Q>  for XML  must be  entered via
the  entities described  in&nbsp;<Ref  Subsect="XMLspchar"/> or  by using  a
<C>CDATA</C> statement, see&nbsp;<Ref Subsect="XMLcdata"/>.<P/>
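
A small sketch of both possibilities (the formula is invented): the first
line uses an entity, the second a <C>CDATA</C> statement.

<Listing Type="Example">
&lt;M>a &amp;lt; b&lt;/M>
&lt;Display>&lt;![CDATA[a &lt; b]]&gt;&lt;/Display>
</Listing>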

</Subsection>

<Subsection Label="GDent">
<Heading>More Entities</Heading>

In &GAPDoc; there are some more predefined  entities:

<Table Align="|l|l|">
<Caption>Predefined Entities in the &GAPDoc; system</Caption>
<HorLine/>
<Row> <Item><C>&amp;GAP;</C></Item>       <Item>&GAP;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;GAPDoc;</C></Item>    <Item>&GAPDoc;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;TeX;</C></Item>       <Item>&TeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;LaTeX;</C></Item>     <Item>&LaTeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;BibTeX;</C></Item>    <Item>&BibTeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;MeatAxe;</C></Item>   <Item>&MeatAxe;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;XGAP;</C></Item>      <Item>&XGAP;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;copyright;</C></Item> <Item>&copyright;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;nbsp;</C></Item> <Item><Q>&nbsp;</Q></Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;ndash;</C></Item> <Item>&ndash;</Item> </Row>
<HorLine/>
</Table>

Here <C>&amp;nbsp;</C> is a non-breakable space character.
<P/>

Additional entities are defined for some mathematical symbols, see <Ref
Sect="MathForm"/> for more details.
<P/>
One can define  further  local entities right inside  the head (see&nbsp;<Ref
Subsect="XMLhead"/>) of a &GAPDoc; XML document as in the following example.

<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE Book SYSTEM "gapdoc.dtd"
  [ <!ENTITY MyEntity "some longish <E>text</E> possibly with markup">
  ]>]]>
</Listing>

These additional definitions go into  the <C>&lt;!DOCTYPE</C> tag in square
brackets. Such new entities are used like this: <C>&amp;MyEntity;</C> <P/>

</Subsection>

</Section>
</Chapter>