1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330
|
<?xml version='1.0' encoding='iso-8859-1'?>
<!DOCTYPE article PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN' 'http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd'>
<article>
<articleinfo>
<title>Profiling DocBook documents</title>
<subtitle>An easy way to personalize your content for several target audiences</subtitle>
<author>
<firstname>Jirka</firstname>
<surname>Kosek</surname>
<affiliation>
<address>E-mail: <email>jirka@kosek.cz</email>
<otheraddr>Web: <ulink
url="http://www.kosek.cz">http://www.kosek.cz</ulink>
</otheraddr>
</address>
</affiliation>
</author>
<copyright>
<year>2001</year>
<holder>Jiří Kosek</holder>
</copyright>
<releaseinfo role="meta">
$Id: profiling.xml,v 1.3 2002/03/14 14:19:38 nwalsh Exp $
</releaseinfo>
</articleinfo>
<section>
<title>Introduction</title>
<para>There are many situations when we need to generate several
versions of document with slightly different content from the single
source. User guide for program with both Windows and Linux port will
differ only in several topics related to installation and
configuration. It would be futile to create and maintain two different
documents in sync. Another real world example – in addition to
standard documentation we can have guide enriched with problem
solutions from help-desk. It also may be better to store these
information in one document in order to make them absolutely
synchronized.</para>
<para>Several high-end editing tools have built in support for
profiling. User can easily add target audiences for any part of
document in a simple to use dialog box. User can select desired target
audience before printing or generation of other output formats.
Software will automatically filter out excess parts of document and
pass rest of it to rendering engine. However, if your budget is
limited you can not use commercial solutions. In the following text I
will show you simple but flexible profiling solution based on freely
available technologies.</para>
</section>
<section>
<title>$0 solution</title>
<para>In the document we mark parts targeted for particular platform
or user group. When generating final output version of document we
must do profiling i.e. personalization for particular target audience.
Only some parts of document are processed. DocBook has built in
support for marking document parts – on almost every element you
can use attributes like <sgmltag class="attribute">os</sgmltag>, <sgmltag
class="attribute">userlevel</sgmltag> and <sgmltag
class="attribute">arch</sgmltag>. We can store identifier of operating
system, user group or hardware architecture here. You can also store
profiling information into some general use attribute like <sgmltag
class="attribute">role</sgmltag>. <xref linkend="ex.doc"/>
shows how document with profiling information might
look like.</para>
<example id="ex.doc">
<title>Sample DocBook document with profiling information</title>
<programlisting><![CDATA[<?xml version='1.0' encoding='iso-8859-1'?>
<!DOCTYPE chapter PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN'
'http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd'>
<chapter>
<title>How to setup SGML catalogs</title>
<para>Many existing SGML tools are able to map public identifiers to
files on your local file system. Mapping is specified in so called
catalog file. List of catalog files to use is stored in environment
variable <envar>SGML_CATALOG_FILES</envar>.</para>
<para os="unix">On Unix systems you can set this variable by invoking
command <command>export SGML_CATALOG_FILES=/usr/lib/catalog</command>
on command line. If you want maintain value of the variable between
sessions, place this command into startup file,
e.g. <filename>.profile</filename>.</para>
<para os="win">In Windows NT/2000 you can set environment variable by
issuing command <menuchoice><guimenu>Start</guimenu>
<guisubmenu>Settings</guisubmenu> <guisubmenu>Control
Pannel</guisubmenu>
<guimenuitem>System</guimenuitem></menuchoice>. Then select
<guilabel>Advanced</guilabel> card in the dialog box and click on the
<guibutton>Environment Variables...</guibutton> button. Using the
<guibutton>New</guibutton> button you can add new environment variable
into your system.</para>
</chapter>]]></programlisting>
</example>
<para>DocBook documents are often processed by freely available DSSSL
and XSL stylesheets. Most DocBook users who want profiling starts with
creation of customization layer which filters out some parts of
document. This approach has several serious disadvantages. First, you
must create profiling customization for all output formats as they are
using different stylesheets. This mean that you must maintain same
code on several places or do some dirty tricks with importing several
stylesheets into one stylesheet.</para>
<para>Second drawback is more serious. If you override templates to
filter out documents, you can get almost correct output in a single
run of stylesheet. If you will closely look on generated output, you
will recognize that in the table of contents there are entries for
items which should be completely removed by profiling. Similar
problems are in several other places – e.g. skipped auto
generated numbers for tables, pictures and so on. To correct this one
should change all stylesheet code which generates ToC,
cross-references and so on to ignore filtered content. This is very
complicated task and will disallow you to easily upgrade to new
versions of stylesheets.</para>
<para>Thus we must use different approach. Profiling should be totally
separate step which will filter out some parts of original document
and will create new correct DocBook document. When processed with any
DocBook tool or stylesheet you will get always correct output from the
new standalone document now. Big advantage of this method is
compatibility with all DocBook tools. Filtered document is normal
DocBook document and it does not require any special processing. Of
course, there is also one disadvantage – formating is now two
stage process – first you must filter source document and in
second step you could apply normal stylesheets on result of filtering.
This may be little bit inconvenient for many users, but whole task can
be very easily automated by set of shell scripts or batch files or
whatever else. Starting from version 1.50 of XSL stylesheets you can
do profiling in one step together with normal stylesheet
processing.</para>
<figure>
<title>Profiling stream</title>
<mediaobject>
<imageobject>
<imagedata fileref="profile-chain.png" format="PNG"/>
</imageobject>
</mediaobject>
</figure>
<para>When implementing filter, you can use many different approaches
and tools. I decided to use XSLT stylesheet. Writing necessary filter
is very easy in XSLT and many users have XSLT processor already
installed. Profiling stylesheet is part of standard XSL stylesheets
distribution and can be found in file
<filename>profiling/profile.xsl</filename>.</para>
</section>
<section>
<title>Usage</title>
<para>If you want to generate Unix specific guide from our sample
document (<xref linkend="ex.doc"/>) you can do it in the following
way. (We assume, that command <command>saxon</command> is able to run
XSLT processor on your machine. You can use your preffered XSLT
processor instead.)</para>
<screen><command>saxon</command> <option>-o</option> unixsample.xml sample.xml profile.xsl "os=unix"</screen>
<para>We are processing source document
<filename>sample.xml</filename> with profiling stylesheet
<filename>profile.xsl</filename>. Result of transformation is stored
in file <filename>unixsample.xml</filename>. By setting parameter
<parameter>os</parameter> to value <literal>unix</literal>, we tell
that only general and Unix specific parts of document should be copied
to the result document. If you will look at generated result, you will
notice that this is correct DocBook document:</para>
<programlisting><![CDATA[<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE chapter
PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">
<chapter>
<title>How to setup SGML catalogs</title>
<para>Many existing SGML tools are able to map public identifiers to
files on your local file system. Mapping is specified in so called
catalog file. List of catalog files to use is stored in environment
variable <envar>SGML_CATALOG_FILES</envar>.</para>
<para os="unix">On Unix systems you can set this variable by invoking
command <command moreinfo="none">export SGML_CATALOG_FILES=/usr/lib/catalog</command>
on command line. If you want maintain value of the variable between
sessions, place this command into startup file,
e.g. <filename moreinfo="none">.profile</filename>.</para>
</chapter>]]></programlisting>
<para>It is same as the input document, only Windows specific
paragraph is missing. Same procedure can be used to get Windows
specific version of document. The result generated by profiling
stylesheet have correct document type declaration (DOCTYPE). Without
it some tools would not be able to process them further. On the result
of filtering you can run common tools – for example DSSSL or XSL
stylesheets.</para>
<para>Stylesheet support several attributes for specifying profiling
values. They are summarized in the following list.</para>
<variablelist>
<varlistentry>
<term><parameter>profile.os</parameter></term>
<listitem>
<para>This parameter is used for specifying operating system (<sgmltag
class="attribute">os</sgmltag> attribute) for which you want get
profiled version of document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>profile.userlevel</parameter></term>
<listitem>
<para>This parameter is used for specifying user level (<sgmltag
class="attribute">userlevel</sgmltag> attribute) for which you want get
profiled version of document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>profile.arch</parameter></term>
<listitem>
<para>This parameter is used for specifying hardware architecture (<sgmltag
class="attribute">arch</sgmltag> attribute) for which you want get
profiled version of document.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>profile.condition</parameter></term>
<term><parameter>profile.conformance</parameter></term>
<term><parameter>profile.revision</parameter></term>
<term><parameter>profile.revisionflag</parameter></term>
<term><parameter>profile.security</parameter></term>
<term><parameter>profile.vendor</parameter></term>
<term><parameter>profile.role</parameter></term>
<term><parameter>profile.lang</parameter></term>
<listitem>
<para>These parameters can be used to specify target profile for
corresponding attributes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>profile.attribute</parameter></term>
<listitem>
<para>Name of attribute on which profiling should be based. It can be
used if profiling information is stored in other attributes then
<sgmltag class="attribute">os</sgmltag>, <sgmltag
class="attribute">userlevel</sgmltag> and <sgmltag class="attribute">arch</sgmltag>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>profile.value</parameter></term>
<listitem>
<para>This parameter is used for specifying value for attribute
selected by <parameter>attr</parameter> parameter.</para>
<para>E.g. setting <literal>profile.attribute=os</literal> and
<literal>profile.value=unix</literal> is same as setting
<literal>os=unix</literal>.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>profile.separator</parameter></term>
<listitem>
<para>Separator for multiple target audience identifiers. Default is
<literal>;</literal>.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Current implementation is able to handle multiple profiling
targets in one attribute. In that case you must separate identifiers
by semicolon:</para>
<programlisting><![CDATA[<para os="unix;mac;win">...</para>]]></programlisting>
<para>It is possible to use different separator than semicolon by
setting <parameter>sep</parameter> parameter. There cann't be spaces
between separator and target names.</para>
<para>You can also perform profiling based on several profiling
attributes in a single step as stylesheet can handle all parameters
simultaneously. For example to get hypothetical guide for Windows
beginners, you can run profiling like this:</para>
<screen><command>saxon</command> <option>-o</option> xsample.xml sample.xml profile.xsl "profile.os=win" "profile.userlevel=beginner"</screen>
<para>As you can see above described profiling process can be used to
substitute SGML marked sections mechanism which is missing in XML.</para>
</section>
<section>
<title>Single pass profiling</title>
<para>If you are using XSL stylesheets version 1.50 and later with
EXSLT enabled XSLT processor (Saxon, xsltproc, Xalan) you can do
profiling and transformation to HTML or FO in a single step. To do this
use stylesheet with prefix <filename>profile-</filename> instead of
normal one (e.g. <filename>profile-docbook.xsl</filename>,
<filename>profile-chunk.xsl</filename> or
<filename>profile-htmlhelp.xsl</filename>). For example to get HTML
version of profiled document use:</para>
<screen><command>saxon</command> <option>-o</option> sample.html sample.xml .../html/profile-docbook.xsl "profile.os=win" "profile.userlevel=beginner"</screen>
<para>No additional processing is necessary. If you want to use
profiling with your customized stylesheets import profiling-able
stylesheet instead of normal one.</para>
</section>
<section>
<title>Conclusion</title>
<para>Profiling is necessary in many larger DocBook applications. It
can be quite easily implemented by simple XSLT stylesheet which is
presented here. This mechanism can also be used to simulate behavior
of marked sections known from SGML.</para>
</section>
</article>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-shorttag:t
End:
-->
|