<HTML><HEAD><TITLE>How the Cocoon Engine Works</TITLE><LINK href="resources/simple.css" rel="stylesheet" title="Simple Style" type="text/css"></HEAD><BODY><P class="legal">Cocoon Documentation</P><H1 class="title">How the Cocoon Engine Works</H1><DOCUMENT>
<BODY>
<H1>How Cocoon 1.8 works</H1><DIV id="s1">
<P>
This document follows the operations of Cocoon from a
"document point of view", while the javadoc documentation describes them
from a "procedural point of view".
It is therefore meant to complement the
javadoc rather than repeat what is already stated there. Furthermore,
since the ultimate documentation is the <CODE>source code</CODE> itself, this
document tries not to go too deep, but rather to tie in with the comments
in the code. In fact, some people may find that reading the source code
directly sheds more light than just reading this (significantly incomplete)
overview.
</P>
<P>
Unless otherwise specified, for the sake of brevity any class name
is assumed to have the <CODE>org.apache.cocoon</CODE> prefix prepended to it.
</P>
<H2>Cocoon</H2><DIV id="s2">
<P>
This is the "main" class, whether Cocoon is being used as a servlet
or from the command-line. Accordingly, it contains the method <CODE>init</CODE>
for the servlet case as well as <CODE>main</CODE> for the command-line case.
</P>
<P>
The operations in the two common cases, command-line execution (typically
used for offline site creation) and servlet usage, are described below.
</P>
<H3>From the Command-line</H3><DIV id="s3">
<P>
When <CODE>Cocoon</CODE> is invoked from the command-line, it requires as
arguments the location of the <CODE>cocoon.properties</CODE>, the name
of the file containing the XML to be processed, and the name of the output
file. After reading the properties file, it creates a new
<CODE>EngineWrapper</CODE> initialized with the above mentioned properties
and then calls the <CODE>handle</CODE> method, and hands it
an output <CODE>Writer</CODE> and an input <CODE>File</CODE>. There is no good
reason for this asymmetry - the command-line operation mode of Cocoon was
coded quickly as a temporary hack to meet a popular need, in lieu of the
better, more integrated and well-designed command-line support planned for
Cocoon 2.
</P>
<H4>EngineWrapper</H4><DIV id="s4">
<P>
This is a "hack" which provides a "fake" implementation of
the Servlet API methods that are needed by Cocoon, in the inner classes
<CODE>HttpServletRequestImpl</CODE> and
<CODE>HttpServletResponseImpl</CODE>. When Cocoon gets integrated
with Stylebook, this class will probably need to be cleaned up.
</P>
<P>
Basically, this class instantiates an <CODE>Engine</CODE> class and passes
it the "fake" request and response objects mentioned above.
</P>
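<P>
The adapter idea can be sketched as follows. This is only an illustration under
stated assumptions: <CODE>RequestLike</CODE>, <CODE>ResponseLike</CODE>,
<CODE>EngineLike</CODE> and <CODE>EngineWrapperSketch</CODE> are hypothetical,
cut-down stand-ins, not the Servlet API or Cocoon's actual classes, and handing
the input file to the engine through <CODE>getPathTranslated</CODE> is an assumption.
</P>

```java
import java.io.File;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical, cut-down stand-ins for the two servlet interfaces.
interface RequestLike { String getPathTranslated(); }
interface ResponseLike { PrintWriter getWriter(); }

// Hypothetical engine contract: a request goes in, a response is written.
interface EngineLike { void handle(RequestLike request, ResponseLike response); }

class EngineWrapperSketch {
    private final EngineLike engine;

    EngineWrapperSketch(EngineLike engine) { this.engine = engine; }

    // Same asymmetric shape as described in the text: output Writer, input File.
    void handle(Writer out, File input) {
        PrintWriter writer = new PrintWriter(out);
        RequestLike request = () -> input.getPath();   // "fake" request
        ResponseLike response = () -> writer;          // "fake" response
        engine.handle(request, response);
        writer.flush();
    }
}
```

<P>
The engine never learns that no servlet container is involved: it only sees
objects with the shape of a request and a response.
</P>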
</DIV>
</DIV>
<H3>As a Servlet</H3><DIV id="s3">
<H4>Startup Phase</H4><DIV id="s4">
<P>
As for any servlet, upon startup the <CODE>init</CODE> method is
invoked. In Cocoon, this tries to load the cocoon.properties file, and, if
that is successful, creates an <CODE>Engine</CODE> instance.
</P>
</DIV>
<H4>Production Phase</H4><DIV id="s4">
<P>
A <CODE>service</CODE> method is provided by <CODE>Cocoon</CODE>, which
accepts all incoming requests, whatever their type. Servlet programmers may be
accustomed to writing <CODE>doGet</CODE> or <CODE>doPost</CODE> methods to
handle different types of requests, which is fine for simple servlets;
however, a <CODE>service</CODE> method is the best way to implement a fully
generic servlet like Cocoon.
</P>
</DIV>
</DIV>
</DIV>
<H2>Engine</H2><DIV id="s2">
<P>
<EM>This class implements the engine that does all the document processing.
</EM>
</P>
<P>
What better definition of the function of this class than the words of its
author (Stefano Mazzocchi)? Terse as that definition is, it conveys the
importance of this class to Cocoon's operation, so it is worth
reading the class through carefully in order to understand
the "big picture" of how Cocoon works.
</P>
<H3>Startup Phase</H3><DIV id="s3">
<P>
Whether Cocoon is started from the command-line or as a servlet,
the <CODE>Engine</CODE> is instantiated at startup by the
<CODE>private Engine</CODE> constructor. To understand Cocoon's
operation, it is important to know that at this point (and only at this
point in the whole lifespan of the Cocoon servlet) the objects performing the
initialization of the various components
are instantiated with the parameters contained in the Configuration object.
This is why changes applied to the cocoon.properties file
have no effect on Cocoon until the engine is stopped and
then restarted.
</P>
<P>These objects either directly represent the components (such as
<CODE>logger.ServletLogger</CODE>)
or are Factories to provide the correct components
for a particular request (such as <CODE>processor.ProcessorFactory</CODE>).
The long-winded setup code involved here reads class names from the
<CODE>cocoon.properties</CODE> file and dynamically loads and configures
the classes, thus allowing for easy "swapping in and out" of components
without recompiling the whole of Cocoon.
</P>
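<P>
A minimal sketch of this kind of property-driven loading is shown below. It is
not Cocoon's setup code: <CODE>ComponentLoaderSketch</CODE> and the property key
used in the usage note are hypothetical, and only standard JDK reflection calls are used.
</P>

```java
import java.util.Properties;

class ComponentLoaderSketch {
    // Looks up a class name under the given key and instantiates it reflectively.
    static Object load(Properties config, String key) {
        String className = config.getProperty(key);
        if (className == null) {
            throw new IllegalStateException("missing property: " + key);
        }
        try {
            // The class is resolved at run time, so swapping a component only
            // requires editing the properties file, not recompiling.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // A component that cannot be loaded aborts initialization.
            throw new IllegalStateException("cannot load " + className, e);
        }
    }
}
```

<P>
For example, with a (hypothetical) entry <CODE>demo.component = java.util.ArrayList</CODE>,
<CODE>ComponentLoaderSketch.load(config, "demo.component")</CODE> returns a new
<CODE>ArrayList</CODE>, while an entry naming a class that is not on the
classpath makes the call fail.
</P>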
<P class="note">
In general, all components
referenced here must be loadable at startup, otherwise Cocoon will refuse
to initialize - even if the missing component(s) are not actually used in
the web-application. Still, this is exactly the same situation as with
a more conventional Java application which does not store class names in
configuration files.
</P>
</DIV>
<H3>Production Phase</H3><DIV id="s3">
<P>
The <CODE>handle</CODE> method has already been mentioned
and is indeed the focal point for all the runtime operations of Cocoon.
It is invoked with two objects, one being the input
<CODE>HttpServletRequest</CODE> and one being the output
<CODE>HttpServletResponse</CODE> (just as in a servlet).
</P>
<P>Until the whole page is done, it repeats the following process up to
10 times (the pipeline only needs to be repeated if an OutOfMemoryError
occurs, in which case the cache is cleared out somewhat and the
pipeline restarted):
</P>
<OL>
<LI>Creates the <CODE>Page</CODE> wrapper for caching purposes</LI>
<LI>Gets the initial document <CODE>Producer</CODE> from the
<CODE>ProducerFactory</CODE>. The HTTP parameter "producer=myproducer"
can be used to select the producer; if this parameter is not present,
the default producer is used.</LI>
<LI>Calls the producer to generate an <CODE>org.w3c.dom.Document</CODE></LI>
<LI>Sets up the hash table <CODE>environment</CODE> to pass various parameters
to the processor pipeline</LI>
<LI>Processes the document through the document <CODE>Processor</CODE>s
(obtained from the <CODE>ProcessorFactory</CODE>),
one for each processor invoked in the <CODE>Document</CODE></LI>
<LI>Gets the <CODE>Formatter</CODE> requested by the <CODE>Document</CODE>
from the <CODE>FormatterFactory</CODE></LI>
<LI>Formats the page</LI>
<LI>Fills the <CODE>Page</CODE> bean with content</LI>
<LI>Sets the content type and the encoding</LI>
</OL>
<P>Finally,</P>
<UL>
<LI>Prints the page to the response's PrintWriter object</LI>
<LI>Appends timing information as an XML comment, if the content type allows</LI>
<LI>Flushes the PrintWriter to the client</LI>
<LI>Caches the page (if caching is enabled)</LI>
</UL>
<P>
Now, I suggest you take a deep breath and read the above steps again, since
the simplicity of the algorithm is so beautiful that it deserves to be
appreciated in both depth and breadth.
</P>
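<P>
The steps above, including the retry after an <CODE>OutOfMemoryError</CODE>, can be
sketched roughly as follows. This is a simplified model, not the
<CODE>Engine</CODE> source: plain strings stand in for the DOM
<CODE>Document</CODE>, all class and field names are hypothetical, and clearing
the whole cache is a simplification of "cleared out somewhat".
</P>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for the Page bean.
class PageSketch {
    String content;
    String contentType;
}

class EngineSketch {
    static final int MAX_ATTEMPTS = 10;                 // "up to 10 times"

    final Map<String, PageSketch> cache = new HashMap<>();
    final List<Function<String, String>> processors = new ArrayList<>();
    Supplier<String> producer;                          // produces the initial document
    Function<String, String> formatter;                 // turns the document into output
    String contentType = "text/html";
    int attempts = 0;

    PageSketch handle(String uri) {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            attempts++;
            try {
                PageSketch page = new PageSketch();     // page wrapper
                String document = producer.get();       // producer generates the document
                for (Function<String, String> p : processors) {
                    document = p.apply(document);       // processors run in sequence
                }
                page.content = formatter.apply(document);   // format and fill the page
                page.contentType = contentType;
                cache.put(uri, page);                   // cache the finished page
                return page;
            } catch (OutOfMemoryError e) {
                cache.clear();                          // free memory, then restart the pipeline
            }
        }
        throw new IllegalStateException("gave up after " + MAX_ATTEMPTS + " attempts");
    }

    // Demo wiring: the producer fails once with a simulated OutOfMemoryError.
    static EngineSketch demoEngine() {
        EngineSketch engine = new EngineSketch();
        int[] calls = {0};
        engine.producer = () -> {
            if (calls[0]++ == 0) throw new OutOfMemoryError("simulated");
            return "hello";
        };
        engine.processors.add(s -> s.toUpperCase(Locale.ROOT));
        engine.processors.add(s -> s + "!");
        engine.formatter = s -> "<html><body>" + s + "</body></html>";
        return engine;
    }
}
```

<P>
With the demo wiring, the first pass fails, the cache is emptied, and the second
pass produces the page, so <CODE>attempts</CODE> ends at 2.
</P>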
</DIV>
<P>
At this point the key elements are therefore the processors and the formatters,
which operate directly upon the content of the Document. We are going to
investigate them in detail. It should already be clear that one can have
more than one <CODE>Processor</CODE> per <CODE>Document</CODE> and that these
are applied sequentially, one after the other. This "chaining" of
<CODE>Processors</CODE> is implemented
in five lines of code (including debugging information).
Again, simplicity and good coding style are assets of this implementation.
Let us then look at what <CODE>Processors</CODE> and
<CODE>Formatters</CODE> are, since these are the components most likely to be
extended with new implementations for specific needs.
</P>
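<P>
The chaining loop might look something like the sketch below. It is not the
five lines from the <CODE>Engine</CODE> source: <CODE>SketchProcessor</CODE> is a
hypothetical, simplified interface, the processor types "upper" and "mark" are
invented for the demo, and it is an assumption here that each
<CODE>cocoon-process</CODE> processing instruction is removed once its processor has been selected.
</P>

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical, simplified processor contract.
interface SketchProcessor {
    Document process(Document doc, Map<String, Object> environment) throws Exception;
}

class ChainSketch {
    private static final Pattern TYPE = Pattern.compile("type\\s*=\\s*\"([^\"]*)\"");

    // Returns the first document-level <?cocoon-process ...?> instruction, or null.
    static ProcessingInstruction nextInstruction(Document doc) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && "cocoon-process".equals(((ProcessingInstruction) n).getTarget())) {
                return (ProcessingInstruction) n;
            }
        }
        return null;
    }

    // Applies one processor per instruction, in document order.
    static Document runChain(Document doc, Map<String, SketchProcessor> processors,
                             Map<String, Object> environment) throws Exception {
        ProcessingInstruction pi;
        while ((pi = nextInstruction(doc)) != null) {
            Matcher m = TYPE.matcher(pi.getData());
            if (!m.find()) throw new IllegalStateException("cocoon-process without a type");
            SketchProcessor processor = processors.get(m.group(1));
            if (processor == null) throw new IllegalStateException("no processor: " + m.group(1));
            pi.getParentNode().removeChild(pi);   // consume it so the loop terminates
            doc = processor.process(doc, environment);
        }
        return doc;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Runs two toy processors over a tiny document and reports the result.
    static String demo() {
        try {
            Map<String, SketchProcessor> processors = new HashMap<>();
            processors.put("upper", (d, env) -> {
                Element root = d.getDocumentElement();
                root.setTextContent(root.getTextContent().toUpperCase(Locale.ROOT));
                return d;
            });
            processors.put("mark", (d, env) -> {
                d.getDocumentElement().setAttribute("processed", "yes");
                return d;
            });
            Document doc = parse("<?cocoon-process type=\"upper\"?>"
                    + "<?cocoon-process type=\"mark\"?><page>hello</page>");
            Element result = runChain(doc, processors, new HashMap<>()).getDocumentElement();
            return result.getTextContent() + "|" + result.getAttribute("processed");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

<P>
Because each processor returns a <CODE>Document</CODE>, a processor may itself
emit further processing instructions, which the same loop would then pick up.
</P>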
</DIV>
<H2>ProducerFactory</H2><DIV id="s2">
<P>
For each source there must be an appropriate Producer implemented. Currently
(version 1.8), only ProducerFromFile is implemented. This is because XSP provides
the best solution (both in terms of ease-of-use and forward-compatibility with
Cocoon 2) for nearly all dynamic content solutions, so there is usually
no need to write a Producer explicitly.
</P>
</DIV>
<H2>ProcessorFactory</H2><DIV id="s2">
<P>
For each processing instruction type there must be an appropriate Processor
implemented. Currently (version 1.8), the following ones are implemented:
</P>
<UL>
<LI>Lightweight Directory Access Protocol (LDAP)</LI>
<LI>SQL (deprecated - SQL or ESQL taglibs are preferred)</LI>
<LI>eXtensible Server Pages (supersedes Dynamic Content Processor)</LI>
<LI>Dynamic Content Processor (deprecated, use XSP instead)</LI>
<LI>XInclude (attempts to implement a W3C draft standard, but may not always
be up to date with the standard - as it is still evolving)</LI>
<LI>XSLT (implements the W3C Recommendation, XSLT)</LI>
</UL>
</DIV>
<H2>FormatterFactory</H2><DIV id="s2">
<P>
For each format in which the output should be delivered
(e.g. PDF, TEXT, HTML, XML, XHTML), there must be an appropriate Formatter
implemented. Currently (version 1.8), the following ones are distributed:
</P>
<UL>
<LI>HTML</LI>
<LI>XHTML (while the HTML formatter writes some tags without closing tags for
compatibility with older user agents, the XHTML formatter is fully
XML-compliant - indeed, it is just the XML formatter with a specific doctype.)
</LI>
<LI>Text (i.e. plain text)</LI>
<LI>XML</LI>
<LI>FO2PDF (transforms XSL:FO to PDF which can be read by Acrobat Viewer/Reader)
</LI>
</UL>
<P>
Clearly, one might imagine many more formatters such as
</P>
<UL>
<LI>FO2RTF Microsoft Rich Text Format</LI>
<LI>FO2MIF FrameMaker Interchange Format</LI>
<LI>BRAILLE</LI>
</UL>
<P>
In Cocoon 1.8 all of the formatters provided are in fact implemented as simple
"wrapper" classes (as can be easily seen by examining the source code in the
<CODE>formatters</CODE> directory) which merely set the parameters to the Apache
Serializers, or in the case of FO2PDF, Apache FOP, and then delegate the actual
formatting to those classes. In a way, no "real work" actually goes on
in the Formatter classes themselves. As you can see, Cocoon is a framework which
tries not to reinvent the wheel too often!
</P>
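<P>
The "thin wrapper" pattern can be illustrated as below. Note the substitution:
this sketch delegates to the JDK's built-in identity <CODE>Transformer</CODE>
rather than to the Apache Serializers or FOP, and
<CODE>XmlFormatterSketch</CODE> is a hypothetical class, not one of Cocoon's formatters.
</P>

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical wrapper: it only sets parameters and delegates the real work.
class XmlFormatterSketch {
    private final String method;      // "xml", "html" or "text"
    private final String mediaType;   // what the response would be labelled as

    XmlFormatterSketch(String method, String mediaType) {
        this.method = method;
        this.mediaType = mediaType;
    }

    String getMediaType() { return mediaType; }

    String format(Document doc) {
        try {
            // The serializer does the formatting; the wrapper just configures it.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, method);
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("formatting failed", e);
        }
    }

    // Helper for the usage example: parses a string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }
}
```

<P>
An "XML formatter" and a "text formatter" then differ only in their constructor
arguments, for example <CODE>new XmlFormatterSketch("xml", "text/xml")</CODE>
versus <CODE>new XmlFormatterSketch("text", "text/plain")</CODE>.
</P>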
<P>
If you're wondering why FO2PDF isn't a Processor instead of a Formatter, the
answer is simple - it is conceptually more of a Processor (it transforms the entire
document), but for one vital difference - it does not output XML. Yes, there is
the workaround that XSP uses internally, which is to output one XML element with
all the content inside that as a text node - but this method would be rather clunky
for FO2PDF and would provide no real benefit.
</P>
<P>
Note that the CPU-intensive processing required for FO2PDF can be avoided by
using newer XML-compliant graphics and document markup languages on the client
side, such as SVG (Scalable Vector Graphics), or XSL:FO itself, which can simply be
written out as XML. This is definitely the future for dynamic web
publishing, since rendering dozens of concurrent users' documents into PDF
on the server makes no sense from a performance point of view. Server-side rendering is
advantageous today, of course, because current popular browsers do not support XSL:FO
or SVG natively, but in the future this will change.
</P>
<P>In fact, XML markup languages like VoiceXML are supported by Cocoon simply by returning XML;
in that case the parameter to cocoon-format is <CODE>text/xml</CODE>. In the
case of VRML, the cocoon-format type is <CODE>model/vrml</CODE>, which the
<CODE>cocoon.properties</CODE>
configuration file maps to <CODE>TextFormatter</CODE>.
</P>
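<P>
The lookup from a cocoon-format type to a formatter class might be modelled like
this. The <CODE>formatter.type.</CODE> key prefix and the fully qualified class
names in the usage note are assumptions made for illustration; check them
against your own <CODE>cocoon.properties</CODE>. <CODE>FormatterLookupSketch</CODE> is a hypothetical helper.
</P>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class FormatterLookupSketch {
    // Assumed key prefix for type-to-formatter entries.
    static final String PREFIX = "formatter.type.";

    // Parses properties-file text, as it might appear in a configuration file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Returns the class name configured for a cocoon-format type, or null if unmapped.
    static String formatterFor(Properties props, String type) {
        return props.getProperty(PREFIX + type);
    }
}
```

<P>
Given the (assumed) entry
<CODE>formatter.type.model/vrml = org.apache.cocoon.formatter.TextFormatter</CODE>,
<CODE>formatterFor(props, "model/vrml")</CODE> returns that class name, which
could then be loaded reflectively like any other component.
</P>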
</DIV>
</DIV>
</BODY>
</DOCUMENT><P class="legal">Copyright © 1999-2000 The Apache Software Foundation.<BR>All rights reserved.</P></BODY></HTML>