<HTML><HEAD><TITLE>How the Cocoon Engine Works</TITLE><LINK href="resources/simple.css" rel="stylesheet" title="Simple Style" type="text/css"></HEAD><BODY><P class="legal">Cocoon Documentation</P><H1 class="title">How the Cocoon Engine Works</H1><DOCUMENT>
 
<BODY>
 <H1>How Cocoon 1.8 works</H1><DIV id="s1">
  <P>
   This document follows the operations of Cocoon from a
   &quot;document point of view&quot;, while the javadoc documentation describes them
   from a &quot;procedural point of view&quot;.
   It is therefore meant to complement the
   javadoc rather than simply repeat what is already stated there. Furthermore,
   since the ultimate documentation is the source code itself, this
   document does not try to go too deep, but rather to supplement the comments
   in the code. In fact, some people may find that reading the source code
   directly sheds more light than reading this (significantly incomplete)
   overview.
  </P>
  <P>
   Unless otherwise specified, for the sake of brevity any class name
   is assumed to have the <CODE>org.apache.cocoon</CODE> prefix prepended to it.
  </P>

  <H2>Cocoon</H2><DIV id="s2">
   <P>
    This is the &quot;main&quot; class, whether Cocoon is being used as a servlet
    or from the command line. It therefore contains both the <CODE>init</CODE>
    method, for the former case, and the <CODE>main</CODE> method, for the latter.
   </P>
   <P>
    The operations in the two common cases, command-line execution (typically
    used for offline site creation) and servlet usage, are described below.
   </P>
   <H3>From the Command-line</H3><DIV id="s3">
    <P>
     When <CODE>Cocoon</CODE> is invoked from the command line, it requires three
     arguments: the location of the <CODE>cocoon.properties</CODE> file, the name
     of the file containing the XML to be processed, and the name of the output
     file. After reading the properties file, it creates a new
     <CODE>EngineWrapper</CODE> initialized with those properties
     and then calls its <CODE>handle</CODE> method, passing it
     an output <CODE>Writer</CODE> but an input <CODE>File</CODE>. There is no good
     reason for this asymmetry: the command-line operation mode of Cocoon was
     coded quickly as a temporary hack to meet a popular need, in lieu of the
     better, more integrated and well-designed command-line support planned for
     Cocoon 2.
    </P>
    <H4>EngineWrapper</H4><DIV id="s4">
     <P>
      This is a &quot;hack&quot; which provides a &quot;fake&quot; implementation of
      the Servlet API methods that are needed by Cocoon, in the inner classes
      <CODE>HttpServletRequestImpl</CODE> and
      <CODE>HttpServletResponseImpl</CODE>. When Cocoon gets integrated
      with Stylebook, this class will probably need to be cleaned up.
     </P>
     <P>
      Basically, this class instantiates an <CODE>Engine</CODE> and passes
      it the &quot;fake&quot; request and response objects mentioned above.
     </P>
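     <P>
      The adapter idea can be illustrated with a minimal sketch. To keep it
      self-contained it does not use the real Servlet API:
      <CODE>SimpleRequest</CODE>, <CODE>SimpleResponse</CODE> and
      <CODE>SimpleEngine</CODE> are hypothetical stand-ins for
      <CODE>HttpServletRequest</CODE>, <CODE>HttpServletResponse</CODE> and
      <CODE>Engine</CODE>, and the method signatures are illustrative rather than
      those of the actual <CODE>EngineWrapper</CODE>.
     </P>

```java
import java.io.File;
import java.io.PrintWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the few Servlet API methods an engine might need.
interface SimpleRequest {
    String getPathTranslated();        // the file to process
    String getParameter(String name);  // HTTP parameters (none on the command line)
}

interface SimpleResponse {
    PrintWriter getWriter();
    void setContentType(String type);
}

interface SimpleEngine {
    void handle(SimpleRequest request, SimpleResponse response) throws Exception;
}

// EngineWrapper-style adapter (sketch): wraps a (Writer, File) pair into
// "fake" request/response objects and hands them to the engine.
final class EngineWrapperSketch {
    private final SimpleEngine engine;

    EngineWrapperSketch(SimpleEngine engine) {
        this.engine = engine;
    }

    void handle(Writer out, File input) throws Exception {
        Map<String, String> params = new HashMap<>();   // no HTTP parameters offline
        SimpleRequest request = new SimpleRequest() {
            @Override public String getPathTranslated() { return input.getAbsolutePath(); }
            @Override public String getParameter(String name) { return params.get(name); }
        };
        PrintWriter writer = new PrintWriter(out);
        SimpleResponse response = new SimpleResponse() {
            @Override public PrintWriter getWriter() { return writer; }
            @Override public void setContentType(String type) { /* ignored: no HTTP headers offline */ }
        };
        engine.handle(request, response);
        writer.flush();   // make sure everything reaches the output Writer
    }
}
```

     The engine never knows whether it is talking to a real servlet container or to
     this adapter, which is the whole point of the &quot;fake&quot; objects.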
    </DIV>
   </DIV>
   <H3>As a Servlet</H3><DIV id="s3">
    <H4>Startup Phase</H4><DIV id="s4">
     <P>
      As for any servlet, upon startup the <CODE>init</CODE> method is
      invoked. In Cocoon, this method tries to load the <CODE>cocoon.properties</CODE>
      file and, if that succeeds, creates an <CODE>Engine</CODE> instance.
     </P>
    </DIV>
    <H4>Production Phase</H4><DIV id="s4">
     <P>
      <CODE>Cocoon</CODE> provides a <CODE>service</CODE> method, which
      accepts all incoming requests, whatever their HTTP method. Servlet programmers may be
      accustomed to writing <CODE>doGet</CODE> or <CODE>doPost</CODE> methods to
      handle different types of requests, which is fine for simple servlets;
      however, overriding <CODE>service</CODE> is the best way to implement a fully
      generic servlet like Cocoon.
     </P>

     
    </DIV>
   </DIV>
  </DIV>
  <H2>Engine</H2><DIV id="s2">
   <P>
    <EM>This class implements the engine that does all the document processing.
    </EM>
   </P>
   <P>
    What better definition of the function of this class than the words of its
    author (Stefano Mazzocchi)? Terse as this definition is, it conveys how central
    this class is to Cocoon's operation, and anyone who wants to understand
    the &quot;big picture&quot; of how Cocoon works should read its source carefully.
   </P>
   <H3>Startup Phase</H3><DIV id="s3">
    <P>
     Whether Cocoon is run from the command line or as a servlet, on startup the
     <CODE>Engine</CODE> is instantiated by the
     <CODE>private Engine</CODE> constructor. For the sake of understanding Cocoon
     operations, it is important to know that at this point in time (and only at this
     time in the whole lifespan of the Cocoon servlet) the objects that initialize
     the various components are instantiated with the parameters contained in the
     Configuration object.
     This is why changes made to the <CODE>cocoon.properties</CODE> file have no
     effect on Cocoon until the engine is stopped and restarted.
    </P>

    <P>These objects either directly represent the components (such as
     <CODE>logger.ServletLogger</CODE>)
     or are factories that provide the correct component
     for a particular request (such as <CODE>processor.ProcessorFactory</CODE>).
     The long-winded setup code involved here reads class names from the
     <CODE>cocoon.properties</CODE> file and dynamically loads and configures
     the classes, thus allowing components to be easily &quot;swapped in and out&quot;
     without recompiling the whole of Cocoon.
    </P>

    <P class="note">
     In general, all components
     referenced here must be loadable at startup, otherwise Cocoon will refuse
     to initialize, even if the missing component(s) are not actually used in
     the web application. Still, this is exactly the same situation as with
     a more conventional Java application that does not store class names in
     configuration files.
    </P>
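    <P>
     The dynamic loading described above can be sketched with
     <CODE>java.util.Properties</CODE> and reflection. The <CODE>*.class</CODE> key
     convention and the <CODE>ComponentLoader</CODE> class are hypothetical, chosen
     only for illustration; the real <CODE>cocoon.properties</CODE> uses its own keys.
     Because every configured class is loaded eagerly, a single missing class aborts
     initialization, which is the behaviour described in the note above.
    </P>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

final class ComponentLoader {

    // Instantiates every class named by a (hypothetical) "*.class" property, once, at startup.
    // A missing or non-instantiable class aborts initialization immediately,
    // even if that component would never be used.
    static Map<String, Object> loadAll(Properties conf) throws ReflectiveOperationException {
        Map<String, Object> components = new LinkedHashMap<>();
        for (String key : conf.stringPropertyNames()) {
            if (!key.endsWith(".class")) {
                continue;
            }
            String className = conf.getProperty(key).trim();
            Class<?> type = Class.forName(className);   // ClassNotFoundException if absent
            Object instance = type.getDeclaredConstructor().newInstance();
            components.put(key.substring(0, key.length() - ".class".length()), instance);
        }
        return components;
    }

    // Parses properties text, as it would be read from a configuration file.
    static Properties parse(String text) throws IOException {
        Properties conf = new Properties();
        conf.load(new StringReader(text));
        return conf;
    }
}
```

    Swapping a component then means editing one line of the properties file and
    restarting, with no recompilation.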
   </DIV>
   <H3>Production Phase</H3><DIV id="s3">
    <P>
     The <CODE>handle</CODE> method, already mentioned above, is the focal point
     for all the runtime operations of Cocoon.
     It is invoked with two objects: the input
     <CODE>HttpServletRequest</CODE> and the output
     <CODE>HttpServletResponse</CODE> (just as in a servlet).
    </P>
    <P>Until the whole page is done, it repeats the following process up to
     10 times (the pipeline only needs to be repeated if an OutOfMemoryError
     occurs, in which case part of the cache is cleared and the
     pipeline is restarted):
    </P>
    <OL>
     <LI>Create the <CODE>Page</CODE> wrapper for caching purposes</LI>
     <LI>Get the initial document <CODE>Producer</CODE> from the
      <CODE>ProducerFactory</CODE>. The HTTP parameter &quot;producer=myproducer&quot;
      can be used to select the producer; if this parameter is not present,
      the default producer is used.</LI>
     <LI>Call the producer to generate an <CODE>org.w3c.dom.Document</CODE></LI>
     <LI>Set up the hash table <CODE>environment</CODE> to pass various parameters
      to the processor pipeline</LI>
     <LI>Process the document through each document <CODE>Processor</CODE>
      requested in the <CODE>Document</CODE>, obtaining each one from the
      <CODE>ProcessorFactory</CODE></LI>
     <LI>Get the <CODE>Formatter</CODE> requested by the <CODE>Document</CODE>
      from the <CODE>FormatterFactory</CODE></LI>
     <LI>Format the page</LI>
     <LI>Fill the <CODE>Page</CODE> bean with content</LI>
     <LI>Set the content type and the encoding</LI>
    </OL>
    <P>Finally,</P>
    <UL>
     <LI>Print the page to the response's <CODE>PrintWriter</CODE> object</LI>
     <LI>Append timing information as an XML comment, if the content type allows it</LI>
     <LI>Flush the <CODE>PrintWriter</CODE> to the client</LI>
     <LI>Cache the page (if caching is enabled)</LI>
    </UL>
    <P>
     Now, I suggest you take a deep breath and read the above steps again, since
     the algorithm is so simple and elegant that it deserves to be appreciated
     in depth and breadth.
    </P>
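    <P>
     The control flow of <CODE>handle</CODE> can be summarized in a short sketch. It is
     a hypothetical simplification, not Cocoon's code: the <CODE>Pipeline</CODE>
     interface stands for the numbered steps, the cache is a plain map, and
     &quot;clearing the cache somewhat&quot; is modelled as dropping about half of its
     entries. Which content types accept the timing comment is also an assumption.
     Only the overall shape (retrying on <CODE>OutOfMemoryError</CODE> at most 10 times,
     then printing, flushing and caching) follows the description above.
    </P>

```java
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class HandleSketch {

    static final int MAX_TRIES = 10;

    // Hypothetical stand-in for the numbered steps: produce, process, format, fill the page.
    interface Pipeline {
        Page run(String uri) throws Exception;
    }

    static final class Page {
        final String contentType;
        final String content;

        Page(String contentType, String content) {
            this.contentType = contentType;
            this.content = content;
        }
    }

    private final Map<String, Page> cache = new LinkedHashMap<>();
    private final boolean cachingEnabled;

    HandleSketch(boolean cachingEnabled) {
        this.cachingEnabled = cachingEnabled;
    }

    void handle(String uri, Pipeline pipeline, PrintWriter out) throws Exception {
        long start = System.currentTimeMillis();
        Page page = null;
        for (int attempt = 1; page == null && attempt <= MAX_TRIES; attempt++) {
            try {
                page = pipeline.run(uri);
            } catch (OutOfMemoryError e) {
                shrinkCache();   // free some memory, then restart the pipeline
            }
        }
        if (page == null) {
            throw new IllegalStateException("page not generated after " + MAX_TRIES + " attempts");
        }
        out.print(page.content);
        // Assumption: only markup content types accept a trailing XML comment.
        if (page.contentType.startsWith("text/html") || page.contentType.startsWith("text/xml")) {
            out.print("<!-- served in " + (System.currentTimeMillis() - start) + " ms -->");
        }
        out.flush();
        if (cachingEnabled) {
            cache.put(uri, page);
        }
    }

    // Drops roughly half of the cached pages, oldest first.
    private void shrinkCache() {
        int toDrop = (cache.size() + 1) / 2;
        Iterator<String> it = cache.keySet().iterator();
        while (toDrop-- > 0 && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int cacheSize() {
        return cache.size();
    }
}
```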
   </DIV>
   <P>
    At this point, the key elements are therefore the processors and the formatters,
    which operate directly on the content of the Document. It should already be
    clear that a <CODE>Document</CODE> can have more than one <CODE>Processor</CODE>
    and that these are applied sequentially, one after the other. This is how
    the &quot;chaining&quot; of <CODE>Processors</CODE> is implemented,
    in five lines of code (including debugging information).
    Again, simplicity and good coding style are assets of this implementation.
    Let us then have a look at what <CODE>Processors</CODE> and
    <CODE>Formatters</CODE> are, since these can be leveraged further and are
    likely to be extended with new components for specific needs.
   </P>
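   <P>
    The chaining itself really does fit in a few lines. In the sketch below, the
    <CODE>Processor</CODE> interface and its <CODE>Map</CODE>-based environment are
    hypothetical simplifications of Cocoon's actual interface; what matters is that
    each processor receives the <CODE>Document</CODE> returned by the previous one.
   </P>

```java
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ChainSketch {

    // Hypothetical processor contract: take a DOM tree, return a (possibly new) one.
    interface Processor {
        Document process(Document document, Map<String, Object> environment) throws Exception;
    }

    // Applies the processors sequentially; the output of one is the input of the next.
    static Document runChain(Document document, List<Processor> processors,
                             Map<String, Object> environment) throws Exception {
        for (Processor processor : processors) {
            document = processor.process(document, environment);
        }
        return document;
    }

    // Example processor: appends an element with the given name, so the order is visible.
    static Processor appender(String name) {
        return (doc, env) -> {
            Element child = doc.createElement(name);
            doc.getDocumentElement().appendChild(child);
            return doc;
        };
    }

    // Creates an empty DOM document with a single root element.
    static Document emptyDocument(String rootName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement(rootName));
        return doc;
    }
}
```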
  </DIV>

  <H2>ProducerFactory</H2><DIV id="s2">
   <P>
    For each source there must be an appropriate Producer implemented. Currently
    (version 1.8), only <CODE>ProducerFromFile</CODE> is implemented. This is because XSP
    provides the best solution (both in terms of ease of use and forward compatibility
    with Cocoon 2) for nearly all dynamic content needs, so there is usually
    no need to write a Producer explicitly.
   </P>
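   <P>
    In essence, a file-based producer parses the requested file into a DOM tree. The
    sketch below is an illustration under stated assumptions: the
    <CODE>Producer</CODE> interface is a simplified, hypothetical version, and the
    standard JAXP parser stands in for whatever parser Cocoon is configured to use.
   </P>

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class FileProducerSketch {

    // Hypothetical producer contract: turn a requested path into a DOM Document.
    interface Producer {
        Document getDocument(String path) throws Exception;
    }

    // Reads the requested file from disk and parses it into a DOM tree.
    static final class ProducerFromFile implements Producer {
        @Override
        public Document getDocument(String path) throws Exception {
            // A new factory per call keeps the sketch thread-safe
            // (DocumentBuilderFactory is not guaranteed to be).
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new File(path));
        }
    }
}
```

   Processing instructions in the file are kept in the resulting DOM, so later stages
   can still inspect them.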
  </DIV>

  <H2>ProcessorFactory</H2><DIV id="s2">
   <P>
    For each processing instruction type there must be an appropriate Processor
    implemented. Currently (version 1.8), the following ones are implemented:
   </P>
   <UL>
    <LI>Lightweight Directory Access Protocol (LDAP)</LI>
    <LI>SQL (deprecated; the SQL or ESQL taglibs are preferred)</LI>
    <LI>eXtensible Server Pages (XSP; supersedes the Dynamic Content Processor)</LI>
    <LI>Dynamic Content Processor (deprecated, use XSP instead)</LI>
    <LI>XInclude (attempts to implement a W3C draft standard, but may not always
     be up to date with it, since the draft is still evolving)</LI>
    <LI>XSLT (implements the W3C Recommendation, XSLT)</LI>
   </UL>
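   <P>
    Which processors run is decided by the document itself, through processing
    instructions of the form
    <CODE>&lt;?cocoon-process type=&quot;xslt&quot;?&gt;</CODE>. The sketch below scans
    the top level of a DOM for such instructions and resolves each type through a
    registry. The registry, the simplified <CODE>Processor</CODE> interface and the
    regular-expression parsing of the instruction data are assumptions for
    illustration, not the actual <CODE>ProcessorFactory</CODE>.
   </P>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

final class ProcessorDispatchSketch {

    // Hypothetical processor contract, simplified for the sketch.
    interface Processor {
        Document process(Document document) throws Exception;
    }

    // Crude extraction of the type="..." pseudo-attribute from the instruction data.
    private static final Pattern TYPE = Pattern.compile("type\\s*=\\s*\"([^\"]*)\"");

    // Returns the types named by top-level cocoon-process instructions, in document order.
    static List<String> requestedTypes(Document document) {
        List<String> types = new ArrayList<>();
        for (Node n = document.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.PROCESSING_INSTRUCTION_NODE) {
                continue;
            }
            ProcessingInstruction pi = (ProcessingInstruction) n;
            if ("cocoon-process".equals(pi.getTarget())) {
                Matcher m = TYPE.matcher(pi.getData());
                if (m.find()) {
                    types.add(m.group(1));
                }
            }
        }
        return types;
    }

    // Maps each requested type to a registered processor; unknown types are reported, not skipped.
    static List<Processor> resolve(Document document, Map<String, Processor> registry) {
        List<Processor> chain = new ArrayList<>();
        for (String type : requestedTypes(document)) {
            Processor processor = registry.get(type);
            if (processor == null) {
                throw new IllegalArgumentException("no processor registered for type: " + type);
            }
            chain.add(processor);
        }
        return chain;
    }
}
```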
  </DIV>

  <H2>FormatterFactory</H2><DIV id="s2">
   <P>
    For each format in which the output should be delivered
    (e.g. PDF, plain text, HTML, XML, XHTML), there must be an appropriate Formatter
    implemented. Currently (version 1.8), the following ones are distributed:
   </P>
   <UL>
    <LI>HTML</LI>
    <LI>XHTML (while the HTML formatter writes some tags without closing tags for
     compatibility with older user agents, the XHTML formatter is fully
     XML-compliant; indeed, it is just the XML formatter with a specific doctype.)
    </LI>
    <LI>Text (i.e. plain text)</LI>
    <LI>XML</LI>
    <LI>FO2PDF (transforms XSL:FO to PDF which can be read by Acrobat Viewer/Reader)
    </LI>
   </UL>
   <P>
    One can easily imagine many more formatters, such as:
   </P>
   <UL>
    <LI>FO2RTF (Microsoft Rich Text Format)</LI>
    <LI>FO2MIF (FrameMaker's Maker Interchange Format)</LI>
    <LI>Braille</LI>
   </UL>
   <P>
    In Cocoon 1.8 all of the formatters provided are in fact implemented as simple
    &quot;wrapper&quot; classes (as can easily be seen by examining the source code in the
    <CODE>formatters</CODE> directory) which merely set the parameters of the Apache
    serializers (or, in the case of FO2PDF, of Apache FOP) and then delegate the actual
    formatting to those classes. In a way, no &quot;real work&quot; actually goes on
    in the Formatter classes themselves. As you can see, Cocoon is a framework that
    tries not to reinvent the wheel too often!
   </P>
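   <P>
    The &quot;thin wrapper&quot; design can be illustrated with the standard JAXP
    identity transformer standing in for the Apache serializers; this substitution is
    made only to keep the sketch self-contained, and the <CODE>WrapperFormatter</CODE>
    class is hypothetical. Each formatter merely chooses output parameters and
    delegates all the real work to the serializer.
   </P>

```java
import java.io.Writer;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

final class FormatterSketch {

    // A formatter reduced to a set of serializer parameters.
    static final class WrapperFormatter {
        private final Properties outputParameters = new Properties();

        WrapperFormatter(String method, boolean indent) {
            outputParameters.setProperty(OutputKeys.METHOD, method);   // "xml", "html" or "text"
            outputParameters.setProperty(OutputKeys.INDENT, indent ? "yes" : "no");
        }

        // Delegates the actual formatting to the serializer
        // (assumption: the JAXP identity transformer plays that role here).
        void format(Document document, Writer out) throws TransformerException {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperties(outputParameters);
            serializer.transform(new DOMSource(document), new StreamResult(out));
        }
    }

    static final WrapperFormatter HTML = new WrapperFormatter("html", true);
    static final WrapperFormatter XML = new WrapperFormatter("xml", false);
    static final WrapperFormatter TEXT = new WrapperFormatter("text", false);
}
```

   Adding a new output format in this style is mostly a matter of choosing different
   parameters, as long as an existing serializer can produce it.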

   <P>
    If you are wondering why FO2PDF is a Formatter rather than a Processor, the
    answer is simple: conceptually it is more of a Processor (it transforms the entire
    document), except for one vital difference: it does not output XML. There is,
    admittedly, the workaround that XSP uses internally, which is to output a single XML
    element with all the content inside it as a text node; but this method would be
    rather clunky for FO2PDF and would provide no real benefit.
   </P>

   <P>
    Note that the CPU-intensive processing required for FO2PDF can be avoided by
    using newer XML-compliant graphics and document markup languages on the client
    side, such as SVG (Scalable Vector Graphics) or XSL:FO itself, which can simply be
    written out as XML. This is definitely the future for dynamic web
    publishing, since rendering the documents of dozens of concurrent users into PDF
    entirely on the server makes no sense from a performance point of view. Server-side
    rendering is worthwhile today only because current popular browsers do not support
    XSL:FO or SVG natively, but this will change in the future.
   </P>

   <P>In fact, XML-based markup languages such as VoiceXML are supported by Cocoon simply
    by returning XML, and in that case the type given in the <CODE>cocoon-format</CODE>
    processing instruction is <CODE>text/xml</CODE>! In the case of VRML, the cocoon format
    is <CODE>model/vrml</CODE>, which the <CODE>cocoon.properties</CODE>
    configuration file maps to <CODE>TextFormatter</CODE>.
   </P>
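   <P>
    The lookup from a format type to a formatter class can be sketched as a simple
    table with a default. The <CODE>formatter.&lt;type&gt;</CODE> key layout, the short
    class names and the fallback to a default type are assumptions made for
    illustration; the keys actually used in <CODE>cocoon.properties</CODE> may differ.
   </P>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class FormatterLookupSketch {

    private final Properties conf;
    private final String defaultType;

    FormatterLookupSketch(Properties conf, String defaultType) {
        this.conf = conf;
        this.defaultType = defaultType;
    }

    // Returns the formatter class name configured for a format type, using the
    // (hypothetical) "formatter.<type>" key and falling back to the default type
    // when the document requests none.
    String formatterFor(String requestedType) {
        String type = (requestedType == null) ? defaultType : requestedType;
        String className = conf.getProperty("formatter." + type);
        if (className == null) {
            throw new IllegalArgumentException("no formatter configured for " + type);
        }
        return className;
    }

    // Parses properties text, as it would be read from a configuration file.
    static Properties parse(String text) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(text));
        return p;
    }
}
```

   With such a table, serving VRML through the plain-text formatter is a one-line
   configuration entry rather than new code.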
  </DIV>
 </DIV>
</BODY>
</DOCUMENT><P class="legal">Copyright &copy; 1999-2000 The Apache Software Foundation.<BR>All rights reserved.</P></BODY></HTML>