File: cocoon2.html

package info (click to toggle)
cocoon 1.8-1
links: PTS
area: contrib
in suites: woody
size: 12,016 kB
ctags: 3,793
sloc: xml: 16,682; java: 8,089; sh: 174; makefile: 61
file content (302 lines) | stat: -rw-r--r-- 15,417 bytes
<HTML><HEAD><TITLE>Cocoon 2</TITLE><LINK href="resources/simple.css" rel="stylesheet" title="Simple Style" type="text/css"></HEAD><BODY><P class="legal">Cocoon Documentation</P><H1 class="title">Cocoon 2</H1><DOCUMENT>
 

 <BODY>
 <H1>A new look</H1><DIV id="s1">
  <P>The Cocoon Project will evidence its new course with a new logo that was
  designed by Cocoon's creator Stefano Mazzocchi. Here it is:</P>

  <IMG alt="The new Cocoon Logo" class="figure" src="images/cocoon2.gif">
 </DIV>

 <H1>Introduction</H1><DIV id="s1">
  <P>The Cocoon Project has gone a long way since its creation on
  January 1999. It started as a simple servlet for static XSL styling and became
  more and more powerful as new features were added. Unfortunately, design
  decisions made early in the project influenced its evolution. Today, some of
  those constraints that shaped the project were modified as XML standards have evolved and
  solidified. For this reason, those design decisions need to be reconsidered
  under this new light.</P>

  <P>While Cocoon started as a small step in the direction of a new
  web publishing idea based on better design patterns and reviewed estimations
  of management issues, the technology used was not mature enough for tools to
  emerge. Today, most web engineers consider XML as the key for an improved web
  model and web site managers see XML as a way to reduce costs and ease
  production.</P>

  <P>In an era where services rather than software will be key for
  economic success, a better and less expensive model for web publishing will
  be a winner, especially if based on open standards.</P>
 </DIV>

 <H1>Passive APIs vs. Active APIs</H1><DIV id="s1">
  <P>Web serving environments must be fast and scalable to be
  useful. Cocoon1 was born as a &quot;proof of concept&quot; rather than
  production software and had significant design restrictions, based mainly on
  the availability of freely redistributable tools. Other issues were lack of
  detailed knowledge on the APIs available as well as underestimation of the
  project success, being created as a way to learn XSL rather than a full
  publishing system capable of taking care of all XML web publishing needs.</P>

  <P>For the above reasons, Cocoon 1 was based on the DOM level 1
  API which is a <EM>passive</EM> API and was intended mainly for client side
  operation. This is mainly due to the fact that most DOM
  implementations require the document to reside in memory. While this is
  practical for small documents and thus good for the &quot;proof of
  concept&quot; stage, it is now considered a main design constraint for Cocoon
  scalability.</P>

  <P>Since the goal of Cocoon 2 is the ability to process
  simultaneously multiple 100Mb documents in JVM with a few Mbs of heap size,
  careful memory use and tuning of internal components is a key issue. To reach
  this goal, an improved API model was needed. This is now identified in the SAX
  API which is, unlike DOM, event based (so <EM>active</EM>, in the sense that its
  design is based on the <EM>inversion of control</EM> principle).</P>

  <P>The event model allows document generators to trigger events that get handled
  by the various processing stages and finally get
  serialized onto the response stream. This has a significant impact on both
  performance (effective and user perceived) and memory needs:</P>

  <DL>
    <LI><STRONG>Incremental operation</STRONG> - 
      The response is created during document production.
      Client's perceived performance is dramatically
      improved since clients can start receiving data as soon as it is created,
      not after all processing stages have been performed. In those cases where
      incremental operation is not possible (for example, element sorting),
      internal buffers store the events until the operation can be performed.
      However, even in these cases performance can be increased with the use of
      tuned memory structures.
    </LI>
    
    <LI><STRONG>Lowered memory consumption</STRONG> - 
      Since most of the
      server processing required in Cocoon is incremental, an incremental model
      allows XML production events to be transformed directly into output events
      and character written on streams, thus avoiding the need to store them in
      memory.
    </LI>
    
    <LI><STRONG>Easier scalability</STRONG> - 
      Reduced memory needs allow a greater number of
      concurrent operations to take place simultaneously, thus allowing the
      publishing system to scale as the load increases.
    </LI>
    
    <LI><STRONG>More optimizable code model</STRONG> - 
      Modern virtual machines are based on the idea of hotspots,
      code fragments that are used often and, if optimized, increase the process
      execution speed by large amounts.
      This new event model allows easier detection of hotspots since it is a
      method driven operation, rather than a memory driven one. Hot methods can
      be identified earlier and can be better optimized.
    </LI>
    
    <LI><STRONG>Reduced garbage collection</STRONG> - 
      Even the most advanced
      and lightweight DOM implementation require at least three to five times
      (and sometimes much more than this) more memory than the original document
      size. This not only reduces the scalability of the operation, but also
      impacts overall performance by increasing the amount of memory garbage that
      must be collected, tying up CPU cycles. Even if modern
      virtual machines have reduced the overhead of garbage collection,
      less garbage will always benefit performance and scalability.
    </LI>
    
  </DL>

  <P>The above points alone would be enough for the Cocoon 2
  paradigm shift, even if this event based model impacts not only the general
  architecture of the publishing system but also its internal processing
  components such as XSLT processing and PDF formatting. These components will
  require substantial work and maybe design reconsideration to be able to follow
  a pure event-based model. The Cocoon Project will work closely with the other
  component projects to be able to influence their operation in this direction.</P>
</DIV>

<H1>Reactors Reconsidered</H1><DIV id="s1">
  <P>Another design choice that should be revised is the reactor
  pattern that was introduced to allow components to be connected in more
  flexible way. In fact, by contrast to the fixed pipe model used up to Cocoon
  1.3.1, the reactor approach allows components to be dynamically connected,
  depending on reaction instructions introduced inside the documents.</P>

  <P>While this at first seemed a very advanced and highly
  appealing model, it turned out to be a very dangerous approach. The first
  concern is mainly technical: porting the reactor pattern under an event-based
  model requires limitations and tradeoffs since the generated events must be
  cached until a reaction instruction is encountered.</P>

  <P>But even if the technical difficulties could be solved, a key limitation
  remains: there is no single point of management.</P>
</DIV>

<H1>Management Considerations</H1><DIV id="s1">
  <P>The web was created to reduce information management costs by
  distributing them back on information owners. While this model is great for
  user communities (scientists, students, employees, or people in general) each
  of them managing small amount of personal information, it becomes impractical
  for highly centralized information systems where <EM>distributed management</EM>
  is simply not practical.</P>

  <P>While in the HTML web model the page format and URL names
  where the only necessary contracts between individuals to create a world wide
  web, in more structured information systems the number of contracts increases
  by a significant factor due to the need of coherence between the
  hosted information: common style, common design issues, common languages,
  server side logic integration, data validation, etc...</P>

  <P>It is only under this light that XML and its web model reveal
  their power: the HTML web model had too little in the way of contracts to be
  able to develop a structured and more coherent distributed information system,
  a reason that is mainly imposed by the lack of good and algorithmically certain
  information indexing and knowledge seeking systems. Lacks that tend to degrade
  the quality of the truly distributed web in favor of more structured web sites
  (that based their improved site structure on internal contracts).</P>

  <P>The simplification and engineering of web site management is
  considered one of the most important Cocoon 2 goals. This is done mainly by
  technologically imposing a reduced number of contracts and placing them in a
  hierarchical shape, suitable for replacing current high-structure web site
  management models.</P>

  <P>The model that Cocoon 2 adopts is the &quot;pyramid model of
  web contracts&quot; which is outlined in the picture below</P>

  <IMG alt="The Cocoon 2 Pyramid Model of Contracts" class="figure" src="images/pyramid-model.gif">

  <P>and is composed by four different working contexts (the rectangles)</P>

  <DL>
    <LI><STRONG>Management</STRONG> - 
      The people that decide what the site should
      contain, how it should behave and how it should appear
    </LI>
    
    <LI><STRONG>Content</STRONG> - 
      The people responsible for writing, owning and managing
      the site content. This context may contain several sub-contexts -
      one for each language used to express page content.
    </LI>
    
    <LI><STRONG>Logic</STRONG> - 
      The people responsible for integration with dynamic
      content generation technologies and database systems.
    </LI>
    
    <LI><STRONG>Style</STRONG> - 
      The people responsible for information
      presentation, look &amp; feel, site graphics and its maintenance.
    </LI>
    
  </DL>

  <P>and five contracts (the lines)</P>

  <UL>
    <LI>management - content</LI>
    <LI>management - logic</LI>
    <LI>management - style</LI>
    <LI>content - logic</LI>
    <LI>content - style</LI>
  </UL>

  <P>Note that there is no <EM>logic - style</EM> contract. Cocoon 2 aims to
  provide both software and guidelines to allow you to remove such a
  contract.</P>
</DIV>

<H1>Overlapping contexts and Chain Mapping</H1><DIV id="s1">
  <P>The above model can be applied only if the different contexts
  never overlap, otherwise there is no chance of having a single management
  point. For example, if the W3C-recommended method to link stylesheets to XML
  documents is used, the content and style contexts overlap and it's impossible
  to change the styling behavior of the document without changing it. The same
  is true for the processing instructions used by the Cocoon 1 reactor to drive
  the page processing: each stage specifies the next stage to determine the result,
  thus increasing management and debugging complexity. Another overlapping in
  context contracts is the need for URL-encoded parameters to drive the page output.
  These overlaps break the pyramid model and increase the management costs.</P>

  <P>In Cocoon 2, the reactor pattern will be abandoned in favor of
  a pipeline mapping technique. This is based on the fact that the number of
  different contracts is limited even for big sites and grows with a rate
  that is normally much less than its size.</P>

  <P>Also, for performance reasons, Cocoon 2 will try to compile
  everything that is possibly compilable (pages/XSP into generators, stylesheets
  into transformers, etc...) so, in this new model, the <EM>processing chain</EM>
  that generates the page contains (in a direct executable form) all the
  information/logic that handles the requested resource to generate its
  response.</P>

  <P>This means that instead of using event-driven request-time DTD interpretation
  (done in all Cocoon 1 processors), these will be either compiled into transformers
  directly (XSLT stylesheet compilation) or compiled into generators using
  logicsheets and XSP which will remove totally the need for request-time
  interpretation solutions like DCP that will be removed.</P>

  <P class="note">Some of these features are already present in latest Cocoon 1.x
   releases but the Cocoon 2 architecture will make them central to its new
   core.</P>
</DIV>

<H1>Sitemap</H1><DIV id="s1">
  <P>In Cocoon 2 terminology, a <EM>sitemap</EM> is the collection of pipeline
  matching informations that allow the Cocoon engine to associate the requested
  URI to the proper response-producing pipeline.</P>

  <P>The sitemap physically represents the central repository for web site
  administration, where the URI space and its handling is maintained.</P>

  <P>Please, refer to the Cocoon 2 CVS module for more information on this.</P>

</DIV>

<H1>Pre-compilation, Pre-generation and Caching</H1><DIV id="s1">
  <P>The cache system in Cocoon 1 will be ported with very little
  design changes since it's very flexible and was not polluted by early design
  constraints since it appeared in later versions. The issue regarding static
  file caching that, no matter what, will always be slower than direct web server
  caching, means that Cocoon 2 will be as <EM>proxy friendly</EM> as possible.</P>

  <P>To be able to put most of the static part of the job back on the web
  server (where it belongs), Cocoon 2 will greatly improve its command line
  operation, allowing the creation of <EM>site makefiles</EM> that will
  automatically scan the web site and the source documents and will provide a
  way to <EM>regenerate</EM> the static part of a web site (images and tables
  included!) based on the same XML model used in the dynamic operation version.</P>

  <P>Cocoon 2 will, in fact, be the integration between Cocoon 1 and Stylebook.</P>

  <P>It will be up to the web server administrator to use static
  regeneration capabilities on a time basis, manually or triggered by some
  particular event (e.g. database update signal) since Cocoon 2 will only provide
  servlet and command line capabilities. The nice integration is based on the
  fact that there will be no behavioral difference if the files are dynamically
  generated in Cocoon 2 via the servlet operation and cached internally or
  pre-generated and served directly by the web server, as long as URI contracts
  are kept the same by the system administrator (via URL-rewriting or aliasing)</P>

  <P>Also, it will be possible to avoid on-the-fly page and stylesheet
  compilation (which makes debugging harder) with command line pre-compilation
  hooks that will work like normal compilers from a developer's point of view.</P>
</DIV>

<H1>Development Status</H1><DIV id="s1">
  <P>Cocoon 2 development has already started, but should be considered &quot;alpha
  quality&quot; - i.e. certainly not ready for use on public websites!
  If you dare, you might take a look at it on the <EM>xml-cocoon2</EM>
  CVS branch in the Cocoon CVS module. If you are not a CVS expert, this means
  typing:</P>

  <PRE>cvs checkout -r xml-cocoon2 xml-cocoon</PRE>

  <P>For more information on CVS access, refer to the CVS docs on this web
  site.</P>
</DIV>

</BODY>
</DOCUMENT><P class="legal">Copyright &copy; 1999-2000 The Apache Software Foundation.<BR>All rights reserved.</P></BODY></HTML>