1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97
|
<!-- vim:tabstop=4:shiftwidth=4:noexpandtab:textwidth=80
-->
<sect1 id="kernel_part_cpage">
<title>CPage</title>
<para>
Pdf dictionaries referenced from <xref linkend="page_tree"/> are
called page objects. These dictionaries must contain a "Type" entry
and the value must be a name object equal to "Page". Page objects
are basic building blocks of pdf files. Each page can be independent
with all required information stored in its dictionary. Some of its
properties can be inherited from parent pages. Page object describes
the appereance of the page (witdth, length, fonts used, rotation etc.).
</para>
<para>
The core of a page is one or more content streams which specify what
is on a page (text, pictures, graphics ...).
</para>
<sect2 id="observing_cpage">
<title>Observing cpage</title>
Generally, changing an object can result in many other changes.
Objects often depend on other objects. Changing a page property
can result in
<itemizedlist mark="opencircle">
<listitem><para>redraw of the page</para></listitem>
<listitem><para>redraw of other pages</para></listitem>
</itemizedlist>
This is the reason why cpage implements observer subject interface (see
<xref linkend="general_utils_observers"/>.
An object can be notified if cpage changes.
</sect2>
<sect2 id="changing_cpage">
<title>Changing cpage</title>
Page dictionary can be changed in two ways. Either by cpage methods or
by requesting raw dictionary by its reference number. If we do not
want to parse the whole pdf file, we do not have the information
whether an object is a page or just an object with page type. This
problem is solved by <xref linkend="observer" />. We observe the
underlying dictionary which implements the observer subject interface.
This way, we know every time the dictionary is changed either by cpage
or by cobject.
</sect2>
<sect2 id="changing_page_contents">
<title>Changing page contents</title>
Any change to a visible part of a page are delegated to content stream
object (<xref linkend="kernel_part_ccontentstream"/>).
</sect2>
<sect2 id="displaying_page">
<title>Displaying page</title>
Xpdf has the best displaying engine of all tested viewers. All calls
to display methods are delegated to this engine. When displaying a page
CPage creates xpdf Page object from its actual state. Then it uses Page
object method <emphasis>displaySlice()</emphasis> to display a
rectangle of a page. Xpdf creates the graphical environment for drawing
into an output device. Finally it draws the page into supplied device.
</sect2>
<sect2 id="objects_on_page">
<title>Objects on a page</title>
<para>
Everything on a page is in its content stream(s).
Every operation in a content stream means altering the actual position
by moving the drawing pen from one position to another position creating
a rectangle which can be used to constrain each object. The rectangle can
be used to order all objects into a structure which can be easily
searched. This enables effective selection of only some objects.
</para>
<sect3 id="text_on_page">
<title>Text on a page</title>
<para>
Pdf specification does not force pdf converters to keep text
structure of converted document. This means that no text element
needs to correspond to an element in the original file e.g.
paragraphs, lines, words. Even the order of single letters can
be arbitrary. Xpdf (or any other sophisticated) viewer only
guesses which letter form a word or which words form a line.
We use xpdf text engine to extract, search text.
</para>
</sect3>
<sect3 id="fonts_on_page">
<title>Fonts on a page</title>
<para>
Pdf file can refer either to extern system fonts which are
specified by pdf specification and each viewer should have
these fonts avaliable or it can inline font metrics into
the pdf file. The latter option is very tricky because it
allows the font to contain only those letters which are
actually used. CPage object supports adding fonts which
can be used on a page. Fonts must be present in the pdf
file or they must refer to system fonts.
</para>
</sect3>
</sect2>
</sect1>
|