1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548
|
Road Map for Future Development of PDF::Builder 01 Apr 2023
In order to encourage others to contribute code and/or algorithms to the
effort, I am publishing this road map of where I would like the product to go.
Please, no copyrighted code or patented algorithms, unless the owner releases
them under an Open Source license! The content of this road map is open to
discussion, too, on the GitHub bugs list (feature requests with the
"enhancement" label or "general discussion" label). If you have a one-off
suggestion, there is a contact page on my site, so you don't have to sign up
for GitHub to be heard.
If there are no contributions on something, I reserve the right to write my
own modules (dependent on PDF::Builder) and sell them.
I make no promises that any of the following items will be implemented; it
depends on how much free time I can come up with, and how many people chip in
to help with code and algorithms. I'll be happy to discuss coding specific
requirements for money/donations (but the result is still free software).
The assignment to section I or II is somewhat arbitrary, and an item could move
from one to the other. Some of these items are already listed in bug reports,
or as feature requests. There is no particular order to these items (i.e.,
they are not ranked by priority).
=============================================================================
I. Items to add to the core product
These are things that should be in the base PDF::Builder product, as everyone
will need them (or, it would be cleaner to have it in the base rather than as
an add-on separate module).
A. ## DONE ## release 3.018
Proper TTF/OTF support (RT 113700), especially ligature replacement and
complex script alphabets using GSUB and GPOS information. I've been looking
at Pango and directly using HarfBuzz, but both look to be a lot of work.
I know of a developer tinkering with an add-on layer using Pango for both
PDF::API2 and Builder, but it's not clear to me how far he intends to go
beyond simple markup to change fonts (a la HTML presentation tags). If it
uses Pango, hopefully the other stuff will come along for free. We'll see.
For Western (simple) scripts, automatic ligature support would be wonderful,
but we need to be able to suppress selected ligatures (e.g., 'ff' in the
English word 'shelfful'). Support for swashes and alternate glyph choices
would be very nice to have (embedded markup language?). For complex scripts
like Arabic family and southern/southeastern Asian families, proper support
(Pango?) is vital.
UPDATE: See Text::Layout and HarfBuzz::Shaper packages. Layout is usable
with Builder (but no explicit support yet). Shaper is supported by Builder
for ligatures and complex scripts. Need to see if it supports true small
and petite capitals (included with font) as a sort of alternate glyphs.
It's probably not feasible to decompose outline fonts and shrink them
down nonlinearly (stroke widths reduced less than overall height/width)
and recreate the new outlines as synthetic small/petite caps.
(A1.) Look at examples/HarfBuzz.pl to see some problems with ligatures. In some
cases, such as "waffle", a PDF Reader can search for and find it even if
"ffl" has been replaced by a ligature (single glyph). However, in other
cases, such as "strasse", the Reader can NOT find the word when "ss" has
been replaced by an eszet. My keyboard doesn't have an eszet, so I can't
easily test if it can be searched for. I don't think there's anything in
the PDF::Builder code which is substituting eszet for "ss". Interestingly,
the "st" ligature in the same word does not present any problem.
B. Unification of font support: including character set and encoding support
improvements [see CTS 16/#81 and RT 120048] to make more commonality between
using UTF-8 and single byte encodings, across all the font types (core,
TrueType, Type1/PS, etc.). One problem with core fonts is, even though most
core fonts are already TrueType, that only the Latin-1 glyph set has widths
defined, and only single byte encodings are possible (similar for Type1/PS
fonts). To support UTF-8 for core and PS, the font might have to be built on
the fly for a page (like a synthetic font), with translations to single
bytes for all glyphs. If the resulting font exceeds 256 characters,
something would have to be done to split the page internally into two or
more sections, each with their own embedded virtual font. Glyph widths would
have to be available for all characters.
Add: start a subfont with the ASCII set and empty top 128. Add new chars
to it (from x80 to xFF) as single byte glyphs (not matching any standard
encoding) and use this new subfont. When it fills up and more characters
are needed on a page, start another subfont.
C. Improved documentation, possibly even a book giving detailed explanations
and examples, as both a reference and a tutorial. Needless to say, there
would have to be sufficient interest to warrant the time and expense of
writing/editing and publishing (in any format) a book to be sold!
D. PDF/A (archival document management, RT 120375): this might be more than
throwing a few flags/overriding flags to force font embedding and no
encryption/ passwords. There may be other stuff that needs to be done to
achieve recognition as a proper archival format (and there are apparently
several archival formats).
E. JPEG2000 image file support (CTS 12/#97): I don't know if this is worth it,
as there seems to be very little use of this, but if someone is interested,
have at it... any other newish image formats that PDF can support?
F. Fix Bar Code generation (CTS 1/#48): there seems to be something quite wrong
with the current bar code generation, so it's possible that no one is using
it in real documents yet. It's also possible that I'm not writing my test
cases properly -- does anyone know if they work? I suspect that the use of
XForms (relocatable text and graphics) for the bar image is not scaling
nicely, and may have to be replaced by drawn graphics primitives (text and
graphics drawn in their final place). Many other 1D and 2D bar codes
(including QR) would be good, but perhaps the bar codes should go into a
separate module, due to their potentially large code size and use of "new"
Perl modules. Even the existing 5 or so formats could be moved out, as
presumably no one is using them yet (if they are, in fact, broken). This is
in section I, as bar codes are already implemented in the base, but it's
possible that bar codes could be removed and reimplemented in section II as
a separate library or module.
Update: Actually printing out the example bar codes separates the "merged"
blobs into discrete lines. This may indicate rounding errors when presenting
on a low-to-medium resolution computer screen. However, on a consumer-grade
laser printer, the lines are still so close that I fear most scanners would
hae trouble reading the bar codes. I may have to do something with reducing
bar widths a little to allow for irregular edges ("blotting").
Take a look at package PDF::QRCode. It uses Text::QRCode internally, but
the interesting feature is using a code monkey to integrate itself (like
a retrovirus) into PDF::Builder, and can be called $gfx->qrcode(parms).
Long term: consider a new package that generates an output-agnostic generic
graphics list, along with sample GD, PS, PDF::Builder drivers; as well as
PDF::Builder generic barcode. Grab all sorts of Perl and non-Perl open
source generators and snag their algorithms to incorporate into the library.
incl PHP tecnickcom/tc-lib/barcode, Perl PDF::QRCode, etc. Allow qw/code1
code2/ in use BarCodes statement, as most users will just want to import a
very small subset of available codes.
G. Fix Small Caps (and capitalization in general) for ligatures (CTS 13/#79):
some ligatures given in Unicode or single byte encodings don't get properly
uppercased. The probable solution would be to decompose ligatures to their
individual letters before capitalization or Small/Petite Caps (if an
uppercase version doesn't exist in the font, or use GSUB processing to
recreate a ligature from the capital letters). As Perl doesn't seem to handle
capitalizing ligatures properly, a "capitals" function would need to be
offered, as well as improvements to the Small Caps in "synfonts". Various
non-Latin single characters (e.g., Greek terminal/nonterminal sigma, German
eszett, long s) also may need proper handling for capitalization.
UPDATE: It may be better to use individual letters (rather than ligature
Unicode points), allowing easy capitalization and small caps. Then use
HarfBuzz::Shaper to replace lowercase letter sequences with true ligatures
on the fly.
H. Fallback glyphs (CTS 5/#56) when a desired glyph is not found in one font,
but can be found in another. This is similar to HTML when you give a font
family list in CSS. Pango might help with this.
UPDATE: This is being considered for Text::Layout, but nothing scheduled yet.
I. Support for tagged structure (CTS 17/#76 and RT 120375). At the least, don't
corrupt an existing tagged PDF file when extracting pages.
J. Adding comment fields to any object (and possibly standalone comments as
their own objects). An example would be an image object with a comment
giving the source image file, for debugging purposes).
K. Text method to move to arbitrary points: relative or absolute movement
horizontally and vertically (a range of units), including tab support
(including \t and \v embedded in text), and maybe \n while we're at it.
Note that tabs bring up some issues. First, a tab by character count (the
traditional way, e.g., to the next n8-th column) is useful only for
monospaced fonts, and no changes in font size in the line. Thus, tab stops
would be more useful when defined by some absolute dimension (e.g., inches
or mms) of column position. Second, tabbing is usually done to get text
columns (sub columns), which involves a lot of manual setup and twiddling of
text. Consider using a TABLE within the column or page to get text organized
into the desired format (see "tbl" addition in section II).
UPDATE: The $text->distance() call permits arbitrary delta-x and delta-y
(in points) of text movement. It's not hard to convert non-points to
points, such as distance(2.4/in, -1.23/cm). For tables, take a look at
the PDF::Table package for now.
L. Determine what it is about "CJK" fonts (.ttf and .otf) that makes them
incompatible with synfont [RT 130040] and embedding [RT 130041], and fix if
possible. Are separate CJK fonts even necessary these days? Also note that
many CJK fonts refuse to "subset" when embedded (the entire font gets
embedded, even if you only use a handful of glyphs!).
M. Add decorative rectangular box effects around sections of text. With or
without border (allow rounded corners) and background color, drop shadows
(3D effect), etc. The box is drawn at given dimensions and location, and
the text written over it in the usual manner. Content clipping might also
be supported. See PDF::Table for drawing rules (and borders) for ideas, as
well as block background colors (gfx_bg object before text object).
N. Extend HarfBuzz::Shaper use (see A.) to flow paragraphs and sections (fill)
to match capability of existing text-fill calls. Architect so as to extend
easily to full paragraph shaping and "pouring" text into arbitrary columns,
with balancing. Justification to avoid ragged-left or -right needs to be
handled carefully for connected glyphs (e.g., Arabic, Indic, cursive Latin).
Treat HarfBuzz::Shaper handled-fonts just like any other font when it comes
to various text-handling routines (including length, justifying and aligning
text, filling lines, paragraph, section, textlabel, etc.).
O. ## DONE ## release 3.020
Support for PNM and related graphics images. See RT 132844.
P. Look into using TeX::Hyphen or Text::Hyphen to split words properly. The
latter supports 3 languages besides English. Both use TeX (Knuth-Liang)
algorithms. Eventually need means to override built-in hyphenation (e.g.,
force 're-cord' or 'rec-ord', per context), similar to ligature control.
One way would be to insert soft-hypen ­, this might require all
syllable breaks to be so marked, since TH might not see it any more.
Use any place where a string is flowed into multiple lines, and eventually
for complete paragraph shaping. Retain camelCase and punctuation/numeric
splitting as fallback for non-words. Consider own built-in version of either
TH, using CTAN directly (would take rewrite of the Perl code?), if can't
get owner to upgrade it.
Q. ## DONE ## release 3.020
Look at Basic/PDF/File.pm 'Q>' unpack (ref RT 133131). Supposedly will not
run on a 32 bit platform -- will it work on such platform to check if high
33 bits are all 0, and convert the low 32 bits to a fullword integer?
R. TTF/OTF font embedding: consider -forceASCII and -forceLatin1 options to
make the entire ASCII alphabet (or the entire Latin-1 alphabet) embedded,
not just what subset of characters (and glyphs) were used in text. This
might be useful in fillable forms and any other situation where an end user
gets to type in text. There is already the ability to embed the entire
font, but that's often overkill.
S. Many requests in wkHTMLtoPDF to output an entire web page into a single
PDF page. This is not as ridiculous as it sounds, as there are roll
printers that effectively have no fixed page height. For Builder, this
would mean outputting to a page of indefinite height, which in turn means
that Y-coordinate values as input would have to be updated to the new,
extended page. Many calls to Builder routines involve x-y coordinates on
the (fixed size) page, as opposed to a continuous print-at-the-bottom
model. Perhaps start with a normal fixed size (minimum) page and allow
Y-coordinates to go negative, and fix them all up at the end before or
during render to the file? Another possibility is to use a negative
Y-coordinate to mean "place me at the bottom". Also remember that beyond
200 inches in height, support in readers and tools will vary.
T. Extend the pageLabel() call to not only label the reader's thumb, but
also place the SAME page label text somewhere on the page. This might be
combined with header() and footer() calls, possibly to call pageLabel()
when the page numbering field is encountered (if flag set to do both).
The idea is to minimize labor and ensure a consistent page numbering
between the paper and the reader's thumb. header() and footer() might be
placed in the top and bottom margins, leaving it to the user not to write
into these areas. See #171 for pageLabel() enhancements. Also, page label
for outlines/bookmarks should be consistent with the thumb and what's
printed on the page. A manual page number call to put the page label on
an arbitrary place (such as centered in the outside margin) would be good.
U. Consider PCF (bitmapped font format) font support, in a manner similar to
the more primitive BDF support. PCF fonts are common on some systems (X11
used them extensively), but it's unlikely that there would be much interest
to widely use them today.
See https://fontforge.org/docs/techref/pcf-format.html
V. Add calls to insert Javascript actions for various objects. Not all Readers
may support the same Javascript level, so be careful. There are many actions
and many triggers that are supported. There may be some overlap with
annotation links (such as opening files or links on a click).
W. Per #187, some PDF Writers produce PDFs with /StructTreeRoot. Check to see
if this item is properly handled (not lost or corrupted) by Builder. Issue
was PDF::API2 accused of producing bad PDF, but turned out to be referencing
a /StructTreeRoot but that object was missing (so apparently not a bug in
API2 or Builder). See if newly-created PDF should include a STR, or at least
check if it is referenced but not present (in general, check if any
referenced object does not exist).
=============================================================================
II. Items to add to a separate area (new module or sub-module)
These are things that not everyone will require, and so should be split out
into possibly a separate module (dependent on PDF::Builder). Some of these
things are getting into the realm of support for markup languages and word
processing.
A. Hyphenation and paragraph shaping: including CTS 20/#183 (Hyphenation) and
CTS 24/#95 (pseudo page objects). The idea is to use Knuth et al.'s line-
splitting and paragraph shaping algorithms to flow text into a space in a
visually pleasing manner, while obeying widows and orphans constraints (as
well as not orphaning headings). Pango may help here with line and word
splitting.
** Look into item P in section I **
Text::KnuthPlass might be usable for general paragraph setup (allows non-
rectangular paragaph), uses Text::Hyphen internally, don't know yet if it
does the full Knuth-Plass "rivers of white" and "too many hyphens in a row"
bit. Interaction with Text::Layout stuff (font, size, changes)?
I've gotten Text::Hyphen and Text::KnuthPlass working, and tests suggest
that they (mostly) work great. Unfortunately, KP doesn't build on all
systems (a simple fix), so I may have to forcibly take it over if the owner
doesn't respond reasonably soon, or just take it into the product. Hyphen
could use some improvements for multilanguage support, also waiting for a
response. Some sort of "virtual printing (see 'B') would be necessary, as
dealing with widows and orphans would be much easier that way.
B. Virtual pages: this would be related to item (A) (paragraph shaping), where
PDF code would not be immediately written to an output page, but would be
buffered, and output only later. This permits easier paragraph shaping and
other rearrangements across columns and pages, where the starting location
of a line of text is in the buffer, and it can be updated when moving the
line around. Even individual words might be tagged (location and hyphenation
points) so that lines could be broken at will. Even a limited amount of
virtuality (virtual line output) could be useful for resetting a baseline
to accommodate a change in font size -- this might involve tagging a word or
block of words of the same height.
PDF::Table product (or table functionality within Builder) could make use
of virtual printing to "print" to a mini-page within a cell (fixed width,
min/max height). If it doesn't fit on the page, decide where to split row
to avoid widows and orphans. Also knowing cell height and row height, can
vertically align a cell's contents.
C. General text flowing capability, to fill irregularly shaped columns (such
as with intruding inserts or margin notes) in a balanced manner, including
spanning headings across all columns, where appropriate. This would also
include flowing text around images, tables, or other inserts to avoid
leaving large empty sections of pages (e.g., have a large table that floats
to the next page, with text after it that could easily come before it on
the original page). Something to handle cross references would be handy
here, to output "see table X above" or "below", "on the previous page"/"on
the next page", "on page X", etc. in a prescribed and consistent manner.
Note that it might be good to notify the user during processing that such a
move has been done, so that it can be inspected.
UPDATE: Columns could be any shape, drawn with lines, polylines, arcs,
circles, splines, etc. Text baselines don't necessarily have to be
horizontal. Clip baselines to the "column" shape to get the line length
and starting point for each line of text. There may have to be some
iterations to reshape a line if its height results in a shift of the
baseline into a wider or narrower area.
D. Font Families: per CTS 22/#94, make it easier to deal with switching fonts
and variations within a font (bold, italic, size, color, underline, small
caps, etc.), possibly with HTML tags inlined. The idea would be to only have
to specify a typeface and initial size, and then switch in and out of
variants (bold, italic, etc.) without having to call all the font routines
yourself. Perhaps several formats of markup could be supported (HTML,
Markdown, troff), driven by a definition file? Pango may help with this, at
least with font-specification markup.
UPDATE: Text::Layout package may prove very useful here, although it needs
some enhancements (a new "back end" to return text for shaping, rather
than directly outputting it).
E. Continuing (D), eventually much of HTML and other common markups (headings,
quotes, HTML entities, tables, lists, etc.) supported. One goal would be to
eventually support enough of each markup to have a separate converter
product (HTMLtoPDF, troffToPDF, etc.), but support for full Javascript and
CSS (for HTML pages) will be a bear! Some level of macros (predefined
strings) would be useful. Non-HTML might be converted to HTML or Pango
format first.
F. Support for SVG graphics (drawing), support for troff's eqn, pic, and tbl
markup languages to make it easier to do anything other than plain text.
LaTeX equation and table handling would be good to have, too, to avoid
having to rewrite marked up text. Also provide a full graphing functions
library (stacked/unstacked line, bar, scatter plots etc. in 2D and 3D).
eqn might be done with something based on (translated from) the MathJax
Javascript package (TypeScript source). There doesn't appear to be a Perl
implementation, and supposedly MathJax is highly dependent on the browser
DOM, which could be a problem. However, it might be easier than starting
from scratch (with all due credit and links to the MathJax authors/site).
I won't know unless I take a deep dive into the MathJax source.
G. Prepress production markup: convenience functions to place a watermark or
draft notice on all (or selected) pages, crop marks (based on trimbox),
temporarily draw page bounding boxes, temporarily draw object limit boxes,
color dots/bars for color printing alignment, instructions to the (human)
printer.
Page background color or pattern should extend to the full size of the page
and not end when content ends part way down the page. Remind users that
most printers will not print all the way to the edge. See Boxes.pl example.
H. Incorporate PDF::Table into PDF::Builder::Table. Simplify it somewhat (e.g.,
instead of separate line-width and color settings, use a list: w (width in
points with default color and solid line), or [w, color, optional-dash-
pattern]. Use it for borders and rules, and possibly frames. A "frame" would
be the enclosure for the table, and would be either a line spec or a width
and pattern (3D raised, 3D sunken, sunken table, raised table, floating
table with shadow, etc.). A "rule" would be horizontal and vertical
divider lines, and a "border" would be cell dividers ([w, color, margin-to-
cell edge, optional-dash-pattern]). Other simplifications and consolidations
of settings as justified (do not have to maintain absolute compatibility
with existing PDF::Table). Tables continued to the next page would not get
a full frame at the bottom/top, but a heavy dashed line (if breaking in
middle of a cell) or heavy solid line (if breaking at a row boundary). It
might be good not to automatically create a next page and start outputting
the rest of the table, but hold the contents and alert the programmer that
at least one more call is needed to finish.
Currently, PDF::Table basically equalizes column widths as much as it can,
but consider a starting point of relative and/or absolute column widths,
like many other table implementations. Some columns absolute widths, and
remaining space divided up even among *'s (with a column N*), subject to
minimum and maximum widths.
Within a cell, ideally it would be treated as a mini-page, with all the
normal PDF construction capabilities including paragraph shaping, flow into
column(s), images, etc. This would be better than the "text_block" used in
PDF::Table (more uniform coding and treatment), although some ideas from
text_block might find a home. However, such complex treatment (a table
could embed a table) requires virtual pages to permit a lot of rearrangement.
I. Consider Optional Content Groups (Layers), per 32000-2008 section 8.11.
This permits drawings to be shown by layer, or a watermark/copyright layer
to show only on printing.
J. ## DONE ## release 3.020 (documentation improved)
Per wkHTMLtoPDF issue 4846, at least some phone cameras output "portrait"
mode photos in landscape mode (rotated), with an "orientation" tag. JPEG
(at least) image handling may need a rotation flag in the call, and/or
pay attention to the Exif orientation flag. Confirmed that
Builder JPEG support does NOT respect the orientation flag.
https://www.google.com/search?client=firefox-b-1-d&q=jpeg+orientation+metadata
Unfortunately, there are a number of ways to specify the orientation flag
including XML '<exif:Orientation>Top-left</exif::Orientation>' and buried
somewhere in the Exif or JFIF header of the file. It might be best to ask
the image what its orientation is, and leave translation and rotation of
the placed image to user code, rather than trying to flip the contents of
the image file directly. See writeup in Docs.pm.
Supposedly CSS img { image-orientation: from-image; } will get from EXIF,
although this may have been deprecated.
K. A means to update PDF content already written to the $pdf data structure,
but not yet written to file. This could involve moving a text write (one
or more translates) to somewhere else on the page, or even moving from one
page to another. More general editing of material already "written" out,
such as changing a font size or a color -- would need a way to find the
desired content to be modified. Ability to erase content or move it from
page to page (e.g., started a table and then found it wouldn't fit on one
page -- move it to the top of the next page to allow display on a single
page). If virtual pages, design with all this in mind, so easy to update
content before being "written" out. If nothing is written to the PDF file
until the end (save or saveas), this might even supersede virtual pages.
L. (See B and H items above) Consider a $page->subpage(x, y, w, h, opts) to
subclass a page into a restricted area (not necessarily rectangular, or
bounded in height). Things like columns and table cells could then be
treated as normal pages (inherit text and graphics contexts?) and anything
could be put into them. May want to wait until have virtual write
capability rather than writing to file right away, for fitting table cells
into a page, and balancing columns. Once in, build-in PDF::Table capability
as PDF::Builder::Table, and can have arbitrary content in a table cell.
=============================================================================
III. A new architecture, in a new package
These are things that would be useful in being able to handle page design
in a higher-level manner. This would almost certainly be another package,
such as PDF::Builder::Environments.
A. Introduce the concept of environments, which take care of handling much of
the busy-work of laying out a page.
page -- including margins (define the work area); page numbering, outline,
and slider thumb consistent page numbers; headings/footings. also
odd/even page layouts, odd-only, or single page (roll printer) layout
column(s) -- define single or multiple columns, not necessarily of
rectangular outline or of the same width or height. lower
level environments can modify the initial outline (eat
away at it). balanced columns could be done
heading -- per column or per page, keep together along with first two
lines of following paragraph
footnote -- per column or per page, grows up from the bottom and can
spill to the next page. mostly for keeping track of the
bottom of the usable area
margin note -- a minipage eating into a column and margin, limited to
column height. must be defined before filling column
inset -- usually rectangular (but could be any shape) that eats into
the space for one or two columns
floats -- per column or per page, for images, etc., may float left or
right (but not cause a hole in the middle of a column)
table -- per column or per page, somewhat like PDF::Table, but probably
only fixed width columns, and content fill left to author
list -- per column or per page; sl, ul, ol, dl; content fill left to
author
span all or selected columns
heading
footnote
floats -- may be centered here
table
list
The idea is that you start an environment (either implicitly or explicitly)
and are responsible for all the content. At the start, you are told the
position of the upper left corner (not necessarily a baseline), the allowed
width, and the maximum height on this page. You fill in the content as you
wish. At the end, you close the environment. The current PDF::Builder is
only used for low-level stuff (primitives); the higher-level structure is
in this package.
The purpose is to allow you to put any kind of content anywhere, such as
another list within a list item, or another table within a table cell (as
well as a table within a list or a list within a table cell). The current
PDF::Table severely limits your content to (currently) simple text of
constant font, font size, color, etc. It is up to you to obey the limits
of the extent of your content (no checks for overflow). Some environments
(e.g., insets) will eat into the real estate defined for their parents, so
they will have to be fully defined and filled before you start filling
their parents.
A table row would need to be defined (filled) before ink can be put down,
so there may also need to be a table row environment. To handle vertical
alignment, or decide where to split a row across a page break, the table
environment needs to know the height of each row. This may need a means of
tentatively writing a virtual row, moving part to a new page (or down their
cell) if necessary, and commanding that the ink be put down. It may even be
necessary to move the entire row to the next page (e.g., a cell's content
is an image). Note that a new page is handled by the top-level program,
rather than having one automatically created -- unlike PDF::Table,
permitting more control over the layout and appearance.
Balanced columns could be done. If the columns are purely rectangular (no
insets, floats, etc.) and of the same width (not necessarily the same
height), we could fill in the same order as they are read and if not
completely filled, move already-formatted lines one at a time until
balanced. Columns which do not meet these criteria would be more difficult.
An estimate could be made of where to cut the columns by tracking the "area"
used by the longer column(s), and redistributing it to the short column(s),
then refilling the columns to the new height limit. This would be expensive,
particularly if using Knuth-Plass paragraph shaping, as it might have to be
done several times per page. Likewise, fixing widows and orphans by changing
leading would be much more expensive in non-rectangular columns.
Lists are vertical, and include unordered lists (you select marker), simple
lists (like unordered, but no marker), ordered list (decimal, lc roman,
UC Roman, lc alpha, UC Alpha) with selectable starting value (default 1).
For unordered and ordered lists, the marker is right justified in a
specified width, with a specified gutter before the content (and specified
prefix and suffix strings around the marker). The marker can be specified to
be 'inside' or 'outside' the content (if inside, the content is indented).
The content area coordinates are returned to the user, who fills it in with
whatever they want, obeying the size limits and any indentation. Reasonable
defaults can be used for all these settings. Definition lists would have
the term(s) by default bold, and always break and indent for the content,
or only if the term exceeds some maximum width. Again, the content is the
responsibility of the user, so any kind of content can be given. Each list
item is handled separately, although a wrapper could be written to do
multiple items at once (simple text only).
B. Before implementing any of "A", see how a RTL language should be handled.
|