<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="clusters">
  <title>Clusters</title>
  <section id="clusters-and-shaping">
    <title>Clusters and shaping</title>
    <para>
      In text shaping, a <emphasis>cluster</emphasis> is a sequence of
      characters that needs to be treated as a single, indivisible
      unit. A single letter or symbol can be a cluster of its
      own. Other clusters correspond to longer subsequences of the
      input code points &mdash; such as a ligature or conjunct form
      &mdash; and require the shaper to ensure that the cluster is not
      broken during the shaping process.
    </para>
    <para>
      A cluster is distinct from a <emphasis>grapheme</emphasis>,
      which is the smallest functional unit of a writing system or
      script.
    </para>
    <para>
      The definitions of the two terms are similar. However, clusters
      are only relevant for script shaping and glyph layout. In
      contrast, graphemes are a property of the underlying script, and
      are of interest when client programs implement orthographic 
      or linguistic functionality.
    </para>
    <para>
      For example, two individual letters are often two separate
      graphemes. When two letters form a ligature, however, they
      combine into a single glyph. They are then part of the same
      cluster and are treated as a unit by the shaping engine &mdash;
      even though the two original, underlying letters remain separate
      graphemes.
    </para>
    <para>
      HarfBuzz is concerned with clusters, <emphasis>not</emphasis>
      with graphemes &mdash; although client programs using HarfBuzz
      may still care about graphemes for other reasons from time to time.
    </para>
    <para>
      During the shaping process, there are several shaping operations
      that may merge adjacent characters (for example, when two code
      points form a ligature or a conjunct form and are replaced by a
      single glyph) or split one character into several (for example,
      when decomposing a code point through the
      <literal>ccmp</literal> feature). Operations like these alter
      clusters; HarfBuzz tracks the changes to ensure that no clusters
      get lost or broken during shaping. 
    </para>
    <para>
      HarfBuzz records cluster information for each glyph in the
      output buffer, independently of how shaping operations have
      changed the individual glyphs. Consequently, a client program
      using HarfBuzz can use the cluster information to implement
      features such as:
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  Correctly positioning the cursor within a shaped text run,
	  even when characters have formed ligatures, composed or
	  decomposed, reordered, or undergone other shaping operations.
	</para>
      </listitem>
      <listitem>
	<para>
	  Correctly highlighting a text selection that includes some,
	  but not all, of the characters in a word. 
	</para>
      </listitem>
      <listitem>
	<para>
	  Applying text attributes (such as color or underlining) to
	  part, but not all, of a word.
	</para>
      </listitem>
      <listitem>
	<para>
	  Generating output document formats (such as PDF) with
	  embedded text that can be fully extracted.
	</para>
      </listitem>
      <listitem>
	<para>
	  Determining the mapping between input characters and output
	  glyphs, such as which glyphs are ligatures.
	</para>
      </listitem>
      <listitem>
	<para>
	  Performing line-breaking, justification, and other
	  line-level or paragraph-level operations that must be done
	  after shaping is complete, but which require examining
	  character-level properties.
	</para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="working-with-harfbuzz-clusters">
    <title>Working with HarfBuzz clusters</title>
    <para>
      When you add text to a HarfBuzz buffer, each code point must be
      assigned a <emphasis>cluster value</emphasis>.
    </para>
    <para>
      This cluster value is an arbitrary number; HarfBuzz uses it only
      to distinguish between clusters. Many client programs will use
      the index of each code point in the input text stream as the
      cluster value. This is for the sake of convenience; the actual
      value does not matter.
    </para>
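    <para>
      As an illustration of one such convention, the following sketch
      assigns each code point the byte offset at which it starts in
      UTF-8 text, which is also how HarfBuzz's
      <function>hb_buffer_add_utf8()</function> sets cluster values.
      The function name <function>assign_utf8_clusters()</function>
      is hypothetical, and the sketch assumes well-formed UTF-8.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes one cluster value per code point of a
 * UTF-8 string, using the byte offset at which that code point starts.
 * Returns the number of code points found. Assumes valid UTF-8 and that
 * `clusters` has room for one entry per code point. */
size_t assign_utf8_clusters(const char *text, size_t len,
                            unsigned int *clusters)
{
  assert(text != NULL || len == 0);
  size_t count = 0;
  for (size_t i = 0; i < len; i++) {
    unsigned char byte = (unsigned char) text[i];
    /* Continuation bytes look like 10xxxxxx; any other byte starts a
     * new code point. */
    if ((byte & 0xC0) != 0x80)
      clusters[count++] = (unsigned int) i;
  }
  return count;
}
]]></programlisting>
    <para>
      For the three-character string <literal>a</literal>,
      <literal>é</literal>, <literal>b</literal>, it produces the
      cluster values 0, 1, and 3: the values are not consecutive,
      but they still distinguish the three clusters.
    </para>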
    <para>
      Some of the shaping operations performed by HarfBuzz &mdash;
      such as reordering, composition, decomposition, and substitution
      &mdash; may alter the cluster values of some characters. The
      final cluster values in the buffer at the end of the shaping
      process will indicate to client programs which subsequences of
      glyphs represent a cluster and, therefore, must not be
      separated.
    </para>
    <para>
      In addition, client programs can query the final cluster values
      to discern other potentially important information about the
      glyphs in the output buffer (such as whether or not a ligature
      was formed).
    </para>
    <para>
      For example, if the initial sequence of cluster values was:
    </para>
    <programlisting>
      0,1,2,3,4
    </programlisting>
    <para>
      and the final sequence of cluster values is:
    </para>
    <programlisting>
      0,0,3,3
    </programlisting>
    <para>
      then there are two clusters in the output buffer: the first
      cluster includes the first two glyphs, and the second cluster
      includes the third and fourth glyphs. It is also evident that a
      ligature or conjunct has been formed, because there are fewer
      glyphs in the output buffer (four) than there were code points
      in the input buffer (five).
    </para>
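    <para>
      A client program can recover the clusters from the final values
      alone, by treating each run of adjacent glyphs that share a
      value as one cluster. The sketch below does that for a plain
      array of cluster values; the <type>cluster_span</type> type and
      the <function>find_clusters()</function> function are
      hypothetical helpers, not part of the HarfBuzz API. In a real
      program, the values would come from the <literal>cluster</literal>
      field of the <type>hb_glyph_info_t</type> array returned by
      <function>hb_buffer_get_glyph_infos()</function>.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stddef.h>

/* One cluster found in a shaped buffer. */
typedef struct {
  unsigned int value;   /* final cluster value */
  size_t first_glyph;   /* index of the cluster's first glyph */
  size_t num_glyphs;    /* number of glyphs in the cluster */
} cluster_span;

/* Groups runs of adjacent glyphs with equal cluster values into spans.
 * `out` must have room for `num_glyphs` entries. Returns the number of
 * clusters found. */
size_t find_clusters(const unsigned int *clusters, size_t num_glyphs,
                     cluster_span *out)
{
  assert(clusters != NULL || num_glyphs == 0);
  size_t count = 0;
  for (size_t i = 0; i < num_glyphs; i++) {
    if (i > 0 && clusters[i] == clusters[i - 1]) {
      out[count - 1].num_glyphs++;  /* same cluster as previous glyph */
    } else {
      out[count].value = clusters[i];  /* a new cluster starts here */
      out[count].first_glyph = i;
      out[count].num_glyphs = 1;
      count++;
    }
  }
  return count;
}
]]></programlisting>
    <para>
      Applied to the final sequence 0,0,3,3, it reports two clusters:
      value 0 covering glyphs 0 and 1, and value 3 covering glyphs 2
      and 3.
    </para>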
    <para>
      Although client programs using HarfBuzz are free to assign
      initial cluster values in any manner they choose to, HarfBuzz
      does offer some useful guarantees if the cluster values are
      assigned in a monotonic (either non-decreasing or non-increasing)
      order.
    </para>
    <para>
      For buffers in the left-to-right (LTR)
      or top-to-bottom (TTB) text flow direction,
      HarfBuzz will preserve the monotonic property: client programs
      are guaranteed that monotonically increasing initial cluster
      values will be returned as monotonically increasing final
      cluster values.
    </para>
    <para>
      For buffers in the right-to-left (RTL)
      or bottom-to-top (BTT) text flow direction,
      the directionality of the buffer itself is reversed for final
      output as a matter of design. Therefore, HarfBuzz inverts the
      monotonic property: client programs are guaranteed that
      monotonically increasing initial cluster values will be
      returned as monotonically <emphasis>decreasing</emphasis> final
      cluster values.
    </para>
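    <para>
      These two guarantees can be checked mechanically. The following
      sketch tests whether a sequence of final cluster values runs in
      the order expected for a given flow direction: non-decreasing
      for LTR and TTB, non-increasing for RTL and BTT. The
      <type>flow_direction</type> enumeration and
      <function>clusters_follow_direction()</function> are
      hypothetical; HarfBuzz itself represents directions with
      <type>hb_direction_t</type> and provides the
      <function>HB_DIRECTION_IS_BACKWARD()</function> macro to
      identify RTL and BTT.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for HarfBuzz's hb_direction_t. */
typedef enum { DIR_LTR, DIR_RTL, DIR_TTB, DIR_BTT } flow_direction;

/* Returns true if the final cluster values are non-decreasing for a
 * forward (LTR/TTB) buffer, or non-increasing for a backward (RTL/BTT)
 * buffer. */
bool clusters_follow_direction(const unsigned int *clusters, size_t n,
                               flow_direction dir)
{
  assert(clusters != NULL || n == 0);
  bool backward = (dir == DIR_RTL || dir == DIR_BTT);
  for (size_t i = 1; i < n; i++) {
    if (!backward && clusters[i] < clusters[i - 1])
      return false;
    if (backward && clusters[i] > clusters[i - 1])
      return false;
  }
  return true;
}
]]></programlisting>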
    <para>
      Client programs can adjust how HarfBuzz handles clusters during
      shaping by setting the
      <literal>cluster_level</literal> of the buffer with
      <function>hb_buffer_set_cluster_level()</function>. HarfBuzz
      offers three <emphasis>levels</emphasis> of clustering support
      for this property:
    </para>
    <itemizedlist>
      <listitem>
	<para><emphasis>Level 0</emphasis> is the default.
	</para>
	<para>
	  The distinguishing feature of level 0 behavior is that, at
	  the beginning of processing the buffer, all code points that
	  are categorized as <emphasis>marks</emphasis>,
	  <emphasis>modifier symbols</emphasis>, or
	  <emphasis>Emoji extended pictographic</emphasis> modifiers,
	  as well as the <emphasis>Zero Width Joiner</emphasis> and
	  <emphasis>Zero Width Non-Joiner</emphasis> code points, are
	  assigned the cluster value of the closest preceding code
	  point from a <emphasis>different</emphasis> category.
	</para>
	<para>
	  In essence, whenever a base character is followed by a mark
	  character or a sequence of mark characters, those marks are
	  reassigned to the same initial cluster value as the base
	  character. This reassignment is referred to as
	  "merging" the affected clusters. This behavior is based on
	  the Grapheme Cluster Boundary specification in <ulink
	  url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
	  Technical Report 29</ulink>.
	</para>
	<para>
	  This cluster level is also suitable for code that uses
	  HarfBuzz cluster values as an approximation of the Unicode
	  Grapheme Cluster Boundaries.
	</para>
	<para>
	  Client programs can specify level 0 behavior for a buffer by
	  setting its <literal>cluster_level</literal> to
	  <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>. 
	</para>
      </listitem>
      <listitem>
	<para>
	  <emphasis>Level 1</emphasis> slightly tweaks the level 0
	  behavior inherited from the old HarfBuzz codebase to produce
	  better results. Level 1 clustering is therefore recommended
	  for code that does not need to remain backward compatible
	  with the old HarfBuzz.
	</para>
	<para>
	  <emphasis>Level 1</emphasis> differs from level 0 by not merging the
	  clusters of marks and other modifier code points with the
	  preceding "base" code point's cluster. By preserving the
	  separate cluster values of these marks and modifier code
	  points, level 1 lets client programs perform additional
	  operations that might lead to improved results (for example,
	  coloring mark glyphs differently from their base).
	</para>
	<para>
	  Client programs can specify level 1 behavior for a buffer by
	  setting its <literal>cluster_level</literal> to
	  <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>. 
	</para>
      </listitem>
      <listitem>
	<para>
	  <emphasis>Level 2</emphasis> differs significantly in how it
	  treats cluster values. In level 2, HarfBuzz never merges
	  clusters.
	</para>
	<para>
	  This difference can be seen most clearly when HarfBuzz processes
	  ligature substitutions and glyph reordering. In level 0 and
	  level 1, both forming a ligature and reordering glyphs merge
	  clusters; in level 2, neither of these operations triggers a
	  merge.
	</para>
	<para>
	  Client programs can specify level 2 behavior for a buffer by
	  setting its <literal>cluster_level</literal> to
	  <literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>. 
	</para>
      </listitem>
    </itemizedlist>
    <para>
      As mentioned earlier, client programs using HarfBuzz often
      assign initial cluster values in a buffer by reusing the indices
      of the code points in the input text. This gives a sequence of
      cluster values that is monotonically increasing (for example,
      0,1,2,3,4).
    </para>
    <para>
      It is not <emphasis>required</emphasis> that the cluster values
      in a buffer be monotonically increasing. However, if the initial
      cluster values in a buffer are monotonic and the buffer is
      configured to use cluster level 0 or 1, then HarfBuzz
      guarantees that the final cluster values in the shaped buffer
      will also be monotonic. No such guarantee is made for cluster
      level 2.
    </para>
    <para>
      In levels 0 and 1, HarfBuzz implements the following conceptual
      model for cluster values:
    </para>
    <itemizedlist spacing="compact">
      <listitem>
	<para>
          If the sequence of input cluster values is monotonic, the
	  sequence of cluster values will remain monotonic.
	</para>
      </listitem>
      <listitem>
	<para>
          Each cluster value represents a single cluster.
	</para>
      </listitem>
      <listitem>
	<para>
          Each cluster contains one or more glyphs and one or more
          characters.
	</para>
      </listitem>
    </itemizedlist>
    <para>
      In practice, this model offers several benefits. Assuming that
      the initial cluster values were monotonically increasing
      and distinct before shaping began, then, in the final output:
    </para>
    <itemizedlist spacing="compact">
      <listitem>
	<para>
	  All adjacent glyphs having the same final cluster
	  value belong to the same cluster.
	</para>
      </listitem>
      <listitem>
	<para>
          Each character belongs to the cluster that has the highest
	  cluster value <emphasis>not larger than</emphasis> its
	  initial cluster value.
	</para>
      </listitem>
    </itemizedlist>
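    <para>
      The second rule gives a simple way to map a character back to
      its cluster. The sketch below searches the final cluster values
      for the largest one that does not exceed the character's
      initial value. <function>owning_cluster()</function> is a
      hypothetical helper; it uses a linear scan for clarity, although
      a binary search would also work on a monotonic sequence.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the final cluster value that owns a character whose initial
 * cluster value was `initial`: the largest final value not larger than
 * `initial`. Assumes distinct, increasing initial values and a shaped
 * buffer produced at cluster level 0 or 1. */
unsigned int owning_cluster(const unsigned int *final_clusters,
                            size_t num_glyphs, unsigned int initial)
{
  bool found = false;
  unsigned int best = 0;
  for (size_t i = 0; i < num_glyphs; i++) {
    unsigned int v = final_clusters[i];
    if (v <= initial && (!found || v > best)) {
      best = v;
      found = true;
    }
  }
  assert(found);  /* some cluster must start at or before `initial` */
  return best;
}
]]></programlisting>
    <para>
      With the final values 0,0,3,3 from the earlier example,
      characters with initial values 0, 1, and 2 map to cluster 0,
      and characters 3 and 4 map to cluster 3.
    </para>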
  </section>

  <section id="a-clustering-example-for-levels-0-and-1">
    <title>A clustering example for levels 0 and 1</title>
    <para>
      The basic shaping operations affect clusters in a predictable
      manner when using level 0 or level 1: 
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  When two or more clusters <emphasis>merge</emphasis>, the
	  resulting merged cluster takes as its cluster value the
	  <emphasis>minimum</emphasis> of the incoming cluster values.
	</para>
      </listitem>
      <listitem>
	<para>
	  When a cluster <emphasis>decomposes</emphasis>, all of the
	  resulting child clusters inherit as their cluster value the
	  cluster value of the parent cluster.
	</para>
      </listitem>
      <listitem>
	<para>
	  When a character is <emphasis>reordered</emphasis>, the
	  reordered character and all clusters that the character
	  moves past as part of the reordering are merged into one cluster.
	</para>
      </listitem>
    </itemizedlist>
    <para>
      The functionality, guarantees, and benefits of level 0 and level
      1 behavior can be seen with some examples. First, let us examine
      what happens with cluster values when shaping involves cluster
      merging with ligatures and decomposition.
    </para>

    <para>
      Let's say we start with the following character sequence (top row) and
      initial cluster values (bottom row):
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      During shaping, HarfBuzz maps these characters to glyphs from
      the font. For simplicity, let us assume that each character maps
      to the corresponding, identical-looking glyph:
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      Now if, for example, <literal>B</literal> and <literal>C</literal>
      form a ligature, then the clusters to which they belong
      &quot;merge&quot;. This merged cluster takes for its cluster
      value the minimum of all the cluster values of the clusters that
      went into the ligature. In this case, we get:
    </para>
    <programlisting>
      A,BC,D,E
      0,1 ,3,4
    </programlisting>
    <para>
      because 1 is the minimum of the set {1,2}, which were the
      cluster values of <literal>B</literal> and
      <literal>C</literal>. 
    </para>
    <para>
      Next, let us say that the <literal>BC</literal> ligature glyph
      decomposes into three components, and <literal>D</literal> also
      decomposes into two components. Whenever a cluster decomposes,
      its components each inherit the cluster value of their parent: 
    </para>
    <programlisting>
      A,BC0,BC1,BC2,D0,D1,E
      0,1  ,1  ,1  ,3 ,3 ,4
    </programlisting>
    <para>
      Next, if <literal>BC2</literal> and <literal>D0</literal> form a
      ligature, then their clusters (cluster values 1 and 3) merge into
      <literal>min(1,3) = 1</literal>:
    </para>
    <programlisting>
      A,BC0,BC1,BC2D0,D1,E
      0,1  ,1  ,1    ,1 ,4
    </programlisting>
    <para>
      Note that the entirety of cluster 3 merges into cluster 1, not
      just the <literal>D0</literal> glyph. This reflects the fact
      that the cluster <emphasis>must</emphasis> be treated as an
      indivisible unit.
    </para>
    <para>
      At this point, cluster 1 means: the character sequence
      <literal>BCD</literal> is represented by glyphs
      <literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
      further.
    </para>
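    <para>
      The walkthrough above can be reproduced with a small simulation.
      In this sketch, <function>merge_clusters()</function> widens a
      glyph range to cover whole clusters and gives all of it the
      minimum value, <function>ligate()</function> merges and then
      replaces adjacent glyphs with one, and
      <function>decompose()</function> copies the parent's value to
      each component. All names are hypothetical; the code models the
      rules described above rather than reproducing HarfBuzz's
      internal implementation.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_GLYPHS 16

/* Illustrative glyph record: a readable name plus its cluster value. */
typedef struct {
  char name[8];
  unsigned int cluster;
} glyph;

typedef struct {
  glyph g[MAX_GLYPHS];
  size_t len;
} glyph_buffer;

/* One glyph per character, using each index as its cluster value. */
static void init_buffer(glyph_buffer *b, const char *const *names, size_t n)
{
  assert(n <= MAX_GLYPHS);
  for (size_t i = 0; i < n; i++) {
    snprintf(b->g[i].name, sizeof b->g[i].name, "%s", names[i]);
    b->g[i].cluster = (unsigned int) i;
  }
  b->len = n;
}

/* Merges every cluster touching glyphs [start, end): the range is
 * widened to whole clusters, then all of it takes the minimum value. */
static void merge_clusters(glyph_buffer *b, size_t start, size_t end)
{
  unsigned int min = b->g[start].cluster;
  for (size_t i = start + 1; i < end; i++)
    if (b->g[i].cluster < min)
      min = b->g[i].cluster;
  while (end < b->len && b->g[end].cluster == b->g[end - 1].cluster)
    end++;
  while (start > 0 && b->g[start - 1].cluster == b->g[start].cluster)
    start--;
  for (size_t i = start; i < end; i++)
    b->g[i].cluster = min;
}

/* Replaces `count` adjacent glyphs starting at `start` with a single
 * ligature glyph, after merging their clusters. */
static void ligate(glyph_buffer *b, size_t start, size_t count,
                   const char *name)
{
  assert(count >= 2 && start + count <= b->len);
  merge_clusters(b, start, start + count);
  snprintf(b->g[start].name, sizeof b->g[start].name, "%s", name);
  memmove(&b->g[start + 1], &b->g[start + count],
          (b->len - start - count) * sizeof(glyph));
  b->len -= count - 1;
}

/* Replaces glyph `i` with `k` components, each inheriting its cluster. */
static void decompose(glyph_buffer *b, size_t i, const char *const *names,
                      size_t k)
{
  assert(i < b->len && k >= 1 && b->len + k - 1 <= MAX_GLYPHS);
  unsigned int cluster = b->g[i].cluster;
  memmove(&b->g[i + k], &b->g[i + 1], (b->len - i - 1) * sizeof(glyph));
  for (size_t j = 0; j < k; j++) {
    snprintf(b->g[i + j].name, sizeof b->g[i + j].name, "%s", names[j]);
    b->g[i + j].cluster = cluster;
  }
  b->len += k - 1;
}
]]></programlisting>
    <para>
      Ligating <literal>B</literal> and <literal>C</literal>,
      decomposing <literal>BC</literal> and <literal>D</literal>, and
      then ligating <literal>BC2</literal> with <literal>D0</literal>
      yields the final cluster values 0,1,1,1,1,4 shown above.
    </para>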
  </section>
  <section id="reordering-in-levels-0-and-1">
    <title>Reordering in levels 0 and 1</title>
    <para>
      Another common operation in some shapers is glyph
      reordering. In order to maintain a monotonic cluster sequence
      when glyph reordering takes place, HarfBuzz merges the clusters
      of everything in the reordering sequence.
    </para>
    <para>
      For example, let us again start with the character sequence (top
      row) and initial cluster values (bottom row):
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      If <literal>D</literal> is reordered to the position immediately
      before <literal>B</literal>, then HarfBuzz merges the
      <literal>B</literal>, <literal>C</literal>, and
      <literal>D</literal> clusters &mdash; all the clusters between
      the final position of the reordered glyph and its original
      position. This means that we get:
    </para>
    <programlisting>
      A,D,B,C,E
      0,1,1,1,4
    </programlisting>
    <para>
      as the final cluster sequence.
    </para>
    <para>
      Merging this many clusters is not ideal, but it is the only
      sensible way for HarfBuzz to maintain the guarantee that the
      sequence of cluster values remains monotonic and to retain the
      true relationship between glyphs and characters.
    </para>
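    <para>
      The reordering rule can be sketched as follows. The hypothetical
      <function>reorder_backward()</function> moves one glyph to an
      earlier position and gives the moved glyph, and every glyph it
      passes, the smallest of their cluster values. For brevity, it
      assumes that each glyph it touches forms a whole cluster on its
      own, as in the example; a fuller version would also widen the
      range to include the rest of any partially covered cluster.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stddef.h>

/* One glyph: a single-letter name and its cluster value. */
typedef struct {
  char name;
  unsigned int cluster;
} glyph;

/* Moves the glyph at index `from` back to index `to` (to < from) and
 * merges the clusters of everything in the reordered span, so the
 * sequence of cluster values stays monotonic. */
void reorder_backward(glyph *g, size_t from, size_t to)
{
  assert(to < from);
  unsigned int min = g[to].cluster;
  for (size_t i = to; i <= from; i++)
    if (g[i].cluster < min)
      min = g[i].cluster;
  glyph moved = g[from];
  for (size_t i = from; i > to; i--)  /* shift skipped glyphs right */
    g[i] = g[i - 1];
  g[to] = moved;
  for (size_t i = to; i <= from; i++)
    g[i].cluster = min;
}
]]></programlisting>
    <para>
      Moving <literal>D</literal> (index 3) to index 1 in
      <literal>A,B,C,D,E</literal> produces
      <literal>A,D,B,C,E</literal> with the cluster values 0,1,1,1,4.
    </para>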
  </section>
  <section id="the-distinction-between-levels-0-and-1">
    <title>The distinction between levels 0 and 1</title>
    <para>
      The preceding examples demonstrate the main effects of using
      cluster levels 0 and 1. The only difference between the two
      levels is this: in level 0, at the very beginning of the shaping
      process, HarfBuzz merges the cluster of each base character
      with the clusters of all Unicode marks (combining or not) and
      modifiers that follow it.
    </para>
    <para>
      For example, let us start with the following character sequence
      (top row) and accompanying initial cluster values (bottom row):
    </para>
    <programlisting>
      A,acute,B
      0,1    ,2
    </programlisting>
    <para>
      The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
      using cluster level 0 on this sequence, then the
      <literal>A</literal> and <literal>acute</literal> clusters will
      merge, and the result will become:
    </para>
    <programlisting>
      A,acute,B
      0,0    ,2
    </programlisting>
    <para>
      This merger is performed before any other script-shaping
      steps.
    </para>
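    <para>
      The level 0 merging step can be sketched as follows. The
      <literal>continues_cluster</literal> flag stands in for the real
      Unicode classification (marks, modifier symbols, Emoji
      modifiers, ZWJ, and ZWNJ), which this hypothetical
      <function>merge_marks_with_base()</function> helper does not
      compute itself.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One input code point. `continues_cluster` stands in for the Unicode
 * classification that a real implementation would look up. */
typedef struct {
  const char *name;
  bool continues_cluster;
  unsigned int cluster;
} code_point;

/* Level-0-style initial merge: every code point that continues a
 * cluster takes the cluster value of the closest preceding code point
 * that does not. A leading continuation has no base and is left as is. */
void merge_marks_with_base(code_point *cp, size_t n)
{
  assert(cp != NULL || n == 0);
  bool have_base = false;
  unsigned int base_cluster = 0;
  for (size_t i = 0; i < n; i++) {
    if (!cp[i].continues_cluster) {
      base_cluster = cp[i].cluster;
      have_base = true;
    } else if (have_base) {
      cp[i].cluster = base_cluster;
    }
  }
}
]]></programlisting>
    <para>
      For <literal>A,acute,B</literal> with the values 0,1,2, it
      produces 0,0,2, matching the example above.
    </para>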
    <para>
      This initial cluster merging is the default behavior of the
      Windows shaping engine, and the old HarfBuzz codebase copied
      that behavior to maintain compatibility. Consequently, it has
      remained the default behavior in the new HarfBuzz codebase.
    </para>
    <para>
      But this initial cluster-merging behavior makes it impossible
      for client programs to implement some features (such as
      coloring diacritic marks differently from their base
      characters). That is why, in level 1, HarfBuzz does not perform
      the initial merging step.
    </para>
    <para>
      For client programs that rely on HarfBuzz cluster values to
      perform cursor positioning, level 0 is more convenient. But
      relying on cluster boundaries for cursor positioning is wrong: cursor
      positions should be determined based on Unicode grapheme
      boundaries, not on shaping-cluster boundaries. As such, using
      level 1 clustering behavior is recommended. 
    </para>
    <para>
      One final facet of levels 0 and 1 is worth noting. HarfBuzz
      currently does not allow any
      <emphasis>multiple-substitution</emphasis> GSUB lookups to 
      replace a glyph with zero glyphs (in other words, to delete a
      glyph).
    </para>
    <para>
      But, in some other situations, glyphs can be deleted. In
      those cases, if the glyph being deleted is the last glyph of its
      cluster, HarfBuzz makes sure to merge the deleted glyph's
      cluster with a neighboring cluster.
    </para>
    <para>
      This is done primarily to make sure that the starting cluster of the
      text always has the cluster index pointing to the start of the text
      for the run; more than one client program currently relies on this
      guarantee.
    </para>
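    <para>
      The sketch below illustrates such a merge on a plain array of
      cluster values. The hypothetical
      <function>delete_glyph()</function> does nothing extra when
      another glyph still carries the deleted glyph's value.
      Otherwise, it lets the preceding cluster take the smaller value,
      or, when the first glyph is deleted, lets the following cluster
      take it, so the run still starts with its original first value.
      This is one plausible approach modeled on the behavior described
      above, not HarfBuzz's own code.
    </para>
    <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Deletes the glyph at index `i` and returns the new length. If it was
 * the last glyph carrying its cluster value, that value is first merged
 * into a neighbouring cluster (the smaller value wins). */
size_t delete_glyph(unsigned int *clusters, size_t len, size_t i)
{
  assert(i < len);
  unsigned int c = clusters[i];
  bool survives = (i + 1 < len && clusters[i + 1] == c) ||
                  (i > 0 && clusters[i - 1] == c);
  if (!survives) {
    if (i > 0) {
      /* Merge backward: the preceding cluster takes the smaller value. */
      unsigned int old = clusters[i - 1];
      if (c < old)
        for (size_t j = i; j > 0 && clusters[j - 1] == old; j--)
          clusters[j - 1] = c;
    } else if (i + 1 < len) {
      /* First glyph: merge forward so the run still starts at `c`. */
      unsigned int old = clusters[i + 1];
      if (c < old)
        for (size_t j = i + 1; j < len && clusters[j] == old; j++)
          clusters[j] = c;
    }
  }
  for (size_t j = i; j + 1 < len; j++)  /* close the gap */
    clusters[j] = clusters[j + 1];
  return len - 1;
}
]]></programlisting>
    <para>
      Deleting the first glyph of 0,1,2 leaves 0,2, so the run still
      begins at cluster 0.
    </para>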
    <para>
      Incidentally, Apple's CoreText does something different to
      maintain the same promise: it inserts a glyph with id 65535 at
      the beginning of the glyph string if the glyph corresponding to
      the first character in the run was deleted. HarfBuzz might do
      something similar in the future.
    </para>
  </section>
  <section id="level-2">
    <title>Level 2</title>
    <para>
      HarfBuzz's level 2 cluster behavior uses a significantly
      different model than that of level 0 and level 1.
    </para>
    <para>
      The level 2 behavior is easy to describe, but it may be
      difficult to understand in practical terms. In brief, level 2 
      performs no merging of clusters whatsoever.
    </para>
    <para>
      This means that there is no initial base-and-mark merging step
      (as is done in level 0), and it means that reordering moves and
      ligature substitutions do not trigger a cluster merge.
    </para>
    <para>
      Only one shaping operation directly affects clusters when using
      level 2:
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  When a cluster <emphasis>decomposes</emphasis>, all of the
	  resulting child clusters inherit as their cluster value the
	  cluster value of the parent cluster.
	</para>
      </listitem>
    </itemizedlist>
    <para>
      When glyphs do form a ligature (or when some other feature
      substitutes multiple glyphs with one glyph), the cluster value
      of the first glyph is retained as the cluster value for the
      resulting ligature.
    </para>
    <para>
      This occurrence sounds similar to a cluster merge, but it is
      different. In particular, no subsequent characters &mdash;
      including marks and modifiers &mdash; are affected. They retain
      their previous cluster values. 
    </para>
    <para>
      Level 2 cluster behavior is ultimately less complex than level 0
      or level 1, but there are several cases for which processing
      cluster values produced at level 2 may be tricky. 
    </para>
    <section id="ligatures-with-combining-marks-in-level-2">
      <title>Ligatures with combining marks in level 2</title>
      <para>
	The first example of how HarfBuzz's level 2 cluster behavior
	can be tricky is when the text to be shaped includes combining
	marks attached to ligatures.
      </para>
      <para>
	Let us start with an input sequence with the following
	characters (top row) and initial cluster values (bottom row):
      </para>
      <programlisting>
	A,acute,B,breve,C,circumflex
	0,1    ,2,3    ,4,5
      </programlisting>
      <para>
	If the sequence <literal>A,B,C</literal> forms a ligature,
	then these are the cluster values HarfBuzz will return under
	the various cluster levels:
      </para>
      <para>
	Level 0:
      </para>
      <programlisting>
	ABC,acute,breve,circumflex
	0  ,0    ,0    ,0
      </programlisting>
      <para>
	Level 1:
      </para>
      <programlisting>
	ABC,acute,breve,circumflex
	0  ,0    ,0    ,5
      </programlisting>
      <para>
	Level 2:
      </para>
      <programlisting>
	ABC,acute,breve,circumflex
	0  ,1    ,3    ,5
      </programlisting>
      <para>
	Making sense of the level 2 result is the hardest for a client
	program, because there is nothing in the cluster values that
	indicates that <literal>B</literal> and <literal>C</literal>
	formed a ligature with <literal>A</literal>.
      </para>
      <para>
	In contrast, the "merged" cluster values of the mark glyphs
	that are seen in the level 0 and level 1 output are evidence
	that a ligature substitution took place. 
      </para>
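      <para>
	The three results can be reproduced with a small model. In this
	sketch, the hypothetical <function>ligate_with_marks()</function>
	takes the positions of the ligature components and an array that
	flags the marks. At level 0 it first merges each mark into the
	preceding glyph's cluster; at levels 0 and 1 it merges every
	cluster from the first component to the last, widened to whole
	clusters; at level 2 it leaves the values alone. In every case,
	the ligature replaces the first component and the other
	components are removed.
      </para>
      <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models a ligature of the base glyphs at positions comp[0..ncomp)
 * (sorted ascending) with marks left between them, at cluster level
 * 0, 1, or 2. `cl` holds the cluster values and is updated in place.
 * Returns the new number of glyphs. */
size_t ligate_with_marks(unsigned int *cl, const bool *is_mark, size_t len,
                         const size_t *comp, size_t ncomp, int level)
{
  assert(ncomp >= 2 && comp[ncomp - 1] < len && level >= 0 && level <= 2);

  /* Level 0 only: before shaping, each mark joins the cluster of the
   * glyph before it. */
  if (level == 0)
    for (size_t i = 1; i < len; i++)
      if (is_mark[i])
        cl[i] = cl[i - 1];

  /* Levels 0 and 1: merge every cluster from the first component to
   * the last, widened to whole clusters, into the minimum value. */
  if (level < 2) {
    size_t start = comp[0], end = comp[ncomp - 1] + 1;
    unsigned int min = cl[start];
    for (size_t i = start + 1; i < end; i++)
      if (cl[i] < min)
        min = cl[i];
    while (end < len && cl[end] == cl[end - 1])
      end++;
    while (start > 0 && cl[start - 1] == cl[start])
      start--;
    for (size_t i = start; i < end; i++)
      cl[i] = min;
  }

  /* All levels: drop every component after the first. At level 2 the
   * ligature simply keeps the first component's value. */
  size_t out = 0, next = 1;
  for (size_t i = 0; i < len; i++) {
    if (next < ncomp && i == comp[next]) {
      next++;
      continue;
    }
    cl[out++] = cl[i];
  }
  return out;
}
]]></programlisting>
      <para>
	With the components at positions 0, 2, and 4, it returns
	0,0,0,0 at level 0, 0,0,0,5 at level 1, and 0,1,3,5 at level 2,
	matching the tables above.
      </para>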
    </section>
    <section id="reordering-in-level-2">
      <title>Reordering in level 2</title>
      <para>
	Another example of how HarfBuzz's level 2 cluster behavior
	can be tricky is when glyphs reorder. Consider an input sequence
	with the following characters (top row) and initial cluster
	values (bottom row):
      </para>
      <programlisting>
	A,B,C,D,E
	0,1,2,3,4
      </programlisting>
      <para>
	Now imagine <literal>D</literal> moves before
	<literal>B</literal> in a reordering operation. The cluster
	values will then be:
      </para>
      <programlisting>
	A,D,B,C,E
	0,3,1,2,4
      </programlisting>
      <para>
	Next, if <literal>D</literal> forms a ligature with
	<literal>B</literal>, the output is:
      </para>
      <programlisting>
	A,DB,C,E
	0,3 ,2,4
      </programlisting>
      <para>
	However, in a different scenario, in which the shaping rules
	of the script instead caused <literal>A</literal> and
	<literal>B</literal> to form a ligature
	<emphasis>before</emphasis> <literal>D</literal> was reordered, the
	result would be:
      </para>
      <programlisting>
	AB,D,C,E
	0 ,3,2,4   
      </programlisting>
      <para>
	There is no way for a client program to differentiate between
	these two scenarios based on the cluster values
	alone. Consequently, client programs that use level 2 might
	need to undertake additional work in order to manage cursor
	positioning, text attributes, or other desired features.
      </para>
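      <para>
	The ambiguity can be demonstrated directly. The sketch below
	models level 2 with two hypothetical helpers that never merge
	clusters: <function>move_back()</function> reorders a glyph, and
	<function>ligate_pair()</function> joins two adjacent glyphs,
	keeping the first glyph's cluster value.
      </para>
      <programlisting language="C"><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Level-2 model: a short glyph name and a cluster value. */
typedef struct {
  char name[4];
  unsigned int cluster;
} glyph;

/* Moves glyph `from` to index `to` (to < from); clusters are untouched. */
static void move_back(glyph *g, size_t from, size_t to)
{
  assert(to < from);
  glyph moved = g[from];
  memmove(&g[to + 1], &g[to], (from - to) * sizeof(glyph));
  g[to] = moved;
}

/* Ligates glyphs i and i+1. The result keeps glyph i's cluster value
 * and later glyphs shift left. Returns the new length. */
static size_t ligate_pair(glyph *g, size_t len, size_t i)
{
  assert(i + 1 < len);
  strncat(g[i].name, g[i + 1].name,
          sizeof g[i].name - strlen(g[i].name) - 1);
  memmove(&g[i + 1], &g[i + 2], (len - i - 2) * sizeof(glyph));
  return len - 1;
}
]]></programlisting>
      <para>
	Running both scenarios yields the same cluster sequence, 0,3,2,4.
	Only the glyph names differ (<literal>DB</literal> versus
	<literal>AB</literal>), and those are not visible to a client
	program that inspects cluster values alone.
      </para>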
    </section>
    <section id="other-considerations-in-level-2">
      <title>Other considerations in level 2</title>
      <para>
	Other problems can arise with ligatures under level 2, for
	example when the direction of the text is forced to the
	opposite of its natural direction (such as Arabic text forced
	into left-to-right directionality). Generally speaking,
	however, these scenarios are obscure corner cases that most
	client programs do not need to worry about.
      </para>
    </section>
  </section>
</chapter>