NAME

    String::Tagged - string buffers with value tags on extents

SYNOPSIS

     use String::Tagged;
    
     my $st = String::Tagged->new( "An important message" );
    
     $st->apply_tag( 3, 9, bold => 1 );
    
     $st->iter_substr_nooverlap(
        sub {
           my ( $substring, %tags ) = @_;
    
           print $tags{bold} ? "<b>$substring</b>"
                             : $substring;
        }
     );

DESCRIPTION

    This module implements an object class, instances of which store a
    (mutable) string buffer that supports tags. A tag is a name/value pair
    that applies to some non-empty extent of the underlying string.

    The types of tag names ought to be strings, or at least values that are
    well-behaved as strings, as the names will often be used as the keys in
    hashes or applied to the eq operator.

    The types of tag values are not restricted - any scalar will do. This
    could be a simple integer or string, ARRAY or HASH reference, or even a
    CODE reference containing an event handler of some kind.

    Tags may be arbitrarily overlapped. Any given offset within the string
    has, in effect, a set of uniquely named tags. Tags of different names
    are independent. For tags of the same name, only the latest, shortest
    tag takes effect.

    For example, consider a string with three tags represented here:

     Here is my string with tags
     [-------------------------]  foo => 1
             [-------]            foo => 2
          [---]                   bar => 3

    Every character in this string has a tag named foo. The value of this
    tag is 2 for the words my and string and the space in between, and 1
    elsewhere. Additionally, the words is and my and the space between them
    also have the tag bar with a value 3.
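
    The diagram above can be rebuilt with apply_tag and checked with
    get_tag_at. This is only a sketch: the offsets are counted by hand from
    the diagram, and the variable names are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

# Rebuild the diagram; offsets are 0-based, lengths are in characters.
my $st = String::Tagged->new( "Here is my string with tags" );
$st->apply_tag( 0, $st->length, foo => 1 );   # the whole string
$st->apply_tag( 8, 9,           foo => 2 );   # "my string"
$st->apply_tag( 5, 5,           bar => 3 );   # "is my"

# Query the effective value of each tag at a few positions.
my $foo_at_here   = $st->get_tag_at(  0, "foo" );   # inside "Here"
my $foo_at_string = $st->get_tag_at( 11, "foo" );   # inside "string"
my $bar_at_is     = $st->get_tag_at(  5, "bar" );   # inside "is"
my $bar_at_tags   = $st->get_tag_at( 23, "bar" );   # inside "tags"
```

    Following the description above, the shorter foo tag should win inside
    "my string", and bar should be absent outside "is my".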

    Because String::Tagged does not understand the significance of tag
    values, it cannot detect whether two neighbouring tags really contain
    the same semantic idea. Consider the following string:

     A string with words
     [-------]            type => "message"
              [--------]  type => "message"

    This string contains two tags. String::Tagged will treat this as two
    different tag values as far as iter_tags_nooverlap is concerned, even
    though get_tag_at yields the same value for the type tag at any
    position in the string. The merge_tags method may be used to merge tag
    extents of tags that should be considered as equal.

NAMING

    I spent a lot of time considering the name for this module. It seems
    that a number of people across a number of languages all created
    similar functionality, though named very differently. For the benefit
    of keyword-based search tools and similar, here's a list of some other
    names this sort of object might be known by:

      * Extents

      * Overlays

      * Attribute or attributed strings

      * Markup

      * Out-of-band data

CONSTRUCTOR

 new

       $st = String::Tagged->new( $str )

    Returns a new instance of a String::Tagged object. It will contain no
    tags. If the optional $str argument is supplied, the string buffer will
    be initialised from this value.

    If $str is a String::Tagged object then it will be cloned, as if
    calling the clone method on it.

 new_tagged

       $st = String::Tagged->new_tagged( $str, %tags )

    Shortcut for creating a new String::Tagged object with the given tags
    applied to the entire length. The tags will not be anchored at either
    end.

 clone (class)

       $new = String::Tagged->clone( $orig, %opts )

    Returns a new instance of String::Tagged made by cloning the original,
    subject to the options provided. The returned instance will be in the
    requested class, which need not match the class of the original.

    The following options are recognised:

    only_tags => ARRAY

      If present, gives an ARRAY reference containing tag names. Only those
      tags named here will be copied; others will be ignored.

    except_tags => ARRAY

      If present, gives an ARRAY reference containing tag names. All tags
      will be copied except those named here.

    convert_tags => HASH

      If present, gives a HASH reference containing tag conversion
      functions. For any tags in the original to be copied whose names
      appear in the hash, the name and value are passed into the
      corresponding function, which should return an even-sized key/value
      list giving a tag, or a list of tags, to apply to the new clone.

       my @new_tags = $convert_tags->{$orig_name}->( $orig_name, $orig_value )
       # Where @new_tags is ( $new_name, $new_value, $new_name_2, $new_value_2, ... )

      As a further convenience, if the value for a given tag name is a
      plain string instead of a code reference, it gives the new name for
      the tag, and will be applied with its existing value.

      If only_tags is being used too, then the source names of any tags to
      be converted must also be listed there, or they will not be copied.
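
      As a sketch of how these options combine, the following clones a
      string while keeping only the bold tag and renaming it. The tag names
      bold, italic and strong are illustrative only.

```perl
use strict;
use warnings;
use String::Tagged;

my $orig = String::Tagged->new( "An important message" );
$orig->apply_tag( 3, 9, bold   => 1 );   # "important"
$orig->apply_tag( 0, 2, italic => 1 );   # "An"

# Copy only the bold tag, renaming it to strong via the plain-string
# form of convert_tags. The source name must appear in only_tags.
my $copy = $orig->clone(
   only_tags    => [ "bold" ],
   convert_tags => { bold => "strong" },
);

my @copy_names = sort $copy->tagnames;
my $strong     = $copy->get_tag_at( 3, "strong" );
my @orig_names = sort $orig->tagnames;   # the original is left untouched
```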

 clone (instance)

       $new = $orig->clone( %args )

    Called as an instance (rather than a class) method, the newly-cloned
    instance is returned in the same class as the original.

 from_sprintf

       $str = String::Tagged->from_sprintf( $format, @args )

    Since version 0.15.

    Returns a new instance of a String::Tagged object, initialised by
    formatting the supplied arguments using the supplied format.

    The $format string is similar to that supported by the core sprintf
    operator, though a few features such as out-of-order argument indexing
    and vector formatting are missing. This format string may be a plain
    perl string, or an instance of String::Tagged. In the latter case, any
    tags within it are preserved in the result.

    In the case of a %s conversion, the value of the argument consumed may
    itself be a String::Tagged instance. In this case it will be appended
    to the returned object, preserving any tags within it.

    All other conversions are handled individually by the core sprintf
    operator and appended to the result.
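
    A short sketch of the behaviour described above, assuming version 0.15
    or later. A tagged value is consumed by %s and a plain number by %d; the
    bold tag name is illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

# A tagged argument for the %s conversion.
my $name = String::Tagged->new_tagged( "world", bold => 1 );

my $st = String::Tagged->from_sprintf( "Hello, %s! (%d)", $name, 42 );

my $text        = $st->str;
my $bold_inside = $st->get_tag_at( 7, "bold" );   # first character of "world"
my $bold_before = $st->get_tag_at( 0, "bold" );   # inside "Hello"
```

    The tag carried by $name is expected to cover only the text that the
    %s conversion produced.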

METHODS

 str

       $str = $st->str
    
       $str = "$st"

    Returns the plain string contained within the object.

    This method is also called for stringification; so the String::Tagged
    object can be used in a plain string interpolation such as

     my $message = String::Tagged->new( "Hello world" );
     print "My message is $message\n";

 length

       $len = $st->length
    
       $len = length( $st )

    Returns the length of the plain string. Because stringification works
    on this object class, the normal core length function works correctly
    on it.

 substr

       $str = $st->substr( $start, $len )

    Returns a String::Tagged instance representing a section from within
    the given string, containing all the same tags at the same conceptual
    positions.

 plain_substr

       $str = $st->plain_substr( $start, $len )

    Returns the substring at the given position as a plain perl string.
    This will be the same string data as returned by substr, only as a
    plain string without the tags.
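
    The difference between these accessors can be sketched as follows,
    using the string from the SYNOPSIS. Variable names are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );   # "important"

my $plain_all = "$st";        # stringification calls ->str
my $len       = length $st;   # core length works via stringification

my $tagged = $st->substr( 3, 9 );         # a String::Tagged, tags kept
my $plain  = $st->plain_substr( 3, 9 );   # a plain perl string

my $bold_in_sub = $tagged->get_tag_at( 0, "bold" );
```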

 apply_tag

       $st->apply_tag( $start, $len, $name, $value )

    Apply the named tag value to the given extent. The tag will start on
    the character at the $start index, and continue for the next $len
    characters.

    If $start is given as -1, the tag will be considered to start "before"
    the actual string. If $len is given as -1, the tag will be considered
    to end "after" the end of the actual string. These special limits are
    used by set_substr when deciding whether to move a tag boundary. The
    start of any tag that starts "before" the string is never moved, even
    if more text is inserted at the beginning. Similarly, a tag which ends
    "after" the end of the string will continue to the end even if more
    text is appended.

    This method returns the $st object.

       $st->apply_tag( $e, $name, $value )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers. The new tag will apply at the given
    extent.
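
    The effect of the -1 limits can be sketched by comparing an anchored
    tag with an ordinary one when text is appended. The tag names are
    borrowed from the debug_sprintf example and are otherwise illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Hello" );
$st->apply_tag( -1, -1, everywhere => 1 );   # anchored at both ends
$st->apply_tag(  0,  5, word       => 1 );   # plain extent over "Hello"

$st->append( ", world" );

# Position 11 is the final "d" of the appended text.
my $everywhere_at_end = $st->get_tag_at( 11, "everywhere" );
my $word_at_end       = $st->get_tag_at( 11, "word" );

my $extent = $st->get_tag_extent( 0, "everywhere" );
my $anchored = $extent->anchor_before && $extent->anchor_after;
```

    Only the anchored tag is expected to grow to cover the appended text.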

 unapply_tag

       $st->unapply_tag( $start, $len, $name )

    Unapply the named tag value from the given extent. If the tag extends
    beyond this extent, then any partial fragment of the tag will be left
    in the string.

    This method returns the $st object.

       $st->unapply_tag( $e, $name )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.

 delete_tag

       $st->delete_tag( $start, $len, $name )

    Delete the named tag within the given extent. Entire tags are removed,
    even if they extend beyond this extent.

    This method returns the $st object.

       $st->delete_tag( $e, $name )

    Alternatively, an existing extent object can be passed as the first
    argument instead of two integers.
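
    To illustrate the difference between unapply_tag and delete_tag, this
    sketch removes the same range from two identically tagged strings. The
    tag name x and the digit string are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

# Two identical strings, each fully covered by a tag named x.
my $unapplied = String::Tagged->new( "0123456789" );
$unapplied->apply_tag( 0, 10, x => 1 );
my $deleted = $unapplied->clone;

# unapply_tag cuts a hole and leaves the fragments on either side.
$unapplied->unapply_tag( 3, 4, "x" );
my @after_unapply = map { $unapplied->get_tag_at( $_, "x" ) } 1, 4, 8;

# delete_tag removes the whole tag because it touches the range.
$deleted->delete_tag( 3, 4, "x" );
my @after_delete = map { $deleted->get_tag_at( $_, "x" ) } 1, 4, 8;
```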

 merge_tags

       $st->merge_tags( $eqsub )

    Merge neighbouring or overlapping tags of the same name and equal
    values.

    For each pair of tags of the same name that apply on neighbouring or
    overlapping extents, the $eqsub callback is called, as

      $equal = $eqsub->( $name, $value_a, $value_b )

    If this function returns true then the tags are merged.

    The equality test function is free to perform any comparison of the
    values that may be relevant to the application; for example it may
    deeply compare referred structures and check for equivalence in some
    application-defined manner. When two tags are merged, the first tag of
    the pair is retained and the second is deleted. This may be relevant if
    the tag value is a reference to some object.
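
    As a sketch, the two neighbouring type tags from the DESCRIPTION can be
    merged with a simple string comparison. The counting callbacks are only
    there to show the change in the number of extents.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "A string with words" );
$st->apply_tag( 0,  9, type => "message" );   # "A string "
$st->apply_tag( 9, 10, type => "message" );   # "with words"

# Count the non-overlapping extents before merging.
my $before = 0;
$st->iter_tags_nooverlap( sub { $before++ } );

# Treat tags as equal when their values compare equal as strings.
$st->merge_tags( sub {
   my ( $name, $value_a, $value_b ) = @_;
   return $value_a eq $value_b;
} );

my $after = 0;
$st->iter_tags_nooverlap( sub { $after++ } );
```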

 iter_extents

       $st->iter_extents( $callback, %opts )

    Iterate the tags stored in the string. For each tag, the CODE reference
    in $callback is invoked once, being passed an extent object that
    represents the extent of the tag.

     $callback->( $extent, $tagname, $tagvalue )

    Options passed in %opts may include:

    start => INT

      Start at the given position; defaults to 0.

    end => INT

      End after the given position; defaults to end of string. This option
      overrides len.

    len => INT

      End after the given length beyond the start position; defaults to end
      of string. This option only applies if end is not given.

    only => ARRAY

      Select only the tags named in the given ARRAY reference.

    except => ARRAY

      Select all the tags except those named in the given ARRAY reference.
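
    A sketch of iter_extents with the only filter, collecting a one-line
    description of each tag that survives. The tag names and the format of
    the collected strings are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag(  3, 9, bold   => 1 );   # "important"
$st->apply_tag( 13, 7, italic => 1 );   # "message"

# Visit only the bold tags; each call receives an extent object.
my @seen;
$st->iter_extents(
   sub {
      my ( $extent, $tagname, $tagvalue ) = @_;
      push @seen, sprintf '%s=%s at [%d,%d)',
         $tagname, $tagvalue, $extent->start, $extent->end;
   },
   only => [ "bold" ],
);
```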

 iter_tags

       $st->iter_tags( $callback, %opts )

    Iterate the tags stored in the string. For each tag, the CODE reference
    in $callback is invoked once, being passed the start point and length
    of the tag.

     $callback->( $start, $length, $tagname, $tagvalue )

    Options passed in %opts are the same as for iter_extents.

 iter_extents_nooverlap

       $st->iter_extents_nooverlap( $callback, %opts )

    Iterate non-overlapping extents of tags stored in the string. The CODE
    reference in $callback is invoked for each extent in the string where
    no tags change. The entire set of tags active in that extent is given
    to the callback. Because the extent covers possibly-multiple tags, it
    will not define the anchor_before and anchor_after flags.

     $callback->( $extent, %tags )

    The callback will be invoked over the entire length of the string,
    including any extents with no tags applied.

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.

    If the only or except filters are applied, then only the tags that
    survive filtering will be present in the %tags hash. Tags that are
    excluded by the filtering will not be present, nor will their bounds be
    used to split the string into extents.

 iter_tags_nooverlap

       $st->iter_tags_nooverlap( $callback, %opts )

    Iterate extents of the string using iter_extents_nooverlap, but passing
    the start and length of each extent to the callback instead of the
    extent object.

     $callback->( $start, $length, %tags )

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.
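
    A sketch of the non-overlapping iteration, using the three tags from the
    DESCRIPTION diagram. Each callback receives one run of characters over
    which the set of active tags does not change; the @segments list is an
    illustrative way of collecting them.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "Here is my string with tags" );
$st->apply_tag( 0, $st->length, foo => 1 );
$st->apply_tag( 8, 9,           foo => 2 );   # "my string"
$st->apply_tag( 5, 5,           bar => 3 );   # "is my"

# Record each run as [ start, length, { active tags } ].
my @segments;
$st->iter_tags_nooverlap( sub {
   my ( $start, $length, %tags ) = @_;
   push @segments, [ $start, $length, { %tags } ];
} );

# The runs should tile the whole string without gaps or overlaps.
my $covered = 0;
$covered += $_->[1] for @segments;
```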

 iter_substr_nooverlap

       $st->iter_substr_nooverlap( $callback, %opts )

    Iterate extents of the string using iter_extents_nooverlap, but passing
    the substring of data instead of the extent object.

     $callback->( $substr, %tags )

    Options may be passed in %opts to control the range of the string
    iterated over, in the same way as the iter_extents method.

 tagnames

       @names = $st->tagnames

    Returns the set of tag names used in the string, in no particular
    order.

 get_tags_at

       $tags = $st->get_tags_at( $pos )

    Returns a HASH reference of all the tag values active at the given
    position.

 get_tag_at

       $value = $st->get_tag_at( $pos, $name )

    Returns the value of the named tag at the given position, or undef if
    the tag is not applied there.

 get_tag_extent

       $extent = $st->get_tag_extent( $pos, $name )

    If the named tag applies to the given position, returns the extent of
    the tag at that position. If it does not, undef is returned. If an
    extent is returned it will define the anchor_before and anchor_after
    flags if appropriate.

 get_tag_missing_extent

       $extent = $st->get_tag_missing_extent( $pos, $name )

    If the named tag does not apply at the given position, returns the
    extent of the string around that position that does not have the tag.
    If it does exist, undef is returned. If an extent is returned it will
    not define the anchor_before and anchor_after flags, as these do not
    make sense for the range in which a tag is absent.
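
    The two extent queries can be sketched together: one asks where a tag
    applies, the other where it is absent. Variable names are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag( 3, 9, bold => 1 );   # "important"

# Position 5 is inside the bold tag.
my $present = $st->get_tag_extent( 5, "bold" );
my @present_range = ( $present->start, $present->end );
my $present_text  = $present->plain_substr;

# Position 15 is after it, so ask for the untagged range instead.
my $none    = $st->get_tag_extent( 15, "bold" );
my $missing = $st->get_tag_missing_extent( 15, "bold" );
my @missing_range = ( $missing->start, $missing->end );
```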

 set_substr

       $st->set_substr( $start, $len, $newstr )

    Modifies an extent of the underlying plain string to that given. The
    extents of tags in the string are adjusted to cope with the modified
    region, and the adjustment in length.

    Tags entirely before the replaced extent remain unchanged.

    Tags entirely within the replaced extent are deleted.

    Tags entirely after the replaced extent are moved by the appropriate
    amount to ensure they still apply to the same characters as before.

    Tags that start before and end after the extent remain, and have their
    lengths suitably adjusted.

    Tags that span just the start or end of the extent, but not both, are
    truncated so as to remove the part of the tag applied to the modified
    extent while preserving the part applied outside it.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. Edge-anchored tags in $newstr will not be extended
    through $st, though they will apply as edge-anchored if they now sit at
    the edge of the new string.
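
    A sketch of the "entirely after" rule: replacing text at the start of
    the string with something longer should shift later tags so that they
    still cover the same words. Tag names are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "An important message" );
$st->apply_tag(  3, 9, bold   => 1 );   # "important"
$st->apply_tag( 13, 7, italic => 1 );   # "message"

# Replace "An" with "One"; the string grows by one character.
$st->set_substr( 0, 2, "One" );

my $bold   = $st->get_tag_extent(  4, "bold" );
my $italic = $st->get_tag_extent( 14, "italic" );

my $bold_text   = $bold->plain_substr;
my $italic_text = $italic->plain_substr;
```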

 insert

       $st->insert( $start, $newstr )

    Insert the given string at the given position. A shortcut around
    set_substr.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. If $start is 0, any before-anchored tags in $newstr
    will become before-anchored in $st.

 append

       $st->append( $newstr )
    
       $st .= $newstr

    Append to the underlying plain string. A shortcut around set_substr.

    If $newstr is a String::Tagged object, then its tags will be applied to
    $st as appropriate. Any after-anchored tags in $newstr will become
    after-anchored in $st.

 append_tagged

       $st->append_tagged( $newstr, %tags )

    Append to the underlying plain string, and apply the given tags to the
    newly-inserted extent.

    Returns $st itself so that the method may be easily chained.

 concat

       $ret = $st->concat( $other )
    
       $ret = $st . $other

    Returns a new String::Tagged containing the two strings concatenated
    together, preserving any tags present. This method overloads the normal
    string concatenation operator, so expressions involving String::Tagged
    values retain their tags.

    This method or operator tries to respect subclassing; preferring to
    return a new object of a subclass if either argument or operand is a
    subclass of String::Tagged. If they are both subclasses, it will prefer
    the type of the invocant or first operand.
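
    A sketch combining append_tagged, append and the overloaded
    concatenation operator. The tag names and the text are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

# Build "Hello, world!" with italic applied to the appended ", world".
my $st = String::Tagged->new( "Hello" );
$st->append_tagged( ", world", italic => 1 );
$st->append( "!" );

# Concatenate with a tagged prefix; both operands keep their tags.
my $prefix = String::Tagged->new_tagged( "Note: ", bold => 1 );
my $both   = $prefix . $st;

my $text            = $both->str;
my $bold_at_start   = $both->get_tag_at(  0, "bold" );
my $bold_at_hello   = $both->get_tag_at(  6, "bold" );     # "H" of "Hello"
my $italic_at_comma = $both->get_tag_at( 11, "italic" );   # the ","
```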

 matches

       @subs = $st->matches( $regexp )

    Returns a list of substrings (as String::Tagged instances) for every
    non-overlapping match of the given $regexp.

    This could be used, for example, to build a formatted string from a
    formatted template containing variable expansions:

     my $template = ...
     my %vars = ...
    
     my $ret = String::Tagged->new;
     foreach my $m ( $template->matches( qr/\$\w+|[^$]+/ ) ) {
        if( $m =~ m/^\$(\w+)$/ ) {
           $ret->append_tagged( $vars{$1}, %{ $m->get_tags_at( 0 ) } );
        }
        else {
           $ret->append( $m );
        }
     }

    This iterates over segments of the template, finding variable
    expansions that start with a $ symbol and replacing them with values
    from the %vars hash, taking care to preserve all the formatting tags
    from the original template string.

 split

       @parts = $st->split( $regexp, $limit )

    Returns a list of substrings by applying the regexp to the string
    content; similar to the core perl split function. If $limit is
    supplied, the method will stop at that number of elements, returning
    the entire remainder of the input string as the final element. If the
    $regexp contains a capture group then the content of the first one will
    be added to the return list as well.
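
    A sketch of split on a comma-separated string in which one field
    carries a tag. The field contents and the bold tag are illustrative.

```perl
use strict;
use warnings;
use String::Tagged;

my $st = String::Tagged->new( "one,two,three" );
$st->apply_tag( 4, 3, bold => 1 );   # "two"

# Each part is itself a String::Tagged, so tags travel with the text.
my @parts = $st->split( qr/,/ );

my @texts       = map { $_->str } @parts;
my $bold_in_two = $parts[1]->get_tag_at( 0, "bold" );
my $bold_in_one = $parts[0]->get_tag_at( 0, "bold" );
```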

 sprintf

       $ret = $st->sprintf( @args )

    Since version 0.15.

    Returns a new string by using the given instance as the format string
    for a "from_sprintf" constructor call. The returned instance will be of
    the same class as the invocant.

 debug_sprintf

       $ret = $st->debug_sprintf

    Returns a representation of the string data and all the tags, suitable
    for debug printing or other similar use. The format is similar to the
    diagrams given in the DESCRIPTION section above.

    The output will consist of a number of lines, the first containing the
    plain underlying string, then one line per tag. The line shows the
    extent of the tag given by [---] markers, or a | in the special case of
    a tag covering only a single character. Special markings of < and >
    indicate tags which are "before" or "after" anchored.

    For example:

      Hello, world
      [---]         word       => 1
     <[----------]> everywhere => 1
            |       space      => 1

Extent Objects

    These objects represent a range of characters within the containing
    String::Tagged object. The range they represent is fixed at the time of
    creation. If the containing string is modified by a call to set_substr
    then the effect on the extent object is not defined. These objects
    should be considered as relatively short-lived - used briefly for the
    purpose of querying the result of an operation, then discarded soon
    after.

 $extent->string

    Returns the containing String::Tagged object.

 $extent->start

    Returns the start index of the extent. This is the index of the first
    character within the extent.

 $extent->end

    Returns the end index of the extent. This is the index of the first
    character beyond the end of the extent.

 $extent->anchor_before

    True if this extent begins "before" the start of the string. Only
    certain methods return extents with this flag defined.

 $extent->anchor_after

    True if this extent ends "after" the end of the string. Only certain
    methods return extents with this flag defined.

 $extent->length

    Returns the number of characters within the extent.

 $extent->substr

    Returns the substring contained by the extent.

 $extent->plain_substr

    Returns the substring of the underlying plain string buffer contained
    by the extent.

TODO

      * There are likely variations on the rules for set_substr that could
      equally apply to some uses of tagged strings. Consider whether the
      behaviour of modification is chosen per-method, per-tag, or
      per-string.

      * Consider how to implement a clone from one tag format to another
      which wants to merge multiple different source tags together into a
      single new one.

AUTHOR

    Paul Evans <leonerd@leonerd.org.uk>