1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384
|
package VCP::Filter::stringedit;
=head1 NAME
VCP::Filter::stringedit - alter any field character by character
=head1 SYNOPSIS
StringEdit:
## Convert illegal p4 characters to ^NN hex escapes and the
## p4 wildcard "..." to a safe string. The "^" is not an illegal
## char, it's replaced with an escape to allow us to use it as
## an escape character without the (extremely small) risk of
## running across a file name that actually uses it.
## Order is significant in this ruleset.
# field(s) match replacement
name,labels /([\s@#*%^])/ ^%02x
name,labels "..." ^___
StringEdit:
## underscorify each unwanted character to a single "_"
name,labels /[\s@#*%^]/ _
StringEdit:
## underscorify each run of unwanted characters to a single "_"
name,labels /[\s@#*%^]*/ _
StringEdit:
## prefix labels that don't start with a letter or underscore:
labels /([^a-zA-Z_])/ _%c
=head1 DESCRIPTION
Allows field by field string editing, using Perl regular expressions
to match characters and substrings and sprintf-like replacement
strings.
=head2 Rules
A rule is a triplet of expressions specifying a (1) set of fields to match,
(2) a pattern to match against those fields' contents (matching contents
are removed), and (3) a string to replace each of the removed bits
with.
NOTE 1: the "match" expression uses perl5 regular expressions, not
filename wildcards used in most other places in VCP configurations.
The list of rules is evaluated top down and all rules are applied to
each string.
NOTE 2: The all-rules-apply nature of this filter is different from the
behaviors of the ...Map: filters, which stop after the first matching
rule. This is because ...Map: filters are rewriting entire strings and
there can be only one result string, while the StringEdit filter may be
rewriting pieces of string and multiple rewrites may be combined to good
effect.
=head2 The Fields List
A comma separated list of field names. Any field may be edited except
those that begin with "source_".
=head2 The Match Expression
For each field, the match expression is run against the field and, if it
matches, causes all matching portions of string to be replaced.
The match expression is a full perl5 regular expression enclosed in
/.../ delimiters or a plain string, either of which may be enclosed in
'' or "" delimiters if inline spaces are needed (rare, we hope).
=head2 The Replacement Expression
Each match is replaced by one instance of the replacement expression,
optionally enclosed in single or double quotation marks.
The replacement expression provides a limited list of C sprintf style
macros:
%d The decimal codes for each character in the match
%o The octal codes for each character in the match
%x The hex codes for each character in the match
Any non-letter preceded by a backslash "\" character is replaced by
itself. Some more or less useful examples:
\% \\ \" \' \` \{ \} \$ \* \+ \? \1
If a punctuation character other than a period (.) or slash "/" follows
a letter macro, it must be escaped using the backslash character (this
is to reserve room in the spec for postfix modifiers like "*", "+", and
"?"). So, to put a literal star (*) after a hex code, you would do
something like "%02x\*".
=for the_future
%x* %x{1} %x{1,} %x{,3} %x{1,3}
The "normal" perl5 letter abbreviations are also allowed:
\t tab (HT, TAB)
\n newline (NL)
\r return (CR)
\f form feed (FF)
\b backspace (BS)
\a alarm (bell) (BEL)
\e escape (ESC)
\033 octal char (ESC)
\x1b hex char (ESC)
\x{263a} wide hex char (SMILEY)
\c[ control char (ESC)
\N{name} named Unicode character
including the following escape sequences are available in constructs
that modify what follows:
\l lowercase next char
\u uppercase next char
\L lowercase till \E
\U uppercase till \E
\E end case modification
\Q quote non-word characters till \E
As shown above, normal sprintf-style options may be included (and are
recommended), so %02x produces results like "%09" (if the match was a
single TAB character) or "%20" (if the match was a SPACE character).
The dot precision modifiers (".3") are not supported, just the leading 0
and the field width specifier.
=head2 Case sensitivity
By default, all patterns are case sensitive. There is no way to
override this at present; one will be added.
=head2 Command Line Parsing
For large stringedits or repeated use, the stringedit is best specified
in a .vcp file. For quick one-offs or scripted situations, however, the
stringedit: scheme may be used on the command line. In this case, each
parameter is a "word" and every triple of words is a ( pattern, result )
pair.
Because L<vcp|vcp> command line parsing is performed incrementally and
the next filter or destination specifications can look exactly like a
pattern or result, the special token "--" is used to terminate the list
of patterns if StringEdit: is used on the command line. This may also
be the last word in the C<StringEdit:> section of a .vcp file, but that
is superfluous. It is an error to use "--" before the last word in a
.vcp file.
=for test_script t/61stringedit.t
=cut
$VERSION = 1 ;
@ISA = qw( VCP::Filter );
use strict ;
use VCP::Logger qw( lg );
use VCP::Debug qw( :debug );
use VCP::Utils qw( empty );
use VCP::Filter;
use VCP::Rev;
#use base qw( VCP::Filter );
#use fields (
# 'MAP_SUB', ## The rules to apply, compiled in to an anon sub
#);
sub _err {
my $replacement_expr = pop;
my $msg = join "", @_;
$msg =~ s/\s*\z/ /;
die $msg, "StringEdit replacement expression '", $replacement_expr, "'\n";
}
sub _compile_replacement_expr {
my $self = shift;
my ( $replacement_expr ) = @_;
local $_ = $replacement_expr;
my @setup;
my @out;
my $match_number = 1;
my $out_number = 1;
while ( /\G([^\\%]+|[\\%])/g ) {
if ( $1 eq "\\" ) {
if ( /\G([a-zA-Z])/gc ) {
push @out, qq{\\$1};
next;
}
goto LITERAL if /\G(.)/gc;
_err "lone backslash at end of ", $replacement_expr;
}
elsif ( $1 eq "%" ) {
/\G(\d*.)/g
or _err "lone % at end of ", $replacement_expr;
my $op = $1;
_err "unknown macro '%$op' in", $replacement_expr
unless $op =~ /[cdosx]\z/;
my $ord = $op !~ /s\z/ ? " ord" : "";
push @setup,
"my \$_$out_number = sprintf( '%$op',$ord \$$match_number );";
push @out, "\${_$out_number}";
++$match_number;
++$out_number;
}
else {
LITERAL:
my $s = $1;
$s =~ s/\$/\\\$/g;
$s =~ s/\@/\\\@/g;
$s =~ s/'/\\'/g;
$s =~ s/"/\\"/g;
push @out, $s;
}
}
my $out = join "", '"', @out, '"';
return @setup
? join "", map "$_\n", @setup, $out
: $out;
}
sub _compile_rule {
my $self = shift;
my ( $name, $fields, $pattern, $replacement ) = @_;
my @fields = split /\s*,\s*/, $fields;
die "no fields specified in stringedit rule $name\n"
unless @fields;
die "unkown field name '$_' in StringEdit field list '$fields'\n"
for grep !VCP::Rev->can( $_ ), @fields;
my ( $q1, $guts, $q2 ) = $pattern =~ /\A(\/|)(.*)(\1)\z/;
my $replacement_code = $self->_compile_replacement_expr( $replacement );
map {
my $field = $_;
my $code = join( "",
$q1 eq "/"
? "s{$guts}"
: "s{" . quotemeta( $guts ) . "}",
"{$replacement_code}msge;\n"
);
{
field => $field,
code => $code,
}
} @fields;
}
sub _compile_rules {
my $self = shift;
my ( $rules ) = @_;
## NOTE: making this a closure causes spurious warnings at exit so
## we pass $self explicitly.
my $preamble = <<END_PREAMBLE;
my ( \$self, \$rev ) = \@_;
END_PREAMBLE
$preamble .= qq{lg( "stringedit processing ", \$rev->as_string );\n\n}
if debugging;
my $rule_number = 0;
my @rules = map
$self->_compile_rule( "Rule " . ++$rule_number, @$_ ),
@$rules;
my @fields = do {
my %seen;
sort grep !$seen{$_}++, map $_->{field}, @rules;
};
my $code = join( "",
$preamble,
( map
{
my $field = $_;
my $is_array = $field eq "labels";
$is_array
? (
"\$rev->set_$field( [ map {\n",
map( $_->{code}, grep $_->{field} eq $field, @rules ),
" \$_;\n} \$rev->$field ] );\n",
)
: (
"{\n",
" local \$_ = \$rev->$field;\n",
" \$_ = '' unless defined;\n",
" my \$changed;\n",
map( " \$changed = 1 if " . $_->{code}, grep $_->{field} eq $field, @rules ),
" \$rev->set_$field( \$_ ) if \$changed;\n",
"}\n",
)
} @fields
),
"\$self->dest->handle_rev( \$rev ) if \$self->dest;\n",
);
$code =~ s/^/ /mg;
$code = "#line 1 VCP::Filter::stringedit::stringedit_function\n$code";
$code = "sub {\n$code}";
debug "stringedit code:\n$code" if debugging;
return( eval $code
or die "$@ compiling\n",
do {
my $w = length( $code =~ tr/\n// + 1 ) ;
my $ln;
1 while chomp $code;
$code =~ s{^}[sprintf "%${w}d|",++$ln]gme;
"$code\n";
},
);
}
sub new {
my $self = shift->SUPER::new;
## Parse the options
my ( $spec, $options ) = @_ ;
$self->{MAP_SUB} = $self->_compile_rules(
$self->parse_rules_list( $options, "Field(s)", "Match", "Replacement" )
);
return $self ;
}
sub filter_name { return "StringEdit" }
sub handle_rev {
my $self = shift;
$self->{MAP_SUB}->( $self, @_ );
}
=head1 LIMITATIONS
There is no way (yet) of telling the stringeditor to continue processing the
rules list. We could implement labels like C< <<I<label>>> > to be
allowed before pattern expressions (but not between pattern and result),
and we could then impelement C< <<goto I<label>>> >. And a C< <<next>>
> could be used to fall through to the next label. All of which is
wonderful, but I want to gain some real world experience with the
current system and find a use case for gotos and fallthroughs before I
implement them. This comment is here to solicit feedback :).
=head1 AUTHOR
Barrie Slaymaker <barries@slaysys.com>
=head1 COPYRIGHT
Copyright (c) 2000, 2001, 2002 Perforce Software, Inc.
All rights reserved.
See L<VCP::License|VCP::License> (C<vcp help license>) for the terms of use.
=cut
1
|