1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585
|
package VCP::Filter::map;
=head1 NAME
VCP::Filter::map - rewrite name, branch_id or delete revisions
=head1 SYNOPSIS
## In a .vcp file:
Map:
name_glob_1<branch_1> name_out_1<branch_result_1>
name_glob_2<branch_2> name_out_2<branch_result_2>
# ... etc ...
## From the command line:
vcp <source> map: name_glob_1<branch_1> name_out_1<branch_result_1> -- <dest>
## you may have one or more ( pattern match ) pairs on the command
## line, ending with --
## the <branch> part of the maps is optional.
=head1 DESCRIPTION
Maps source files, revisions, and branches to destination files and
branches while copying a repository. This is done by rewriting the
C<name> and C<branch_id> of revisions according to a list of rules.
=head2 Rules
A rule is a pair of expressions specifying a pattern to match against
each incoming revision's name and branch_id and a replacement expression
specifying the revision's new name and branch_id.
The list of rules is evaluated top down; the first rule in the list
that matches is used to generate the new name and branch_id. If no
other rules match the implicit default rule is to copy files as is.
=head2 Patterns and Replacement Expressions
Patterns and replacements are each are composed of two subexpressions, the
C<name_expr> and the C<branch_id_expr> like so:
name_expr<branch_id_expr>
The C<< <branch_id_expr> >> (including angle brackets) is optional and may
be forbidden by some sources or destinations that embed the concept of a
branch in the name_expr. (See L<VCP::Dest::p4|VCP::Dest::p4> for an
example, though this may be changed in the future).
For now, the symbols C<#> and C<@> are reserved for future used in all
expressions and must be escaped using C<\>, and various shell-like
wildcards are implemented in pattern expressions.
=head1 Pattern Expressions
Both the C<name_expr> and C<branch_id_expr> specify patterns using
shell regular expression syntax with the extension that parentheses
are used to extract portions of the match in to numbered variables
which may be used in the result construction, like Perl regular
expressions:
? Matches one character other than "/"
* Matches zero or more characters other than "/"
... Matches zero or more characters, including "/"
(foo) Matches "foo" and stores it in the $1, $2, etc
Some example pattern C<name_expr>s are:
Pattern
name_expr Matches
========= =======
foo the top level file "foo"
foo/bar the file "foo/bar"
... all files (like a missing name_expr)
foo/... all files under "foo/"
.../bar all files named "bar" anywhere
*/bar all files named "bar" one dir down
....pm all files ending in ".pm"
?.pm all top level 4 char files ending in ".pm"
\?.pm the top level file "?.pm"
(*)/... all files in subdirs, puts the top level dirname in $1
Unix-style slashes are used, even on operating systems where that may
not be the preferred local custom. A pattern consisting of the empty
string is legal and matches everything (NOTE: currently there is no way
to take advantage of this; quoting is not implemented in the forms
parser yet. use "..." instead).
Relative paths are taken relative to the rev_root indicated in the
source specification for pattern C<name_expr>s (or in the destination
specification for result C<name_expr>s). For now, a relative path is a
path that does not begin with the character C</>, so be aware that the
pattern C<(/)> is relative. This is a limitation of the implementation
and may change, until it does, don't rely on a leading "(" making a path
relative and use multiple rules to match multiple absolute paths.
If no C<name_expr> is provided, C<...> is assumed and the pattern will
match on all filenames.
Some example pattern C<branch_id_expr>s are:
Pattern
branch_id_expr Matches files on
============= ================
<> no branch label
<...> all branches (like a missing <branch_id_expr>)
<foo> branch "foo"
<R...> branches beginning with "R"
<R(...)> branches beginning with "R", the other chars in $1
If no C<branch_id_expr> is provided, files on all branches are matched.
C<*> and C<...> still match differently in pattern C<branch_id_expr>s, as
in <name_expr> patterns, but this is likely to make no difference, as
I've not yet seen a branch label with a "/" in it. Still, it is wise
to avoid "*" in C<branch_id_expr> patterns.
Some example composite patterns are (any $ variables set
are given in parenthesis):
Pattern Matches
======= =======
foo<> top level files named "foo" not on a branch
(...)<> all files not on a branch ($1)
(...)/(...)<> all files not on a branch ($1,$2)
...<R1> all files on branch "R1"
.../foo<R...> all files "foo" on branches beginning with "R"
(...)/foo<R(...)> all files "foo" on branches beginning with "R" ($1, $2)
=head2 Escaping
Null characters and newlines are forbidden in all expressions.
The characters C<#>, C<@>, C<[>, C<]>, C<{>, C<}>, C<E<gt>>, C<E<lt>>
and C<$> must be escaped using a C<\>, as must any wildcard characters
meant to be taken literally.
In result expressions, the wildcard characters C<*>, C<?>, the wildcard
trigraph C<...> and parentheses must each be escaped with single C<\> as
well.
No other characters are to be escaped.
=head2 Case sensitivity
By default, all patterns are case sensitive. There is no way to
override this at present; one will be added.
=head2 Result Expressions
Result expressions look a lot like patthern expressions except that
wildcards are not allowed and C<$1> and C<${1}> style variable
interpolation is.
To explore result expressions, let's look at converting set of example
files between cvs and p4 repositories. The difficulty here is that cvs
and p4 have differing branching implementations.
Let's assume our CVS repository has a module named C<flibble> with a
file named C<foo/bar> in it. Here is a branch diagram, with the main
development trunk shown down the left (C<1.1> through C<1.6>, etc) and
a single branch, tagged in CVS with a branch tag of C<beta_1>, is
shown forking off version C<1.5>:
flibble/foo/bar:
1.1
|
...
|
1.5
| \
| \ beta_1
| \
1.6 \
| 1.5.2.1
... |
|
1.5.2.2
|
...
NOTE 1: You can use C<vcp> to extract graphical branch diagrams by
installing AT&T's GraphViz package and the Perl CPAN module
GraphViz.pm. Then you can use a command like:
$ vcp cvs:/var/cvsroot:flibble/foo/bar \
branch_diagram:foo_bar.png
to generate a .png file showing something like the above diagram.
On the other hand, p4 users typically branch files using directory
names. Here's file C<foo/bar> again, with the main trunk held in the main
depot's //depot/main directory, again with a branch after the 5th
version of the file, but this time, the branch is represented by taking
a copy
//depot/main/foo/bar
#1
|
...
|
#5
|\
| \ //depot/beta_1/foo/bar
| \
#6 \
| #1
... |
|
#2
|
...
NOTE 2: the p4 command allows users to branch in very crafty and
creative ways; it does not enforce the semantic of 1 branch per
directory, and this gives p4 users a lot of power and flexibility.
It also means that you might need some pretty crafty and creative
branch maps when converting from p4 to other repositories.
NOTE 3: that branch looks like a copy, but is actually just a
metadata entry in the perforce repository, so it's very low
overhead in terms of server effort and disk space, usually
even more so than CVS branches.
NOTE 4: Using GraphViz (as described in NOTE 1 above), you can
build a diagram like this using vcp:
$ vcp p4:perforce.our.com:1666://depot/flibble/foo/bar \
branch_diagram:foo_bar.png
A user may or may not choose to label a branch in p4 with something
called a "branch specification" (see "p4 help branch" for details). For
this discussion, we'll assume they didn't.
First, let's look at cvs -> p4 conversion. To do this, we need to
match the branch tags in the CVS repository and use them to map branched
files in to a p4 subdirectory. Here's .vcp file for this:
## cvs2p4.vcp
Source:
# get all files in the flibble module from cvs
cvs:/var/cvsroot:flibble/...
Destination:
# Put the files in the flibble directory in the main depot of p4
p4:perforce.our.com:1666://depot/flibble/...
Map:
# Pattern Result
# ============ =======
(...)<> main/$1 # main trunk => //depot/flibble/main/...
(...)<(...)> $2/$1 # branches => //depot/flibble/$branch/...
The C<Source:> and C<Destination:> fields are just pieces of a normal
C<vcp> command line moved in to C<cvs2p4.vcp>. The C<Map:> field is a
list of rules composed of pattern, result expression pairs.
In this example, all of the map expressions are relative paths. The
patterns are relative to the C<Source:> cvs repositories' "C<flibble>"
module. The results are relative to the C<Destination:> p4
repositories' "C<//depot/flibble/>" directory.
The first rule maps all files that have no branch tag in to the p4
directory C<//depot/flibble/main/>. The C<< (...)<> >> pattern has two
parts: a C<name> part and a C<branch_id> part. The C<name> part,
C<(...)>, matches all path names and copies them to the C<$1> variable.
The C<branch_id> part, C< <> >, matches empty / missing C<branch_id>s
(C<vcp>'s name for the CVS branch tag associated with a file on a
branch). The C< main/$1 > result retrieves the C<name> part stored in
C<$1> and prefixes it with "C<main/>" to build the final C<name> value.
The second rule maps all files on branches to an appropriately named
subdirectory in the p4 destination. The pattern is a lot like the first
rule's, but has a C<branch_id> part that matches all C<branch_id>s and
copies them in to C<$2>. The rule merely uses this C<branch_id> from
C<$2> instead of the hardcoded "C<main/>" string to place the branches
in appropriate subdirectories.
Here's how our flibble/foo/bar file version fare when passed through
this mapping:
CVS flibble/... p4 //depot/flibble/...
======================== ======================
foo/bar#1.1 main/foo/bar#1
foo/bar#1.2 main/foo/bar#2
... ...
foo/bar#1.5.2.1 beta_1/foo/bar#1
foo/bar#1.5.2.2 beta_1/foo/bar#2
... ...
It's up to you to be sure there are no branches tagged "C<main>" in the
CVS repository. Also, no branch specification will be created in the
target p4 repository (this is a limitation that should be fixed).
=head2 Result Actions: <<delete>> and <<keep>>
The result expression C<< <<delete>> >> indicates to delete the revision,
while the result expression C<< <<keep>> >> indicates to pass it through
unchanged:
Map:
# Pattern Result
# ================= ==========
old_stuff/... <<delete>> # Delete all files in /old
old_stuff/.../*.c <<keep>> # except these
<<delete>> and <<keep>> may not appear in results; they are
standalone tokens.
=head2 The default rule
There is a default rule
... <<keep>> ## Default rule: passes everything through as-is
that is evaluated after all the other rules. Thus, if no other rule
matches a revision, it is passed through unchanged.
=head2 Command Line Parsing
For large maps or repeated use, the map is best specified in a .vcp
file. For quick one-offs or scripted situations, however, the map:
scheme may be used on the command line. In this case, each parameter
is a "word" (separated by whitespace) and every pair of words is a
( pattern, result ) pair.
Because L<vcp|vcp> command line parsing is performed incrementally and
the next filter or destination specifications can look exactly like
a pattern or result, the special token "--" is used to terminate the
list of patterns provided on the command line. This may also
be the last word in the C<Map:> section of a .vcp file, but that is
superfluous. It is an error to use "--" before the last word in a .vcp
file.
=for test_script t/61map.t
=cut
$VERSION = 1 ;
@ISA = qw( VCP::Filter );
use strict ;
use VCP::Logger qw( lg );
use VCP::Debug qw( :debug );
use VCP::Utils qw( shell_quote );
use VCP::Filter;
use Regexp::Shellish qw( compile_shellish );
#use base qw( VCP::Filter );
#use fields (
# 'MAP_SUB', ## The rules to apply, compiled in to an anon sub
#);
my @expr_order = qw( name branch_id );
sub _parse_expr {
my ( $type, $v ) = @_;
my %exprs;
return () unless defined $v;
if ( $type eq "result" ) {
return ( delete => 1, %exprs ) if $v eq "<<delete>>";
return ( keep => 1, %exprs ) if $v eq "<<keep>>";
}
@exprs{@expr_order} =
$v =~ m{
\A
(?:(
(?: \\. | [^<\\] )+ ## name
))?
(?:
<(
.* ## branch_id
)>
)?
\z
}x;
die "unable to parse map $type '$v'\n"
unless grep defined, values %exprs;
for ( @expr_order ) {
next unless defined $exprs{$_};
die "newline in the $_ expression '$exprs{$_}' of map $type '$v'\n"
if $exprs{$_} =~ tr/\n//;
die "unescaped '$1' in the $_ expression '$exprs{$_}' of map $type '$v'\n"
if $exprs{$_} =~
( $type eq "pattern"
? qr{(?<!\\)(?:\\\\)*([\@#<>\[\]{}\$])}
: qr{(?<!\\)(?:\\\\)*([\@#<>\[\]*?()]|\.\.\.)|(?<!\$)\{}
);
die "illegal escape sequence '$1' in the $_ expression '$exprs{$_}' of map $type '$v'\n"
if $exprs{$_} =~ qr{(?<!\\)(?:\\\\)*(\\(?!=\.\.\.)[^\@#<>\[\]{}*?()])};
}
return %exprs;
}
sub _compile_rule {
my $self = shift;
my ( $name, $pattern, $result ) = @_;
my %pattern_exprs = _parse_expr pattern => $pattern;
my %result_exprs = _parse_expr result => $result;
## The test expression is a single regexp that matches a string
## built up from some pieces of the rev metadata. Right now, only
## the name and the branch_id are tested, by someday the labels,
## change_id, rev_id, and comment could be tested. If so, the
## comment field would need to come last due to newline issues.
my $test_expr =
! keys %pattern_exprs
? 1 ## This happens iff the pattern was undef (which
## should only happen for the default rule).
: join(
"",
"m'\\A", ## Note the single-quotish context
join(
"\\n", ## Newlines are forbidden in all expressions.
map defined $_
? do {
my $re = compile_shellish( $_, { anchors => 0 } );
$re =~ s{(')}{\\`}g;
$re =~ s{\A\(\?[\w-]*: (.*) \)}{$1}gx; # for readability
# of dumped code
$re;
}
: ".*",
@pattern_exprs{@expr_order}
),
"\\z'",
);
$pattern = defined $pattern ? qq{"$pattern"} : "match all";
my $result_statement = join(
"",
debugging()
? qq{lg( ' matched $name ($pattern)' );\n}
: (),
$result_exprs{keep}
? (
debugging()
? qq{lg( " <<keep>>ing" );\n}
: (),
"return \$self->SUPER::handle_rev( \$rev );\n"
)
: $result_exprs{delete}
? (
debugging()
? qq{lg( " <<delete>>ing" );\n}
: (),
"return \$self->skip_rev;\n"
)
: (
map(
defined $result_exprs{$_}
? do {
my $expr = $result_exprs{$_};
$expr =~ s{([\\"])}{\\$1}g;
$expr =~ s{\n}{\\n}g;
(
debugging()
? qq{lg( " rewriting $_ to '$expr'" );\n}
: (),
qq{\$rev->$_( "$expr" );\n}
)
}
: (
debugging()
? qq{lg( " not rewriting $_" );\n}
: (),
),
@expr_order
),
"return \$self->dest->handle_rev( \$rev );\n"
)
);
$result_statement =~ s/^/ /gm;
"if ( $test_expr ) {\n$result_statement}\n";
}
sub _compile_rules {
my $self = shift;
my ( $rules ) = @_;
my $field_get_exprs = join ", ", map qq{\$rev->$_ || ""}, @expr_order;
## NOTE: making this a closure causes spurious warnings at exit so
## we pass $self explicitly.
my $preamble = <<END_PREAMBLE;
my ( \$self, \$rev ) = \@_;
local \$_ = join "\\n", $field_get_exprs;
END_PREAMBLE
$preamble .= qq{my \$s = \$_; \$s =~ s/\\n/\\\\n/g; lg( "map testing '\$s' (", \$rev->as_string, ")" );\n\n}
if debugging;
my $rule_number;
my $code = join "",
$preamble,
map $self->_compile_rule( @$_ ),
map( [ "Rule " . ++$rule_number, @$_ ], @$rules ),
[ "Default Rule", undef, "<<keep>>" ];
$code =~ s/^/ /mg;
$code = "#line 1 VCP::Filter::map::map_function\n$code";
$code = "sub {\n$code}";
debug "map code:\n$code" if debugging;
return( eval $code
or die "$@ compiling\n",
do {
my $w = length( $code =~ tr/\n// + 1 ) ;
my $ln;
1 while chomp $code;
$code =~ s{^}[sprintf "%${w}d|",++$ln]gme;
"$code\n";
},
);
}
sub new {
my $self = shift->SUPER::new;
my ( $spec, $options ) = @_ ;
$self->{MAP_SUB} = $self->_compile_rules(
$self->parse_rules_list( $options, "Pattern", "Replacement" )
);
return $self ;
}
sub handle_rev {
my $self = shift;
$self->{MAP_SUB}->( $self, @_ );
}
=head1 LIMITATIONS
There is no way (yet) of telling the mapper to continue processing the
rules list. We could implement labels like C<< <label> >> to be
allowed before pattern expressions (but not between pattern and result),
and we could then impelement C<< <goto label> >>. And a C<< <next> >>
could be used to fall through to the next label. All of which is
wonderful, but I want to gain some real world experience with the
current system and find a use case for gotos and fallthroughs before I
implement them. This comment is here to solicit feedback :).
=head1 AUTHOR
Barrie Slaymaker <barries@slaysys.com>
=head1 COPYRIGHT
Copyright (c) 2000, 2001, 2002 Perforce Software, Inc.
All rights reserved.
See L<VCP::License|VCP::License> (C<vcp help license>) for the terms of use.
=cut
1
|