'\" t
.\"     Title: repocutter
.\"    Author: [see the "AUTHOR(S)" section]
.\" Generator: Asciidoctor 2.0.16
.\"      Date: 2023-02-28
.\"    Manual: \ \&
.\"    Source: \ \&
.\"  Language: English
.\"
.TH "REPOCUTTER" "1" "2023-02-28" "\ \&" "\ \&"
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.ss \n[.ss] 0
.nh
.ad l
.de URL
\fI\\$2\fP <\\$1>\\$3
..
.als MTO URL
.if \n[.g] \{\
.  mso www.tmac
.  am URL
.    ad l
.  .
.  am MTO
.    ad l
.  .
.  LINKSTYLE blue R < >
.\}
.SH "NAME"
repocutter \- surgical and filtering operations on Subversion dump files
.SH "SYNOPSIS"
.sp
\fBrepocutter\fP [\-q] [\-d n] [\-i \*(Aqfilename\*(Aq] [\-r \*(Aqselection\*(Aq] \*(Aqsubcommand\*(Aq
.SH "DESCRIPTION"
.sp
This program does surgical and filtering operations on Subversion dump
files.  While it is not as flexible as reposurgeon(1), it can
perform Subversion\-specific transformations that reposurgeon cannot,
and can be useful for processing Subversion repositories into a form
suitable for conversion. Also, it supports the version 3 dumpfile
format, which reposurgeon does not.
.sp
In most commands, the \-r (or \-\-range) option limits the selection of
revisions over which an operation will be performed. Usually other
revisions will be passed through unaltered, except in the select and
deselect commands for which the option controls which revisions will be
passed through. A selection consists of one or more comma\-separated
ranges. A range may consist of an integer revision number or the
special name HEAD for the head revision. Or it may be a
colon\-separated pair of integers, or an integer followed by a colon
followed by HEAD.
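.sp
As an illustration of the selection syntax described above, the
following invocations are a sketch only; the file names are
hypothetical, and svn:keywords merely stands in for whatever property
you want to delete:
.sp
.if n .RS 4
.nf
.fam C
repocutter \-r 5 see <repo.dump
repocutter \-r 10:20 select <repo.dump >slice.dump
repocutter \-r 1:7,42:HEAD propdel svn:keywords <repo.dump >cleaned.dump
.fam
.fi
.if n .RE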
.sp
If the output stream contains copyfrom references to missing revisions,
repocutter silently patches each copy source by stepping it backwards to
the most recent previous revision that exists.
.sp
(Older versions of this tool, before 4.30, treated \-r as an implied
selection filter rather than passing through unselected revisions
unaltered. If you have old scripts using repocutter they may need
modification.)
.sp
Normally, each subcommand produces a progress spinner on standard
error; each turn means another revision has been filtered. The \-q (or
\-\-quiet) option suppresses this. Quiet mode is set when output is
redirected to a file or pipe.
.sp
The \-d option enables debug messages on standard error. It takes an
integer debug level. These messages are probably only of interest to
repocutter developers.
.sp
The \-i option sets the input source to a specified filename.
This is primarily useful when running the program under a debugger.
When this option is not present the program expects to read a
stream from standard input.
.sp
Generally, if you need to use this program at all, you will find that
you need to pipe your dump file through multiple instances of it doing
one kind of operation each.  This is not as expensive as it sounds;
with the exception of the reduce subcommand, the working set of this
program is bounded by the size of the largest single blob plus its
metadata.  It does not need to hold the entire repo metadata in
memory.
.sp
The \-f/\-fixed option disables regexp compilation of PATTERN arguments,
treating them as literal strings.
.sp
The \-t option sets a tag to be included in error and warning messages.
This will be useful for determining which stage of a multistage
repocutter pipeline failed.
.sp
There are a few other command\-specific options described under
individual commands.
.sp
In the command descriptions, PATTERN arguments are regular expressions
to match pathnames, constrained so that each match must be a path
segment or a sequence of path segments; that is, the left end must be
either at the start of path or immediately following a /, and the
right end must precede a / or be at end of string.  With a leading ^
the match is constrained to be a leading sequence of the pathname;
with a trailing $, a trailing one.
.sp
The following subcommands are available:
.sp
select
.RS 4
The \*(Aqselect\*(Aq subcommand selects a range and permits only revisions and
nodes in that range to pass to standard output.  A range beginning with 0
includes the dumpfile header. Mergeinfo properties in all revisions are
updated so they no longer refer to omitted revisions.
.sp
Warning: this subcommand is not guaranteed to produce a valid dump that can be read by reposurgeon. In particular, it may delete
a revision that is referenced in a later copy\-from operation, which will
crash reposurgeon.
.RE
.sp
deselect
.RS 4
The \*(Aqdeselect\*(Aq subcommand selects a range and permits only revisions and nodes
NOT in that range to pass to standard output.  Any mergeinfo properties in other
revisions are updated so they no longer refer to dropped revisions.
.sp
Warning: this subcommand is not guaranteed to produce a valid dump that can be read by reposurgeon. In particular, it may delete
a revision that is referenced in a later copy\-from operation, which will
crash reposurgeon.
.RE
.sp
see
.RS 4
Render a very condensed report on the repository node structure, mainly
useful for examining strange and pathological repositories.  File content
is ignored.  You get one line per repository operation, reporting the
revision, operation type, file path, and the copy source (if any).
Directory paths are distinguished by a trailing slash.  The \*(Aqcopy\*(Aq
operation is really an \*(Aqadd\*(Aq with a directory source and target;
the display name is changed to make them easier to see. This report
can be restricted by a selection set.
.RE
.sp
renumber
.RS 4
Renumber all revisions, patching Node\-copyfrom headers as required.
Any selection option is ignored. Takes no arguments.  The \-b option
can be used to set the base to renumber from, defaulting to 0.
.RE
.sp
count
.RS 4
The \*(Aqcount\*(Aq subcommand lists the last revision number in the input stream.
This is normally the revision count, but may not be if the stream has omitted
revisions.
.RE
.sp
log
.RS 4
Generate a log report, in the same format as the output of svn log on a
repository, to standard output.
.RE
.sp
setlog
.RS 4
Replace the log entries in the input dumpfile with the corresponding entries
in the LOGFILE, which should be in the format of an svn log output.
Replacements may be restricted to a specified range.
.RE
.sp
propdel
.RS 4
Delete the property PROPNAME. May be restricted by a revision
selection. You may specify multiple properties to be deleted.
.RE
.sp
proprename
.RS 4
Rename the property OLDNAME to NEWNAME. May be restricted by a
revision selection. You may specify multiple properties to be renamed.
.RE
.sp
propset
.RS 4
Set the property PROPNAME to PROPVAL.
.sp
May be restricted by a revision selection. Note that specifying only a revision
will cause the property to be set on the revision properties and on all nodes
in the revision; you\(cqll probably want to specify a node index.
.sp
You may specify multiple property settings.
.RE
.sp
propclean
.RS 4
Every path with a suffix matching one of SUFFIXES gets a property turned
off.  The default property is svn:executable. Another property may be set with the \-p option.
.RE
.sp
expunge
.RS 4
Delete all operations with Node\-path or Node\-copyfrom\-path headers matching
specified Golang regular expressions (opposite of \*(Aqsift\*(Aq).  Any revision
left with no Node records after this filtering has its Revision record dropped as
well. Mergeinfo properties in all revisions are updated so they no longer refer
to dropped revisions.
.sp
Warning: this subcommand is not guaranteed to produce a valid dump that can be read by reposurgeon. In particular, it may delete
a revision that is referenced in a later copy\-from operation, which will
crash reposurgeon.
.RE
.sp
sift
.RS 4
Delete all operations with either Node\-path or Node\-copyfrom\-path headers \fBnot\fP
matching specified Golang regular expressions (opposite of \*(Aqexpunge\*(Aq).
Any revision left with no Node records after this filtering has its Revision record
removed as well. Mergeinfo properties in all revisions are updated so they no longer refer
to dropped revisions.
.sp
This transform can be restricted by a selection set.
.sp
Warning: this subcommand is not guaranteed to produce a valid dump that can be read by reposurgeon. In particular, it may delete
a revision that is referenced in a later copy\-from operation, which will
crash reposurgeon.
.RE
.sp
closure
.RS 4
The \*(Aqclosure\*(Aq subcommand computes the transitive closure of a path set under the
relation \*(Aqcopies from\*(Aq \- that is, the path set extended by the smallest set of additional paths such
that every copy\-from source is in the set.
.RE
.sp
pathlist
.RS 4
List all distinct node\-paths in the stream, once each, in the order first
encountered.
.RE
.sp
pathrename
.RS 4
Modify Node\-path headers, Node\-copyfrom\-path headers, and
svn:mergeinfo properties matching the specified Golang regular
expression FROM; replace with TO.  TO may contain Golang\-style
backreferences (${1}, ${2} etc \- curly brackets not optional) to
parenthesized portions of FROM.
.sp
Matches are constrained so that each match must be a path segment or a
sequence of path segments; that is, the left end must be either at the
start of path or immediately following a /, and the right end must
precede a / or be at end of string.  With a leading ^ the match is
constrained to be a leading sequence of the pathname; with a trailing
$, a trailing one.
.sp
Multiple FROM/TO pairs may be specified and are applied in order.
This transform can be restricted by a selection set.
.sp
All mergeinfo properties are updated in accordance with the path renames.
.RE
.sp
setpath
.RS 4
In the specified revisions, replace the Node\-path with the specified PATH.
Does not alter mergeinfo properties as a side effect.
.RE
.sp
setcopyfrom
.RS 4
In the specified revisions, replace the Node\-copyfrom\-path with the specified PATH.
Does not alter mergeinfo properties as a side effect.  Terminates with error if any
selected node is not a copy.
.RE
.sp
pop
.RS 4
Pop initial segment off each path matching PATTERN \- by default, all paths.
.sp
May be useful after a sift command to turn a dump from a subproject
stripped from a dump for a multiple\-project repository into the normal
form with trunk/tags/branches at the top level.
.sp
This transform cannot be restricted by a selection set, as it is not possible to guarantee
that copyfrom paths and mergeinfo properties will be modified consistently in the presence of
that kind of restriction.
.sp
Mergeinfo properties in all revisions are updated, as well as path and copyfrom parts.
.RE
.sp
push
.RS 4
Push an initial segment onto each matching path. Normally used to add a
"trunk" prefix to every path in a flat repository.  The \-s option can be used
to set a different initial segment.
.sp
This transform cannot be restricted by a selection set, as it is not
possible to guarantee that copyfrom paths and mergeinfo properties will
be modified consistently in the presence of that kind of restriction.
.sp
Mergeinfo properties in all revisions are updated to refer to the
new pathnames.
.RE
.sp
filecopy
.RS 4
For each node in the revision range, stash the current version of the
node\-path\(cqs content.  For each later file copy operation with that source,
replace the file copy with an explicit add/change using the stashed content.
.sp
You can use this operation to sever links from obsolete branches
or non\-conformable directories in a multiproject repository so the
unwanted content can be expunged without changing the content of later
revisions.
.sp
If a PATTERN argument is provided, only replace copies with an explicit
add/change when the source node path matches PATTERN.
.sp
With the \-n flag, only the basename is required to match PATTERN if it is
provided. Otherwise, with \-n and no PATTERN, require a match of source to
target on basename only rather than the full path. This may be required
in order to extract filecopies from branches.
.sp
Restricting the range holds down the memory requirement of this tool,
which in the worst (and default) 1:$ case will keep a copy of every blob
in the repository until it\(cqs done processing the stream.
.RE
.sp
skipcopy
.RS 4
Replace the source revision and path of a copy at the upper end of the selection
with the source revision and path of a copy at the lower end. Fails unless both
revisions are copies.  Used to remove an unwanted intermediate copy or
copies, cleaning up the history.
.RE
.sp
swap
.RS 4
Swap the top two elements of each pathname in every revision in the
selection set. Useful following a sift operation for straightening out
a common form of multi\-project repository.  If a PATTERN argument is given,
only paths matching it are swapped.
.RE
.sp
swapsvn
.RS 4
Like swap, but is aware of Subversion structure.  Used for transforming
multiproject repositories into a standard layout with trunk, tags, and
branches at the top level.
.sp
Fires when the second component of a matching path is "trunk", "branches",
or "tags", or the path consists of a single segment that is a top\-level
project directory; paths for which this is not so are passed through unaltered.
.sp
Top\-level project directories with properties or comments make this command
die (return status 1) with an error message on stderr; otherwise these
directories are silently discarded.
.sp
Otherwise, swaps "trunk" and the top\-level (project) directory
straight up.  For tags and branches, the following \fBtwo\fP components
are swapped to the top.  Thus, "foo/branches/release23" becomes
"branches/release23/foo", putting the project directory beneath the
branch.
.sp
Also fires when an entire project directory is copied; this is transformed
into a copy of trunk and copies of each subbranch and tag that exists.
.sp
After the swap, there are attempts to recognize spans of copies
into branch directories, and copies into tag subdirectories that are
parallel in all top\-level (project) directories. These are coalesced
into single copies in the inverted structure.  No attempt is made
to coalesce deletes; the user must manually trim unneeded branches.
.sp
Accordingly, copies with three\-segment sources and three\-segment
targets are transformed; for tags/ and branches/ paths the last
segment (the subdirectory below the branch name) is dropped. Following
copies are skipped.
.sp
This has two minor negative consequences. One is that metadata
belonging to all deletes or copies after the first one in a coalesced
span is lost.  The other is that branches and tags local to
individual project directories are promoted to global branches and
tags across the entire transformed repository; no content is lost this
way.
.sp
Parallel rename sequences are also coalesced.
.sp
If a PATTERN argument is given, only paths matching the pattern are swapped.
.sp
Note that the result of swapping does not have initial trunk/branches/tags
directory creations and can thus not be fed directly to svnload. reposurgeon
copes with this, but Subversion will not.
.sp
Mergeinfo properties are updated to use the swapped path names.
.sp
This transform can be restricted by a selection set.
.RE
.sp
swapcheck
.RS 4
List directory prefixes of anomalous paths that would confuse swapsvn. This includes
any single\-segment path other than trunk/tags/branches or a project copy operation,
any path with two or more segments in which the second is not trunk/tags/branches, and
any path in which trunk/tags/branches occurs more than one segment down from the root.
.sp
Each report line has two fields; the first is the earliest revision containing
a path with the prefix given, and the second is the prefix.  Once a particular path
prefix has been recognized and reported as anomalous, later paths with that prefix
are not reported.
.sp
If feeding a Subversion dump to this subcommand doesn\(cqt produce an empty report,
you can expect swapsvn to produce an invalid dump that will confuse and possibly
crash reposurgeon. The remedy for this is a set of pathrenames and/or deselections
that yields paths conformable to being swapped into a regular Subversion structure.
.RE
.sp
replace
.RS 4
Perform a regular expression search/replace on blob content. The first
character of the argument (normally /) is treated as the end delimiter
for the regular\-expression and replacement parts. This transform can be
restricted by a selection set.
.RE
.sp
strip
.RS 4
Replace content with unique generated cookies on all node paths matching
the specified regular expressions; if no expressions are given, match all
paths.
.sp
This command is useful for reducing the bulk of a stream without touching
its metadata, so you can do test conversions more quickly.
.RE
.sp
hash
.RS 4
Replace content with hash on all node paths matching the specified regular
expressions; if no expressions are given, match all paths.
.RE
.sp
obscure
.RS 4
Replace path segments and committer IDs with arbitrary but consistent
names in order to obscure them. The replacement algorithm is tuned to
make the replacements readily distinguishable by eyeball.  This
transform can be restricted by a selection set.
.RE
.sp
reduce
.RS 4
Strip revisions out of a dump so the only parts left are those likely to
be relevant to a conversion problem. This is done by dropping every
node that consists of a change on a file and has no property settings.
Mergeinfo properties in all revisions are updated so they no longer refer
to dropped revisions.
.RE
.sp
testify
.RS 4
Replace commit timestamps with a monotonically increasing clock tick
starting at the Unix epoch and advancing by 10 seconds per commit.
Replace all attributions with \*(Aqfred\*(Aq.  Discard the repository UUID.
Use this to neutralize procedurally\-generated streams so they can be
compared. This transform can be restricted by a selection set.
.RE
.sp
debug
.RS 4
Set the debug level to the specified value on the selected revisions.
Setting debugging enables diagnostics to standard error, and suppresses
the progress baton for the entire run in order not to step on any
diagnostics that might be emitted.
.sp
For the meaning of the debug levels, see the source code.  This option
is probably only of interest to repocutter developers.
.RE
.sp
version
.RS 4
Report major and minor repocutter version.
.RE
.SH "HISTORY"
.sp
Under the name "svncutter", an ancestor of this program traveled in
the \*(Aqcontrib/\*(Aq directory of the Subversion
distribution. It had functional overlap with reposurgeon(1) because it
was directly ancestral to that code. It was moved to the
reposurgeon(1) distribution in January 2016.  This program was ported
from Python to Go in August 2018, at which time the obsolete "squash"
command was retired.  The syntax of regular expressions in the
pathrename command changed at that time.
.sp
The reason for the partial functional overlap between repocutter and
reposurgeon is that repocutter was first written earlier and became a
testbed for some of the design concepts in reposurgeon. After
reposurgeon was written, the author learned that it could not
naturally support some useful operations very specific to Subversion,
and enhanced repocutter to do those.
.SH "RETURN VALUES"
.sp
Normally 0. Can be 1 if repocutter sees an ill\-formed dump, or if the
output stream contains any copyfrom references to missing revisions.
.SH "BUGS"
.sp
There is one regression since the Python version: repocutter no
longer recognizes Macintosh\-style line endings consisting of a carriage
return only. This may be addressed in a future version.
.SH "SEE ALSO"
.sp
reposurgeon(1).
.SH "EXAMPLE"
.sp
Suppose you have a Subversion repository with the following
semi\-pathological structure:
.sp
.if n .RS 4
.nf
.fam C
Directory1/ (with unrelated content)
Directory2/ (with unrelated content)
TheDirIWantToMigrate/
                branches/
                               crazy\-feature/
                                               UnrelatedApp1/
                                               TheAppIWantToMigrate/
                tags/
                               v1.001/
                                               UnrelatedApp1/
                                               UnrelatedApp2/
                                               TheAppIWantToMigrate/
                trunk/
                               UnrelatedApp1/
                               UnrelatedApp2/
                               TheAppIWantToMigrate/
.fam
.fi
.if n .RE
.sp
You want to transform the dump file so that TheAppIWantToMigrate can be
subject to a regular branchy lift. A way to dissect out the code of
interest would be with the following series of filters applied:
.sp
.if n .RS 4
.nf
.fam C
repocutter expunge \*(Aq^Directory1\*(Aq \*(Aq^Directory2\*(Aq
repocutter pathrename \*(Aq^TheDirIWantToMigrate/\*(Aq \*(Aq\*(Aq
repocutter expunge \*(Aq^branches/crazy\-feature/UnrelatedApp1/\*(Aq
repocutter pathrename \*(Aqbranches/crazy\-feature/TheAppIWantToMigrate/\*(Aq \*(Aqbranches/crazy\-feature/\*(Aq
repocutter expunge \*(Aq^tags/v1.001/UnrelatedApp1/\*(Aq
repocutter expunge \*(Aq^tags/v1.001/UnrelatedApp2/\*(Aq
repocutter pathrename \*(Aq^tags/v1.001/TheAppIWantToMigrate/\*(Aq \*(Aqtags/v1.001/\*(Aq
repocutter expunge \*(Aq^trunk/UnrelatedApp1/\*(Aq
repocutter expunge \*(Aq^trunk/UnrelatedApp2/\*(Aq
repocutter pathrename \*(Aq^trunk/TheAppIWantToMigrate/\*(Aq \*(Aqtrunk/\*(Aq
.fam
.fi
.if n .RE
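.sp
Because repocutter reads a dump on standard input and writes the
result to standard output, such filters would normally be chained in a
shell pipeline.  A sketch using the first three stages above, with
hypothetical file names input.dump and output.dump:
.sp
.if n .RS 4
.nf
.fam C
repocutter expunge \*(Aq^Directory1\*(Aq \*(Aq^Directory2\*(Aq <input.dump |
    repocutter pathrename \*(Aq^TheDirIWantToMigrate/\*(Aq \*(Aq\*(Aq |
    repocutter expunge \*(Aq^branches/crazy\-feature/UnrelatedApp1/\*(Aq >output.dump
.fam
.fi
.if n .RE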
.SH "LIMITATIONS"
.sp
The sift and expunge operations can produce output dumps that are
invalid.  The problem is copyfrom operations (Subversion branch and
tag creations).  If an included revision includes a copyfrom reference
to an excluded one, the reference target won\(cqt be in the emitted dump;
it won\(cqt load correctly in Subversion, and while reposurgeon has
fallback logic that backs down to the latest existing revision before
the missing one, this expedient is fragile. The revision number in a
copyfrom header pointing to a missing revision will be zero. Attempts
to be clever about this won\(cqt work; the problem is inherent in the
data model of Subversion.
.SH "AUTHOR"
.sp
Eric S. Raymond \c
.MTO "esr\(atthyrsus.com" "" "."
This tool is
distributed with reposurgeon; see the
.URL "http://www.catb.org/~esr/reposurgeon" "project page" "."