File: symbol-notes.txt

This is a description of how symbols (tags and branches) are handled
by cvs2svn, determined by reading the code.


Notation
========

    CVSFile -- a single file within the CVS repository.  This object
        basically only records the filename of the corresponding RCS
        file, and the relative filename that this file will have
        within the SVN repository.  A single CVSFile object is used
        for all of the CVSItems on all lines of development related to
        that file.


The following terms and the corresponding classes represent
project-wide concepts.  For example, a project will only have a single
Branch named "foo" even if many files appear on that branch.  Each of
these objects is assigned a unique integer ID during CollectRevsPass
which is preserved during the entire conversion (even if, say, a
Branch is mutated into a Tag).

    Trunk -- the main line of development for a particular Project in
        CVS.  The Trunk class inherits from LineOfDevelopment.

    Symbol -- a Branch or a Tag within a particular Project (see
        below).  Instances of this class are also used to represent
        symbols early in the conversion, before it has been decided
        whether to convert the symbol as a Branch or as a Tag.  A
        Symbol contains an id, a Project, and a name.

    Branch -- a symbol within a particular Project that will be
        treated as a branch in SVN.  Usually corresponds to a branch
        tag in CVS, but might be a non-branch tag that was mutated in
        CollateSymbolsPass.  In SVN, this will correspond to a
        subdirectory of the project's "branches" directory.  The
        Branch class inherits from Symbol and from LineOfDevelopment.

    Tag -- a symbol within a particular Project that will be treated
        as a tag in SVN.  Usually corresponds to a non-branch tag in
        CVS, but might be a branch tag that was mutated in
        CollateSymbolsPass.  In SVN, this will correspond to a
        subdirectory of the project's "tags" directory.  The Tag
        class inherits from Symbol and from LineOfDevelopment.

    ExcludedSymbol -- a CVS symbol that will be excluded from the
        cvs2svn output.

    LineOfDevelopment -- a Trunk, Branch, or Tag.
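
The relationships described above can be sketched roughly as follows.
This is a simplified illustration, not the actual cvs2svn source: the
real classes carry more fields and may use intermediate base classes.
The mutate() helper and the assumption that ExcludedSymbol derives
from Symbol are hypothetical and exist only for this example.

```python
class LineOfDevelopment:
    """A Trunk, Branch, or Tag."""


class Symbol:
    """A named symbol in a project whose type is not yet decided."""

    def __init__(self, id, project, name):
        self.id = id
        self.project = project
        self.name = name


class Trunk(LineOfDevelopment):
    def __init__(self, id, project):
        self.id = id
        self.project = project


class Branch(Symbol, LineOfDevelopment):
    pass


class Tag(Symbol, LineOfDevelopment):
    pass


class ExcludedSymbol(Symbol):
    # Assumption: an excluded symbol is not a line of development.
    pass


def mutate(symbol, new_type):
    """Hypothetical helper: retype a symbol while keeping its id."""
    return new_type(symbol.id, symbol.project, symbol.name)
```

Because the id survives the mutation, anything that refers to the
symbol by id remains valid regardless of how the symbol ends up being
converted.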


The following terms and the corresponding classes represent particular
CVS events in particular CVS files.  For example, the CVSBranch
representing the creation of Branch "foo" in one file will be distinct
from the CVSBranch representing the creation of Branch "foo" in
another file, even if the two files are in the same Project.  Each
CVSItem is assigned a unique integer ID during CollectRevsPass which
is preserved during the entire conversion (even if, say, a CVSBranch
is mutated into a CVSTag).

    CVSItem -- abstract base class representing any discernible event
        within a single RCS file, for example the creation of revision
        1.6, or the tagging of the file with tag "bar".  Each CVSItem
        has a unique integer ID.

    CVSRevision -- a particular revision within a particular file
        (e.g., file.txt:1.6).  A CVSRevision occurs on a particular
        LineOfDevelopment.  CVSRevision inherits from CVSItem.

    CVSSymbol -- a CVSBranch or CVSTag (see below).  CVSSymbol
        inherits from CVSItem.

    CVSBranch -- the creation of a particular Branch on a particular
        file.  A CVSBranch holds a reference to the Symbol associated
        with the branch, and also records the LineOfDevelopment from
        which the branch was created.  In the SVN repository, a
        CVSBranch corresponds to an "svn copy" of a file to a
        subdirectory of the project's "branches" directory.  CVSBranch
        inherits from CVSSymbol.

    CVSTag -- the creation of a particular Tag on a particular file.
        A CVSTag holds a reference to the Symbol associated with the
        tag, and also records the LineOfDevelopment from which the tag
        was created.  In the SVN repository, a CVSTag corresponds to
        an "svn copy" of a file to a subdirectory of the project's
        "tags" directory.  CVSTag inherits from CVSSymbol.


CollectRevsPass
===============

Collect all information about CVS tags and branches from the CVS
repository.

For each project, create a Trunk object to represent the trunk line of
development for that project.  The Trunk object for one Project is
distinct from the Trunk objects for other Projects.  For each symbol
name seen in each project, create a Symbol object.  The Symbol object
contains its id, project, and name.

The very first thing that is done when a symbol is read is that the
Project's symbol transform rules are given a chance to transform the
symbol name (or even cause it to be discarded).  The result of the
transformation is used as the symbol name in the rest of the program.
Because this transformation process is so low-level, it is capable of
making a more fundamental kind of change than the symbol strategy
rules that come later:

  * Symbols can be renamed.

  * Symbols can be fully discarded, as if they never appeared in the
    CVS repository.  This can even be done for a malformed symbol or
    for a branch symbol that refers to the same branch as another
    branch symbol (which would otherwise be a fatal error).

  * Two distinct symbols in different files within the same project
    can be transformed to the same name, in which case they are
    treated as a single symbol.

  * Two distinct symbols within a single file can be transformed to
    the same name, provided they refer to the same revision number.
    This effectively discards one of the symbols.

  * Two symbols with the same name in different files can be given
    distinct names, in which case they are treated as completely
    separate symbols.
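
As an illustration of the renaming and discarding cases, a symbol
transform can be thought of as a function from a symbol name to a new
name, or to None when the symbol should be discarded.  The sketch
below is an assumption-laden example and not the cvs2svn API: the
helper names make_rename_rule, make_discard_rule, and transform_symbol
are invented here.

```python
import re


def make_rename_rule(pattern, replacement):
    """Hypothetical rule: rewrite names matching a regular expression."""
    regexp = re.compile(pattern)

    def rule(name):
        return regexp.sub(replacement, name)

    return rule


def make_discard_rule(pattern):
    """Hypothetical rule: discard names that fully match the pattern."""
    regexp = re.compile(pattern)

    def rule(name):
        return None if regexp.fullmatch(name) else name

    return rule


def transform_symbol(rules, name):
    """Apply each rule in order; None means the symbol is discarded."""
    for rule in rules:
        name = rule(name)
        if name is None:
            return None
    return name


rules = [
    make_discard_rule(r'tmp-.*'),
    make_rename_rule(r'^release-(\d+)_(\d+)$', r'release-\1.\2'),
]
```

With these rules, transform_symbol(rules, 'release-1_0') yields
'release-1.0', while any name starting with 'tmp-' is dropped before
the rest of the conversion ever sees it.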

For each Symbol object, collect the following statistics:

  * In how many files was the symbol used as a branch and in how many
    was it used as a tag.

  * In how many files was there a commit on a branch with that name.

  * Which other symbols branched off of a branch with that name.

  * In how many files could each other line of development have served
    as the source of this symbol.  These are called the "possible
    parents" of the symbol.

These statistics are used in CollateSymbolsPass to determine which
symbols can be excluded or converted from tags to branches or vice
versa.
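
A minimal sketch of such a per-symbol statistics record is shown
below.  The class name, field names, and method names are assumptions
made for this example.  The two checks encode the rules stated in the
CollateSymbolsPass section: a branch can become a tag only if it never
had commits, and a symbol can be excluded only if every symbol that
depends on it is excluded too.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SymbolStats:
    tag_file_count: int = 0           # files using the symbol as a tag
    branch_file_count: int = 0        # files using it as a branch
    branch_commit_file_count: int = 0 # files with commits on the branch
    dependents: set = field(default_factory=set)  # symbols branched off it
    possible_parents: Counter = field(default_factory=Counter)

    def record_file(self, is_branch, has_commits, children, parents):
        """Fold the observations from one file into the totals."""
        if is_branch:
            self.branch_file_count += 1
            if has_commits:
                self.branch_commit_file_count += 1
            self.dependents.update(children)
        else:
            self.tag_file_count += 1
        # Each possible parent is counted once per file.
        self.possible_parents.update(set(parents))

    def can_be_tag(self):
        return self.branch_commit_file_count == 0

    def can_be_excluded(self, excluded_names):
        return self.dependents <= set(excluded_names)
```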

The possible parents information is important because CVS is ambiguous
about what line of development was the source of a branch.  A branch
numbered 1.3.6 might have been created from trunk (revision 1.3), from
branch 1.3.2, or from branch 1.3.4; it is simply impossible to tell
based on the information in a single RCS file.

[Actually, the situation is even more confusing.  If a branch tag is
deleted from CVS, the branch number is recycled.  So it is even
possible that branch 1.3.6 was created from branch 1.3.8 or 1.3.10 or
...  We address this confusion by noting the order in which the
branches are listed in the RCS file header.  It appears that CVS lists
branches in the header in reverse chronological order of creation.]
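
The following sketch shows one way the candidate parents of a branch
could be enumerated from a single file.  It assumes, as suggested
above, that the header lists branches newest first, so only sibling
branches listed later in the header (that is, created earlier) are
treated as candidates.  The function names are hypothetical and plain
branch numbers such as '1.3.6' are used for readability.

```python
def line_of_development(revision):
    """Return 'trunk' or the branch number that a revision lives on."""
    parts = revision.split('.')
    return 'trunk' if len(parts) == 2 else '.'.join(parts[:-1])


def possible_parents(branch_number, header_order):
    """List the lines of development that might be the branch's source.

    header_order is the list of branch numbers in the order they
    appear in the RCS header (assumed to be newest first).
    """
    # Revision from which the branch sprouts, e.g. '1.3' for '1.3.6'.
    sprout = branch_number.rsplit('.', 1)[0]
    parents = [line_of_development(sprout)]
    position = header_order.index(branch_number)
    for other in header_order[position + 1:]:
        # Older sibling branches sprouting from the same revision.
        if other.rsplit('.', 1)[0] == sprout:
            parents.append(other)
    return parents


header = ['1.3.6', '1.3.4', '1.3.2', '1.2.2']
candidates = possible_parents('1.3.6', header)
```

Here candidates contains trunk plus the branches 1.3.4 and 1.3.2.  If
the header order were ['1.3.6', '1.3.8'], branch 1.3.8 would also be
reported as a possible parent of 1.3.6, which covers the recycled
branch number case.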

For each tag seen within each file, create a CVSTag object recording
its id, CVSFile, Symbol, and the id of the CVSRevision being tagged.

For each branch seen within each file, create a CVSBranch object
recording an id, CVSFile, Symbol, the branch number (e.g., '1.4.2'),
the id of the CVSRevision from which the branch sprouts, and the id of
the first CVSRevision on the branch (if any).

For each revision seen within each file, create a CVSRevision object
recording (among other things) an id, the line of development (trunk
or branch) on which the revision appeared, a list of ids of CVSTags
tagging the revision, and a list of ids of CVSBranches sprouting from
the revision.
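
The records described in the last three paragraphs could be modelled
along the following lines.  This is only a sketch: the field names are
assumptions, symbols and lines of development are referenced by id for
brevity, and the real classes store more information.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CVSTag:
    id: int
    cvs_file: str
    symbol_id: int
    source_id: int  # id of the CVSRevision being tagged


@dataclass
class CVSBranch:
    id: int
    cvs_file: str
    symbol_id: int
    branch_number: str              # e.g. '1.4.2'
    source_id: int                  # CVSRevision the branch sprouts from
    next_id: Optional[int] = None   # first CVSRevision on the branch


@dataclass
class CVSRevision:
    id: int
    cvs_file: str
    lod_id: int                     # trunk or branch the revision is on
    tag_ids: List[int] = field(default_factory=list)
    branch_ids: List[int] = field(default_factory=list)


# Revision 1.4 of one file, tagged once and branched once.
rev = CVSRevision(id=100, cvs_file='file.txt,v', lod_id=1)
tag = CVSTag(id=101, cvs_file='file.txt,v', symbol_id=5, source_id=rev.id)
branch = CVSBranch(id=102, cvs_file='file.txt,v', symbol_id=6,
                   branch_number='1.4.2', source_id=rev.id)
rev.tag_ids.append(tag.id)
rev.branch_ids.append(branch.id)
```

Note that the cross-references are ids rather than object references,
which matches the description above of recording "the id of" related
items.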

This pass also adjusts the CVS dependency tree to work around some CVS
quirks.  (See design-notes.txt for the details.)  These adjustments
can result in CVSBranches being deleted, for example, if a file was
added on a branch.  In such a case, any CVSRevisions that were
previously on the branch will be created by adding the file to the
branch directory, rather than copying the file from the source
directory to the branch directory.


CleanMetadataPass
=================

N/A


CollateSymbolsPass
==================

Allow the project's symbol strategy rules to affect how symbols are
converted:

  * A symbol can be excluded from the conversion output (as long as
    there are no other non-excluded symbols that depend on it).  In
    this case, the Symbol will be converted into an ExcludedSymbol
    instance.

  * A tag symbol can be converted as a branch.  In this case, the
    Symbol will be converted into a Branch instance.

  * A branch symbol can be converted as a tag, provided there were
    never any commits on the branch.  In this case, the Symbol will be
    converted into a Tag instance.

  * The SVN path where a symbol will be placed is determined.
    Typically, symbols are laid out in the standard
    trunk/branches/tags Subversion repository layout, but strategy
    rules can in fact place symbols arbitrarily.

  * The preferred parent of each symbol is determined.  The preferred
    parent of a Symbol is chosen to be the line of development that
    appeared as a possible parent of this symbol in the most CVSFiles.
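
The preferred-parent rule amounts to picking the most frequent entry
in the possible-parents statistics.  A minimal sketch, assuming the
counts are available as a mapping from line-of-development name to the
number of files (the function name is hypothetical, and ties are
resolved arbitrarily here):

```python
from collections import Counter


def preferred_parent(possible_parent_counts):
    """Return the possible parent seen in the most files, or None."""
    if not possible_parent_counts:
        return None
    return max(possible_parent_counts, key=possible_parent_counts.get)


counts = Counter({'trunk': 12, 'BRANCH_A': 40, 'BRANCH_B': 3})
choice = preferred_parent(counts)
```

In this example the symbol could have sprouted from BRANCH_A in 40
files, so BRANCH_A is chosen as its preferred parent.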

This pass creates the symbol database, SYMBOL_DB, which is accessed in
later passes via the SymbolDatabase class.  The SymbolDatabase
contains TypedSymbol (Branch, Tag, or ExcludedSymbol) instances
indicating how each symbol should be processed in the conversion.  The
ID used for a TypedSymbol is the same as the ID allocated to the
corresponding symbol in CollectRevsPass, so references in CVSItems do
not have to be updated.


FilterSymbolsPass
=================

Iterate through all of the CVSItems, mutating CVSTags to CVSBranches
and vice versa and excluding other CVSSymbols as specified by the
types of the TypedSymbols in the SymbolDatabase.  Additionally, filter
out any CVSRevisions that reside on excluded CVSBranches.

Write a line of text to CVS_SYMBOLS_DATAFILE for each surviving
CVSSymbol, containing its Symbol id and a pickled version of the
CVSSymbol.  (This file will be sorted in SortSymbolsPass then used in
InitializeChangesetsPass to create SymbolChangesets.)

Also adjust the file's dependency tree by grafting CVSSymbols onto
their preferred parents.  This is not always possible; if not, leave
the CVSSymbol where it was.

Finally, record symbol "openings" and "closings".  A CVSSymbol is
considered "opened" by the CVSRevision or CVSBranch from which the
CVSSymbol sprouts.  A CVSSymbol is considered "closed" by the
CVSRevision that overwrites or deletes the CVSSymbol's opening.
(Every CVSSymbol has an opening, but not all of them have closings;
for example, the opening CVSRevision might still exist at HEAD.)
Record in each CVSRevision and CVSBranch a list of all of the
CVSSymbols that it opens.  Record in each CVSRevision a list of all of
the CVSSymbols that it closes (CVSBranches cannot close CVSSymbols).
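
For a single file and a single line of development, the bookkeeping
can be sketched as below.  This is a simplification under stated
assumptions: only CVSRevisions act as openers, revisions are given in
commit order, and the closing of a symbol is taken to be the next
revision on the same line of development.  The function name and data
layout are invented for the example.

```python
def openings_and_closings(revisions, sprouts):
    """Compute which revisions open and close which symbols.

    revisions: revision ids on one line of development, in order.
    sprouts:   dict mapping a symbol id to the revision id from which
               the symbol sprouts.
    """
    successor = dict(zip(revisions, revisions[1:]))
    opened_by = {rev: [] for rev in revisions}
    closed_by = {rev: [] for rev in revisions}
    for symbol_id, rev in sprouts.items():
        opened_by[rev].append(symbol_id)
        following = successor.get(rev)
        if following is not None:
            # The next revision overwrites (or deletes) the opening.
            closed_by[following].append(symbol_id)
    return opened_by, closed_by


opened, closed = openings_and_closings(
    ['1.1', '1.2', '1.3'],
    {'TAG_1': '1.1', 'BRANCH_1': '1.3'},
)
```

TAG_1 is opened by 1.1 and closed by 1.2.  BRANCH_1 is opened by 1.3
and has no closing, because 1.3 is still the newest revision.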


SortRevisionsPass
=================

N/A


SortSymbolsPass
===============

Sort CVS_SYMBOLS_DATAFILE, creating CVS_SYMBOLS_SORTED_DATAFILE.  The
sort groups together symbol items that might be added to the same
SymbolChangeset.


InitializeChangesetsPass
========================

Read CVS_SYMBOLS_SORTED_DATAFILE, grouping CVSSymbol items with the
same Symbol id into SymbolChangesets.
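
The sort-then-group idea behind SortSymbolsPass and this pass can be
sketched in a few lines.  The record layout (a pair of Symbol id and
an opaque item) and the function name are assumptions for the example;
the real data files hold pickled CVSSymbols, and later passes may
split the resulting changesets further.

```python
from itertools import groupby
from operator import itemgetter


def group_by_symbol(records):
    """Group (symbol_id, item) records into one list per symbol id."""
    # Sorting first is what makes a single sequential pass sufficient.
    ordered = sorted(records, key=itemgetter(0))
    return {
        symbol_id: [item for _, item in group]
        for symbol_id, group in groupby(ordered, key=itemgetter(0))
    }


records = [
    (2, 'TAG_B in file1'),
    (1, 'TAG_A in file1'),
    (2, 'TAG_B in file2'),
]
changesets = group_by_symbol(records)
```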


BreakRevisionChangesetCyclesPass
================================

N/A


RevisionTopologicalSortPass
===========================

N/A


BreakSymbolChangesetCyclesPass
==============================

Read in the changeset graph consisting only of SymbolChangesets and
break up symbol changesets as necessary to break any cycles that are
found.


BreakAllChangesetCyclesPass
===========================

Read in the entire changeset graph and break up symbol changesets as
necessary to break any cycles that are found.


TopologicalSortPass
===================

Update the conversion statistics with excluded symbols omitted.


CreateRevsPass
==============

Create SVNCommits and assign svn revision numbers to each one.  Create
a database (SVN_COMMITS_*) to map svn revision numbers to SVNCommits
and another (CVS_REVS_TO_SVN_REVNUMS) to map each CVSRevision id to
the number of the svn revision containing it.

Also, SymbolingsLogger writes a line to SYMBOL_OPENINGS_CLOSINGS for
each opening or closing for each CVSSymbol, noting in what SVN
revision the opening or closing occurred.


SortSymbolOpeningsClosingsPass
==============================

This pass sorts SYMBOL_OPENINGS_CLOSINGS into
SYMBOL_OPENINGS_CLOSINGS_SORTED.  This orders the file first by symbol
ID, and second by Subversion revision number, thus grouping all
openings and closings for each symbolic name together.


IndexSymbolsPass
================

Iterate through all the lines in SYMBOL_OPENINGS_CLOSINGS_SORTED,
writing out a pickled map to SYMBOL_OFFSETS_DB telling at what offset
in SYMBOL_OPENINGS_CLOSINGS_SORTED the lines corresponding to each
Symbol begin.  This will allow us to seek to the various offsets in
the file and sequentially read only the openings and closings that we
need.
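
The sorting and indexing steps can be illustrated with an in-memory
file.  The line format used here ("symbol id, SVN revision, O or C,
CVSSymbol id", separated by spaces) is invented for the example and is
not necessarily the format cvs2svn writes; the function names are
likewise hypothetical.

```python
import io


def sort_openings_closings(lines):
    """Order lines by symbol id, then by SVN revision number."""
    def key(line):
        symbol_id, revnum = line.split()[:2]
        return (int(symbol_id), int(revnum))
    return sorted(lines, key=key)


def build_offsets(f):
    """Map each symbol id to the offset of its first line in f."""
    offsets = {}
    while True:
        position = f.tell()
        line = f.readline()
        if not line:
            break
        offsets.setdefault(int(line.split()[0]), position)
    return offsets


def read_symbol_lines(f, offsets, symbol_id):
    """Seek to a symbol's first line and read only that symbol's lines."""
    f.seek(offsets[symbol_id])
    result = []
    for line in iter(f.readline, b''):
        if int(line.split()[0]) != symbol_id:
            break
        result.append(line.decode('ascii').rstrip('\n'))
    return result


unsorted = ['2 7 O 31', '1 9 C 20', '1 4 O 20', '2 5 O 30']
sorted_lines = sort_openings_closings(unsorted)
data = ''.join(line + '\n' for line in sorted_lines).encode('ascii')
sorted_file = io.BytesIO(data)
offsets = build_offsets(sorted_file)
```

The file is opened in binary mode so that the recorded offsets are
plain byte positions that can be passed back to seek().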


OutputPass
==========

The filling of a symbol is triggered when SVNSymbolCommit.commit()
calls SVNRepositoryMirror.fill_symbol().  The SVNSymbolCommit contains
the list of CVSSymbols that have to be copied to a symbol directory in
this revision.  However, we still have to do a lot of work to figure
out what SVN revision number to use as the source of these copies, and
also to group file copies together into directory copies when
possible.

The SYMBOL_OPENINGS_CLOSINGS_SORTED file lists the opening and closing
SVN revision of each revision that has to be copied to the symbol
directory.  We use this information to try to find SVN revision
numbers that can serve as the source for as many files as possible, to
avoid having to pick and choose sources from many SVN revisions.
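
One way to find such a revision is a sweep over the opening and
closing revision numbers.  The sketch below is not the algorithm used
by cvs2svn; it assumes that a file is a valid copy source for SVN
revisions r with opening <= r < closing, and it picks the earliest
revision among equally good candidates, which is an arbitrary choice.

```python
def best_source_revnum(ranges):
    """Find the SVN revision usable as a source for the most files.

    ranges: dict mapping a path to (opening, closing), where closing
            is None if the opening has never been closed.
    Returns (revnum, number_of_files_covered).
    """
    events = []
    for opening, closing in ranges.values():
        events.append((opening, 1))
        if closing is not None:
            events.append((closing, -1))
    # At equal revision numbers, closings (-1) sort before openings.
    events.sort()
    best_revnum, best_score, score = None, 0, 0
    for revnum, delta in events:
        score += delta
        if score > best_score:
            best_revnum, best_score = revnum, score
    return best_revnum, best_score


ranges = {
    'a.txt': (1, 5),
    'b.txt': (3, None),
    'c.txt': (4, 6),
    'd.txt': (6, None),
}
revnum, covered = best_source_revnum(ranges)
```

For these ranges, revision 4 covers three of the four files, so only
d.txt would have to be copied from a different source revision.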

Furthermore, when a bunch of files in a directory have to be copied at
the same time, it is cheaper to copy the directory as a whole.  But if
not *all* of the files within the directory had to be copied, then the
unneeded files have to be deleted again from the copied directory.  Or
if some of the files have to be copied from different source SVN
revision numbers, then those files have to be overwritten in the
copied directory with the correct versions.

Finally, it can happen that a single Symbol has to be filled multiple
times (because the initial SymbolChangeset had to be broken up).  In
this case, the first fill can copy the source directory to the
destination directory (maybe with fixups), but subsequent copies have
to copy individual files to avoid overwriting content that is already
present in the destination directory.
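
The copy-and-fix-up strategy of the last two paragraphs can be
sketched as a function that produces a list of actions.  The action
tuples, the function name, and the simplification that each wanted
file has exactly one acceptable source revision are assumptions made
for this example.

```python
def plan_fill(existing, wanted, dir_revnum, first_fill=True):
    """Plan how to fill a symbol directory from one source directory.

    existing:   set of file names in the source directory at dir_revnum.
    wanted:     dict mapping each file name to copy to its source revnum.
    dir_revnum: SVN revision used for the whole-directory copy.
    """
    if not first_fill:
        # The destination already has content: copy files one by one.
        return [('copy_file', name, revnum)
                for name, revnum in sorted(wanted.items())]
    actions = [('copy_dir', dir_revnum)]
    # Files that came along with the directory copy but are not wanted.
    for name in sorted(existing - set(wanted)):
        actions.append(('delete', name))
    # Files missing from the copy, or needed from another revision.
    for name, revnum in sorted(wanted.items()):
        if name not in existing or revnum != dir_revnum:
            actions.append(('copy_file', name, revnum))
    return actions


plan = plan_fill({'a', 'b', 'c'}, {'a': 10, 'b': 8}, dir_revnum=10)
```

Here the directory is copied from revision 10, the unneeded file c is
deleted from the copy, and b is overwritten with its version from
revision 8.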

To figure all of this out, we need to know all of the files that
existed in every previous SVN revision, in every line of development.
This is done using the SVNRepositoryMirror class, which keeps a
skeleton record of the entire SVN history in a database using data
structures similar to those used by SVN itself.