File: revision-reader.txt

package info (click to toggle)
cvs2svn 2.4.0-4
  • links: PTS
  • area: main
  • in suites: stretch
  • size: 3,720 kB
  • sloc: python: 22,383; sh: 512; perl: 121; makefile: 84
file content (103 lines) | stat: -rw-r--r-- 4,976 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
This file contains a description of the RevisionCollector /
RevisionReader mechanism.


cvs2svn now includes hooks to make it possible to avoid having to
invoke CVS or RCS zillions of times in OutputPass (which is otherwise
the most expensive part of the conversion).  Here is a brief
description of how the hooks work.

Most conversions [1] require an instance of RevisionReader, whose
responsibility is to produce the text contents of CVS revisions on
demand during OutputPass.  The RevisionReader can read the CVS
revision contents directly out of the RCS files during OutputPass.
But additional hooks support the construction of different kinds of
RevisionReader that record the CVS file revisions' contents during
FilterSymbolsPass then output the contents during OutputPass.

The interface that is used during FilterSymbolsPass to allow the
collection of revision information is:

    RevisionCollector -- can collect information during
        FilterSymbolsPass to help the RevisionReader produce RCS file
        revision contents during OutputPass.

The type of RevisionCollector/RevisionReader to be used for a run of
cvs2svn can be set using --use-internal-co, --use-rcs, or --use-cvs,
or via the --options file with lines like:

    ctx.revision_collector = MyRevisionCollector()
    ctx.revision_reader = MyRevisionReader()

The following RevisionCollectors are supplied with cvs2svn:

    NullRevisionCollector -- does nothing (for RevisionReaders that
        don't need anything to happen in FilterSymbolsPass).

    InternalRevisionCollector -- records the delta text and
        dependencies for required revisions in FilterSymbolsPass, for
        use with the InternalRevisionReader.

    GitRevisionCollector -- uses another RevisionReader to reconstruct
        the revisions' fulltext during FilterSymbolsPass, then writes
        the fulltexts to a blobfile in git-fast-import format.  This
        file, combined with the dumpfile created in OutputPass, can be
        loaded into git.

    ExternalBlobGenerator -- uses an external Python program to
        reconstruct the revision fulltexts in FilterSymbolsPass and
        write them to a blobfile in git-fast-import format.  This
        option is very fast because (1) it uses code similar to that
        used by InternalRevisionCollector/InternalRevisionReader, and
        (2) it processes all revisions from a file at once, thereby
        avoiding a lot of disk seeking.

The following RevisionReaders are supplied with cvs2svn:

    InternalRevisionReader -- reconstitutes the revisions' contents
        during OutputPass from the data recorded by
        InternalRevisionCollector.  This is by far the fastest option
        for cvs2svn conversions, but it requires a substantial amount
        of temporary disk space for the duration of the conversion.

    RCSRevisionReader -- uses RCS's "co" command to extract the
        revision text during OutputPass.  This is slower than
        InternalRevisionReader because "co" has to be executed very
        many times, but is better tested and does not require any
        temporary disk space.  RCSRevisionReader does not use a
        RevisionCollector.

    CVSRevisionReader -- uses the "cvs" command to extract the
        revision text during OutputPass.  This is even slower than
        RCSRevisionReader, but it can handle some CVS file quirks that
        stymy RCSRevisionReader (see the cvs2svn HTML documentation).
        CVSRevisionReader does not use a RevisionCollector.

It is possible to write your own RevisionCollector and RevisionReader
if you would like to do things differently.  A RevisionCollector, with
callback methods that are invoked as the CVS files are parsed, can be
used to collect information during FilterSymbolsPass.  Its
process_file() method is allowed to set an arbitrary token (for
example, a content hash) in CVSItem.revision_reader_token.  This token
is carried along by cvs2svn for use by the RevisionReader in
OutputPass.

Later, when OutputPass requires the file contents, it calls
RevisionReader.get_content(), which is passed a CVSRevision instance
and has to return the file revision's contents.  The fancy
RevisionReader could use the token to retrieve the pre-stored file
contents without having to call CVS or RCS at all.


[1] The exception is cvs2git conversions, which need a
    RevisionCollector but not a RevisionReader.  The reason is that
    "git fast-import" allows file revision contents to be written as
    "blobs" in arbitrary order, to be hooked together later into
    proper changesets.  This feature is very beneficial to the
    performance of cvs2git, because it allows all revisions of a
    single file to be generated at the same time (with good disk
    locality) rather than having to jump around from file to file
    getting single revisions in changeset order.  Unfortunately,
    neither "bzr fast-import" nor "hg fastimport" support separate
    blobs.