README.cvs2hg @ 233:19b322d42b1f
author: Greg Ward <greg@gerg.ca>
date: Wed Mar 16 20:57:23 2011 -0400

cvs2hg: Mercurial backend for cvs2svn
=====================================

contact: Greg Ward <greg at gerg dot ca>

This project adds a third backend to cvs2svn: direct output to
Mercurial. (The first two, of course, are the original Subversion
output and cvs2git, which writes git fast-import files.)

(Do NOT confuse this cvs2hg with the variation on cvs2git that writes a
fastimport file that early versions of hg-fastimport could handle.
That's currently the cvs2hg you'll find in trunk checkouts of the
official cvs2svn. I apologize for the name conflict, but one of my
goals is to replace the fastimport-based cvs2hg with this direct-output
version. In the rest of this document, cvs2hg refers to the code you
have in front of you now.)

Ultimately, I want to see this work merged into the official cvs2svn
repository, at which point I will stop maintaining this project. For
now, though, I'm still actively developing, testing, and debugging this
code, so I'm going to keep maintaining it here:

  http://hg.gerg.ca/cvs2svn/


Requirements
------------

cvs2hg requires:
  * Python 2.4 or later (but not Python 3.x) (same as cvs2svn)
  * Mercurial 1.1 or later
  * direct access to your CVS repository (again, same as cvs2svn)

If the appropriate Mercurial API cannot be found, cvs2hg will die with a
(hopefully) clear and self-explanatory error message.

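If you would rather find out before starting a long conversion, a
pre-flight check along these lines can tell you whether the Mercurial
Python API is importable. This is only a sketch, not the check cvs2hg
itself performs: the helper names are hypothetical, and it is written
for a current Python interpreter (cvs2hg itself needs Python 2, where a
plain try/import does the same job).

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


def describe_mercurial():
    """Return a one-line status message about the Mercurial Python API."""
    if module_available("mercurial"):
        return "Mercurial API found"
    return "Mercurial API not found: install Mercurial or set PYTHONPATH"


if __name__ == "__main__":
    print(describe_mercurial())
```

Run it with the same interpreter and PYTHONPATH you intend to use for
cvs2hg, otherwise the answer may not apply.
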
If your OS does not include Mercurial 1.1 or later, you have a little
more work to do. On Unix (including Mac OS X), you can just download
and build Mercurial from source:

  * download the latest released version from
    http://mercurial.selenic.com/release/?C=M;O=D

  * unpack it somewhere: for example, to /tmp/mercurial-$ver

  * build it in place: cd /tmp/mercurial-$ver && make local

  * before running cvs2hg, ensure that Python can find the Mercurial
    library you just unpacked, e.g.
      export PYTHONPATH=/tmp/mercurial-$ver

On Windows, apparently the easiest way is:

  * download the appropriate mercurial-$ver.win32-pyX.Y.exe
    installer from
      http://bitbucket.org/tortoisehg/thg-winbuild/downloads/
    E.g. for Mercurial 1.6.2 under Python 2.6, download
      mercurial-1.6.2.win32-py2.6.exe

  * install it -- this adds the Mercurial Python API to your
    Python installation

  * run
      python <CVS2SVN_DIR>\cvs2hg ...
    where <CVS2SVN_DIR> is the directory with cvs2hg and
    this README file, and "python" is on your %PATH%.


Quick start
-----------

If you're not interested in understanding the underlying concepts, you
can try converting a CVS repository right away:

  ./cvs2hg --hgrepos myproject.hg /path/to/cvs/myproject

If you have problems, you might try a cvs2svn or cvs2git conversion
first. cvs2hg differs from those two only in the final pass; if there
is something weird in your CVS repository that confuses any of the first
15 passes, it should affect cvs2svn or cvs2git as well. If that is the
case, you should go to the cvs2svn developers for support. Try a trunk
checkout of cvs2svn first just to be sure; it's possible that my patches
to implement cvs2hg broke some obscure part of cvs2svn/cvs2git.

But if cvs2hg has problems on a CVS repository that cvs2svn and cvs2git
can handle, let me know. (My email address is at the top of this file.
Please cc the users@cvs2svn.tigris.org list too, so the conversation is
public and archived.)

If you convert a large CVS repository, keep an eye on the runtime and
memory usage. cvs2hg should be slightly faster than writing to a
Subversion repository, and hopefully not too much slower than writing a
Subversion dump file or git fast-import file. And memory usage should
be reasonable -- hopefully no worse than cvs2svn.

Also, if you have questions about the nature of the output Mercurial
repository, you should read the rest of this file.


Goals
-----

The basic goal of any CVS-to-X conversion is that you should be able to
get out of X the same as you get out of CVS. That is, a CVS checkout of
any branch or tag should give the exact same directory tree as a
corresponding Mercurial checkout. CVS' peculiar approach to branching
and tagging makes this particularly challenging, and the interesting
part of implementing cvs2hg was figuring out how to adapt the bizarre
things one sees in real-life CVS repositories to make a sensible
Mercurial repository.

As with human languages, there is a distinction between literal
translation and idiomatic translation. Literal translation is what
meets the above goal: check out any CVS tag/branch, update to the same
tag/branch in Mercurial, and get identical trees. The problem with
literal translation is that it requires odd-looking hacks to make the
Mercurial repository look just like the CVS repository.

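The literal-translation criterion can be checked mechanically by
comparing a CVS checkout against a Mercurial checkout of the same tag or
branch. The sketch below is one way to do that, assuming both checkouts
already exist on disk. The function names are hypothetical, and only the
'CVS' and '.hg' administrative directories are skipped; other
tool-specific files (for example .hgtags or .cvsignore) would still be
reported and may need to be added to the skip list.

```python
import hashlib
import os

# Administrative directories that exist in only one kind of checkout.
SKIP_DIRS = {"CVS", ".hg"}


def tree_digest(root):
    """Map each file's path (relative to root) to a SHA-1 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune administrative directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha1(f.read()).hexdigest()
    return digests


def compare_trees(cvs_root, hg_root):
    """Return (only_in_cvs, only_in_hg, differing) as sorted lists of paths."""
    cvs, hg = tree_digest(cvs_root), tree_digest(hg_root)
    only_cvs = sorted(set(cvs) - set(hg))
    only_hg = sorted(set(hg) - set(cvs))
    differing = sorted(p for p in set(cvs) & set(hg) if cvs[p] != hg[p])
    return only_cvs, only_hg, differing
```

Three empty lists mean the two checkouts hold the same files with the
same contents, which is what a literal translation promises.
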
The goal of idiomatic translation is to produce a Mercurial repository
that you would have had if you had been using Mercurial from the
beginning. Obviously, this is a vague and fuzzy goal that is impossible
to fulfill in the general case. That's why literal translation is the
default; idiomatic translation will require you to muck about in a
cvs2hg.options file, think hard about your workflow, and tell cvs2hg
what sort of compromises to make in order to get a more natural-looking
translation.


Fixup commits
-------------

The most important hack used to achieve a literal translation is the
fixup commit. Consider a CVS repository with this structure:

  /--+-- lib1/ ...
     |
     +-- lib2/ ...
     |
     +-- app1/ ...
     |
     +-- stuff/ ...

where 'lib1', 'lib2', and 'app1' are separate projects that comprise a
single product and 'stuff' is, well, just a bunch of stuff. Further
assume that you want to convert the entire CVS repository to one
Mercurial repository. (This might not be the best approach, but that's
your decision. cvs2hg has to do the right thing whatever you decide.)

If the usual release procedure for this product is to tag and branch
just 'lib1', 'lib2', and 'app1', then you have partial tags and
branches. That is, if you update a CVS working copy to a point on the
trunk immediately before tag 'release-1_0', then you'll get 'stuff'.
But if you update to 'release-1_0', or to the branch 'maint-1_0', then
CVS will remove 'stuff' from your working copy.

In order to get a correct conversion where release-1_0 and maint-1_0
denote exactly the same tree in Mercurial as in CVS, cvs2hg adds a fixup
commit in such cases. For example, in this case the fixup commit would
simply delete everything under stuff, ensuring those files are not in
the tag or on the branch.

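In this simplest case, the content of the fixup commit is just a set
difference: the files present in the parent changeset but absent from
the CVS tag. A minimal sketch, assuming file lists are available as
plain sets of paths (the function name and the sample paths are made up
for illustration, and this is not how cvs2hg represents trees
internally):

```python
def delete_only_fixup(parent_files, tagged_files):
    """Return the sorted paths a fixup commit must delete so that the
    tagged changeset holds exactly the files carried by the CVS tag."""
    return sorted(set(parent_files) - set(tagged_files))


trunk = {"lib1/a.c", "lib2/b.c", "app1/main.c", "stuff/notes.txt"}
release_1_0 = {"lib1/a.c", "lib2/b.c", "app1/main.c"}
print(delete_only_fixup(trunk, release_1_0))  # prints ['stuff/notes.txt']
```
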
But that's only the simplest type of fixup commit. There are many ways
in which a CVS tag or branch point can fail to denote a single point in
history. The goal of a fixup commit is always to create such a point in
history, because Mercurial tags always denote a single changeset, and
branches always branch off a single changeset.


Consequences of fixup commits
-----------------------------

Fixup commits have a couple of consequences. Notably, if done
carelessly, they would create many new heads: for example, your
Mercurial history might look like this:

  [...]
  o 6245:54a288630eb6 start work on 1.1 features
  |
  | o 6244:bef2a0bf46cf fixup commit: create tag 'release-1_0'
  |/
  o 6243:d71085af4f8b update release notes for 1.0
  |
  o 6242:9124110d9c3c final bug fix before release 1.0
  [...]

where revisions 6242, 6243, and 6245 were originally done in CVS, but
6244 was created by cvs2hg. If you convert a CVS repository with 4000
partial tags, you might get a Mercurial repository with 4000 heads.
This is not good: Mercurial was not designed for this sort of repository
and does not perform well with it.

Thus, cvs2hg takes pains to avoid proliferating heads, mainly by
creating dummy merges. In the above case, it would actually create this
Mercurial history:

  [...]
  o 6245:0b262b3d12de start work on 1.1 features
  |\
  | o 6244:bef2a0bf46cf fixup commit: create tag 'release-1_0'
  |/
  o 6243:d71085af4f8b update release notes for 1.0
  |
  o 6242:9124110d9c3c final bug fix before release 1.0
  [...]

Note that 6245 appears to be a merge of 6243 and 6244. It isn't really
a merge, because of course there was never any point in your CVS history
corresponding to 6244. But it's recorded as one to prevent 6244 from
being a head.

The logic for creating these dummy merges is simple: the next changeset
on the same branch as the fixup commit ('default' in this case, because
release-1_0 tagged the CVS trunk) will be a dummy merge. That rule
holds even if the next changeset is another fixup commit. This can make
for somewhat confusing history, but it keeps the number of artificial
changesets down.

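The rule can be illustrated with a toy model of the bookkeeping
involved. This is a sketch under simplifying assumptions, not cvs2hg's
code: changesets are numbered from 0, every changeset takes the latest
non-fixup changeset on its branch as first parent, and the class and
attribute names are invented for the example.

```python
class ToyHistory:
    """Record changeset parents while applying the dummy-merge rule."""

    def __init__(self):
        self.parents = {}  # rev -> tuple of parent revs
        self.tip = {}      # branch -> latest non-fixup changeset
        self.pending = {}  # branch -> fixup changeset not yet merged

    def commit(self, branch, fixup=False):
        rev = len(self.parents)
        parents = []
        if branch in self.tip:
            parents.append(self.tip[branch])
        # Dummy merge: absorb the previous fixup so it stops being a head.
        if branch in self.pending:
            parents.append(self.pending.pop(branch))
        self.parents[rev] = tuple(parents)
        if fixup:
            self.pending[branch] = rev
        else:
            self.tip[branch] = rev
        return rev

    def heads(self):
        """Return the revisions that are nobody's parent."""
        used = {p for ps in self.parents.values() for p in ps}
        return sorted(set(self.parents) - used)


h = ToyHistory()
h.commit("default")              # 0: plays the role of 6242
h.commit("default")              # 1: plays the role of 6243
h.commit("default", fixup=True)  # 2: plays the role of 6244
h.commit("default")              # 3: plays the role of 6245
print(h.parents[3], h.heads())   # prints (1, 2) [3]
```

Without the dummy-merge branch in commit(), revision 2 would remain a
second head alongside revision 3.
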
There's one further peculiarity related to fixup commits: if cvs2hg
reaches the end of conversion and the last commit on a particular branch
was a fixup, it creates another artificial changeset to undo the effect
of the fixup. For example, say you created a partial tag 'trunk-stable'
of 'lib1', 'lib2', and 'app1' on the trunk right before running the
conversion. The resulting history might look like this:

    o 9324:c4c5df90da01 fixup commit: create tag 'trunk-stable'
   /
  o 9323:20f642f732ed add latest feature to app1
  |
  o 9322:cbe9d7fed4dd add some handy things in stuff
  |
  o 9321:712e9fe8c56e refactor lib2 a bit

In this case, Mercurial revision 9324 is both tip and the head of the
'default' branch. But because 'trunk-stable' is a partial tag, it's
missing everything in 'stuff'. Thus, this conversion fails the most
basic test, which is that updating to the head of Mercurial 'default'
should give exactly the same tree as checking out the CVS trunk.

The hack in this case is a second artificial commit:

  o 9325:a3f371ce1176 artificial commit: close fixup head c4c5df90da01
  |\
  | o 9324:c4c5df90da01 fixup commit: create tag 'trunk-stable'
  |/
  o 9323:20f642f732ed add latest feature to app1
  |
  o 9322:cbe9d7fed4dd add some handy things in stuff
  |
  o 9321:712e9fe8c56e refactor lib2 a bit

9325 is a dummy merge whose tree is identical to that of 9323: it
exists only to make the head of the Mercurial branch match the head of
the corresponding CVS branch.

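The end-of-conversion step can be sketched the same way. The function
below is hypothetical and works on plain dictionaries rather than on a
real repository; it assumes each branch with a trailing fixup also has a
known last real changeset.

```python
def close_fixup_heads(parents, tip, pending):
    """Append one artificial dummy merge per branch that ends in a fixup.

    parents: rev -> tuple of parent revs (mutated in place)
    tip:     branch -> latest non-fixup changeset
    pending: branch -> trailing fixup changeset, if any
    Returns the list of artificial revisions created.
    """
    created = []
    for branch in sorted(pending):
        rev = max(parents) + 1
        # First parent is the real tip, so the new head carries its tree;
        # second parent is the fixup, so the fixup is no longer a head.
        parents[rev] = (tip[branch], pending[branch])
        tip[branch] = rev
        created.append(rev)
    pending.clear()
    return created


# History before 9321 is omitted, so 9321 is shown without parents.
parents = {9321: (), 9322: (9321,), 9323: (9322,), 9324: (9323,)}
created = close_fixup_heads(parents, {"default": 9323}, {"default": 9324})
print(created, parents[9325])  # prints [9325] (9323, 9324)
```
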


Branch fixups
-------------

Everything that holds for CVS tags also holds for branch points: they do
not necessarily correspond to a single changeset. Thus, cvs2hg also
creates fixup commits when adding a branch.

However, branch fixup commits do not generally contribute to the
proliferation of heads. After all, a CVS branch is going to become a
Mercurial branch, which usually ends in a head. Also, your CVS branch
won't be converted to Mercurial unless it has some commits, so the fixup
commit will immediately be "covered up" by real changesets converted
from CVS. Finally, CVS branches are usually much less frequent than
tags.

Thus, cvs2hg makes no effort to dummy-merge branch fixups.


Avoiding fixup commits
----------------------

You might have decided that you want to avoid as many fixup commits as
you can. After all, they clutter up history and create potentially
confusing dummy merges.

cvs2hg tries to make this easy for you. First of all, it won't create a
fixup commit for a tag unless it really needs to. If you create a CVS
tag by tagging the latest version of every file, then there should be no
need for a fixup commit, and cvs2hg will not create one.

Second, an important special case of fixup commits is the kind that only
deletes files -- this corresponds to a CVS partial tag. If you had been
using Mercurial from the beginning, you would most likely simply tag the
latest changeset, rather than deliberately excluding certain directories
from the tag. Thus, idiomatic translation requires that you have the
option of disabling such "delete only" fixup commits.

Finally, everything that holds for branch fixups should hold for tag
fixups -- although the defaults should not necessarily be the same.
Currently, cvs2hg behaves as follows:

  * For branches: by default, always create fixups, even if there is no
    difference with the fixup's first parent revision (this is more for
    consistency with cvs2svn than for any good reason). Override with
    branch_fixup_mode.

  * For tags: by default, only create fixups if there is a difference
    with the fixup's first parent. Override with tag_fixup_mode.

  * You can control how/when to create fixups for tags and branches
    independently with the branch_fixup_mode and tag_fixup_mode optional
    keyword arguments to HgOutputOption. (That is, to use this feature
    you will need to create a cvs2hg.options file and edit it.) The
    available fixup modes are:

      - 'always'  : always create the fixup (default for branch fixups)
      - 'optional': create fixups when they are needed (default for tag
                    fixups)
      - 'sloppy'  : don't bother with fixups that only delete files
                    relative to the first parent

"Sloppy" fixup mode will break the prime requirement of literal
translation (Mercurial tags won't be identical to CVS tags), but it's
important for idiomatic translation (in Mercurial, you have to bend over
backwards to tag a subset of your tree, and who would bother?).

For example, if you always want sloppy fixup for both branches and tags,
your cvs2hg.options file would contain:

  ctx.output_option = HgOutputOption(
      [...]
      branch_fixup_mode='sloppy',
      tag_fixup_mode='sloppy',
      [...]
      )

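One way to read the three fixup modes is as a small decision function.
The sketch below is an interpretation of the descriptions above, not
code taken from cvs2hg: the function name and its arguments are
hypothetical, and it assumes that 'sloppy' otherwise behaves like
'optional'.

```python
def needs_fixup(mode, changed, deleted):
    """Decide whether to create a fixup commit.

    mode:    'always', 'optional', or 'sloppy'
    changed: paths added or modified relative to the first parent
    deleted: paths removed relative to the first parent
    """
    if mode == "always":
        return True
    if mode == "optional":
        # Needed whenever the tree differs from the first parent at all.
        return bool(changed or deleted)
    if mode == "sloppy":
        # Delete-only fixups (partial tags) are skipped.
        return bool(changed)
    raise ValueError("unknown fixup mode: %r" % (mode,))
```

With this reading, the partial tag from the earlier example (nothing
changed, 'stuff' deleted) gets a fixup under 'always' and 'optional' but
not under 'sloppy'.
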


Known problems
--------------

Occasionally, converting large repositories dies with a MemoryError
exception. I have seen this myself more than once, but have been
unable to reproduce it reliably (although I admit I haven't tried very
hard). The workaround is annoying but simple: after the crash, just
remove the partial repository and run the output pass again:

  rm -rf <output-repo>
  cvs2hg [...other arguments...] --pass 16

I'm not sure why, but that seems to work.

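If you drive conversions from a script, the workaround can be wrapped in
a small helper. This is a sketch only: the function name is
hypothetical, it assumes cvs2hg is on your PATH, and it relies on
nothing beyond the --pass 16 invocation shown above.

```python
import os
import shutil


def prepare_retry(output_repo, original_args):
    """Remove a partial output repository and return the command line
    that reruns only the output pass (pass 16)."""
    if os.path.isdir(output_repo):
        shutil.rmtree(output_repo)
    return ["cvs2hg"] + list(original_args) + ["--pass", "16"]
```

The returned list can be handed to subprocess.check_call(); pass the
same arguments you used for the original run, since the rerun picks up
from the earlier passes.
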


Updates
-------

Since this project is meant to be a short-lived development branch of
cvs2svn, I am *not* going to bother with official releases. You can
always get the latest code by cloning (or pulling from) my Mercurial
repository:

  http://vc.gerg.ca/hg/cvs2svn/

I regularly merge with the upstream Subversion repository for cvs2svn,
so this code should not diverge very far from the official source.
