=pod
=for comment
DO NOT EDIT. This Pod was generated by Swim v0.1.48.
See http://github.com/ingydotnet/swim-pm#readme
=encoding utf8
=head1 Introducing Git Subrepos
There is a new git command called C<subrepo> that is meant to be a solid
alternative to the C<submodule> and C<subtree> commands. All 3 of these
commands allow you to include external repositories (pinned to specific
commits) in your main repository. This is an often-needed feature for project
development under a source control system like Git. Unfortunately, the
C<submodule> command is severely lacking, and the C<subtree> command (an
attempt to make things better) is also very flawed. Fortunately, the
C<subrepo> command is here to save the day.
This article will discuss how the previous commands work, and where they go
wrong, while explaining how the new C<subrepo> command fixes the issues.
It should be noted that there are 3 distinct roles (ways people use repos)
involved in discussing this topic:
=over
=item * B<owner> — The primary author and repo owner
=item * B<collaborators> — Other developers who contribute to the repo
=item * B<users> — People who simply use the repo software
=back
=head2 Introducing C<subrepo>
While the main point is to show how subrepo addresses the shortcomings
of submodule and subtree, I'll start by giving a quick intro to the
subrepo command.
Let's say that you have a project repo called 'freebird' and you want to have
it include 2 other external repos, 'lynyrd' and 'skynyrd'. You would do the
following:
    git clone git@github.com:you/freebird
    cd freebird
    git subrepo clone git@github.com:you/lynyrd ext/lynyrd
    git subrepo clone git@github.com:you/skynyrd ext/skynyrd --branch=1975
What these commands do (at a high level) should be obvious. They "clone" (add)
the repos' content into the subdirectories you told them to. The details of
what is happening to your repo will be discussed later, but adding new
subrepos is easy. If you need to update the subrepos later:
    git subrepo pull ext/lynyrd
    git subrepo pull ext/skynyrd --branch=1976
The lynyrd repo is tracking the upstream master branch, and you've changed the
skynyrd subrepo to the 1976 branch. Since these subrepos are owned by 'you',
you might want to change them in the context of your freebird repo. When
things are working, you can push the subrepo changes back:
    git subrepo push ext/lynyrd
    git subrepo push ext/skynyrd
Looks simple, right? It's supposed to be. The intent of C<subrepo> is to do the
right things, and to not cause problems.
Of course there's more to it under the hood, and that's what the rest of this
article is about.
=head2 Git Submodules
Submodules tend to receive a lot of bad press. Here's some of it:
=over
=item * L<http://ayende.com/blog/4746/the-problem-with-git-submodules>
=item * L<https://web.archive.org/web/20171101202911/http://somethingsinistral.net/blog/git-submodules-are-probably-not-the-answer/>
=item * L<http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/>
=back
A quick recap of some of the good and bad things about submodules:
Good:
=over
=item * Use an external repo in a dedicated subdir of your project.
=item * Pin the external repo to a specific commit.
=item * The C<git-submodule> command is a core part of the Git project.
=back
Bad:
=over
=item * Users have to know a repo has submodules.
=item * Users have to get the submodules manually.
=item * Pulling a repo with submodules won't pull in the new submodule changes.
=item * A submodule will break if the referenced repo goes away.
=item * A submodule will break if a forced push removes the referenced commit.
=item * Can't use different submodules/commits per main project branch.
=item * Can't "try out" a submodule on an alternate branch.
=item * Main repo can be pushed upstream pointing to unpushed submod commits.
=item * Command capability differs across Git versions.
=item * Often need to change remote url, to push submodule changes upstream.
=item * Removing or renaming a submodule requires many steps.
=back
Internally, submodules are a real mess. They give the strong impression of
being bolted on, well after Git was designed. Some commands are aware of the
existence of submodules (although usually half-heartedly), and many commands
are oblivious. For instance, the git-clone command has a C<--recursive> option
to clone all submodules, but it's not a default, so you still need to be aware
of the need. The git-checkout command does nothing with the submodules, even
if they are intended to differ across branches.
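The difference between a plain clone and a recursive one can be seen in a small throwaway experiment. This is only a sketch: the repo names (C<lib>, C<main>) and the C<ext/lib> path are made up, and the C<protocol.file.allow=always> setting is there only because recent Git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
# Hypothetical throwaway demo: every repo name and path below is made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# An "external" repo with one file.
git init -q lib
(cd lib && echo hello > lib.txt && git add lib.txt &&
  git commit -q -m "lib: initial commit")

# A main repo that includes it as a submodule.
git init -q main
(cd main &&
  git -c protocol.file.allow=always submodule --quiet add "$work/lib" ext/lib &&
  git commit -q -m "Add lib as a submodule")

# A plain clone creates ext/lib but does not populate it.
git clone -q main plain
plain_files=$(ls -A plain/ext/lib | wc -l | tr -d ' ')

# A recursive clone also fetches and checks out the submodule.
git -c protocol.file.allow=always clone -q --recursive main full
full_content=$(cat full/ext/lib/lib.txt)

echo "plain clone: $plain_files file(s) in ext/lib"
echo "recursive clone: lib.txt contains '$full_content'"
```

Someone who made the plain clone would still have to run C<git submodule update --init> to get the submodule content.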
Let's talk a bit about how submodules are implemented in Git. Information
about them is stored in 3 different places (in the top level repo directory):
=over
=item * C<.gitmodules>
=item * C<.git/config>
=item * C<.git/modules> — The submodule repo's meta data (refs/objects)
=back
So some of the information lives in the repo history (.gitmodules), but other
info (.git/) is only known to the local repo.
In addition, the submodule introduces a new low-level concept to the
commit/tree/blob graph. Normally a git tree object points to blob (file)
objects and more tree (directory) objects. Submodules have tree objects point
to B<commit> objects. While this seems clever and somewhat reasonable, it also
means that every other git command (which was built on the super clean Git
data model) has to be aware of this new possibility (and deal with it
appropriately).
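Both the three storage locations and the tree-to-commit link can be inspected with stock Git commands. The following is a sketch under assumptions: the repos are throwaway ones created on the spot, and the names C<lib>, C<main> and C<ext/lib> are hypothetical.

```shell
#!/bin/sh
# Hypothetical throwaway demo: every repo name and path below is made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q lib
(cd lib && echo hello > lib.txt && git add lib.txt &&
  git commit -q -m "lib: initial commit")

git init -q main
cd main
echo readme > README
git add README
# protocol.file.allow is only needed because the submodule is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" ext/lib
git commit -q -m "Add README and lib submodule"

# The three places where submodule information lives.
cat .gitmodules                                        # tracked in history
local_url=$(git config --get submodule.ext/lib.url)    # stored in .git/config
ls -d .git/modules/ext/lib                             # the submodule's own repo data

# Each ls-tree line is: <mode> <type> <object-id><TAB><path>
git ls-tree -r HEAD
readme_type=$(git ls-tree -r HEAD | awk '$4 == "README" {print $2}')
sub_type=$(git ls-tree -r HEAD | awk '$4 == "ext/lib" {print $2}')
sub_mode=$(git ls-tree -r HEAD | awk '$4 == "ext/lib" {print $1}')
echo "README is a $readme_type; ext/lib is a $sub_type (mode $sub_mode)"
```

The C<ext/lib> entry is the only one whose type is C<commit> rather than C<blob> or C<tree>, which is the special case every other command has to handle.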
The point is that, while submodules are a real need, and a lot of work has
gone into making them work decently, they are essentially a kludge to the Git
model, and it is quite understandable why they haven't worked out as well as
people would expect.
NOTE: Submodules I<are> getting better with each release of Git, but it's
still an endless catch up game.
=head2 Git Subtrees
One day, someone decided to think different. Instead of pointing to external
repos, why not just include them into the main repo (but also allow them to be
pulled and pushed separately as needed)?
At first this may feel like a wasteful approach. Why keep other repos
physically inside your main one? But if you think about it abstractly, what's
the difference? You want your users and collaborators to have all this code
because your project needs it. So why worry about how it happens? In the end,
the choice is yours, but I've grown very comfortable with this concept and
I'll try to justify it well. I should note that the first paragraph of the
C<submodule> doc suggests considering this alternative.
The big win here is that you can do this using the existing git model. Nothing
new is added. You are just adding commits to a history. You can do it
differently on every branch. You can merge branches sensibly.
The git-subtree command seems to have been inspired by Git's subtree merge
strategy, which it uses internally, and possibly got its name from. A subtree
merge allows you to take a completely separate Git history and make it be a
subdirectory of your repo.
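For readers who have not seen one, a subtree merge done by hand looks roughly like the sketch below. It follows the well-known recipe built on C<git merge -s ours> and C<git read-tree --prefix>. It is not what C<git-subtree> itself runs, and the repo names and the C<vendor/other> path are hypothetical.

```shell
#!/bin/sh
# Hypothetical throwaway demo: every repo name and path below is made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# A completely separate history.
git init -q other
cd other
git symbolic-ref HEAD refs/heads/master   # fix the branch name for the demo
echo one > a.txt
git add a.txt
git commit -q -m "other: initial commit"
cd ..

# The main repo that will absorb it.
git init -q main
cd main
echo readme > README
git add README
git commit -q -m "main: initial commit"

git remote add other "$work/other"
git fetch -q other

# Record a merge of the unrelated history without touching our files yet.
git merge -q -s ours --no-commit --allow-unrelated-histories other/master
# Place the other project's tree under a subdirectory.
git read-tree --prefix=vendor/other/ -u other/master
git commit -q -m "Merge other as vendor/other"

parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "merge commit has $parent_count parents"
git log --graph --oneline
```

The result is an ordinary merge commit with two parents, so the other project's full history becomes part of the main repo's history.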
Adding a subtree was the easy part. All that needed to be done after that was
to figure out a way to pull upstream changes and push local ones back
upstream. And that's what the C<git-subtree> command does.
So what's the problem with git-subtree then?
Well unfortunately, it drops a few balls. The main problems come down to an
overly complicated commandline UX, poor collaborator awareness, and a fragile
and messy implementation.
Good:
=over
=item * Use an external repo in a dedicated subdir of your project.
=item * Pin the external repo to a specific commit.
=item * Users get everything with a normal clone command.
=item * Users don't need to know that subtrees are involved.
=item * Can use different subtrees/commits per main project branch.
=item * Users don't need the subtree command. Only owners and collaborators.
=back
Bad:
=over
=item * The remote url and branch info are not saved (except in the history).
=item * Owners and collaborators have to enter the remote for every command.
=item * Collaborators aren't made aware that subtrees are involved.
=item * Pulled history is not squashed by default.
=item * Creates a messy historical view. (See below)
=item * Bash code is complicated.
=item * Only one test file, which is currently failing.
=back
As you can see, subtree makes quite a few things better, but after trying it
for a while, the experience was more annoying than submodules. For example,
consider this usage:
    $ git subtree add --squash --prefix=foo git@github.com:my/thing mybranch
    # weeks go by…
    $ git subtree pull --squash --prefix=foo git@github.com:my/thing mybranch
    # time to push local subtree changes back upstream
    $ git subtree push --prefix=foo git@github.com:my/thing mybranch
The first thing you notice is the overly verbose syntax. It's justified in the
first command, but in the other 2 commands I really don't want to have to
remember what the remote and branch are that I'm using.
Moreover, my collaborators have no idea that subtrees are involved, let alone
where they came from.
Consider the equivalent subrepo commands:
    $ git subrepo clone git@github.com:my/thing foo -b mybranch
    $ git subrepo pull foo
    $ git subrepo push foo
Collaborators see a file called 'foo/.gitrepo', and know that the subdir is a
subrepo. The file contains all the information needed by future commands
applied to that subrepo.
=head2 Git Subrepos
Now is a good time to dive into the technical aspects of the C<subrepo>
command, but first let me explain how it came about.
As you may have surmised by now, I am the author of git-subrepo. I'd used
submodules on and off for years, and when I became aware of subtree I gave it
a try, but I quickly realized its problems. I decided maybe it could be
improved. I decided to write down my expected commandline usage and my ideals
of what it would and would not do. Then I set off to implement it. It's been a
long road, but what I ended up with was even better than what I wanted from
the start.
Let's review the Goods and Bads:
Good:
=over
=item * Use an external repo in a dedicated subdir of your project.
=item * Pin the external repo to a specific commit.
=item * Users get everything with a normal clone command.
=item * Users don't need to know that subrepos are involved.
=item * Can use different subrepos/commits per main project branch.
=item * Meta info is kept in an obvious place.
=item * Everyone knows when a subdir is a subrepo.
=item * Commandline UX is minimal and intuitive.
=item * Pulled history is always squashed out locally.
=item * Pushed history is kept intact.
=item * Creates a clean historical view. (See below)
=item * Bash code is very simple and easy to follow.
=item * Comprehensive test suite.
=back
Bad:
=over
=item * --Subrepo is very new.-- (no longer true)
=item * --Not well tested in the wild.-- (no longer true)
=back
This review may seem somewhat slanted, but I honestly am not aware of any
"bad" points that I'm not disclosing. That said, I am sure time will reveal
bugs and shortcomings. Those can usually be fixed. Hopefully the B<model> is
correct, because that's harder to fix down the road.
OK. So how does it all work?
There are 3 main commands: clone/pull/push. Let's start with the clone
command. This is the easiest part. You give it a remote url, possibly a new
subdir to put it in, and possibly a remote branch to use. I say possibly, because
the command can guess the subdir name (just like the git-clone command does),
and the branch can be the upstream default branch.
Given this we do the following steps internally:
=over
=item * Fetch the remote content (for a specific refspec)
=item * Read the remote head tree into the index
=item * Checkout the index into the new subdir
=item * Create a new subrepo commit object for the subdir content
=item * Add a state file called .gitrepo to the new subrepo/subdir
=item * Amend the merge commit with this new file
=back
This process adds something like this to the top of your history:
    * 9b6ddc9 git subrepo clone git@github.com:you/foo.git foo/
    * 37c61a5 Previous head commit of your repo
The entire history has been squashed down into one commit, and placed on
top of your history. This is important as it keeps your history as clean
as possible. You don't need to have the subrepo history in your main
project, since it is immutably available elsewhere, and you have a pointer
to that place.
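The clone steps listed earlier can be approximated with stock Git plumbing. This is a sketch under stated assumptions, not the code C<git-subrepo> runs: it folds the commit and amend steps into a single commit, the state file holds only the fields needed for the demo, and the repo names and paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximates the listed steps with stock git plumbing.
# All repo names and paths are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream repo, with two commits of history.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # fix the branch name for the demo
echo one > a.txt
git add a.txt
git commit -q -m "first"
echo two > b.txt
git add b.txt
git commit -q -m "second"
cd ..

# Stand-in for your main repo.
git init -q main
cd main
echo readme > README
git add README
git commit -q -m "Previous head commit of your repo"
parent=$(git rev-parse HEAD)

# Fetch the remote content for one branch.
git fetch -q "$work/upstream" master
upstream_commit=$(git rev-parse FETCH_HEAD)

# Read the fetched tree into the index under foo/ and check it out.
git read-tree --prefix=foo/ -u FETCH_HEAD

# Record where the content came from in a state file.
cat > foo/.gitrepo <<EOF
[subrepo]
    remote = $work/upstream
    branch = master
    commit = $upstream_commit
    parent = $parent
EOF
git add foo/.gitrepo

# One commit holds the squashed content plus the state file.
git commit -q -m "subrepo clone $work/upstream foo/"

commit_count=$(git rev-list --count HEAD)
git log --oneline
```

Although the upstream repo has two commits, the main repo gains only one, which is the squashing behaviour described above.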
The new foo/.gitrepo file looks like this:
    [subrepo]
        remote = git@github.com:you/foo.git
        branch = master
        commit = 14c96c6931b41257b2d42b2edc67ddc659325823
        parent = 37c61a5a234f5dd6f5c2aec037509f50d3a79b8f
        cmdver = 0.1.0
It contains all the info needed now and later. Note that the repo url is the
generally pushable form, rather than the publicly readable (C<https://…>)
form. This is the best practice. Users of your repo don't need access to this
url, because the content is already in your repo. Only you and your
collaborators need this url to pull/push in the future.
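Because the file follows Git's own config syntax, it can be read with C<git config --file>. The sketch below recreates the example file in a temporary directory (the directory is an assumption made for the demo) and reads three of its fields back.

```shell
#!/bin/sh
# Hypothetical example: recreates the .gitrepo file shown above in a temp dir.
set -e
work=$(mktemp -d)
mkdir "$work/foo"
cat > "$work/foo/.gitrepo" <<'EOF'
[subrepo]
    remote = git@github.com:you/foo.git
    branch = master
    commit = 14c96c6931b41257b2d42b2edc67ddc659325823
    parent = 37c61a5a234f5dd6f5c2aec037509f50d3a79b8f
    cmdver = 0.1.0
EOF

# Read individual fields with git's config parser.
remote=$(git config --file "$work/foo/.gitrepo" subrepo.remote)
branch=$(git config --file "$work/foo/.gitrepo" subrepo.branch)
commit=$(git config --file "$work/foo/.gitrepo" subrepo.commit)
echo "foo tracks $branch of $remote at $commit"
```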
The next command is the pull command. Normally you just give it the subrepo's
subdir path (although you can change the branch with -b), and it will get the
other info from the subdir/.gitrepo file.
The pull command does these steps:
=over
=item * Fetch the upstream content
=item * Check if anything needs pulling
=item * Create a branch of local subrepo commits since last pull
=item * Rebase this branch onto the upstream commits
=item * Commit the HEAD of the rebased content
=item * Update/amend the .gitrepo file
=back
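The "check if anything needs pulling" step amounts to comparing the commit recorded in C<.gitrepo> with the current head of the tracked upstream branch. The sketch below shows one way to make that comparison. It uses C<git ls-remote> instead of a fetch, which is a substitution made for brevity and not necessarily how C<git-subrepo> does it. The repos and paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: compares a recorded commit with the upstream branch head.
# All repo names and paths are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in upstream repo. The first commit is the one that gets pinned.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # fix the branch name for the demo
echo one > a.txt
git add a.txt
git commit -q -m "first"
pinned=$(git rev-parse HEAD)
echo two > b.txt
git add b.txt
git commit -q -m "second"   # upstream moves on after the pin
cd ..

# Hypothetical state file pinned at the first upstream commit.
mkdir -p main/foo
cat > main/foo/.gitrepo <<EOF
[subrepo]
    remote = $work/upstream
    branch = master
    commit = $pinned
EOF

gitrepo=main/foo/.gitrepo
remote=$(git config --file "$gitrepo" subrepo.remote)
branch=$(git config --file "$gitrepo" subrepo.branch)
recorded=$(git config --file "$gitrepo" subrepo.commit)

# Ask the remote for the current head of the tracked branch.
upstream_head=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)

if [ "$recorded" = "$upstream_head" ]; then
  needs_pull=no
else
  needs_pull=yes
fi
echo "needs pull: $needs_pull"
```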
=head3 Clean History
I've talked a bit about clean history but let me show you a comparison between
subrepo and subtree. Let's run this command sequence using both methods. Note
the differences in I<both> the command syntax required and the branch
history produced.
Subrepo first:
    $ git subrepo clone git@github.com:user/abc
    $ git subrepo clone git@github.com:user/def xyz
    $ git subrepo pull abc
    $ git subrepo pull xyz
The resulting history is:
    * b1f60cc subrepo pull xyz
    * 4fb0276 subrepo pull abc
    * bcef2a0 subrepo clone git@github.com:user/def xyz
    * bebf0db subrepo clone git@github.com:user/abc
    * 64eeaa6 (origin/master, origin/HEAD) O HAI FREND
Compare that to B<subtree>. This:
    $ git subtree add --squash --prefix=abc git@github.com:user/abc master
    $ git subtree add --squash --prefix=xyz git@github.com:user/def master
    $ git subtree pull --squash --prefix=abc git@github.com:user/abc master
    $ git subtree pull --squash --prefix=xyz git@github.com:user/def master
Produces this:
    * 739e45a (HEAD, master) Merge commit '5f563469d886d53e19cb908b3a64e4229f88a2d1'
    |\
    | * 5f56346 Squashed 'xyz/' changes from 08c7421..365409f
    * | 641f5e5 Merge commit '8d88e90ce5f653ed2e7608a71b8693a2174ea62a'
    |\ \
    | * | 8d88e90 Squashed 'abc/' changes from 08c7421..365409f
    * | | 1703ed2 Merge commit '0e091b672c4bbbbf6bc4f6694c475d127ffa21eb' as 'xyz'
    |\ \ \
    | | |/
    | |/|
    | * | 0e091b6 Squashed 'xyz/' content from commit 08c7421
    | /
    * | 07b77e7 Merge commit 'cd2b30a0229d931979ed4436b995875ec563faea' as 'abc'
    |\ \
    | |/
    | * cd2b30a Squashed 'abc/' content from commit 08c7421
    * 64eeaa6 (origin/master, origin/HEAD) O HAI FREND
This was from a minimal case. Subtree history (when viewed this way at least)
gets unreasonably ugly fast. Subrepo history, by contrast, always looks as
clean as shown.
The final command, push, basically just does the pull/rebase dance described
above, and pushes the resulting history back. It does not squash the
commits made locally, because it is assumed that when you changed the local
subrepo, you wrote commit messages that were intended to eventually be published
back upstream.
=head2 Conflict Resolution
The commands described above can also be done "by hand". If something fails
during a pull or push (generally in the rebasing) then the command will tell
you what to do to finish up.
You might choose to do everything by hand, and do your own merging strategies.
This is perfectly reasonable. The C<subrepo> command offers a few other helper
commands to help you get the job done:
=over
=item * C<fetch> - Fetch the upstream and create a C<< subrepo/remote/<subdir> >> ref.
=item * C<branch> - Create a branch of local subdir commits since the last pull, called C<< subrepo/<subdir> >>.
=item * C<commit> - Commit a merged branch's HEAD back into your repo.
=item * C<status> - Show lots of useful info about the current state of the subrepos.
=item * C<clean> - Remove branches, refs and remotes created by subrepo commands.
=item * C<help> - Read the complete documentation!
=back
=head2 Conclusion
Hopefully by now, you see that submodules are a painful choice with a dubious
future, and that subtree, while a solid idea, has many usage issues.
Give C<subrepo> a try. It's painless, easily revertible, and just might be what
the doctor ordered.
=head2 Reference Links
=over
=item * L<http://longair.net/blog/2010/06/02/git-submodules-explained/>
=item * L<http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/>
=back
=cut