Written by
Heikki Orsila.
Published on 2008-04-12. Updated on 2009-10-11.
Why Not Bazaar
1. Introduction
The purpose of this article is to criticize the
Bazaar version control tool
in favor of better tools such as
Mercurial (hg) and
Git.
After writing most of the article I discovered that many of these points
had already been made. See comments by
Linus Torvalds and
Carl Worth.
Nevertheless, this is my view on the subject.
Two basic assumptions form the core of this article.
If you disagree with these assumptions, you probably won't
agree with the arguments either, and reading the article is a waste
of your time.
- One has to co-operate with many untrusted parties.
This is typical in Open Source projects.
- Being truly distributed is very important.
There are multiple untrusted parties and the distributed model allows
them to develop exactly as they please.
Each party should be able to develop efficiently.
The core of version control is managing changes made by others,
and distributed version control forces you to learn that better. You would
have to learn that anyway to be a successful developer, so it's not a
downside of the model.
Being distributed also enables off-line work, easier experimentation with
branches, and less politics about commit access.
2. Bazaar Is Not Truly Distributed
2.1 Bazaar's Merge Algorithm Is Incomplete
Being truly distributed means all version trees are equal
in possibilities. This means all version trees should be mergeable
together, at least in theory.
However, the Bazaar version control tool demands that two trees to be
merged have at least one common ancestor; otherwise cherry-picking must
be used.
A truly distributed tool will try to merge any pieces that fit together
and record the ancestry properly to ensure easy merges in the future.
Having a common ancestor is a strong "hint" that a merge is possible,
but it should not be a technical requirement for merging.
Git and hg both allow merging arbitrary trees together.
Bazaar alleviates the problem by allowing cherry-picking. The problem
with Bazaar's cherry-picking is that it creates additional merging
conflicts in the future because ancestry of commits is not properly recorded
(or the merge tool is not clever enough).
For example, cherry-picking commit X from another repository
(that has no common ancestry with the current repo)
and then immediately cherry-picking commit X+1 creates a conflict
if X and X+1 both touch the same files.
Git's merge strategy is simple and straightforward compared to Bazaar's.
There are practical situations where merging repositories without common
version history is needed, and Bazaar is inflexible in those cases:
- A third party starts a new branch from a snapshot of the project,
thus deleting the common version history.
- Additional software components are merged into the development tree
from different projects.
- Code that predates version control is merged into the development tree.
Forking projects and merging unrelated projects are rare,
but they really put a distributed system to the test.
Bazaar's mental model seems to have come from coordinated organizations
where branches are made cleanly off
the master tree and changes are later merged back in a centralized
fashion. Unfortunately the real world is dirty sometimes, and that is
when being truly distributed helps.
I recently gained real-life experience of merging non-versioned code
into a Git repository.
A web site in production had undergone changes without
version control, but at the same time the site was developed with Git.
This is of course a management problem in itself,
but it happened so it had to be fixed. Using diff -urN |diffstat
is of course an option to start with when merging changes back
from the production system, but I decided to try Git's merging
abilities instead. I found it to be an easier and faster solution
than bare diffing and manual inspection.
I created an ad hoc Git repository from the
production system and then merged the development tree on top of
it to see what had changed.
This is what I did:
$ rsync -avP website:/wwwroot .
$ cd wwwroot
$ git init && git add * && git commit -m "production system import"
$ git checkout -b integrationbranch
$ git pull /devtree.git master
By doing the merge (pull) I got all the changes, new files and
merge conflicts with the production tree.
The merge took just a few seconds. The repository had 400 files, 760
commits and 4700 objects.
There were 17 or so merge conflicts that took
me two hours to handle, but it was really helpful to get that information
so fast.
When this was done,
git commit && git checkout master && git merge integrationbranch
was enough to pull the changes into development.
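A note for readers repeating this with a current Git: since version 2.9, Git refuses to merge histories without a common ancestor unless `--allow-unrelated-histories` is given. The following is a minimal sketch of the same kind of merge under that assumption. The directory names `dev` and `prod`, the file names, and the `demo` identity are hypothetical stand-ins, not the site described above.

```shell
#!/bin/sh
# Sketch: merge two repositories that have no common ancestor.
# "dev" and "prod" are hypothetical stand-ins for the development tree and
# the production snapshot; everything lives in a throwaway directory.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

# Development tree with its own history.
git init -q dev
(
  cd dev
  echo "developed page" > index.html
  git add index.html
  git commit -q -m "development tree"
)
dev_branch=$(git -C dev rev-parse --abbrev-ref HEAD)

# Ad hoc repository created from an unversioned snapshot.
git init -q prod
cd prod
echo "change made directly in production" > hotfix.txt
git add hotfix.txt
git commit -q -m "production system import"
git checkout -q -b integrationbranch

# Git 2.9+ refuses to merge unrelated histories unless told otherwise.
git pull -q --no-rebase --no-edit --allow-unrelated-histories ../dev "$dev_branch"

# A merge of two histories yields a commit with two parents.
parent_count=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
parent_count=$((parent_count - 1))
echo "parents of merge commit: $parent_count"
```

The two trees in this sketch touch different files, so the merge is clean. With overlapping files Git would stop and report conflicts, as it did in the case described above.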
2.2 Asymmetric View of Revision Numbers Between Branches
When merging changes back from another branch,
bzr log shows commits from the other branch differently
from local branch commits. This follows from Bazaar's design decision that
bzr log orders commits by a local revision number rather than a
global revision number. Global revision identifiers do exist in bzr,
but the log command does not use them.
It is annoying to see that commit messages and metadata from other branches
are indented differently than local branch commits in bzr log,
because it should
not matter where the commit was made but what the commit does.
In distributed development each change should be equal.
Mercurial and Git have global revision identifiers
(in Mercurial it is called a changeset id), and their logs present
every commit the same way regardless of where it was made. Therefore, each
repository, a fork or a branch, has exactly the same view of commits,
which is the spirit of distributed versioning.
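In Git the global identifier is the commit hash, which is computed from the commit's content and ancestry, so every clone refers to a commit by the same ID. A small sketch that illustrates this; the repository names `origin_repo` and `fork` and the `demo` identity are made up for the example.

```shell
#!/bin/sh
# Sketch: a commit has the same ID in the original repository and in a clone.
# "origin_repo" and "fork" are hypothetical names used only for this demo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"

git init -q origin_repo
(
  cd origin_repo
  echo "hello" > README
  git add README
  git commit -q -m "first commit"
)

# Clone the repository, as a third party would when forking.
git clone -q origin_repo fork

id_origin=$(git -C origin_repo rev-parse HEAD)
id_fork=$(git -C fork rev-parse HEAD)
echo "original: $id_origin"
echo "clone:    $id_fork"
```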
3. Bazaar's Log Command Lacks Some Useful Options
Since the core of version control is to manage changes, it should be
expected that lots of effort is put into inspection tools. Here are
some of the features I'm missing from Bazaar:
- bzr log lacks the diffstat feature.
The diffstat feature prints the number of added and deleted lines for
each file in a commit. Try: git log --stat
- bzr log lacks the ability to browse commits that match a given author.
I often look up changes with Git by: git log --author=SURNAME.
- bzr lacks the shortlog that gives a good overall picture of commits
and authors in a given commit range. Try: git shortlog.
Note: bzr has added the log -p option since my original article
(2008-04-12). Good work!
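The three Git commands mentioned in the list can be tried on a throwaway repository. This is only a sketch: the authors `alice` and `bob`, the file names, and the helper function `commit_as` are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: git log --stat, git log --author and git shortlog on a toy history.
# The authors "alice" and "bob" and the helper commit_as are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# commit_as NAME FILE MESSAGE: append a line to FILE and commit it as NAME.
commit_as() {
  echo "$3" >> "$2"
  git add "$2"
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.invalid" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.invalid" \
    git commit -q -m "$3"
}

commit_as alice a.txt "add a"
commit_as bob b.txt "add b"
commit_as alice a.txt "extend a"

# Per-commit diffstat: added and deleted lines for each file.
git --no-pager log --stat

# Only the commits whose author matches "alice".
git --no-pager log --author=alice --oneline
alice_commits=$(git rev-list --count --author=alice HEAD)

# Commit counts per author. HEAD is given explicitly because shortlog
# reads a log from standard input when no revision is passed in a script.
summary=$(git shortlog -s -n HEAD)
echo "$summary"
```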
4. bzr bundle-revisions vs. git format-patch
bzr bundle-revisions is used to share changes with other
developers. The person who sends the changeset does not need a server
of any sort to share it; the changeset can be sent
by email, for example. Unfortunately, the changeset is not
human-readable; it is an ASCII-encoded blob.
Human-readable changesets would allow
fast reviews and a wide audience on mailing lists and forums.
bzr diff can be used for mailing lists and forums, but
changelog entries and metadata are lost. Therefore, Bazaar should add the
equivalent of git format-patch, which generates a human-readable
changeset.
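To show what such a human-readable changeset looks like, here is a minimal sketch that creates a toy repository and exports its latest commit with git format-patch. The file `notes.txt`, the commit messages, and the `demo` identity are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: export the latest commit as a human-readable, mailable patch.
# The repository contents and the "demo" identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo "one" > notes.txt
git add notes.txt
git commit -q -m "initial notes"

echo "two" >> notes.txt
git commit -q -a -m "add second note"

# format-patch writes one file per commit and prints the file name.
patch_file=$(git format-patch -1 HEAD)

# The file is plain text: mail-style headers with author and subject,
# a diffstat, and the unified diff itself.
cat "$patch_file"
```

The resulting file keeps the author, date and commit message alongside the diff, so it can be reviewed in a mail client and later applied with git am.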
5. Small Issues
- Bazaar's advocacy material
is ridiculous. Actually, it was the main inspiration for this article.
- Quote: Less attitude - direct support for more workflows
What attitude? Which workflows are not supported in Git?
Articles at the Bazaar site list some workflows that are all supported by Git.
- Quote:
In Bazaar, it is possible to commit directly to the central server
whereas in Git it would need two actions: a local commit followed by a
push to remote host.
Indeed, in Git one has to do git commit && git push instead
of just git commit, but what micro-fraction of development time
does that take?
- Quote: Git's automatic merge & commit may be create problems.
Yes, and undoing the merge is one command: git reset --hard ORIG_HEAD
(plain git reset --hard is enough if the merge stopped before committing).
There is also git merge/pull --no-commit.
- Quote: Easier administration: ... To maintain performance, Git repositories require packing.
Is running "git gc" as a weekly cron job too much administration?
Developers can also set an option to automatically pack local repo every now and then.
- Quote:
Git prides itself on being a "content manager" and deriving what got
renamed using heuristics. This mostly works but breaks under certain
merge conditions. If you want your team or community to collaborate
without fear of breaking merges, Bazaar's robust renaming is essential
as explained by ...
Merging is not relevant here: any system can and will do
bad merges in some cases, and they will be handled manually in both
systems. Undoing a merge is just one command. Also, Bazaar's merge
is more limited than Git's, see Section 2 of this article.
What matters is tracking changes.
git log -M --follow -- file can dig up changes (and renames) for a
given file. git blame -C file can even track individual code
lines moving from one module to another. Bazaar is unable to do the latter.
- Quote:
When changes in a branch are ready for sharing and you wish to
share asynchronously (e.g. via email instead of advertising a public
branch), Bazaar handles this better than Git. The recommended way to
do this in Git is via the format-patch command which generates a set
of normal patches which can be applied with it's apply-am command.
Bazaar implements this functionality via the send command which
generates an intelligent patch known as a "merge directive". In
addition to a preview of the overall change, a merge directive
includes metadata like renames, the base revision (common ancestor) of
a submit branch and digital signatures.
In addition to the faults of bzr bundle-revisions discussed earlier,
the merge directive also limits this method to a centralized
versioning scheme, whereas git format-patch generates changes that can
be applied to any repository.
Also, it is factually inaccurate to state that git format-patch is
a recommended method over any other method that Git provides.
Developers often share their
changes through public version trees. Many projects post
format-patches on mailing lists (the Linux kernel project, for example),
but that is simply one alternative among several.
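The rename tracking mentioned in the list above (git log -M --follow) can be illustrated with a short sketch. The file names `old_name.txt` and `new_name.txt` and the `demo` identity are hypothetical.

```shell
#!/bin/sh
# Sketch: follow the history of a file across a rename.
# File names and the "demo" identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

printf 'line 1\nline 2\nline 3\n' > old_name.txt
git add old_name.txt
git commit -q -m "add file"

git mv old_name.txt new_name.txt
git commit -q -m "rename file"

# Without --follow, history is limited to commits touching the new path.
plain=$(git log --oneline -- new_name.txt | wc -l | tr -d ' ')

# With --follow, Git detects the rename and continues under the old name.
followed=$(git log --oneline -M --follow -- new_name.txt | wc -l | tr -d ' ')

echo "commits without --follow: $plain"
echo "commits with --follow:    $followed"
```

Git does not store the rename anywhere. It is inferred from the content when the log is requested, which is the heuristic approach that the quoted advocacy text refers to.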
Note: the bzr uncommit issue seems to have been resolved (2009-10-11).
I had criticized bzr for not having a good substitute for
git reset.
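For reference, this is roughly how git reset and git merge --no-commit are used around a merge. It is a sketch on a toy repository: the branch name `feature`, the file names, and the `demo` identity are assumptions.

```shell
#!/bin/sh
# Sketch: undo a committed merge with git reset, and preview a merge
# with --no-commit. Branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q "$main_branch"
echo main > main.txt
git add main.txt
git commit -q -m "main work"
before=$(git rev-parse HEAD)

# Merge and commit automatically, then undo it with one command.
# ORIG_HEAD is where the branch pointed before the merge.
git merge -q --no-edit feature
after_merge=$(git rev-parse HEAD)
git reset -q --hard ORIG_HEAD
after_reset=$(git rev-parse HEAD)

# Alternatively, stop before committing, inspect, and back out.
git merge -q --no-commit feature
during_merge=$(git rev-parse HEAD)
git merge --abort

echo "before: $before"
echo "merged: $after_merge"
echo "reset:  $after_reset"
```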
6. Conclusions
Bazaar is clearly a better tool than Subversion. However, in my opinion,
it is worse than Git and Mercurial.
Bazaar can be used to manage large centralized projects with many developers,
and it is a good tool for off-line development in centralized projects.
However, Bazaar is not a truly distributed system like Git and Mercurial.
This limits Bazaar's usefulness in the fast-moving Open Source world
where forks happen and organizational boundaries are (or should be) meaningless.
Most deficiencies in Bazaar could be fixed, but until that happens,
I don't recommend it.
Author
Please send any feedback and suggestions to
Heikki Orsila.