Written by Heikki Orsila. Published on 2008-04-12. Updated on 2009-10-11.

Why Not Bazaar

1. Introduction

The motivation of this article is to criticize the Bazaar version control tool in favor of better tools such as Mercurial (hg) and Git. After writing most of the article, I discovered that many of these points had already been made; see the comments by Linus Torvalds and Carl Worth. Nevertheless, this is my view on the subject.

Two basic assumptions form the core of this article. If you disagree with them, you probably won't agree with the arguments either, and you're wasting your time reading the article.

  1. One has to co-operate with many untrusted parties. This is typical in Open Source projects.
  2. Being truly distributed is very important. There are multiple untrusted parties and the distributed model allows them to develop exactly as they please. Each party should be able to develop efficiently.

    The core of version control is managing changes made by others, and distributed version control forces you to learn that better. You would have to learn that anyway to be a successful developer, so it's not a downside of the model. Being distributed also enables off-line work, easier experimenting with branches, and less politics about commit access.

2. Bazaar Is Not Truly Distributed

2.1 Bazaar's Merge Algorithm Is Incomplete

Being truly distributed means that all version trees are equal: at least in theory, any two trees should be mergeable. However, the Bazaar version control tool requires that two trees being merged have at least one common ancestor; otherwise cherry-picking must be used. A truly distributed tool tries to merge whatever pieces fit together and records the ancestry properly so that future merges are easy.

Having a common ancestor is a strong "hint" that a merge is possible, but it should not be a technical requirement for merging. Git and hg both allow merging arbitrary trees together. Bazaar alleviates the problem by allowing cherry-picking. The problem with Bazaar's cherry-picking is that it creates additional merge conflicts later, because the ancestry of the commits is not properly recorded (or the merge tool is not clever enough). For example, cherry-picking commit X from another repository that shares no ancestry with the current one, and immediately afterwards cherry-picking commit X+1, creates a conflict if X and X+1 touch the same files. Git's merge strategy is simple and straightforward compared to Bazaar's.
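To make the common-ancestor requirement concrete, here is a minimal Python sketch of how a merge tool might look for a merge base in a commit graph. The graph representation (a dict from commit ID to parent IDs), the commit names, and the function names are hypothetical. When the two histories share no ancestor, the sketch returns None, meaning "merge against an empty tree", which is roughly how Git treats unrelated histories; a tool that insists on a common ancestor would stop with an error at that point instead.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical commit graph: commit ID -> list of parent commit IDs.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, start: str) -> Set[str]:
    """Return `start` plus every commit reachable through parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        for parent in graph.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors that are not ancestors of
    another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(d != c and c in ancestors(graph, d) for d in common)
    }


def choose_base(graph: Graph, a: str, b: str) -> Optional[str]:
    """Pick one merge base, or None to signal 'use an empty base tree'."""
    bases = merge_bases(graph, a, b)
    if not bases:
        return None  # unrelated histories: fall back to an empty base
    # Real tools handle several bases more carefully (Git's recursive
    # strategy merges them into a virtual base); this sketch just picks one.
    return min(bases)


graph: Graph = {
    "base": [],
    "dev1": ["base"],
    "prod1": ["base"],
    "import": [],  # snapshot committed with no history at all
}
print(choose_base(graph, "dev1", "prod1"))   # base
print(choose_base(graph, "dev1", "import"))  # None
```

The empty-base fallback is the design choice that matters here: it lets unrelated trees be merged at all, and files that are identical on both sides then merge cleanly while differing ones show up as conflicts.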

There are practical situations where merging repositories without common version history is needed, and Bazaar is inflexible in those cases:

  1. A third party starts a new branch from a snapshot of the project, discarding the common version history.
  2. Additional software components from other projects are merged into the development tree.
  3. Code that predates version control is merged into the development tree.

Forking projects and merging unrelated projects are rare, but they really put a distributed system to the test. Bazaar's mental model seems to come from coordinated organizations where branches are made cleanly off the master tree and changes are later merged back in a centralized fashion. Unfortunately, the real world is sometimes dirty, and that is when being truly distributed helps.

I recently got real-life experience with merging non-versioned code into a Git repository. A production web site had been changed without version control while the site was also being developed with Git. This is of course a management problem in itself, but it happened, so it had to be fixed. Running diff -urN | diffstat is an obvious starting point when merging changes back from the production system, but I decided to try Git's merging abilities instead, and I found them easier and faster than bare diffing and manual inspection. I created an ad-hoc Git repository from the production system and then merged the development tree on top of it to see what had changed.

This is what I did:

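$ rsync -avP website:/wwwroot .
$ cd wwwroot
# "git add ." also stages dotfiles such as .htaccess, which "*" skips
$ git init && git add . && git commit -m "production system import"
$ git checkout -b integrationbranch
# Git 2.9 and later refuse to merge unrelated histories without this option
$ git pull --allow-unrelated-histories /devtree.git master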

By doing the merge (pull), I got all the changes, new files, and merge conflicts with the production tree. The merge took just a few seconds. The repository had 400 files, 760 commits and 4700 objects. There were about 17 merge conflicts, which took me two hours to handle, but it was really helpful to get that information so fast. Once the conflicts were resolved and the fixed files staged with git add, git commit && git checkout master && git merge integrationbranch was enough to pull the changes into development.
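Why does merging onto an imported snapshot yield all the changes, new files and conflicts at once? The Python sketch below models a three-way merge against an empty base at whole-file granularity. It is a simplification under stated assumptions: the flat path-to-content dict and the `merge_trees` name are hypothetical, and Git actually merges line by line inside each file, reporting files added differently on both sides as add/add conflicts.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical flat model of a tree: file path -> file content.
Tree = Dict[str, str]


def merge_trees(base: Optional[Tree], ours: Tree,
                theirs: Tree) -> Tuple[Tree, List[str]]:
    """Whole-file three-way merge; base=None means an empty base tree."""
    base = base or {}
    merged: Tree = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (possibly both deleted it)
        elif o == b:
            result = t  # only "theirs" changed, added or deleted it
        elif t == b:
            result = o  # only "ours" changed, added or deleted it
        else:
            # Both sides changed the file differently: keep both versions
            # between conflict markers and let a human decide.
            conflicts.append(path)
            result = f"<<<<<<< ours\n{o or ''}=======\n{t or ''}>>>>>>> theirs\n"
        if result is not None:
            merged[path] = result
    return merged, conflicts


production = {"index.php": "hotfix\n", "contact.php": "form\n", ".htaccess": "rules\n"}
development = {"index.php": "redesign\n", "contact.php": "form\n", "news.php": "news\n"}
merged, conflicts = merge_trees(None, production, development)
print(sorted(merged))  # ['.htaccess', 'contact.php', 'index.php', 'news.php']
print(conflicts)       # ['index.php']
```

With an empty base, a file present on only one side is always kept, so a file deleted in development but still present in production comes back; without shared history, a merge tool cannot tell a deletion from an addition.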

2.2 Asymmetric View of Revision Numbers Between Branches

When changes are merged from another branch, bzr log shows the commits from the other branch differently from local branch commits. This is a Bazaar design decision: bzr log numbers and orders commits by branch-local revision numbers rather than by global revision identifiers. Global revision IDs do exist in bzr, but the log command does not use them. It is annoying to see commit messages and metadata from other branches indented differently from local commits in bzr log, because what matters is what a commit does, not where it was made. In distributed development, every change should be equal.

Mercurial and Git identify commits by global IDs (in Mercurial they are called changeset IDs), and their logs treat every commit the same way regardless of where it was made. Therefore each repository, whether a fork or a branch, has exactly the same view of the commits, which is the spirit of distributed version control.
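The difference between branch-local and global numbering can be sketched in a few lines of Python. This is a simplified model, not Bazaar's or Git's actual algorithm: the "global" ID here hashes only the parent IDs and the message (Git also hashes the tree, author and committer), and the local numbers count only a branch's own mainline (Bazaar additionally gives merged-in commits dotted numbers such as 1.1.1). The function names and commits are hypothetical.

```python
import hashlib
from typing import Dict, List


def commit_id(parents: List[str], message: str) -> str:
    """Global ID derived from content, so every clone computes the same value."""
    data = "parents " + " ".join(parents) + "\n\n" + message
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def local_revnos(mainline: List[str]) -> Dict[str, int]:
    """Branch-local numbers: 1, 2, 3, ... along this branch's own mainline."""
    return {cid: n for n, cid in enumerate(mainline, start=1)}


# Alice and Bob both start from the same root and commit independently.
root = commit_id([], "initial import")
a = commit_id([root], "alice: fix parser")
b = commit_id([root], "bob: add docs")

# Each then merges the other's work into their own branch.
alice = local_revnos([root, a, commit_id([a, b], "merge bob")])
bob = local_revnos([root, b, commit_id([b, a], "merge alice")])

print(alice[a], bob[b])  # 2 2 -> "revision 2" is a different commit per branch
print(b in alice)        # False -> Bob's commit has no mainline number for Alice
print(commit_id([root], "bob: add docs") == b)  # True: same ID in every clone
```

Local numbers are short and convenient, but they depend on which branch you ask; content-derived IDs are identical everywhere, which is what gives every repository the same view of every commit.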

3. Bazaar's Log Command Lacks Some Useful Options

Since the core of version control is managing changes, one would expect a lot of effort to go into inspection tools. Here are some of the features I'm missing from Bazaar:

Note: bzr has added a log -p option since my original article (2008-04-12). Good work!

4. bzr bundle-revisions vs. git format-patch

bzr bundle-revisions is used to share changes with other developers. The sender does not need a server of any sort; the changeset can be sent by email, for example. Unfortunately, the changeset is not human-readable: it is an ASCII-encoded blob. Human-readable changesets would allow fast reviews and reach a wide audience on mailing lists and forums. bzr diff output can be posted to mailing lists and forums, but the changelog entries and metadata are lost. Therefore, Bazaar should add an equivalent of git format-patch that generates human-readable changesets.
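To illustrate what a human-readable changeset carries that a bare diff does not, here is a Python sketch that renders one change as an email with the author, date, subject, log message and a unified diff, loosely modeled on the layout of git format-patch output. It is not git's implementation; the `format_patch` function and its parameters are hypothetical, and it uses only the standard difflib and email modules.

```python
import difflib
from email.message import EmailMessage


def format_patch(author: str, date: str, subject: str, body: str,
                 path: str, old: str, new: str) -> str:
    """Render one change as a readable email: headers, log message, diff."""
    diff = "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    ))
    msg = EmailMessage()
    msg["From"] = author      # metadata that plain diff output loses
    msg["Date"] = date
    msg["Subject"] = "[PATCH] " + subject
    # Log message first, then a "---" separator, then the diff itself.
    msg.set_content(body + "\n---\n" + diff)
    return msg.as_string()


patch = format_patch(
    author="Jane Developer <jane@example.org>",
    date="Mon, 01 Jan 2024 12:00:00 +0000",
    subject="Greet the whole world",
    body="Say hello to everyone instead of just Bob.",
    path="hello.txt",
    old="hello bob\n",
    new="hello world\n",
)
print(patch)
```

Because the result is plain text, reviewers can read and comment on it directly in a mail client, while the headers keep the author and changelog information that a bare diff would drop.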

5. Small Issues

Note: the bzr uncommit issue seems to have been resolved (2009-10-11). I had criticized bzr for not having a good substitute for git reset.

6. Conclusions

Bazaar is clearly a better tool than Subversion. However, in my opinion, it is worse than Git and Mercurial. Bazaar can be used to manage large centralized projects with many developers, and it is a good tool for off-line development in centralized projects. However, Bazaar is not a truly distributed system like Git and Mercurial. This limits its usefulness in the fast-moving Open Source world, where forks happen and organizational boundaries are (or should be) meaningless.

Most deficiencies in Bazaar could be fixed, but until that happens, I don't recommend it.

Author

Please send any feedback and suggestions to Heikki Orsila.