Friday, February 5, 2010

Developing on HEAD scales to Google

I have a simple rule of thumb for what I am and am not allowed to say about Google. If I can find it said in some official-looking place from Google, then I think I'm allowed to say it. Otherwise not.

I was therefore very glad to run across this talk by Guido van Rossum describing his code review tool, Mondrian. It was put on YouTube by Google Tech Talks and, judging from the knowledge level of the audience, was delivered to a non-Google audience. To check that it had not been made open by accident, I looked further and found an O'Reilly article confirming that it was public.

I therefore feel comfortable talking about anything and everything said in that video. So you can view this as my spin on the material in that video, and if you want the full version without the editorializing, feel free to take an hour and watch the video for the original source.

These days everyone who is competent has agreed that source control is a Good Thing to use. However, opinions on how to use it vary widely. One argument in particular that I've seen in multiple organizations is on the value of using multiple branches versus having everyone develop in one branch, which is often called something like MAIN or HEAD. The issues involved are somewhat complex, and not obvious. So let me give a quick overview.

The primary argument for branching is that people can work on different features on different schedules without getting in each other's way. The primary argument for everyone developing on the same branch is that you discover conflicts immediately, when they are easiest to resolve, rather than months later when people have moved on to different things and have lost context for the pieces of code that conflict. The primary argument against branching is the pain of merging later. (Granted, distributed version control systems like git have done a lot to reduce this pain. But they have not eliminated it.) The primary argument against developing on HEAD is that it requires a constant level of diligence on the part of developers. When any developer can break all developers, you need to be careful in what you check in.

I've kept this down to the primary arguments because the secondary back-and-forth arguments get long, involved, and heated. Also, what I've described as two ways of working is really two families of ways of working. There are a lot of ways to organize multiple branches. And there are quite a few good uses for branches in a software project even if everyone is developing on the same branch. None of this is made easier by the fact that (as with many religious wars) the people involved have imprinted on one way of doing things. That makes them hyper-aware of the problems in other approaches, and they don't even notice potential pain points in their own.

Until I came to Google my personal position was that everyone working on HEAD was the best approach as long as your team was small enough that you could make it work. And I vaguely accepted that the pain of branching was a necessary evil on large software projects. Even when the pain reached the point of craziness, I was mostly willing to accept unsupported claims that it was necessary.

Then I came to Google. There are a lot of groups at Google, and they are free to do their own thing. Many do. However most groups develop out of one giant code base on HEAD. And it works as a development model. Google has made it scale.

Guido's talk describes many elements of what makes it scale. The first piece is having good developers. The second piece is an enforced policy of having every single patch go through a code review process before you check anything in. The third piece is a lot of proprietary infrastructure that Google has built up to make things work. And beyond that you have people paying attention to best practices such as consistent style, good unit testing, and so on. (All of which are reinforced in the code review process.)

My opinion after seeing Google has changed. I freely admit that there are real process and tool challenges to making it possible for large teams to develop everything on HEAD and have it scale smoothly. However it is possible. I've seen it work. And, speaking personally, this is my preferred way to work.

Different organizations are different, have different capabilities, different needs, and different goals. Sight unseen I'm not going to tell anyone else that their organization should try to work like Google does. I simply don't have the facts about your situation. But if, like me not that long ago, you've accepted the claim that large development teams have no choice, you now know better.

Now admittedly most people can't go and see this first hand for themselves at Google. But if you want to watch a large successful project developing code to a high standard in this way, I recommend watching clang. And if you want to know more about what kinds of tool support you need to make this work, go listen to Guido. He's smarter than I am, has been doing it longer, and actually built some of the basic tool support for it.


ysth said...

I like branches for committing a series of smaller changes, each of which leaves things in a, if not broken, at least undesirable state. The smaller, more focussed commits make for better later archeology.

Otherwise, I very much like developing on HEAD.

But doesn't the larger scale, combined with the delay inherent in code review, result in a lot of "well, it didn't conflict when I created the patch" pain? Maybe Guido addresses this, but I'm too lazy for long videos.

And congratulations on the new job!

John Tantalo said...

I think you mean "trunk", not "HEAD". HEAD is the most recent revision in a repository.

btilly said...

@ysth: The pain you are referring to is a tool problem. In Google's case the tools don't let you commit if there is a conflict.

Furthermore, one effect of code review is that it is easier to get ten 10-line patches reviewed than a single 100-line patch. This encourages committing in smaller increments, which is great for archeology and is good for avoiding last-minute conflicts.
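
To make the idea of a commit gate concrete, here is a minimal sketch. It is not Google's tooling, which is proprietary and which I am not describing here. Every name in it (Patch, Repo, submit, and so on) is made up for illustration, and it assumes the crudest possible notion of a conflict: two changes touching the same file. It only shows the shape of the rule: a patch lands on the single shared branch only if someone other than the author approved it and nothing it touches has changed since the author last synced.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """A proposed change. All field names are hypothetical."""
    author: str
    base_revision: int  # revision the patch was written against
    files: set[str]     # paths the patch touches
    approved_by: set[str] = field(default_factory=set)


@dataclass
class Repo:
    """Toy single-branch repository: records the files touched by each revision."""
    history: list[set[str]] = field(default_factory=list)

    @property
    def head(self) -> int:
        # Revision 0 is the empty repository; each commit adds one.
        return len(self.history)

    def changed_since(self, revision: int) -> set[str]:
        """Paths modified by any commit made after `revision`."""
        changed: set[str] = set()
        for files in self.history[revision:]:
            changed |= files
        return changed

    def submit(self, patch: Patch) -> int:
        """Commit the patch and return the new revision, or raise if the gate fails."""
        # Gate 1: at least one reviewer who is not the author (assumed policy).
        if not (patch.approved_by - {patch.author}):
            raise PermissionError("patch needs approval from someone other than its author")
        # Gate 2: file-level conflict check (a simplifying assumption; real
        # tools typically attempt a line-level merge before refusing).
        stale = patch.files & self.changed_since(patch.base_revision)
        if stale:
            raise RuntimeError(f"conflict, re-sync and retry: {sorted(stale)}")
        self.history.append(set(patch.files))
        return self.head
```

With this sketch, a small patch written against an old revision still lands as long as nobody else has touched its files in the meantime, while a patch that overlaps a newer commit is refused and its author has to re-sync first. The smaller the patch, the fewer files it touches and the less likely it is to be refused.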

Apreche said...

If I had other people doing code review of every single line I write, and proper testing tools, and all that other goodness, then it really wouldn't matter if I did branches or not.

As it is, I don't have those things. I also work on a wide variety of things, not concentrated in just one area, but still all on the same codebase. Therefore I have to switch between working on big features, small features, bugs, critical failures, and more all at the same time. Without branches, DVCS and Git, I would be toast.