BRANCHING STRATEGIES
A sophisticated branching strategy can yield significant improvements
--------
A branching strategy has several goals: create better software (fewer bugs); deliver software faster; allow flexible (agile?) deployments, where an intelligent choice can be made about which features are released, so that you can do time-based releases with a variable feature list; and optimize programmer and tester usage by keeping everyone productively occupied over the project life cycle. There are several options for branching a code base, and various rationales have been offered for each. Here I present my strategy and the reasoning behind it, and address common concerns.
The end goal is to have at least one branch of the code per environment that said code is deployed on.
In a typical web or client application, this implies that each developer has her own environment on a local machine, with a local database, or at least a dedicated copy of a schema on a shared machine. It is imperative that this level of isolation be achieved, either through interface definition (in the case of service-oriented architectures) or by copying schemas (in the case of classic n-tier architectures). Each branch therefore represents the body of work of a single developer on a single feature deployed on a single environment, which means a developer may have multiple branches for different features. The advantage is that a developer can focus on changing one part of an application without fear of other changes affecting her *during* development. Furthermore, unit testing is much more targeted and effective because there are guaranteed to be no "unknown" changes interfering with the changes under test. Dependencies between features are resolved at a merge point.
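As a rough illustration of isolation through interface definition, here is a minimal Python sketch. Everything in it is hypothetical (the Customer record, the CustomerSource interface, the stub data); it only shows the shape of the idea: feature code in a developer's branch depends on an agreed interface, and a stub serving known good data stands in for work that lives in someone else's branch.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """Hypothetical record shared across the interface."""
    customer_id: int
    name: str


class CustomerSource(Protocol):
    """The agreed interface; the real implementation lives in another branch."""

    def find(self, customer_id: int) -> Optional[Customer]:
        ...


class StubCustomerSource:
    """Stand-in that serves known good data until the real source is merged."""

    def __init__(self) -> None:
        self._rows = {1: Customer(1, "Ada"), 2: Customer(2, "Grace")}

    def find(self, customer_id: int) -> Optional[Customer]:
        return self._rows.get(customer_id)


def greeting(source: CustomerSource, customer_id: int) -> str:
    """Feature code under development; it depends only on the interface."""
    customer = source.find(customer_id)
    return f"Hello, {customer.name}" if customer else "Hello, guest"


if __name__ == "__main__":
    print(greeting(StubCustomerSource(), 1))
    print(greeting(StubCustomerSource(), 99))
```

When the real implementation arrives at a merge point, it replaces the stub without any change to greeting, provided it honors the same interface.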
Once work is complete and unit tested in a developer's branch, it can be merged into a release candidate branch. In a typical organization this release candidate branch corresponds to a shared development environment. It is in this branch that initial integration testing among branches occurs. Changes merged into this branch are tagged, built, deployed, and tested as necessary. Any fixes that may be required can be made directly in this branch and deployed back into the shared environment, again without disturbing individual developers. When this code branch has been accepted, it can be merged into the next branch, which typically corresponds to a test/QA/user acceptance environment. The same process of testing and issue resolution is followed there, until the final results are merged into the production release candidate branch. At this point you can argue that the production branch is a de facto trunk, and that any issues found from here on should spawn bug fix releases. I agree with this argument, so in practice when code is ready for release it is merged into the trunk and the production release is built from the trunk.
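The promotion path described above can be sketched in a few lines of Python. This is only a model of the rule, not a tool: the branch names are my own placeholders, and the check simply says that code moves one step at a time, and only after testing in the current environment has passed.

```python
from typing import Optional

# Branch -> environment, listed in promotion order (dicts keep insertion order).
# The branch names are illustrative assumptions, not a required convention.
PIPELINE = {
    "developer-feature": "developer's own environment",
    "release-candidate": "shared development environment",
    "qa": "test/QA/user acceptance environment",
    "trunk": "production",
}


def next_branch(branch: str) -> Optional[str]:
    """Return the branch that `branch` merges into, or None for the trunk."""
    order = list(PIPELINE)
    position = order.index(branch)  # raises ValueError for an unknown branch
    return order[position + 1] if position + 1 < len(order) else None


def can_promote(source: str, target: str, tests_passed: bool) -> bool:
    """Allow a merge only into the immediately following branch, after testing."""
    return tests_passed and next_branch(source) == target


if __name__ == "__main__":
    print(can_promote("release-candidate", "qa", tests_passed=True))
    print(can_promote("developer-feature", "trunk", tests_passed=True))
```

A check like this could be enforced by whoever handles elevations, or wired into a merge script, so that skipping an environment is an explicit exception rather than an accident.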
Several arguments are commonly made for a simpler branching strategy. The one heard most often is that this strategy is simply too complex: in a real environment there will not be enough discipline to enforce it, causing tremendous management overhead. The overhead introduced by the sophisticated method is mitigated in several ways. First, as time passes this becomes second nature; to put it another way, the main cost is the learning curve, not the day-to-day operation. Second, the cost of merging in a simple situation is trivial, literally the click of a button. The added overhead is minimal, and it buys a guaranteed clean release branch. Finally, any discussion of overhead must include the overhead of using development resources inefficiently. With simpler strategies, the role of code librarian is eliminated or reduced in scope, but the entire development team is essentially stopped while a build is tested or released. In summary, the reasons to pursue other branching strategies are inherently flawed and are fueled more by fear and ignorance than by sound planning.
A second common argument is that merging is time consuming and unreliable. This is usually stated in terms of multiple developers modifying the same file or set of files concurrently, such as when adding new modules to a project in Visual Studio (TM). The response to this is threefold. First, it is an error of technical direction to have more than one developer modifying the same working set on a regular basis (bug fixes aside), except in certain very constrained scenarios. In all other cases, development should focus on coding to interfaces rather than implementations, and on injecting stubs of known good data as early as possible to keep everyone moving forward. Since the scenarios where concurrent modification is acceptable are so few, the objection loses most of its force, and what remains can reasonably be managed by a single person handling code elevations and merges. This again shifts complexity and stoppage from the whole team to one person, which is a good thing. Second, current source control management systems have very robust merging capabilities that use the shared common ancestor of two branches to perform a diff-of-diffs comparison, resulting in much better reliability. Finally, this strategy has a side benefit: having one person (or role) handle elevations and merges introduces a natural checkpoint for validating changes and performing code review and documentation tracing. Since this is inherent in the strategy, it can easily be incorporated into a software lifecycle and, more importantly, can be planned for as a discrete task in a project plan.
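To make the common-ancestor idea concrete, here is a deliberately simplified three-way merge in Python. It is a sketch under a strong assumption: each "file" is a dictionary of named settings, so no line alignment is needed. Real source control tools first align the text with a diff and then apply the same decision rule to each region. The function name and sample data are my own.

```python
from typing import Dict, List, Tuple


def three_way_merge(
    base: Dict[str, str], ours: Dict[str, str], theirs: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Merge two edited copies of `base`, returning (merged, conflicting keys)."""
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            result = o  # both sides agree (this includes both deleting the key)
        elif o == b:
            result = t  # only their side changed relative to the ancestor
        elif t == b:
            result = o  # only our side changed relative to the ancestor
        else:
            conflicts.append(key)  # both changed, differently: needs a human
            result = o  # keep our side as a placeholder until it is resolved
        if result is not None:  # None means the key was deleted
            merged[key] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"timeout": "30", "retries": "3", "mode": "fast"}
    ours = {"timeout": "60", "retries": "3", "mode": "safe"}
    theirs = {"timeout": "30", "retries": "5", "mode": "slow"}
    print(three_way_merge(base, ours, theirs))
```

In the sample data, the timeout and retries changes merge cleanly because each was made on only one side; only mode, which both sides changed differently, is reported as a conflict. Without the ancestor, all three differences would look like conflicts, which is why a merge that knows the common ancestor is so much more reliable than a plain two-way comparison.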
Finally, an argument is usually made that all developers need all changes from all other developers at all times. This argument is the hardest to refute because it rests on the assumption that work should not be isolated, an assumption usually born of a history of poor architectural and design choices that left every component of the system tightly coupled to every other. The refutation is that implementing this strategy makes these tight couplings obvious, and each one is an opportunity to make the code base more robust; in essence, the strategy institutionalizes the ability to detect poor design or implementation choices organically. Unfortunately, this scenario can be a great impediment to change: if an existing system already suffers from this tight coupling, it can be hard to move away from the existing code base.
In conclusion, a robust branching strategy gives you advantages in overall development time, code robustness, and process control. While the overall complexity of the process is not necessarily reduced, it is shifted to a more efficient point, which allows all team members to be used most efficiently and so reduces overall development time. The process this branching strategy introduces also ties stable branches to hardware environments, which leads to more robust deliveries. These advantages come with no additional aggregate overhead and, once the learning curve is overcome, can lead to substantial reductions in time spent on the bureaucratic portions of code control.
