BRANCHING STRATEGIES

A sophisticated branching strategy can yield significant improvements

——————–

A branching strategy has several functional goals: create better software (fewer bugs); deliver software faster; allow for flexible (agile?) deployments, where an intelligent choice can be made as to which features are released, giving you the ability to do time-based releases with a variable feature list; and optimize programmer and tester usage by keeping everyone productively utilized over a project life cycle. There are several options available for branching a code base, and various rationales have been given for various plans, but I will present my strategy, explain the reasoning behind it, and address common concerns.

The end goal is to have at least one branch of the code per environment that said code is deployed on.

In a typical web or client application, this implies that each developer has her own environment on a local machine, with a local database, or at least a dedicated copy of a schema on a shared machine. It is imperative that this level of isolation is achieved, either through interface definition (in the case of service-oriented architectures) or by copying schemas (in the case of classic n-tier architectures). Each branch will therefore represent the body of work of a single developer on a single feature deployed on a single environment, which also implies that developers may have multiple branches for different features. The advantage here is that a developer can focus on changing one part of an application without fear of other changes affecting her *during* development. Furthermore, unit testing is much more targeted and effective because of the guarantee that there are no “unknown” changes affecting the desired changes. Interrelated dependencies get resolved at a merge point.

Once work is complete and unit tested in a developer's branch, it can be merged into a release candidate branch. In a typical organization this release candidate branch corresponds to a shared development environment. It is in this branch that initial integration testing among branches occurs. Changes merged into this branch are tagged, built, deployed, and tested as necessary. Any fixes that may be required can be addressed directly in this branch and deployed back into the shared environment, again isolating individual users. When this code branch has been accepted, it can be merged into the next branch, which typically corresponds to a test/QA/user acceptance environment. The process of issue resolution and testing is followed again until the final results are merged into the production release candidate branch. At this point you can make the argument that the production branch is a de facto trunk, and any issues that arise from here should spawn bug fix releases. I agree with this argument, so in practice when code is ready for release it is merged into the trunk and the production release is built from the trunk.
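
The promotion path described above can be sketched as a simple ordered list. This is only an illustration of the flow, not a tool from this article: the branch names and the helper functions `next_branch` and `promotion_route` are hypothetical, and a real setup would likely have one feature branch per developer per feature.

```python
from typing import List, Optional

# Hypothetical branch names; the article only describes the roles each branch plays.
PROMOTION_PATH = ["feature", "release-candidate", "qa", "trunk"]


def next_branch(current: str) -> Optional[str]:
    """Return the branch that `current` is merged into, or None for the trunk."""
    index = PROMOTION_PATH.index(current)
    if index + 1 < len(PROMOTION_PATH):
        return PROMOTION_PATH[index + 1]
    return None


def promotion_route(start: str) -> List[str]:
    """List every branch a change passes through on its way to the trunk."""
    route = [start]
    target = next_branch(start)
    while target is not None:
        route.append(target)
        target = next_branch(target)
    return route


if __name__ == "__main__":
    print(" -> ".join(promotion_route("feature")))
```

Each hop in the route is a merge point where the change is tagged, built, deployed to the matching environment, and tested before moving on.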

Several arguments are commonly presented in favor of a simpler branching strategy. The most commonly heard is that this strategy is simply too complex; in a real environment there will not be enough discipline to enforce it, causing tremendous management overhead. The overhead introduced by the sophisticated method is mitigated in several ways. First, as time passes this becomes second nature; to put it another way, the main cost involved is the learning curve, not the day-to-day operation. Second, the cost of merging in a simple situation is trivial; literally the click of a button. The added overhead is again minimal, with the added benefit of a guaranteed clean release branch. Finally, the discussion of overhead must include the overhead of inefficient use of development resources. When using simpler strategies, while the role of code librarian is eliminated or reduced in scope, the entire development team is essentially stopped while a build is tested or released. In summary, the reasons to pursue other branching strategies are inherently flawed and are fueled more by fear and ignorance than by sound planning.

A second common argument is that merging is time consuming and unreliable. This is usually stated in terms of multiple developers modifying the same file or set of files concurrently, such as when adding new modules to a project in Visual Studio (TM). The response to this is threefold. First, it is an error of technical direction to have more than one developer modifying the same working set on a regular basis (bug fixes aside), except in certain very constrained scenarios as mentioned previously. For other cases, development should be focused on coding to interfaces, not implementations, and on injecting stubs of known good data as early as possible to keep everyone moving forward. Since the scenarios where concurrent modification is acceptable are so few, the issue is largely invalid and can reasonably be managed by a single person handling code elevations and merges, which again shifts complexity and stoppage from the whole team to one person, which is a good thing. Second, current source control systems have very robust merging capabilities that take into account shared common ancestors to perform diff-of-diffs comparisons, resulting in much better reliability. Finally, a side benefit of this strategy is that by having one person (or role) handle the elevation and merge, you introduce a rational checkpoint for validating changes and performing code review and documentation tracing. Since this is inherent in the strategy, it can easily be incorporated into a software lifecycle and, more importantly, can be planned for as a discrete task in a project plan.
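
To make the common-ancestor idea concrete, here is a minimal sketch of a three-way merge. It is a deliberate simplification: real source control tools merge text hunks, while this hypothetical `three_way_merge` function works on whole named entries (think file name to content) and treats `None` as "deleted or absent".

```python
from typing import Dict, List, Optional, Tuple


def three_way_merge(
    base: Dict[str, str], ours: Dict[str, str], theirs: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Merge two edited copies of `base`; return (merged, conflicting keys)."""
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(key)
        o: Optional[str] = ours.get(key)
        t: Optional[str] = theirs.get(key)
        if o == t:        # both sides agree (or neither touched the entry)
            value = o
        elif o == b:      # only "theirs" changed relative to the ancestor
            value = t
        elif t == b:      # only "ours" changed relative to the ancestor
            value = o
        else:             # both changed the same entry differently
            conflicts.append(key)
            continue
        if value is not None:  # None means the entry was deleted
            merged[key] = value
    return merged, conflicts


if __name__ == "__main__":
    base = {"a.txt": "v1", "b.txt": "v1"}
    ours = {"a.txt": "v2", "b.txt": "v1"}
    theirs = {"a.txt": "v1", "b.txt": "v3"}
    print(three_way_merge(base, ours, theirs))
```

Because each side is compared against the shared ancestor rather than against the other side, independent changes merge cleanly and only genuinely overlapping changes are reported as conflicts.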

Finally, an argument is usually made that all developers need all changes from all other developers at all times. This argument is the hardest to refute because it is based upon the assumption that work should not be isolated, usually born out of a history of poor architectural and design choices wherein all components of a system are tightly coupled. The refutation is that by implementing this strategy, these tight couplings become obvious and become an opportunity to make the code base more robust; in essence it institutionalizes the ability to organically detect poor design or implementation choices. Unfortunately, this scenario can be a great impediment to change: if an existing system already suffers from this tight coupling, it can be hard to move away from the existing code base.

In conclusion, a robust branching strategy gives you advantages in overall development time, code robustness, and process control. While the overall complexity of the process is not necessarily reduced, the complexity is shifted to a more efficient point, which allows all team members to be utilized most efficiently and results in an overall reduction in development time. The process introduced by this branching strategy also ties stable branches to hardware environments, which leads to more robust deliveries. These advantages come with no additional aggregate overhead, and once the learning curve is overcome they can lead to substantial reductions in time spent on the bureaucratic portions of code control.

Multi Developer branching

Overview

Software Configuration Management (SCM) is the art of both controlling and tracking changes in a software project. If software engineering is fundamentally concerned with producing quality software in a known and repeatable fashion, then SCM is the fundamental tool that drives the operation of software engineering.

Why is Software Configuration Management important?

The formal goal of a software engineering project is to deliver a set of integrated software components. In practice, software engineering is often intertwined with systems engineering; that is, the portion of a project concerned with hardware and deployment or otherwise non-functional requirements. The practical goal for any project is then to deliver a stable product which includes not only software, but the hardware specifications and configuration artifacts needed to realize the software system. Configuration Management enables the delivery of stable software and further enhances an organization's ability to deliver revisions and new features by answering the fundamental question: “How do I reproduce a change that someone has made?” By tracking and controlling changes, SCM allows a program manager to answer that question and make intelligent decisions based on that information. For the programmer, configuration management is important in another way: research. Proper CM allows a programmer to explore changes and features that may lead to a dead end, without affecting the rest of the project. To summarize, CM provides reliability (artifacts are maintained in case of failure), flexibility (you can choose to go in a different direction), repeatability (you know what has changed) and isolation (research in one field won’t affect other development). Any one of the preceding four is reason enough to employ intelligent SCM; taken together, it is clear that SCM must be at the core of software development.

Who is responsible for configuration management?

In a word, everyone is responsible for configuration management. From a sales manager (on commercial projects) on down to the individual programmer, configuration management needs to be a part of everyday discipline. As described previously, the programmer’s fundamental concern with configuration management is one of implementation: making sure that proper procedures are followed and artifacts are revisioned often to prevent data loss and create a change trail. Architects and project management are responsible for defining the strategies to be employed by a group; in particular, the way in which the implementers collaborate to achieve a stable release. Sales management or client-facing people are responsible for defining and communicating which features are needed, including required release dates, since those drive that strategy.

Roles within SCM

With an understanding that everyone is responsible for configuration management, the role of each individual needs to be determined. Roles can loosely be broken into two groups: planners and executors. The planner role is played collectively by the system architects, the sales team (or equivalent) and the IT management. The executor role is played by anyone who is involved in the day-to-day management of artifacts. One particular role that can be of high importance is that of the code librarian, who watches over the repository and communicates changes as necessary (more to be said on this later). As much as can be generalized, the code librarian is the arbiter of what is available for release, which can be further generalized as the role of release manager (the librarian may physically execute the release, while the release manager decides which features are released and when). Architects or project managers define what reasonable branches of work need to take place.

What artifacts are appropriate for revision control?

In software development it is pretty clear that source code needs revision control, along with associated development artifacts like project configuration files, but the scope of revision control is actually much larger. Any project documentation benefits from revision control, but even more from change management. Revision control, that is, maintaining a history of changes, is only part of the issue. Of even greater import is controlling the changes. Any document that is used to communicate in the course of a project can benefit from control management. Requirements documents, design documents, scope documents, test cases and acceptance documents are all examples of artifacts that should be controlled and revisioned, even if they are ephemeral artifacts of a particular development phase. Many of these documents are already controlled in an ad hoc manner when using MS Word and the ability to accept changes! Taking the concept a step further, it is beneficial to control all aspects of a software development process, from the operating system to the compilers used. Unfortunately, this extreme view of revision control is seldom practiced, and its absence leads to many common problems. How many times have you been on a project where the versions of development tools or compilers have been different? Or where the development system versions differ from those of the production system?

SOA performance measurement is important for two main reasons: requirements compliance and system planning. The first reason, requirements compliance, is obvious: you must prove that your system meets the overall performance requirements of whatever process you are running. The second reason, system planning, is necessary to get to a fully deployed SOA in the first place, since you must know how each component performs individually and collectively so that you can plan the amount of hardware and the number of systems you need to realize your SOA. I will examine the ways in which individual components (or services) can be measured for performance by drawing a loose analogy to the way in which electronic systems are characterized, and discuss the importance of each type of measurement and how to adjust your system design or implementation based on the results of each measurement.

For individual components, performance can be measured in several ways, which can be thought of as analogs to the way electronic systems are characterized: the impulse response, the step response, and the frequency response. The impulse response characterizes how the system reacts to an excitation on an input pole; in an SOA component this is analogous to the latency from a single input on a single operation. The step response of an electronic system is how it reacts when going from some zero state to an input of one unit in some short period of time; for an SOA system this is like going from no inputs to a heavy load of inputs, but I will break the analogy and consider a step-wise function, where the input is varied from zero to some maximum value, then back again, to get a decaying response under load as well. Finally, the frequency response of an electronic system defines how the system reacts when a signal of varying frequency is applied at its inputs; for an SOA component this can be thought of as exciting all operations on a component simultaneously.

In an SOA component, the impulse response is analogous to the latency from a single input to the response to that input. Remember that a single service can have multiple operations, so this measurement is taken per input operation and assumes a complete decoupling of the operations; that is, there is no operation that changes the state of the system in such a way as to change the semantics of any other operation. The latency of each operation is an important measure of how the component will act within an orchestration of other components, and forms the basic metric for performance. However, another important metric to get from the impulse response is the amount of resources consumed for that latency. The percentage of resources used, coupled with the latency number, can give a rough idea of the upper bound on the throughput of a component on dedicated hardware. As an example, if operation FOO() takes 400 msec and uses 20% of the CPU and no I/O, then a rough upper bound for throughput would be 5 (100%/20%) FOO() operations in a 400 msec interval, or 400 msec/5 = 80 msec per FOO() operation in aggregate (about 12.5 operations per second). Given these numbers, you may need to go back and refactor your operation to make it more efficient, reducing the latency and increasing the throughput, or even to parallelize it into multiple operations. After deciding on refactoring or optimizing, it is tempting to stop analyzing performance at this point, but there are several factors missing. First, the assumption was made that the process will simply scale linearly. This implies that there is zero shared state or locking within the process, which for a well designed service should be true. However, the realities of modern hardware mean that multiple processes will contend for access to low-level devices (memory pathways in particular), and if you have parallelized your operation in any way, Amdahl’s Law takes effect, so it is necessary to consider another type of profiling of your component. (For some examples of how the managed environment itself can cause problems like this, see my last post on performance.)
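
The FOO() arithmetic above, and the Amdahl's Law caveat, can be written down as a small sketch. The function names are mine, and the 90% parallel fraction in the usage example is an arbitrary assumption chosen only to show how quickly the linear estimate degrades.

```python
def throughput_upper_bound(latency_ms: float, cpu_fraction: float) -> float:
    """Rough upper bound in operations per second, assuming linear scaling.

    If one operation uses `cpu_fraction` of the CPU for `latency_ms`, then
    about 1 / cpu_fraction operations fit into each latency interval.
    """
    concurrent_ops = 1.0 / cpu_fraction
    return concurrent_ops / (latency_ms / 1000.0)


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # FOO(): 400 msec latency at 20% CPU gives about 12.5 operations per second.
    print(throughput_upper_bound(400.0, 0.20))
    # Assumed example: if only 90% of the work parallelizes, five concurrent
    # operations yield roughly a 3.6x speedup rather than the ideal 5x.
    print(amdahl_speedup(0.90, 5))
```

The gap between the ideal 5x and the Amdahl estimate is the reason the impulse-response number should be treated as an upper bound rather than a plan.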

The second performance measurement step is analogous to the step response of an electronic system, wherein you cause your input to jump from zero to one unit in a very short time. In the case of an SOA component, this means running the individual operations with a high enough load to push the system, stepping that load up and then back down. The step portion of the measurement is important for several reasons. First, the initial step up allows you to measure how initial system startup time can affect latency, just as subsequent steps allow you to measure how the system responds to load under a steady state. Second, stepping the inputs down allows you to measure worst case latency as well as aggregate throughput; since worst case latency is most likely going to be a limiting factor on overall performance, it is critical that it be measured under load, and not inferred from the impulse response. Finally, a backlog of inputs can affect your latency differently when the rate of inputs changes, a consequence of queuing theory that is beyond the scope of this discussion. This measurement therefore better accounts for any hidden dependencies that you are unaware of, giving you an idea of your worst case latency for messages and the corresponding throughput under varying input loads and system resource demands. At this point you have a decent idea of what your worst case latency is and, assuming that the latency is tolerable, an idea of the throughput you can expect for each operation. The performance measurements now allow you to again optimize or refactor your component. If the numbers are acceptable, you can start to plan how many nodes will need to be executing to meet the overall throughput needs of your system, but there is one more important aspect of performance to consider: the aggregate performance of the component when all operations are active.
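
A step-wise load schedule of this kind can be sketched in a few lines. The helper names `step_profile` and `summarize` are hypothetical, and the load units (for example, requests per second) are left to whatever your harness uses.

```python
from typing import Dict, List


def step_profile(max_load: int, steps: int) -> List[int]:
    """Load levels that climb from zero to `max_load`, then back down to zero."""
    up = [max_load * i // steps for i in range(steps + 1)]
    return up + up[-2::-1]  # mirror the ramp without repeating the peak


def summarize(latencies_ms: List[float], duration_s: float) -> Dict[str, float]:
    """Worst-case latency and aggregate throughput observed at one load level."""
    return {
        "worst_ms": max(latencies_ms),
        "throughput_per_s": len(latencies_ms) / duration_s,
    }


if __name__ == "__main__":
    print(step_profile(100, 4))  # [0, 25, 50, 75, 100, 75, 50, 25, 0]
    print(summarize([12.0, 40.0, 15.0], duration_s=2.0))
```

Recording a summary per level, on both the way up and the way down, is what exposes startup effects and backlog effects that a single steady-state run would hide.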

The case where you are exercising all inputs of a component is analogous to the frequency response of an electronic system. By activating all of the operations simultaneously, you will again uncover any hidden dependencies that affect latency and throughput under load. This third type of performance measurement is, ironically, the most critical. It can reveal whether any aspects of your component design have shared state or dependencies. Seeing bad results from this measurement also lets you consider splitting the component into multiple components and choreographing them differently, or running them on separate nodes. Fundamentally this step is simply running the step measurement simultaneously across all operations, but you must take care to vary the loads across the operations at different frequencies, to measure how the system responds when one operation is under heavy load but certain others are quiescent, and vice versa.
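
One simple way to vary the loads at different frequencies is to drive each operation with a square wave of a different period. This is a sketch under my own assumptions: the operation names `foo` and `bar`, the periods, and the helper names are all illustrative.

```python
from typing import Dict, List


def square_wave(tick: int, period: int, high: int, low: int = 0) -> int:
    """Load for one operation: `high` in the first half of each period, else `low`."""
    return high if (tick % period) < period / 2 else low


def load_schedule(periods: Dict[str, int], high: int, ticks: int) -> List[Dict[str, int]]:
    """Per-tick load for every operation, each toggling at its own period."""
    return [
        {op: square_wave(tick, period, high) for op, period in periods.items()}
        for tick in range(ticks)
    ]


if __name__ == "__main__":
    # Hypothetical operations: "foo" toggles every tick, "bar" every two ticks.
    for tick, loads in enumerate(load_schedule({"foo": 2, "bar": 4}, high=100, ticks=4)):
        print(tick, loads)
```

With periods of 2 and 4, four ticks cover every combination: both busy, only one busy, only the other busy, and both quiescent.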

Armed with the results of all three types of input testing, you should now be able to plan your resource requirements more reasonably, and also optimize the design of your components and services. These measurements will allow you to parallelize a portion of an operation, or even normalize it into two or more operations, to achieve your necessary latency and throughput requirements. The measurements will also be useful when you have to document and prove the performance of your system to an outside party. This discussion has left out a few key topics, most notably what specific mechanism should be used to drive your system; how to do planning if you haven’t actually built the systems yet and only have assumed numbers from simulations; and how these performance numbers relate to the overall performance of an SOA choreography. Briefly, the driving mechanism for an SOA component should be fairly simple. Any test harness (JUnit/NUnit/httperf) should be able to be configured to exercise a service interface by simply specifying one or more of the allowable domain elements for an operation. The only trick is to make sure that it is run in parallel or is internally threaded for running step load operations, and that it is running on sufficient hardware that the test harness itself is not the bottleneck (although that in itself is useful data).
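
As an illustration of an internally threaded driver, here is a minimal sketch using only the Python standard library. It is not one of the harnesses named above; `fake_operation` is a stand-in for a real service call, and `drive` and `timed_call` are names I made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def timed_call(operation: Callable[[], object]) -> float:
    """Run one request and return its latency in milliseconds."""
    start = time.perf_counter()
    operation()
    return (time.perf_counter() - start) * 1000.0


def drive(operation: Callable[[], object], requests: int, workers: int) -> List[float]:
    """Issue `requests` calls across `workers` threads and collect latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: timed_call(operation), range(requests)))


def fake_operation() -> None:
    """Stand-in for a service operation (assumption: about 5 msec of work)."""
    time.sleep(0.005)


if __name__ == "__main__":
    latencies = drive(fake_operation, requests=40, workers=8)
    print(f"worst case: {max(latencies):.1f} msec over {len(latencies)} requests")
```

If raising `workers` stops increasing the request rate while the service under test is still idle, the harness itself has become the bottleneck.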

The question of how component performance affects overall system performance will be addressed in the next installment.

Automatic garbage collection is simultaneously the best and worst part of implementing programs in managed code. On the positive side, garbage collection makes writing the program easier by freeing you from the need to explicitly manage memory, and it thus makes the runtime more stable, as it prevents most memory leaks (self-referential lists or cycles aside). On the negative side, garbage collection takes up a finite amount of time and is necessarily intrusive, as it updates pointers while it compacts heap space. The amount of time spent in garbage collection is precisely the drawback of automated garbage collection: there are no practical bounds on the amount of time needed nor on how often the garbage collector will run. In environments that only support a limited number of options for the garbage collector (e.g., .NET CLR and Java pre-1.4) the problem can be even worse as one scales an application to multiple cores (there is a good document on Sun’s website that discusses the linearity problems of garbage collection and Amdahl’s Law).

As a practical matter, the garbage collecting time has rarely entered into the average programmer’s consciousness. The issues with garbage collection time and throughput only reveal themselves in a subset of problems, namely highly transactional, stateful, multi-core systems. It is precisely these kinds of systems that current Service Oriented Architectures represent, making them susceptible to problems with throughput.

I ran into this problem with a C#/.NET based system I developed at RedLasso to handle bandwidth and other resource constraints while loading a processing grid. The issue in this case was handling the state of the system to determine which nodes had access tokens and which nodes had queued requests for those tokens. As a pure processing problem this would not even begin to tax the processor on the reference quad core system I was using, since the task simply required accepting messages, pushing and popping from a FIFO queue, and then sending messages, at a moderate rate of one message every 100 msec on average. Early tests showed that on a dual core system, a rate on the order of one message every 1 msec on average could be maintained before the processor started to become a bottleneck, so assuming a linear processing model, a real world rate of 100 msec should translate into 1/100 of processor resources, and potentially even ½ of that on a quad core system, if the processing itself can be efficiently threaded. In this case, I was relying on the WSE framework’s thread pools for incoming messages to parallelize message parsing, while my main worker thread would be the linear part that handled the synchronized queue management. What I found was that the total CPU usage was nearly 70%, and using the performance monitor I found that upwards of 80% of the time was spent in the garbage collector. Changing the flavor of the .NET garbage collector from server to workstation, both concurrent and not, did not make a significant difference.
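
The linear estimate in the paragraph above is simple enough to write down. This is only the back-of-the-envelope model, with a function name of my own choosing; the inputs are the figures quoted above.

```python
def expected_cpu_fraction(saturation_interval_ms: float, actual_interval_ms: float) -> float:
    """CPU share predicted by a linear model.

    If the processor saturates at one message per `saturation_interval_ms`,
    a slower rate of one message per `actual_interval_ms` should need only
    the ratio of the two intervals.
    """
    return saturation_interval_ms / actual_interval_ms


if __name__ == "__main__":
    expected = expected_cpu_fraction(1.0, 100.0)  # 1/100 of the processor
    observed = 0.70                               # "nearly 70%" total CPU usage
    print(f"expected {expected:.0%}, observed {observed:.0%}, "
          f"about {observed / expected:.0f}x the linear estimate")
```

A discrepancy of well over an order of magnitude is what pointed away from the message handling itself and toward the runtime.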

To try to isolate the problem I created a standalone program that attempts to mimic the message and queue handling. The program consists of a configurable number of producer threads and a single consumer thread. Each producer thread creates a random-sized string and drops it on a thread-safe queue. The consumer simply pops the data off of the queue and continues on. The production rate is configurable and causes the producers to sleep a random amount of time, centered on the configured rate; if, for example, the rate is 50 msec, each producer thread will produce a message then sleep for a random amount of time between 0 and 100 msec. The consumer runs at the configured rate divided by the number of producers, to try to ensure that on average the consumption rate at least matches the production rate. With this program completed, I ran a test on a dual core machine with a 50 msec average duty cycle and 16 producer threads, and using the Windows performance monitor I captured the percentage of time spent in garbage collection as well as the sizes of the various heaps. I also noted the absolute amount of processor time reported by the task manager. The results were quite astounding. On average, the program was using ~10% of *each* processor and spending 6% of the time in garbage collection. Changing to the server version of garbage collection reduced the average percentage of time to ~2.3%, but did not have any apparent impact on the overall CPU utilization.

[Figure: Performance monitor using workstation GC]

Finally, the workstation version had a peak time of 99% in the garbage collector. The server version seems to peak closer to the 10% range. This number is important because of latency, which I will discuss in the next installment. Remember that the program is simply moving data into and out of a queue and not actually doing any processing other than creating random strings to simulate data packets. Furthermore, the performance counters showed a large number of collections happening in the generation 1 and generation 2 heaps. Due to the random nature of the producers, I believe that many objects are getting promoted to gen1/gen2. In addition, since the queue itself is a gen2 object, all accesses to the queue will mark it as written and thus require a gen2 collection, which is more expensive than a gen0 collection. A trace of my runtime system shows further calls to the garbage collector, and has the added issue of a large object heap that changes size.
It appears that the pattern I use, which has work items on queues and worker threads processing those queues rather than one thread per message, does not work particularly well with the .NET garbage collector in a highly transactional system. This is highly surprising considering this is exactly the pattern that Microsoft calls “I/O completion ports” (and I believe claims to have invented).
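
The structure of the standalone test program described above (several producers, one consumer, one thread-safe queue) can be sketched as follows. This is not the code I ran: it is a Python illustration of the pattern only, Python's memory management differs from both the CLR and the JVM so it says nothing about their garbage collectors, and the message size range and the defaults are assumptions.

```python
import queue
import random
import string
import threading
import time
from typing import List


def producer(out: "queue.Queue[str]", messages: int, rate_s: float) -> None:
    """Create random-sized strings, sleeping a random time centered on the rate."""
    for _ in range(messages):
        size = random.randint(1, 256)  # assumed size range
        out.put("".join(random.choices(string.ascii_letters, k=size)))
        time.sleep(random.uniform(0.0, 2.0 * rate_s))


def consumer(inbox: "queue.Queue[str]", expected: int, rate_s: float, count: List[int]) -> None:
    """Pop each message off the queue and move on; no real processing."""
    for _ in range(expected):
        inbox.get()
        count[0] += 1
        time.sleep(rate_s)


def run(producers: int = 4, messages: int = 25, rate_s: float = 0.002) -> int:
    """Run the test and return how many messages the consumer handled."""
    shared: "queue.Queue[str]" = queue.Queue()
    consumed = [0]
    threads = [
        threading.Thread(target=producer, args=(shared, messages, rate_s))
        for _ in range(producers)
    ]
    # The consumer runs at the configured rate divided by the number of
    # producers, so that on average it keeps up with production.
    threads.append(threading.Thread(
        target=consumer,
        args=(shared, producers * messages, rate_s / producers, consumed),
    ))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed[0]


if __name__ == "__main__":
    print(run(), "messages consumed")
```

Every message is a short-lived allocation that crosses from a producer thread to the consumer thread through a long-lived queue, which is exactly the allocation pattern the measurements above were probing.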

Given the production issue that I saw and the results from my tests, I decided to compare the performance of the same scenario in another managed environment. I transliterated the code into Java and ran the same tests, using the three different garbage collection options available. In each case, the overall CPU utilization was much lower (closer to 1% than to 10% on the task manager), but I got varying numbers for the amount of time spent in garbage collection.

I used the visualgc tool that comes with jvmstat, from Sun, to monitor the garbage collection times and heaps; it is similar to the performance monitor tool that comes bundled with Windows XP. The results were interesting in all three cases. The default garbage collector performed the worst, spending roughly 0.33% of the time in garbage collection, while doing 1894 collections over a span of 5 minutes. The parallel garbage collector (-XX:+UseParallelGC) was marginally better, spending ~0.29% of the time in garbage collection, with 888 collections over 10 minutes. Finally, the concurrent garbage collector (-XX:+UseConcMarkSweepGC) spent only 0.10% of the time in garbage collection, with 303 collections over ~6.5 minutes. In all cases, the Java version ran much “better” than the .NET version; a word of caution is needed, however, in that there is no good indication of the peak time spent in garbage collection. Again, it is a question of latency, particularly in transactional systems.
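
Because the three Java runs had different durations, it helps to normalize the figures before comparing them. The sketch below uses only the numbers quoted above; the function names are mine, and the per-collection figure is a rough average of total collector time, not a measured pause.

```python
# (collections, minutes, fraction of time in GC) as reported for each run.
RUNS = {
    "default": (1894, 5.0, 0.0033),
    "parallel (-XX:+UseParallelGC)": (888, 10.0, 0.0029),
    "concurrent (-XX:+UseConcMarkSweepGC)": (303, 6.5, 0.0010),
}


def collections_per_minute(collections: int, minutes: float) -> float:
    """Normalize the collection count by the length of the run."""
    return collections / minutes


def mean_gc_ms_per_collection(collections: int, minutes: float, gc_fraction: float) -> float:
    """Average collector time per collection, in milliseconds."""
    total_gc_ms = gc_fraction * minutes * 60.0 * 1000.0
    return total_gc_ms / collections


if __name__ == "__main__":
    for name, (count, minutes, fraction) in RUNS.items():
        rate = collections_per_minute(count, minutes)
        mean_ms = mean_gc_ms_per_collection(count, minutes, fraction)
        print(f"{name}: {rate:.1f} collections/min, ~{mean_ms:.2f} msec each")
```

Normalized this way, the default collector ran far more often than the other two but did less work per collection, which is one reason averages alone say little about worst-case latency.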
The results of this test make it clear to me that when dealing with message passing distributed systems, managed environments do not end up scaling as well as would be preferred when running on multiple cores. Since the system I just described is what most SOA systems are turning into, the throughput and latency effects introduced by garbage collectors will need to be taken into account when designing and implementing systems, even to the point of choosing one platform over another, which is ironic considering that managed environments are supposed to ameliorate the risks associated with memory management in a system, and make the choice of platform more independent. I would further conclude that until Microsoft opens up the garbage collector or has better implementations of garbage collecting, runtime systems that rely on low latency will continue to suffer if implemented in a .NET runtime.
All code is available on Google Code at: http://garbagecollectioncomparison.googlecode.com/svn/trunk/
Also note that all tests on .NET were done using release builds, not debug builds.
Please grab a copy of the code and run it for yourself, tweak it, and let me know what you find.

So I was reading the latest issue of Dr Dobb’s and got to the piece by Scott Ambler called “Is Fixed-Price Software Development Unethical?” Mr. Ambler, as you may know, is the Agile Methodology guy at DDJ. It is really a fascinating article, and on a topic that I have been on about for some time now: you simply cannot do fixed cost software projects. Mr. Ambler took it a step further and questioned the ethics, which is really an interesting concept. As a quick side note, it is interesting that in an article on ethics, Mr. Ambler basically recycled an article he wrote more than a year ago. Journalistic ethics aside, the article does raise an interesting twist to the problem: can we ethically propose fixed cost software projects?
If you don’t want to RTFA, I can summarize a few of the more salient points. The key to understanding fixed cost software is that the intent is to mitigate risk for the stakeholder (the person wanting something done) by specifying boundaries of time and money or, as sometimes happens, to squeeze a large profit margin out of a project. Unfortunately, what ends up happening is that an inordinate amount of time and money is spent on Big Requirements Up Front (BRUF), which results in an untenable development model that assumes that all requirements are known and static. Following this to its logical conclusion, the fixed price project suffers at all subsequent phases: in the development process there is an inherent disincentive to allow for change management; the end product contains many portions that are unused and unnecessary; the end product fails to deliver on new (discovered) requirements; and finally, the project usually ends up late and over budget anyway. QED, the risk mitigation aspect is a feel-good fantasy at best for those unwilling or unable to understand the creative aspects of software development.
Now that we have an understanding, or at least an assumption, that Fixed Price Projects are A Bad Thing, I must examine the assertion that responding to an RFP that is fixed price is unethical. On the surface, doing something that one knows is wrong is pretty much the definition of unethical. I think that Mr. Ambler missed a point though. Responding to a fixed cost RFP is not the unethical part; putting said RFP out in the first place is the unethical part. There is nothing unethical in giving an organization what it wants, or at least thinks it wants. To be sure, it is better to propose a non-fixed price alternative, perhaps in addition to the RFP response, but it seems to me that that route is on the fast path to unemployment. As Agile Methodologies continue to be accepted, I predict that fewer and fewer organizations will want a fixed cost project anyway, but until then, keep responding to those RFPs and try to get the system changed from the inside.