Synergy of review techniques
from PSP(SM)[1] to Formal inspections
Daniel M. Roy
(STPP, Inc. - Software Technology, Process and People, Inc.)
(Visiting scientist, SEI)
Abstract: Data, observations, and an
experimental framework based on the principles of the experience factory are
proposed to show the synergy of the PSP(SM) personal review techniques, the
TSP(SM) approach and Fagan’s formal inspections. The model can be used to
predict defect removal patterns and costs at the personal, team, and organizational
levels.
Since their introduction in 1976, formal inspections have had a highly positive impact on the maturity of the software process and on the quality of the software products of numerous companies worldwide [Gilb-93]. A large body of evidence (from previous SEPG conferences, for instance) has established that formal inspections are among the most cost-effective and easiest measures that an organization involved in software development can put in place to make an immediate positive impact.
The Capability Maturity Model (CMM)[2] has had a profound impact on organizational practices within the software industry [Herbsleb-94]. It is fitting that Peer Reviews (PR) feature prominently in CMM V1.1 (being a level 3 Key Process Area). Why PR is no longer a Process Area in CMMI-SE/SW V1.0 is very hard to understand.
Other, even more general SPI paradigms, such as the experience factory, have been demonstrating the value of experimental, model-based software improvement for over 20 years now [Basili-89]. Various kinds of inspection techniques at different levels in the development process have been studied at the NASA Software Engineering Laboratory (SEL) because of their obvious importance to the maturity of the software process and the quality of its products. Numerous such experiments at SEL have clearly demonstrated that the scientific method can indeed be applied to software [SEL-00].
Building on the success of the CMM, the personal software process (PSP) was developed by Watts Humphrey [Humphrey-95] by downscaling the “what” of CMM major practices to the “how” at the individual engineer’s level. Using fairly simple and well proven engineering principles, the PSP trained engineer plans his/her work, enacts a well defined process, building the product while gathering data, and performs a post mortem analysis that seeds the next improvement cycle. In so doing, the PSP is much more than a downscaling of CMM. It can be seen, and it is taught by the author, as the downscaling of Basili’s Experience Factory (fig. 1), the very spirit of CMM level 5.
In the experience factory, every project is an experiment in the scientific sense. A hypothesis is formed (such as “Reviews should have a measurable impact on the total project costs”) and measurements are made to confirm or invalidate the hypothesis. To quote Dieter Rombach, in the experience factory, “…process technology is applied, the impact on the resulting products is observed, and possible improvements regarding the process technology are identified via root cause analysis.” [Rombach-99]

Figure 1: Downscaling the factory
How fitting that it is Vic Basili, the creator of the experience factory, who wrote the preface to Watts’ book on PSP! It is also fitting that personal design and code reviews (downscaled from Fagan’s inspections) feature prominently in PSP. However, the current PSP training from SEI puts more emphasis on the product improvement aspects of these techniques than on the personal process improvement that the analysis of their data should motivate.[3]
The Team Software Process (TSP) is now under development at SEI to apply PSP principles to small teams and bridge the gap between PSP and CMM practices. Formal inspections feature prominently in TSP activities. The “Quality Manager” and “Process Manager” (two of the major roles of TSP) cooperate to lead the team in improving the process based on the analysis of the results of these inspections.
This paper is based on industry data from nearly 100 engineers. Observations, lessons learned and quantitative results are offered in the spirit of the experience factory.
Figure 2 shows the evolution of the “Process Yield” during a PSP class recently offered by the author at a major software company in New Jersey (USA). The PSP yield is defined as the number of defects removed before first compile divided by the number of defects injected before first compile, expressed in percent. The numbers on the X axis represent the programs that PSP students have to produce during the 10-day training class to practice increasingly sophisticated levels of their personal process.
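The yield definition above can be written as a one-line computation. The following is a minimal sketch; the function name and the sample counts are illustrative assumptions, not class data.

```python
def process_yield(removed_before_compile, injected_before_compile):
    """PSP process yield in percent, as defined in the text:
    defects removed before first compile / defects injected before first compile."""
    if injected_before_compile <= 0:
        raise ValueError("at least one defect must be injected before compile")
    return 100.0 * removed_before_compile / injected_before_compile


# Hypothetical program: 10 defects injected before compile, 9 caught in reviews.
print(f"{process_yield(9, 10):.0f}%")
```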

Figure 2: Improving personal review effectiveness
Before the introduction of formal personal design and code review techniques with program 7, engineers remove only a small fraction of their defects before compile. After defining their own checklists, based on the analysis of their own defects, the yield rises to an average of 90% by the end of the class.
As a way to share lessons learned and fight cognitive dissonance [Weinberg-71], the author also demands that cross reviews (two engineers review each other’s design and code) be held after a personal baseline has been established with program 7. This extra step explains the superior results observed (compared to the SEI data base [Hayes-97]).
Results and their statistical validity are discussed in detail during the talk. In particular, the yield dip observed for program 8 is due in great part to one engineer who did not apply the process! The same engineer also can take credit for the minimum yield (0%) for program 9.
One of the metrics known to have an impact on review yield is the review speed, typically expressed in lines of code reviewed per hour. Gilb advises not exceeding 200 executable LOC reviewed per hour to maximize the chance of finding defects during inspections [Gilb-93]. The PSP class uses the same number.
Figure 3 shows an attempt at finding an explicit negative linear regression relationship between yield and speed with PSP class data. Tremendous variation in individual performance results in a very poor correlation (R**2<0.2). It must be noted that this data was gathered in a very small group (10 programmers) featuring a wide variation in individual experience and programming languages. Your mileage will vary.
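For readers who want to repeat this analysis on their own review logs, here is a minimal sketch of an ordinary least-squares fit of yield against review speed, with the coefficient of determination R**2. The data points are hypothetical placeholders, not the class data behind Figure 3.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both speed and yield")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical review log: (speed in LOC/hour, yield in %), one pair per engineer.
speeds = [120, 150, 180, 220, 260, 300, 340, 400, 450, 520]
yields = [85, 40, 95, 60, 88, 30, 75, 55, 20, 65]

slope, intercept, r2 = linear_fit(speeds, yields)
print(f"yield = {slope:.3f} * speed + {intercept:.1f}, R**2 = {r2:.2f}")
```

A low R**2 on such a small sample says only that speed alone explains little of the variation; it does not show that speed is irrelevant.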

Figure 3: Poor correlation between code review speed and yield
With Team Software Process (Fagan-style) inspections, the correlation is not much better, but limited team data obtained by the author through TSP0.3 experiments in India show a clearer difference in yield between “fast reviews” (faster than 250 LOC/h) and the rest (Figure 4).
This time the group is much more homogeneous, the individuals having very comparable experience, using the same programming language and environment in a production setting and, above all, having completed PSP training and TSP launch as a team.

Figure 4: Respecting the TSP speed limit
This data seems to confirm the hypothesis that better results are obtained when the speed limit is respected. However, again, the tremendous variation in individual results (and even from review to review by the same individual) reinforces a major message from the experience factory: systematically gather your own data in your own environment.
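The comparison behind Figure 4 amounts to splitting reviews at the speed limit and comparing average yields. The sketch below assumes each review is recorded as a (speed, yield) pair; the sample values are hypothetical and the 250 LOC/h threshold is the one quoted in the text.

```python
from statistics import mean

SPEED_LIMIT_LOC_PER_HOUR = 250  # "fast review" threshold quoted in the text


def yield_by_speed_class(reviews, limit=SPEED_LIMIT_LOC_PER_HOUR):
    """Average yield (%) of reviews faster than the limit versus the rest."""
    fast = [y for speed, y in reviews if speed > limit]
    rest = [y for speed, y in reviews if speed <= limit]
    return {
        "fast": mean(fast) if fast else None,
        "rest": mean(rest) if rest else None,
    }


# Hypothetical inspection log: (LOC reviewed per hour, yield in %).
reviews = [(140, 72), (190, 65), (230, 58), (280, 41), (350, 30), (420, 35)]
print(yield_by_speed_class(reviews))
```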
It is unfortunate that only the product improvement aspect of formal inspection is typically emphasized in industry. It seems that time is always lacking to do causal analysis to improve the process and eradicate recurring defects.
Barry Boehm, in his landmark work of over 20 years ago, provided highly practical information on the cost of defects found at various stages of the process [Boehm-82]. In a recent study of DoD contractors, Don O’Neill provided similar data accompanied by insightful comments showing the benefits of Fagan-class inspections [O’Neill-98]. Their data is summarized in Table 1 below.
| Phase removed | Time to fix defect |
|---|---|
| Code inspection | 30 minutes |
| Integration testing | 2-10 hours |
| System testing | 10-40 hours |
Table 1: Cost of defects
If a model could be built to characterize defect removal effectiveness and costs across the process, the impact of review techniques, at each phase of the life cycle, could be quantitatively studied, and their return on investment precisely determined. The following paragraphs describe such a model.
Figure 5 represents any phase of a software process as a filter, actually an active filter in the sense of electronics. Noise (defects) is inherited from previous stages; some is created in the current stage, some is removed, and some escapes to the next stage.

Figure 5: Filtering defects
The goal of any software process is to maximize the signal/noise ratio by reducing the injection of defects while boosting the effectiveness of their removal (filtering). This is best achieved at the individual level (where the defects are created) through disciplined care and by the application of a synergy of review techniques:
· Personal (PSP-like) review using individual checklists to catch what is typically missed by the particular engineer
· Cross review (Extreme programming-like) to alleviate personal bias and “cognitive dissonance” (I have difficulties finding my own errors because my errors clash with my self image as a “great programmer”)
· Formal inspection (Fagan-like) to bring many different viewpoints and backgrounds to bear on a common cause: improve the product and help eradicate process root causes.
Figure 6 shows how several stages of filtering can help improve the quality of the software product. Here, yield is defined for each stage as the percentage of the defects present at stage entry that are removed in the stage. For instance, the first stage, of yield Y1, removes I1·Y1 of the I1 defects present at its entry but allows I1(1-Y1) defects to escape to later phases. To simplify, the assumption is made that no defects are injected (erroneous fixes and “mirage” defects) in any of the review stages.

Figure 6: Cascading the filters
When several stages of filtering are combined, the contribution of each stage can easily be studied, and a composite or chain yield (akin to a transfer function in electronics) can easily be computed. The number X of defects left in the product after all the reviews can therefore be predicted, provided that defect densities and review phase yields are consistent.
Figure 7 shows the impact of boosting detailed design review (DLDR) yield from 50% to 80% on the defects escaping a chain of reviews, everything else (such as other stage yields) being equal. Note that, as in electronics, it is necessary to limit the noise of the upstream stages (since it will be amplified by the downstream stages). This effect results in increasing costs as the defect ages, a well-known empirical result.
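The cascade can be sketched in a few lines, under the simplifying assumption stated above that review stages inject no defects. Yields are expressed here as fractions between 0 and 1; the function names and sample values are illustrative assumptions.

```python
def escaped_defects(defects_at_entry, stage_yields):
    """Defects left after a chain of review stages.

    Each stage of yield Y lets a fraction (1 - Y) of its incoming defects
    escape, so X = I1 * (1 - Y1) * (1 - Y2) * ... * (1 - Yn).
    """
    remaining = float(defects_at_entry)
    for stage_yield in stage_yields:
        if not 0.0 <= stage_yield <= 1.0:
            raise ValueError("yields must be fractions between 0 and 1")
        remaining *= 1.0 - stage_yield
    return remaining


def chain_yield(stage_yields):
    """Composite yield of the whole chain, as a fraction."""
    return 1.0 - escaped_defects(1.0, stage_yields)


# Hypothetical chain: personal review, cross review, formal inspection.
stages = [0.5, 0.5, 0.6]
print(escaped_defects(100, stages), chain_yield(stages))
```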
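The effect of raising a single stage yield can be checked numerically. In the sketch below only the 50% and 80% DLDR yields come from the text; the downstream yields and the entry defect count are hypothetical. Because the stages multiply, the escapes shrink by the ratio (1 - 0.8) / (1 - 0.5) = 0.4 whatever the other yields are.

```python
def escapes(defects_at_entry, stage_yields):
    """Defects escaping a chain of reviews that inject no new defects."""
    remaining = float(defects_at_entry)
    for stage_yield in stage_yields:
        remaining *= 1.0 - stage_yield
    return remaining


DOWNSTREAM_YIELDS = [0.6, 0.7]  # hypothetical later review stages
ENTRY_DEFECTS = 100             # hypothetical defects entering DLDR

before = escapes(ENTRY_DEFECTS, [0.5] + DOWNSTREAM_YIELDS)  # DLDR yield 50%
after = escapes(ENTRY_DEFECTS, [0.8] + DOWNSTREAM_YIELDS)   # DLDR yield 80%
print(f"escapes before: {before:.1f}, after: {after:.1f}, ratio: {after / before:.2f}")
```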


Figure 7: Impact of upstream yield
This model can be used to study the impact of review techniques on the overall defect removal effectiveness of the process. An Excel spreadsheet model is available from the author (danroy@stpp.com) to do this. Figure 8 shows the model at work on a typical distribution of defects and typical yields seen in industry data.
| Parameter | Value | Unit |
|---|---|---|
| Size, new and changed | | LOC |
| Total defect density | | /KLOC |
| Total defects in product | | |
| Defects left after unit test | 3 | |
| PSP yield | 78 | % |
| Total development yield | 94 | % |

Defect flow model (number of defects):
| Phase | Yield % | Inherited | Injected | Removed | Escaped | Injected (%) | Removed (%) |
|---|---|---|---|---|---|---|---|
| Design | 5.56 | 0 | 9 | 0 | 8 | 18 | 1 |
| DDR | 33.33 | 8 | 0 | 3 | 6 | 1 | 6 |
| XDDR | 30.77 | 6 | 0 | 2 | 4 | 1 | 4 |
| Code | 2.47 | 4 | 35 | 1 | 39 | 72 | 2 |
| CR | 52.50 | 39 | 0 | 21 | 19 | 1 | 42 |
| XCR | 46.15 | 19 | 0 | 9 | 10 | 1 | 18 |
| Compile | 65.22 | 10 | 1 | 7 | 4 | 2 | 15 |
| Unit test | 50.00 | 4 | 2 | 3 | 3 | 4 | 6 |
| Sums | | | 49 | 46 | | 100 | 94 |
Figure 8: Typical individual defect profile
The model requires the software size in LOC, the total defect density (in defects per KLOC) and the historical distribution of percentage of defects injected and removed by phase.
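The defect flow of Figure 8 can be approximated with a short script. This is a sketch of how such a model might work, not the author's spreadsheet: it assumes that the phase yield is the number of defects removed divided by the defects present (inherited plus injected), and it takes the total of 49 defects from the Sums row, since the size and density inputs are not legible in the figure (presumably total defects = size in KLOC × defect density). Under these assumptions it reproduces the Yield % column of Figure 8 to two decimals.

```python
PHASES = ["Design", "DDR", "XDDR", "Code", "CR", "XCR", "Compile", "Unit test"]
INJECTED_PCT = [18, 1, 1, 72, 1, 1, 2, 4]   # % of total defects injected per phase
REMOVED_PCT = [1, 6, 4, 2, 42, 18, 15, 6]   # % of total defects removed per phase


def defect_flow(total_defects, phases, injected_pct, removed_pct):
    """Propagate defects phase by phase; yield = removed / (inherited + injected)."""
    rows = []
    inherited = 0.0
    for phase, inj, rem in zip(phases, injected_pct, removed_pct):
        injected = total_defects * inj / 100.0
        removed = total_defects * rem / 100.0
        present = inherited + injected
        escaped = present - removed
        rows.append({
            "phase": phase,
            "yield_pct": 100.0 * removed / present if present else 0.0,
            "inherited": inherited,
            "injected": injected,
            "removed": removed,
            "escaped": escaped,
        })
        inherited = escaped  # escapes become the next phase's inheritance
    return rows


def psp_yield(phases, injected_pct, removed_pct, first_compile="Compile"):
    """Defects removed before compile over defects injected before compile, in %."""
    cut = phases.index(first_compile)
    return 100.0 * sum(removed_pct[:cut]) / sum(injected_pct[:cut])


rows = defect_flow(49, PHASES, INJECTED_PCT, REMOVED_PCT)
for row in rows:
    print(f"{row['phase']:<10}{row['yield_pct']:7.2f}{row['escaped']:8.2f}")
print(f"PSP yield: {psp_yield(PHASES, INJECTED_PCT, REMOVED_PCT):.0f}%")
```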
The same spreadsheet also yields rework times (Figure 9) for PSP2.1 or one PSP3.0 cycle (typically used for TSP) as well as the expected rework time after unit test (rework in integration and system test). Individual rework times by phase injected and removed used here are rough averages. Your mileage will vary. Of course, such data can be directly collected from the PSP defect logs. Needless to say, the data is and must remain the private property of the individual engineer.
Rework by phase (minutes):
| Design | DDR | XDDR | Code | CR | XCR | Compile | Unit test | After UT |
|---|---|---|---|---|---|---|---|---|
| 1 | 23 | 21 | 3 | 123 | 78 | 15 | 62 | 1400 |

Cost of rework in development: 5.4 hours
Cost of rework after test: 23.3 hours

Average fix time per defect (minutes), by phase injected (rows) and phase removed (columns):
| Inj/Remvd | Design | DDR | XDDR | Code | CR | XCR | Compile | Unit test | After UT |
|---|---|---|---|---|---|---|---|---|---|
| Design | 2 | 8 | 12 | 10 | 15 | 18 | 2 | 45 | 900 |
| DDR | | 2 | 5 | 8 | 10 | 12 | 2 | 45 | 900 |
| XDDR | | | 2 | 5 | 8 | 10 | 2 | 45 | 900 |
| Code | | | | 2 | 5 | 8 | 2 | 30 | 600 |
| CR | | | | | 2 | 5 | 2 | 15 | 600 |
| XCR | | | | | | 2 | 2 | 15 | 600 |
| Compile | | | | | | | 2 | 10 | 400 |
| Unit test | | | | | | | | 5 | 200 |
Figure 9: PSP defect cost model
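The two cost totals in Figure 9 follow from the rework-by-phase row. The sketch below only performs that summation; how the spreadsheet derives each per-phase figure from the fix-time matrix is not shown in the source, so it is not modeled here.

```python
# Rework by phase in minutes, copied from Figure 9.
REWORK_MINUTES = {
    "Design": 1, "DDR": 23, "XDDR": 21, "Code": 3, "CR": 123,
    "XCR": 78, "Compile": 15, "Unit test": 62, "After UT": 1400,
}


def rework_hours(rework_minutes, after_phase="After UT"):
    """Return (rework in development, rework after unit test), both in hours."""
    in_development = sum(
        minutes for phase, minutes in rework_minutes.items() if phase != after_phase
    )
    return in_development / 60.0, rework_minutes[after_phase] / 60.0


dev_hours, after_hours = rework_hours(REWORK_MINUTES)
print(f"in development: {dev_hours:.1f} h, after unit test: {after_hours:.1f} h")
```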
Note that the significant amount of rework in personal code reviews (CR) and cross code reviews (XCR) observed in this case calls for either more care during the coding phase or (more probably) improvement in the design process. Such insight is impossible without the systematic application of the PSP data collection framework and regular analysis of these measurements.
A similar spreadsheet model (also available from the author) computes rework costs from the defect injection and removal profiles at the organization level.
Figure 10 shows a typical profile of this kind (for one of the author’s customers) before the introduction of formal inspections. Most of the 5800 or so defects are removed in the later (testing) phases of the life cycle, where they cost 8 to 40 times more to fix than in the earlier phases.
For each phase of the life cycle, the number at the top of the box is the percentage of total defects injected in that phase. The number at the bottom is the percentage of the total defects removed in that phase. Costs are computed by multiplying the number of defects removed in the phase by their cost (found in the table at lower left).
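The cost computation described above can be sketched as follows. Only the total of roughly 5800 defects comes from the text. The removal percentages are hypothetical, and the unit costs are assumed values picked within the ranges of Table 1, not the figures in the customer's cost table.

```python
def removal_cost(total_defects, removed_pct_by_phase, cost_per_defect_by_phase):
    """Cost per phase = defects removed in the phase * unit fix cost for the phase."""
    costs = {}
    for phase, pct in removed_pct_by_phase.items():
        removed = total_defects * pct / 100.0
        costs[phase] = removed * cost_per_defect_by_phase[phase]
    return costs


TOTAL_DEFECTS = 5800  # "5800 or so" in the text

# Assumed unit costs in hours per defect, chosen within the Table 1 ranges.
COST_HOURS = {"Inspection": 0.5, "Integration testing": 6.0, "System testing": 25.0}

# Hypothetical removal profiles (% of total defects removed in each phase).
without_inspection = {"Inspection": 0, "Integration testing": 50, "System testing": 50}
with_inspection = {"Inspection": 60, "Integration testing": 25, "System testing": 15}

cost_before = sum(removal_cost(TOTAL_DEFECTS, without_inspection, COST_HOURS).values())
cost_after = sum(removal_cost(TOTAL_DEFECTS, with_inspection, COST_HOURS).values())
print(f"expected savings: {cost_before - cost_after:.0f} hours")
```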
Figure 10: Defect injection and removal profile without inspection
Figure 11 shows the very different profile resulting from the correct application of inspection procedures. Comparing the two defect removal cost profiles yields the expected savings that can result from a more mature development process.
Figure 11: Savings expected with inspections
Several factors impacting individual yield, cross review yield and formal inspection effectiveness, including psychological factors, are briefly discussed in the talk:
· The need to “decriminalize” defects for personal review
· The need to respect introverted preference and the potential conflict between Keirsey and Bates’ temperaments, as well as between Thinking and Feeling decision-making preferences, in cross review teams
· Perspective based vs. defect based review techniques for various kinds of material under review [Rombach-99]
Experiments are under way in the spirit of the experience factory to determine the importance of such factors and measure their impact on the overall process.
The quality of the software product depends on the effectiveness of all defect detection and removal techniques used at each stage of the software process. This paper has shown the importance of personal review techniques. However, the results obtained seem to show that PSP style reviews are insufficient in a production (team-based) environment.
A simple model allows the quantitative study of the synergy between personal reviews, cross reviews and formal inspection using data gathered from the field. The evaluation of defect detection and removal costs at each phase of the life cycle provides a clearer picture of the costs and benefits of each technique. Such a model, driven by PSP data, can contribute to the continuous optimization of the software development process.
[Basili-89] Victor Basili, “Software Development: A Paradigm for the Future”, Proceedings of the thirteenth Annual International Computer Software & Applications Conference, Orlando, FL, September 20-22, 1989.
[Boehm-82] Barry Boehm, “Software Engineering Economics”, 1982
[Gilb-93] Tom Gilb and Dorothy Graham, “Software Inspection”, Addison Wesley, 1993
[Hayes-97] Will Hayes and James W. Over, “The Personal Software Process (PSP): An Empirical Study of the Impact of PSP on Individual Engineers”, CMU/SEI-97-TR-001, CMU, 1997.
[Herbsleb-94] Jim Herbsleb et al., “Benefits of CMM Based Software Process Improvement”, CMU/SEI-94-TR-013, August 1994.
[Humphrey-95] Watts S. Humphrey, “A Discipline for Software Engineering”, Addison Wesley, 1995.
[O’Neill-98] Don O’Neill, “National Software Quality Experiment”, Proceedings of the 23rd SEL workshop, GSFC, 1998
[Rombach-99] Dieter H. Rombach, “Engine for Applied Research and Technology Transfer in Software Engineering”, Proceedings of the 24th SEL workshop, GSFC, 1999
[SEL-00] “10 Most Popular Documents Collection”, CD-ROM from the Software Engineering Laboratory, NASA Goddard Space Flight Center, Greenbelt MD 20771, December 2000. Visit http://sel.gsfc.nasa.gov/
[Weinberg-71] Gerald M. Weinberg, “The Psychology of Computer Programming”, Van Nostrand Reinhold, 1971