Firefox 26, A Retrospective in Quality

[Edit: @ttaubert informed me the charts weren’t loading so I’ve uploaded new images instead of linking directly to my document]

The release of Firefox 27 is imminently upon us, next week will mark the end of life for Firefox 26. As such I thought it’d be a good time to look back on Firefox 26 from a Quality Assurance (QA) perspective. It’s kind of hard to measure the impact QA has on a per release basis and whether our strategies are working. Currently, the best data source we have to go on is statistics from Bugzilla. It may not be a foolproof but I don’t think that necessarily devalues the assumptions I’m about to make; particularly when put in the context of data going back to Firefox 5.

Before I begin, let me state that this data is not indicative of any one team’s successes or failures. In large part this data is limited to the scope of bugs flagged with various values of the status-firefox26 flag. This means that there are a large amount of bugs that are not being counted (not every bug has the flag accurately set or set at all) and the flag itself is not necessarily indicative of any one product. For example, this flag could be applied to Desktop, Android, Metro, or some combination with no way easy way to statistically separate the two. Of course one could with some more detailed Bugzilla querying and reporting, but I’ve personally not yet reached a point where I’m able or willing to dig that deep.

Unconfirmed Bugs

Unconfirmed bugs are an indication of our ability to deal with the steady flow of incoming bug reports. For the most part these bugs are reported from our users (trusted Bugzilla accounts are automatically bumped up to NEW). The goal is to be able to get down to 0 UNCONFIRMED bugs before we release a product.

What this data tells us is that we’ve held pretty steady over the months, in spite of the ever increasing volume, but things have slipped somewhat in Firefox 26. In raw terms, Firefox 26 was released with 412 of the 785 reported bugs being confirmed. The net result is a 52% confirmation rate of new bug reports.

However, if we look at these numbers in the historical context it tells us that we’ve slipped by 10% confirmation rate in Firefox 26 while the volume of incoming bugs has increased by 64%. A part of me sees this as acceptable but a large part of me sees a huge opportunity for improvement.

The lesson here is that we need to put more focus on ensuring day to day attention is paid to incoming bugs, particularly since many of them could end up being serious regressions.

Regressions

Regressions are those bugs which are a direct result of some other bug being resolved. Usually this is caused by an unforeseen consequence of a particular code change. Regressions are not always immediately known and can exist in hiding until a third-party (hardware, software, plug-in, website, etc) makes a change that exposes a Firefox regression. These bugs tend to be harder to investigate as we need to track down the fix which ultimately caused the regression. If we’re lucky the offending change was a recent one as we’ll have builds at our disposal. However, in rare cases there are regressions that go far enough back that we don’t have the builds needed to test. This makes it a much more involved process as we have to begin bisecting changesets and rebuilding Firefox.

The number of regressions speaks to a potential failure, something that was missed, not accounted for, or unable to be tested either by the engineer or by QA. In a perfect world a patch would be tested taking into account all potential edge cases. This is just not feasible in reality due to the time and resources it would take to cover all known edge cases; and that’s to say nothing of the unknown edge cases. But that’s how open source works: we release software we think is ready, users report issues, and we fix them as fast and as through as we reasonably can.
In the case of Firefox 26 we’ve seen a continued trend of a reduction in known regressions. I think this is due to QA taking a more focused approach to feature testing and bug verifications. Starting with Firefox 24 we brought on a third QA release driver (the person responsible for coordinating testing of and ultimately signing-off on a release) and shifted toward a more surgical approach to bug testing. In other words we are trying to spend more time doing deeper and exploratory testing of bug fixes which are more risky. We are also continuing to hone our processes and work closer with developers and release managers. I think these efforts are paying off.

The numbers certainly seem to support this theory. Firefox 26 saw a reduction of regressions by 20% over Firefox 25, 25% over Firefox 24, and 57% better than Firefox 17 (our worst release for regressions).

Stability

Stability bugs are a reflection of how frequently our users are encountering crashes. In bugzilla these are indicated using the crash keyword. The most serious crashes are given the topcrash designation. The more crash bugs we ship in a particular release does not necessarily translate for more crashes per user, but it is indicative of more scenarios under which a user may crash. Many of these bugs are considered a subset of the regression bugs discussed earlier as these mostly come about by changes we have made or that a third-party has exposed.
In Firefox 26 we again see a downward trend in the number of crash bugs known to affect the release. I believe this speaks to the success of educating more people in the skills necessary to participate in reviewing the data in crash-stats dashboard, converting that into bugs, and getting the necessary testing done so developers can fix the issues. Before Firefox 24 was on the train the desktop QA really only had one or two people doing day to day triage of the dashboards and crash bugs. Now we have four people looking at the data and escalating bug reports on a daily basis.

The numbers above indicate that Firefox 26 saw 12% less crash bugs that Firefox 25, 41% less than Firefox 24, and 59% less than Firefox 15, our most unstable release.

Reopened

Reopened bugs are those bugs which developers have landed a fix for but the issue was proven not to have been resolved in testing. These bugs are a little bit different than regressions in that the core functionality the patch was meant to address still remains at issue (resulting in the bug being reopened), whereas a regression is typically indicative of an unforeseen edge case or user experience (resulting in a new bug being filed to block the parent bug).

That said, a high volume of reopened bugs is not necessarily an indication of poor QA; in fact it could mean the exact opposite. You might expect there to be a higher volume of reopened bugs if QA was doing their due diligence and found many bugs needing follow up work. However, this could also be an indication of feature work (and by consequence a release) that is of higher risk to regression.

As you can see with Firefox 26 we’re still pretty high in terms of the number of reopened bugs. We’re only 5% better than Firefox 23, our worst release in terms of reopened bugs. I believe this to be a side-effect of us doing more focused feature testing as indicated earlier. However it could also be indicative of problems somewhere else in the chain. I think it warrants looking at more closely and is something I will be raising in future discussions with my peers.

Uplifts vs Landings

Landings are those changes which land on mozilla-central (our development branch) and ride the trains up to release following the 6-week cadence. Uplifts are those fixes which are deemed either important enough or low enough risk that it warrants releasing the fix earlier. In these cases the fix is landed on the mozilla-aurora and/or mozilla-beta branches. In discussions I had last year with my peers I raised concerns about the volume of uplifts happening and the lack of transparency in the selection criteria. Since then QA has been working closer with Release Management to address these concerns.

I don’t yet have a complete picture to compare Firefox 26 to a broad set of historical data (I’ve only plotted data back to Firefox 24 so far). However, I think the data I’ve collected so far shows that Firefox 26 seemed far more “controlled” than previous releases. For one, the volume of uplifts to Beta was 29% less than Firefox 25 and there were 42% less uplifts across the entire Firefox 26 cycle compared to Firefox 25. It was also good to see that uplifts trailed off significantly as we moved through the Aurora cycle into Beta.

However, this does show a recent history of crash-landings (landing a crash late in a cycle) around the Nightly -> Aurora merge. The problem there is that something that lands on the last day of a Nightly cycle does not get the benefit of Nightly user feedback, nor does it get much, if any, time for QA to verify the fix before it is uplifted. This is something else I believe needs to be mitigated in the future if we are to move to higher quality releases.

Tracked Bugs

The final metric I’d like to share with you today is purely to showcase the volume of bugs. In particular how much the volume of bugs we deal with on a release-by-release basis has grown significantly over time. Again, I preface these numbers with the fact that these are only those bugs which have a status flag set. There are likely thousands of bugs that are not counted here because they don’t have a status flag set. A part of the reason for that is because the process for tracking bugs is not always strictly followed; this is particularly true as we go farther back in history.

As you can see above, the trend of ever increasing bug volume continues. Firefox 26 saw a 21% increase in tracked bugs over Firefox 25, and a 183% increase since we started tracking this data.

Firefox 26 In Retrospective

So there we have it. I think Firefox 26 was an incremental success over it’s predecessors. In spite of an ever increasing volume of bugs to triage we shipped less regressions and less crashes than previous releases. This is not just a QA success but is also a success borne of other teams like development and release management. It speaks to the success of implementing smarter strategies and stronger collaboration. We still have a lot of room to improve, in particular focusing more on incoming bugs, dealing with crash landings in a responsible way, and investigating the root cause of bugs that are reopened.

If we continue to identify problems, have open discussions, and make small iterative changes, I’m confident that we’ll be able to look back on 2014 as a year of success.

I will be back in six weeks to report on Firefox 27.