More realistic endurance test results

If you’re not already familiar with the Firefox endurance tests, these are Mozmill tests that repeat a small snippet of user interaction over and over again while gathering metrics. This allows us to detect if there’s a memory leak in a very localised area, or if there’s a memory regression within the areas tested. I’ve blogged about them a few times.

We’ve known for a while that the results we’ve been getting aren’t entirely realistic, because we only wait 0.1 seconds between iterations. This doesn’t give Firefox any time to perform tasks such as garbage collection. Unfortunately, we couldn’t simply increase this delay, as that would cause other Mozmill tests to queue behind the much longer-running endurance tests.
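To make the role of the delay concrete, here is a minimal Python sketch of an endurance-style loop. It is not the Mozmill implementation: the function name `run_endurance`, its parameters, and the choice to sample the metric after the pause are all assumptions made for illustration.

```python
import gc
import time
from typing import Callable, List, Tuple


def run_endurance(
    action: Callable[[], None],
    sample_metric: Callable[[], float],
    iterations: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, float]]:
    """Repeat `action`, pause `delay` seconds after each run, then sample a metric.

    Hypothetical helper: the pause stands in for the idle time that lets the
    application under test do background work such as garbage collection.
    """
    samples = []
    for iteration in range(1, iterations + 1):
        action()                      # the small snippet of user interaction
        sleep(delay)                  # idle time between iterations
        samples.append((iteration, sample_metric()))
    return samples


if __name__ == "__main__":
    # Stand-in action and metric: allocate some objects, then count the
    # objects tracked by Python's garbage collector.
    results = run_endurance(
        action=lambda: [object() for _ in range(1000)],
        sample_metric=lambda: float(len(gc.get_objects())),
        iterations=3,
        delay=0.1,
    )
    for iteration, value in results:
        print(iteration, value)
```

With a delay of 0.1 seconds the pause is almost negligible, whereas a longer delay adds a fixed idle time to every iteration, which is why the total duration of a testrun grows along with it.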

So now that we have our new VMware ESX cluster in place (which has given us an awesome three VMs per platform), we’ve configured Jenkins to run endurance tests on just one node per platform. This allows other Mozmill tests to continue on the remaining available nodes. We were then finally able to increase the delay to 5 seconds.

The results are as we had hoped. The memory usage has dropped, and the duration has increased. The individual testrun results have also become a lot less erratic. This can be seen in the following charts:

It should now be much easier for us to spot regressions, and hopefully we’ll have fewer false positives! If you’re interested in the latest endurance results, you can find them in our Mozmill Dashboard, along with the endurance charts.

Related bugs/issues:

Bug 788531 – Revise default delay for endurance test to make scenarios more realistic
Issue 173 – Have dedicated nodes for endurance tests
Issue 201 – Revise default delay for all endurance jobs
Issue 203 – Increase build timeout for endurance tests