Crash Stacks

nhirata

Based on various discussions, it seems I need to be a bit more descriptive about the TLC I give crash reports.

So I’ve been involving myself with Crash Kill for Mobile of late. My recent crash report is based on looking at the top crashers in the current product, as well as the various other reports. There are some Socorro items that I end up tracking just so that I understand where the tool currently stands.

For each release, such as https://crash-stats.mozilla.com/topcrasher/byversion/Fennec/10.0a1/7 , I take a look at the top 10 crash signatures. The first one, for example: mozilla::plugins::PluginModuleParent::WriteExtraDataForMinidump
I look at the build IDs, the CPUs and the OS versions to see what type of crash it is and which CPUs are affected. Sometimes there are only WinNT crashes. Then I look at the stacks to check that they’re really the same. More than likely this one would be all the same. An example of different crash stacks under the same crash signature is: https://crash-stats.mozilla.com/report/index/b547fc71-06fb-4adf-8eab-46f122111001 and https://crash-stats.mozilla.com/report/index/716c2ba9-e6b2-4e2b-880d-bb14e2111001 . An example of different signatures but basically the same crash stack in libxul is: https://crash-stats.mozilla.com/report/index/1627c95c-6c00-4860-87cb-f167d2111002 and https://crash-stats.mozilla.com/report/index/347a35f4-9765-4125-9ae7-e5c592111002 . So in essence they are the same type of crash; there are some slight differences in the libc frames because of the OS differences. This is where narrowing down the skiplist helps, so that we get more accurate signatures in libxul and the same types of libxul crash stacks end up grouped together.
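To make the "same crash stack in libxul" comparison concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Socorro actually builds signatures, and every frame name in it is made up: it drops the frames that are not in libxul (the libc ones that differ between OS versions) and compares what is left.

```python
def libxul_frames(frames, module="libxul.so"):
    """Keep only the function names of frames that live in the given module."""
    return [function for mod, function in frames if mod == module]


def same_libxul_stack(stack_a, stack_b):
    """True when two stacks match once the non-libxul frames are ignored."""
    return libxul_frames(stack_a) == libxul_frames(stack_b)


# Made-up stacks as (module, function) pairs, top frame first.
report_a = [
    ("libc.so", "made_up_libc_function_a"),
    ("libxul.so", "made_up::Foo"),
    ("libxul.so", "made_up::Bar"),
]
report_b = [
    ("libc.so", "made_up_libc_function_b"),  # differs only in libc
    ("libxul.so", "made_up::Foo"),
    ("libxul.so", "made_up::Bar"),
]
report_c = [
    ("libc.so", "made_up_libc_function_a"),
    ("libxul.so", "made_up::Baz"),  # a genuinely different libxul stack
]

print(same_libxul_stack(report_a, report_b))
print(same_libxul_stack(report_a, report_c))
```

Reports a and b would get different signatures if the signature starts at the libc frame, even though the libxul part is identical, which is the situation a better skiplist is meant to fix.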

Some crash stacks are really hard to understand, such as:

https://crash-stats.mozilla.com/report/index/2024e4bc-42f8-4328-97f8-1aab62110930

Why? Because if you look at the source signature, it’s hard to tell which spot in the code the crash stack is associated with. That’s where Ted’s Crash Symbol Sender comes into play. So please, please, please install it. This way we can associate the code with the crash stack in memory.

Then I take the time to associate bugs with the crash, filling in various things such as the version, crash, mobile crash, etc., and putting the bug in the proper location (Core or otherwise). So how does one read the stack?

Taking a look at Bug 684863, we look down to the first libxul line: TOutputESSL::writeVariablePrecision… It’s actually in a header (_new.h:135); .h stands for a header file. So we look at the next line:
6 libxul.so TOutputGLSLBase::writeVariableType gfx/angle/src/compiler/OutputGLSLBase.cpp:123

Looking at the Source portion of this line, we see that it’s in gfx (graphics). So most likely it’s going to be a crasher in Core Graphics, and most likely it’s a graphics type of crash. Generally you can tell what type of crash it is from where the source lives.
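The reading steps above can be sketched in a few lines of Python. This is just a rough illustration under some assumptions: the Frame tuple, the helper names and the libc frame are hypothetical, and the only component hint filled in is the gfx one mentioned in the text. The sketch walks down to the first libxul frame whose source is not a header, then guesses the component from the top-level source directory.

```python
from collections import namedtuple

# Hypothetical shape for one line of a crash stack.
Frame = namedtuple("Frame", "module function source")

# Only the gfx entry comes from the text; anything else is "unknown" here.
COMPONENT_HINTS = {"gfx": "Core Graphics"}


def first_useful_frame(frames, module="libxul.so"):
    """Return the first frame in `module` whose source is not a header file."""
    for frame in frames:
        if frame.module != module:
            continue
        path = frame.source.rsplit(":", 1)[0]  # drop the ":line" suffix
        if path.endswith(".h"):
            continue  # headers rarely say much, so move on to the next line
        return frame
    return None


def guess_component(frame):
    """Guess a component from the top-level directory of the source path."""
    top_dir = frame.source.split("/", 1)[0]
    return COMPONENT_HINTS.get(top_dir, "unknown")


stack = [
    Frame("libc.so", "hypothetical_libc_frame", ""),  # made-up frame
    Frame("libxul.so", "TOutputESSL::writeVariablePrecision", "_new.h:135"),
    Frame("libxul.so", "TOutputGLSLBase::writeVariableType",
          "gfx/angle/src/compiler/OutputGLSLBase.cpp:123"),
]

frame = first_useful_frame(stack)
print(frame.function, "->", guess_component(frame))
```

A directory guess like this is only a starting point for filing the bug; someone who knows the code may still move it to a better component.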

This is most of what I do when I create the report. I create bugs for crashes in crash-stats that haven’t been seen before, and try to track the crash bugs that have already been reported. This means sifting through crash bugs that have been reported without signatures as well.

When looking for crash repro steps, there’s a bit more to it as well. Crash bugs that have been reported without signatures sometimes have STRs (steps to reproduce), and you can associate the bugs that do have crash signatures with them. Some of them might at least have something resembling STRs.

To get more specific about the crashes that have crash stacks, you have to trace back through the source. Taking a look at bug 622992, you look at the source and the specific line:
138 ScopedDIR dir_closer(opendir(fd_dir));

After looking at the code, fd refers to a file descriptor, and one of the things that makes me wonder is: can fd_dir be null somehow? Anyway, you trace back each step, and when you reach the top of the procedure, you look at the next line down in the crash stack and work your way up that procedure, until you get a clue about how the crash may have gotten there. I can’t say that I totally understand how things work here, as I don’t really understand the full architecture of the code, but I can make guesses and conjectures.
