How Crashlytics symbolicates 1000 crashes/second

by Matt Massicotte, Software Engineer

 

One of the most complex processes in the Crashlytics crash processing system is symbolication. Its needs have changed dramatically over the years: we now support NDK, and the requirements for correctness on iOS change on a regular basis. As the service has grown, our symbolication system has undergone significant architectural changes to improve performance and correctness. We thought it would be interesting to write up how the system works today.

First things first – let’s go over what symbolication actually is. Apple has a good breakdown of the process for their platform, but the general idea is similar for any compiled environment: memory addresses go in, and functions, files, and line numbers come out.
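To make that concrete, here's a minimal sketch in Swift of what a lookup boils down to, using hypothetical types: a table of symbol entries sorted by start address, searched for the last entry at or below the frame's address. (A real implementation would first subtract the binary's load address to get an address relative to the image.)

struct SymbolEntry {
    let start: UInt64   // first address covered by this symbol
    let name: String    // function name
    let file: String    // source file path
    let line: Int       // line number within the file
}

// Entries must be sorted by start address. Binary-search for the
// last entry whose start is <= the (image-relative) address.
func symbolicate(_ address: UInt64, in entries: [SymbolEntry]) -> SymbolEntry? {
    var low = 0
    var high = entries.count
    while low < high {
        let mid = (low + high) / 2
        if entries[mid].start <= address {
            low = mid + 1
        } else {
            high = mid
        }
    }
    return low > 0 ? entries[low - 1] : nil
}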

Symbolication is essential for understanding thread stack traces. Without at least filling in function names, it's impossible to understand what a thread was doing at the time. And without that, meaningful analysis is impossible, whether by a human or an automated system. In fact, Crashlytics' ability to organize crashes into groups relies heavily on function names. This makes symbolication a critical piece of our crash processing system, so let's take a closer look at how we do it.

 

It starts with debug information

Symbolication needs a few key pieces of information to do its work. First, we need an address to some executable code. Next, we need to know which binary that code came from. Finally, we need some way of mapping that address to the symbol names in that binary. This mapping comes from the debug information generated during compilation. On Apple platforms, this information is stored in a dSYM. For Android NDK builds, this info is embedded into the executable itself.

These mappings actually hold much more than symbolication needs, which presents some opportunities for optimization. They contain everything a generalized symbolic debugger requires to step through and inspect your program, and that can be a huge amount of information. On iOS, we have seen dSYMs greater than 1GB in size! We take advantage of this in two ways. The second, covered in the next section, deals with something called symbol mangling. The first is extracting just the mapping info we need into a lightweight, platform-agnostic format, which results in a typical space savings of 20x compared to an iOS dSYM.
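As a rough illustration of that extraction step (the field names here are hypothetical, not our actual format), it amounts to keeping the address-to-source mapping and discarding the records only a debugger would use, reusing the SymbolEntry type from the earlier sketch:

import Foundation

// A full debug entry carries records only a debugger needs
// (hypothetical shape, for illustration).
struct DebuggerEntry {
    let start: UInt64
    let name: String
    let file: String
    let line: Int
    let typeRecords: Data    // variable and type descriptions
    let scopeRecords: Data   // lexical scope and inlining information
}

// Keep only the fields symbolication needs, sorted for the lookup above.
func extractMappings(from full: [DebuggerEntry]) -> [SymbolEntry] {
    full.map { SymbolEntry(start: $0.start, name: $0.name, file: $0.file, line: $0.line) }
        .sorted { $0.start < $1.start }
}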

 

Dealing with mangled symbols

In addition to throwing away data we don't need, we also perform an operation called “demangling” up front. Many languages, C++ and Swift in particular, encode extra data into symbol names, which makes them significantly harder for humans to read. For instance, the mangled symbol:

_TFC9SwiftTest11AppDelegate10myFunctionfS0_FGSqCSo7NSArray_T_

encodes the information needed by the compiler to describe the following code structure:

SwiftTest.AppDelegate.myFunction (SwiftTest.AppDelegate) -> (__ObjC.NSArray?) -> ()

For both C++ and Swift, we make use of the language's standard library to demangle symbols. While this has worked well for C++, the fast pace of language changes in Swift has proven more challenging to support.

We took an interesting approach to address this. We attempt to dynamically load the same Swift libraries the developer used to build their code, and then use them to demangle symbols on the developer's machine before anything is uploaded to our server. This helps keep the demangler in sync with the mangling the compiler actually performed. We still have work to do to stay on top of Swift demangling, but once the ABI stabilizes, it will hopefully present much less of a problem.
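To give a flavor of the general technique (this is a sketch, not our exact implementation): the Swift runtime exposes a swift_demangle entry point that can be located at runtime with dlsym. Loading the developer's own toolchain libraries, as described above, is the same idea pointed at different library paths.

import Darwin

// The C signature of the Swift runtime's demangling entry point.
private typealias DemangleFn = @convention(c) (
    UnsafePointer<CChar>?,          // mangled name
    Int,                            // length of the mangled name
    UnsafeMutablePointer<CChar>?,   // optional output buffer
    UnsafeMutablePointer<Int>?,     // optional in/out buffer size
    UInt32                          // flags (0 for default behavior)
) -> UnsafeMutablePointer<CChar>?

func demangle(_ mangled: String) -> String? {
    // dlopen(nil, ...) searches libraries already loaded into this process;
    // dlopen-ing a specific toolchain's libswiftCore works the same way.
    guard let sym = dlsym(dlopen(nil, RTLD_NOW), "swift_demangle") else {
        return nil
    }
    let fn = unsafeBitCast(sym, to: DemangleFn.self)
    return mangled.withCString { cString -> String? in
        guard let result = fn(cString, mangled.utf8.count, nil, nil, 0) else {
            return nil
        }
        defer { free(result) }
        return String(cString: result)
    }
}

// demangle("_TFC9SwiftTest11AppDelegate10myFunctionfS0_FGSqCSo7NSArray_T_")
// should yield the demangled form shown earlier.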

 

Minimizing server-side I/O

At this point, we have lightweight, pre-demangled mapping files. Producing the same files for both iOS and NDK means our backend can work without worrying about any platform's details or quirks. But we still have another performance issue to overcome. The typical iOS app loads about 300 binaries during execution. Luckily, we only need the mappings for the libraries actually present in the threads – around 20 on average. But even with only 20, and even with our optimized file format, the amount of I/O our backend needs to do is still incredibly high. We need caching to keep up with the load.

The first level of cache we have in place is pretty straightforward. Each frame in a stack can be thought of as an address-library pair. For a given address-library pair, the symbolication result will always be the same. There are an almost unbounded number of these pairs, but in practice, a relatively small number of them dominate the workload. This kind of caching is highly efficient in our system – it has about a 75% hit rate. That means only 25% of the frames we symbolicate actually require finding a matching mapping and doing a lookup.
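In sketch form, that first-level cache is just a map keyed on the pair. The names here are hypothetical, it reuses SymbolEntry from the earlier sketch, and in production this would be a shared cache rather than an in-process dictionary, but the shape is the same. Note that the address is stored as an offset within the library, so ASLR slides don't fragment the key space.

import Foundation

// A frame is identified by the binary's build UUID plus the address's
// offset within that binary.
struct FrameKey: Hashable {
    let libraryUUID: UUID
    let offset: UInt64
}

final class FrameCache {
    private var storage: [FrameKey: SymbolEntry] = [:]

    func symbol(for key: FrameKey,
                lookup: (FrameKey) -> SymbolEntry?) -> SymbolEntry? {
        if let hit = storage[key] {
            return hit                       // ~75% of frames end here
        }
        guard let entry = lookup(key) else { return nil }
        storage[key] = entry                 // only misses touch mapping files
        return entry
    }
}

That's good, but we went even further.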

If you take all of the address-library pairs for an entire thread, you can produce a signature for the thread itself. When a thread matches an existing signature, not only can you reuse all of the symbolication information for the entire thread, but you can also reuse any analysis work done later on. In our case, this cache is about 60% efficient. This is really awesome, because it can save tons of work in many downstream subsystems, and it affords us a great deal of flexibility in our stack trace analysis. Because the caching is so efficient, we can experiment with complex, slow implementations that would never be able to keep up with the full stream of crash events.
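The thread-level key can be sketched as a stable digest over the ordered frame keys. CryptoKit's SHA-256 is used here purely as an example of a stable hash; any consistent digest would do.

import CryptoKit
import Foundation

// Digest the ordered frame keys; two threads with identical frames get
// the same signature and can share symbolication and analysis results.
func threadSignature(of frames: [FrameKey]) -> String {
    var hasher = SHA256()
    for frame in frames {
        hasher.update(data: Data(frame.libraryUUID.uuidString.utf8))
        withUnsafeBytes(of: frame.offset.littleEndian) {
            hasher.update(bufferPointer: $0)
        }
    }
    return hasher.finalize()
        .map { String(format: "%02x", $0) }
        .joined()
}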

 

Keeping the symbols flowing

Of course, all of these systems have evolved over time. We started off with hosted Macs that used dSYMs directly to symbolicate every frame they saw. After many scaling issues, the introduction of NDK support, and Swift, we've ended up in a pretty different place. Our system now makes use of Twitter's Heron stream-processing system to handle close to 1000 iOS/tvOS/macOS and Android NDK crashes per second. Our custom mapping file solution uses an order of magnitude less bandwidth from developers' machines than it once did. We have a cross-platform symbolication system, and on-client demangling produces more correct results than ever, especially as Swift goes through rapid iterations.

We've come a long way, and we're always working to improve the process. A few months ago, we released a new tool for uploading debug symbols. Not only does it incorporate all of our dSYM translation optimizations, but it's also really easy to use in scripts or automation.

If I had to pick one big takeaway from this work, it would be: always be willing to consider custom file formats. Translating dSYMs right on the developer's machine allowed for dramatic savings in size, much improved privacy for the app's internal structures, and more correct results. For NDK, the same approach also made for a simpler developer workflow and cross-platform compatibility. These are substantial wins for our customers, and they reduce the cost of operating and maintaining our systems.

For those of you who would like even more technical details about crash reporting on iOS, check out my Flight presentation. Let us know how you enjoyed this look at our symbolication internals, and whether there are other areas you'd like to learn more about.

We're building a mobile platform to help teams create bold new app experiences. Want to join us? Check out our open positions!