How Crashlytics symbolicates 1000 crashes/second

by Matt Massicotte, Software Engineer

One of the most complex and involved processes in the Crashlytics crash processing system is symbolication. The needs of our symbolication system have changed dramatically over the years. We now support NDK, and the requirements for correctness on iOS change on a regular basis. As the service has grown, our symbolication system has undergone significant architectural changes to improve performance and correctness. We thought it would be interesting to write something up on how the system works today.

First things first – let’s go over what symbolication actually is. Apple has a good breakdown of the process for their platform, but the general idea is similar for any compiled environment: memory addresses go in, and functions, files, and line numbers come out.

Symbolication is essential for understanding thread stack traces. Without at least filling in function names, it’s impossible to understand what a thread was doing at the time. And without that, meaningful analysis is impossible, whether by a human or an automated system. In fact, Crashlytics’ ability to organize crashes into groups typically relies heavily on function names. This makes symbolication a critical piece of our crash processing system, so let's take a closer look at how we do it.

It starts with debug information

Symbolication needs a few key pieces of information to do its work. First, we need an address to some executable code. Next, we need to know which binary that code came from. Finally, we need some way of mapping that address to the symbol names in that binary. This mapping comes from the debug information generated during compilation. On Apple platforms, this information is stored in a dSYM. For Android NDK builds, this info is embedded into the executable itself.
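To make the three inputs concrete, here is a minimal sketch of an address lookup. It is not our production code: the `Symbol` record, the `SymbolTable` class, and the `symbolicate` function are hypothetical names, and the sketch assumes the mapping has already been reduced to a list of function ranges. Real debug information also maps addresses to individual source lines, which is left out here.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """One function range inside a binary (hypothetical record layout)."""
    start: int  # offset of the function's first instruction within the binary
    size: int   # length of the function in bytes
    name: str
    file: str
    line: int   # line where the function begins


class SymbolTable:
    """Sorted function ranges for a single binary."""

    def __init__(self, symbols):
        self._symbols = sorted(symbols, key=lambda s: s.start)
        self._starts = [s.start for s in self._symbols]

    def lookup(self, offset):
        # Find the last symbol whose start is <= offset.
        i = bisect.bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        sym = self._symbols[i]
        # Reject offsets that fall in a gap after the symbol ends.
        if offset >= sym.start + sym.size:
            return None
        return sym


def symbolicate(address, load_address, table):
    """Map a runtime address to a Symbol, given where the binary was loaded."""
    return table.lookup(address - load_address)
```

The binary search keeps each lookup logarithmic in the number of functions, which matters once a table holds many thousands of symbols.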

These mappings actually hold much more than symbolication needs. They have everything required for a generalized symbolic debugger to step through and inspect your program, which may be a huge amount of information. On iOS, we have seen dSYMs greater than 1GB in size! This is a real opportunity for optimization, and we take advantage of it in two ways. First, we extract just the mapping info we need into a lightweight, platform-agnostic format. This results in a typical space savings of 20x when compared to an iOS dSYM. The second optimization has to do with something called symbol mangling.

Dealing with mangled symbols

In addition to throwing away data we don't need, we also perform an operation called “demangling” upfront. Many languages, C++ and Swift in particular, encode extra data into symbol names. This makes them significantly harder for humans to read. For instance, the mangled symbol:

_TFC9SwiftTest11AppDelegate10myFunctionfS0_FGSqCSo7NSArray_T_

encodes the information needed by the compiler to describe the following code structure:

SwiftTest.AppDelegate.myFunction (SwiftTest.AppDelegate) -> (__ObjC.NSArray?) -> ()

For both C++ and Swift, we make use of the language's standard library to demangle symbols. While this has worked well for C++, the fast pace of language changes in Swift has proven more challenging to support.

We took an interesting approach to address this. We attempt to dynamically load the same Swift libraries that the developer used to build their code, and then use them to demangle their symbols on their machine before uploading anything to our server. This helps to keep the demangler in sync with the mangling the compiler actually performed. We still have work to do to stay on top of Swift demangling, but once its ABI stabilizes it will hopefully present much less of a problem.

Minimizing server-side I/O

At this point, we have lightweight, pre-demangled mapping files. Producing the same files for both iOS and NDK means our backend can work without worrying about a platform’s details or quirks. But we still have another performance issue to overcome. The typical iOS app loads about 300 binaries during execution. Luckily, we only need the mappings for the libraries that are active in the threads, around 20 on average. But even with only 20, and even with our optimized file format, the amount of I/O our backend system needs to do is still incredibly high. We need caching to keep up with the load.

The first level of cache we have in place is pretty straightforward. Each frame in a stack can be thought of as an address-library pair. If you are symbolicating the same address-library pair, the result will always be the same. There are an almost infinite number of these pairs, but in practice, a relatively small number of them dominate the workload. This kind of caching is highly efficient in our system – it has about a 75% hit rate. This means that only 25% of the frames we need to symbolicate actually require us to find a matching mapping and do a lookup. That's good, but we went even further.
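As a rough illustration of this first level, the sketch below caches results keyed on the address-library pair. Everything in it is an assumption made for the example: the `FrameCache` name, the LRU eviction policy, the capacity, and the `resolve` callback that stands in for the expensive mapping lookup. The text above does not say how the real cache is stored or evicted.

```python
from collections import OrderedDict


class FrameCache:
    """LRU cache keyed on (library_id, offset); a sketch, not the real system."""

    def __init__(self, resolve, capacity=100_000):
        self._resolve = resolve      # slow path: load the mapping and look up
        self._capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def symbolicate(self, library_id, offset):
        key = (library_id, offset)
        if key in self._entries:
            # Same pair always yields the same result, so reuse it.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = self._resolve(library_id, offset)
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            # Evict the least recently used pair.
            self._entries.popitem(last=False)
        return result

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Because a small number of pairs dominate the workload, even a bounded cache like this can absorb most lookups before they reach the mapping files.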

If you take all of the address-library pairs for an entire thread, you can produce a unique signature for the thread itself. If you match on this signature, not only can you cache all the symbolication information for the entire thread, but you can also cache any analysis work done later on. In our case, this cache is about 60% efficient. This is really awesome, because you can potentially save tons of work in many downstream subsystems. This affords us a great deal of flexibility for our stack trace analysis. Because our caching is so efficient, we can experiment with complex, slow implementations that would never be able to keep up with the full stream of crash events.
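One way to picture the thread-level cache is sketched below. The choice of SHA-256, the `thread_signature` function, and the `ThreadCache` class are assumptions made for illustration; the only idea taken from the text above is that the ordered address-library pairs of a thread can be reduced to a single signature that keys both symbolication and later analysis.

```python
import hashlib


def thread_signature(frames):
    """Hash the ordered (library_id, offset) pairs of one thread."""
    h = hashlib.sha256()
    for library_id, offset in frames:
        # Frame order matters, so feed the pairs in stack order.
        h.update(f"{library_id}:{offset:x}\n".encode("utf-8"))
    return h.hexdigest()


class ThreadCache:
    """Caches whole-thread results (symbolication plus analysis) by signature."""

    def __init__(self, analyze):
        self._analyze = analyze  # expensive work done once per unique thread
        self._results = {}

    def process(self, frames):
        sig = thread_signature(frames)
        if sig not in self._results:
            self._results[sig] = self._analyze(frames)
        return self._results[sig]
```

A hit here skips every per-frame lookup for that thread and any downstream analysis as well, which is why slow, experimental analysis code can still keep up.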

Keeping the symbols flowing

Of course, all of these systems have evolved over time. We started off using hosted Macs and dSYMs directly to symbolicate every frame they saw. After many scaling issues, the introduction of NDK support, and Swift, we've ended up in a pretty different place. Our system now makes use of Twitter's Heron stream-processing system to handle close to 1000 iOS/tvOS/macOS and Android NDK crashes per second. Our custom mapping file solution uses an order of magnitude less bandwidth from developers' machines than it once did. We have a cross-platform symbolication system. On-client demangling produces more correct results than ever, especially as Swift goes through rapid iterations.

We’ve come a long way, and we're always working to improve the process. A few months ago, we released a new tool for uploading debug symbols. Not only does it incorporate all of our dSYM translation optimizations, but it is also really easy to use in scripts or automation.

If I had to pick one big take-away from this work, I would say always be willing to consider custom file formats. Translating dSYMs right on the developer’s machine allowed for dramatic savings in size, much improved privacy of the app’s internal structures, and more correct results. For NDK, the same approach also made for a simpler developer workflow and cross-platform compatibility. These are very substantial wins for our customers, for our operational costs, and for the maintenance of our systems.

For those of you that would like to hear even more technical details about crash reporting on iOS, you can check out my Flight presentation. Let us know how you enjoyed this look at our symbolication internals, and if there are other areas you might want to learn more about.


Fabric August Update

By Annum Munir, Product Marketing Manager

Summer’s almost over but we’re not taking any breaks from shipping new features. In August, we focused on expanding your view into app stability, updated a few of our Android SDKs, and added new functionality to our mobile app. We also celebrated a major milestone for fastlane! Read on for more details:

Introducing OOM reporting: a new dimension to app quality

We extended our crash coverage to include out-of-memory (OOM) reporting on iOS. An OOM event is an unexpected app termination that occurs when a mobile device runs out of memory. To your users, however, OOM events look just like crashes, which makes them detrimental to your stability and also extremely difficult to detect. This month, we used intelligent heuristics to bring OOM reporting to Crashlytics. Now, you can monitor your OOM-free sessions, immediately see when OOMs become a problem, and get valuable direction on where to start your troubleshooting.

Don’t let OOMs disrupt your user experience! Learn how on the Crashlytics blog.

Launched Twitter Kit 2.0 and Digits 2.0 for Android

We upgraded our Twitter Kit and Digits SDKs for Android to keep them stable, predictable, and reliable (we already did this for Twitter Kit for iOS a few months ago). In version 2.0, we refined our libraries based on your feedback, updated major underlying dependencies, enhanced performance, and aligned with modern Android tools. To get these latest versions, simply click the “Update” buttons within your Android IDE plugin - we’ll take care of the rest.

For more specifics, check out the Twitter Developer blog.

Fabric mobile app updates: Crashlytics-only mode & account switching

The Fabric mobile app helps you keep tabs on your app when you’re on the go. While our app gives you a wealth of real-time analytics data, if you want to focus solely on stability you can now switch to a “Crashlytics-only” mode. This way, you have visibility into all of your crashes even if you don’t have Answers enabled!

On top of the new mode, you can also easily switch between your Fabric accounts within our mobile app. In just a few clicks, you’ll be able to monitor all of your releases across all accounts — even if you’re away from your desk. 

Update the app today to get these upgrades.

fastlane surpasses 10,000 stars on GitHub

fastlane automates the tedious, repetitive tasks of mobile deployment so you can spend more time doing what you love: creating amazing user experiences. Over the past months, we’ve worked with our community to make fastlane even better. Today, we’re excited to share that fastlane has surpassed 10,000 stars on GitHub. In fact, fastlane now has more GitHub stars than the language it was written in. We’re humbled by your support and we can’t wait to keep moving fastlane forward!

Here’s our internal changelog:

Fabric

  • iOS

    • Added logging of a warning if Fabric +with: is incorrectly invoked multiple times

    • Improved beta support when Fabric is embedded in a dynamic library

  • Android

    • Fixed issue causing the Crashlytics privacy prompt to not be shown in rare cases

Crashlytics

  • iOS

    • Improved defensiveness when handling Custom Keys and Logs data

    • Added support for Answers 1.3.0

  • Android

    • Updated Crashlytics Core dependency

    • Improved crash reporting efficiency when handling stack overflow errors

Answers

  • iOS

    • CPU/networking are now reduced when in Low Power mode on iOS, or under thermal pressure on macOS

    • Adopted NSURLSession background uploads, making for much more efficient and reliable networking

    • Adopted NSBackgroundActivityScheduler, which results in improved background behavior on macOS

    • Improved compatibility for macOS apps that use Automatic Termination and Sudden Termination

    • Improved visibility of Answers background operations by adopting NSActivity APIs

    • Improved on-disk event storage, reducing I/O and CPU overhead

    • Fixed a bug that could cause Answers to send a report with no events

  • Android

    • Updated Crashlytics Core dependency

Digits

  • Android

    • Clarified external API by defining "internal" package

    • Clarified events generated by defining "events" package

    • Enabled unique user counts per custom attribute by updating the sample application's logger to use custom events

Twitter Kit

  • Android

    • Removed Digits dependency

    • Dropped support for API versions before API 14 (ICS)

    • Updated Twitter Core dependency

    • Removed previously deprecated methods and classes

    • Added contentDescription for media based on altText field

    • Migrated to Retrofit 2.0 and OkHttp 3.2

    • TwitterApiClient now automatically refreshes expired guest tokens

    • Removed all public reference to Application Authentication

    • Fixed issue parsing withheldInCountries field in User object

    • Added altText field to MediaEntity object

    • Added Quote Tweet to Tweet object

MoPub

Fabric July Update

By Brian Lynn, Product Marketing Manager

With summer in full swing, we turned up the heat this month by shipping two major releases to help you further engage and retain your users!

Quickly recognize and solve user retention problems

To build a successful mobile business, app development teams need to keep a close eye on user retention, which is crucial for their app’s growth. What’s the point of spending time and money to acquire new users if they churn the next day? That’s why in July, we released Answers activity segments to help you understand how engaged your current users are and how many are at risk of abandoning your app. More on the Answers blog.

Engage users with a seamless Vine viewing experience

We also added Vine support to Twitter Kit so you can easily engage users by bringing creative and quirky video content into your app. Now, Twitter Kit will automatically play a Vine that is embedded within a Tweet, expand these videos within the timeline, and seamlessly play them on loop within the video player. More on the Twitter Developer blog.

Easily build real-time apps with PubNub and Digits

Besides shipping major releases, we also co-hosted a webinar with our friends at PubNub to help developers accelerate real-time data delivery for their apps. During the webinar, their team showed you how to build a real-time mobile chat app in Android using Fabric and PubNub. And, Chris Oryschak, a Fabric product manager, demonstrated how you can easily and securely verify users using Digits without any cumbersome passwords or complex 2-factor authentication setups. Check out the webinar here or grab the slides!

Here’s our internal changelog:

Crashlytics

  • Android

    • Beta now works for apps using the v2 signature in the latest Android Gradle Plugin, on devices running Android N

    • Beta kit’s startup time is now even faster

    • Fixed a bug to prevent false negatives when determining whether an app was installed by Beta

    • Removed logging when the Beta by Crashlytics app cannot be found

Answers

  • iOS

    • Released activity segments feature

  • Android

    • Released activity segments feature

Digits

  • Android (bug fixes):

    • OSS Gradle file breakages in v1.11.0, so customers can continue using our OSS project as an example

    • Crash caused when digitsLoginFailure event was reported without countryCode

    • Users being unable to login when guest auth expires on the service but not on the client

    • Delete contacts throws exception in okhttp 2.3.1+

Twitter Kit

  • Android

    • Bump Digits and tweet-ui dependencies

    • Allow non-filtered search results for SearchTimeline

MoPub

Fabric June Update

By Brian Lynn, Product Marketing Manager

Copa America fever caught on this June and excitement filled our office, but we still hunkered down and shipped a ton of new upgrades on Fabric for you -- even if that meant missing a game or two! Here’s the lowdown:

Understand your phone verification conversion funnel

If you use Digits to onboard users, you may have been looking for more insights into your conversion funnel and more flexibility in testing the Digits flow. That’s why in June, we’re excited to release two major upgrades for Digits: an integration with Answers and the Digits sandbox. Now, you can easily track login events and even log specific user actions within the Digits flow. Also, you can now run tests without triggering any rate limits -- more on the Digits blog!

Powerful, real-time analytics on the go

Since we launched the first Fabric mobile app, we’ve been heads down building out more functionality to help you dive deeper into your data and understand how your apps are growing. This month, we released a major upgrade to the app: the ability to drill into your most impactful adoption and stability metrics, such as DAU, MAU, and retention, so you can stay on top of your new releases on the go. Check them out in the original announcement!

Grow your app with mobile deep linking

User acquisition data for mobile apps is often fragmented between different marketing channels. And it’s even harder to understand which organic channels are most effective in driving new installs. To solve this, we released our integration with Branch: a powerful, multi-channel deep linking and attribution tool for growing your app and effectively tracking the source of your most engaged users. See what you can do with Branch in the original announcement.

Build your game into a successful business

A few months ago, we released Fabric support for Unity to solve the common challenges game developers face. In June, we made it even easier for you to turn your game into a thriving business with a brand new native MoPub integration. We also gave you more control to customize the setup process with manual initialization and deferred SDK activation during onboarding. See more in the original announcement.

Create powerful actions with fastlane plugins

We love developing fastlane with you and want to empower you with more freedom to help mobile developers and strengthen your bond with them. That’s why we also released fastlane plugins in June – a new, faster way to create actions and connect directly with the fastlane community. More here!

 

Here’s our internal changelog:

Fabric Platform

  • Android

    • Added the name of the exception to the Answers Crash event

Crashlytics

  • Android

    • Wrote the exception name to Answers when sending a Crash event

    • Updated Crashlytics Core and Answers dependency

    • Updated Fabric Base dependency for Beta by Crashlytics

Answers

  • Android

    • Facilitated sending the exception name with Crash events

Digits

Twitter Kit

  • iOS

    • Add SFSafariViewController support for login

    • Fix bug when Tweet includes a newline character

    • Add methods for getting tweets from `TWTRTimelineViewController`

    • Support `extended_tweet` mode for Tweet objects

    • Fix non-module header issue with CocoaPods and Swift

  • Android

    • Updated Twitter Core Dependency

    • Fix Fake ID exploit

MoPub

 

Introducing fastlane plugins: A new way to create powerful actions

By Hemal Shah, Product Manager

The beauty of open source software is that innovation can come from anyone, anywhere, and at any time. Over the past year and a half, the fastlane community has embraced this opportunity to make fastlane even better. These community-contributed additions, including approximately 80% of all fastlane actions and spaceship, have already saved millions of precious developer hours!

We love developing fastlane with you and want to empower you with more freedom to help mobile developers and strengthen your bond with them. Today, we’re introducing fastlane plugins – a new, faster way to create actions and connect directly with the fastlane community.

 

A whole new way to move fastlane forward

fastlane is made up of hundreds of actions that make app deployment a breeze. From helping you distribute your beta builds to posting notifications in Slack channels, the possibilities of what you can automate to save time are endless! And because fastlane is open, everyone has the power to build on top of it.

Up until now, new actions that were proposed by our passionate community were merged into the main repository. To use them, all fastlane customers would need to upgrade their gems to the latest version. We’re excited to announce we’ve made this process even smoother with the new fastlane plugins architecture.

Think of fastlane plugins like building blocks; they’re a new, modular way to create and distribute actions independently of fastlane itself. In other words, these plugins allow actions to be added faster because they aren’t bundled into the main fastlane repository. Everyone has the power to invent, share, and deploy new plugins in this new architecture without waiting for PRs to be approved and gems to be updated – they’re your fast pass to making a dent in the mobile development universe! And everyone also gets instant access to tons of new actions. So, whether you’re a plugin creator or consumer or both, fastlane plugins are a win-win for the entire fastlane family.

I’m pumped to see where developers take fastlane via plugins. This gives people the chance to quickly create actions useful to their workflow, and then easily share them with the world.


Complete ownership of your masterpiece

You’re the boss of your plugin. From concept (what cool actions can you dream of?) to coding (#shipit) and promotion (share it with the world!), you’ll have control of your plugin’s design and destiny. Once your plugin is live, you’ll get to hear feedback from other fastlane customers and interact with them directly. This is your golden opportunity to showcase your talent and collect some good karma by giving back to the community!

To start creating new plugins, simply type fastlane new_plugin in your terminal and fastlane will walk you through the whole set-up.

fastlane new_plugin

Once you’re done, fastlane will generate the code that is necessary to activate your plugin and get it ready to publish to the world! Plus, this code will be all set to run on CI for automatic build and testing.

 

Easily discover actions

Rest assured, new and existing plugins won’t be buried out of sight because fastlane can quickly discover external plugins created by community contributors.

To see the wealth of new actions available to you, type “fastlane search_plugins” in your terminal. If you have a specific problem or task in mind, just add a keyword to the end of this query (type fastlane search_plugins [keyword]) for more targeted search results.

fastlane search_plugins

We also regularly update this page on GitHub with a list of all fastlane plugins. Some of our favorite plugins include github_status (to check on the status of GitHub’s APIs), and upload_folder_to_s3 (to store assets and artifacts in S3 on AWS). There’s even a fastlane plugin that plays victory music called tunes!

 

Instant access to new plugins – no updates needed!

You can create, use, and update plugins independently of the fastlane release cycle. There’s no need to update fastlane first; these plugins will work seamlessly with your existing version!

To install new plugins, just type fastlane add_plugin [name]. Within seconds, fastlane will retrieve the necessary code and generate the configuration files so you can immediately add the new plugin action into your local project.  

fastlane add_plugin

 

Onward and upward to a brighter future

We built fastlane plugins to empower you to make a meaningful and immediate impact. Together, let’s make fastlane even better. We can’t wait to see the amazing new actions our community builds! And remember, we love seeing your work so Tweet us a link to your new plugin once it’s ready to rock.