Link Fabric events to Firebase: do more with your data

By Shobhit Chugh, Product Manager


Most app teams collect data about their users to better serve them. But having data alone doesn’t automatically lead to insight. To be valuable, it needs to be processed, analyzed, and distilled into information that informs strategy and decision-making.

Answers, the analytics engine that powers Fabric, helps you understand your users by tracking events. Events are in-app actions (like login, social share, or purchase) that reveal how people interact with your app. This information is vital to learning how engaged users are, what actions they take most (and least), and how their behavior changes over time.
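For instance, with the Answers SDK on Android, logging a predefined or custom event is a one-liner. Here is a minimal sketch; the event names and attribute values are illustrative:

    import com.crashlytics.android.answers.Answers;
    import com.crashlytics.android.answers.CustomEvent;
    import com.crashlytics.android.answers.LoginEvent;

    // Log a predefined login event
    Answers.getInstance().logLogin(new LoginEvent()
            .putMethod("email")
            .putSuccess(true));

    // Log a custom event with a custom attribute
    Answers.getInstance().logCustom(new CustomEvent("Video Played")
            .putCustomAttribute("Category", "Comedy"));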

Today, we’re excited to give you more freedom to explore and examine your event data through a new Fabric events + Firebase integration. Now, all Fabric customers can unlock highly requested analytics features in Firebase by creating and linking their Firebase account.
 

Fabric & Firebase are stronger together

Firebase is Google’s mobile platform that helps you build and grow high-quality apps without managing infrastructure. As we stated at Google I/O 2017, Fabric joined forces with Firebase because our two platforms are stronger together and offer complementary tools.

For instance, Fabric has always adhered to an opinionated philosophy and presented critical app data in a digestible way. However, we’ve heard from many of you that you want more flexibility. By teaming up with Firebase, we can build on the freedom and power that Firebase gives developers to do deeper analysis without compromising our opinionated approach.

This new Fabric events + Firebase integration allows you to use existing Fabric events with Firebase’s advanced marketing and analytics features to get more flexibility to organize, interpret, and act on your event data.
 

Build custom audiences to get deeper user insight

Although Fabric automatically groups your users into activity segments based on sessions, it’s also valuable to group users by their behaviors. With the Fabric events + Firebase integration, you can create custom audiences (a.k.a. user segments) in Firebase using your Fabric events and event attributes.

For example, you could create a “power users” audience to group people who have completed a key in-app action many times. For gaming apps, this group could consist of users who have completed at least 10 levels and made an in-app purchase.
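On Android, the Answers events backing such an audience might be logged as below; this is a minimal sketch, and the level names, item names, and prices are illustrative:

    import com.crashlytics.android.answers.Answers;
    import com.crashlytics.android.answers.LevelEndEvent;
    import com.crashlytics.android.answers.PurchaseEvent;
    import java.math.BigDecimal;
    import java.util.Currency;

    // Record a completed level
    Answers.getInstance().logLevelEnd(new LevelEndEvent()
            .putLevelName("Level 10")
            .putSuccess(true));

    // Record an in-app purchase
    Answers.getInstance().logPurchase(new PurchaseEvent()
            .putItemName("Gem Pack")
            .putItemPrice(BigDecimal.valueOf(4.99))
            .putCurrency(Currency.getInstance("USD"))
            .putSuccess(true));

Once these events flow into Firebase, the audience itself is defined in the Firebase console by combining conditions on both events.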

With custom audiences, you can slice and dice your user data by in-app behavior to better understand engagement.


View historical data to unearth long-term trends

Fabric focuses on real-time events from the past 30 days, while Firebase gives you long-term visibility. Once you integrate Fabric events with Firebase, all event data collected from that point on will be accessible in Firebase. This extended view into your users’ behavior will help you uncover persistent trends.


Combine different data sources into one view with custom analysis

The Fabric events + Firebase integration also gives you access to your raw data (something you’ve wanted for a long time!) so you can perform more sophisticated and targeted analysis. Specifically, when you complete the integration and link your Firebase project to BigQuery, your Fabric events will flow into a BigQuery dataset. BigQuery is a petabyte-scale analytics data warehouse that you can use to run SQL-like queries over vast amounts of data.

For example, if you track user events in multiple places (such as Fabric, Google Analytics 360, or custom analytics collected by a mobile backend service), you can import data from all of these sources into BigQuery and aggregate it to see a complete picture. From there, you can also use data visualization tools, like Data Studio, to turn this aggregated raw data into informative reports that are easy to read, share, and customize.
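As a rough sketch of what such analysis could look like, the following uses the google-cloud-bigquery Java client to count event occurrences in an exported table. The project, dataset, and table names are hypothetical, and the exact export schema depends on your Firebase setup:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.FieldValueList;
    import com.google.cloud.bigquery.QueryJobConfiguration;

    public class EventCounts {
        public static void main(String[] args) throws InterruptedException {
            BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

            // Hypothetical table; the real one is created by the Firebase-BigQuery link.
            String sql =
                "SELECT event_dim.name AS event_name, COUNT(*) AS occurrences "
              + "FROM `my-project.my_app_ANDROID.app_events_20171101`, "
              + "UNNEST(event_dim) AS event_dim "
              + "GROUP BY event_name ORDER BY occurrences DESC";

            QueryJobConfiguration config =
                QueryJobConfiguration.newBuilder(sql).setUseLegacySql(false).build();

            for (FieldValueList row : bigquery.query(config).iterateAll()) {
                System.out.printf("%s: %d%n",
                        row.get("event_name").getStringValue(),
                        row.get("occurrences").getLongValue());
            }
        }
    }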

In addition, this Fabric events + Firebase integration gives you the ability to manage permissions on projects and datasets so you have control over who is able to share, view, and retrieve your data.



Turning data into action

The Fabric events + Firebase integration gives you enhanced flexibility and control over your data. But what do you do after you’ve dissected, analyzed, and extracted meaning from it? How can you turn these insights into action?

By using these insights to trigger smart marketing campaigns.

Firebase provides tools like Cloud Messaging and Remote Config that can send messages and alter your app in response to Fabric events. For example, an ecommerce app can send a push notification with a discount code to all users who have made an in-app purchase, encouraging them to buy something else. Or it can enable “one-click checkout” for this group.
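On the app side, a sketch of the Remote Config half might look like the following, where one_click_checkout_enabled is a hypothetical parameter whose value is conditioned on a past-purchasers audience in the Firebase console:

    import com.google.firebase.remoteconfig.FirebaseRemoteConfig;

    final FirebaseRemoteConfig remoteConfig = FirebaseRemoteConfig.getInstance();

    // Fetch the latest values (honoring a one-hour cache), then activate them.
    remoteConfig.fetch(3600).addOnCompleteListener(task -> {
        if (task.isSuccessful()) {
            remoteConfig.activateFetched();
        }
        // Hypothetical parameter targeted at the purchasers audience
        if (remoteConfig.getBoolean("one_click_checkout_enabled")) {
            enableOneClickCheckout(); // app-specific UI change (hypothetical helper)
        }
    });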


Tap into the power of Firebase in 3 steps

You can start the integration process right from your dashboard. Simply click the Firebase icon in the left navigation bar of your Fabric dashboard.


The integration only takes a few minutes and can be done by a developer in three simple steps:

  1. Link apps*
    *If you don’t have an existing Firebase account, don’t worry. We’ll walk you through setting one up.
  2. Upgrade the SDKs
  3. Ship your app

For more details on how to implement this, check out our technical docs for iOS and Android. This integration can be enabled with minimal code changes, and you won’t need to re-instrument your events.

Once you have completed the setup, your Fabric events (from that point on) will automatically flow to Firebase so you can build audiences, view historical trends, run custom analysis, and trigger personalized marketing campaigns. You’ll also continue to have access to this data in Fabric.

We’re pumped to unveil this integration, as it represents a big step towards our goal of bringing the best of Fabric and Firebase together so you have one place to build, understand, and grow your app. We can’t wait to hear what you think!

Migrating to Druid: how we improved the accuracy of our stability metrics

By Max Lord, Software Engineer

Stability metrics are one of the most critical parts of Crashlytics because they show you which issues are having the biggest impact on your apps. We know that you rely on this data to prioritize your time and make key decisions about what to fix, so our job is to ensure these metrics are as accurate as possible.  

In an effort to strengthen the reliability of these numbers, we spent the last few months overhauling the system that gathers and calculates the stability metrics that power Crashlytics. Now, all of our stability metrics are served out of a system built on Druid. With the migration complete, we wanted to step back, reflect on how it went, and share some lessons with the rest of the engineering community.

Why migrate?

In the very early days of Crashlytics, we simply wrote every crash report we received to a Mongo database. Once we were processing thousands of crashes per second, that database couldn't keep up. We developed a bespoke system based on Apache Storm and Cassandra that served everyone well for the next few years. This system pre-computed all of the metrics that it would ever need to serve, which meant that end-user requests were always very fast. However, its primary disadvantage was that it was cumbersome for us to develop new features, such as new filtering dimensions. Additionally, we occasionally used sampling and estimation techniques to handle the flood of events from our larger customers, but these estimation techniques didn't always work perfectly for everyone.

We wanted to improve the accuracy of metrics for all of our customers, and introduce a richer set of features on our dashboard.  However, we were approaching the limits of what we could build with our current architecture.  Any solution we invented would be restricted to pre-computing metrics and subject to sampling and estimation. This was our cue to explore other options.

Discovering Druid

We learned that the analytics start-up MetaMarkets had found themselves in a similar position, and the solution they open-sourced, Druid, looked like a good fit for us as well. Druid belongs to the column-store family of OLAP databases, purpose-built to efficiently aggregate metrics from a large number of data points. Unlike most other analytics-oriented databases, Druid is optimized for very low latency queries. This characteristic makes it ideally suited for serving data to an exploratory, customer-facing dashboard.

We were doubtful that any column store could compete with the speed of serving pre-computed metrics from Cassandra, but our experimentation demonstrated that Druid's performance is phenomenal. After spending a bit of time tweaking our schema and cluster configuration, we were easily able to achieve latencies comparable to (and sometimes even better than!) our prior system.  We were satisfied that this technology would unlock an immense amount of flexibility and scale, so our next challenge was to swap it in without destabilizing the dashboard for our existing customers.

Migrating safely

As with all major migrations, we had to come up with a plan to keep the firehose of crash reports running while still serving all of our existing dashboard requests. We didn’t want errors or discrepancies to impact our customers, so we enlisted a tool from GitHub called Scientist. With Scientist, we were able to run all of the metrics requests that support our dashboard through Druid, issuing the exact same query to both the old system and the new system and comparing the results. We expected to see a few discrepancies, but we were excited to find that when there were differences, Druid generally produced more accurate results. This gave us confidence that Druid would provide the functionality we needed, but we still had to scale it up to support all of our dashboard traffic.
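Scientist itself is a Ruby library, so the sketch below is a hand-rolled Java rendition of the pattern rather than the tool’s actual API: always serve the control result, run the candidate alongside it, and record any mismatch.

    import java.util.Objects;
    import java.util.function.Supplier;

    public final class Experiment {
        // Run both systems; callers always receive the old system's answer.
        public static <T> T run(String name, Supplier<T> control, Supplier<T> candidate) {
            T controlResult = control.get();
            try {
                T candidateResult = candidate.get();
                if (!Objects.equals(controlResult, candidateResult)) {
                    System.out.printf("%s: mismatch control=%s candidate=%s%n",
                            name, controlResult, candidateResult);
                }
            } catch (Exception e) {
                System.out.printf("%s: candidate failed: %s%n", name, e);
            }
            return controlResult;
        }
    }

In our case the control was the Cassandra-backed store and the candidate was Druid, with the results compared on every request.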

To insulate our customers from potential failures while we tuned the new system to support all of our traffic, we implemented a library called Trial, which gave us an automatic fallback to the old system. After running this for a few weeks, we were able to gradually scale up and cut all of our traffic over to the new system.
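Trial is our own library, so the following is only an illustrative sketch of the automatic-fallback idea, not its actual API:

    import java.util.function.Supplier;

    public final class Fallback {
        // Prefer the new system, but serve from the old one if the new path fails.
        public static <T> T withFallback(Supplier<T> primary, Supplier<T> secondary) {
            try {
                return primary.get();       // new Druid-backed path
            } catch (RuntimeException e) {
                return secondary.get();     // old Storm/Cassandra path
            }
        }
    }

    // Usage (client names are hypothetical):
    // Metrics m = Fallback.withFallback(() -> druid.query(q), () -> legacy.query(q));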

How we use Druid for Crashlytics

On busy days, Crashlytics can receive well over a billion crash reports from mobile devices all over the world. Our crash processing pipeline processes most crashes within seconds, and developers love that they can see those events on their dashboards in very close to real time.

To keep additional processing time to a minimum, we make extensive use of Druid's real-time ingestion capabilities. Our pipeline publishes every processed crash event to a Kafka cluster, which facilitates fanout to a number of other systems in Fabric that consume crash events. We use a Heron topology to stream events to Druid through a library called Tranquility. The part of the Druid cluster called the "indexing service" receives each event and can immediately service queries over that data. This path enables us to serve an accurate, minute-by-minute picture of events for each app over the last few hours.
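As a rough illustration of the publish step (topic name, key, and payload here are hypothetical; in the real pipeline, Heron and Tranquility sit downstream of Kafka):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    Properties props = new Properties();
    props.put("bootstrap.servers", "kafka-broker:9092");
    props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

    String appId = "com.example.app";                 // hypothetical key
    String crashJson = "{\"issue_id\":\"abc123\"}";   // hypothetical payload

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        // Keying by app ID keeps each app's crash events ordered within a partition.
        producer.send(new ProducerRecord<>("processed-crashes", appId, crashJson));
    }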

However, calculating metrics over weeks or months of data requires a different approach. To accomplish this, Druid periodically moves data from its indexing service to another part of the cluster made up of "historical" nodes. Historical nodes store immutable chunks of highly compressed, indexed data ("segments" in Druid parlance) and are optimized to service and cache queries against them. In our cluster, we move data to the historical nodes every six hours. Druid knows how to combine data from both types of nodes; a week at six-hour granularity spans 28 windows, so a query for a week of data may scan 27 finalized segments plus the very latest one currently being built in the indexing service.

The results

Our Druid-based system now allows us to ingest 100% of the events we receive, so we are happy to report that we are no longer sampling crash data from any of our customers. The result is more accurate metrics that you can trust to triage stability issues, no matter how widely installed your app is.

While nothing is more important to us than working to ensure you have the most reliable information possible, we also strive to iterate and improve the Crashlytics experience. In addition to helping us improve accuracy, Druid has unlocked an unprecedented degree of flexibility and richness in what we can show you about the stability issues impacting your users. Since the migration, you may have noticed a steady stream of design tweaks, new features, and performance enhancements on our dashboard. For example, here are a few heavily requested features that we’ve recently rolled out:

  • You can now view issues across multiple versions of your app at the same time.
  • You can view individual issue metrics for any time range.
  • You can now filter your issues by device model and operating system.

This is just the beginning. We're looking forward to what else we can build to help developers ship stable apps to their customers.

P.S. We're building a mobile platform to help teams create bold new app experiences. Want to join us? Check out our open positions!
