Scaling Continuous Integration - Cutting Test Time by 77%

by Jeff Seibert, Co-founder


TL;DR - Integration testing is critical to rapid, reliable software engineering, but it also slows development more and more as tests are added. By modeling production more accurately, we cut the run time of our test suite by 77%.

Here at Crashlytics, we’re huge fans of Test-Driven Development (TDD), and we rely on integration testing in particular to ensure our entire system is working properly. Over 3,800 (and counting!) separate tests run every time we deploy code. We have a complex, queue-based architecture full of background processes across many servers and operating systems, and verifying that every piece works together is a challenging task.

Until recently, our codebase was littered with test-specific conditionals that inlined all of our functionality, making it testable as a single monolithic unit. While this was easy to get up and running, it had a number of obvious shortcomings: not only did it grow (cu-)cumbersome as the system expanded, it completely failed to model our production environment accurately, increasing the risk that a bug might slip through.

To address these concerns, we recently began an initiative to model production more accurately in our development and test environments, but we quickly hit some major stumbling blocks.


Since our primary web language is Ruby, we settled on Cucumber early on for integration testing of our entire system. Here’s a brief snippet:

[pyg language="cucumber" style="monokai" linenos="inline" linenostart="1"]
Given the following app exists:
  | organization | name        | bundle_identifier    | status    |
  | Name: Rovio  | Angry Birds | com.rovio.angrybirds | activated |
And I upload the dSym "" for version "1.0.0" of "com.rovio.angrybirds" by "Rovio"
And I upload the crash report "angry_birds-603982138.crash" for version "1.0.4" of "com.rovio.angrybirds" by "Rovio"
Then the crashes count metric for "com.rovio.angrybirds" should be 1
[/pyg]

We’ll talk more about how we have optimized our crash processing pipeline in a future post, but for now, let’s just say that the core parts of the system are extremely fast. The system is also queue-based and completely non-blocking: work is farmed out to many daemons on many servers. This poses a challenge for Cucumber’s traditional, linear tests. In the scenario above, line 5 returns before anything is inserted into the database, so line 6 fails because the crash report is not yet present.

A simple solution might be to add “And I pause for 1 second” before line 6 to create a slight delay before reading the database. Unfortunately, this isn’t ideal: the server may be under heavy load (from other tests running) and not finish in time, or under no load at all and finish almost immediately, in which case the 1-second delay is wasted time.

Proof of Concept: Queue-Based Blocking for Cucumber

To paint a clearer picture, let’s imagine that a crash goes through the following steps, each performed by a separate daemon:

  1. Uploader – accepts the crash data from the mobile device
  2. Pre-Processor – parses and authenticates the crash data, readies it for symbolication
  3. Symbolicator – determines line numbers for each stack frame
  4. Analyzer – applies our patent-pending algorithms to aggregate and prioritize crashes
  5. Recorder – updates our database and metrics backend
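To make the flow concrete, here is a toy, single-process model of such a chain. It assumes each daemon is a Ruby thread reading from its own `Thread::Queue` and handing the job to the next stage; the real system runs separate daemons on many servers against a shared queue service, and what each stage does here (appending its name to a trail) is a placeholder.

```ruby
# Toy model of the crash pipeline: one thread per daemon, each consuming
# from its own queue and forwarding the job to the next stage.
STAGES = %w[uploader preprocessor symbolicator analyzer recorder].freeze

def build_pipeline(completed)
  queues = STAGES.map { Thread::Queue.new }
  threads = STAGES.each_with_index.map do |stage, i|
    Thread.new do
      while (job = queues[i].pop) != :stop
        job[:trail] << stage # record that this daemon handled the job
        if i + 1 < STAGES.size
          queues[i + 1] << job # hand off to the next daemon
        else
          completed << job # the Recorder terminates the chain
        end
      end
    end
  end
  [queues, threads]
end

completed = Thread::Queue.new
queues, threads = build_pipeline(completed)
queues.first << { :crash => 'angry_birds-603982138.crash', :trail => [] }
result = completed.pop # blocks until the Recorder has finished
queues.each { |q| q << :stop } # shut every stage down
threads.each(&:join)
puts result[:trail].join(' -> ')
# prints "uploader -> preprocessor -> symbolicator -> analyzer -> recorder"
```

Note that nothing outside the pipeline knows when the job is done except by waiting on the final queue, which is exactly the hook the next section uses.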

Looking at this, we noticed that if the Recorder could simply notify Cucumber when it had finished, and Cucumber could block until that notification arrived, the test could proceed to the next step as soon as possible.

Fortunately, since our entire system is queue-based, we can achieve this by having Cucumber block on a queue, like so:

[pyg language="ruby" style="monokai" linenos="inline" linenostart="1"]
When /^I pause for daemons to complete$/ do
  j = Queue.reserve('test.completed', 10) # block for up to 10 seconds
  j.should_not be_nil
  j.delete # acknowledge the message so it is removed from the queue
end
[/pyg]

Line 2 allows us to block for up to 10 seconds on the 'test.completed' queue. If these ten seconds elapse, the Queue library raises a timeout exception and the test will fail. So how do we notify the queue?
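The post doesn’t show the Queue library’s internals. As an illustration only, here is one way a blocking `reserve(tube, timeout)` that raises on timeout could be built from Ruby’s standard `Mutex` and `ConditionVariable`; the `TubeQueue` class and its `TimedOut` error are hypothetical stand-ins, not the library Crashlytics used.

```ruby
# Hypothetical in-memory stand-in for a tube-based job queue.
class TubeQueue
  class TimedOut < StandardError; end

  def initialize
    @tubes = Hash.new { |hash, tube| hash[tube] = [] }
    @lock = Mutex.new
    @ready = ConditionVariable.new
  end

  def enqueue(tube, data)
    @lock.synchronize do
      @tubes[tube] << data
      @ready.broadcast # wake every reserve() waiting, whatever its tube
    end
  end

  # Block until a message is available on `tube`; raise TimedOut once
  # `timeout` seconds have passed without one.
  def reserve(tube, timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @lock.synchronize do
      while @tubes[tube].empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise TimedOut, "nothing on #{tube} after #{timeout}s" if remaining <= 0
        @ready.wait(@lock, remaining) # may wake early; the loop re-checks
      end
      @tubes[tube].shift
    end
  end
end

queue = TubeQueue.new
notifier = Thread.new do
  sleep 0.1 # simulated Recorder work
  queue.enqueue('test.completed', { :worker => 'recorder' })
end
message = queue.reserve('test.completed', 10) # returns after ~0.1s, not 10s
notifier.join

begin
  queue.reserve('test.completed', 0.2) # nothing left, so this times out
rescue TubeQueue::TimedOut => e
  timed_out = e
end
```

The `while` loop around `wait` matters: a broadcast meant for another tube, or a spurious wakeup, simply re-checks the tube and the remaining time instead of returning nothing.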

At the end of the Recorder, all we need to do is enqueue a message:

[pyg language="ruby" style="monokai" linenos="inline" linenostart="1"]
# Signal completion only when running under the test environment
if Crashlytics.env.test?
  Queue.enqueue('test.completed', { :worker => 'recorder' })
end
[/pyg]
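To see why blocking beats a fixed pause, this sketch times both approaches against a simulated Recorder that finishes its work in 0.05 seconds. Ruby’s `Thread::Queue` and the standard `timeout` library stand in for the real queue service, the `Crashlytics.env.test?` guard is left out, and `simulated_recorder` and the workload are made up for illustration.

```ruby
require 'timeout'

# Hypothetical simulated Recorder: sleeps to mimic work, then posts a
# completion message shaped like the one in the snippet above.
def simulated_recorder(completed, work_seconds)
  Thread.new do
    sleep work_seconds
    completed << { :worker => 'recorder' }
  end
end

# Wall-clock duration of the given block, in seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Old approach: "And I pause for 1 second", however long the work took.
completed = Thread::Queue.new
worker = simulated_recorder(completed, 0.05)
fixed = elapsed { sleep 1 }
worker.join

# New approach: block on the completion queue, giving up after 10 seconds.
completed = Thread::Queue.new
worker = simulated_recorder(completed, 0.05)
blocking = elapsed { Timeout.timeout(10) { completed.pop } }
worker.join

puts format('fixed pause: %.2fs, blocking wait: %.2fs', fixed, blocking)
```

The fixed pause always costs a full second, while the blocking wait costs only as long as the work actually takes, up to the 10-second ceiling that turns a hung pipeline into a failed test.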

DRY This Out

Crash processing is just one of our many backend pipelines: all told, we have over half a dozen distinct daemons assembled into many chains. Adding this test logic to every endpoint by hand would be a huge pain, because a daemon that terminates the chain in one pipeline may hand its work onward in another.

Let’s see if there is an opportunity for abstraction.

Conceptually, a daemon is an endpoint if it finishes a job without enqueuing it elsewhere. Sounds simple, but how can we reliably detect whether an enqueue happened?

Fortunately, all of our daemons inherit from the same base Worker class, which provides the enqueuing capability. Let’s enhance it to mark the job as “incomplete” whenever a message is enqueued:

[pyg language="ruby" style="monokai" linenos="inline" linenostart="1"]
def enqueue(tube, data)
  Queue.enqueue(tube, data)
  @job.incomplete = true # the chain continues in another daemon
end
[/pyg]

Our daemons are also architected so that control returns to the base Worker class after each job is processed. There, we can check for this annotation and, if the job is complete, notify the ‘test.completed’ queue!

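[pyg language="ruby" style="monokai" linenos="inline" linenostart="1"]
# Runs in the base Worker once control returns after each job
if @job.complete? && Crashlytics.env.test?
  # Assumption: identify the worker by its class name
  Queue.enqueue('test.completed', { :worker => self.class.name })
end
[/pyg]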
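Putting the two snippets together, a base class along these lines would post to `test.completed` only from whichever daemon ends a given chain, even when the same daemon ends one chain and continues another. Everything here is an illustrative assumption: the `Job` struct, the `process`/`perform` method names, the `test_env` flag standing in for `Crashlytics.env.test?`, the conditional forwarding in `Analyzer`, and the in-memory `BUS` hash standing in for the real queue service.

```ruby
# Hypothetical job wrapper: complete unless a daemon forwarded it onward.
Job = Struct.new(:payload, :incomplete) do
  def complete?
    !incomplete
  end
end

# In-memory stand-in for the queue service: tube name => pending messages.
BUS = Hash.new { |hash, tube| hash[tube] = [] }

class Worker
  def initialize(test_env: true)
    @test_env = test_env # stands in for Crashlytics.env.test?
  end

  # Entry point the daemon loop calls per job; subclasses define #perform.
  def process(payload)
    @job = Job.new(payload, false)
    perform(@job.payload)
    # Control is back in the base class: notify only if nothing was forwarded.
    if @job.complete? && @test_env
      BUS['test.completed'] << { :worker => self.class.name }
    end
  end

  def enqueue(tube, data)
    BUS[tube] << data
    @job.incomplete = true # the chain continues in another daemon
  end
end

class Analyzer < Worker
  def perform(crash)
    # Forward only when there is something to record; otherwise this
    # daemon is the end of the chain for this particular job.
    enqueue('recorder', crash) if crash[:needs_recording]
  end
end

class Recorder < Worker
  def perform(crash)
    # Database and metrics updates omitted; this daemon never forwards.
  end
end

Analyzer.new.process({ :id => 1, :needs_recording => true })
Recorder.new.process(BUS['recorder'].shift)
Analyzer.new.process({ :id => 2, :needs_recording => false })
p BUS['test.completed']
```

Job 1 produces a single notification from `Recorder`, while job 2 produces one from `Analyzer`, because it never enqueued anything. No daemon needs to know its position in the chain ahead of time.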


This approach has worked shockingly well for us, cutting the execution time of our test suite by 77% (from 22 minutes to 5 minutes), since there’s no longer any need to wait arbitrary amounts of time. An unexpected side benefit is that our tests now run even faster (and more in line with production timing characteristics) than they originally did as a monolithic, linear codebase, because the work is now distributed across many servers.
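For reference, the 77% figure is the relative reduction between the two suite timings:

```latex
\frac{22\,\text{min} - 5\,\text{min}}{22\,\text{min}} = \frac{17}{22} \approx 0.773 \approx 77\%
```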

There are certainly still weaknesses: this approach works well only if every processing pipeline is point-to-point, with a single daemon that terminates the chain. What if it fans out, though? What if a daemon enqueues multiple messages, leading to two or more termination points? In a future post, we’ll explore solutions to that challenge. Stay tuned!
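The fan-out weakness is easy to reproduce with the same mechanics. In this hypothetical sketch, one job splits into two terminal branches, so two completion messages arrive. A step that reserves a single message resumes as soon as the first branch finishes, while the other branch may still be running, and it leaves a stray message that a later scenario’s pause step could consume too early. `Thread::Queue` stands in for the real queue service, and the branch names and timings are made up.

```ruby
# Hypothetical fan-out: one job spawns two terminal branches, each of which
# posts its own completion message when it finishes.
completed = Thread::Queue.new
branches = { 'recorder' => 0.05, 'notifier' => 0.3 }.map do |name, work|
  Thread.new do
    sleep work # simulated processing time for this branch
    completed << { :worker => name }
  end
end

first = completed.pop # all a single "I pause for daemons" step waits for
branches.each(&:join)
leftover = completed.size # messages left for the next scenario to trip over
puts "resumed after #{first[:worker]}; #{leftover} message(s) left over"
```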

Interested in working on these and other cutting-edge challenges? We’re hiring! Give us a shout! You can stay up to date with all our progress on Twitter, Facebook, and Google+.