Using code coverage data to speed up continuous integration and reduce costs

One of the disadvantages of a large monolith is that even small changes tend to take a long time to merge.  At Appfolio, like many other software providers, we are transitioning from a monolith to smaller consumable web services.  However, we have had considerable success building and maintaining one of the largest monolithic Ruby on Rails applications we are aware of, and it is not going to disappear anytime soon.  In early 2019 we made it a goal to reduce the time it takes for a developer working on our core monolith to:

  1. Clone the git repository

  2. Install dependencies

  3. Make a trivial change

  4. Start the development server

  5. Run a single test locally

  6. Push a new branch to git

  7. Wait for continuous integration (CI) to run all tests

We call this the “developer loop,” and we have found that the CI step takes far longer than any other step: initially 53 minutes, or 77% of the total time.  Because CI dominates the loop, we have spent the last year and a half reducing our CI run time for the average branch.  To this end, we have done many things, including:

  • Transitioned from TeamCity to CircleCI

  • Implemented checksum-based emulation of tests when the relevant code is unchanged

  • Built a profiler to identify which of our 100+ CI build steps (jobs) are on the CI critical path

  • Increased parallelism

  • Re-thought job dependencies

  • Restricted certain integration tests to release branches only

  • Run only tests related to the change set for a branch

While our success in reducing our developer loop time has come from a combination of all of these factors, the remainder of this article is about the ae_test_coverage gem we have created to collect code coverage data for use in test selection.

Overview

The purpose of this gem is to record per-test-method code coverage for every test run in the CI environment.  The output is a mapping from source files to test methods.  At Appfolio we use this mapping to select which tests to run based on which files have been modified on a branch.  For a pure Ruby application, traditional code coverage using the built-in Coverage module would likely be sufficient.  In a Rails web application, however, the Coverage module on its own is likely not enough to correctly select the superset of tests for a changeset, due to the extensive metaprogramming in the Rails framework, non-Ruby code such as JavaScript, and Ruby code in .erb templates that is not visible to the Coverage module.  The main contribution of this gem is a set of hooks into the internals of Rails that gather additional coverage information and give a better approximation of a test's true code coverage.  This covers file types like .erb templates, JavaScript, and stylesheets, as well as some of the more commonly metaprogrammed Rails internals, such as ActiveRecord model attributes and associations, that normal code coverage will not catch.

ERB Templates

Ideally, we would be able to collect line coverage for .erb template files just as we can for Ruby files.  Unfortunately, Ruby’s built-in Coverage module does not collect coverage data for .erb templates.  In a large web application like ours, we have a significant number of Selenium tests whose outcomes depend on changes to .erb template files.  At first glance, the lack of line coverage for .erb files would seem to be a deal breaker for coverage-based test selection in a Rails application.  Fortunately, we can subscribe to the ActiveSupport::Notifications event !render_template.action_view to figure out exactly which .erb templates were rendered during the course of a test.
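As a rough sketch of this technique, the snippet below subscribes to that event and records each rendered .erb path.  To keep the example dependency-free, a minimal `Notifications` stand-in mimics the subscribe/instrument shape of ActiveSupport::Notifications; in a real Rails app you would call `ActiveSupport::Notifications.subscribe` directly, and the payload's `:identifier` key holds the template's path.

```ruby
require "set"

# Stand-in with the same subscribe/instrument shape as
# ActiveSupport::Notifications, so this sketch runs without Rails.
module Notifications
  @subscribers = Hash.new { |h, k| h[k] = [] }

  def self.subscribe(event, &block)
    @subscribers[event] << block
  end

  def self.instrument(event, payload = {})
    result = block_given? ? yield : nil
    @subscribers[event].each { |s| s.call(event, nil, nil, nil, payload) }
    result
  end
end

rendered_templates = Set.new

# Record every .erb template rendered while a test runs.
Notifications.subscribe("!render_template.action_view") do |_name, _start, _finish, _id, payload|
  path = payload[:identifier]
  rendered_templates << path if path && path.end_with?(".erb")
end

# Rails emits this event for every template render; we fake one here.
Notifications.instrument("!render_template.action_view",
                         identifier: "/app/views/users/show.html.erb")

puts rendered_templates.to_a.inspect
```

At the end of a test, the accumulated set is merged into that test's coverage data alongside the Ruby files reported by the Coverage module.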

Assets (Javascript and Stylesheets)

Of course, JavaScript and CSS files are not Ruby, so Ruby’s Coverage module has little hope of determining whether their code is used during a test.  For most unit tests this is not a problem, since the JavaScript and CSS are not actually evaluated on the server side.  However, in our many Selenium tests, changes to CSS and JavaScript files affect what the browser renders and what the user actually sees.  While perhaps less of a problem for coverage-based test selection than .erb templates, tracking the assets used during a test makes test selection more reliable.  Similar to how we handled .erb templates, we hooked into the Rails internals to find out when javascript_include_tag or stylesheet_link_tag was used while rendering a template.  This gives us the set of assets rendered during the course of a test.  For an application not using Sprockets asset pipeline directives, that alone may be enough.  However, applications using the Sprockets asset pipeline will have used its directives to modularize their JavaScript and CSS code.  Fortunately, Rails has a way to find the set of asset files that were rolled up into a single asset via Sprockets directives, and we use it to make sure we have a complete coverage mapping from JavaScript and stylesheet assets to the tests that actually use them.
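A hedged sketch of the interception idea: prepend a module so that calls to javascript_include_tag are recorded before the original helper runs.  `ViewHelpers` and `FakeView` below are stand-ins for ActionView internals, not the gem's actual code; the real hook also covers stylesheet_link_tag and the Sprockets expansion step.

```ruby
require "set"

# Stand-in for ActionView's helper module.
module ViewHelpers
  def javascript_include_tag(*sources)
    sources.map { |s| %(<script src="/assets/#{s}.js"></script>) }.join("\n")
  end
end

# Prepended module: record the assets, then delegate to the real helper.
module AssetCoverage
  RECORDED = Set.new

  def javascript_include_tag(*sources)
    sources.each { |s| RECORDED << "#{s}.js" }
    super
  end
end

ViewHelpers.prepend(AssetCoverage)

class FakeView
  include ViewHelpers
end

FakeView.new.javascript_include_tag("application")
puts AssetCoverage::RECORDED.to_a.inspect
```

Because the prepended method sits ahead of the helper in the method lookup chain, rendering behaves exactly as before while the coverage set fills up as a side effect.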

Active Record Models

Consider the Active Record models below:

class A < ActiveRecord::Base
  # Attributes:
  # :name
  has_many :bs

  def foo
    "you called foo"
  end
end

class B < ActiveRecord::Base
  belongs_to :a
end

Now consider the following test class, which instantiates and references an instance of class A:

class ModelReferenceTest < Minitest::Test
  def test_coverage_registers_method_call
    a_instance = A.new(name: 'a')
    assert_equal 'you called foo', a_instance.foo
  end

  def test_coverage_registers_attribute_reference
    a_instance = A.new(name: 'a')
    assert_equal 'a', a_instance.name
  end

  def test_coverage_registers_association_reference
    b_instance = B.new
    a_instance = A.new(name: 'a', bs: [b_instance])

    assert_equal [b_instance], a_instance.bs
  end
end

In this simple case, Ruby’s built-in Coverage module will correctly determine that the source file defining class A is used by test_coverage_registers_method_call.  However, the Coverage module would not include the source file for class A when test_coverage_registers_attribute_reference is run, because the source code that actually implements the lookup of the name attribute’s value from the database, as well as the initializer for class A, lives somewhere in the implementation of ActiveRecord::Base.  A test that creates an instance of class A and refers to its bs association has a similar problem, because the code that implements has_many is not in class A; only the declaration of the has_many relationship is.  To handle these and other references to ActiveRecord model attributes and associations, we hook into ActiveRecord to record reads and writes of model attributes and associations.  See details here.
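The interception pattern can be sketched without Rails at all.  `Model` below is a dependency-free stand-in for ActiveRecord::Base: its `attribute` macro metaprograms a reader that records the read before returning the value, which is the same idea as hooking ActiveRecord's generated attribute methods.

```ruby
require "set"

# Stand-in for ActiveRecord::Base that instruments attribute readers.
class Model
  ATTRIBUTE_READS = Set.new

  def self.attribute(name)
    define_method(name) do
      ATTRIBUTE_READS << "#{self.class.name}##{name}"  # record the read
      instance_variable_get("@#{name}")
    end
    define_method("#{name}=") do |value|
      instance_variable_set("@#{name}", value)
    end
  end
end

class A < Model
  attribute :name
end

a = A.new
a.name = "a"
a.name  # this read is recorded even though class A defines no reader itself
puts Model::ATTRIBUTE_READS.to_a.inspect
```

The recorded `"A#name"` entry lets us attribute the read back to the file defining class A, which plain line coverage of ActiveRecord internals could not do.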

Our experience thus far has shown this to be most useful for test selection during refactoring.  For example, if I were to remove the has_many association from class A to class B, I would want every test that previously referred to that association to run in CI, so that I could use the resulting test failures to find the code that needs to be fixed.

Webpacker Applications

One of the ways we at Appfolio have been trying to tame the growth of our Ruby monolith is to increasingly decouple the front end and the back end via APIs and thick JavaScript applications built in React.  This has led us to use Webpacker as part of our asset pipeline, which introduces the need to determine whether a set of Selenium integration tests needs to run when one of our React applications changes.  The webpacker gem provides a javascript_packs_with_chunks_tag helper that is used much like javascript_include_tag.  Unlike javascript_include_tag, however, we can’t depend on Rails to give us the collection of all assets rolled up into the pack.  Instead, we generate a glob pattern from the value passed to javascript_packs_with_chunks_tag to account for all source files of the JavaScript application.  Admittedly, this casts a pretty wide net: in our CI configuration, we will run every Selenium/integration test that renders a link to the JavaScript app into a response.  However, this is far better than what we did before, which was to run all Selenium/integration tests any time a Webpacker JavaScript application changed.
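A minimal sketch of the glob approach, assuming the conventional app/javascript/&lt;pack&gt; layout of a Webpacker project; the helper name and root path here are illustrative assumptions, not the gem's API.

```ruby
# Map the pack name passed to javascript_packs_with_chunks_tag to a glob
# covering every source file of that pack. Deliberately a wide net: any
# change under the pack's directory marks the rendering tests as affected.
def pack_source_glob(pack_name, root: "app/javascript")
  "#{root}/#{pack_name}/**/*"
end

glob = pack_source_glob("my_react_app")
puts glob
# In CI the glob would be expanded with Dir.glob(glob) and each matched
# file added to the coverage entry for the test that rendered the pack.
```
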

Usage in CI

Collecting the code coverage data using ae_test_coverage is only part of the recipe for reducing CI run time.  The other parts are automating the collection of coverage data and the selection of tests.  At this time, that code is not part of ae_test_coverage, as what we have written is fairly specific to our repository and CircleCI workflow.  In this section I describe at a high level how we do it.

Each night, we schedule a run of our entire test suite on the latest master commit with ae_test_coverage enabled.  Each job instrumented with ae_test_coverage creates a per-test JSON artifact that lists all of the source code used during the execution of the test.  (All of this is included in the gem, except for scheduling the nightly run.)  In our CircleCI config, a step aggregates all of the individual per-test coverage files into a single compressed coverage artifact, which is stored in CircleCI as an artifact of the build.
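The per-test artifact can be sketched roughly as follows; the directory layout and file naming are illustrative assumptions, not the gem's actual output format.

```ruby
require "json"
require "fileutils"

# After each test finishes, dump the source files it touched to its own
# JSON file so a later CI step can aggregate them.
def write_coverage_artifact(test_name, source_files, dir: "tmp/test_coverage")
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "#{test_name.gsub(/[^\w]+/, '_')}.json")
  File.write(path, JSON.generate(test: test_name, files: source_files))
  path
end

path = write_coverage_artifact(
  "ModelReferenceTest#test_coverage_registers_method_call",
  ["app/models/a.rb"]
)
puts path
```
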

At the end of the entire workflow, there is an extra job that depends on all other jobs.  If all other jobs have passed and this job runs, it uses the CircleCI API to find all of the jobs that preceded it in the workflow and downloads the compressed coverage artifact for each job that uses this coverage-based test selection technique.  In some cases a job may have 50+ nodes running in parallel, and each parallel node produces its own compressed coverage artifact, meaning 50+ downloads for that job alone.  Once all of the per-node coverage artifacts have been downloaded, we decompress and aggregate them into a reverse mapping in which the keys are paths to source files and the values are the sets of test names that used that source file during a run.  This results in quite a large JSON artifact (~500 MB), which we compress down to about 2 MB and upload to S3.  This aggregate artifact is used by our CI workflow on every development branch to select relevant tests.
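The aggregation step boils down to inverting the per-test mappings.  A simplified sketch, with made-up test and file names:

```ruby
require "json"

# Per-test coverage as collected in CI: test name => source files it used.
per_test_coverage = {
  "ModelReferenceTest#test_coverage_registers_method_call" =>
    ["app/models/a.rb"],
  "UsersControllerTest#test_show" =>
    ["app/models/a.rb", "app/views/users/show.html.erb"]
}

# Invert into: source file => tests that used it.
reverse_mapping = Hash.new { |h, k| h[k] = [] }
per_test_coverage.each do |test_name, source_files|
  source_files.each { |file| reverse_mapping[file] << test_name }
end

puts JSON.pretty_generate(reverse_mapping)
```

In the real pipeline this inversion runs over tens of thousands of per-test files downloaded from every parallel node, but the data transformation is the same.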

At the beginning of every feature branch’s run through our CI workflow, we download the aggregate coverage artifact described above and store it in the CircleCI workspace for use by all subsequent jobs.  We also diff the feature branch against the point where it branched from master and record the set of file names changed on the branch; this list is also stored in the CircleCI workspace.  Each subsequent job (and each parallel node of a parallel job) takes the intersection of the set of tests that would normally run on that node and the set of tests found in the coverage data for the changed files.  This reduced set of tests is what actually runs during CI.  For each job node we store an artifact listing the test files that would have run without selection alongside those that actually ran, making it easy to determine whether test selection skipped a specific test file.
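The per-node selection is a set intersection.  A minimal sketch with hypothetical file and test names:

```ruby
require "set"

# Aggregate coverage: source file => tests that use it.
coverage = {
  "app/models/a.rb"               => ["a_test.rb", "users_test.rb"],
  "app/views/users/show.html.erb" => ["users_test.rb"]
}

changed_files = ["app/views/users/show.html.erb"]  # from the branch diff
tests_on_node = Set["a_test.rb", "users_test.rb", "other_test.rb"]

# Tests affected by the changed files, intersected with this node's share.
affected = changed_files.flat_map { |f| coverage.fetch(f, []) }.to_set
selected = tests_on_node & affected

puts selected.to_a.sort.inspect
```

Here only users_test.rb runs on this node; the other tests are skipped because no file they cover changed on the branch.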

Since our goal is to speed up development while still catching any bugs that slip through our test selection strategy, we still run all tests on our master branch for every merged pull request and every release candidate.

New Code and New Tests

Of course, code changes all the time, especially in a monolith with 100+ engineers committing to it daily.  This raises a few issues for coverage-based test selection.  First, we have no coverage data for new tests written on a feature branch, so we can’t use the techniques described here to select them.  In this case, we have decided to simply always run all modified test files on a branch.  Another problem to consider is how quickly stale coverage data begins to cause selection failures in which broken tests reach master.  We don’t have great data on this, but anecdotal evidence from an incident in which the coverage data was not updated for nearly a month without anyone noticing suggests there was no significant increase in broken tests reaching master, even with a significant amount of new code committed during that time.
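The always-run rule for modified tests is straightforward to sketch: any changed file that looks like a test file is unconditionally added to the selected set.  The test/ path convention below is an assumption about a typical Minitest layout.

```ruby
# Changed files from the branch diff; coverage-based selection so far.
changed_files = ["test/models/new_feature_test.rb", "app/models/a.rb"]
selected      = ["test/models/a_test.rb"]

# New or modified test files have no coverage history, so always run them.
selected |= changed_files.select do |f|
  f.start_with?("test/") && f.end_with?("_test.rb")
end

puts selected.inspect
```
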

Conclusion

Using the technique described in this article, we are able to run the subset of our automated test suite most relevant to the changes made on a feature branch.  After implementing it for the portions of our test suite that consume the most CI resources, we have reduced the average CI cost of a development branch by 25%.  Measuring how much time this saves the average developer in a highly parallelized CI build environment is more difficult: we did not implement this strategy until after heavily optimizing our CI workflow, so we were already chasing smaller returns by the time we tried it.  What I can say is that before we implemented this technique, our build timing looked like this:

  • 5% of builds took < 15 min

  • 27% < 20 min

  • 77% < 25 min

And after:

  • 5% of builds took < 15 min

  • 70% < 20 min

  • 95% < 25 min

In our case the CI cost savings may be more beneficial than the time savings, but your mileage may vary.