Captain 1.10 Generally Available, Open Source Release

Published on 

by Dan Manges

Last year, we began building a tool to solve common pain points that engineering teams encounter in their builds and with their test suites.

The most common theme we heard when talking to engineers is that flaky tests are a major cause of lost productivity. We also saw common inefficiencies in how teams split test suites for parallel execution, and saw the developer experience suffer when engineers had to search build logs to get a comprehensive overview of test failures.

We built Captain to solve these problems. Captain is an open source CLI that can detect and quarantine flaky tests, automatically retry failed tests, partition files for parallel execution, generate comprehensive test failure summaries, and more. With an optional Cloud subscription, its capabilities are enhanced to provide more analytics and easier configuration. It’s compatible with 15 testing frameworks.

Automated Retries

Captain can automatically retry failed tests, ensuring the failures are legitimate and not due to flakiness. The following config will:

  • retry all failed tests one time
  • retry tests known to be flaky three times
  • only execute retries if 1% or less of the entire test suite failed

example-suite:
  command: bundle exec rspec
  retries:
    attempts: 1
    flaky-attempts: 3
    max-tests: 1%
    command: bundle exec rspec {{ tests }}

The max-tests limit helps avoid wasting compute minutes when a large number of tests fail, indicating the problem is likely legitimate and not due to flakiness. With a Captain Cloud subscription, Captain will automatically detect which tests are flaky. However, flaky tests can also be managed in the OSS version of the CLI.
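The decision the config describes can be sketched in a few lines of Python. This is an illustration of the rules listed above, not Captain's implementation: the function name `plan_retries` and its parameters are hypothetical, chosen to mirror the `attempts`, `flaky-attempts`, and `max-tests` keys.

```python
def plan_retries(failed, known_flaky, total_tests,
                 attempts=1, flaky_attempts=3, max_tests_percent=1.0):
    """Return {test_id: number_of_retries} for a finished test run.

    Hypothetical sketch: the arguments mirror the config keys above and
    are not Captain's API.
    """
    if not failed or total_tests <= 0:
        return {}

    # Skip retries entirely when more than max_tests_percent of the suite
    # failed: a large failure count suggests a legitimate problem.
    if len(failed) * 100 > max_tests_percent * total_tests:
        return {}

    # Known-flaky tests get the larger retry budget.
    return {
        test: flaky_attempts if test in known_flaky else attempts
        for test in failed
    }


if __name__ == "__main__":
    # 2 failures out of 200 tests is exactly 1%, so retries are planned.
    print(plan_retries(["spec/a", "spec/b"], {"spec/b"}, total_tests=200))
    # 3 failures out of 200 tests is 1.5%, so nothing is retried.
    print(plan_retries(["spec/a", "spec/b", "spec/c"], set(), total_tests=200))
```

Comparing counts instead of dividing first keeps the threshold check free of floating-point rounding at the boundary.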

Test Quarantining

In some cases, a test may be so flaky that an engineering team needs to remove it from the build altogether. With Captain, engineers can quickly quarantine the test, rather than completely deleting or skipping it. Quarantining continues executing the test, but prevents it from failing the build. It’s one of the strategies that Google uses to handle flaky tests, and we’ve written about using this approach.

In addition to flaky test scenarios, quarantining can be a powerful tool for other team-impacting test failures, such as when date-based tests suddenly start failing.

With a Captain Cloud subscription, tests can be quarantined immediately from a web interface. In the OSS version of the CLI, a list of quarantined tests can be checked into the repository.

--------------------------------------------------------------------------------
----------------------------------- Captain ------------------------------------
--------------------------------------------------------------------------------

Found 1 test result file:
- Updated Captain with results from tmp/rspec.json

2 of 2 failures under quarantine:
- Flaky is always flaky
- InitiallyFlakyThenFailing is initially flaky, then always failing

Captain Cloud Flaky Test UI
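The core idea of quarantining, that a test still runs but its failure no longer fails the build, can be expressed as a small filter over the failure list. The sketch below is a hypothetical illustration in Python; `evaluate_build` and the shape of its return value are assumptions made for this example, not part of Captain.

```python
def evaluate_build(failures, quarantined):
    """Split failures into quarantined and blocking ones.

    Hypothetical sketch: `failures` is a list of failed test names and
    `quarantined` is a set of test names under quarantine.
    """
    under_quarantine = [t for t in failures if t in quarantined]
    blocking = [t for t in failures if t not in quarantined]
    return {
        # The build passes when every failure is quarantined.
        "passed": not blocking,
        "quarantined": under_quarantine,
        "blocking": blocking,
    }


if __name__ == "__main__":
    quarantined = {
        "Flaky is always flaky",
        "InitiallyFlakyThenFailing is initially flaky, then always failing",
    }
    result = evaluate_build(sorted(quarantined), quarantined)
    print(f"{len(result['quarantined'])} failures under quarantine, "
          f"build passed: {result['passed']}")
```

Because the quarantined tests still execute, their results remain available for tracking whether the underlying flakiness has been fixed.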

Partitioning for Parallel Execution

Once test suites take longer than a few minutes to execute, engineering teams begin partitioning the tests for parallel execution. The best way to execute tests in parallel is to use ABQ, but if ABQ isn’t compatible with the underlying test framework, Captain can partition test files based on historical timings to create balanced partitions that execute in close to equal time.

With a Captain Cloud subscription, Captain will automatically track and update the timings, ensuring that the latest timings are always used for partitioning. With the OSS CLI, a timing file can be checked in to inform the partitioning.
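To make timing-based partitioning concrete, here is a minimal Python sketch of one common approach: a greedy "longest file first" assignment that always places the next file in the currently lightest partition. This post does not say which algorithm Captain uses, so treat the technique, the function name `partition_by_timing`, and the timing format (a mapping of file path to seconds) as assumptions for illustration.

```python
import heapq


def partition_by_timing(timings, partitions):
    """Greedily split test files into `partitions` balanced groups.

    Hypothetical sketch: `timings` maps a test file path to its
    historical run time in seconds.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")

    # Min-heap of (total_seconds, partition_index): the lightest
    # partition is always at the top.
    heap = [(0.0, index) for index in range(partitions)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(partitions)]

    # Place the slowest files first; break timing ties by path so the
    # result is deterministic across CI jobs.
    for path, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        buckets[index].append(path)
        heapq.heappush(heap, (total + seconds, index))

    return buckets


if __name__ == "__main__":
    timings = {"spec/a_spec.rb": 10, "spec/b_spec.rb": 6,
               "spec/c_spec.rb": 5, "spec/d_spec.rb": 3}
    for index, files in enumerate(partition_by_timing(timings, 2)):
        print(index, files)
```

A deterministic ordering matters here because each parallel job typically computes the partitions independently and then runs only its own slice.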

Captain Cloud Test Partition UI

Failure Summaries

One downside of parallel execution is that it makes it harder for engineers to get a comprehensive overview of the tests that failed. When test suites are partitioned into more than a few jobs, seeing which tests failed can require a lot of clicking and scrolling. Captain provides a way to produce an aggregate markdown summary, making the failures immediately obvious in GitHub Actions or other CI platforms.

Captain OSS Markdown Failure Summaries
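As a rough illustration of what aggregating failures across partitions involves, the Python sketch below merges per-partition failure lists into a single markdown document. The input shape, the function name `markdown_summary`, and the output layout are all hypothetical; Captain's actual summary format is not shown in this post.

```python
def markdown_summary(failures_by_partition):
    """Render failures from all partitions as one markdown summary.

    Hypothetical sketch: `failures_by_partition` maps a partition name
    to a list of (test_name, message) tuples.
    """
    total = sum(len(items) for items in failures_by_partition.values())
    if total == 0:
        return "## Test failures\n\nAll tests passed.\n"

    lines = [f"## Test failures ({total})", ""]
    # Sort partitions by name so the summary is stable between runs.
    for partition in sorted(failures_by_partition):
        for test_name, message in failures_by_partition[partition]:
            lines.append(f"- **{test_name}** ({partition}): {message}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(markdown_summary({
        "partition-2": [("checkout applies discount", "expected 90, got 100")],
        "partition-1": [("login rejects bad password", "timeout")],
    }))
```

On GitHub Actions, markdown like this could be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable so that it appears on the run's summary page.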

Getting Started

The Captain CLI is easy to integrate into a build process running on any CI platform. You can use the OSS CLI, or use the Cloud subscription to remove the need to manually manage the configuration used to power Captain’s functionality. See the documentation on getting started with Captain.

We’re happy to help with any integrations. Say hello on Discord or reach out at [email protected]
