Flaky Tests

Integration

For flaky test detection and quarantining, you'll need to integrate your test suites into Captain.

Detection

Captain automatically detects flaky tests by looking for builds which fail but then pass when retried.

Some other test analytics providers measure flakiness by a test's overall failure rate. We've found that approach to be less reliable, because it mixes legitimate failures with failures due to flakiness. By detecting flakiness from retried builds, Captain produces fewer false positives.
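The retry-based rule can be sketched in a few lines of Python. This is a minimal illustration, not Captain's implementation: the `AttemptResult` record, its fields, and `detect_flaky` are hypothetical names. The sketch assumes each retry of a build runs the same code, so a failure followed by a pass on a later attempt of that build points to flakiness rather than a real bug.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptResult:
    """One test outcome from one attempt of a build (hypothetical record shape)."""
    build_id: str
    attempt: int  # 1 for the original run, 2+ for retries of the same build
    test_id: str
    passed: bool


def detect_flaky(results: list[AttemptResult]) -> set[str]:
    """Flag tests that failed on one attempt and passed on a later attempt of the same build."""
    # Group outcomes by (build, test) so each test's attempts can be compared in order.
    attempts: dict[tuple[str, str], list[tuple[int, bool]]] = defaultdict(list)
    for r in results:
        attempts[(r.build_id, r.test_id)].append((r.attempt, r.passed))

    flaky: set[str] = set()
    for (_build, test_id), outcomes in attempts.items():
        outcomes.sort()  # order by attempt number
        failed_earlier = False
        for _attempt, passed in outcomes:
            if not passed:
                failed_earlier = True
            elif failed_earlier:
                # Assumption: retries run the same code, so fail-then-pass means flaky.
                flaky.add(test_id)
                break
    return flaky


results = [
    # Build b1: spec_a fails, then passes on retry, so it is flagged as flaky.
    AttemptResult("b1", 1, "spec_a", False),
    AttemptResult("b1", 2, "spec_a", True),
    # Build b2: spec_b fails on every attempt, a consistent failure that is not flagged.
    AttemptResult("b2", 1, "spec_b", False),
    AttemptResult("b2", 2, "spec_b", False),
]
print(sorted(detect_flaky(results)))  # ['spec_a']
```

A failure-rate approach would rank `spec_b` (2 of 2 attempts failed) above `spec_a` (1 of 2), even though `spec_b` is failing consistently; the retry-based rule flags only `spec_a`.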

You'll be able to see a list of your flaky tests in Captain. The built-in issue tracker enables you to comment on flaky tests and resolve them once they're no longer flaky.

Captain Screenshot of Detecting Flaky Tests

Quarantining

We recommend quarantining flaky tests. Without quarantining, engineers have to manually retry their builds until they get lucky and every flaky test passes. Retrying builds also wastes compute time on CI infrastructure, because the tests that already passed are re-run along with the flaky ones.

A common engineering practice is to skip flaky tests, or to mark them as "pending" in frameworks that support it. This approach has a large downside: it is effectively the same as deleting the test entirely. Quarantining lets flaky tests continue to run, which enables Captain to detect if a test changes from being flaky to permanently failing. You continue to capture the value of the test (ensuring that the functionality under test isn't broken) while avoiding the cost of a flaky test (wasted engineering productivity and compute time).
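To make the quarantine behavior concrete, here is a minimal sketch under two stated assumptions: a run fails only when a test outside quarantine fails, and a quarantined test whose recent results are all failures is treated as having become permanently failing. `summarize`, `looks_permanently_failing`, and the `window` parameter are hypothetical; Captain's actual rules may differ. The example reuses the two test names from the CLI output shown later on this page.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    exit_code: int  # 0 = run passes, 1 = run fails
    quarantined_failures: list[str]
    blocking_failures: list[str]


def summarize(failed_tests: list[str], quarantined: set[str]) -> RunSummary:
    """Decide a run's outcome, assuming quarantined tests still run but cannot fail it."""
    in_quarantine = [t for t in failed_tests if t in quarantined]
    blocking = [t for t in failed_tests if t not in quarantined]
    # Quarantined failures are still reported, so each test's history keeps growing.
    return RunSummary(
        exit_code=1 if blocking else 0,
        quarantined_failures=in_quarantine,
        blocking_failures=blocking,
    )


def looks_permanently_failing(history: list[bool], window: int = 5) -> bool:
    """True if the last `window` results (True = passed, oldest first) are all failures.

    Hypothetical heuristic: a flaky test passes at least sometimes, so a quarantined
    test that stops passing entirely has likely become a real failure.
    """
    recent = history[-window:]
    return len(recent) == window and not any(recent)


# Both failures are quarantined, so the run passes.
failed = [
    "Flaky is always flaky",
    "InitiallyFlakyThenFailing is initially flaky, then always failing",
]
summary = summarize(failed, quarantined=set(failed))
print(summary.exit_code)  # 0

# Still running under quarantine, the second history has stopped passing entirely.
print(looks_permanently_failing([True, False, True, False, True]))  # False
print(looks_permanently_failing([True, False, True, False, False, False, False, False]))  # True
```

Skipping the test would leave no history at all, so a check like `looks_permanently_failing` would have nothing to work with.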

You can quarantine tests using the "Quarantine" button in the UI.

Captain Screenshot of Quarantine Button

CLI Output

The CLI will indicate when failed tests have been quarantined.

Finished in 1 minute 8.59 seconds (files took 6.5 seconds to load)
1327 examples, 2 failures

Failed examples:

rspec ./spec/flaky_spec.rb:5 # Flaky is always flaky
rspec ./spec/initially_flaky_then_failing_spec.rb:5 # InitiallyFlakyThenFailing is initially flaky, then always failing

--------------------------------------------------------------------------------
----------------------------------- Captain ------------------------------------
--------------------------------------------------------------------------------

Found 1 test result file:
- Uploaded tmp/rspec.json

2 of 2 failures under quarantine:
- Flaky is always flaky
- InitiallyFlakyThenFailing is initially flaky, then always failing

Targeted Retries

For engineering teams that prefer to see flaky tests pass before a build is green, we're building a mechanism to run targeted retries. Automatically retrying all failing tests can reduce flakiness, but it isn't always sufficient, and it wastes compute time retrying legitimate failures; targeted retries are intended as a big improvement over that approach. If you're interested in this functionality, please reach out.
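Captain's targeted retry mechanism is still being built, so the following is not its design. It is a generic sketch that assumes "targeted" means re-running only the failures of tests already known to be flaky, so legitimate failures are reported right away instead of being retried. `targeted_retry`, `known_flaky`, `run_test`, and `max_retries` are hypothetical names.

```python
from typing import Callable


def targeted_retry(
    failed: list[str],
    known_flaky: set[str],
    run_test: Callable[[str], bool],
    max_retries: int = 2,
) -> list[str]:
    """Return the tests that still fail after retrying only known-flaky failures."""
    still_failing: list[str] = []
    for test in failed:
        if test not in known_flaky:
            # Probably a legitimate failure: report it without spending compute on a retry.
            still_failing.append(test)
            continue
        # any() stops at the first passing attempt, so at most max_retries extra runs.
        if not any(run_test(test) for _ in range(max_retries)):
            still_failing.append(test)
    return still_failing


# Deterministic stand-in for a test runner: each test yields its queued outcomes in order.
queued = {"spec_flaky": iter([False, True])}
calls: list[str] = []


def fake_run(test: str) -> bool:
    calls.append(test)
    return next(queued[test])


result = targeted_retry(["spec_flaky", "spec_broken"], {"spec_flaky"}, fake_run)
print(result)  # ['spec_broken']
print(calls)   # ['spec_flaky', 'spec_flaky']
```

Because `any` stops at the first passing attempt, a flaky test costs at most `max_retries` extra runs, and a failure outside the known-flaky set costs none.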