Flaky tests are:
- Tests that pass and fail against the same code.
- Tests that used to pass but now fail intermittently.
Flaky tests cause problems like:
- Engineering time. They are very hard to fix, and the person who fixes one is usually not its author.
- Cost to your team. They get in the way when you need to deploy a hotfix.
- People lose confidence in your test suite and stop writing tests, or become less willing to.
- Mitigations are costly too: quarantining, rerunning flaky tests, retrying a few times, rerunning your tests hourly or daily.
Flaky tests are usually caused by:
- Testing against a fixed time and timezone
- Concurrency: things running at the same time
- Waiting: asserting the state of elements that do not show up on the UI quickly enough
- Untidy cleanup between tests, or dependence on the order of tests
- Database results coming back in a different order than you expect; behaviour differs between local runs and CI
- 3rd-party libraries (Capybara)
- Checking datetime, time, or elapsed-time fields with insufficient precision
- Checking for ambiguous HTML elements (two .btn on the page?)
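
Two of the causes above, time dependence and result ordering, can be illustrated in plain Ruby. This is a minimal sketch rather than a recommended implementation: the Trial class and the same_rows? helper are hypothetical names invented for the example, and the row hashes stand in for records returned by a query without an explicit ORDER BY.

```ruby
# Hypothetical domain object: a trial that expires at a fixed instant.
class Trial
  def initialize(expires_at)
    @expires_at = expires_at
  end

  # The clock is injected so a test can pin "now" instead of depending on
  # the wall clock and timezone of whichever machine runs the suite.
  def expired?(now: Time.now.utc)
    now >= @expires_at
  end
end

# Hypothetical helper: compare two result sets regardless of row order by
# sorting both on a stable key first.
def same_rows?(actual, expected)
  actual.sort_by { |row| row[:id] } == expected.sort_by { |row| row[:id] }
end

trial = Trial.new(Time.utc(2024, 1, 1))

# Deterministic: both instants are fixed and expressed in UTC.
puts trial.expired?(now: Time.utc(2023, 12, 31, 23, 59, 59))
puts trial.expired?(now: Time.utc(2024, 1, 1, 0, 0, 1))

# The same rows in a different order still compare as equal.
puts same_rows?([{ id: 2 }, { id: 1 }], [{ id: 1 }, { id: 2 }])
```

Passing the clock in keeps the production default (the current UTC time) while letting a test choose the instant it asserts against.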
Flaky tests are usually tackled by:
- Automatic retry
- Rerunning until they pass
- Having a team that fixes them
- Skipping them and recording them somewhere
- Rerunning regularly
- Bisecting against test database state (seed) (minitest-bisect)
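
Automatic retry combined with "record it somewhere" can be sketched in a few lines of plain Ruby. The helpers run_with_retry and flaky? are hypothetical names for this example and are not the API of rspec-retry or any other gem. The sketch assumes a test signals failure by raising.

```ruby
# Hypothetical helper: run the block up to `attempts` times and report
# whether it eventually passed and how many attempts failed on the way.
def run_with_retry(attempts: 3)
  failures = 0
  attempts.times do
    begin
      yield
      return { passed: true, failures: failures }
    rescue StandardError
      failures += 1
    end
  end
  { passed: false, failures: failures }
end

# A run that passed only after at least one failure is a flaky candidate
# and is worth recording, because the retry would otherwise hide it.
def flaky?(result)
  result[:passed] && result[:failures] > 0
end

# Simulated test that fails on its first call and passes on the second.
calls = 0
result = run_with_retry(attempts: 3) do
  calls += 1
  raise "element not ready" if calls < 2
end

puts "flaky candidate" if flaky?(result)
```

Returning the failure count rather than a bare pass/fail is a deliberate choice here: the retry keeps the build green, while the count gives you something to log for later quarantine or fixing.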
Test Flakiness – Methods for identifying and dealing with flaky tests
Spotify's take on test flakiness.
Eradicating Non-Determinism in Tests by Martin Fowler
- Introduced the idea of quarantine
Flaky tests by GitLab
rspec-retry + quarantine
- The dashboard looks useful for finding what to fix.
- Practical tips on how to tackle them
Broken windows theory: once one window is broken, more windows will get broken.
- The majority of flaky tests are caused by asynchronous waits, concurrency and test order dependency.
- Most flaky tests are already flaky when they are written. 15% became flaky later.
- Google: 1.6M test failures per day, of which 73K (4.5%) are flaky. A test is repeated 10 times before being marked as flaky.
- 97% of unit test failures in Apache are harmless (out of which 21% are flaky)
- Developers restart 1.72% of builds (961 builds of 56552). More mature and complex projects are more prone to restarting builds. Builds are restarted because of flaky tests, network issues, and execution timeouts. Flakiness slows down the developer workflow and increases merge time from 16h to 48h.
- Large tests are more flaky
- Some tools are more flaky than others
- 1.5% to 2% of tests are flaky
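
The "repeat 10 times before marking as flaky" rule from the Google note can be read as a small classification step. The sketch below is one possible reading, not a description of Google's tooling: it assumes a failed test is rerun against the same code, that any passing rerun means the test is flaky, and that a test failing every rerun is genuinely broken. The method classify_failure and the :flaky and :broken labels are hypothetical.

```ruby
# Hypothetical classifier for a test that has just failed.
# The block reruns the test once and returns true on a pass, false on a failure.
def classify_failure(reruns: 10)
  reruns.times do
    # Mixed results against the same code: the failure is not reproducible.
    return :flaky if yield
  end
  # Every rerun failed, so treat the failure as genuine.
  :broken
end

# Simulated reruns: two failures followed by a pass.
outcomes = [false, false, true].each
puts classify_failure { outcomes.next }

# Simulated test that fails on every rerun.
puts classify_failure { false }
```

Stopping at the first passing rerun keeps the cost of the check low for the common case, while a consistently failing test pays for all the reruns before being reported.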