Gradle recently experienced a build cache node
crash that hurt build times. In this post, the
Gradle team examined what Gradle Enterprise
Trends & Insights, combined with the wealth of
local and CI build data captured by Build Scans,
revealed before, during, and after the build
cache node crash. This data made it easy to see
that the slow build performance was caused by
the infrastructure.

If only CI build data existed, the cause of the
problem wouldn't be obvious to anyone running a
local build, and an administrator would have no
idea that anything was wrong until the next CI
build ran. Even once a team knew something was
wrong, an administrator without Gradle
Enterprise would have a much harder time
pinpointing the problem.

This post gives an interesting blow-by-blow
account of how the team used its own tooling to
understand a non-obvious problem so that it
could be addressed more efficiently.