Automate identifying infrastructure issues in nightly runs
The feature to be added: identify the most common error messages reported in the nightlies as a result of infrastructure issues.
This would work by searching the database for the corresponding error messages and reporting them as part of the nightly runs.
Automating this would save the manual effort and time that different teams currently spend tracking these failures and filing individual tickets for them.
Please add your thoughts on the best way to resolve this.
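A minimal sketch of the matching-and-reporting step described above, assuming the failure_reason strings for a nightly run have already been fetched from the results database (not shown). The category names and regular expressions in INFRA_PATTERNS are hypothetical placeholders, not real teuthology signatures; actual patterns would come from triaging past nightlies.

```python
import re
from collections import Counter

# Hypothetical infrastructure-failure signatures (placeholders, not real
# teuthology messages); each maps a category name to a compiled regex.
INFRA_PATTERNS = {
    "ssh_connection": re.compile(
        r"ssh.*(connection (refused|timed out)|unreachable)", re.IGNORECASE),
    "package_install": re.compile(
        r"\b(apt|yum|dnf)\b.*\b(failed|error)", re.IGNORECASE),
    "machine_lock": re.compile(r"could not lock", re.IGNORECASE),
}


def classify(reason):
    """Return the first infrastructure category whose pattern matches, else None."""
    for name, pattern in INFRA_PATTERNS.items():
        if pattern.search(reason):
            return name
    return None


def infra_summary(failure_reasons, top_n=5):
    """Count infrastructure failures per category, most common first."""
    counts = Counter()
    for reason in failure_reasons:
        # Jobs with no recorded reason are treated as an empty string.
        category = classify(reason or "")
        if category is not None:
            counts[category] += 1
    return counts.most_common(top_n)
```

Failures that match no pattern (for example, genuine test assertions) are left out of the summary, so the report only covers what looks like infrastructure noise.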
#1 Updated by Zack Cerza almost 4 years ago
- Assignee deleted (
Typed this up a while back and forgot to hit submit:
I spent some time in the last couple of days investigating some of the ways we could approach this:
- An approach similar to John Spray's
- Normalization and grouping of failure_reason values in paddles
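To illustrate what normalizing and grouping failure_reason values could look like, here is a minimal sketch that works on a plain list of strings rather than querying paddles directly. The normalization rules (hostnames, paths, hex IDs and numbers replaced with placeholders) are hypothetical choices made for this example, and the helper names normalize and group_reasons are invented here.

```python
import re
from collections import defaultdict

# Hypothetical normalization rules, applied in order: each replaces a
# volatile token with a placeholder so similar failures collapse together.
_RULES = [
    (re.compile(r"\b(?:[\w-]+\.)+(?:com|net|org)\b"), "<HOST>"),  # FQDNs
    (re.compile(r"(?<!\w)/[\w./-]+"), "<PATH>"),                  # absolute paths
    (re.compile(r"\b[0-9a-f]{8,}\b"), "<HEX>"),                   # sha1s, ids
    (re.compile(r"\d+"), "<N>"),                                  # other numbers
]


def normalize(reason):
    """Replace run-specific details in a failure_reason with placeholders."""
    for pattern, placeholder in _RULES:
        reason = pattern.sub(placeholder, reason)
    return reason.strip()


def group_reasons(reasons):
    """Group raw reasons by normalized form, largest group first."""
    groups = defaultdict(list)
    for reason in reasons:
        if reason:  # skip jobs with no recorded failure_reason
            groups[normalize(reason)].append(reason)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
```

The size of each group then gives a rough ranking of the most common failure messages, which is the information the ticket asks for.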
Ultimately, for now, Sentry makes the most sense. We had only been using it for Sepia, but I have also configured it for Octo.