Bug #57309
Unhandled exception in monitor when muting health alert
Status: open
Description
We've discovered that passing a malformed ttl to ceph health mute causes an unhandled exception on the monitor. An easy way to reproduce it:
ceph osd set norecover
ceph health mute OSDMAP_FLAGS --ttl=':'
My coworker did a quick test to see what he could come up with as a patch, and this is the untested solution:
diff --git a/src/mon/HealthMonitor.cc b/src/mon/HealthMonitor.cc
index 3adbdc3de59f..1f6345e3ab61 100644
--- a/src/mon/HealthMonitor.cc
+++ b/src/mon/HealthMonitor.cc
@@ -301,8 +301,10 @@ bool HealthMonitor::prepare_command(MonOpRequestRef op)
     string ttl_str;
     utime_t ttl;
     if (cmd_getval(cmdmap, "ttl", ttl_str)) {
-      auto secs = parse_timespan(ttl_str);
-      if (secs == 0s) {
+      auto secs = 0s;
+      try {
+        secs = parse_timespan(ttl_str);
+      } catch (const std::invalid_argument& e) {
         r = -EINVAL;
         ss << "not a valid duration: " << ttl_str;
         goto out;
I lost my dev cluster so I can't test it without impact.
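For anyone who wants to check the idea without a cluster, here is a minimal standalone sketch of the same pattern: catch the parser's exception and turn it into -EINVAL instead of letting it propagate. Note that parse_timespan_sketch and ttl_to_seconds are hypothetical names made up for this illustration. The parser is a simplified stand-in and not Ceph's actual parse_timespan; it only assumes that the real function signals bad input by throwing std::invalid_argument, as the patch above does.

```cpp
#include <cctype>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the real parser (NOT Ceph's implementation):
// accepts "<digits><unit>" and throws std::invalid_argument otherwise.
std::chrono::seconds parse_timespan_sketch(const std::string& s) {
  // Assumed unit table for illustration only.
  static const std::map<std::string, long long> units = {
      {"s", 1}, {"m", 60}, {"h", 3600}, {"d", 86400}};

  std::size_t pos = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    ++pos;
  }
  if (pos == 0) {
    throw std::invalid_argument("missing number: " + s);
  }
  if (pos > 9) {
    // Keep the value small enough that the multiplication cannot overflow.
    throw std::invalid_argument("number too large: " + s);
  }
  auto it = units.find(s.substr(pos));
  if (it == units.end()) {
    throw std::invalid_argument("unrecognized unit: " + s);
  }
  return std::chrono::seconds(std::stoll(s.substr(0, pos)) * it->second);
}

// Same shape as the proposed patch: a malformed ttl becomes an error
// code plus a message rather than an uncaught exception.
int ttl_to_seconds(const std::string& ttl_str,
                   std::chrono::seconds& out,
                   std::string& err) {
  try {
    out = parse_timespan_sketch(ttl_str);
  } catch (const std::invalid_argument&) {
    err = "not a valid duration: " + ttl_str;
    return -EINVAL;
  }
  return 0;
}
```

With this sketch, ttl_to_seconds("5m", ...) returns 0 and sets the duration to 300 seconds, while the ':' value from the reproducer above returns -EINVAL with the "not a valid duration" message instead of throwing.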
Updated by Paul Mezzanini 12 months ago
This is still an issue that will crash a monitor.
I just tried to mute a scrub warning for one month and caused a monitor to bounce.
I ran:
ceph health mute PG_NOT_SCRUBBED 1M --sticky
hoping M was month since m is minute. It isn't.