Bug #57309: Unhandled exception in monitor when muting health alert

open

Added by Paul Mezzanini over 1 year ago. Updated 12 months ago.

Status: New
Priority: Normal
Assignee:
Category: Monitor
Target version:
% Done:
Source:
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We've discovered that ceph health mute accepts a malformed TTL, which causes an unhandled exception on the monitor. An easy way to reproduce it:

ceph osd set norecover
ceph health mute OSDMAP_FLAGS --ttl=':'

My coworker did a quick test to see what he could come up with as a patch, and this is the untested result:

diff --git a/src/mon/HealthMonitor.cc b/src/mon/HealthMonitor.cc
index 3adbdc3de59f..1f6345e3ab61 100644
--- a/src/mon/HealthMonitor.cc
+++ b/src/mon/HealthMonitor.cc
@@ -301,8 +301,10 @@ bool HealthMonitor::prepare_command(MonOpRequestRef op)
     string ttl_str;
     utime_t ttl;
     if (cmd_getval(cmdmap, "ttl", ttl_str)) {
-      auto secs = parse_timespan(ttl_str);
-      if (secs == 0s) {
+      auto secs = 0s;
+      try {
+        secs = parse_timespan(ttl_str);
+      } catch (const std::invalid_argument& e) {
         r = -EINVAL;
         ss << "not a valid duration: " << ttl_str;
         goto out;

I lost my dev cluster so I can't test it without impact.
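
For anyone who wants to try the idea outside a cluster, here is a minimal standalone sketch of the same pattern: catch the parser's exception and turn it into -EINVAL instead of letting it propagate. Note that parse_timespan_sketch and parse_ttl are hypothetical names made up for this illustration; the sketch is not Ceph's parse_timespan, and the units it accepts (s, m, h) are an assumption, not a statement about what Ceph supports.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <chrono>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a duration parser; NOT Ceph's parse_timespan().
// Accepts "<digits>" followed by an optional unit s, m, or h (assumed set)
// and throws std::invalid_argument on anything else.
std::chrono::seconds parse_timespan_sketch(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
  // Require at least one digit; cap the length so std::stoll cannot overflow.
  if (i == 0 || i > 9) throw std::invalid_argument("bad number in: " + s);
  const long long n = std::stoll(s.substr(0, i));
  const std::string unit = s.substr(i);
  if (unit.empty() || unit == "s") return std::chrono::seconds(n);
  if (unit == "m") return std::chrono::seconds(n * 60);
  if (unit == "h") return std::chrono::seconds(n * 3600);
  throw std::invalid_argument("unknown unit: " + unit);
}

// Hypothetical wrapper mirroring the patch: returns 0 and fills `out` on
// success, or -EINVAL with a message in `err` when the input is malformed.
int parse_ttl(const std::string& ttl_str, std::chrono::seconds& out,
              std::string& err) {
  try {
    out = parse_timespan_sketch(ttl_str);
  } catch (const std::invalid_argument&) {
    err = "not a valid duration: " + ttl_str;
    return -EINVAL;
  }
  return 0;
}
```

With this sketch, parse_ttl("10m", ...) succeeds with 600 seconds, while ":" and "1M" both come back as -EINVAL rather than escaping as an exception.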

Updated by Paul Mezzanini 12 months ago

This is still an issue that will crash a monitor.

I just tried to mute a scrub warning for one month and caused a monitor to bounce.

I ran:
ceph health mute PG_NOT_SCRUBBED 1M --sticky

I was hoping that M would mean month, since m means minute. It doesn't.
