Bug #21311

closed

ceph perf dump should report standby MDSes

Added by David Galloway over 6 years ago. Updated over 6 years ago.

Status:
Rejected
Priority:
High
Category:
Administration/Usability
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Component(FS):
MDS

Description

This was discovered when observing the cephmetrics dashboard monitoring the Sepia cluster.

        "num_mds_up": 1,
        "num_mds_in": 1,
        "num_mds_failed": 0,

But `ceph status` shows: mds: cephfs-1/1/1 up {0=mira049=up:active}, 2 up:standby

I think it'd be beneficial to report standbys in perf output.

Actions #1

Updated by Patrick Donnelly over 6 years ago

  • Project changed from Ceph to CephFS
  • Category set to Administration/Usability
  • Assignee set to Douglas Fuller
  • Priority changed from Normal to High
  • Source set to Development
  • Backport set to luminous
  • Component(FS) MDS added

Doug, please take this one.

Actions #2

Updated by John Spray over 6 years ago

This is a collectd thing, which isn't to say that we shouldn't care, but... I'm not sure bugs against collectd really should be filed against cephfs?

Actions #3

Updated by David Galloway over 6 years ago

John Spray wrote:

This is a collectd thing, which isn't to say that we shouldn't care, but... I'm not sure bugs against collectd really should be filed against cephfs?

collectd uses perf dump to gather data. My understanding is that Ceph has no metric indicating the number of standby MDSes. Is that incorrect, and if so, what should we be running to collect that metric?

Actions #4

Updated by John Spray over 6 years ago

So on closer inspection I see that, as you say, the existing metrics do indeed come from perf counters, but it doesn't follow that we should add perf counters that duplicate what's already available from the MDS map.

To put it another way: Ceph already exposes this information (in `ceph fs dump`), just not as a perf counter. collectd is capable of looking at things other than perf counters -- it already has calls for e.g. `df`, `osd pool stats` (looking at https://github.com/ceph/cephmetrics/blob/master/collectors/mon.py).
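As a minimal sketch of that alternative, a cephmetrics-style collector could take the standby count from the `ceph fs dump` JSON rather than a perf counter. This assumes the FSMap dump carries a top-level `standbys` array, as Luminous-era releases do; the sample payload and daemon names here are illustrative, not taken from the Sepia cluster.

```python
import json

def count_standby_mds(fs_dump_json):
    """Count standby MDS daemons from `ceph fs dump --format json` output.

    Assumes the dump has a top-level "standbys" array; field names may
    vary across Ceph versions, so treat this as a sketch.
    """
    fsmap = json.loads(fs_dump_json)
    return len(fsmap.get("standbys", []))

# In a real collector the JSON would come from the mon, e.g.:
#   subprocess.check_output(["ceph", "fs", "dump", "--format", "json"])
# Here we use a trimmed, hypothetical payload instead:
sample = json.dumps({
    "epoch": 100,
    "standbys": [{"name": "mds-a"}, {"name": "mds-b"}],
    "filesystems": [{"mdsmap": {"up": {"mds_0": 4107}}}],
})

print(count_standby_mds(sample))
```

Because the standby list is cluster-wide state, not per-daemon state, polling the mon for the FSMap fits John's point below about keeping perf counters strictly per-daemon.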

I'd actually be inclined to rip out some of those perf counters: they are summing over all filesystems, so not terribly informative. We should generally only be using perf counters for things that are really per-daemon, rather than squashing cluster information into them.

Actions #5

Updated by Douglas Fuller over 6 years ago

  • Assignee changed from Douglas Fuller to John Spray

John, if you have strong opinions about ripping out perf counters, I'll send this one over to you. Feel free to send it back if you'd rather I look over them.

Actions #6

Updated by John Spray over 6 years ago

  • Status changed from New to Rejected

OK, so I'm going to take the opinionated position that this is a WONTFIX as we have an existing interface that provides the information, and I've opened a PR (https://github.com/ceph/ceph/pull/17681) to remove the other perf counters in Mimic and beyond to avoid confusion.

Tickets can be re-opened as well as closed, so this does not preclude further discussion.
