Bug #44677
stale scrub status entry from a failed mds shows up in `ceph status` (closed)
Description
This happens intermittently. When an active MDS (mds.b) is terminated, mds.c transitions to active, but the task status section still shows scrub status for both MDSs:
```
ceph -s
...
  services:
    mon: 1 daemons, quorum a (age 24s)
    mgr: x(active, since 18s)
    mds: a:1 {0=b=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 18s), 3 in (since 2h)

  task status:
    scrub status:
        mds.b: idle
        mds.c: idle
...
```
ceph-mgr should prune stale entries after `mgr_service_beacon_grace` seconds, but it does not. When ceph-mgr receives an updated fsmap, it removes the failed MDS's entry from its tracking index (`daemon_state`). However, `DaemonServer::_prune_pending_service_map()` only prunes a stale service map entry if the corresponding daemon is still present in the tracking index. As a result, the stale entries remain in the service map until ceph-mgr is restarted or fails over.
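The pruning gap described above can be illustrated with a small Python model. This is a simplified sketch, not Ceph code: `MgrModel`, `on_fsmap_update`, `prune_buggy`, `prune_fixed` and the `BEACON_GRACE` value are hypothetical names chosen for illustration. Plain dictionaries stand in for `daemon_state` and the pending service map. The "fixed" variant shows one possible approach (treating entries missing from the tracking index as stale) and is not claimed to match the actual change in pull request 34281.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder for mgr_service_beacon_grace, in seconds.
BEACON_GRACE = 60.0


@dataclass
class MgrModel:
    """Toy model of ceph-mgr bookkeeping; names are illustrative only."""

    # Stand-in for daemon_state: daemon name -> time of last beacon.
    daemon_state: dict = field(default_factory=dict)
    # Stand-in for the pending service map: daemon name -> scrub status.
    service_map: dict = field(default_factory=dict)

    def on_fsmap_update(self, mds_in_map):
        # Daemons absent from the new fsmap are dropped from the tracking
        # index; the service map is deliberately left untouched here.
        for name in list(self.daemon_state):
            if name not in mds_in_map:
                del self.daemon_state[name]

    def prune_buggy(self, now):
        for name in list(self.service_map):
            last_beacon = self.daemon_state.get(name)
            if last_beacon is None:
                # Not in the tracking index: skipped, so never pruned.
                continue
            if now - last_beacon > BEACON_GRACE:
                del self.service_map[name]

    def prune_fixed(self, now):
        for name in list(self.service_map):
            last_beacon = self.daemon_state.get(name)
            # Untracked entries are treated as stale and removed too.
            if last_beacon is None or now - last_beacon > BEACON_GRACE:
                del self.service_map[name]
```

Replaying the scenario from the report (mds.b fails and disappears from the fsmap while mds.c keeps sending beacons), `prune_buggy` leaves `mds.b` in the service map indefinitely, whereas `prune_fixed` removes it and keeps `mds.c`.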
Updated by Venky Shankar about 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 34281
Updated by Kefu Chai about 4 years ago
- Backport changed from nautilus to nautilus,octopus
Updated by Greg Farnum about 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler about 4 years ago
- Copied to Backport #45049: octopus: stale scrub status entry from a failed mds shows up in `ceph status` added
Updated by Nathan Cutler about 4 years ago
- Copied to Backport #45050: nautilus: stale scrub status entry from a failed mds shows up in `ceph status` added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".