Bug #44677

stale scrub status entry from a failed mds shows up in `ceph status`

Added by Venky Shankar 13 days ago. Updated 2 days ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature:

Description

This happens intermittently. When an active MDS (mds.b) is terminated, mds.c transitions to active, but the task status still shows scrub status for both MDSs:

ceph -s
...
...

  services:
    mon: 1 daemons, quorum a (age 24s)
    mgr: x(active, since 18s)
    mds: a:1 {0=b=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 18s), 3 in (since 2h)

  task status:
    scrub status:
        mds.b: idle
        mds.c: idle
...
...

Ideally, ceph-mgr should prune stale entries after `mgr_service_beacon_grace` seconds, but that does not happen. The issue is that when ceph-mgr receives an updated fsmap, it removes entries from its tracking index (`daemon_state`). However, `DaemonServer::_prune_pending_service_map()` requires the MDS entry to be present in the tracking index in order to prune stale entries from the service map. As a result, those stale entries remain in the service map until ceph-mgr is restarted (or fails over).
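
The interaction described above can be illustrated with a small toy model. This is only a sketch of the failure mode, not the actual ceph-mgr code: the function `prune_service_map`, the dictionary layout, the `last_beacon` field, and the `prune_untracked` flag are all hypothetical names introduced for illustration, and the flag is just one possible way to handle the missing entry, not necessarily what the fix under review does.

```python
def prune_service_map(service_map, daemon_state, grace, now, prune_untracked=False):
    """Toy model of pruning stale service map entries (hypothetical, not Ceph code).

    service_map:  dict of daemon name -> task status (what `ceph status` would show)
    daemon_state: dict of daemon name -> {"last_beacon": timestamp} (tracking index)
    grace:        seconds without a beacon before an entry counts as stale
    """
    pruned = []
    for name in list(service_map):
        state = daemon_state.get(name)
        if state is None:
            # The tracking entry is already gone (e.g. dropped on an fsmap
            # update). Without it there is no beacon time to compare against,
            # so the entry is skipped and stays in the service map.
            if prune_untracked:
                # Assumed alternative: treat an untracked daemon as stale.
                del service_map[name]
                pruned.append(name)
            continue
        if now - state["last_beacon"] > grace:
            del service_map[name]
            pruned.append(name)
    return pruned


# mds.b has failed and was removed from the tracking index; mds.c is healthy.
service_map = {"mds.b": "idle", "mds.c": "idle"}
daemon_state = {"mds.c": {"last_beacon": 100.0}}

# Pruning that depends on the tracking entry never removes mds.b.
print(prune_service_map(service_map, daemon_state, grace=60.0, now=110.0))
print(sorted(service_map))

# With the assumed alternative, the untracked entry is removed.
print(prune_service_map(service_map, daemon_state, grace=60.0, now=110.0,
                        prune_untracked=True))
print(sorted(service_map))
```

In this model the first call leaves `mds.b` in place no matter how much time passes, because the staleness check is only reached for daemons that still have a tracking entry. That mirrors why the stale scrub status persists until the manager's state is rebuilt.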

History

#1 Updated by Venky Shankar 2 days ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 34281
