Bug #44677

stale scrub status entry from a failed mds shows up in `ceph status`

Added by Venky Shankar 2 months ago. Updated 20 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: nautilus,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature:

Description

This happens intermittently. When an active MDS (mds.b) is terminated, mds.c transitions to active, but the task status section shows scrub status for both MDSs:

ceph -s                                                                                 
...
...

  services:                                      
    mon: 1 daemons, quorum a (age 24s)                                                                                               
    mgr: x(active, since 18s)
    mds: a:1 {0=b=up:active} 2 up:standby
    osd: 3 osds: 3 up (since 18s), 3 in (since 2h)

  task status:                           
    scrub status:                                
        mds.b: idle             
        mds.c: idle                               
...
...

ceph-mgr should ideally prune older entries after `mgr_service_beacon_grace` seconds, but that doesn't happen. The issue is that ceph-mgr receives an updated fsmap and removes entries from its tracking index (`daemon_state`). However, `DaemonServer::_prune_pending_service_map()` requires the mds entry to be present in the tracking index in order to prune stale entries from the service map. So those stale entries remain in the service map until ceph-mgr is restarted (or fails over).
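
To make the interaction above concrete, here is a minimal Python model of it. This is only a sketch and not the ceph-mgr code: the dictionaries, the `GRACE` value, and both function names are hypothetical stand-ins for `daemon_state`, the service map, and `mgr_service_beacon_grace`. The second function shows one possible remedy (treating a missing index entry as stale), which is an assumption for illustration and not necessarily what the merged fix does.

```python
# Simplified model of the pruning behaviour described in this report.
# All names are illustrative; the real logic lives in ceph-mgr (C++).

GRACE = 60.0  # stands in for mgr_service_beacon_grace, value chosen arbitrarily


def prune_requires_index(service_map, daemon_state, now, grace=GRACE):
    """Prune only entries that still exist in the tracking index."""
    for name in list(service_map):
        last_beacon = daemon_state.get(name)
        if last_beacon is None:
            # No index entry: the daemon is skipped, so its stale
            # service map entry is never removed.
            continue
        if now - last_beacon > grace:
            del service_map[name]
    return service_map


def prune_missing_too(service_map, daemon_state, now, grace=GRACE):
    """Hypothetical remedy: also drop entries with no index record."""
    for name in list(service_map):
        last_beacon = daemon_state.get(name)
        if last_beacon is None or now - last_beacon > grace:
            del service_map[name]
    return service_map


# mds.b has failed; the updated fsmap already removed it from the index.
daemon_state = {"mds.c": 1000.0}
service_map = {"mds.b": "idle", "mds.c": "idle"}
now = 1010.0

stale = prune_requires_index(dict(service_map), daemon_state, now)
fixed = prune_missing_too(dict(service_map), daemon_state, now)
print(stale)  # mds.b is still listed
print(fixed)  # only mds.c remains
```

In this model the first function leaves mds.b in place no matter how much time passes, which matches the symptom seen in `ceph status`.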


Related issues

Copied to fs - Backport #45049: octopus: stale scrub status entry from a failed mds shows up in `ceph status` Resolved
Copied to fs - Backport #45050: nautilus: stale scrub status entry from a failed mds shows up in `ceph status` Resolved

History

#1 Updated by Venky Shankar 2 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 34281

#2 Updated by Kefu Chai 2 months ago

  • Backport changed from nautilus to nautilus,octopus

#3 Updated by Greg Farnum about 2 months ago

  • Status changed from Fix Under Review to Pending Backport

#4 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #45049: octopus: stale scrub status entry from a failed mds shows up in `ceph status` added

#5 Updated by Nathan Cutler about 2 months ago

  • Copied to Backport #45050: nautilus: stale scrub status entry from a failed mds shows up in `ceph status` added

#6 Updated by Nathan Cutler 20 days ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
