Feature #24854

mds: if MDS fails internal heartbeat, then debugging should be increased to diagnose what it's stuck doing

Added by Patrick Donnelly 5 months ago. Updated 2 months ago.

Status: New
Priority: Urgent
Assignee: -
Category: Introspection/Control
Target version:
Start date: 07/10/2018
Due date:
% Done: 0%
Source: Development
Tags:
Backport: mimic,luminous
Reviewed:
Affected Versions:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

When the MDS fails its internal heartbeat, the debug_mds log level should be raised incrementally, reaching 20 as the elapsed time approaches mds_beacon_grace.

History

#1 Updated by Patrick Donnelly 5 months ago

  • Status changed from New to In Progress

#2 Updated by Patrick Donnelly 4 months ago

  • Tracker changed from Bug to Feature
  • Status changed from In Progress to New
  • Assignee deleted (Patrick Donnelly)

#3 Updated by Stefan Kooman 2 months ago

We already had "debug_mds=20" set when the MDS suddenly started logging "heartbeat_map is_healthy 'MDSRank' had timed out after 15" and "mds.beacon.mds2 _send skipping beacon, heartbeat map not healthy". So I'm not sure that increasing the debug level alone would be enough to catch the actual cause. See: https://www.spinics.net/lists/ceph-users/msg48403.html
