Bug #65388

open

The MDS_SLOW_REQUEST warning is flapping even though the slow requests don't go away

Added by Alexander Patrakov 24 days ago. Updated 18 days ago.

Status: New
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have caught a cluster in an unhealthy state - probably some MDS deadlock that results in requests being blocked (deadlocked?) for multiple hours. I would expect the MDS_SLOW_REQUEST warning to be constantly present until the slow requests are somehow unblocked or the bad clients are gone. However, it is flapping, that is, there are windows of HEALTH_OK that should not be there. These bogus HEALTH_OK windows (and not the deadlock itself) are the subject of this issue.

As an example, I attached a log file generated by this command:

while true ; do date ; ceph -s ; ceph tell mds.0 dump_ops_in_flight ; sleep 5 ; done | tee ceph-bug.log
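To make "bogus HEALTH_OK window" concrete, here is a minimal sketch of how samples from such a log could be checked. It assumes the log has already been parsed into one record per loop iteration; the Sample type, its field names, and the bogus_ok_windows helper are hypothetical and not part of Ceph. The 30-second threshold is also an assumption (it is meant to mirror the mds_op_complaint_time setting, so adjust it to whatever the cluster uses).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Sample:
    """One loop iteration, pre-parsed from the log (hypothetical structure)."""
    timestamp: str     # the `date` output, kept as an opaque string
    health: str        # overall status from `ceph -s`, e.g. "HEALTH_OK"
    max_op_age: float  # age in seconds of the oldest in-flight MDS op, 0.0 if none


def bogus_ok_windows(
    samples: Iterable[Sample],
    slow_threshold: float = 30.0,  # assumed complaint time, in seconds
) -> List[Tuple[str, str]]:
    """Return (first, last) timestamps of each run of consecutive samples
    that report HEALTH_OK while an op older than slow_threshold is in flight."""
    windows: List[Tuple[str, str]] = []
    start = end = None
    for s in samples:
        bogus = s.health == "HEALTH_OK" and s.max_op_age > slow_threshold
        if bogus:
            if start is None:
                start = s.timestamp  # a new bogus window opens here
            end = s.timestamp        # extend the current window
        elif start is not None:
            windows.append((start, end))  # the window closed on this sample
            start = end = None
    if start is not None:
        windows.append((start, end))  # the log ended inside a window
    return windows
```

If the warning behaved as I expect, this would return an empty list for any stretch of the log in which the same slow ops stay in flight.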

Files

ceph-bug.log (36 KB), Alexander Patrakov, 04/09/2024 07:19 AM
#1

Updated by Sebastian Wagner 23 days ago

  • Project changed from Ceph to CephFS
#2

Updated by Venky Shankar 18 days ago

  • Category set to Correctness/Safety
  • Assignee set to Leonid Usov
  • Target version set to v20.0.0
  • Source set to Community (user)