Support #51593

MDSs trimming stuck

Added by Zachary Ulissi almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

We're running a rook-ceph cluster that has gotten stuck with the "1 MDSs behind on trimming" health warning, and the segment count is not decreasing at all. All drives are NVMe and load is low.
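
For context, these are the basic checks we keep running while it is stuck (a minimal sketch; output omitted):

```
# Cluster-wide health; this is where "1 MDSs behind on trimming" shows up
ceph status
ceph health detail

# Per-filesystem view of ranks, active/standby daemons, and activity
ceph fs status
```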

It seems like this is either a bug of some sort or a misbehaving client. The usual suggestions (restarting the MDS, increasing `mds_log_max_segments`, etc.) don't seem to make a difference.
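
Concretely, the "usual suggestions" we tried look roughly like this (a sketch; the MDS name is a placeholder for our Rook-generated one):

```
# Fail the stuck MDS so a standby takes over the rank
ceph mds fail myfs-a        # or delete/restart the Rook MDS pod

# Raise the trimming target from the default of 128
ceph config set mds mds_log_max_segments 256
```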

  • 1 filesystem, three active MDS daemons, each with a standby
  • Quite a few files (20M objects) and daily snapshots; could the snapshots be a problem?
  • Ceph Pacific 16.2.4 deployed via Rook
  • All clients are on Linux 5.8 except one on 5.4
  • `ceph health detail` doesn't provide much help (see below)
  • num_segments is very slowly increasing over time
  • Restarting all of the MDSs brings them back to the same point.
  • Moderate CPU usage on each MDS (~30% of a core for the stuck one, ~80% of a core for the others)
  • Logs for the stuck MDS look clean: it hits rejoin_joint_start and then just logs the standard "updating MDS map to version XXX" messages
  • `ceph daemon mds.x ops` shows no active ops on any of the MDS daemons (see the command sketch after this list)
  • `mds_log_max_segments` is set to 128; raising it makes the warning go away, but the filesystem remains degraded, and setting it back to 128 shows that num_segments has not changed.
  • I've tried adjusting other MDS settings based on various posts on this list and elsewhere, to no avail
  • `cephfs-journal-tool journal inspect` for each rank says journal integrity is fine.
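
For reference, the commands behind the points above look roughly like this (a sketch; `myfs` and the `mds.myfs-*` names are placeholders for our Rook-generated ones, and the `ceph daemon` calls are run inside the corresponding MDS pod where the admin socket lives):

```
# No active ops on any MDS (run against each daemon's admin socket)
ceph daemon mds.myfs-a ops

# Journal/segment perf counters, including the current segment count
ceph daemon mds.myfs-a perf dump mds_log

# Current trimming target (we have it at the default of 128)
ceph config get mds mds_log_max_segments

# Journal integrity check, once per active rank
cephfs-journal-tool --rank=myfs:0 journal inspect
cephfs-journal-tool --rank=myfs:1 journal inspect
cephfs-journal-tool --rank=myfs:2 journal inspect
```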

Something similar happened last week, and (probably by accident, while removing/adding nodes?) I got the MDSs to start recovering and the filesystem went back to healthy.
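
Since a misbehaving client is one of the hypotheses, and removing/adding nodes would have disconnected some clients, the next thing I'm considering is looking at (and if necessary evicting) client sessions, roughly like this (a sketch; the MDS name and session id are placeholders, and eviction is disruptive for that client):

```
# List client sessions on the stuck MDS (shows kernel client versions, caps held, etc.)
ceph tell mds.myfs-a session ls

# Evict a suspect session by id
ceph tell mds.myfs-a session evict id=1234
```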

Other things that seem relevant:
