Support #38156
MDS behind on trimming but using no CPU or disk IO
Status: closed
Description
I have a cluster with three nodes:
Mimir: MDS, MON, MGR
Fenrir: MDS, MON, MGR, 8 OSDs
Hoenir: MDS, MON, MGR, 8 OSDs
Ceph health detail tells me:
ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 1 MDSs behind on trimming; Reduced data availability: 86 pgs inactive; Degraded data redundancy: 360875/6923320 objects degraded (5.212%), 80 pgs degraded, 86 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsmimir(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 494 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
mdsmimir(mds.0): 1 slow requests are blocked > 30 secs
MDS_TRIM 1 MDSs behind on trimming
mdsmimir(mds.0): Behind on trimming (1806/128) max_segments: 128, num_segments: 1806
PG_AVAILABILITY Reduced data availability: 86 pgs inactive
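As a rough illustration of how far behind the MDS is, the segment counts can be pulled out of the MDS_TRIM detail line with a short script. This is only a sketch: the helper name `parse_trim` is hypothetical, and the regular expression assumes the line format shown in the output above.

```python
import re

# Sample line copied from the health output above.
line = ("mdsmimir(mds.0): Behind on trimming (1806/128) "
        "max_segments: 128, num_segments: 1806")


def parse_trim(text):
    """Return (num_segments, max_segments) from an MDS_TRIM detail line, or None."""
    m = re.search(r"max_segments:\s*(\d+),\s*num_segments:\s*(\d+)", text)
    if m is None:
        return None
    max_segments, num_segments = int(m.group(1)), int(m.group(2))
    return num_segments, max_segments


num, mx = parse_trim(line)
# Report the backlog as a multiple of the configured limit.
print(f"{num} segments against a limit of {mx} ({num / mx:.1f}x)")
```

For the line above this reports 1806 segments against a limit of 128, roughly 14 times the limit.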
Notably, the count was already at 1806 segments to trim 12 hours ago. When I restart the MDS process on Mimir, another MDS quickly starts using a lot of CPU, climbs to 1806 segments to trim, and then drops back to no CPU usage.
Could something be stuck?
I have 86 inactive pgs, but none of my OSDs are offline, and since I created this cluster I have not had any drive failures.
My workload was to use rsync to transfer several TBs of data to the cluster via the ceph-fuse module.
Attached are the logs from each machine after a reboot and letting them run for a while.
What information can I provide that will help figure this out?
The contents of my ceph.conf file are as follows:
[global]
fsid = 07cb5105-68ea-4f1c-bace-a2be0baae5fa
cluster = ceph
ms bind ipv6 = true
public network = fda8:0941:2491:1699::/64
cluster network = fdd7:d94b:3c2e:b69f::/64
##
# For version 0.55 and beyond, you must explicitly enable
# or disable authentication with "auth" entries in [global].
##
auth client required = cephx
auth service required = cephx
auth cluster required = cephx
[mon]
mon initial members = hoenir fenrir mimir
mon host = hoenir fenrir mimir
mon addr = fda8:0941:2491:1699:75ec:3651:86c3:2e88 fda8:0941:2491:1699:0b45:a2e6:1383:2b98 fda8:0941:2491:1699:60fa:e622:8345:2162
[mon.hoenir]
host = hoenir
addr = fda8:0941:2491:1699:75ec:3651:86c3:2e88
[mon.fenrir]
host = fenrir
addr = fda8:0941:2491:1699:0b45:a2e6:1383:2b98
[mon.mimir]
host = mimir
addr = fda8:0941:2491:1699:60fa:e622:8345:2162
[osd]
osd pool default size = 1
osd pool default min size = 1
osd crush chooseleaf type = 0
[mds]
[mgr]
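For anyone comparing this file against the health output, the pool defaults in the [osd] section can be read with a few lines of Python. This is a minimal sketch, assuming Python's `configparser` accepts this file's syntax when `=` is the only delimiter. The function name `pool_defaults` is hypothetical, and only a shortened copy of the file is embedded. These values are defaults for newly created pools, so the settings on existing pools may differ.

```python
import configparser

# Shortened copy of the ceph.conf above; only the [osd] section matters here.
CONF = """\
[global]
auth client required = cephx

[osd]
osd pool default size = 1
osd pool default min size = 1
osd crush chooseleaf type = 0
"""


def pool_defaults(text):
    """Extract the pool-related defaults from the [osd] section."""
    # Use '=' as the only delimiter so IPv6 addresses (which contain ':')
    # elsewhere in the full file are not mistaken for key/value separators.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    osd = parser["osd"]
    return {
        "size": osd.getint("osd pool default size"),
        "min_size": osd.getint("osd pool default min size"),
        "chooseleaf_type": osd.getint("osd crush chooseleaf type"),
    }


print(pool_defaults(CONF))
```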