Actions
Bug #41728
closedmds: hang during fragmentdir
% Done:
0%
Source:
Community (user)
Tags:
Backport:
nautilus,mimic
Regression:
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS, osdc
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
When doing a parallel cp, the active MDS on the CephFS hung on a fragmentdir op.
It might be this bug: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034785.html
I was able to repair the cluster by failing the active MDS, unmounting and remounting the client, then failing the new active MDS.
I increased "mds bal split rd" to 250000 and "mds bal split wr" to 100000 in the hopes this would reduce the number of fragmentdir ops,
then restarted the cp. The issue occurred again an hour or two in.
Active MDS ops:
ceph daemon mds.dc-3558-422-C ops
https://termbin.com/c3sm
ceph health detail
https://termbin.com/xx4k
All machines are Ubuntu 18.04 with hwe kernel 5.0, and Nautilus 14.2.3.
Actions