Bug #41728

mds: hang during fragmentdir

Added by Nathan Fish about 1 year ago. Updated 6 months ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: nautilus,mimic
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS, osdc
Labels (FS):
Pull request ID:
Crash signature:

Description

When doing a parallel cp, the active MDS on the CephFS hung on a fragmentdir op.
It might be this bug: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034785.html

I was able to recover the cluster by failing the active MDS, unmounting and remounting the client, and then failing the new active MDS.
I increased "mds bal split rd" to 250000 and "mds bal split wr" to 100000 in the hope that this would reduce the number of fragmentdir ops,
then restarted the cp. The issue occurred again an hour or two in.
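
For reference, the two balancer overrides above could be written as centralized-config commands. This is a minimal sketch that only prints the commands and runs nothing. It assumes the settings are applied with `ceph config set` to the `mds` section, under the option names `mds_bal_split_rd` and `mds_bal_split_wr`; the report does not say how they were actually set. The helper name `build_config_commands` is hypothetical.

```python
import shlex

# Values taken from the report; the option names (underscored form) and the
# use of "ceph config set" are assumptions, not something the report states.
SPLIT_OVERRIDES = {
    "mds_bal_split_rd": 250000,
    "mds_bal_split_wr": 100000,
}


def build_config_commands(overrides, who="mds"):
    """Return one argv list per override, without executing anything."""
    return [
        ["ceph", "config", "set", who, name, str(value)]
        for name, value in overrides.items()
    ]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for argv in build_config_commands(SPLIT_OVERRIDES):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing instead of executing keeps the sketch safe to run on a machine without cluster access.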

Active MDS ops:
ceph daemon mds.dc-3558-422-C ops
https://termbin.com/c3sm

ceph health detail
https://termbin.com/xx4k
All machines run Ubuntu 18.04 with the HWE kernel 5.0 and Ceph Nautilus 14.2.3.
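
When triaging a hang like this, it can help to filter the MDS ops dump for long-running fragmentdir entries. The sketch below assumes the dump is JSON with a top-level `ops` list whose entries carry `description` and `age` (seconds) fields. The sample data is hypothetical and is not taken from the termbin pastes above; the function name `stuck_ops` and the 30-second threshold are illustrative choices.

```python
import json


def stuck_ops(dump, needle="fragmentdir", min_age=30.0):
    """Return (age, description) pairs for ops whose description contains
    `needle` and whose age is at least `min_age` seconds, oldest first."""
    hits = []
    for op in dump.get("ops", []):
        desc = op.get("description", "")
        age = float(op.get("age", 0.0))
        if needle in desc and age >= min_age:
            hits.append((age, desc))
    return sorted(hits, reverse=True)


# Hypothetical sample in the assumed shape of an ops dump; the real output
# from the affected cluster may contain different fields and descriptions.
SAMPLE = json.loads("""
{
  "ops": [
    {"description": "internal op fragmentdir:mds.0:1", "age": 5400.2},
    {"description": "client_request(client.1:2 create #0x1/foo)", "age": 12.0}
  ],
  "num_ops": 2
}
""")

if __name__ == "__main__":
    for age, desc in stuck_ops(SAMPLE):
        print(f"{age:8.1f}s  {desc}")
```

In practice the input would come from saving the ops dump to a file and loading it with `json.load`.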


Related issues

Related to fs - Bug #41434: mds: infinite loop in Locker::file_update_finish() (Resolved)

History

#1 Updated by Patrick Donnelly about 1 year ago

  • Subject changed from MDS hangs during fragmentdir to mds: hang during fragmentdir
  • Priority changed from Normal to High
  • Target version set to v15.0.0
  • Start date deleted (09/09/2019)
  • Backport set to nautilus,mimic
  • Component(FS) MDS added

#2 Updated by Zheng Yan about 1 year ago

  • Component(FS) osdc added

Nathan Fish wrote:

When doing a parallel cp, the active MDS on the CephFS hung on a fragmentdir op.
It might be this bug: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034785.html

The bug linked above is fixed by https://github.com/ceph/ceph/pull/29902

#3 Updated by Nathan Fish about 1 year ago

Thanks!

#4 Updated by Patrick Donnelly about 1 year ago

  • Related to Bug #41434: mds: infinite loop in Locker::file_update_finish() added

#5 Updated by Patrick Donnelly about 1 year ago

  • Status changed from New to Need More Info
  • Assignee set to Zheng Yan

#6 Updated by Patrick Donnelly 8 months ago

  • Target version deleted (v15.0.0)

#7 Updated by Zheng Yan 6 months ago

  • Status changed from Need More Info to Can't reproduce
