Bug #41728: mds: hang during fragmentdir - CephFS - Ceph

Bug #41728

closed

mds: hang during fragmentdir

Added by Nathan Fish over 4 years ago. Updated about 4 years ago.

Status:

Can't reproduce

Priority:

High

Assignee:

Zheng Yan

Category:

Correctness/Safety

Target version:

% Done:

Source:

Community (user)

Tags:

Backport:

nautilus,mimic

Regression:

Severity:

2 - major

Reviewed:

Affected Versions:

Ceph - v14.2.3

ceph-qa-suite:

Component(FS):

MDS, osdc

Labels (FS):

Pull request ID:

Crash signature (v1):

Crash signature (v2):

Description

While running a parallel cp, the active MDS of the CephFS file system hung on a fragmentdir op.
It might be this bug: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034785.html

I was able to recover the cluster by failing the active MDS, unmounting and remounting the client, and then failing the newly active MDS.
To reduce the number of fragmentdir ops, I raised "mds bal split rd" to 250000 and "mds bal split wr" to 100000,
then restarted the cp. The hang recurred an hour or two in.
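For reference, here is a minimal sketch of applying the same thresholds at runtime through the monitor's centralized config database (available since Mimic). It assumes the values are applied to all MDS daemons rather than to a single one; the report does not say how they were actually set. The option names are the underscore forms of the ones quoted above.

```shell
# Raise the directory-fragment split thresholds for all MDS daemons
# (values taken from this report).
# Assumption: applied via the config database; the report does not say how they were set.
ceph config set mds mds_bal_split_rd 250000
ceph config set mds mds_bal_split_wr 100000

# Confirm the values the MDS daemons will pick up.
ceph config get mds mds_bal_split_rd
ceph config get mds mds_bal_split_wr
```

These options set the read and write temperature a directory fragment may reach before the MDS splits it, so raising them should make splits, and therefore fragmentdir ops, less frequent.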

Active MDS ops:
ceph daemon mds.dc-3558-422-C ops
https://termbin.com/c3sm

ceph health detail
https://termbin.com/xx4k
All machines run Ubuntu 18.04 with the HWE kernel 5.0 and Ceph Nautilus 14.2.3.

Related issues 1 (0 open — 1 closed)

Updated by Patrick Donnelly over 4 years ago

Subject changed from MDS hangs during fragmentdir to mds: hang during fragmentdir
Priority changed from Normal to High
Target version set to v15.0.0
Start date deleted (was 09/09/2019)
Backport set to nautilus,mimic
Component(FS) MDS added

Updated by Zheng Yan over 4 years ago

Component(FS) osdc added

Nathan Fish wrote:

> When doing a parallel cp, the active MDS on the CephFS hung on a fragmentdir op.
> It might be this bug: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034785.html

The bug above is fixed by https://github.com/ceph/ceph/pull/29902
