Bug #56529

closed

ceph-fs crashes on getfattr

Added by Frank Schilder almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 100%
Source: Community (user)
Tags:
Backport: quincy,pacific
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client, MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/GCZ3F3ONVA2YIR7DJNQJFG53Y4DWQABN/

We made a very weird observation on our ceph test cluster today. A simple getfattr with a misspelled attribute name sends the MDS cluster into a crash+restart loop. Something as simple as

getfattr -n ceph.dir.layout.po /mnt/cephfs

kills a ceph-fs completely. The problem can be resolved if one executes a "umount -f /mnt/cephfs" on the host where the getfattr was executed. The MDS daemons need a restart. One might also need to clear the OSD blacklist.
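The recovery steps just described could be sketched roughly as follows. This is only an illustration under assumptions, not a verified procedure: the helper names are hypothetical, the MDS daemons are assumed to be systemd-managed under `ceph-mds.target`, and the `blacklist` spelling is the one used by mimic (newer releases call it `blocklist`). By default the sketch only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the recovery steps described above. By default nothing is
# executed: each command is only printed. Set DRY_RUN=0 to really run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: on the host that issued the getfattr, force-unmount the client.
force_unmount_client() {
  run umount -f "${1:-/mnt/cephfs}"
}

# Step 2: on each MDS host, restart the MDS daemons.
# Assumption: systemd-managed daemons grouped under ceph-mds.target.
restart_mds_daemons() {
  run systemctl restart ceph-mds.target
}

# Step 3 (only if needed): inspect the OSD blacklist, then remove the entry
# of the affected client. The address must be copied from the "ls" output.
# Assumption: "blacklist" spelling as in mimic.
list_osd_blacklist() {
  run ceph osd blacklist ls
}
remove_osd_blacklist_entry() {
  run ceph osd blacklist rm "$1"
}
```

Calling `force_unmount_client /mnt/cephfs` with the default settings prints the `umount -f` command instead of running it, which makes it easy to review the steps before applying them with `DRY_RUN=0`.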

We observe this with a kernel client on 5.18.6-1.el7.elrepo.x86_64 (CentOS 7) with mimic, and I'm absolutely sure I have not seen this problem with mimic on earlier 5.9.X kernel versions.

Continuation:

Also removing a data pool fails:

# setfattr -x ceph.dir.layout /mnt/cephfs
setfattr: /mnt/cephfs: Invalid argument

In addition, we observed that after

- creating a file and directory on the default data pool,
- setting the data pool on "/" to a secondary data pool fs-data

an "rm -rf /mnt/cephfs/*" followed by an "ls /mnt/cephfs" would hang, and blocked-ops warnings showed up in the log. Handling of these virtual extended attributes (vxattrs) seems seriously broken in newer kernel versions.
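For anyone trying to reproduce the hang on a disposable test cluster, the sequence above might look roughly like the sketch below. The file and directory names (`testfile`, `testdir`) are hypothetical, the use of `ceph.dir.layout.pool` to set the data pool is an assumption about how the pool was set, and the pool `fs-data` is assumed to be already attached to the file system as a data pool. The function only prints the commands; it does not run them.

```shell
#!/usr/bin/env bash
# Prints (does not run) the commands for the sequence described above.
# Assumptions: "testfile" and "testdir" are hypothetical names; the mount
# point and pool name contain no whitespace or shell metacharacters.
print_repro_commands() {
  local mnt="${1:-/mnt/cephfs}"
  local pool="${2:-fs-data}"
  cat <<EOF
touch $mnt/testfile
mkdir $mnt/testdir
setfattr -n ceph.dir.layout.pool -v $pool $mnt
rm -rf $mnt/*
ls $mnt
EOF
}
```

Running `print_repro_commands /mnt/cephfs fs-data` lists the five commands for review. Note that the `rm -rf` step deletes everything under the mount point, so this belongs only on a throwaway file system.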

I attached relevant sections of the mds log and /var/log/messages. They should contain some restarts.


Files

mds-log.zip (41.4 KB), Frank Schilder, 07/12/2022 10:58 AM
messages.zip (271 KB), Frank Schilder, 07/12/2022 10:58 AM

Related issues 3 (0 open, 3 closed)

Related to CephFS - Bug #56522: Do not abort MDS on unknown messages (Resolved, Dhairya Parmar)
Copied to CephFS - Backport #57239: pacific: ceph-fs crashes on getfattr (Resolved, Xiubo Li)
Copied to CephFS - Backport #57240: quincy: ceph-fs crashes on getfattr (Resolved, Xiubo Li)
