
Bug #57686

general protection fault and CephFS kernel client hangs after MDS failover

Added by Andreas Teuchert 4 months ago. Updated 2 months ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

100%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

We have a four-node Ceph cluster (Ceph 17.2.1, Ubuntu 20.04, kernel 5.15.0-48-generic #54~20.04.1-Ubuntu), managed by cephadm, that contains two CephFS file systems. On those nodes, two directories from one of the file systems are mounted via the kernel client. That file system has eight active MDSs and four standby MDSs.

When one of those directories is mounted and an active MDS is failed, the following is logged in dmesg. Afterwards, accesses to the mounted directory hang and the node has to be rebooted (reset, actually, because the directory can no longer be unmounted):

[Tue Sep 27 16:41:02 2022] ceph: mds2 reconnect start
[Tue Sep 27 16:41:02 2022] ceph: mds2 reconnect success
[Tue Sep 27 16:41:03 2022] libceph: osd69 up
[Tue Sep 27 16:41:03 2022] libceph: osd71 up
[Tue Sep 27 16:41:03 2022] libceph: osd75 up
[Tue Sep 27 16:41:03 2022] libceph: osd78 up
[Tue Sep 27 16:41:03 2022] libceph: osd3 up
[Tue Sep 27 16:41:03 2022] libceph: osd14 up
[Tue Sep 27 16:41:03 2022] libceph: osd41 up
[Tue Sep 27 16:41:03 2022] libceph: osd53 up
[Tue Sep 27 16:41:03 2022] libceph: osd62 up
[Tue Sep 27 16:41:03 2022] libceph: osd10 up
[Tue Sep 27 16:41:03 2022] libceph: osd38 up
[Tue Sep 27 16:41:03 2022] libceph: osd56 up
[Tue Sep 27 16:41:03 2022] libceph: osd30 up
[Tue Sep 27 16:41:03 2022] libceph: osd6 up
[Tue Sep 27 16:41:03 2022] libceph: osd34 up
[Tue Sep 27 16:41:03 2022] libceph: osd18 up
[Tue Sep 27 16:41:03 2022] libceph: osd44 up
[Tue Sep 27 16:41:03 2022] libceph: osd47 up
[Tue Sep 27 16:41:03 2022] libceph: osd59 up
[Tue Sep 27 16:41:03 2022] libceph: osd65 up
[Tue Sep 27 16:41:03 2022] libceph: osd22 up
[Tue Sep 27 16:41:03 2022] libceph: osd26 up
[Tue Sep 27 16:41:03 2022] libceph: osd50 up
[Tue Sep 27 16:41:03 2022] ceph: update_snap_trace error -5
[Tue Sep 27 16:41:04 2022] ceph: mds2 recovery completed
[Tue Sep 27 16:41:13 2022] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP NOPTI
[Tue Sep 27 16:41:13 2022] CPU: 6 PID: 4247 Comm: node_exporter Not tainted 5.15.0-48-generic #54~20.04.1-Ubuntu
[Tue Sep 27 16:41:13 2022] Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.2 10/17/2019
[Tue Sep 27 16:41:13 2022] RIP: 0010:ceph_get_snap_realm+0x5e/0x90 [ceph]
[Tue Sep 27 16:41:13 2022] Code: 89 e7 e8 55 72 c1 f8 b8 01 00 00 00 f0 0f c1 43 10 85 c0 75 2a 48 8b 8b 98 00 00 00 48 8b 93 a0 00 00 00 48 8d 83 98 00 00 00 <48> 89 51 08 48 89 0a 48 89 83 98 00 00 00 48 89 83 a0 00 00 00 4c
[Tue Sep 27 16:41:13 2022] RSP: 0018:ffffb462416a3c38 EFLAGS: 00010246
[Tue Sep 27 16:41:13 2022] RAX: ffff9b95774c4a98 RBX: ffff9b95774c4a00 RCX: dead000000000100
[Tue Sep 27 16:41:13 2022] RDX: dead000000000122 RSI: ffff9b95774c4a00 RDI: ffff9b9568cba914
[Tue Sep 27 16:41:13 2022] RBP: ffffb462416a3c48 R08: 0000000000000000 R09: 0000000000000000
[Tue Sep 27 16:41:13 2022] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9568cba914
[Tue Sep 27 16:41:13 2022] R13: ffffb462416a3dc0 R14: ffff9b9568cba800 R15: 0000000000000001
[Tue Sep 27 16:41:13 2022] FS:  000000c000e80090(0000) GS:ffff9bf240d80000(0000) knlGS:0000000000000000
[Tue Sep 27 16:41:13 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Sep 27 16:41:13 2022] CR2: 000055c6c3af55a0 CR3: 0000000122d5a004 CR4: 00000000007706e0
[Tue Sep 27 16:41:13 2022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue Sep 27 16:41:13 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue Sep 27 16:41:13 2022] PKRU: 55555554
[Tue Sep 27 16:41:13 2022] Call Trace:
[Tue Sep 27 16:41:13 2022]  <TASK>
[Tue Sep 27 16:41:13 2022]  get_quota_realm.part.0+0x37/0x170 [ceph]
[Tue Sep 27 16:41:13 2022]  ceph_quota_update_statfs+0x55/0x150 [ceph]
[Tue Sep 27 16:41:13 2022]  ? ceph_monc_do_statfs+0x131/0x170 [libceph]
[Tue Sep 27 16:41:13 2022]  ceph_statfs+0x163/0x180 [ceph]
[Tue Sep 27 16:41:13 2022]  statfs_by_dentry+0x6a/0xa0
[Tue Sep 27 16:41:13 2022]  vfs_statfs+0x1b/0xd0
[Tue Sep 27 16:41:13 2022]  user_statfs+0x5b/0xb0
[Tue Sep 27 16:41:13 2022]  __do_sys_statfs+0x28/0x60
[Tue Sep 27 16:41:13 2022]  __x64_sys_statfs+0x16/0x20
[Tue Sep 27 16:41:13 2022]  do_syscall_64+0x59/0xc0
[Tue Sep 27 16:41:13 2022]  ? exit_to_user_mode_prepare+0x3d/0x1c0
[Tue Sep 27 16:41:13 2022]  ? syscall_exit_to_user_mode+0x27/0x50
[Tue Sep 27 16:41:13 2022]  ? __x64_sys_statfs+0x16/0x20
[Tue Sep 27 16:41:13 2022]  ? do_syscall_64+0x69/0xc0
[Tue Sep 27 16:41:13 2022]  ? irqentry_exit_to_user_mode+0x9/0x20
[Tue Sep 27 16:41:13 2022]  ? irqentry_exit+0x1d/0x30
[Tue Sep 27 16:41:13 2022]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[Tue Sep 27 16:41:13 2022]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[Tue Sep 27 16:41:13 2022] RIP: 0033:0x4b25fb
[Tue Sep 27 16:41:13 2022] Code: ff e9 69 ff ff ff cc cc cc cc cc cc cc cc cc e8 7b bc f8 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[Tue Sep 27 16:41:13 2022] RSP: 002b:000000c00053d418 EFLAGS: 00000212 ORIG_RAX: 0000000000000089
[Tue Sep 27 16:41:13 2022] RAX: ffffffffffffffda RBX: 000000c000059800 RCX: 00000000004b25fb
[Tue Sep 27 16:41:13 2022] RDX: 0000000000000000 RSI: 000000c00053d570 RDI: 000000c00033ac60
[Tue Sep 27 16:41:13 2022] RBP: 000000c00053d478 R08: 0000000000ab8601 R09: 0000000000000001
[Tue Sep 27 16:41:13 2022] R10: 000000c00033ac60 R11: 0000000000000212 R12: ffffffffffffffff
[Tue Sep 27 16:41:13 2022] R13: 0000000000000064 R14: 0000000000000063 R15: 0000000000000100
[Tue Sep 27 16:41:13 2022]  </TASK>
[Tue Sep 27 16:41:13 2022] Modules linked in: binfmt_misc overlay tcp_diag udp_diag inet_diag ceph libceph fscache netfs 8021q garp mrp stp llc ip6t_REJECT nf_reject_ipv6 xt_hl ip6table_filter ip6_tables xt_LOG nf_log_syslog ipt_REJECT nf_reject_ipv4 xt_multiport xt_comment xt_state iptable_filter bpfilter intel_rapl_msr intel_rapl_common sch_fq_codel isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ast crct10dif_pclmul ghash_clmulni_intel drm_vram_helper drm_ttm_helper aesni_intel ttm crypto_simd drm_kms_helper cryptd cec rc_core rapl ipmi_ssif i2c_algo_bit irdma ice fb_sys_fops intel_cstate ib_uverbs syscopyarea mei_me ib_core sysfillrect ioatdma joydev sysimgblt input_leds mei intel_pch_thermal dca acpi_pad acpi_power_meter mac_hid acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler bonding tls xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 drm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov
[Tue Sep 27 16:41:13 2022]  async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear ses enclosure raid1 hid_generic usbhid hid mpt3sas crc32_pclmul i40e raid_class xhci_pci ahci i2c_i801 scsi_transport_sas i2c_smbus lpc_ich libahci xhci_pci_renesas wmi
[Tue Sep 27 16:41:13 2022] ---[ end trace e35e6df2c19e4080 ]---
[Tue Sep 27 16:41:13 2022] RIP: 0010:ceph_get_snap_realm+0x5e/0x90 [ceph]
[Tue Sep 27 16:41:13 2022] Code: 89 e7 e8 55 72 c1 f8 b8 01 00 00 00 f0 0f c1 43 10 85 c0 75 2a 48 8b 8b 98 00 00 00 48 8b 93 a0 00 00 00 48 8d 83 98 00 00 00 <48> 89 51 08 48 89 0a 48 89 83 98 00 00 00 48 89 83 a0 00 00 00 4c
[Tue Sep 27 16:41:13 2022] RSP: 0018:ffffb462416a3c38 EFLAGS: 00010246
[Tue Sep 27 16:41:13 2022] RAX: ffff9b95774c4a98 RBX: ffff9b95774c4a00 RCX: dead000000000100
[Tue Sep 27 16:41:13 2022] RDX: dead000000000122 RSI: ffff9b95774c4a00 RDI: ffff9b9568cba914
[Tue Sep 27 16:41:13 2022] RBP: ffffb462416a3c48 R08: 0000000000000000 R09: 0000000000000000
[Tue Sep 27 16:41:13 2022] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9568cba914
[Tue Sep 27 16:41:13 2022] R13: ffffb462416a3dc0 R14: ffff9b9568cba800 R15: 0000000000000001
[Tue Sep 27 16:41:13 2022] FS:  000000c000e80090(0000) GS:ffff9bf240d80000(0000) knlGS:0000000000000000
[Tue Sep 27 16:41:13 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Sep 27 16:41:13 2022] CR2: 000055c6c3af55a0 CR3: 0000000122d5a004 CR4: 00000000007706e0
[Tue Sep 27 16:41:13 2022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue Sep 27 16:41:13 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue Sep 27 16:41:13 2022] PKRU: 55555554
[Tue Sep 27 17:00:56 2022] ceph: mds0 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds1 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds2 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds3 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds4 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds5 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds6 caps stale
[Tue Sep 27 17:00:56 2022] ceph: mds7 caps stale
[Tue Sep 27 17:04:52 2022] ceph: mds0 hung
[Tue Sep 27 17:04:56 2022] ceph: mds1 hung
[Tue Sep 27 17:04:56 2022] ceph: mds2 hung
[Tue Sep 27 17:04:56 2022] ceph: mds3 hung
[Tue Sep 27 17:04:56 2022] ceph: mds4 hung
[Tue Sep 27 17:04:56 2022] ceph: mds5 hung
[Tue Sep 27 17:04:56 2022] ceph: mds6 hung
[Tue Sep 27 17:04:56 2022] ceph: mds7 hung
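For context, a failover like the one in the log above can also be triggered manually, which is an easy way to reproduce the crash. A hedged sketch (the file system name and rank are placeholders, not taken from this cluster):

```shell
# Show the current MDS ranks for the file system (name is a placeholder).
ceph fs status myfs

# Fail the active MDS at rank 2, forcing a standby to take over.
# This is the kind of event after which the client logged "mds2 reconnect start".
ceph mds fail myfs:2
```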

This only affects one of the two directories. When only the other directory is mounted, the issue does not occur.

Both directories have a quota (max bytes, no max files) and there are snapshots in a common ancestor directory. Both directories contain a similar set of files.
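The quota and snapshot setup described above can be inspected from a client via CephFS's virtual extended attributes and the hidden .snap directory; a hedged sketch (mount point and directory names are placeholders):

```shell
# Quota limit on each mounted directory (max bytes is set, max files is not).
getfattr -n ceph.quota.max_bytes /mnt/cephfs/dir-a

# max_files is expected to be unset here, so this may report "no such attribute".
getfattr -n ceph.quota.max_files /mnt/cephfs/dir-a

# Snapshots live in a common ancestor; its hidden .snap directory lists them.
ls /mnt/cephfs/.snap
```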

I'm happy to provide any further information.


Subtasks

Bug #57703: unable to handle page fault for address and system lockup after MDS failover (Duplicate)

Bug #57817: general protection fault and CephFS kernel client hangs after MDS failover (Duplicate, assigned to Xiubo Li)

History

#1 Updated by Andreas Teuchert 4 months ago

When the affected directory is mounted with the "noquotadf" option, the issue does not occur.
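For anyone hitting this, a hedged sketch of the workaround mount (monitor address, credentials, and paths are placeholders): "noquotadf" makes the client report overall file system usage in statfs instead of the quota-realm usage, which is the code path that faulted in the trace above.

```shell
# Mount the affected directory with quota-based statfs disabled.
# All addresses, names, and paths below are placeholders.
mount -t ceph 192.0.2.1:6789:/dir-a /mnt/dir-a \
    -o name=admin,secretfile=/etc/ceph/admin.secret,noquotadf
```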

#2 Updated by Andreas Teuchert 4 months ago

I believe that https://tracker.ceph.com/issues/57817 is another instance of this bug, but I wasn't sure, so I opened a new report.

#3 Updated by Greg Farnum 3 months ago

  • Assignee set to Xiubo Li

#4 Updated by Xiubo Li 3 months ago

  • Status changed from New to In Progress

#5 Updated by Xiubo Li 3 months ago

  • Status changed from In Progress to Fix Under Review

The patchwork: https://patchwork.kernel.org/project/ceph-devel/patch/20221107071759.32000-1-xiubli@redhat.com/

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9bceed2ebda3..baf17df05107 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -849,10 +849,12 @@  int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
     if (realm_to_rebuild && p >= e)
         rebuild_snap_realms(realm_to_rebuild, &dirty_realms);

-    if (!first_realm)
+    if (!first_realm) {
         first_realm = realm;
-    else
+        realm = NULL;
+    } else {
         ceph_put_snap_realm(mdsc, realm);
+    }

     if (p < e)
         goto more;

#6 Updated by Greg Farnum 2 months ago

Xiubo, I know the patch addresses the immediate fault, but do we have any idea how or why the snaprealm was corrupt in a way that triggered this issue?

I think we need to identify that before closing this bug. Since it's triggered by MDS shutdown, it seems that either the MDS or the kernel client messengers are doing something that corrupts the snaprealm data in transit, but...I can't imagine how.

#7 Updated by Xiubo Li 2 months ago

Greg Farnum wrote:

Xiubo, I know the patch addresses the immediate fault, but do we have any idea how or why the snaprealm was corrupt in a way that triggered this issue?

I think we need to identify that before closing this bug. Since it's triggered by MDS shutdown, it seems that either the MDS or the kernel client messengers are doing something that corrupts the snaprealm data in transit, but...I can't imagine how.

There is already a BZ tracking this: https://bugzilla.redhat.com/show_bug.cgi?id=2130925#c61, so I didn't dig into it further.

I can debug it if needed.
