Bug #61200

ceph: corrupt snap message from mds1

Added by Xiubo Li 12 months ago. Updated 11 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
100%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

From the ceph-users mailing list: https://www.spinics.net/lists/ceph-users/msg77106.html

The kclient received a corrupted snap trace and the mount was marked read-only:

[Wed May 10 14:04:54 2023] libceph: loaded (mon/osd proto 15/24)
[Wed May 10 14:04:54 2023] ceph: loaded (mds proto 32)
[Wed May 10 14:04:54 2023] libceph: mon2 (1)192.168.32.67:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon0 (1)192.168.32.65:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon0 (1)192.168.32.65:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon1 (1)192.168.32.66:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon0 (1)192.168.32.65:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon1 (1)192.168.32.66:6789 session established
[Wed May 10 14:04:54 2023] libceph: client210370165 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210356766 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210295802 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210370170 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: mon2 (1)192.168.32.67:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon2 (1)192.168.32.67:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon0 (1)192.168.32.65:6789 session established
[Wed May 10 14:04:54 2023] libceph: mon2 (1)192.168.32.67:6789 session established
[Wed May 10 14:04:54 2023] libceph: client210370160 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210356771 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210295807 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210295812 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210370175 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:54 2023] libceph: client210295817 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 14:04:56 2023] ISO 9660 Extensions: Microsoft Joliet Level 3
[Wed May 10 14:04:56 2023] ISO 9660 Extensions: RRIP_1991A
[Wed May 10 14:05:12 2023] WARNING: CPU: 3 PID: 34 at fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph]
[Wed May 10 14:05:12 2023] Modules linked in: ceph libceph dns_resolver nls_utf8 isofs cirrus drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt intel_rapl_msr intel_rapl_common fb_sys_fops virtio_net iTCO_wdt net_failover iTCO_vendor_support drm pcspkr failover virtio_balloon joydev lpc_ich i2c_i801 nfsd nfs_acl lockd grace auth_rpcgss sunrpc xfs libcrc32c sr_mod cdrom sg crct10dif_pclmul ahci crc32_pclmul libahci crc32c_intel libata ghash_clmulni_intel serio_raw virtio_blk virtio_console virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
[Wed May 10 14:05:12 2023] CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 4.18.0-486.el8.x86_64 #1
[Wed May 10 14:05:12 2023] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
[Wed May 10 14:05:12 2023] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[Wed May 10 14:05:12 2023] RIP: 0010:ceph_add_cap+0x53e/0x550 [ceph]
[Wed May 10 14:05:12 2023] Code: c0 48 c7 c7 c0 39 74 c0 e8 6c 7c 5d c7 0f 0b 44 89 7c 24 04 e9 7e fc ff ff 44 8b 7c 24 04 e9 68 fe ff ff 0f 0b e9 c9 fc ff ff <0f> 0b e9 0a fe ff ff 0f 0b e9 12 fe ff ff 0f 0b 66 90 0f 1f 44 00
[Wed May 10 14:05:12 2023] RSP: 0018:ffffa4c980d87b48 EFLAGS: 00010203
[Wed May 10 14:05:12 2023] RAX: 0000000000000000 RBX: 0000000000000005 RCX: dead000000000200
[Wed May 10 14:05:12 2023] RDX: ffff8cfc56dbf7d0 RSI: ffff8cfc56dbf7d0 RDI: ffff8cfc56dbf7c8
[Wed May 10 14:05:12 2023] RBP: ffff8cfc45503970 R08: ffff8cfc56dbf7d0 R09: 0000000000000001
[Wed May 10 14:05:12 2023] R10: ffff8cfc42658780 R11: 00000000ffff8ce0 R12: 0000000000000155
[Wed May 10 14:05:12 2023] R13: ffff8cfc42658780 R14: ffff8cfc42658788 R15: 0000000000000001
[Wed May 10 14:05:12 2023] FS:  0000000000000000(0000) GS:ffff8cfdb7d80000(0000) knlGS:0000000000000000
[Wed May 10 14:05:12 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed May 10 14:05:12 2023] CR2: 00007f7b7b013570 CR3: 0000000104cf4000 CR4: 00000000003506e0
[Wed May 10 14:05:12 2023] Call Trace:
[Wed May 10 14:05:12 2023]  ceph_handle_caps+0xdf2/0x1780 [ceph]
[Wed May 10 14:05:12 2023]  mds_dispatch+0x13a/0x670 [ceph]
[Wed May 10 14:05:12 2023]  ceph_con_process_message+0x79/0x140 [libceph]
[Wed May 10 14:05:12 2023]  ? calc_signature+0xdf/0x110 [libceph]
[Wed May 10 14:05:12 2023]  ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
[Wed May 10 14:05:12 2023]  ? available_idle_cpu+0x41/0x50
[Wed May 10 14:05:12 2023]  ceph_con_workfn+0x329/0x680 [libceph]
[Wed May 10 14:05:12 2023]  process_one_work+0x1a7/0x360
[Wed May 10 14:05:12 2023]  worker_thread+0x30/0x390
[Wed May 10 14:05:12 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 14:05:12 2023]  kthread+0x134/0x150
[Wed May 10 14:05:12 2023]  ? set_kthread_struct+0x50/0x50
[Wed May 10 14:05:12 2023]  ret_from_fork+0x35/0x40
[Wed May 10 14:05:12 2023] ---[ end trace 03a1d82065fdafbd ]---
[Wed May 10 14:05:56 2023] ISO 9660 Extensions: Microsoft Joliet Level 3
[Wed May 10 14:05:56 2023] ISO 9660 Extensions: RRIP_1991A
[Wed May 10 16:01:24 2023] ceph: mds1 reconnect start
[Wed May 10 16:01:24 2023] ceph: mds1 reconnect start
[Wed May 10 16:01:24 2023] ceph: mds1 reconnect start
[Wed May 10 16:01:24 2023] ceph: mds1 reconnect start
[Wed May 10 16:01:24 2023] ceph: mds1 reconnect start
[Wed May 10 16:01:24 2023] ceph: mds1 reconnect start
[Wed May 10 16:01:25 2023] ceph: mds1 reconnect success
[Wed May 10 16:01:25 2023] ceph: mds1 reconnect success
[Wed May 10 16:01:25 2023] ceph: mds1 reconnect success
[Wed May 10 16:01:25 2023] ceph: mds1 reconnect success
[Wed May 10 16:01:25 2023] ceph: mds1 reconnect success
[Wed May 10 16:01:25 2023] ceph: mds1 reconnect success
[Wed May 10 16:02:12 2023] ceph: mds1 caps stale
[Wed May 10 16:02:19 2023] ceph: mds1 caps renewed
[Wed May 10 16:02:19 2023] ceph: mds1 caps renewed
[Wed May 10 16:02:19 2023] ceph: mds1 caps renewed
[Wed May 10 16:02:19 2023] ceph: mds1 caps renewed
[Wed May 10 16:02:19 2023] ceph: mds1 caps renewed
[Wed May 10 16:02:19 2023] ceph: mds1 caps renewed
[Wed May 10 16:03:06 2023] ceph: update_snap_trace error -5
[Wed May 10 16:03:06 2023] ceph: ceph_update_snap_trace failed to blocklist (3)192.168.48.142:0: -13
[Wed May 10 16:03:06 2023] ------------[ cut here ]------------
[Wed May 10 16:03:06 2023] ceph_update_snap_trace: do remount to continue after corrupted snaptrace is fixed
[Wed May 10 16:03:06 2023] WARNING: CPU: 0 PID: 5143 at fs/ceph/snap.c:841 ceph_update_snap_trace.cold.21+0x68/0x137 [ceph]
[Wed May 10 16:03:06 2023] Modules linked in: ceph libceph dns_resolver nls_utf8 isofs cirrus drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt intel_rapl_msr intel_rapl_common fb_sys_fops virtio_net iTCO_wdt net_failover iTCO_vendor_support drm pcspkr failover virtio_balloon joydev lpc_ich i2c_i801 nfsd nfs_acl lockd grace auth_rpcgss sunrpc xfs libcrc32c sr_mod cdrom sg crct10dif_pclmul ahci crc32_pclmul libahci crc32c_intel libata ghash_clmulni_intel serio_raw virtio_blk virtio_console virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
[Wed May 10 16:03:06 2023] CPU: 0 PID: 5143 Comm: kworker/0:0 Tainted: G        W        --------- -  - 4.18.0-486.el8.x86_64 #1
[Wed May 10 16:03:06 2023] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
[Wed May 10 16:03:06 2023] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[Wed May 10 16:03:06 2023] RIP: 0010:ceph_update_snap_trace.cold.21+0x68/0x137 [ceph]
[Wed May 10 16:03:06 2023] Code: c6 74 c0 48 c7 c5 10 5e 74 c0 48 89 c8 49 89 e8 48 89 c2 44 89 0c 24 48 c7 c6 60 bf 73 c0 48 c7 c7 48 61 74 c0 e8 08 e7 5b c7 <0f> 0b 44 8b 0c 24 e9 6c e8 fe ff 44 89 ce 48 c7 c7 60 5e 74 c0 44
[Wed May 10 16:03:06 2023] RSP: 0018:ffffa4c981fbbbb8 EFLAGS: 00010286
[Wed May 10 16:03:06 2023] RAX: 0000000000000000 RBX: ffff8cfc4649a089 RCX: 0000000000000027
[Wed May 10 16:03:06 2023] RDX: 0000000000000027 RSI: 00000000ffff7fff RDI: ffff8cfdb7c1e690
[Wed May 10 16:03:06 2023] RBP: ffffffffc0745e10 R08: 0000000000000000 R09: c0000000ffff7fff
[Wed May 10 16:03:06 2023] R10: 0000000000000001 R11: ffffa4c981fbb9d0 R12: ffff8cfc623c6b48
[Wed May 10 16:03:06 2023] R13: 0000000000000000 R14: ffff8cfc623c6b18 R15: ffff8cfb65855f00
[Wed May 10 16:03:06 2023] FS:  0000000000000000(0000) GS:ffff8cfdb7c00000(0000) knlGS:0000000000000000
[Wed May 10 16:03:06 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed May 10 16:03:06 2023] CR2: 00007fedace4d008 CR3: 000000025f67c000 CR4: 00000000003506f0
[Wed May 10 16:03:06 2023] Call Trace:
[Wed May 10 16:03:06 2023]  ceph_handle_snap+0x188/0x4e0 [ceph]
[Wed May 10 16:03:06 2023]  mds_dispatch+0x17a/0x670 [ceph]
[Wed May 10 16:03:06 2023]  ceph_con_process_message+0x79/0x140 [libceph]
[Wed May 10 16:03:06 2023]  ? calc_signature+0xdf/0x110 [libceph]
[Wed May 10 16:03:06 2023]  ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
[Wed May 10 16:03:06 2023]  ? available_idle_cpu+0x41/0x50
[Wed May 10 16:03:06 2023]  ceph_con_workfn+0x329/0x680 [libceph]
[Wed May 10 16:03:06 2023]  process_one_work+0x1a7/0x360
[Wed May 10 16:03:06 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 16:03:06 2023]  worker_thread+0x30/0x390
[Wed May 10 16:03:06 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 16:03:06 2023]  kthread+0x134/0x150
[Wed May 10 16:03:06 2023]  ? set_kthread_struct+0x50/0x50
[Wed May 10 16:03:06 2023]  ret_from_fork+0x35/0x40
[Wed May 10 16:03:06 2023] ---[ end trace 03a1d82065fdafbe ]---
[Wed May 10 16:03:06 2023] ceph: corrupt snap message from mds1
[Wed May 10 16:03:06 2023] header: 00000000: 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023] header: 00000010: 12 03 7f 00 01 00 00 01 00 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023] header: 00000020: 00 00 00 00 02 01 00 00 00 00 00 00 00 01 00 00  ................
[Wed May 10 16:03:06 2023] header: 00000030: 00 98 0d 60 93                                   ...`.
[Wed May 10 16:03:06 2023]  front: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 00000010: 0c 00 00 00 88 00 00 00 d1 c0 71 38 00 01 00 00  ..........q8....
[Wed May 10 16:03:06 2023]  front: 00000020: 22 c8 71 38 00 01 00 00 d7 c7 71 38 00 01 00 00  ".q8......q8....
[Wed May 10 16:03:06 2023]  front: 00000030: d9 c7 71 38 00 01 00 00 d4 c7 71 38 00 01 00 00  ..q8......q8....
[Wed May 10 16:03:06 2023]  front: 00000040: f1 c0 71 38 00 01 00 00 d4 c0 71 38 00 01 00 00  ..q8......q8....
[Wed May 10 16:03:06 2023]  front: 00000050: 20 c8 71 38 00 01 00 00 1d c8 71 38 00 01 00 00   .q8......q8....
[Wed May 10 16:03:06 2023]  front: 00000060: ec c0 71 38 00 01 00 00 d6 c0 71 38 00 01 00 00  ..q8......q8....
[Wed May 10 16:03:06 2023]  front: 00000070: ef c0 71 38 00 01 00 00 6a 11 2d 1a 00 01 00 00  ..q8....j.-.....
[Wed May 10 16:03:06 2023]  front: 00000080: 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 00000090: ee 01 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 000000a0: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 000000b0: 01 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 000000c0: 01 00 00 00 00 00 00 00 02 09 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 000000d0: 05 00 00 00 00 00 00 00 01 09 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 000000e0: ff 08 00 00 00 00 00 00 fd 08 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023]  front: 000000f0: fb 08 00 00 00 00 00 00 f9 08 00 00 00 00 00 00  ................
[Wed May 10 16:03:06 2023] footer: 00000000: ca 39 06 07 00 00 00 00 00 00 00 00 42 06 63 61  .9..........B.ca
[Wed May 10 16:03:06 2023] footer: 00000010: 7b 4b 5d 2d 05                                   {K]-.
[Wed May 10 16:03:06 2023] ceph: ceph_do_invalidate_pages: inode 1001a2d116a.fffffffffffffffe is shut down
[Wed May 10 16:03:07 2023] WARNING: CPU: 3 PID: 5069 at fs/ceph/mds_client.c:4623 check_session_state+0x67/0x70 [ceph]
[Wed May 10 16:03:07 2023] Modules linked in: ceph libceph dns_resolver nls_utf8 isofs cirrus drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt intel_rapl_msr intel_rapl_common fb_sys_fops virtio_net iTCO_wdt net_failover iTCO_vendor_support drm pcspkr failover virtio_balloon joydev lpc_ich i2c_i801 nfsd nfs_acl lockd grace auth_rpcgss sunrpc xfs libcrc32c sr_mod cdrom sg crct10dif_pclmul ahci crc32_pclmul libahci crc32c_intel libata ghash_clmulni_intel serio_raw virtio_blk virtio_console virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
[Wed May 10 16:03:07 2023] CPU: 3 PID: 5069 Comm: kworker/3:1 Tainted: G        W        --------- -  - 4.18.0-486.el8.x86_64 #1
[Wed May 10 16:03:07 2023] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
[Wed May 10 16:03:07 2023] Workqueue: events delayed_work [ceph]
[Wed May 10 16:03:07 2023] RIP: 0010:check_session_state+0x67/0x70 [ceph]
[Wed May 10 16:03:07 2023] Code: 3f 36 ed c8 48 39 d0 0f 88 5a 6a 00 00 89 d8 5b e9 6e ea 2c c8 48 83 7f 10 00 74 f1 48 8b 07 48 8b 00 8b 40 28 83 f8 04 74 e3 <0f> 0b eb df 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 89 f6 41
[Wed May 10 16:03:07 2023] RSP: 0018:ffffa4c9844b7e48 EFLAGS: 00010202
[Wed May 10 16:03:07 2023] RAX: 0000000000000006 RBX: 0000000000000000 RCX: 0000000000000007
[Wed May 10 16:03:07 2023] RDX: 0000000000000005 RSI: ffff8cfc469e07cc RDI: ffff8cfc469e0000
[Wed May 10 16:03:07 2023] RBP: ffff8cfc469e0000 R08: 0000000000000000 R09: 000073746e657665
[Wed May 10 16:03:07 2023] R10: 8080808080808080 R11: ffffffff8965c148 R12: ffff8cfc491d5130
[Wed May 10 16:03:07 2023] R13: 0000000000000000 R14: ffff8cfc491d5008 R15: ffff8cfc491d5000
[Wed May 10 16:03:07 2023] FS:  0000000000000000(0000) GS:ffff8cfdb7d80000(0000) knlGS:0000000000000000
[Wed May 10 16:03:07 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed May 10 16:03:07 2023] CR2: 00007fedace4d008 CR3: 00000001d4506000 CR4: 00000000003506e0
[Wed May 10 16:03:07 2023] Call Trace:
[Wed May 10 16:03:07 2023]  delayed_work+0x143/0x240 [ceph]
[Wed May 10 16:03:07 2023]  process_one_work+0x1a7/0x360
[Wed May 10 16:03:07 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 16:03:07 2023]  worker_thread+0x30/0x390
[Wed May 10 16:03:07 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 16:03:07 2023]  kthread+0x134/0x150
[Wed May 10 16:03:07 2023]  ? set_kthread_struct+0x50/0x50
[Wed May 10 16:03:07 2023]  ret_from_fork+0x35/0x40
[Wed May 10 16:03:07 2023] ---[ end trace 03a1d82065fdafbf ]---
[Wed May 10 16:03:09 2023] ceph: mds1 caps renewed
[Wed May 10 16:03:09 2023] ceph: mds1 caps renewed
[Wed May 10 16:03:09 2023] ceph: mds1 caps renewed
[Wed May 10 16:03:09 2023] ceph: mds1 caps renewed
[Wed May 10 16:03:09 2023] ceph: mds1 caps renewed
[Wed May 10 16:03:09 2023] ceph: mds1 recovery completed
[Wed May 10 16:03:09 2023] ceph: mds1 recovery completed
[Wed May 10 16:03:09 2023] ceph: mds1 recovery completed
[Wed May 10 16:03:09 2023] ceph: mds1 recovery completed
[Wed May 10 16:03:09 2023] ceph: mds1 recovery completed
[Wed May 10 16:03:09 2023] ceph: mds1 recovery completed
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 20011a825dd.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 200118b9384.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 1001c664b26.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 1001c665c34.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 1001c66a2b6.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 2000e6d3a15.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 100219a7adf.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 10023fb4137.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 10024484920.fffffffffffffffe is shut down
[Wed May 10 16:04:10 2023] ceph: ceph_do_invalidate_pages: inode 2000e6d3a4b.fffffffffffffffe is shut down
[Wed May 10 16:04:12 2023] libceph: mds1 (1)192.168.32.74:6801 socket closed (con state OPEN)
[Wed May 10 16:04:12 2023] libceph: mds1 (1)192.168.32.74:6801 socket closed (con state OPEN)
[Wed May 10 16:04:12 2023] libceph: mds1 (1)192.168.32.74:6801 socket closed (con state OPEN)
[Wed May 10 16:04:12 2023] libceph: mds1 (1)192.168.32.74:6801 socket closed (con state OPEN)
[Wed May 10 16:04:12 2023] libceph: mds1 (1)192.168.32.74:6801 socket closed (con state OPEN)
[Wed May 10 16:04:13 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:13 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:13 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:13 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:13 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:13 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:13 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:13 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:13 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:13 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:14 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:14 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:14 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:14 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:14 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:14 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:14 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:14 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:14 2023] libceph: wrong peer, want (1)192.168.32.74:6801/3586644765, got (1)192.168.32.74:6801/3778319964
[Wed May 10 16:04:14 2023] libceph: mds1 (1)192.168.32.74:6801 wrong peer at address
[Wed May 10 16:04:20 2023] ceph: mds1 reconnect start
[Wed May 10 16:04:20 2023] ceph: mds1 reconnect start
[Wed May 10 16:04:20 2023] ceph: mds1 reconnect start
[Wed May 10 16:04:20 2023] ceph: mds1 reconnect start
[Wed May 10 16:04:20 2023] ceph: mds1 reconnect start
[Wed May 10 16:04:23 2023] ceph: mds1 reconnect success
[Wed May 10 16:04:23 2023] ceph: mds1 reconnect success
[Wed May 10 16:04:23 2023] ceph: mds1 reconnect success
[Wed May 10 16:04:23 2023] ceph: mds1 reconnect success
[Wed May 10 16:04:23 2023] ceph: mds1 reconnect success
[Wed May 10 16:06:14 2023] ceph: mds1 caps stale
[Wed May 10 16:06:14 2023] ceph: mds1 caps stale
[Wed May 10 16:06:14 2023] ceph: mds1 caps stale
[Wed May 10 16:06:14 2023] ceph: mds1 caps stale
[Wed May 10 16:06:14 2023] ceph: mds1 caps stale
[Wed May 10 16:06:45 2023] ceph: mds1 caps renewed
[Wed May 10 16:06:45 2023] ceph: mds1 caps renewed
[Wed May 10 16:06:45 2023] ceph: mds1 caps renewed
[Wed May 10 16:06:45 2023] ceph: mds1 caps renewed
[Wed May 10 16:06:45 2023] ceph: mds1 caps renewed
[Wed May 10 16:06:45 2023] ceph: mds1 recovery completed
[Wed May 10 16:06:45 2023] ceph: mds1 recovery completed
[Wed May 10 16:06:45 2023] ceph: mds1 recovery completed
[Wed May 10 16:06:45 2023] ceph: mds1 recovery completed
[Wed May 10 16:06:45 2023] ceph: mds1 recovery completed
[Wed May 10 16:22:24 2023] WARNING: CPU: 1 PID: 5392 at fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph]
[Wed May 10 16:22:24 2023] Modules linked in: ceph libceph dns_resolver nls_utf8 isofs cirrus drm_shmem_helper drm_kms_helper syscopyarea sysfillrect sysimgblt intel_rapl_msr intel_rapl_common fb_sys_fops virtio_net iTCO_wdt net_failover iTCO_vendor_support drm pcspkr failover virtio_balloon joydev lpc_ich i2c_i801 nfsd nfs_acl lockd grace auth_rpcgss sunrpc xfs libcrc32c sr_mod cdrom sg crct10dif_pclmul ahci crc32_pclmul libahci crc32c_intel libata ghash_clmulni_intel serio_raw virtio_blk virtio_console virtio_scsi dm_mirror dm_region_hash dm_log dm_mod fuse
[Wed May 10 16:22:24 2023] CPU: 1 PID: 5392 Comm: kworker/1:0 Tainted: G        W        --------- -  - 4.18.0-486.el8.x86_64 #1
[Wed May 10 16:22:24 2023] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module_el8.7.0+3346+68867adb 04/01/2014
[Wed May 10 16:22:24 2023] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[Wed May 10 16:22:24 2023] RIP: 0010:ceph_add_cap+0x53e/0x550 [ceph]
[Wed May 10 16:22:24 2023] Code: c0 48 c7 c7 c0 39 74 c0 e8 6c 7c 5d c7 0f 0b 44 89 7c 24 04 e9 7e fc ff ff 44 8b 7c 24 04 e9 68 fe ff ff 0f 0b e9 c9 fc ff ff <0f> 0b e9 0a fe ff ff 0f 0b e9 12 fe ff ff 0f 0b 66 90 0f 1f 44 00
[Wed May 10 16:22:24 2023] RSP: 0018:ffffa4c984a97b48 EFLAGS: 00010207
[Wed May 10 16:22:24 2023] RAX: 0000000000000000 RBX: 0000000000000005 RCX: dead000000000200
[Wed May 10 16:22:24 2023] RDX: ffff8cfc56d7d7d0 RSI: ffff8cfc56d7d7d0 RDI: ffff8cfc56d7d7c8
[Wed May 10 16:22:24 2023] RBP: ffff8cfc45507570 R08: ffff8cfc56d7d7d0 R09: 0000000000000001
[Wed May 10 16:22:24 2023] R10: ffff8cfce4206f00 R11: 00000000ffff8ce0 R12: 0000000000000155
[Wed May 10 16:22:24 2023] R13: ffff8cfce4206f00 R14: ffff8cfce4206f08 R15: 0000000000000001
[Wed May 10 16:22:24 2023] FS:  0000000000000000(0000) GS:ffff8cfdb7c80000(0000) knlGS:0000000000000000
[Wed May 10 16:22:24 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed May 10 16:22:24 2023] CR2: 00007f7b61a629f0 CR3: 00000001049f0000 CR4: 00000000003506e0
[Wed May 10 16:22:24 2023] Call Trace:
[Wed May 10 16:22:24 2023]  ceph_handle_caps+0xdf2/0x1780 [ceph]
[Wed May 10 16:22:24 2023]  mds_dispatch+0x13a/0x670 [ceph]
[Wed May 10 16:22:24 2023]  ceph_con_process_message+0x79/0x140 [libceph]
[Wed May 10 16:22:24 2023]  ? calc_signature+0xdf/0x110 [libceph]
[Wed May 10 16:22:24 2023]  ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
[Wed May 10 16:22:24 2023]  ceph_con_workfn+0x329/0x680 [libceph]
[Wed May 10 16:22:24 2023]  process_one_work+0x1a7/0x360
[Wed May 10 16:22:24 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 16:22:24 2023]  worker_thread+0x30/0x390
[Wed May 10 16:22:24 2023]  ? create_worker+0x1a0/0x1a0
[Wed May 10 16:22:24 2023]  kthread+0x134/0x150
[Wed May 10 16:22:24 2023]  ? set_kthread_struct+0x50/0x50
[Wed May 10 16:22:24 2023]  ret_from_fork+0x35/0x40
[Wed May 10 16:22:24 2023] ---[ end trace 03a1d82065fdafc0 ]---
[Wed May 10 17:54:14 2023] libceph: mon0 (1)192.168.32.65:6789 session established
[Wed May 10 17:54:14 2023] libceph: client210426330 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
[Wed May 10 17:54:16 2023] libceph: mon2 (1)192.168.32.67:6789 session established
[Wed May 10 17:54:16 2023] libceph: client210350757 fsid e4ece518-f2cb-4708-b00f-b6bf511e91d9
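
For context on the failure path above: "update_snap_trace error -5" is -EIO from the snap-trace decoder, the "failed to blocklist ...: -13" line is -EACCES from the client's attempt to blocklist itself, and the fs/ceph/snap.c warning is the point where the client gives up and forces the mount read-only (hence the later "inode ... is shut down" messages). The user-space sketch below only mirrors that control flow; it is not the kernel's fs/ceph/snap.c code, and every type and function name in it is invented for illustration.

/*
 * Hypothetical user-space sketch of the error path visible in the log above.
 * It is NOT the kernel code; all names below are made up.  The real client
 * fails to decode the trace, tries to blocklist itself, and then forces the
 * mount read-only ("do remount to continue after corrupted snaptrace is fixed").
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct snap_trace { size_t len; const unsigned char *buf; };
struct mount_state { bool readonly; };

/* Pretend decoder: rejects anything that does not look like a valid trace. */
static int decode_snap_trace(const struct snap_trace *t)
{
    if (t->len < 8 || t->buf == NULL)
        return -EIO;            /* mirrors "update_snap_trace error -5" */
    return 0;
}

/* Pretend self-blocklist request; fails if the client lacks permission. */
static int client_self_blocklist(void)
{
    return -EACCES;             /* mirrors "failed to blocklist ...: -13" */
}

static void handle_snap_message(struct mount_state *mnt, const struct snap_trace *t)
{
    int err = decode_snap_trace(t);

    if (err) {
        fprintf(stderr, "update_snap_trace error %d\n", err);
        if (client_self_blocklist())
            fprintf(stderr, "failed to blocklist client\n");
        /* Last resort: stop writing through a possibly inconsistent snap context. */
        mnt->readonly = true;
        fprintf(stderr, "do remount to continue after corrupted snaptrace is fixed\n");
    }
}

int main(void)
{
    struct mount_state mnt = { .readonly = false };
    struct snap_trace bad = { .len = 4, .buf = (const unsigned char *)"bad" };

    handle_snap_message(&mnt, &bad);
    printf("mount readonly: %s\n", mnt.readonly ? "yes" : "no");
    return 0;
}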

Subtasks 3 (0 open, 3 closed)

Bug #57686: general protection fault and CephFS kernel client hangs after MDS failover (Duplicate, Xiubo Li)

Bug #57703: unable to handle page fault for address and system lockup after MDS failover (Duplicate)

Bug #57817: general protection fault and CephFS kernel client hangs after MDS failover (Duplicate, Xiubo Li)

Related issues 1 (0 open, 1 closed)

Copied to CephFS - Bug #61217: ceph: corrupt snap message from mds1 (Resolved, Xiubo Li)
#1

Updated by Xiubo Li 12 months ago

  • Status changed from New to Fix Under Review
#2

Updated by Xiubo Li 12 months ago

  • Copied to Bug #61217: ceph: corrupt snap message from mds1 added
#3

Updated by Xiubo Li 12 months ago

This tracker only covers the corrupted snaptrace issue; it still doesn't explain why the metadata was corrupted in the MDS. I have no evidence that the corrupted snaptrace could cause the MDS corruption.

The following is just a guess:

First the MDS sent a bogus snaptrace to the kclient in one MClientSnap message (MClientSnap1), the kclient then sent corrupted capsnaps back to the MDS, and the MDS later sent a second MClientSnap message (MClientSnap2) whose snaptrace could not be parsed, which is where the kclient hit the corruption above.
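
To make the "couldn't be parsed" step concrete, below is a toy bounds-checked decoder that refuses a trace whose records claim more data than the message actually carries and returns -EIO, which is the failure the log reports as "update_snap_trace error -5". The record layout here is invented for the sketch and is not the real SnapRealmInfo wire format.

/*
 * Toy illustration only: each "realm record" is a 64-bit ino, a 64-bit seq,
 * and a 32-bit snap count followed by that many 64-bit snap ids.  Any record
 * that runs past the end of the buffer is treated as corruption and the whole
 * trace is refused.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int get_u32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
    if ((size_t)(end - *p) < sizeof(*out))
        return -EIO;
    memcpy(out, *p, sizeof(*out));
    *p += sizeof(*out);
    return 0;
}

static int get_u64(const uint8_t **p, const uint8_t *end, uint64_t *out)
{
    if ((size_t)(end - *p) < sizeof(*out))
        return -EIO;
    memcpy(out, *p, sizeof(*out));
    *p += sizeof(*out);
    return 0;
}

/* Walk every realm record; fail fast on the first inconsistency. */
static int decode_trace(const uint8_t *buf, size_t len)
{
    const uint8_t *p = buf, *end = buf + len;

    while (p < end) {
        uint64_t ino, seq, snap;
        uint32_t num_snaps;

        if (get_u64(&p, end, &ino) || get_u64(&p, end, &seq) ||
            get_u32(&p, end, &num_snaps))
            return -EIO;
        while (num_snaps--)
            if (get_u64(&p, end, &snap))
                return -EIO;    /* record claims more snaps than bytes left */
    }
    return 0;
}

int main(void)
{
    /* One record whose snap count points past the end of the buffer. */
    uint8_t bad[8 + 8 + 4] = { 0 };
    bad[16] = 0x10;                          /* num_snaps = 16, but no snap ids follow */

    printf("decode_trace: %d\n", decode_trace(bad, sizeof(bad)));  /* prints -5 */
    return 0;
}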

#5

Updated by Xiubo Li 11 months ago

  • Status changed from Fix Under Review to Resolved

Merged to Linus' tree.
