Bug #52436
fs/ceph: "corrupt mdsmap"
Status: closed
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Crash signature (v1):
Crash signature (v2):
Description
[ 885.585012] ceph: corrupt mdsmap
[ 885.588480] mdsmap: 00000000: 05 04 24 04 00 00 38 00 00 00 12 00 00 00 00 00 ..$...8.........
[ 885.588487] mdsmap: 00000010: 00 00 00 00 00 00 3c 00 00 00 2c 01 00 00 00 00 ......<...,.....
[ 885.588490] mdsmap: 00000020: 00 00 00 01 00 00 01 00 00 00 01 00 00 00 66 1b ..............f.
[ 885.588493] mdsmap: 00000030: 00 00 00 00 00 00 0a 04 ff 01 00 00 66 1b 00 00 ............f...
[ 885.588496] mdsmap: 00000040: 00 00 00 00 01 00 00 00 63 00 00 00 00 2d 00 00 ........c....-..
[ 885.588498] mdsmap: 00000050: 00 0d 00 00 00 03 00 00 00 00 00 00 00 02 02 00 ................
[ 885.588501] mdsmap: 00000060: 00 00 01 01 01 1c 00 00 00 02 00 00 00 2a 3b d7 .............*;.
[ 885.588504] mdsmap: 00000070: cc 10 00 00 00 02 00 1a b3 ac 15 0f 2a 00 00 00 ............*...
[ 885.588506] mdsmap: 00000080: 00 00 00 00 00 01 01 01 1c 00 00 00 01 00 00 00 ................
[ 885.588509] mdsmap: 00000090: 2a 3b d7 cc 10 00 00 00 02 00 1a b5 ac 15 0f 2a *;.............*
[ 885.588512] mdsmap: 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 885.588515] mdsmap: 000000b0: ff ff ff ff 00 00 00 00 01 00 00 00 01 00 00 00 ................
[ 885.588518] mdsmap: 000000c0: ff ff fd 7f bb cf 01 3f ff ff ff ff 00 00 00 00 .......?........
[ 885.588520] mdsmap: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 885.588523] mdsmap: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 fe 07 00 ................
[ 885.588526] mdsmap: 000000f0: 00 00 00 00 00 0a 00 00 00 01 00 00 00 00 00 00 ................
[ 885.588528] mdsmap: 00000100: 00 0a 00 00 00 62 61 73 65 20 76 30 2e 32 30 02 .....base v0.20.
[ 885.588531] mdsmap: 00000110: 00 00 00 00 00 00 00 17 00 00 00 63 6c 69 65 6e ...........clien
[ 885.588534] mdsmap: 00000120: 74 20 77 72 69 74 65 61 62 6c 65 20 72 61 6e 67 t writeable rang
[ 885.588536] mdsmap: 00000130: 65 73 03 00 00 00 00 00 00 00 1c 00 00 00 64 65 es............de
[ 885.588539] mdsmap: 00000140: 66 61 75 6c 74 20 66 69 6c 65 20 6c 61 79 6f 75 fault file layou
[ 885.588542] mdsmap: 00000150: 74 73 20 6f 6e 20 64 69 72 73 04 00 00 00 00 00 ts on dirs......
[ 885.588544] mdsmap: 00000160: 00 00 1c 00 00 00 64 69 72 20 69 6e 6f 64 65 20 ......dir inode
[ 885.588547] mdsmap: 00000170: 69 6e 20 73 65 70 61 72 61 74 65 20 6f 62 6a 65 in separate obje
[ 885.588550] mdsmap: 00000180: 63 74 05 00 00 00 00 00 00 00 1b 00 00 00 6d 64 ct............md
[ 885.588552] mdsmap: 00000190: 73 20 75 73 65 73 20 76 65 72 73 69 6f 6e 65 64 s uses versioned
[ 885.588555] mdsmap: 000001a0: 20 65 6e 63 6f 64 69 6e 67 06 00 00 00 00 00 00 encoding.......
[ 885.588558] mdsmap: 000001b0: 00 19 00 00 00 64 69 72 66 72 61 67 20 69 73 20 .....dirfrag is
[ 885.588560] mdsmap: 000001c0: 73 74 6f 72 65 64 20 69 6e 20 6f 6d 61 70 07 00 stored in omap..
[ 885.588563] mdsmap: 000001d0: 00 00 00 00 00 00 14 00 00 00 6d 64 73 20 75 73 ..........mds us
[ 885.588566] mdsmap: 000001e0: 65 73 20 69 6e 6c 69 6e 65 20 64 61 74 61 08 00 es inline data..
[ 885.588587] mdsmap: 000001f0: 00 00 00 00 00 00 0f 00 00 00 6e 6f 20 61 6e 63 ..........no anc
[ 885.588590] mdsmap: 00000200: 68 6f 72 20 74 61 62 6c 65 09 00 00 00 00 00 00 hor table.......
[ 885.588593] mdsmap: 00000210: 00 0e 00 00 00 66 69 6c 65 20 6c 61 79 6f 75 74 .....file layout
[ 885.588595] mdsmap: 00000220: 20 76 32 0a 00 00 00 00 00 00 00 0c 00 00 00 73 v2............s
[ 885.588598] mdsmap: 00000230: 6e 61 70 72 65 61 6c 6d 20 76 32 01 00 00 00 0d naprealm v2.....
[ 885.588600] mdsmap: 00000240: 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 10 ................
[ 885.588603] mdsmap: 00000250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 885.588606] mdsmap: 00000260: 00 00 00 00 00 00 00 00 00 fe 07 00 00 00 00 00 ................
[ 885.588608] mdsmap: 00000270: 00 0a 00 00 00 01 00 00 00 00 00 00 00 0a 00 00 ................
[ 885.588611] mdsmap: 00000280: 00 62 61 73 65 20 76 30 2e 32 30 02 00 00 00 00 .base v0.20.....
[ 885.588614] mdsmap: 00000290: 00 00 00 17 00 00 00 63 6c 69 65 6e 74 20 77 72 .......client wr
[ 885.588616] mdsmap: 000002a0: 69 74 65 61 62 6c 65 20 72 61 6e 67 65 73 03 00 iteable ranges..
[ 885.588619] mdsmap: 000002b0: 00 00 00 00 00 00 1c 00 00 00 64 65 66 61 75 6c ..........defaul
[ 885.588622] mdsmap: 000002c0: 74 20 66 69 6c 65 20 6c 61 79 6f 75 74 73 20 6f t file layouts o
[ 885.588624] mdsmap: 000002d0: 6e 20 64 69 72 73 04 00 00 00 00 00 00 00 1c 00 n dirs..........
[ 885.588627] mdsmap: 000002e0: 00 00 64 69 72 20 69 6e 6f 64 65 20 69 6e 20 73 ..dir inode in s
[ 885.588630] mdsmap: 000002f0: 65 70 61 72 61 74 65 20 6f 62 6a 65 63 74 05 00 eparate object..
[ 885.588632] mdsmap: 00000300: 00 00 00 00 00 00 1b 00 00 00 6d 64 73 20 75 73 ..........mds us
[ 885.588635] mdsmap: 00000310: 65 73 20 76 65 72 73 69 6f 6e 65 64 20 65 6e 63 es versioned enc
[ 885.588638] mdsmap: 00000320: 6f 64 69 6e 67 06 00 00 00 00 00 00 00 19 00 00 oding...........
[ 885.588640] mdsmap: 00000330: 00 64 69 72 66 72 61 67 20 69 73 20 73 74 6f 72 .dirfrag is stor
[ 885.588643] mdsmap: 00000340: 65 64 20 69 6e 20 6f 6d 61 70 07 00 00 00 00 00 ed in omap......
[ 885.588645] mdsmap: 00000350: 00 00 14 00 00 00 6d 64 73 20 75 73 65 73 20 69 ......mds uses i
[ 885.588648] mdsmap: 00000360: 6e 6c 69 6e 65 20 64 61 74 61 08 00 00 00 00 00 nline data......
[ 885.588651] mdsmap: 00000370: 00 00 0f 00 00 00 6e 6f 20 61 6e 63 68 6f 72 20 ......no anchor
[ 885.588653] mdsmap: 00000380: 74 61 62 6c 65 09 00 00 00 00 00 00 00 0e 00 00 table...........
[ 885.588656] mdsmap: 00000390: 00 66 69 6c 65 20 6c 61 79 6f 75 74 20 76 32 0a .file layout v2.
[ 885.588659] mdsmap: 000003a0: 00 00 00 00 00 00 00 0c 00 00 00 73 6e 61 70 72 ...........snapr
[ 885.588661] mdsmap: 000003b0: 65 61 6c 6d 20 76 32 0c 00 00 00 00 00 00 00 5e ealm v2........^
[ 885.588664] mdsmap: 000003c0: 1d 29 61 44 d9 f9 0e a9 1d 29 61 c3 8b 62 2c 00 .)aD.....)a..b,.
[ 885.588667] mdsmap: 000003d0: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 ................
[ 885.588669] mdsmap: 000003e0: 00 00 00 38 00 00 00 01 00 00 00 00 00 00 00 66 ...8...........f
[ 885.588672] mdsmap: 000003f0: 1b 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01 ................
[ 885.588674] mdsmap: 00000400: 00 00 00 00 00 00 00 00 00 00 01 06 00 00 00 63 ...............c
[ 885.588677] mdsmap: 00000410: 65 70 68 66 73 00 00 00 00 00 00 00 00 00 00 00 ephfs...........
[ 885.588680] mdsmap: 00000420: 00 00 00 00 00 00 00 00 00 00 ..........
[ 885.588690] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 885.595707] #PF: supervisor read access in kernel mode
[ 885.600903] #PF: error_code(0x0000) - not-present page
[ 885.606093] PGD 0 P4D 0
[ 885.608678] Oops: 0000 [#1] SMP PTI
[ 885.612216] CPU: 1 PID: 21153 Comm: kworker/1:21 Not tainted 5.14.0-rc7-ceph-g93c7ab6f6301 #1
[ 885.620794] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
[ 885.628340] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 885.633922] RIP: 0010:check_new_map+0x44/0x600 [ceph]
[ 885.639051] Code: fc 55 53 48 81 ec 40 01 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 38 01 00 00 31 c0 48 8d 7c 24 38 f3 48 ab 0f 1f 44 00 00 <49> 8b 55 30 48 85 d2 74 31 44 8b 82 94 00 00 00 45 85 c0 7e 25 31
[ 885.657871] RSP: 0018:ffffb510885f3a88 EFLAGS: 00010246
[ 885.663143] RAX: 0000000000000000 RBX: ffff99851160d000 RCX: 0000000000000000
[ 885.670325] RDX: ffff9984cbb7b300 RSI: 0000000000000000 RDI: ffffb510885f3bc0
[ 885.677510] RBP: ffffb510885f3c60 R08: 0000000000000001 R09: 0000000000000001
[ 885.684695] R10: ffff9984cbb7b300 R11: 0000000000000001 R12: ffff99851160d000
[ 885.691878] R13: 0000000000000000 R14: ffff99851160d008 R15: ffff9984c6408428
[ 885.699064] FS: 0000000000000000(0000) GS:ffff998c1fc40000(0000) knlGS:0000000000000000
[ 885.707241] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 885.713050] CR2: 0000000000000030 CR3: 000000010cad8001 CR4: 00000000001706e0
[ 885.720255] Call Trace:
[ 885.722772] ? ceph_mdsmap_decode+0x122/0xc80 [ceph]
[ 885.727825] ? kfree+0x140/0x360
[ 885.731130] ceph_mdsc_handle_mdsmap+0x14a/0x260 [ceph]
[ 885.736445] ? extra_mon_dispatch+0x34/0x40 [ceph]
[ 885.741319] extra_mon_dispatch+0x34/0x40 [ceph]
[ 885.746018] mon_dispatch+0x6a/0xb30 [libceph]
[ 885.750562] ? ceph_con_process_message+0x65/0x160 [libceph]
[ 885.756316] ? lock_release+0xc7/0x290
[ 885.760133] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 885.764916] ? ceph_con_process_message+0x74/0x160 [libceph]
[ 885.770667] ceph_con_process_message+0x74/0x160 [libceph]
[ 885.776244] ceph_con_v1_try_read+0x59c/0x1630 [libceph]
[ 885.781649] ? lock_acquire+0xc8/0x2d0
[ 885.785466] ? process_one_work+0x1be/0x540
[ 885.789723] ceph_con_workfn+0x271/0x6f0 [libceph]
[ 885.794606] process_one_work+0x238/0x540
[ 885.798685] worker_thread+0x50/0x3a0
[ 885.802419] ? process_one_work+0x540/0x540
[ 885.806675] kthread+0x140/0x160
[ 885.809975] ? set_kthread_struct+0x40/0x40
[ 885.814231] ret_from_fork+0x1f/0x30
[ 885.817883] Modules linked in: xt_comment ipt_REJECT nf_reject_ipv4 xt_tcpudp ceph libceph fscache netfs veth xfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm configfs ib_core overlay intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel aesni_intel ipmi_ssif crypto_simd cryptd joydev mei_me mei ioatdma acpi_ipmi wmi ipmi_si ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter kvm_intel kvm irqbypass sch_fq_codel scsi_transport_iscsi lp parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 hid_generic usbhid hid igb i2c_algo_bit ixgbe nvme crc32_pclmul dca i2c_i801 ahci ptp nvme_core lpc_ich i2c_smbus libahci pps_core mdio
[ 885.899216] CR2: 0000000000000030
[ 885.902599] ---[ end trace aa5f6f2a55e50455 ]---
[ 886.009137] RIP: 0010:check_new_map+0x44/0x600 [ceph]
[ 886.014308] Code: fc 55 53 48 81 ec 40 01 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 38 01 00 00 31 c0 48 8d 7c 24 38 f3 48 ab 0f 1f 44 00 00 <49> 8b 55 30 48 85 d2 74 31 44 8b 82 94 00 00 00 45 85 c0 7e 25 31
[ 886.033191] RSP: 0018:ffffb510885f3a88 EFLAGS: 00010246
[ 886.038487] RAX: 0000000000000000 RBX: ffff99851160d000 RCX: 0000000000000000
[ 886.045693] RDX: ffff9984cbb7b300 RSI: 0000000000000000 RDI: ffffb510885f3bc0
[ 886.052895] RBP: ffffb510885f3c60 R08: 0000000000000001 R09: 0000000000000001
[ 886.060096] R10: ffff9984cbb7b300 R11: 0000000000000001 R12: ffff99851160d000
[ 886.067298] R13: 0000000000000000 R14: ffff99851160d008 R15: ffff9984c6408428
[ 886.074513] FS: 0000000000000000(0000) GS:ffff998c1fc40000(0000) knlGS:0000000000000000
[ 886.082696] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 886.088512] CR2: 0000000000000030 CR3: 000000010cad8001 CR4: 00000000001706e0
[ 886.095722] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 886.104684] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 21153, name: kworker/1:21
[ 886.113213] INFO: lockdep is turned off.
[ 886.117205] irq event stamp: 432642
[ 886.120767] hardirqs last enabled at (432641): [<ffffffff8e2f4081>] kfree+0x1e1/0x360
[ 886.128775] hardirqs last disabled at (432642): [<ffffffff8eb1fad8>] exc_page_fault+0x38/0x260
[ 886.137479] softirqs last enabled at (432220): [<ffffffff8e9ad7d0>] tcp_recvmsg+0xb0/0x1b0
[ 886.145929] softirqs last disabled at (432218): [<ffffffff8e8caba9>] release_sock+0x19/0xa0
[ 886.154381] CPU: 1 PID: 21153 Comm: kworker/1:21 Tainted: G D 5.14.0-rc7-ceph-g93c7ab6f6301 #1
[ 886.164385] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
[ 886.171956] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[ 886.177539] Call Trace:
[ 886.180054] dump_stack_lvl+0x57/0x72
[ 886.183791] ___might_sleep.cold+0xb6/0xc6
[ 886.187962] exit_signals+0x30/0x310
[ 886.191609] do_exit+0xc7/0xc20
[ 886.194824] ? kthread+0x140/0x160
[ 886.198298] rewind_stack_do_exit+0x17/0x20
[ 886.202557] RIP: 0000:0x0
[ 886.205250] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 886.212198] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 886.219861] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 886.227062] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 886.234263] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 886.241469] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 886.248676] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 922.323902] libceph: mon0 (1)172.21.15.42:6789 session lost, hunting for new mon
[ 1173.300550] ceph: mds0 hung
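For anyone who wants to inspect the dumped mdsmap offline, here is a minimal Python sketch that rebuilds the raw bytes from the `mdsmap:` hexdump lines in the log above. The helper names (`parse_hexdump`, `read_envelope`) are hypothetical. Reading the first six bytes as a struct version, a compat version, and a little-endian 32-bit payload length is an assumption about the encoding envelope, not something stated in this report.

```python
import re
import struct

# Matches "mdsmap: <8-hex-digit offset>: <up to 16 space-terminated hex bytes>".
# The trailing ASCII column is ignored. Caveat: on a short final line, an ASCII
# column that happens to start with two hex digits and a space would be misread.
LINE_RE = re.compile(r"mdsmap: ([0-9a-f]{8}): ((?:[0-9a-f]{2} ){1,16})")


def parse_hexdump(log_text):
    """Rebuild the dumped buffer from kernel log text (hypothetical helper)."""
    data = bytearray()
    for match in LINE_RE.finditer(log_text):
        offset = int(match.group(1), 16)
        if offset != len(data):
            raise ValueError(f"gap or reordering at offset {offset:#x}")
        data += bytes.fromhex(match.group(2))
    return bytes(data)


def read_envelope(blob):
    """Assumed layout: u8 struct version, u8 compat version, le32 length."""
    return struct.unpack_from("<BBI", blob, 0)


# First two dump lines from the log above, used as a small sample.
SAMPLE = (
    "[ 885.588480] mdsmap: 00000000: 05 04 24 04 00 00 38 00 00 00 12 00 00 00 00 00 ..$...8......... "
    "[ 885.588487] mdsmap: 00000010: 00 00 00 00 00 00 3c 00 00 00 2c 01 00 00 00 00 ......<...,....."
)

if __name__ == "__main__":
    blob = parse_hexdump(SAMPLE)
    version, compat, length = read_envelope(blob)
    print(len(blob), version, compat, length)
```

Under that assumption, the header of this dump gives a payload length of 0x424 (1060) bytes. Adding the 6 header bytes gives 1066, which matches where the dump ends (offset 0x420 plus 10 bytes), so the buffer does not appear to be truncated.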
From: /ceph/teuthology-archive/pdonnell-2021-08-27_16:46:16-fs-wip-pdonnell-testing-20210827.024746-distro-basic-smithi/6362981
Another failure from the same test: https://pulpito.ceph.com/pdonnell-2021-08-27_16:46:16-fs-wip-pdonnell-testing-20210827.024746-distro-basic-smithi/6363001/
This test seems to reproduce the crash reliably. So far it has only been seen with the testing branch of the kernel.