Bug #52436

closed

fs/ceph: "corrupt mdsmap"

Added by Patrick Donnelly over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Crash signature (v1):
Crash signature (v2):

Description

[  885.585012] ceph: corrupt mdsmap
[  885.588480] mdsmap: 00000000: 05 04 24 04 00 00 38 00 00 00 12 00 00 00 00 00  ..$...8.........
[  885.588487] mdsmap: 00000010: 00 00 00 00 00 00 3c 00 00 00 2c 01 00 00 00 00  ......<...,.....
[  885.588490] mdsmap: 00000020: 00 00 00 01 00 00 01 00 00 00 01 00 00 00 66 1b  ..............f.
[  885.588493] mdsmap: 00000030: 00 00 00 00 00 00 0a 04 ff 01 00 00 66 1b 00 00  ............f...
[  885.588496] mdsmap: 00000040: 00 00 00 00 01 00 00 00 63 00 00 00 00 2d 00 00  ........c....-..
[  885.588498] mdsmap: 00000050: 00 0d 00 00 00 03 00 00 00 00 00 00 00 02 02 00  ................
[  885.588501] mdsmap: 00000060: 00 00 01 01 01 1c 00 00 00 02 00 00 00 2a 3b d7  .............*;.
[  885.588504] mdsmap: 00000070: cc 10 00 00 00 02 00 1a b3 ac 15 0f 2a 00 00 00  ............*...
[  885.588506] mdsmap: 00000080: 00 00 00 00 00 01 01 01 1c 00 00 00 01 00 00 00  ................
[  885.588509] mdsmap: 00000090: 2a 3b d7 cc 10 00 00 00 02 00 1a b5 ac 15 0f 2a  *;.............*
[  885.588512] mdsmap: 000000a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  885.588515] mdsmap: 000000b0: ff ff ff ff 00 00 00 00 01 00 00 00 01 00 00 00  ................
[  885.588518] mdsmap: 000000c0: ff ff fd 7f bb cf 01 3f ff ff ff ff 00 00 00 00  .......?........
[  885.588520] mdsmap: 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  885.588523] mdsmap: 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 fe 07 00  ................
[  885.588526] mdsmap: 000000f0: 00 00 00 00 00 0a 00 00 00 01 00 00 00 00 00 00  ................
[  885.588528] mdsmap: 00000100: 00 0a 00 00 00 62 61 73 65 20 76 30 2e 32 30 02  .....base v0.20.
[  885.588531] mdsmap: 00000110: 00 00 00 00 00 00 00 17 00 00 00 63 6c 69 65 6e  ...........clien
[  885.588534] mdsmap: 00000120: 74 20 77 72 69 74 65 61 62 6c 65 20 72 61 6e 67  t writeable rang
[  885.588536] mdsmap: 00000130: 65 73 03 00 00 00 00 00 00 00 1c 00 00 00 64 65  es............de
[  885.588539] mdsmap: 00000140: 66 61 75 6c 74 20 66 69 6c 65 20 6c 61 79 6f 75  fault file layou
[  885.588542] mdsmap: 00000150: 74 73 20 6f 6e 20 64 69 72 73 04 00 00 00 00 00  ts on dirs......
[  885.588544] mdsmap: 00000160: 00 00 1c 00 00 00 64 69 72 20 69 6e 6f 64 65 20  ......dir inode 
[  885.588547] mdsmap: 00000170: 69 6e 20 73 65 70 61 72 61 74 65 20 6f 62 6a 65  in separate obje
[  885.588550] mdsmap: 00000180: 63 74 05 00 00 00 00 00 00 00 1b 00 00 00 6d 64  ct............md
[  885.588552] mdsmap: 00000190: 73 20 75 73 65 73 20 76 65 72 73 69 6f 6e 65 64  s uses versioned
[  885.588555] mdsmap: 000001a0: 20 65 6e 63 6f 64 69 6e 67 06 00 00 00 00 00 00   encoding.......
[  885.588558] mdsmap: 000001b0: 00 19 00 00 00 64 69 72 66 72 61 67 20 69 73 20  .....dirfrag is 
[  885.588560] mdsmap: 000001c0: 73 74 6f 72 65 64 20 69 6e 20 6f 6d 61 70 07 00  stored in omap..
[  885.588563] mdsmap: 000001d0: 00 00 00 00 00 00 14 00 00 00 6d 64 73 20 75 73  ..........mds us
[  885.588566] mdsmap: 000001e0: 65 73 20 69 6e 6c 69 6e 65 20 64 61 74 61 08 00  es inline data..
[  885.588587] mdsmap: 000001f0: 00 00 00 00 00 00 0f 00 00 00 6e 6f 20 61 6e 63  ..........no anc
[  885.588590] mdsmap: 00000200: 68 6f 72 20 74 61 62 6c 65 09 00 00 00 00 00 00  hor table.......
[  885.588593] mdsmap: 00000210: 00 0e 00 00 00 66 69 6c 65 20 6c 61 79 6f 75 74  .....file layout
[  885.588595] mdsmap: 00000220: 20 76 32 0a 00 00 00 00 00 00 00 0c 00 00 00 73   v2............s
[  885.588598] mdsmap: 00000230: 6e 61 70 72 65 61 6c 6d 20 76 32 01 00 00 00 0d  naprealm v2.....
[  885.588600] mdsmap: 00000240: 00 00 00 00 00 00 00 ff ff ff ff ff ff ff ff 10  ................
[  885.588603] mdsmap: 00000250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[  885.588606] mdsmap: 00000260: 00 00 00 00 00 00 00 00 00 fe 07 00 00 00 00 00  ................
[  885.588608] mdsmap: 00000270: 00 0a 00 00 00 01 00 00 00 00 00 00 00 0a 00 00  ................
[  885.588611] mdsmap: 00000280: 00 62 61 73 65 20 76 30 2e 32 30 02 00 00 00 00  .base v0.20.....
[  885.588614] mdsmap: 00000290: 00 00 00 17 00 00 00 63 6c 69 65 6e 74 20 77 72  .......client wr
[  885.588616] mdsmap: 000002a0: 69 74 65 61 62 6c 65 20 72 61 6e 67 65 73 03 00  iteable ranges..
[  885.588619] mdsmap: 000002b0: 00 00 00 00 00 00 1c 00 00 00 64 65 66 61 75 6c  ..........defaul
[  885.588622] mdsmap: 000002c0: 74 20 66 69 6c 65 20 6c 61 79 6f 75 74 73 20 6f  t file layouts o
[  885.588624] mdsmap: 000002d0: 6e 20 64 69 72 73 04 00 00 00 00 00 00 00 1c 00  n dirs..........
[  885.588627] mdsmap: 000002e0: 00 00 64 69 72 20 69 6e 6f 64 65 20 69 6e 20 73  ..dir inode in s
[  885.588630] mdsmap: 000002f0: 65 70 61 72 61 74 65 20 6f 62 6a 65 63 74 05 00  eparate object..
[  885.588632] mdsmap: 00000300: 00 00 00 00 00 00 1b 00 00 00 6d 64 73 20 75 73  ..........mds us
[  885.588635] mdsmap: 00000310: 65 73 20 76 65 72 73 69 6f 6e 65 64 20 65 6e 63  es versioned enc
[  885.588638] mdsmap: 00000320: 6f 64 69 6e 67 06 00 00 00 00 00 00 00 19 00 00  oding...........
[  885.588640] mdsmap: 00000330: 00 64 69 72 66 72 61 67 20 69 73 20 73 74 6f 72  .dirfrag is stor
[  885.588643] mdsmap: 00000340: 65 64 20 69 6e 20 6f 6d 61 70 07 00 00 00 00 00  ed in omap......
[  885.588645] mdsmap: 00000350: 00 00 14 00 00 00 6d 64 73 20 75 73 65 73 20 69  ......mds uses i
[  885.588648] mdsmap: 00000360: 6e 6c 69 6e 65 20 64 61 74 61 08 00 00 00 00 00  nline data......
[  885.588651] mdsmap: 00000370: 00 00 0f 00 00 00 6e 6f 20 61 6e 63 68 6f 72 20  ......no anchor 
[  885.588653] mdsmap: 00000380: 74 61 62 6c 65 09 00 00 00 00 00 00 00 0e 00 00  table...........
[  885.588656] mdsmap: 00000390: 00 66 69 6c 65 20 6c 61 79 6f 75 74 20 76 32 0a  .file layout v2.
[  885.588659] mdsmap: 000003a0: 00 00 00 00 00 00 00 0c 00 00 00 73 6e 61 70 72  ...........snapr
[  885.588661] mdsmap: 000003b0: 65 61 6c 6d 20 76 32 0c 00 00 00 00 00 00 00 5e  ealm v2........^
[  885.588664] mdsmap: 000003c0: 1d 29 61 44 d9 f9 0e a9 1d 29 61 c3 8b 62 2c 00  .)aD.....)a..b,.
[  885.588667] mdsmap: 000003d0: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00  ................
[  885.588669] mdsmap: 000003e0: 00 00 00 38 00 00 00 01 00 00 00 00 00 00 00 66  ...8...........f
[  885.588672] mdsmap: 000003f0: 1b 00 00 00 00 00 00 00 00 00 00 01 00 00 00 01  ................
[  885.588674] mdsmap: 00000400: 00 00 00 00 00 00 00 00 00 00 01 06 00 00 00 63  ...............c
[  885.588677] mdsmap: 00000410: 65 70 68 66 73 00 00 00 00 00 00 00 00 00 00 00  ephfs...........
[  885.588680] mdsmap: 00000420: 00 00 00 00 00 00 00 00 00 00                    ..........
[  885.588690] BUG: kernel NULL pointer dereference, address: 0000000000000030
[  885.595707] #PF: supervisor read access in kernel mode
[  885.600903] #PF: error_code(0x0000) - not-present page
[  885.606093] PGD 0 P4D 0 
[  885.608678] Oops: 0000 [#1] SMP PTI
[  885.612216] CPU: 1 PID: 21153 Comm: kworker/1:21 Not tainted 5.14.0-rc7-ceph-g93c7ab6f6301 #1
[  885.620794] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
[  885.628340] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[  885.633922] RIP: 0010:check_new_map+0x44/0x600 [ceph]
[  885.639051] Code: fc 55 53 48 81 ec 40 01 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 38 01 00 00 31 c0 48 8d 7c 24 38 f3 48 ab 0f 1f 44 00 00 <49> 8b 55 30 48 85 d2 74 31 44 8b 82 94 00 00 00 45 85 c0 7e 25 31
[  885.657871] RSP: 0018:ffffb510885f3a88 EFLAGS: 00010246
[  885.663143] RAX: 0000000000000000 RBX: ffff99851160d000 RCX: 0000000000000000
[  885.670325] RDX: ffff9984cbb7b300 RSI: 0000000000000000 RDI: ffffb510885f3bc0
[  885.677510] RBP: ffffb510885f3c60 R08: 0000000000000001 R09: 0000000000000001
[  885.684695] R10: ffff9984cbb7b300 R11: 0000000000000001 R12: ffff99851160d000
[  885.691878] R13: 0000000000000000 R14: ffff99851160d008 R15: ffff9984c6408428
[  885.699064] FS:  0000000000000000(0000) GS:ffff998c1fc40000(0000) knlGS:0000000000000000
[  885.707241] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  885.713050] CR2: 0000000000000030 CR3: 000000010cad8001 CR4: 00000000001706e0
[  885.720255] Call Trace:
[  885.722772]  ? ceph_mdsmap_decode+0x122/0xc80 [ceph]
[  885.727825]  ? kfree+0x140/0x360
[  885.731130]  ceph_mdsc_handle_mdsmap+0x14a/0x260 [ceph]
[  885.736445]  ? extra_mon_dispatch+0x34/0x40 [ceph]
[  885.741319]  extra_mon_dispatch+0x34/0x40 [ceph]
[  885.746018]  mon_dispatch+0x6a/0xb30 [libceph]
[  885.750562]  ? ceph_con_process_message+0x65/0x160 [libceph]
[  885.756316]  ? lock_release+0xc7/0x290
[  885.760133]  ? __mutex_unlock_slowpath+0x45/0x2a0
[  885.764916]  ? ceph_con_process_message+0x74/0x160 [libceph]
[  885.770667]  ceph_con_process_message+0x74/0x160 [libceph]
[  885.776244]  ceph_con_v1_try_read+0x59c/0x1630 [libceph]
[  885.781649]  ? lock_acquire+0xc8/0x2d0
[  885.785466]  ? process_one_work+0x1be/0x540
[  885.789723]  ceph_con_workfn+0x271/0x6f0 [libceph]
[  885.794606]  process_one_work+0x238/0x540
[  885.798685]  worker_thread+0x50/0x3a0
[  885.802419]  ? process_one_work+0x540/0x540
[  885.806675]  kthread+0x140/0x160
[  885.809975]  ? set_kthread_struct+0x40/0x40
[  885.814231]  ret_from_fork+0x1f/0x30
[  885.817883] Modules linked in: xt_comment ipt_REJECT nf_reject_ipv4 xt_tcpudp ceph libceph fscache netfs veth xfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm configfs ib_core overlay intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul ghash_clmulni_intel aesni_intel ipmi_ssif crypto_simd cryptd joydev mei_me mei ioatdma acpi_ipmi wmi ipmi_si ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter kvm_intel kvm irqbypass sch_fq_codel scsi_transport_iscsi lp parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 hid_generic usbhid hid igb i2c_algo_bit ixgbe nvme crc32_pclmul dca i2c_i801 ahci ptp nvme_core lpc_ich i2c_smbus libahci pps_core mdio
[  885.899216] CR2: 0000000000000030
[  885.902599] ---[ end trace aa5f6f2a55e50455 ]---
[  886.009137] RIP: 0010:check_new_map+0x44/0x600 [ceph]
[  886.014308] Code: fc 55 53 48 81 ec 40 01 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 38 01 00 00 31 c0 48 8d 7c 24 38 f3 48 ab 0f 1f 44 00 00 <49> 8b 55 30 48 85 d2 74 31 44 8b 82 94 00 00 00 45 85 c0 7e 25 31
[  886.033191] RSP: 0018:ffffb510885f3a88 EFLAGS: 00010246
[  886.038487] RAX: 0000000000000000 RBX: ffff99851160d000 RCX: 0000000000000000
[  886.045693] RDX: ffff9984cbb7b300 RSI: 0000000000000000 RDI: ffffb510885f3bc0
[  886.052895] RBP: ffffb510885f3c60 R08: 0000000000000001 R09: 0000000000000001
[  886.060096] R10: ffff9984cbb7b300 R11: 0000000000000001 R12: ffff99851160d000
[  886.067298] R13: 0000000000000000 R14: ffff99851160d008 R15: ffff9984c6408428
[  886.074513] FS:  0000000000000000(0000) GS:ffff998c1fc40000(0000) knlGS:0000000000000000
[  886.082696] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  886.088512] CR2: 0000000000000030 CR3: 000000010cad8001 CR4: 00000000001706e0
[  886.095722] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[  886.104684] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 21153, name: kworker/1:21
[  886.113213] INFO: lockdep is turned off.
[  886.117205] irq event stamp: 432642
[  886.120767] hardirqs last  enabled at (432641): [<ffffffff8e2f4081>] kfree+0x1e1/0x360
[  886.128775] hardirqs last disabled at (432642): [<ffffffff8eb1fad8>] exc_page_fault+0x38/0x260
[  886.137479] softirqs last  enabled at (432220): [<ffffffff8e9ad7d0>] tcp_recvmsg+0xb0/0x1b0
[  886.145929] softirqs last disabled at (432218): [<ffffffff8e8caba9>] release_sock+0x19/0xa0
[  886.154381] CPU: 1 PID: 21153 Comm: kworker/1:21 Tainted: G      D           5.14.0-rc7-ceph-g93c7ab6f6301 #1
[  886.164385] Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
[  886.171956] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[  886.177539] Call Trace:
[  886.180054]  dump_stack_lvl+0x57/0x72
[  886.183791]  ___might_sleep.cold+0xb6/0xc6
[  886.187962]  exit_signals+0x30/0x310
[  886.191609]  do_exit+0xc7/0xc20
[  886.194824]  ? kthread+0x140/0x160
[  886.198298]  rewind_stack_do_exit+0x17/0x20
[  886.202557] RIP: 0000:0x0
[  886.205250] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[  886.212198] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[  886.219861] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  886.227062] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  886.234263] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  886.241469] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  886.248676] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  922.323902] libceph: mon0 (1)172.21.15.42:6789 session lost, hunting for new mon
[ 1173.300550] ceph: mds0 hung

From: /ceph/teuthology-archive/pdonnell-2021-08-27_16:46:16-fs-wip-pdonnell-testing-20210827.024746-distro-basic-smithi/6362981
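
The faulting access can be cross-checked from the oops itself. The sketch below hand-decodes the four bytes at the marked position of the Code: line (<49> 8b 55 30) as an x86-64 mov with a REX prefix and an 8-bit displacement, then combines the result with the register dump. It handles only this one instruction form (REX.W + opcode 0x8b, ModRM mod=01, no SIB byte); the helper name decode_mov_disp8 is made up for illustration.

```python
# Hand-decode one x86-64 instruction form: REX.W + 0x8b (mov r64, r/m64)
# with ModRM mod=01 (base register + 8-bit displacement, no SIB byte).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_disp8(code):
    """Return (dest_reg, base_reg, displacement) for the 4 given bytes."""
    rex, opcode, modrm, disp = code
    if (rex & 0xF0) != 0x40 or not (rex & 0x08) or opcode != 0x8B:
        raise ValueError("not a REX.W-prefixed 0x8b mov")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only mod=01 without a SIB byte is handled")
    rex_r = (rex >> 2) & 1   # extends the ModRM reg field
    rex_b = rex & 1          # extends the ModRM rm field
    if disp >= 0x80:         # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg + 8 * rex_r], REGS[rm + 8 * rex_b], disp


# Bytes at the "<49>" marker in the Code: line of the oops.
faulting = bytes.fromhex("49 8b 55 30")
dest, base, disp = decode_mov_disp8(faulting)

# Register value copied from the oops register dump (R13 is all zeroes).
registers = {"r13": 0x0}
address = registers[base] + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> faults at {address:#x}")
```

With R13 = 0 in the register dump, the computed address 0x30 equals CR2, which is consistent with check_new_map() reading a field at offset 0x30 through a NULL pointer. Which structure that pointer refers to cannot be determined from the oops alone.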

Another instance from the same test: https://pulpito.ceph.com/pdonnell-2021-08-27_16:46:16-fs-wip-pdonnell-testing-20210827.024746-distro-basic-smithi/6363001/

This test seems to reproduce the failure reliably. It has only been seen with the testing branch of the kernel.
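
To inspect the rejected map offline, the hex dump in the kernel log can be turned back into bytes. The following is a minimal sketch: it strips the dmesg prefix, reassembles the bytes by offset, and unpacks the leading fields. The field order (version, compat version, payload length, epoch, client epoch, last failure, root, session timeout, session autoclose, max file size, max MDS) is an assumption based on a reading of the kernel's ceph_mdsmap_decode() and should be checked against the kernel source in use. The function names are illustrative, and the sample input is the first three dump lines from the log above.

```python
import re
import struct

# Matches lines such as:
# [  885.588480] mdsmap: 00000000: 05 04 24 04 ...  ..$...8.
LINE_RE = re.compile(r"mdsmap: ([0-9a-f]{8}): ((?:[0-9a-f]{2} ?){1,16})")

# Assumed leading layout, little-endian and unpadded:
# u8 version, u8 compat, u32 length, six u32 fields, u64, u32.
HEADER = struct.Struct("<BBIIIIIIIQI")
FIELDS = ("version", "compat", "length", "epoch", "client_epoch",
          "last_failure", "root", "session_timeout",
          "session_autoclose", "max_file_size", "max_mds")


def parse_hexdump(log_text):
    """Rebuild the dumped buffer from 'mdsmap:' hexdump lines."""
    buf = bytearray()
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        offset = int(m.group(1), 16)
        if offset != len(buf):
            raise ValueError(f"gap in dump at offset {offset:#x}")
        buf += bytes.fromhex(m.group(2))
    return bytes(buf)


def decode_header(buf):
    """Unpack the assumed leading fields into a dict."""
    return dict(zip(FIELDS, HEADER.unpack_from(buf)))


sample = r"""
[  885.588480] mdsmap: 00000000: 05 04 24 04 00 00 38 00 00 00 12 00 00 00 00 00  ..$...8.........
[  885.588487] mdsmap: 00000010: 00 00 00 00 00 00 3c 00 00 00 2c 01 00 00 00 00  ......<...,.....
[  885.588490] mdsmap: 00000020: 00 00 00 01 00 00 01 00 00 00 01 00 00 00 66 1b  ..............f.
"""

buf = parse_hexdump(sample)
header = decode_header(buf)
for name in FIELDS:
    print(f"{name:18} {header[name]}")
```

Under that assumed layout, the declared payload length is 0x424. Adding the 6 bytes of version, compat version and length gives 0x42a, which matches the 0x420 + 10 bytes printed in the full dump, so the buffer appears to be complete rather than truncated.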
