Actions
Bug #40190
oops or soft lockup after MDS crash
Status:
closed
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
Yesterday we had a v12.2.12 mds crash [1] and several kernel clients (el7 kernel 3.10.0-957.12.1.el7.x86_64) either deadlocked or oops'd.
Here is an OOPS:
[196767.597903] libceph: mds4 137.138.44.130:6800 socket closed (con state NEGOTIATING)
[196768.439310] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[196768.445991] libceph: mds4 137.138.44.130:6800 wrong peer at address
[196769.439371] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[196769.446550] libceph: mds4 137.138.44.130:6800 wrong peer at address
[196771.442082] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[196771.447412] libceph: mds4 137.138.44.130:6800 wrong peer at address
[196827.479453] BUG: unable to handle kernel NULL pointer dereference at (null)
[196827.485410] IP: [<ffffffffa03950cb>] __list_add+0x1b/0xc0
[196827.486455] PGD 0
[196827.486909] Oops: 0000 [#1] SMP
[196827.487634] Modules linked in: tcp_diag inet_diag fuse ceph libceph dns_resolver nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_pkttype ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack iptable_filter ppdev iosf_mbi kvm_intel kvm irqbypass cirrus ttm parport_pc parport drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops joydev drm virtio_balloon i2c_piix4 drm_panel_orientation_quirks pcspkr binfmt_misc openafs(POE) auth_rpcgss ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd floppy ata_piix serio_raw libata virtio_pci virtio_ring virtio sunrpc dm_mirror dm_region_hash dm_log dm_mod
[196827.514284] CPU: 0 PID: 180996 Comm: kworker/0:3 Kdump: loaded Tainted: P OE ------------ 3.10.0-957.12.2.el7.x86_64 #1
[196827.519880] Hardware name: RDO OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014
[196827.522908] Workqueue: events delayed_work [ceph]
[196827.525493] task: ffff9e07bb2a4100 ti: ffff9e065a720000 task.ti: ffff9e065a720000
[196827.528538] RIP: 0010:[<ffffffffa03950cb>] [<ffffffffa03950cb>] __list_add+0x1b/0xc0
[196827.532676] RSP: 0018:ffff9e065a723be8 EFLAGS: 00010246
[196827.536233] RAX: 00000000ffffffff RBX: ffff9e065a723c10 RCX: 0000000000000000
[196827.539217] RDX: ffff9e06613c6028 RSI: 0000000000000000 RDI: ffff9e065a723c10
[196827.544014] RBP: ffff9e065a723c00 R08: 0000000000000000 R09: 0000000000000000
[196827.546969] R10: ffff9e07c2740258 R11: 0000000000000000 R12: ffff9e06613c6028
[196827.549781] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff9e06613c6028
[196827.552435] FS: 0000000000000000(0000) GS:ffff9e07d4800000(0000) knlGS:0000000000000000
[196827.555280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[196827.557758] CR2: 0000000000000000 CR3: 0000000194a6a000 CR4: 00000000003606f0
[196827.560485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[196827.563300] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[196827.566125] Call Trace:
[196827.568168] [<ffffffffa0767c96>] __mutex_lock_slowpath+0xa6/0x1d0
[196827.570839] [<ffffffffa076709f>] mutex_lock+0x1f/0x2f
[196827.573253] [<ffffffffc0794f60>] ceph_check_caps+0x440/0xc10 [ceph]
[196827.575872] [<ffffffffa00aaf78>] ? add_timer+0x18/0x20
[196827.578281] [<ffffffffc0799344>] ceph_check_delayed_caps+0xc4/0x110 [ceph]
[196827.585644] [<ffffffffc07a1945>] delayed_work+0x35/0x260 [ceph]
[196827.588274] [<ffffffffa00b9e3f>] process_one_work+0x17f/0x440
[196827.590891] [<ffffffffa00baed6>] worker_thread+0x126/0x3c0
[196827.593600] [<ffffffffa00badb0>] ? manage_workers.isra.25+0x2a0/0x2a0
[196827.596366] [<ffffffffa00c1d21>] kthread+0xd1/0xe0
[196827.598842] [<ffffffffa00c1c50>] ? insert_kthread_work+0x40/0x40
[196827.601518] [<ffffffffa0775c1d>] ret_from_fork_nospec_begin+0x7/0x21
[196827.604240] [<ffffffffa00c1c50>] ? insert_kthread_work+0x40/0x40
[196827.607142] Code: ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49 39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39 49 89
[196827.615446] RIP [<ffffffffa03950cb>] __list_add+0x1b/0xc0
[196827.618034] RSP <ffff9e065a723be8>
[196827.620176] CR2: 0000000000000000
And here is an example soft lockup (machines in this state had to be hard rebooted):
Jun 5 19:54:48 boincai12 kernel: libceph: mds4 137.138.44.130:6800 socket closed (con state CONNECTING)
Jun 5 19:54:48 boincai12 kernel: libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
Jun 5 19:54:48 boincai12 kernel: libceph: mds4 137.138.44.130:6800 wrong peer at address
Jun 5 19:54:49 boincai12 kernel: libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
Jun 5 19:54:49 boincai12 kernel: libceph: mds4 137.138.44.130:6800 wrong peer at address
Jun 5 19:54:51 boincai12 kernel: libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
Jun 5 19:54:51 boincai12 kernel: libceph: mds4 137.138.44.130:6800 wrong peer at address
...
Jun 5 20:01:56 boincai12 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [reader#4:1954587]
Jun 5 20:01:56 boincai12 kernel: Modules linked in: tcp_diag inet_diag ceph libceph dns_resolver nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_pkttype ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_set xt_multiport xt_conntrack nf_conntrack ip_set_hash_ip ip_set iptable_filter nfnetlink ppdev iosf_mbi kvm_intel kvm irqbypass cirrus ttm joydev drm_kms_helper parport_pc parport syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr drm virtio_balloon drm_panel_orientation_quirks i2c_piix4 binfmt_misc openafs(POE) auth_rpcgss ip_tables xfs libcrc32c virtio_blk virtio_net ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
Jun 5 20:01:56 boincai12 kernel: ablk_helper cryptd floppy serio_raw ata_piix libata virtio_pci virtio_ring virtio sunrpc dm_mirror dm_region_hash dm_log dm_mod
Jun 5 20:01:56 boincai12 kernel: CPU: 0 PID: 1954587 Comm: reader#4 Kdump: loaded Tainted: P OE ------------ 3.10.0-957.12.1.el7.x86_64 #1
Jun 5 20:01:56 boincai12 kernel: Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Jun 5 20:01:56 boincai12 kernel: task: ffff9df577a61040 ti: ffff9df697b94000 task.ti: ffff9df697b94000
Jun 5 20:01:56 boincai12 kernel: RIP: 0010:[<ffffffff8536ae86>] [<ffffffff8536ae86>] native_safe_halt+0x6/0x10
Jun 5 20:01:56 boincai12 kernel: RSP: 0018:ffff9df697b97cd8 EFLAGS: 00000202
Jun 5 20:01:56 boincai12 kernel: RAX: 0000000000000001 RBX: 00000000e9a64a44 RCX: 0000000000000001
Jun 5 20:01:56 boincai12 kernel: RDX: 00000000000000a0 RSI: 0000000000000001 RDI: ffff9df6d481b7c4
Jun 5 20:01:56 boincai12 kernel: RBP: ffff9df697b97cd8 R08: 0000000000000000 R09: 0000000000000000
Jun 5 20:01:56 boincai12 kernel: R10: 0000000000000020 R11: 0000000000000000 R12: ffff9df6000001bd
Jun 5 20:01:56 boincai12 kernel: R13: ffff9df6c6957bc0 R14: 0cd89df600000d80 R15: ffff9df4f4174710
Jun 5 20:01:56 boincai12 kernel: FS: 00007fac947f8700(0000) GS:ffff9df6d4800000(0000) knlGS:0000000000000000
Jun 5 20:01:56 boincai12 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 5 20:01:56 boincai12 kernel: CR2: 00005646b04b4000 CR3: 0000000034458000 CR4: 00000000001606f0
Jun 5 20:01:56 boincai12 kernel: Call Trace:
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84c698f5>] kvm_wait+0x65/0x70
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84d12066>] __pv_queued_spin_lock_slowpath+0x216/0x2e0
Jun 5 20:01:56 boincai12 kernel: [<ffffffff8535d1ab>] queued_spin_lock_slowpath+0xb/0xf
Jun 5 20:01:56 boincai12 kernel: [<ffffffff8536b660>] _raw_spin_lock+0x20/0x30
Jun 5 20:01:56 boincai12 kernel: [<ffffffffc09c0e75>] __cap_is_valid+0x25/0xb0 [ceph]
Jun 5 20:01:56 boincai12 kernel: [<ffffffffc09c2d9c>] __ceph_caps_issued_mask+0x5c/0x200 [ceph]
Jun 5 20:01:56 boincai12 kernel: [<ffffffffc09b198d>] __ceph_do_getattr+0x10d/0x200 [ceph]
Jun 5 20:01:56 boincai12 kernel: [<ffffffffc09b1af8>] ceph_getattr+0x28/0x100 [ceph]
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84e470d9>] vfs_getattr+0x49/0x80
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84e47205>] vfs_fstatat+0x75/0xc0
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84e475ae>] SYSC_newstat+0x2e/0x60
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84d38e96>] ? __audit_syscall_exit+0x1e6/0x280
Jun 5 20:01:56 boincai12 kernel: [<ffffffff84e47a6e>] SyS_newstat+0xe/0x10
Jun 5 20:01:56 boincai12 kernel: [<ffffffff85375d9b>] system_call_fastpath+0x22/0x27
Jun 5 20:01:56 boincai12 kernel: Code: 65 48 8b 05 b5 21 ca 7a 48 89 c2 b9 48 00 00 00 48 c1 ea 20 0f 30 eb 85 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90
The machines which recovered correctly just showed:
[Wed Jun 5 19:53:35 2019] libceph: mds4 137.138.44.130:6800 socket closed (con state OPEN)
[Wed Jun 5 19:53:36 2019] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[Wed Jun 5 19:53:36 2019] libceph: mds4 137.138.44.130:6800 wrong peer at address
[Wed Jun 5 19:53:37 2019] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[Wed Jun 5 19:53:37 2019] libceph: mds4 137.138.44.130:6800 wrong peer at address
[Wed Jun 5 19:53:38 2019] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[Wed Jun 5 19:53:38 2019] libceph: mds4 137.138.44.130:6800 wrong peer at address
[Wed Jun 5 19:53:44 2019] ceph: mds4 reconnect start
[Wed Jun 5 19:53:44 2019] ceph: mds4 reconnect success
[Wed Jun 5 19:54:50 2019] ceph: mds4 recovery completed
This is the first time we had kernel clients lockup due to an mds crash. We only recently upgraded to v12.2.12; could this be related?
Cheers, Dan
[1] v12.2.12 mds crash:
2019-06-05 19:54:44.765660 7fda87f8e700 0 -- 137.138.44.130:6800/2077487243 >> 188.184.94.182:0/1391033891 conn(0x55b1c1d7d800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=12303575 cs=41 l=0).process missed message? skipped from seq 0 to 192910665
2019-06-05 19:54:44.769521 7fda8580e700 -1 /builddir/build/BUILD/ceph-12.2.12/src/mds/MDSRank.cc: In function 'Session* MDSRank::get_session(Message*)' thread 7fda8580e700 time 2019-06-05 19:54:44.765758
/builddir/build/BUILD/ceph-12.2.12/src/mds/MDSRank.cc: 1304: FAILED assert(session->get_nref() == 1)
 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55b1af348590]
 2: (MDSRank::get_session(Message*)+0x7af) [0x55b1af00760f]
 3: (Server::handle_client_session(MClientSession*)+0x32) [0x55b1af060ba2]
 4: (Server::dispatch(Message*)+0x5cb) [0x55b1af0932bb]
 5: (MDSRank::handle_deferrable_message(Message*)+0x804) [0x55b1aefff044]
 6: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x55b1af016d73]
 7: (MDSRankDispatcher::ms_dispatch(Message*)+0xa0) [0x55b1af017d40]
 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55b1aeff5e63]
 9: (DispatchQueue::entry()+0x792) [0x55b1af64c9e2]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b1af3d038d]
 11: (()+0x7dd5) [0x7fda8a644dd5]
 12: (clone()+0x6d) [0x7fda89721ead]
Updated by Patrick Donnelly almost 5 years ago
- Assignee set to Zheng Yan
- Start date deleted (06/06/2019)
Updated by Zheng Yan almost 5 years ago
Yes, it's related. An MDS crash while the client is importing caps reveals a kernel bug:
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 0c84b74ee34..d8e92557221 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1020,6 +1020,7 @@ __open_export_target_session(struct ceph_mds_client *mdsc, int target)
 	if (session->s_state == CEPH_MDS_SESSION_NEW ||
 	    session->s_state == CEPH_MDS_SESSION_CLOSING)
 		__open_session(mdsc, session);
+	^^^
 	return session;
 }

@@ -3262,6 +3263,8 @@ static void check_new_map(struct ceph_mds_client *mdsc,
 			 * out now */
 			get_session(s);
 			__unregister_session(mdsc, s);
+			// the session may have caps (which were added by handle_cap_import)
+			// after __unregister_session(), cap->session pointer become wild
 			__wake_requests(mdsc, &s->s_waiting);
 			ceph_put_mds_session(s);
 		} else if (i >= newmap->m_num_mds) {
Updated by Zheng Yan almost 5 years ago
- Status changed from New to 7
Fixed by commit "ceph: don't blindly unregister session that is in opening state" in https://github.com/ceph/ceph-client/tree/testing
Updated by Patrick Donnelly over 4 years ago
- Status changed from 7 to Fix Under Review
Updated by Ilya Dryomov over 3 years ago
- Category set to fs/ceph
- Status changed from Fix Under Review to Resolved
Actions