Bug #40190

oops or soft lockup after MDS crash

Added by Dan van der Ster 5 months ago. Updated 5 months ago.

Status:
Testing
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

Yesterday we had a v12.2.12 MDS crash [1], and several kernel clients (el7 kernel 3.10.0-957.12.1.el7.x86_64) either deadlocked or oopsed.

Here is an oops:

[196767.597903] libceph: mds4 137.138.44.130:6800 socket closed (con state NEGOTIATING)
[196768.439310] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[196768.445991] libceph: mds4 137.138.44.130:6800 wrong peer at address
[196769.439371] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[196769.446550] libceph: mds4 137.138.44.130:6800 wrong peer at address
[196771.442082] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[196771.447412] libceph: mds4 137.138.44.130:6800 wrong peer at address
[196827.479453] BUG: unable to handle kernel NULL pointer dereference at           (null)
[196827.485410] IP: [<ffffffffa03950cb>] __list_add+0x1b/0xc0
[196827.486455] PGD 0 
[196827.486909] Oops: 0000 [#1] SMP 
[196827.487634] Modules linked in: tcp_diag inet_diag fuse ceph libceph dns_resolver nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_pkttype ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack nf_conntrack iptable_filter ppdev iosf_mbi kvm_intel kvm irqbypass cirrus ttm parport_pc parport drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops joydev drm virtio_balloon i2c_piix4 drm_panel_orientation_quirks pcspkr binfmt_misc openafs(POE) auth_rpcgss ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd floppy ata_piix serio_raw libata virtio_pci virtio_ring virtio sunrpc dm_mirror dm_region_hash dm_log dm_mod
[196827.514284] CPU: 0 PID: 180996 Comm: kworker/0:3 Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.12.2.el7.x86_64 #1
[196827.519880] Hardware name: RDO OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014
[196827.522908] Workqueue: events delayed_work [ceph]
[196827.525493] task: ffff9e07bb2a4100 ti: ffff9e065a720000 task.ti: ffff9e065a720000
[196827.528538] RIP: 0010:[<ffffffffa03950cb>]  [<ffffffffa03950cb>] __list_add+0x1b/0xc0
[196827.532676] RSP: 0018:ffff9e065a723be8  EFLAGS: 00010246
[196827.536233] RAX: 00000000ffffffff RBX: ffff9e065a723c10 RCX: 0000000000000000
[196827.539217] RDX: ffff9e06613c6028 RSI: 0000000000000000 RDI: ffff9e065a723c10
[196827.544014] RBP: ffff9e065a723c00 R08: 0000000000000000 R09: 0000000000000000
[196827.546969] R10: ffff9e07c2740258 R11: 0000000000000000 R12: ffff9e06613c6028
[196827.549781] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff9e06613c6028
[196827.552435] FS:  0000000000000000(0000) GS:ffff9e07d4800000(0000) knlGS:0000000000000000
[196827.555280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[196827.557758] CR2: 0000000000000000 CR3: 0000000194a6a000 CR4: 00000000003606f0
[196827.560485] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[196827.563300] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[196827.566125] Call Trace:
[196827.568168]  [<ffffffffa0767c96>] __mutex_lock_slowpath+0xa6/0x1d0
[196827.570839]  [<ffffffffa076709f>] mutex_lock+0x1f/0x2f
[196827.573253]  [<ffffffffc0794f60>] ceph_check_caps+0x440/0xc10 [ceph]
[196827.575872]  [<ffffffffa00aaf78>] ? add_timer+0x18/0x20
[196827.578281]  [<ffffffffc0799344>] ceph_check_delayed_caps+0xc4/0x110 [ceph]
[196827.585644]  [<ffffffffc07a1945>] delayed_work+0x35/0x260 [ceph]
[196827.588274]  [<ffffffffa00b9e3f>] process_one_work+0x17f/0x440
[196827.590891]  [<ffffffffa00baed6>] worker_thread+0x126/0x3c0
[196827.593600]  [<ffffffffa00badb0>] ? manage_workers.isra.25+0x2a0/0x2a0
[196827.596366]  [<ffffffffa00c1d21>] kthread+0xd1/0xe0
[196827.598842]  [<ffffffffa00c1c50>] ? insert_kthread_work+0x40/0x40
[196827.601518]  [<ffffffffa0775c1d>] ret_from_fork_nospec_begin+0x7/0x21
[196827.604240]  [<ffffffffa00c1c50>] ? insert_kthread_work+0x40/0x40
[196827.607142] Code: ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49 39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39 49 89 
[196827.615446] RIP  [<ffffffffa03950cb>] __list_add+0x1b/0xc0
[196827.618034]  RSP <ffff9e065a723be8>
[196827.620176] CR2: 0000000000000000

And here is an example soft lockup (machines in this state had to be hard rebooted):

Jun  5 19:54:48 boincai12 kernel: libceph: mds4 137.138.44.130:6800 socket closed (con state CONNECTING)
Jun  5 19:54:48 boincai12 kernel: libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
Jun  5 19:54:48 boincai12 kernel: libceph: mds4 137.138.44.130:6800 wrong peer at address
Jun  5 19:54:49 boincai12 kernel: libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
Jun  5 19:54:49 boincai12 kernel: libceph: mds4 137.138.44.130:6800 wrong peer at address
Jun  5 19:54:51 boincai12 kernel: libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
Jun  5 19:54:51 boincai12 kernel: libceph: mds4 137.138.44.130:6800 wrong peer at address
...
Jun  5 20:01:56 boincai12 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [reader#4:1954587]
Jun  5 20:01:56 boincai12 kernel: Modules linked in: tcp_diag inet_diag ceph libceph dns_resolver nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_pkttype ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_set xt_multiport xt_conntrack nf_conntrack ip_set_hash_ip ip_set iptable_filter nfnetlink ppdev iosf_mbi kvm_intel kvm irqbypass cirrus ttm joydev drm_kms_helper parport_pc parport syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr drm virtio_balloon drm_panel_orientation_quirks i2c_piix4 binfmt_misc openafs(POE) auth_rpcgss ip_tables xfs libcrc32c virtio_blk virtio_net ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper
Jun  5 20:01:56 boincai12 kernel: ablk_helper cryptd floppy serio_raw ata_piix libata virtio_pci virtio_ring virtio sunrpc dm_mirror dm_region_hash dm_log dm_mod
Jun  5 20:01:56 boincai12 kernel: CPU: 0 PID: 1954587 Comm: reader#4 Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.12.1.el7.x86_64 #1
Jun  5 20:01:56 boincai12 kernel: Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Jun  5 20:01:56 boincai12 kernel: task: ffff9df577a61040 ti: ffff9df697b94000 task.ti: ffff9df697b94000
Jun  5 20:01:56 boincai12 kernel: RIP: 0010:[<ffffffff8536ae86>]  [<ffffffff8536ae86>] native_safe_halt+0x6/0x10
Jun  5 20:01:56 boincai12 kernel: RSP: 0018:ffff9df697b97cd8  EFLAGS: 00000202
Jun  5 20:01:56 boincai12 kernel: RAX: 0000000000000001 RBX: 00000000e9a64a44 RCX: 0000000000000001
Jun  5 20:01:56 boincai12 kernel: RDX: 00000000000000a0 RSI: 0000000000000001 RDI: ffff9df6d481b7c4
Jun  5 20:01:56 boincai12 kernel: RBP: ffff9df697b97cd8 R08: 0000000000000000 R09: 0000000000000000
Jun  5 20:01:56 boincai12 kernel: R10: 0000000000000020 R11: 0000000000000000 R12: ffff9df6000001bd
Jun  5 20:01:56 boincai12 kernel: R13: ffff9df6c6957bc0 R14: 0cd89df600000d80 R15: ffff9df4f4174710
Jun  5 20:01:56 boincai12 kernel: FS:  00007fac947f8700(0000) GS:ffff9df6d4800000(0000) knlGS:0000000000000000
Jun  5 20:01:56 boincai12 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun  5 20:01:56 boincai12 kernel: CR2: 00005646b04b4000 CR3: 0000000034458000 CR4: 00000000001606f0
Jun  5 20:01:56 boincai12 kernel: Call Trace:
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84c698f5>] kvm_wait+0x65/0x70
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84d12066>] __pv_queued_spin_lock_slowpath+0x216/0x2e0
Jun  5 20:01:56 boincai12 kernel: [<ffffffff8535d1ab>] queued_spin_lock_slowpath+0xb/0xf
Jun  5 20:01:56 boincai12 kernel: [<ffffffff8536b660>] _raw_spin_lock+0x20/0x30
Jun  5 20:01:56 boincai12 kernel: [<ffffffffc09c0e75>] __cap_is_valid+0x25/0xb0 [ceph]
Jun  5 20:01:56 boincai12 kernel: [<ffffffffc09c2d9c>] __ceph_caps_issued_mask+0x5c/0x200 [ceph]
Jun  5 20:01:56 boincai12 kernel: [<ffffffffc09b198d>] __ceph_do_getattr+0x10d/0x200 [ceph]
Jun  5 20:01:56 boincai12 kernel: [<ffffffffc09b1af8>] ceph_getattr+0x28/0x100 [ceph]
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84e470d9>] vfs_getattr+0x49/0x80
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84e47205>] vfs_fstatat+0x75/0xc0
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84e475ae>] SYSC_newstat+0x2e/0x60
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84d38e96>] ? __audit_syscall_exit+0x1e6/0x280
Jun  5 20:01:56 boincai12 kernel: [<ffffffff84e47a6e>] SyS_newstat+0xe/0x10
Jun  5 20:01:56 boincai12 kernel: [<ffffffff85375d9b>] system_call_fastpath+0x22/0x27
Jun  5 20:01:56 boincai12 kernel: Code: 65 48 8b 05 b5 21 ca 7a 48 89 c2 b9 48 00 00 00 48 c1 ea 20 0f 30 eb 85 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 fb f4 <5d> c3 0f 1f 84 00 00 00 00 00 55 48 89 e5 f4 5d c3 90 90 90 90 

The machines that recovered correctly just showed:

[Wed Jun  5 19:53:35 2019] libceph: mds4 137.138.44.130:6800 socket closed (con state OPEN)
[Wed Jun  5 19:53:36 2019] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[Wed Jun  5 19:53:36 2019] libceph: mds4 137.138.44.130:6800 wrong peer at address
[Wed Jun  5 19:53:37 2019] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[Wed Jun  5 19:53:37 2019] libceph: mds4 137.138.44.130:6800 wrong peer at address
[Wed Jun  5 19:53:38 2019] libceph: wrong peer, want 137.138.44.130:6800/2077487243, got 137.138.44.130:6800/-2088078627
[Wed Jun  5 19:53:38 2019] libceph: mds4 137.138.44.130:6800 wrong peer at address
[Wed Jun  5 19:53:44 2019] ceph: mds4 reconnect start
[Wed Jun  5 19:53:44 2019] ceph: mds4 reconnect success
[Wed Jun  5 19:54:50 2019] ceph: mds4 recovery completed

This is the first time we've had kernel clients lock up due to an MDS crash. We only recently upgraded to v12.2.12; could this be related?

Cheers, Dan

[1] v12.2.12 mds crash:

2019-06-05 19:54:44.765660 7fda87f8e700  0 -- 137.138.44.130:6800/2077487243 >> 188.184.94.182:0/1391033891 conn(0x55b1c1d7d800 :6800 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=12303575 cs=41 l=0).process missed message?  skipped from seq 0 to 192910665
2019-06-05 19:54:44.769521 7fda8580e700 -1 /builddir/build/BUILD/ceph-12.2.12/src/mds/MDSRank.cc: In function 'Session* MDSRank::get_session(Message*)' thread 7fda8580e700 time 2019-06-05 19:54:44.765758
/builddir/build/BUILD/ceph-12.2.12/src/mds/MDSRank.cc: 1304: FAILED assert(session->get_nref() == 1)

 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55b1af348590]
 2: (MDSRank::get_session(Message*)+0x7af) [0x55b1af00760f]
 3: (Server::handle_client_session(MClientSession*)+0x32) [0x55b1af060ba2]
 4: (Server::dispatch(Message*)+0x5cb) [0x55b1af0932bb]
 5: (MDSRank::handle_deferrable_message(Message*)+0x804) [0x55b1aefff044]
 6: (MDSRank::_dispatch(Message*, bool)+0x1e3) [0x55b1af016d73]
 7: (MDSRankDispatcher::ms_dispatch(Message*)+0xa0) [0x55b1af017d40]
 8: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55b1aeff5e63]
 9: (DispatchQueue::entry()+0x792) [0x55b1af64c9e2]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b1af3d038d]
 11: (()+0x7dd5) [0x7fda8a644dd5]
 12: (clone()+0x6d) [0x7fda89721ead]

History

#2 Updated by Zheng Yan 5 months ago

Looks like cap->session is a wild pointer.

#3 Updated by Zheng Yan 5 months ago

Yes, it's related. An MDS crash while importing caps reveals a kernel bug:

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 0c84b74ee34..d8e92557221 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1020,6 +1020,7 @@ __open_export_target_session(struct ceph_mds_client *mdsc, int target)
        if (session->s_state == CEPH_MDS_SESSION_NEW ||
            session->s_state == CEPH_MDS_SESSION_CLOSING)
                __open_session(mdsc, session);
+               ^^^

        return session;
 }
@@ -3262,6 +3263,8 @@ static void check_new_map(struct ceph_mds_client *mdsc,
                                 * out now */
                                get_session(s);
                                __unregister_session(mdsc, s);
+                               // the session may have caps (added by handle_cap_import);
+                               // after __unregister_session(), cap->session becomes a wild pointer
                                __wake_requests(mdsc, &s->s_waiting);
                                ceph_put_mds_session(s);
                        } else if (i >= newmap->m_num_mds) {

#4 Updated by Zheng Yan 5 months ago

  • Status changed from New to Testing

Fixed by commit "ceph: don't blindly unregister session that is in opening state" in https://github.com/ceph/ceph-client/tree/testing
