Bug #11523
soft lockup in kick_requests on 3.16.3-ceph-00306-g76c7fd1
Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Severity: 3 - minor
Description
May 1 09:12:54 teuthology kernel: [3967183.215527] libceph: osd113 up
May 1 09:13:15 teuthology kernel: [3967204.537993] libceph: osd113 10.214.132.114:6820 socket closed (con state OPEN)
May 1 09:13:26 teuthology kernel: [3967215.624290] libceph: osd113 down
May 1 09:13:30 teuthology kernel: [3967219.883936] libceph: osd113 up
May 1 09:15:01 teuthology kernel: [3967310.661801] libceph: osd113 10.214.132.114:6803 socket closed (con state OPEN)
May 1 09:15:11 teuthology kernel: [3967320.407148] libceph: osd113 down
May 1 09:15:15 teuthology kernel: [3967323.926475] libceph: osd113 up
May 1 09:15:50 teuthology kernel: [3967358.980888] libceph: osd113 down
May 1 09:15:50 teuthology kernel: [3967358.986719] libceph: osd113 up
May 1 09:16:25 teuthology kernel: [3967394.016842] libceph: osd113 down
May 1 09:16:25 teuthology kernel: [3967394.021316] libceph: osd113 up
May 1 09:17:05 teuthology kernel: [3967434.066645] libceph: wrong peer, want 10.214.132.114:6801/12973, got 10.214.132.114:6801/14384
May 1 09:17:05 teuthology kernel: [3967434.074065] libceph: osd113 down
May 1 09:17:05 teuthology kernel: [3967434.079141] libceph: osd113 10.214.132.114:6801 socket error on read
May 1 09:17:05 teuthology kernel: [3967434.090187] libceph: osd113 up
May 1 09:17:30 teuthology kernel: [3967458.962655] libceph: osd113 10.214.132.114:6801 socket closed (con state OPEN)
May 1 09:17:40 teuthology kernel: [3967468.938897] libceph: osd113 down
May 1 09:18:05 teuthology kernel: [3967493.392616] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:4894]
May 1 09:18:05 teuthology kernel: [3967493.399420] Modules linked in: ipmi_devintf(E) ipmi_si(E) ipmi_msghandler(E) ip6table_filter(E) ip6_tables(E) ebtable_nat(E) ebtables(E) ipt_MASQUERADE(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_state(E) nf_conntrack(E) ipt_REJECT(E) xt_CHECKSUM(E) iptable_mangle(E) xt_tcpudp(E) iptable_filter(E) ip_tables(E) x_tables(E) bridge(E) stp(E) llc(E) ceph(E) libceph(E) fscache(E) coretemp(E) kvm_intel(E) kvm(E) gpio_ich(E) i7core_edac(E) microcode(E) joydev(E) psmouse(E) serio_raw(E) edac_core(E) shpchp(E) tpm_infineon(E) lpc_ich(E) tpm_tis(E) xfs(E) lp(E) parport(E) hid_generic(E) btrfs(E) usbhid(E) hid(E) e1000e(E) raid6_pq(E) ahci(E) ptp(E) libahci(E) pps_core(E) arcmsr(E) xor(E) libcrc32c(E)
May 1 09:18:05 teuthology kernel: [3967493.466624] irq event stamp: 0
May 1 09:18:05 teuthology kernel: [3967493.469911] hardirqs last enabled at (0): [< (null)>] (null)
May 1 09:18:05 teuthology kernel: [3967493.477765] hardirqs last disabled at (0): [<ffffffff8105626d>] copy_process+0x78d/0x22f0
May 1 09:18:05 teuthology kernel: [3967493.486306] softirqs last enabled at (0): [<ffffffff8105626d>] copy_process+0x78d/0x22f0
May 1 09:18:05 teuthology kernel: [3967493.494883] softirqs last disabled at (0): [< (null)>] (null)
May 1 09:18:05 teuthology kernel: [3967493.502583] CPU: 0 PID: 4894 Comm: kworker/0:1 Tainted: G E 3.16.3-ceph-00306-g76c7fd1 #1
May 1 09:18:05 teuthology kernel: [3967493.512035] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.1 05/27/2010
May 1 09:18:05 teuthology kernel: [3967493.518811] Workqueue: ceph-msgr con_work [libceph]
May 1 09:18:05 teuthology kernel: [3967493.524019] task: ffff88017e5f43e0 ti: ffff880161f18000 task.ti: ffff880161f18000
May 1 09:18:05 teuthology kernel: [3967493.531834] RIP: 0010:[<ffffffff813a4227>] [<ffffffff813a4227>] rb_next+0x27/0x50
May 1 09:18:05 teuthology kernel: [3967493.539753] RSP: 0000:ffff880161f1bac8 EFLAGS: 00000286
May 1 09:18:05 teuthology kernel: [3967493.545511] RAX: ffff880131d96018 RBX: ffffffff8172aa60 RCX: 0000000000000087
May 1 09:18:05 teuthology kernel: [3967493.553056] RDX: ffff880131d96018 RSI: ffff880036a87960 RDI: ffff88013617e018
May 1 09:18:05 teuthology kernel: [3967493.560461] RBP: ffff880161f1bac8 R08: 0000000000000000 R09: 0000000000000000
May 1 09:18:05 teuthology kernel: [3967493.567937] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880161f1ba38
May 1 09:18:05 teuthology kernel: [3967493.575337] R13: ffff88043fc13b40 R14: ffff880161f18000 R15: 0000000000000001
May 1 09:18:05 teuthology kernel: [3967493.582776] FS: 0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
May 1 09:18:05 teuthology kernel: [3967493.591267] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 1 09:18:05 teuthology kernel: [3967493.597313] CR2: 00007fe143499000 CR3: 000000018e3a2000 CR4: 00000000000007f0
May 1 09:18:05 teuthology kernel: [3967493.604770] Stack:
May 1 09:18:05 teuthology kernel: [3967493.606988] ffff880161f1bb28 ffffffffa040a8cb 28ff880427bfa968 aefd4f58557e42f7
May 1 09:18:05 teuthology kernel: [3967493.614735] ffff880427bfaa50 000043d200000000 ffff880161f1bb28 ffff8802d2022ddb
May 1 09:18:05 teuthology kernel: [3967493.622557] ffff880427bfa968 ffff8802d2022dcf 0000000000000001 ffff8804250ae800
May 1 09:18:05 teuthology kernel: [3967493.630336] Call Trace:
May 1 09:18:05 teuthology kernel: [3967493.633084] [<ffffffffa040a8cb>] kick_requests+0x1eb/0x480 [libceph]
May 1 09:18:05 teuthology kernel: [3967493.639834] [<ffffffffa040b68a>] ceph_osdc_handle_map+0x26a/0x600 [libceph]
May 1 09:18:05 teuthology kernel: [3967493.647376] [<ffffffffa04067d0>] dispatch+0x110/0x8f0 [libceph]
May 1 09:18:05 teuthology kernel: [3967493.653784] [<ffffffff810b087d>] ? trace_hardirqs_on+0xd/0x10
May 1 09:18:05 teuthology kernel: [3967493.659859] [<ffffffffa04024e5>] con_work+0x17d5/0x2c60 [libceph]
May 1 09:18:05 teuthology kernel: [3967493.666335] [<ffffffff810879cf>] ? finish_task_switch+0x3f/0x120
May 1 09:18:05 teuthology kernel: [3967493.672673] [<ffffffff810879cf>] ? finish_task_switch+0x3f/0x120
May 1 09:18:05 teuthology kernel: [3967493.679145] [<ffffffff8107622f>] ? process_one_work+0x16f/0x570
May 1 09:18:05 teuthology kernel: [3967493.685436] [<ffffffff81076291>] process_one_work+0x1d1/0x570
May 1 09:18:05 teuthology kernel: [3967493.691628] [<ffffffff8107622f>] ? process_one_work+0x16f/0x570
May 1 09:18:05 teuthology kernel: [3967493.697903] [<ffffffff810770cc>] worker_thread+0x11c/0x530
May 1 09:18:05 teuthology kernel: [3967493.703754] [<ffffffff81076fb0>] ? create_and_start_worker+0x50/0x50
May 1 09:18:05 teuthology kernel: [3967493.710552] [<ffffffff8107e4b4>] kthread+0xe4/0x100
May 1 09:18:05 teuthology kernel: [3967493.715778] [<ffffffff8107e3d0>] ? flush_kthread_worker+0x130/0x130
May 1 09:18:05 teuthology kernel: [3967493.722388] [<ffffffff81729c2c>] ret_from_fork+0x7c/0xb0
May 1 09:18:05 teuthology kernel: [3967493.727999] [<ffffffff8107e3d0>] ? flush_kthread_worker+0x130/0x130

Linux teuthology 3.16.3-ceph-00306-g76c7fd1 #1 SMP Fri Sep 19 04:59:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Updated by Ilya Dryomov about 9 years ago
This was fixed in 4.0-rc1 by commit 7eb71e0351fb "libceph: fix double __remove_osd() problem", which is set to be backported all the way down to 3.9.
If we are serious about the dogfooding effort, we should be running the latest kernels, e.g. 4.0 as of now.
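For readers unfamiliar with this class of bug, the "double remove" problem named in the fix's commit title can be illustrated with a structure much simpler than the kernel's red-black tree. The following is only a userspace sketch in Python, not the code from commit 7eb71e0351fb: the `Node`, `Ring`, and `demo` names and the `linked` flag are hypothetical. It assumes the general remedy is to make removal idempotent by recording whether a node is still part of the structure.

```python
class Node:
    """List entry; `linked` records whether the node is currently in a list."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self
        self.linked = False


class Ring:
    """Circular doubly linked list with a sentinel head (hypothetical toy)."""

    def __init__(self):
        self.head = Node("head")

    def insert_after(self, pos, node):
        node.prev, node.next = pos, pos.next
        pos.next.prev = node
        pos.next = node
        node.linked = True

    def unsafe_remove(self, node):
        # Blindly trusts node.prev/node.next, even if they are stale
        # because the node was already removed earlier.
        node.prev.next = node.next
        node.next.prev = node.prev

    def safe_remove(self, node):
        # Idempotent removal: a second call on the same node is a no-op.
        if not node.linked:
            return False
        self.unsafe_remove(node)
        node.prev = node.next = node
        node.linked = False
        return True

    def names(self):
        out, cur = [], self.head.next
        while cur is not self.head:
            out.append(cur.name)
            cur = cur.next
        return out


def demo(remove_name):
    """Remove node b twice, with an unrelated insert in between."""
    ring = Ring()
    a, b, c, d = (Node(n) for n in "abcd")
    ring.insert_after(ring.head, a)
    ring.insert_after(a, b)
    ring.insert_after(b, c)
    remove = getattr(ring, remove_name)
    remove(b)                # first removal leaves a, c
    ring.insert_after(a, d)  # list is now a, d, c
    remove(b)                # second removal of the same node
    return ring.names()


if __name__ == "__main__":
    print(demo("unsafe_remove"))  # d is silently dropped from the list
    print(demo("safe_remove"))    # list stays intact
```

With the unguarded variant, the second removal rewires the neighbours through stale pointers and silently drops `d`. In a balanced tree the damage from the same mistake can take other forms, and the trace above shows the worker stuck in rb_next() under kick_requests().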