Project

General

Bug #11523

closed

soft lockup in kick_requests on 3.16.3-ceph-00306-g76c7fd1

Added by Sage Weil almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

May  1 09:12:54 teuthology kernel: [3967183.215527] libceph: osd113 up
May  1 09:13:15 teuthology kernel: [3967204.537993] libceph: osd113 10.214.132.114:6820 socket closed (con state OPEN)
May  1 09:13:26 teuthology kernel: [3967215.624290] libceph: osd113 down
May  1 09:13:30 teuthology kernel: [3967219.883936] libceph: osd113 up
May  1 09:15:01 teuthology kernel: [3967310.661801] libceph: osd113 10.214.132.114:6803 socket closed (con state OPEN)
May  1 09:15:11 teuthology kernel: [3967320.407148] libceph: osd113 down
May  1 09:15:15 teuthology kernel: [3967323.926475] libceph: osd113 up
May  1 09:15:50 teuthology kernel: [3967358.980888] libceph: osd113 down
May  1 09:15:50 teuthology kernel: [3967358.986719] libceph: osd113 up
May  1 09:16:25 teuthology kernel: [3967394.016842] libceph: osd113 down
May  1 09:16:25 teuthology kernel: [3967394.021316] libceph: osd113 up
May  1 09:17:05 teuthology kernel: [3967434.066645] libceph: wrong peer, want 10.214.132.114:6801/12973, got 10.214.132.114:6801/14384
May  1 09:17:05 teuthology kernel: [3967434.074065] libceph: osd113 down
May  1 09:17:05 teuthology kernel: [3967434.079141] libceph: osd113 10.214.132.114:6801 socket error on read
May  1 09:17:05 teuthology kernel: [3967434.090187] libceph: osd113 up
May  1 09:17:30 teuthology kernel: [3967458.962655] libceph: osd113 10.214.132.114:6801 socket closed (con state OPEN)
May  1 09:17:40 teuthology kernel: [3967468.938897] libceph: osd113 down
May  1 09:18:05 teuthology kernel: [3967493.392616] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:4894]
May  1 09:18:05 teuthology kernel: [3967493.399420] Modules linked in: ipmi_devintf(E) ipmi_si(E) ipmi_msghandler(E) ip6table_filter(E) ip6_tables(E) ebtable_nat(E) ebtables(E) ipt_MASQUERADE(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_state(E) nf_conntrack(E) ipt_REJECT(E) xt_CHECKSUM(E) iptable_mangle(E) xt_tcpudp(E) iptable_filter(E) ip_tables(E) x_tables(E) bridge(E) stp(E) llc(E) ceph(E) libceph(E) fscache(E) coretemp(E) kvm_intel(E) kvm(E) gpio_ich(E) i7core_edac(E) microcode(E) joydev(E) psmouse(E) serio_raw(E) edac_core(E) shpchp(E) tpm_infineon(E) lpc_ich(E) tpm_tis(E) xfs(E) lp(E) parport(E) hid_generic(E) btrfs(E) usbhid(E) hid(E) e1000e(E) raid6_pq(E) ahci(E) ptp(E) libahci(E) pps_core(E) arcmsr(E) xor(E) libcrc32c(E)
May  1 09:18:05 teuthology kernel: [3967493.466624] irq event stamp: 0
May  1 09:18:05 teuthology kernel: [3967493.469911] hardirqs last  enabled at (0): [<          (null)>]           (null)
May  1 09:18:05 teuthology kernel: [3967493.477765] hardirqs last disabled at (0): [<ffffffff8105626d>] copy_process+0x78d/0x22f0
May  1 09:18:05 teuthology kernel: [3967493.486306] softirqs last  enabled at (0): [<ffffffff8105626d>] copy_process+0x78d/0x22f0
May  1 09:18:05 teuthology kernel: [3967493.494883] softirqs last disabled at (0): [<          (null)>]           (null)
May  1 09:18:05 teuthology kernel: [3967493.502583] CPU: 0 PID: 4894 Comm: kworker/0:1 Tainted: G            E 3.16.3-ceph-00306-g76c7fd1 #1
May  1 09:18:05 teuthology kernel: [3967493.512035] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.1 05/27/2010
May  1 09:18:05 teuthology kernel: [3967493.518811] Workqueue: ceph-msgr con_work [libceph]
May  1 09:18:05 teuthology kernel: [3967493.524019] task: ffff88017e5f43e0 ti: ffff880161f18000 task.ti: ffff880161f18000
May  1 09:18:05 teuthology kernel: [3967493.531834] RIP: 0010:[<ffffffff813a4227>]  [<ffffffff813a4227>] rb_next+0x27/0x50
May  1 09:18:05 teuthology kernel: [3967493.539753] RSP: 0000:ffff880161f1bac8  EFLAGS: 00000286
May  1 09:18:05 teuthology kernel: [3967493.545511] RAX: ffff880131d96018 RBX: ffffffff8172aa60 RCX: 0000000000000087
May  1 09:18:05 teuthology kernel: [3967493.553056] RDX: ffff880131d96018 RSI: ffff880036a87960 RDI: ffff88013617e018
May  1 09:18:05 teuthology kernel: [3967493.560461] RBP: ffff880161f1bac8 R08: 0000000000000000 R09: 0000000000000000
May  1 09:18:05 teuthology kernel: [3967493.567937] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880161f1ba38
May  1 09:18:05 teuthology kernel: [3967493.575337] R13: ffff88043fc13b40 R14: ffff880161f18000 R15: 0000000000000001
May  1 09:18:05 teuthology kernel: [3967493.582776] FS:  0000000000000000(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
May  1 09:18:05 teuthology kernel: [3967493.591267] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May  1 09:18:05 teuthology kernel: [3967493.597313] CR2: 00007fe143499000 CR3: 000000018e3a2000 CR4: 00000000000007f0
May  1 09:18:05 teuthology kernel: [3967493.604770] Stack:
May  1 09:18:05 teuthology kernel: [3967493.606988]  ffff880161f1bb28 ffffffffa040a8cb 28ff880427bfa968 aefd4f58557e42f7
May  1 09:18:05 teuthology kernel: [3967493.614735]  ffff880427bfaa50 000043d200000000 ffff880161f1bb28 ffff8802d2022ddb
May  1 09:18:05 teuthology kernel: [3967493.622557]  ffff880427bfa968 ffff8802d2022dcf 0000000000000001 ffff8804250ae800
May  1 09:18:05 teuthology kernel: [3967493.630336] Call Trace:
May  1 09:18:05 teuthology kernel: [3967493.633084]  [<ffffffffa040a8cb>] kick_requests+0x1eb/0x480 [libceph]
May  1 09:18:05 teuthology kernel: [3967493.639834]  [<ffffffffa040b68a>] ceph_osdc_handle_map+0x26a/0x600 [libceph]
May  1 09:18:05 teuthology kernel: [3967493.647376]  [<ffffffffa04067d0>] dispatch+0x110/0x8f0 [libceph]
May  1 09:18:05 teuthology kernel: [3967493.653784]  [<ffffffff810b087d>] ? trace_hardirqs_on+0xd/0x10
May  1 09:18:05 teuthology kernel: [3967493.659859]  [<ffffffffa04024e5>] con_work+0x17d5/0x2c60 [libceph]
May  1 09:18:05 teuthology kernel: [3967493.666335]  [<ffffffff810879cf>] ? finish_task_switch+0x3f/0x120
May  1 09:18:05 teuthology kernel: [3967493.672673]  [<ffffffff810879cf>] ? finish_task_switch+0x3f/0x120
May  1 09:18:05 teuthology kernel: [3967493.679145]  [<ffffffff8107622f>] ? process_one_work+0x16f/0x570
May  1 09:18:05 teuthology kernel: [3967493.685436]  [<ffffffff81076291>] process_one_work+0x1d1/0x570
May  1 09:18:05 teuthology kernel: [3967493.691628]  [<ffffffff8107622f>] ? process_one_work+0x16f/0x570
May  1 09:18:05 teuthology kernel: [3967493.697903]  [<ffffffff810770cc>] worker_thread+0x11c/0x530
May  1 09:18:05 teuthology kernel: [3967493.703754]  [<ffffffff81076fb0>] ? create_and_start_worker+0x50/0x50
May  1 09:18:05 teuthology kernel: [3967493.710552]  [<ffffffff8107e4b4>] kthread+0xe4/0x100
May  1 09:18:05 teuthology kernel: [3967493.715778]  [<ffffffff8107e3d0>] ? flush_kthread_worker+0x130/0x130
May  1 09:18:05 teuthology kernel: [3967493.722388]  [<ffffffff81729c2c>] ret_from_fork+0x7c/0xb0
May  1 09:18:05 teuthology kernel: [3967493.727999]  [<ffffffff8107e3d0>] ? flush_kthread_worker+0x130/0x130

Linux teuthology 3.16.3-ceph-00306-g76c7fd1 #1 SMP Fri Sep 19 04:59:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
#1

Updated by Ilya Dryomov almost 9 years ago

This was fixed in 4.0-rc1 by commit 7eb71e0351fb ("libceph: fix double __remove_osd() problem"), which is marked for backport as far back as 3.9.
If we are serious about the dogfooding effort, we should be running the latest kernels, e.g. 4.0 as of now.

#2

Updated by Sage Weil almost 9 years ago

  • Status changed from New to Resolved

fixed
