Bug #2708
spinlock lockup in queue_con, queue_work
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
[40494.962428] m~B [send break]
[40505.554625] mptscsih: ioc0: attempting task abort! (sc=ffff88020c00b400)
[40505.598241] sd 0:0:0:0: [sda] CDB:
[40505.638379] Test Unit Ready: 00 00 00 00 00 00
[40508.696718] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[40508.777030] mptbase: ioc0: Initiating recovery
[40517.703845] BUG: spinlock lockup on CPU#0, kworker/0:0/17445
[40517.745786] lock: 0xffff88022720e480, .magic: dead4ead, .owner: kworker/0:0/17445, .owner_cpu: 0
[40517.826674] Pid: 17445, comm: kworker/0:0 Tainted: G D 3.5.0-rc1-ceph-00030-g9a64e8e #1
[40517.907466] Call Trace:
[40517.945236] <IRQ> [<ffffffff81330f98>] spin_dump+0x78/0xc0
[40517.986573] [<ffffffff813311bd>] do_raw_spin_lock+0xed/0x120
[40518.027404] [<ffffffff8162d586>] _raw_spin_lock_irqsave+0x56/0x70
[40518.068299] [<ffffffff8106e78d>] ? __queue_work+0x5d/0x430
[40518.107988] [<ffffffff810622bc>] ? run_timer_softirq+0x21c/0x400
[40518.148330] [<ffffffff8106e78d>] __queue_work+0x5d/0x430
[40518.187398] [<ffffffff810621ca>] ? run_timer_softirq+0x12a/0x400
[40518.226457] [<ffffffff8106ebc5>] queue_work_on+0x25/0x40
[40518.264080] [<ffffffff8106ed4f>] queue_work+0x1f/0x30
[40518.300565] [<ffffffff8106ed78>] schedule_work+0x18/0x20
[40518.337280] [<ffffffff8136e281>] cursor_timer_handler+0x21/0x40
[40518.374139] [<ffffffff81062269>] run_timer_softirq+0x1c9/0x400
[40518.410212] [<ffffffff810621ca>] ? run_timer_softirq+0x12a/0x400
[40518.445941] [<ffffffff8136e260>] ? store_cursor_blink+0xc0/0xc0
[40518.481035] [<ffffffff81059ddf>] __do_softirq+0xcf/0x220
[40518.515487] [<ffffffff8107b648>] ? hrtimer_interrupt+0x158/0x250
[40518.550255] [<ffffffff8163762c>] call_softirq+0x1c/0x30
[40518.583461] [<ffffffff8101633d>] do_softirq+0x9d/0xd0
[40518.615763] [<ffffffff81059b55>] irq_exit+0xd5/0xf0
[40518.647253] [<ffffffff81637fae>] smp_apic_timer_interrupt+0x6e/0x99
[40518.680741] [<ffffffff81636c2f>] apic_timer_interrupt+0x6f/0x80
[40518.713164] <EOI> [<ffffffff810beb1d>] ? acct_collect+0xad/0x1b0
[40518.745566] [<ffffffff8162dc14>] ? _raw_spin_unlock_irq+0x34/0x40
[40518.778181] [<ffffffff8162dc10>] ? _raw_spin_unlock_irq+0x30/0x40
[40518.809923] [<ffffffff810bebe3>] acct_collect+0x173/0x1b0
[40518.840339] [<ffffffff810569d4>] do_exit+0x824/0x940
[40518.869445] [<ffffffff8105297e>] ? kmsg_dump+0x11e/0x170
[40518.898312] [<ffffffff810528dd>] ? kmsg_dump+0x7d/0x170
[40518.926176] [<ffffffff8162eee0>] oops_end+0xb0/0xf0
[40518.953728] [<ffffffff810438bd>] no_context+0x11d/0x2d0
[40518.981323] [<ffffffff81043bbd>] __bad_area_nosemaphore+0x14d/0x230
[40519.010151] [<ffffffff8162aa39>] ? __mutex_unlock_slowpath+0xd9/0x180
[40519.039221] [<ffffffff81043cb3>] bad_area_nosemaphore+0x13/0x20
[40519.067740] [<ffffffff81631bae>] do_page_fault+0x34e/0x4b0
[40519.095762] [<ffffffff8162aaee>] ? mutex_unlock+0xe/0x10
[40519.123466] [<ffffffff8132afad>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[40519.152634] [<ffffffff8162e255>] page_fault+0x25/0x30
[40519.180433] [<ffffffff8106f0fb>] ? process_one_work+0x3b/0x530
[40519.209358] [<ffffffff8106f207>] ? process_one_work+0x147/0x530
[40519.238164] [<ffffffffa040d770>] ? ceph_msg_revoke_incoming+0x180/0x180 [libceph]
[40519.291578] [<ffffffff810715b3>] worker_thread+0x173/0x400
[40519.321184] [<ffffffff81071440>] ? manage_workers+0x210/0x210
[40519.351120] [<ffffffff81076a9e>] kthread+0xbe/0xd0
[40519.379784] [<ffffffff81637534>] kernel_thread_helper+0x4/0x10
[40519.409634] [<ffffffff8162dfb0>] ? retint_restore_args+0x13/0x13
[40519.439580] [<ffffffff810769e0>] ? __init_kthread_worker+0x70/0x70
[40519.469801] [<ffffffff81637530>] ? gs_change+0x13/0x13
[40532.547947] mptbase: ioc0: Attempting Retry Config request type 0x1, page 0x2, action 0
[40532.548001] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88020c00b400)
[40532.548008] mptscsih: ioc0: attempting target reset! (sc=ffff88020c00b400)
[40532.548009] sd 0:0:0:0: [sda] CDB:
[40532.548013] Write(10): 2a 00 1e c4 53 f0 00 00 08 00
[40540.150206] BUG: spinlock lockup on CPU#4, kworker/4:0/17
[40540.183238] lock: 0xffff88022720e480, .magic: dead4ead, .owner: kworker/0:0/17445, .owner_cpu: 0
[40540.249673] Pid: 17, comm: kworker/4:0 Tainted: G D 3.5.0-rc1-ceph-00030-g9a64e8e #1
[40540.318328] Call Trace:
[40540.350117] [<ffffffff81330f98>] spin_dump+0x78/0xc0
[40540.385118] [<ffffffff813311bd>] do_raw_spin_lock+0xed/0x120
[40540.420846] [<ffffffff8162d586>] _raw_spin_lock_irqsave+0x56/0x70
[40540.457344] [<ffffffff8106e875>] ? __queue_work+0x145/0x430
[40540.493144] [<ffffffff8106e875>] __queue_work+0x145/0x430
[40540.528362] [<ffffffff8106ebc5>] queue_work_on+0x25/0x40
[40540.563230] [<ffffffff8106ed4f>] queue_work+0x1f/0x30
[40540.597818] [<ffffffff8106ee3d>] queue_delayed_work+0x2d/0x40
[40540.632828] [<ffffffffa040a8b1>] queue_con+0x31/0xc0 [libceph]
[40540.667342] [<ffffffffa040bdc7>] ceph_con_close+0x97/0xd0 [libceph]
[40540.701878] [<ffffffffa040fee2>] __close_session+0x32/0x90 [libceph]
[40540.736712] [<ffffffffa0410de8>] delayed_work+0x88/0xb0 [libceph]
[40540.771373] [<ffffffff8106f276>] process_one_work+0x1b6/0x530
[40540.805712] [<ffffffff8106f207>] ? process_one_work+0x147/0x530
[40540.840332] [<ffffffffa0410d60>] ? ceph_monc_request_next_osdmap+0x90/0x90 [libceph]
[40540.905770] [<ffffffff810715b3>] worker_thread+0x173/0x400
[40540.940658] [<ffffffff81071440>] ? manage_workers+0x210/0x210
[40540.975198] [<ffffffff81076a9e>] kthread+0xbe/0xd0
[40541.007832] [<ffffffff81637534>] kernel_thread_helper+0x4/0x10
[40541.041102] [<ffffffff8162dfb0>] ? retint_restore_args+0x13/0x13
[40541.073901] [<ffffffff810769e0>] ? __init_kthread_worker+0x70/0x70
[40541.106460] [<ffffffff81637530>] ? gs_change+0x13/0x13
[40547.891421] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[40547.952225] mptbase: ioc0: Initiating recovery
[40562.496137] mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x2c000000
[40570.422441] mptbase: ioc0: Attempting Retry Config request type 0x1, page 0x2, action 0
[40570.821746] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88020c00b400)
[40580.834411] mptscsih: ioc0: attempting task abort! (sc=ffff88020c00b400)
[40580.869874] sd 0:0:0:0: [sda] CDB:
[40580.901848] Test Unit Ready: 00 00 00 00 00 00
[40585.396547] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[40585.464754] mptbase: ioc0: Initiating recovery
[40608.506617] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88020c00b400)
[40608.577389] mptscsih: ioc0: attempting host reset! (sc=ffff88020c560b00)

the job was

ubuntu@teuthology:/a/teuthology-2012-07-03_19:00:10-regression-master-testing-gcov/5473$ cat config.yaml
kernel: &id001
  kdb: true
  sha1: 84f5ea9e4cbb9fe39c525c0cf88a584f3080564d
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: f6cdd8522397cac18a2eb485d6e38a1cf6d2872d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana57.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuMOcu2XPQovy/Qzmwyvc9tvGP9JZVJ6cqiJ3RPOSGgAifKLTxe2ramHpD8AKcdthu8VAfouFpZK4CtBWKJowurR+4yZKgEugzvYuZ/nK/np56vreBQmRBWD1vLPtxPsTT3YGu5qx+ixdSwrSxexxc0/7+EW9x1D6knL+OGUNWksoGIRlXxjh9qafbw/1XKeQQF28vxBXHofXUFY8USMUcq5HDuaFfmgKzufH6vk84oqyr/jtGej6b4g6tbGiHPYR+o5tmTQHyxpOxqLZP2RFFqHlQ/QaOmRvSNIoOo+1UbqdcWsLk16/lXIS1mI+BZsZouk1H+fGeMTEUDGktiPW7
  ubuntu@plana58.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDED+iOzDd09Q014u9BeiAiv+4SK+mv7u6GE8Oc59VBcshLVF8txB4BZaXO/OyQJL033i6d6OFvFdTKI8+7bmm1X+NM/7SPVVRpqaXhhnyB3TbMJi9Aa+Ak22fJ446H0Jf91q9di1m/IS8OUoVImwD8pSnTu7rHHy4ZeGkJpv/gpdXlhm0jq1s7d8Z3WpjPMYOoxilFz80gwh1eAM8emZKhqWdT/C+0SCUYLpU0EOdI+vvOEdEUeByg6xbB3Y6mOgJD2a6PREc0aZp/zxukBdPuY3yJzgT34b0WUuE5iu6ndnDDBoUD7fq7KlsFqrSHAICANls/A7PIPwD46DLaG6oN
  ubuntu@plana61.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTCMIScDTmD9NkfsWU7xeyZ+WOXai5izYeliiXDSjJC3bT6r8Fp+rhPfcHCVHiw++VsbvKZtkhjCSnJTVPWCdpRDghzJ3nZUBImWRo3PmHo1etQpCeimaOrIJ2q0ChN5jmSOqy5B+Z4om2vXBtBY6nkdTxDOr2+MH3NrSPkQSFB0zO+VPuwKXsemeUC6urb2IZZpxY3cxNq4fafTF9PROpgOnIA+o3igyU4duKEjnCzTHZjw/PL7Eph/7p6+UQgrUwe7pgVzT+2MM0zcBtBSXNqs3dCGmpvUapOkBlDoIX02EkWRNpkM3vfeFt1EFC17B5vd61Kg40bYUG8qWGR0T
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd.xfstests: null

and it hung at

2012-07-03T20:05:48.938 INFO:teuthology.orchestra.run.out:
2012-07-03T20:06:09.659 INFO:teuthology.orchestra.run.out:001 10s
2012-07-03T20:06:12.238 INFO:teuthology.orchestra.run.out:002 1s
2012-07-03T20:06:15.527 INFO:teuthology.orchestra.run.out:003 0s
2012-07-03T20:06:31.224 INFO:teuthology.orchestra.run.out:004 13s
2012-07-03T20:06:35.969 INFO:teuthology.orchestra.run.out:005 0s
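For triage, the two "BUG: spinlock lockup" reports in the log above can be compared mechanically. The following is a minimal sketch, not part of the original report: the helper name `parse_lockups` and the regular expressions are assumptions written for this log's format, and the embedded `LOG` excerpt is copied from the four relevant lines of the trace.

```python
import re

# Excerpt copied from the kernel log in this report (two lines per lockup).
LOG = """\
[40517.703845] BUG: spinlock lockup on CPU#0, kworker/0:0/17445
[40517.745786] lock: 0xffff88022720e480, .magic: dead4ead, .owner: kworker/0:0/17445, .owner_cpu: 0
[40540.150206] BUG: spinlock lockup on CPU#4, kworker/4:0/17
[40540.183238] lock: 0xffff88022720e480, .magic: dead4ead, .owner: kworker/0:0/17445, .owner_cpu: 0
"""

# Hypothetical patterns matching the spin_dump() output format seen above.
LOCKUP = re.compile(r"BUG: spinlock lockup on CPU#(\d+), (\S+)")
LOCK = re.compile(r"lock: (0x[0-9a-f]+), .*\.owner: (\S+), \.owner_cpu: (-?\d+)")


def parse_lockups(text):
    """Pair each 'spinlock lockup' line with the 'lock:' line that follows it."""
    reports = []
    pending = None
    for line in text.splitlines():
        m = LOCKUP.search(line)
        if m:
            # Start a new report: which CPU and task are spinning.
            pending = {"cpu": int(m.group(1)), "waiter": m.group(2)}
            continue
        m = LOCK.search(line)
        if m and pending is not None:
            # Complete the report with the lock address and recorded owner.
            pending.update(
                lock=m.group(1), owner=m.group(2), owner_cpu=int(m.group(3))
            )
            reports.append(pending)
            pending = None
    return reports


if __name__ == "__main__":
    for r in parse_lockups(LOG):
        same = "same task" if r["waiter"] == r["owner"] else "different task"
        print(f"CPU#{r['cpu']} {r['waiter']} waits on {r['lock']} "
              f"owned by {r['owner']} ({same})")
```

Run against the excerpt, both reports name the same lock address and the same recorded owner, and in the first report the waiting task is also the recorded owner. That is only what the log lines state; it does not by itself establish the root cause.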
History
#1 Updated by Sage Weil about 11 years ago
- Status changed from 12 to Can't reproduce