Bug #2708

spinlock lockup in queue_con, queue_work

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
-
Category:
libceph
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description


[40494.962428] m~B [send break]
[40505.554625] mptscsih: ioc0: attempting task abort! (sc=ffff88020c00b400)
[40505.598241] sd 0:0:0:0: [sda] CDB: 
[40505.638379] Test Unit Ready: 00 00 00 00 00 00
[40508.696718] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[40508.777030] mptbase: ioc0: Initiating recovery
[40517.703845] BUG: spinlock lockup on CPU#0, kworker/0:0/17445
[40517.745786]  lock: 0xffff88022720e480, .magic: dead4ead, .owner: kworker/0:0/17445, .owner_cpu: 0
[40517.826674] Pid: 17445, comm: kworker/0:0 Tainted: G      D      3.5.0-rc1-ceph-00030-g9a64e8e #1
[40517.907466] Call Trace:
[40517.945236]  <IRQ>  [<ffffffff81330f98>] spin_dump+0x78/0xc0
[40517.986573]  [<ffffffff813311bd>] do_raw_spin_lock+0xed/0x120
[40518.027404]  [<ffffffff8162d586>] _raw_spin_lock_irqsave+0x56/0x70
[40518.068299]  [<ffffffff8106e78d>] ? __queue_work+0x5d/0x430
[40518.107988]  [<ffffffff810622bc>] ? run_timer_softirq+0x21c/0x400
[40518.148330]  [<ffffffff8106e78d>] __queue_work+0x5d/0x430
[40518.187398]  [<ffffffff810621ca>] ? run_timer_softirq+0x12a/0x400
[40518.226457]  [<ffffffff8106ebc5>] queue_work_on+0x25/0x40
[40518.264080]  [<ffffffff8106ed4f>] queue_work+0x1f/0x30
[40518.300565]  [<ffffffff8106ed78>] schedule_work+0x18/0x20
[40518.337280]  [<ffffffff8136e281>] cursor_timer_handler+0x21/0x40
[40518.374139]  [<ffffffff81062269>] run_timer_softirq+0x1c9/0x400
[40518.410212]  [<ffffffff810621ca>] ? run_timer_softirq+0x12a/0x400
[40518.445941]  [<ffffffff8136e260>] ? store_cursor_blink+0xc0/0xc0
[40518.481035]  [<ffffffff81059ddf>] __do_softirq+0xcf/0x220
[40518.515487]  [<ffffffff8107b648>] ? hrtimer_interrupt+0x158/0x250
[40518.550255]  [<ffffffff8163762c>] call_softirq+0x1c/0x30
[40518.583461]  [<ffffffff8101633d>] do_softirq+0x9d/0xd0
[40518.615763]  [<ffffffff81059b55>] irq_exit+0xd5/0xf0
[40518.647253]  [<ffffffff81637fae>] smp_apic_timer_interrupt+0x6e/0x99
[40518.680741]  [<ffffffff81636c2f>] apic_timer_interrupt+0x6f/0x80
[40518.713164]  <EOI>  [<ffffffff810beb1d>] ? acct_collect+0xad/0x1b0
[40518.745566]  [<ffffffff8162dc14>] ? _raw_spin_unlock_irq+0x34/0x40
[40518.778181]  [<ffffffff8162dc10>] ? _raw_spin_unlock_irq+0x30/0x40
[40518.809923]  [<ffffffff810bebe3>] acct_collect+0x173/0x1b0
[40518.840339]  [<ffffffff810569d4>] do_exit+0x824/0x940
[40518.869445]  [<ffffffff8105297e>] ? kmsg_dump+0x11e/0x170
[40518.898312]  [<ffffffff810528dd>] ? kmsg_dump+0x7d/0x170
[40518.926176]  [<ffffffff8162eee0>] oops_end+0xb0/0xf0
[40518.953728]  [<ffffffff810438bd>] no_context+0x11d/0x2d0
[40518.981323]  [<ffffffff81043bbd>] __bad_area_nosemaphore+0x14d/0x230
[40519.010151]  [<ffffffff8162aa39>] ? __mutex_unlock_slowpath+0xd9/0x180
[40519.039221]  [<ffffffff81043cb3>] bad_area_nosemaphore+0x13/0x20
[40519.067740]  [<ffffffff81631bae>] do_page_fault+0x34e/0x4b0
[40519.095762]  [<ffffffff8162aaee>] ? mutex_unlock+0xe/0x10
[40519.123466]  [<ffffffff8132afad>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[40519.152634]  [<ffffffff8162e255>] page_fault+0x25/0x30
[40519.180433]  [<ffffffff8106f0fb>] ? process_one_work+0x3b/0x530
[40519.209358]  [<ffffffff8106f207>] ? process_one_work+0x147/0x530
[40519.238164]  [<ffffffffa040d770>] ? ceph_msg_revoke_incoming+0x180/0x180 [libceph]
[40519.291578]  [<ffffffff810715b3>] worker_thread+0x173/0x400
[40519.321184]  [<ffffffff81071440>] ? manage_workers+0x210/0x210
[40519.351120]  [<ffffffff81076a9e>] kthread+0xbe/0xd0
[40519.379784]  [<ffffffff81637534>] kernel_thread_helper+0x4/0x10
[40519.409634]  [<ffffffff8162dfb0>] ? retint_restore_args+0x13/0x13
[40519.439580]  [<ffffffff810769e0>] ? __init_kthread_worker+0x70/0x70
[40519.469801]  [<ffffffff81637530>] ? gs_change+0x13/0x13
[40532.547947] mptbase: ioc0: Attempting Retry Config request type 0x1, page 0x2, action 0
[40532.548001] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88020c00b400)
[40532.548008] mptscsih: ioc0: attempting target reset! (sc=ffff88020c00b400)
[40532.548009] sd 0:0:0:0: [sda] CDB: 
[40532.548013] Write(10): 2a 00 1e c4 53 f0 00 00 08 00
[40540.150206] BUG: spinlock lockup on CPU#4, kworker/4:0/17
[40540.183238]  lock: 0xffff88022720e480, .magic: dead4ead, .owner: kworker/0:0/17445, .owner_cpu: 0
[40540.249673] Pid: 17, comm: kworker/4:0 Tainted: G      D      3.5.0-rc1-ceph-00030-g9a64e8e #1
[40540.318328] Call Trace:
[40540.350117]  [<ffffffff81330f98>] spin_dump+0x78/0xc0
[40540.385118]  [<ffffffff813311bd>] do_raw_spin_lock+0xed/0x120
[40540.420846]  [<ffffffff8162d586>] _raw_spin_lock_irqsave+0x56/0x70
[40540.457344]  [<ffffffff8106e875>] ? __queue_work+0x145/0x430
[40540.493144]  [<ffffffff8106e875>] __queue_work+0x145/0x430
[40540.528362]  [<ffffffff8106ebc5>] queue_work_on+0x25/0x40
[40540.563230]  [<ffffffff8106ed4f>] queue_work+0x1f/0x30
[40540.597818]  [<ffffffff8106ee3d>] queue_delayed_work+0x2d/0x40
[40540.632828]  [<ffffffffa040a8b1>] queue_con+0x31/0xc0 [libceph]
[40540.667342]  [<ffffffffa040bdc7>] ceph_con_close+0x97/0xd0 [libceph]
[40540.701878]  [<ffffffffa040fee2>] __close_session+0x32/0x90 [libceph]
[40540.736712]  [<ffffffffa0410de8>] delayed_work+0x88/0xb0 [libceph]
[40540.771373]  [<ffffffff8106f276>] process_one_work+0x1b6/0x530
[40540.805712]  [<ffffffff8106f207>] ? process_one_work+0x147/0x530
[40540.840332]  [<ffffffffa0410d60>] ? ceph_monc_request_next_osdmap+0x90/0x90 [libceph]
[40540.905770]  [<ffffffff810715b3>] worker_thread+0x173/0x400
[40540.940658]  [<ffffffff81071440>] ? manage_workers+0x210/0x210
[40540.975198]  [<ffffffff81076a9e>] kthread+0xbe/0xd0
[40541.007832]  [<ffffffff81637534>] kernel_thread_helper+0x4/0x10
[40541.041102]  [<ffffffff8162dfb0>] ? retint_restore_args+0x13/0x13
[40541.073901]  [<ffffffff810769e0>] ? __init_kthread_worker+0x70/0x70
[40541.106460]  [<ffffffff81637530>] ? gs_change+0x13/0x13
[40547.891421] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[40547.952225] mptbase: ioc0: Initiating recovery
[40562.496137] mptscsih: ioc0: WARNING - Issuing Reset from mptscsih_IssueTaskMgmt!! doorbell=0x2c000000
[40570.422441] mptbase: ioc0: Attempting Retry Config request type 0x1, page 0x2, action 0
[40570.821746] mptscsih: ioc0: target reset: SUCCESS (sc=ffff88020c00b400)
[40580.834411] mptscsih: ioc0: attempting task abort! (sc=ffff88020c00b400)
[40580.869874] sd 0:0:0:0: [sda] CDB: 
[40580.901848] Test Unit Ready: 00 00 00 00 00 00
[40585.396547] mptbase: ioc0: WARNING - Issuing Reset from mpt_config!!, doorbell=0x24000000
[40585.464754] mptbase: ioc0: Initiating recovery
[40608.506617] mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff88020c00b400)
[40608.577389] mptscsih: ioc0: attempting host reset! (sc=ffff88020c560b00)

The job was:

ubuntu@teuthology:/a/teuthology-2012-07-03_19:00:10-regression-master-testing-gcov/5473$ cat config.yaml 
kernel: &id001
  kdb: true
  sha1: 84f5ea9e4cbb9fe39c525c0cf88a584f3080564d
nuke-on-error: true
overrides:
  ceph:
    coverage: true
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: f6cdd8522397cac18a2eb485d6e38a1cf6d2872d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
  ubuntu@plana57.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCuMOcu2XPQovy/Qzmwyvc9tvGP9JZVJ6cqiJ3RPOSGgAifKLTxe2ramHpD8AKcdthu8VAfouFpZK4CtBWKJowurR+4yZKgEugzvYuZ/nK/np56vreBQmRBWD1vLPtxPsTT3YGu5qx+ixdSwrSxexxc0/7+EW9x1D6knL+OGUNWksoGIRlXxjh9qafbw/1XKeQQF28vxBXHofXUFY8USMUcq5HDuaFfmgKzufH6vk84oqyr/jtGej6b4g6tbGiHPYR+o5tmTQHyxpOxqLZP2RFFqHlQ/QaOmRvSNIoOo+1UbqdcWsLk16/lXIS1mI+BZsZouk1H+fGeMTEUDGktiPW7
  ubuntu@plana58.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDED+iOzDd09Q014u9BeiAiv+4SK+mv7u6GE8Oc59VBcshLVF8txB4BZaXO/OyQJL033i6d6OFvFdTKI8+7bmm1X+NM/7SPVVRpqaXhhnyB3TbMJi9Aa+Ak22fJ446H0Jf91q9di1m/IS8OUoVImwD8pSnTu7rHHy4ZeGkJpv/gpdXlhm0jq1s7d8Z3WpjPMYOoxilFz80gwh1eAM8emZKhqWdT/C+0SCUYLpU0EOdI+vvOEdEUeByg6xbB3Y6mOgJD2a6PREc0aZp/zxukBdPuY3yJzgT34b0WUuE5iu6ndnDDBoUD7fq7KlsFqrSHAICANls/A7PIPwD46DLaG6oN
  ubuntu@plana61.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTCMIScDTmD9NkfsWU7xeyZ+WOXai5izYeliiXDSjJC3bT6r8Fp+rhPfcHCVHiw++VsbvKZtkhjCSnJTVPWCdpRDghzJ3nZUBImWRo3PmHo1etQpCeimaOrIJ2q0ChN5jmSOqy5B+Z4om2vXBtBY6nkdTxDOr2+MH3NrSPkQSFB0zO+VPuwKXsemeUC6urb2IZZpxY3cxNq4fafTF9PROpgOnIA+o3igyU4duKEjnCzTHZjw/PL7Eph/7p6+UQgrUwe7pgVzT+2MM0zcBtBSXNqs3dCGmpvUapOkBlDoIX02EkWRNpkM3vfeFt1EFC17B5vd61Kg40bYUG8qWGR0T
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph: null
- rbd.xfstests: null

and it hung at:

2012-07-03T20:05:48.938 INFO:teuthology.orchestra.run.out:
2012-07-03T20:06:09.659 INFO:teuthology.orchestra.run.out:001    10s
2012-07-03T20:06:12.238 INFO:teuthology.orchestra.run.out:002    1s
2012-07-03T20:06:15.527 INFO:teuthology.orchestra.run.out:003    0s
2012-07-03T20:06:31.224 INFO:teuthology.orchestra.run.out:004    13s
2012-07-03T20:06:35.969 INFO:teuthology.orchestra.run.out:005    0s

History

#1 Updated by Sage Weil over 11 years ago

  • Status changed from 12 to Can't reproduce
