Bug #2242

closed

rbd: spinlock on wrong cpu

Added by Sage Weil about 12 years ago. Updated about 12 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: rbd
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description


2012-04-04T01:17:25.100598-07:00 plana34 kernel: [ 9681.094759] BUG: spinlock wrong CPU on CPU#3, rbd/27814
2012-04-04T01:17:25.100614-07:00 plana34 kernel: [ 9681.100064]  lock: ffffffffa031e900, .magic: dead4ead, .owner: rbd/27814, .owner_cpu: 5
2012-04-04T01:17:25.115582-07:00 plana34 kernel: [ 9681.108087] Pid: 27814, comm: rbd Not tainted 3.3.0-ceph-00066-g02615af #1
2012-04-04T01:17:25.115594-07:00 plana34 kernel: [ 9681.115026] Call Trace:
2012-04-04T01:17:25.123173-07:00 plana34 kernel: [ 9681.117488]  [<ffffffff81323c28>] spin_dump+0x78/0xc0
2012-04-04T01:17:25.123185-07:00 plana34 kernel: [ 9681.122604]  [<ffffffff81323c9b>] spin_bug+0x2b/0x40
2012-04-04T01:17:25.134065-07:00 plana34 kernel: [ 9681.127576]  [<ffffffff81323d38>] do_raw_spin_unlock+0x88/0xb0
2012-04-04T01:17:25.134078-07:00 plana34 kernel: [ 9681.133477]  [<ffffffff81615e6b>] _raw_spin_unlock+0x2b/0x40
2012-04-04T01:17:25.139794-07:00 plana34 kernel: [ 9681.139205]  [<ffffffffa031ab92>] rbd_put_client+0x42/0x60 [rbd]
2012-04-04T01:17:25.152088-07:00 plana34 kernel: [ 9681.145220]  [<ffffffffa031bb36>] rbd_dev_release+0xe6/0x170 [rbd]
2012-04-04T01:17:25.152101-07:00 plana34 kernel: [ 9681.151468]  [<ffffffff813e95d7>] device_release+0x27/0xa0
2012-04-04T01:17:25.163323-07:00 plana34 kernel: [ 9681.156961]  [<ffffffff81312ffd>] kobject_release+0x8d/0x1d0
2012-04-04T01:17:25.163336-07:00 plana34 kernel: [ 9681.162683]  [<ffffffff81312e7c>] kobject_put+0x2c/0x60
2012-04-04T01:17:25.173775-07:00 plana34 kernel: [ 9681.167917]  [<ffffffff813e9197>] put_device+0x17/0x20
2012-04-04T01:17:25.173788-07:00 plana34 kernel: [ 9681.173117]  [<ffffffff813ea1ea>] device_unregister+0x2a/0x60
2012-04-04T01:17:25.179586-07:00 plana34 kernel: [ 9681.178928]  [<ffffffffa031a22b>] rbd_remove+0x13b/0x170 [rbd]
2012-04-04T01:17:25.191007-07:00 plana34 kernel: [ 9681.184768]  [<ffffffff813eb507>] bus_attr_store+0x27/0x30
2012-04-04T01:17:25.191020-07:00 plana34 kernel: [ 9681.190321]  [<ffffffff811e9be6>] sysfs_write_file+0xe6/0x170
2012-04-04T01:17:25.201985-07:00 plana34 kernel: [ 9681.196077]  [<ffffffff8117b0d8>] vfs_write+0xc8/0x190
2012-04-04T01:17:25.201998-07:00 plana34 kernel: [ 9681.201281]  [<ffffffff8117b291>] sys_write+0x51/0x90
2012-04-04T01:17:25.213132-07:00 plana34 kernel: [ 9681.206342]  [<ffffffff8161e1a9>] system_call_fastpath+0x16/0x1b

ubuntu@teuthology:/a/nightly_coverage_2012-04-04-a/4363
Actions #1

Updated by Alex Elder about 12 years ago

OK, I think this problem arises because of the switch to a spinlock to
protect the client list. Doing so was the right idea in principle; however,
rbd_client_release() calls ceph_destroy_client(), which calls ceph_msgr_flush(),
which calls flush_workqueue(), which can sleep. We should not be holding a
spinlock across a call that can sleep.

I think the fix is to move the spinlock deeper, within the rbd_client_release()
call, and make it surround just the list deletion where it's really needed.

This conclusion was reached after a pretty quick look so I plan to look a bit
more closely later.
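
To illustrate the restructuring suggested above, here is a rough userspace
sketch, not the actual rbd code. It assumes a toy spinlock built on C11
atomic_flag in place of the kernel's spinlock_t, a plain int in place of a
kref, and a single-threaded caller. All names (client_put, client_release,
slow_teardown, and so on) are hypothetical. The point is only that the lock
covers the list unlink and is dropped before the teardown step that may sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an rbd client; not the driver's real struct. */
struct client {
    int refcount;                 /* plain int: this sketch is single-threaded */
    struct client *prev, *next;   /* links in the global client list */
};

/* Toy spinlock standing in for the kernel spinlock on the client list. */
static atomic_flag client_list_lock = ATOMIC_FLAG_INIT;

static int list_trylock(void)
{
    /* Returns 1 if the lock was free and is now held by the caller. */
    return !atomic_flag_test_and_set_explicit(&client_list_lock,
                                              memory_order_acquire);
}

static void list_lock(void)
{
    while (!list_trylock())
        ;   /* spin */
}

static void list_unlock(void)
{
    atomic_flag_clear_explicit(&client_list_lock, memory_order_release);
}

/* Empty circular list: the head points at itself. */
static struct client client_list = { 0, &client_list, &client_list };

/* Set to 1 when the teardown step observes that the list lock is free. */
static int teardown_ran_unlocked;

/* Stand-in for the teardown path that may sleep (ceph_destroy_client()). */
static void slow_teardown(struct client *c)
{
    if (list_trylock()) {         /* succeeds only if nobody holds the lock */
        teardown_ran_unlocked = 1;
        list_unlock();
    }
    free(c);
}

static void client_release(struct client *c)
{
    /* The lock surrounds just the list deletion... */
    list_lock();
    c->prev->next = c->next;
    c->next->prev = c->prev;
    list_unlock();

    /* ...and is already dropped before the step that may sleep. */
    slow_teardown(c);
}

static void client_put(struct client *c)
{
    if (--c->refcount == 0)
        client_release(c);
}

static struct client *client_create(void)
{
    struct client *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->refcount = 1;

    list_lock();                  /* insert at the head of the list */
    c->next = client_list.next;
    c->prev = &client_list;
    client_list.next->prev = c;
    client_list.next = c;
    list_unlock();
    return c;
}
```

In the pattern the trace points to, the lock would instead be taken in the
put path and held across the whole release, so the sleeping call would run
with the spinlock held.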

Actions #2

Updated by Alex Elder about 12 years ago

  • Status changed from New to Resolved
  • Assignee set to Alex Elder

This was fixed a couple of weeks ago, and the result has been committed
both to the testing and master branches of the ceph-client tree. It should
also go to Linus in the next pull request (for 3.4).

commit cd9d9f5df6098c50726200d4185e9e8da32785b3
Author: Alex Elder <>
Date: Wed Apr 4 13:35:44 2012 -0500

rbd: don't hold spinlock during messenger flush