Bug #2242
rbd: spinlock on wrong cpu
Status: Closed
% Done: 0%
Description
2012-04-04T01:17:25.100598-07:00 plana34 kernel: [ 9681.094759] BUG: spinlock wrong CPU on CPU#3, rbd/27814
2012-04-04T01:17:25.100614-07:00 plana34 kernel: [ 9681.100064] lock: ffffffffa031e900, .magic: dead4ead, .owner: rbd/27814, .owner_cpu: 5
2012-04-04T01:17:25.115582-07:00 plana34 kernel: [ 9681.108087] Pid: 27814, comm: rbd Not tainted 3.3.0-ceph-00066-g02615af #1
2012-04-04T01:17:25.115594-07:00 plana34 kernel: [ 9681.115026] Call Trace:
2012-04-04T01:17:25.123173-07:00 plana34 kernel: [ 9681.117488] [<ffffffff81323c28>] spin_dump+0x78/0xc0
2012-04-04T01:17:25.123185-07:00 plana34 kernel: [ 9681.122604] [<ffffffff81323c9b>] spin_bug+0x2b/0x40
2012-04-04T01:17:25.134065-07:00 plana34 kernel: [ 9681.127576] [<ffffffff81323d38>] do_raw_spin_unlock+0x88/0xb0
2012-04-04T01:17:25.134078-07:00 plana34 kernel: [ 9681.133477] [<ffffffff81615e6b>] _raw_spin_unlock+0x2b/0x40
2012-04-04T01:17:25.139794-07:00 plana34 kernel: [ 9681.139205] [<ffffffffa031ab92>] rbd_put_client+0x42/0x60 [rbd]
2012-04-04T01:17:25.152088-07:00 plana34 kernel: [ 9681.145220] [<ffffffffa031bb36>] rbd_dev_release+0xe6/0x170 [rbd]
2012-04-04T01:17:25.152101-07:00 plana34 kernel: [ 9681.151468] [<ffffffff813e95d7>] device_release+0x27/0xa0
2012-04-04T01:17:25.163323-07:00 plana34 kernel: [ 9681.156961] [<ffffffff81312ffd>] kobject_release+0x8d/0x1d0
2012-04-04T01:17:25.163336-07:00 plana34 kernel: [ 9681.162683] [<ffffffff81312e7c>] kobject_put+0x2c/0x60
2012-04-04T01:17:25.173775-07:00 plana34 kernel: [ 9681.167917] [<ffffffff813e9197>] put_device+0x17/0x20
2012-04-04T01:17:25.173788-07:00 plana34 kernel: [ 9681.173117] [<ffffffff813ea1ea>] device_unregister+0x2a/0x60
2012-04-04T01:17:25.179586-07:00 plana34 kernel: [ 9681.178928] [<ffffffffa031a22b>] rbd_remove+0x13b/0x170 [rbd]
2012-04-04T01:17:25.191007-07:00 plana34 kernel: [ 9681.184768] [<ffffffff813eb507>] bus_attr_store+0x27/0x30
2012-04-04T01:17:25.191020-07:00 plana34 kernel: [ 9681.190321] [<ffffffff811e9be6>] sysfs_write_file+0xe6/0x170
2012-04-04T01:17:25.201985-07:00 plana34 kernel: [ 9681.196077] [<ffffffff8117b0d8>] vfs_write+0xc8/0x190
2012-04-04T01:17:25.201998-07:00 plana34 kernel: [ 9681.201281] [<ffffffff8117b291>] sys_write+0x51/0x90
2012-04-04T01:17:25.213132-07:00 plana34 kernel: [ 9681.206342] [<ffffffff8161e1a9>] system_call_fastpath+0x16/0x1b
ubuntu@teuthology:/a/nightly_coverage_2012-04-04-a/4363
Updated by Alex Elder about 12 years ago
OK, I think this problem arises because of the switch to a spinlock to
protect the client list. Doing so was the right idea in principle; however,
rbd_client_release() calls ceph_destroy_client(), which calls ceph_msgr_flush(),
which calls flush_workqueue(), which can sleep. We must not be holding a
spinlock in that case.
I think the fix is to move the spinlock deeper, within the rbd_client_release()
call, and make it surround just the list deletion where it's really needed.
This conclusion was reached after a pretty quick look so I plan to look a bit
more closely later.
Updated by Alex Elder about 12 years ago
- Status changed from New to Resolved
- Assignee set to Alex Elder
This was fixed a couple of weeks ago, and the result has been committed
both to the testing and master branches of the ceph-client tree. It should
also go to Linus in the next pull request (for 3.4).
commit cd9d9f5df6098c50726200d4185e9e8da32785b3
Author: Alex Elder <elder@dreamhost.com>
Date: Wed Apr 4 13:35:44 2012 -0500
rbd: don't hold spinlock during messenger flush