Bug #3939

closed

krbd: circular locking report in sysfs code

Added by Alex Elder about 11 years ago. Updated about 11 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I intended to write this up before but don't think I did.
I'm getting a "possible circular locking dependency detected"
lockdep report (below). It is either from the rbd/map-unmap.sh
or the rbd/kernel.sh teuthology workunit. (I suspect it's the
former because of the area of code in question.)

I am running tests using the new request code. This new code
has not changed the sysfs stuff, so I don't believe this is
a new problem.

This particular bug should be identified, but I've thought for
a while that the locking surrounding the sysfs stuff deserves
a comprehensive review.

Note also that fixing http://tracker.newdream.net/3427 might
also affect whether this problem can occur. It would be best
to understand the problem before assuming that, though.

[  521.833110] 
[  521.851538] ======================================================
[  521.876458] [ INFO: possible circular locking dependency detected ]
[  521.901795] 3.6.0-ceph-00224-g1370216 #1 Not tainted
[  521.925849] -------------------------------------------------------
[  521.952098] tee/20164 is trying to acquire lock:
[  521.976916]  (ctl_mutex/1){+.+.+.}, at: [<ffffffffa02e89b7>] rbd_dev_refresh+0x47/0x2f0 [rbd]
[  522.028191] 
[  522.028191] but task is already holding lock:
[  522.075418]  (s_active#92){++++.+}, at: [<ffffffff811ec66d>] sysfs_write_file+0xcd/0x170
[  522.128236] 
[  522.128236] which lock already depends on the new lock.
[  522.128236] 
[  522.203569] 
[  522.203569] the existing dependency chain (in reverse order) is:
[  522.254452] 
-> #1 (s_active#92){++++.+}:
[  522.302613]        [<ffffffff810b2e22>] lock_acquire+0xa2/0x140
[  522.332279]        [<ffffffff811ed896>] sysfs_deactivate+0x116/0x160
[  522.362442]        [<ffffffff811ee37b>] sysfs_addrm_finish+0x3b/0x70
[  522.392407]        [<ffffffff811ec3cb>] sysfs_hash_and_remove+0x5b/0xb0
[  522.423260]        [<ffffffff811f01c1>] remove_files.isra.1+0x31/0x40
[  522.454303]        [<ffffffff811f05bd>] sysfs_remove_group+0x4d/0x100
[  522.485679]        [<ffffffff813f67eb>] device_remove_groups+0x3b/0x60
[  522.517943]        [<ffffffff813f6ae4>] device_remove_attrs+0x44/0x80
[  522.550492]        [<ffffffff813f73d5>] device_del+0x125/0x1c0
[  522.582747]        [<ffffffff813f7492>] device_unregister+0x22/0x60
[  522.615507]        [<ffffffffa02e6d6f>] rbd_remove+0x1bf/0x1d0 [rbd]
[  522.648312]        [<ffffffff813f8b97>] bus_attr_store+0x27/0x30
[  522.680342]        [<ffffffff811ec686>] sysfs_write_file+0xe6/0x170
[  522.712192]        [<ffffffff8117bee3>] vfs_write+0xb3/0x180
[  522.743032]        [<ffffffff8117c20a>] sys_write+0x4a/0x90
[  522.772503]        [<ffffffff8163e1e9>] system_call_fastpath+0x16/0x1b
[  522.802689] 
-> #0 (ctl_mutex/1){+.+.+.}:
[  522.850774]        [<ffffffff810b2788>] __lock_acquire+0x1ac8/0x1b90
[  522.881017]        [<ffffffff810b2e22>] lock_acquire+0xa2/0x140
[  522.910390]        [<ffffffff8163253b>] mutex_lock_nested+0x4b/0x320
[  522.939702]        [<ffffffffa02e89b7>] rbd_dev_refresh+0x47/0x2f0 [rbd]
[  522.968812]        [<ffffffffa02e8c7f>] rbd_image_refresh+0x1f/0x40 [rbd]
[  522.998452]        [<ffffffff813f6128>] dev_attr_store+0x18/0x30
[  523.027243]        [<ffffffff811ec686>] sysfs_write_file+0xe6/0x170
[  523.056409]        [<ffffffff8117bee3>] vfs_write+0xb3/0x180
[  523.085407]        [<ffffffff8117c20a>] sys_write+0x4a/0x90
[  523.113700]        [<ffffffff8163e1e9>] system_call_fastpath+0x16/0x1b
[  523.142912] 
[  523.142912] other info that might help us debug this:
[  523.142912] 
[  523.216639]  Possible unsafe locking scenario:
[  523.216639] 
[  523.266500]        CPU0                    CPU1
[  523.292750]        ----                    ----
[  523.318539]   lock(s_active#92);
[  523.342655]                                lock(ctl_mutex/1);
[  523.370166]                                lock(s_active#92);
[  523.397380]   lock(ctl_mutex/1);
[  523.421779] 
[  523.421779]  *** DEADLOCK ***
[  523.421779] 
[  523.487594] 2 locks held by tee/20164:
[  523.511118]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff811ec5e4>] sysfs_write_file+0x44/0x170
[  523.562758]  #1:  (s_active#92){++++.+}, at: [<ffffffff811ec66d>] sysfs_write_file+0xcd/0x170
[  523.616541] 
[  523.616541] stack backtrace:
[  523.663438] Pid: 20164, comm: tee Not tainted 3.6.0-ceph-00224-g1370216 #1
[  523.693567] Call Trace:
[  523.717950]  [<ffffffff8162b274>] print_circular_bug+0x1fb/0x20c
[  523.747238]  [<ffffffff81085fec>] ? ttwu_stat+0x4c/0x140
[  523.775497]  [<ffffffff810b2788>] __lock_acquire+0x1ac8/0x1b90
[  523.804350]  [<ffffffff810b393d>] ? trace_hardirqs_on+0xd/0x10
[  523.833220]  [<ffffffffa02e89b7>] ? rbd_dev_refresh+0x47/0x2f0 [rbd]
[  523.863191]  [<ffffffff810b2e22>] lock_acquire+0xa2/0x140
[  523.892078]  [<ffffffffa02e89b7>] ? rbd_dev_refresh+0x47/0x2f0 [rbd]
[  523.921875]  [<ffffffff8163253b>] mutex_lock_nested+0x4b/0x320
[  523.951096]  [<ffffffffa02e89b7>] ? rbd_dev_refresh+0x47/0x2f0 [rbd]
[  523.980991]  [<ffffffff810b28f0>] ? lock_release_non_nested+0xa0/0x310
[  524.011289]  [<ffffffffa02e89b7>] rbd_dev_refresh+0x47/0x2f0 [rbd]
[  524.041564]  [<ffffffffa02e8c7f>] rbd_image_refresh+0x1f/0x40 [rbd]
[  524.072407]  [<ffffffff811ec66d>] ? sysfs_write_file+0xcd/0x170
[  524.103471]  [<ffffffff813f6128>] dev_attr_store+0x18/0x30
[  524.134233]  [<ffffffff811ec686>] sysfs_write_file+0xe6/0x170
[  524.165228]  [<ffffffff8117bee3>] vfs_write+0xb3/0x180
[  524.195866]  [<ffffffff8117c20a>] sys_write+0x4a/0x90
[  524.225908]  [<ffffffff8163e1e9>] system_call_fastpath+0x16/0x1b
#1

Updated by Alex Elder about 11 years ago

  • Project changed from Linux kernel client to rbd
#2

Updated by Alex Elder about 11 years ago

  • Status changed from New to Duplicate

Duplicate of #3925. I did write it up before.
