Tasks #1112
Audit all igrab() calls in ceph-client; remove the deadlock pattern: spin_lock(&inode->i_lock) + igrab(inode)
Description
Hi,
The igrab() function itself takes spin_lock(&inode->i_lock). So code of this form:
spin_lock(&inode->i_lock);
...................
igrab(inode);
.................
spin_unlock(&inode->i_lock);
is a self-deadlock: igrab() tries to acquire i_lock while the caller already holds it.
Jeff Wu
linux-2.6.39:
http://lxr.linux.no/#linux+v2.6.39/fs/inode.c#L1144
struct inode *igrab(struct inode *inode)
{
	spin_lock(&inode->i_lock);
	if (!(inode->i_state & (I_FREEING|I_WILL_FREE))) {
		__iget(inode);
		spin_unlock(&inode->i_lock);
	} else {
		spin_unlock(&inode->i_lock);
		/*
		 * Handle the case where s_op->clear_inode is not been
		 * called yet, and somebody is calling igrab
		 * while the inode is getting freed.
		 */
		inode = NULL;
	}
	return inode;
}
EXPORT_SYMBOL(igrab);
History
#1 Updated by Sage Weil almost 13 years ago
- Target version set to v3.0
Hi Jeff-
Are there actual cases of this that you're seeing? I've fixed several of these, but I'm not aware currently of any others. Or are you just proposing a thorough audit?
#2 Updated by changping Wu almost 13 years ago
- File ceph-client__fsstress_log1 added
- File ceph-client_fsstress_log2 added
- File ceph-client_fsstress_log3 added
Hi,
I attached some logs to bug #1096 (http://tracker.newdream.net/issues/1096): ceph-client fsstress logs 1, 2, and 3.
I checked out ceph-client master:
commit 35b0ed997b1a49ff73a6110cbd04681467dbe217
Author: Sage Weil <sage@newdream.net>
Date: Wed May 25 14:56:12 2011 -0700
to verify bug #1096, with the kernel hacking spin/mutex lock debugging options enabled.
It printed logs like the following (some of the detailed logs are attached):
..........................................
[ 2714.810007] NMI backtrace for cpu 1
[ 2714.810007] CPU 1
[ 2714.810007] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c
[ 2714.810007]
[ 2714.810007] Pid: 3051, comm: fsstress Not tainted 2.6.39+ #1 Dell Inc. OptiPlex 740/0UT225
[ 2714.810007] RIP: 0010:[<ffffffff812e720f>] [<ffffffff812e720f>] delay_tsc+0x1f/0x80
[ 2714.810007] RSP: 0018:ffff880100261d98 EFLAGS: 00000206
[ 2714.810007] RAX: 0000000000000000 RBX: 00000000012479db RCX: 000000001806526e
[ 2714.810007] RDX: 0000000000002c00 RSI: 0000000000000001 RDI: 0000000000000001
[ 2714.810007] RBP: ffff880100261db8 R08: 0000000000000002 R09: 0000000000000000
[ 2714.810007] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800d4a2a168
[ 2714.810007] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88010fd9a300
[ 2714.810007] FS: 00007f074f487700(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000
[ 2714.810007] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2714.810007] CR2: 00007f12ffcd1000 CR3: 0000000100369000 CR4: 00000000000006e0
[ 2714.810007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2714.810007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2714.810007] Process fsstress (pid: 3051, threadinfo ffff880100260000, task ffff88010fd9a300)
[ 2714.810007] Stack:
[ 2714.810007] 00000000012479db ffff8800d4a2a168 00000000717c75b0 0000000000000000
[ 2714.810007] ffff880100261dc8 ffffffff812e71af ffff880100261e28 ffffffff812f9afc
[ 2714.810007] ffff88010fd9a300 ffff88010fd9a300 ffff88010fd9a9c0 ffff88010fd9a728
[ 2714.810007] Call Trace:
[ 2714.810007] [<ffffffff812e71af>] __delay+0xf/0x20
[ 2714.810007] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150
[ 2714.810007] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70
[ 2714.810007] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2714.810007] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2714.810007] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2714.810007] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2714.810007] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2714.810007] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2714.810007] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2714.810007] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2714.810007] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2714.810007] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2714.810007] Code: 48 8d 7a 01 ff 15 3a be 97 00 c9 c3 55 48 89 e5 41 56 41 55 41 54 53 66 66 66 66 90 65 44 8b 2c 25 78 dc 00 00 49 89 fe
[ 2714.810007] 66 90 e8 c9 b7 d2 ff 66 90 4c 63 e0 eb 11 66 90 f3 90 65 8b
.............................................
Then, to debug this issue, I removed the spin_lock/spin_unlock around the igrab() call in ceph_flush_dirty_caps and ran the fsstress test again, but it then reported a deadlock in ceph_set_page_dirty. Next Monday I will try to reproduce it and attach the logs.
Then I added spin_unlock/spin_lock around the igrab() call in ceph_set_page_dirty and ran fsstress; with that change the deadlock no longer occurs:
spin_unlock(&inode->i_lock);
igrab(inode);
spin_lock(&inode->i_lock);
..........................
I note that ceph-client contains many igrab() calls, but I am not sure whether other igrab() call sites need the same change, so I am adding this task.
#3 Updated by changping Wu almost 13 years ago
static int ceph_set_page_dirty(struct page *page)
{
...............................
/* dirty the head */
spin_lock(&inode->i_lock);
if (ci->i_head_snapc == NULL)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
+spin_unlock(&inode->i_lock);
if (ci->i_wrbuffer_ref == 0)
igrab(inode);
+spin_lock(&inode->i_lock);
++ci->i_wrbuffer_ref;
dout("%p set_page_dirty %p idx %lu head %d/%d -> %d/%d "
"snapc %p seq %lld (%d snaps)\n",
mapping->host, page, page->index,
ci->i_wrbuffer_ref-1, ci->i_wrbuffer_ref_head-1,
ci->i_wrbuffer_ref, ci->i_wrbuffer_ref_head,
snapc, snapc->seq, snapc->num_snaps);
spin_unlock(&inode->i_lock);
....................
}
#4 Updated by Sage Weil almost 13 years ago
- Category set to fs/ceph
- Assignee set to Sage Weil
Jeff Wu wrote:
static int ceph_set_page_dirty(struct page *page) {
...............................
/* dirty the head */
spin_lock(&inode->i_lock);
if (ci->i_head_snapc == NULL)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
+spin_unlock(&inode->i_lock);
if (ci->i_wrbuffer_ref == 0)
igrab(inode);
+spin_lock(&inode->i_lock);
This one is fixed in the latest master branch already. The correct way to do this, btw, is to use ihold() instead of igrab().
I also pushed another patch that changes almost all igrab calls to ihold, which is more appropriate when we have a stable reference. This also avoids creating any new and interesting locking dependencies.
Can you see if any of these still come up with the latest code?
#5 Updated by changping Wu almost 13 years ago
Hi,
I am verifying the fsstress test against the ceph-client master branch:
commit 98cc99822dac96710a8b64bdc2be4eccffc78956
Author: Sage Weil <sage@newdream.net>
Date: Fri May 27 09:24:26 2011 -0700
ceph: use ihold when we already have an inode ref
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock. This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: Sage Weil <sage@newdream.net>
However, I see that four "igrab" calls still remain in the ceph client:
---- igrab matches (4 in 3 files) ----
addr.c (client-source\ceph): igrab(inode);
mds_client.c (client-source\ceph): inode = igrab(&cap->ci->vfs_inode);
snap.c (client-source\ceph): igrab(inode);
snap.c (client-source\ceph): struct inode *inode = igrab(&ci->vfs_inode);
Shouldn't these igrab() calls also be replaced with ihold()?
#6 Updated by changping Wu almost 13 years ago
Hi,
ceph_set_page_dirty still contains an igrab() call.
I merged the patch into ceph-client-standalone and ran fsstress, and still hit a deadlock:
static int ceph_set_page_dirty(struct page *page)
...............................
if (ci->i_wrbuffer_ref == 0)
igrab(inode);
..................................
#7 Updated by Sage Weil over 12 years ago
- Status changed from New to Resolved