Project

General

Profile

Tasks #1112

check all igrab at ceph-client,remove deadlock : spin_lock(&inode->i_lock) + igrab(inode)

Added by changping Wu over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
fs/ceph
Target version:
% Done:

0%

Tags:
Reviewed:
Affected Versions:

Description

Hi , at igrab function,it has existed the codes: spin_lock(&inode->i_lock);
if coding this:

spin_lock(&inode->i_lock);
...................
igrab(inode);
.................
spin_unlock(&inode->i_lock);

it is a deadlock.

Jeff ,Wu

linux-2.6.39:
http://lxr.linux.no/#linux+v2.6.39/fs/inode.c#L1144

1143
1144struct inode igrab(struct inode *inode)
1145{
1146 spin_lock(&inode->i_lock);
1147 if (!(inode->i_state & (I_FREEING|I_WILL_FREE))) {
1148 __iget(inode);
1149 spin_unlock(&inode->i_lock);
1150 } else {
1151 spin_unlock(&inode->i_lock);
1152 /

1153 * Handle the case where s_op->clear_inode is not been
1154 * called yet, and somebody is calling igrab
1155 * while the inode is getting freed.
1156 */
1157 inode = NULL;
1158 }
1159 return inode;
1160}
1161EXPORT_SYMBOL(igrab);

ceph-client__fsstress_log1 (94.1 KB) changping Wu, 05/27/2011 08:00 AM

ceph-client_fsstress_log2 (48.4 KB) changping Wu, 05/27/2011 08:00 AM

ceph-client_fsstress_log3 (31.1 KB) changping Wu, 05/27/2011 08:00 AM

History

#1 Updated by Sage Weil over 9 years ago

  • Target version set to v3.0

Hi Jeff-

Are there actual cases of this that you're seeing? I've fixed several of these, but I'm not aware currently of any others. Or are you just proposing a thorough audit?

#2 Updated by changping Wu over 9 years ago

Hi ,
I attached some of logs at bug #1096 http://tracker.newdream.net/issues/1096.
:ceph-client-fsstress log 1,2,3.

I git ceph-client master :

commit 35b0ed997b1a49ff73a6110cbd04681467dbe217
Author: Sage Weil <>
Date: Wed May 25 14:56:12 2011 -0700

to verify bug 1096, open kernel_hacking spin/mutex lock debug.

but ,it printk log ,which like this : (some of the detail logs attached )
..........................................
[ 2714.810007] NMI backtrace for cpu 1
[ 2714.810007] CPU 1
[ 2714.810007] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c
[ 2714.810007]
[ 2714.810007] Pid: 3051, comm: fsstress Not tainted 2.6.39+ #1 Dell Inc. OptiPlex 740/0UT225
[ 2714.810007] RIP: 0010:[<ffffffff812e720f>] [<ffffffff812e720f>] delay_tsc+0x1f/0x80
[ 2714.810007] RSP: 0018:ffff880100261d98 EFLAGS: 00000206
[ 2714.810007] RAX: 0000000000000000 RBX: 00000000012479db RCX: 000000001806526e
[ 2714.810007] RDX: 0000000000002c00 RSI: 0000000000000001 RDI: 0000000000000001
[ 2714.810007] RBP: ffff880100261db8 R08: 0000000000000002 R09: 0000000000000000
[ 2714.810007] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800d4a2a168
[ 2714.810007] R13: 0000000000000001 R14: 0000000000000001 R15: ffff88010fd9a300
[ 2714.810007] FS: 00007f074f487700(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000
[ 2714.810007] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2714.810007] CR2: 00007f12ffcd1000 CR3: 0000000100369000 CR4: 00000000000006e0
[ 2714.810007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2714.810007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2714.810007] Process fsstress (pid: 3051, threadinfo ffff880100260000, task ffff88010fd9a300)
[ 2714.810007] Stack:
[ 2714.810007] 00000000012479db ffff8800d4a2a168 00000000717c75b0 0000000000000000
[ 2714.810007] ffff880100261dc8 ffffffff812e71af ffff880100261e28 ffffffff812f9afc
[ 2714.810007] ffff88010fd9a300 ffff88010fd9a300 ffff88010fd9a9c0 ffff88010fd9a728
[ 2714.810007] Call Trace:
[ 2714.810007] [<ffffffff812e71af>] __delay+0xf/0x20
[ 2714.810007] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150
[ 2714.810007] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70
[ 2714.810007] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2714.810007] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2714.810007] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2714.810007] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2714.810007] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2714.810007] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2714.810007] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2714.810007] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2714.810007] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2714.810007] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2714.810007] Code: 48 8d 7a 01 ff 15 3a be 97 00 c9 c3 55 48 89 e5 41 56 41 55 41 54 53 66 66 66 66 90 65 44 8b 2c 25 78 dc 00 00 49 89 fe
[ 2714.810007] 66 90 e8 c9 b7 d2 ff 66 90 4c 63 e0 eb 11 66 90 f3 90 65 8b
[ 2714.810007] Call Trace:
[ 2714.810007] [<ffffffff812e71af>] __delay+0xf/0x20
[ 2714.810007] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150
[ 2714.810007] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70
[ 2714.810007] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2714.810007] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2714.810007] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2714.810007] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2714.810007] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2714.810007] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2714.810007] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2714.810007] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2714.810007] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2714.810007] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b

.............................................

then i try to close spin_lock/spin_unlock at ceph_flush_dirty_caps to debug this issue.
do fsstress test ,
but it at ceph_set_page_dirty ,printk deadlock issue. next monday ,i will continue to reproduce it and attached logs.

then i add spin_unlock/spin_lock at ceph_set_page_dirty,run fsstress ,this deadlock issue don't exist.

spin_unlock(&inode->i_lock);
igrab(inode);
spin_lock(&inode->i_lock);

..........................
i note that ceph-client exist many igrab , but i'am not sure whether exist other "igrab" code need to modify.so add a task.

#3 Updated by changping Wu over 9 years ago

static int ceph_set_page_dirty(struct page *page) {
...............................
/* dirty the head */
spin_lock(&inode->i_lock);
if (ci->i_head_snapc == NULL)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
+spin_unlock(&inode->i_lock);
if (ci->i_wrbuffer_ref == 0)
igrab(inode);
+spin_lock(&inode->i_lock);
++ci->i_wrbuffer_ref;
dout("%p set_page_dirty %p idx %lu head %d/%d -> %d/%d "
"snapc %p seq %lld (%d snaps)\n",
mapping->host, page, page->index,
ci->i_wrbuffer_ref-1, ci->i_wrbuffer_ref_head-1,
ci->i_wrbuffer_ref, ci->i_wrbuffer_ref_head,
snapc, snapc->seq, snapc->num_snaps);
spin_unlock(&inode->i_lock);

....................
}

#4 Updated by Sage Weil over 9 years ago

  • Category set to fs/ceph
  • Assignee set to Sage Weil

Jeff Wu wrote:

static int ceph_set_page_dirty(struct page *page) {
...............................
/* dirty the head */
spin_lock(&inode->i_lock);
if (ci->i_head_snapc == NULL)
ci->i_head_snapc = ceph_get_snap_context(snapc);
++ci->i_wrbuffer_ref_head;
+spin_unlock(&inode->i_lock);
if (ci->i_wrbuffer_ref == 0)
igrab(inode);
+spin_lock(&inode->i_lock);

This one is fixed in the latest master branch already. The correct way to do this, btw, is to use ihold() instead of igrab().

I also pushed another patch that changes almost all igrab calls to ihold, which is more appropriate when we have a stable reference. This also avoids creating any new and interesting locking dependencies.

Can you see if any of these still come up with the latest code?

#5 Updated by changping Wu over 9 years ago

Hi ,
I'am verifing fsstress test with ceph-client master branch:

commit 98cc99822dac96710a8b64bdc2be4eccffc78956
Author: Sage Weil <>
Date: Fri May 27 09:24:26 2011 -0700

ceph: use ihold when we already have an inode ref
We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock. This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: Sage Weil &lt;&gt;

but ,i viewed that it still exist four "igrab" at ceph client,

---- igrab Matches (4 in 3 files) ----
Addr.c (client-source\ceph): igrab(inode);
Mds_client.c (client-source\ceph): inode = igrab(&cap->ci->vfs_inode);
Snap.c (client-source\ceph): igrab(inode);
Snap.c (client-source\ceph): struct inode *inode = igrab(&ci->vfs_inode);

Should these "igrab" not to replace with "ihold"?

#6 Updated by changping Wu over 9 years ago

Hi ,
ceph_set_page_dirty still exist igrab ,
i merge the patch to ceph-cleint-standalone,
run fsstress, still hit a deadlock issue.

static int ceph_set_page_dirty(struct page *page)
...............................
if (ci->i_wrbuffer_ref == 0)
igrab(inode);
..................................

#7 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved

Also available in: Atom PDF