https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2011-05-18T02:51:04Z
Ceph
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3953
2011-05-18T02:51:04Z
changping Wu
<p>Do the following steps; the hang is easy to reproduce:</p>
<p>$ ./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S<br />......................................<br />Run for several minutes, then press Ctrl+C (^C) to abort.<br />Continue with:</p>
<p>$ ./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S<br />.............................<br />Run for several minutes, then press Ctrl+C (^C) to abort.<br />Continue with:</p>
<p>$ ./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S<br />................................<br />Run for several minutes, then press Ctrl+C (^C) to abort.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3960
2011-05-18T15:56:54Z
Sage Weil
sage@newdream.net
<ul><li><strong>Assignee</strong> set to <i>Sage Weil</i></li><li><strong>Target version</strong> set to <i>v0.29</i></li></ul>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3966
2011-05-18T17:04:22Z
Sage Weil
sage@newdream.net
<p>Was this with a single MDS? fsstress is known to turn up clustered MDS bugs.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3967
2011-05-18T20:39:39Z
changping Wu
<ul><li><strong>File</strong> <a href="/attachments/download/214/mds.alpha.log-tail-20k">mds.alpha.log-tail-20k</a> added</li><li><strong>File</strong> <a href="/attachments/download/215/ceph.conf.onemds">ceph.conf.onemds</a> added</li><li><strong>File</strong> <a href="/attachments/download/216/ceph.conf.twomds">ceph.conf.twomds</a> added</li></ul><p>Hi,<br />The fsstress test hangs with both a single MDS and two MDSes.<br />ceph.conf and the single-MDS test log mds.alpha.log are attached.</p>
<p>1) Two MDSes:<br />; mds<br />; You need at least one. Define two to get a standby.<br />[mds]<br /> ; where the mds keeps its secret encryption keys<br /> ; keyring = /etc/ceph/keyring.$name</p>
<pre><code>; mds logging to debug issues.<br /> ;debug ms = 1<br /> ;debug mds = 20<br /> ;mds_session_timeout =120</code></pre>
<p>[mds.alpha]<br /> host = ubuntu-mon0</p>
<p>[mds.beta]<br /> host = ubuntu-mon0</p>
<p>2) One MDS:</p>
<p>----------------------------------------<br />; mds<br />; You need at least one. Define two to get a standby.<br />[mds]<br /> ; where the mds keeps its secret encryption keys<br /> keyring = /mnt/data/keyring.$name</p>
<pre><code>; mds logging to debug issues.<br /> ;debug ms = 1<br /> debug mds = 20</code></pre>
<p>[mds.alpha]<br /> host = ubuntu-client0</p>
<p>;[mds.beta]<br />; host = beta<br />------------------------------------------------</p>
<pre><code># ceph -s<br />2011-05-19 11:38:23.201229    pg v1433: 396 pgs: 396 active+clean; 1595 MB data, 5388 MB used, 72425 MB / 81934 MB avail<br />2011-05-19 11:38:23.203378   mds e4: 1/1/1 up {0=alpha=up:active}<br />2011-05-19 11:38:23.203721   osd e3: 2 osds: 2 up, 2 in<br />2011-05-19 11:38:23.205931   log 2011-05-19 10:09:30.145914 osd1 172.16.10.176:6804/2028 147 : [INF] 2.30 scrub ok<br />2011-05-19 11:38:23.206511   mon e1: 1 mons at {alpha=172.16.10.176:6789/0}</code></pre>
<p>--------------<br />/var/log/ceph/mds.alpha.log: the output of "tail -n 20k mds.alpha.log" is attached.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3971
2011-05-19T10:08:00Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>4</i></li></ul><p>Do you have a larger piece of the mds log you can attach? (Perhaps the whole thing?)</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3972
2011-05-19T10:12:24Z
Sage Weil
sage@newdream.net
<ul><li><strong>Story points</strong> set to <i>3</i></li><li><strong>Position</strong> set to <i>1</i></li><li><strong>Position</strong> changed from <i>1</i> to <i>672</i></li></ul>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3981
2011-05-19T19:17:33Z
changping Wu
<p>Hi,<br />The whole MDS log is too large, about 5.4 GB, so I can't attach it here.<br />This site limits the attachment file size (maximum: 5 MB);<br />mds.alpha.log-tail-20k is 4.8 MB, and mds.alpha.log-tail-30k is 7.2 MB.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=3982
2011-05-20T00:46:51Z
changping Wu
<ul><li><strong>File</strong> <a href="/attachments/download/219/mds.alpha.log-tail-30k.tgz">mds.alpha.log-tail-30k.tgz</a> added</li></ul><p>tail -n 30k mds.alpha.log</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4001
2011-05-24T09:53:39Z
Sage Weil
sage@newdream.net
<p>The 30k lines still don't include the last client_request arrival. I bumped the limit to 50 MB. Can you grab a bigger piece, and verify that grep client_request turns up something? Alternatively, post the full log (bzip2'ed) somewhere I can grab it? Thanks!</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4006
2011-05-24T11:22:20Z
Sage Weil
sage@newdream.net
<ul><li><strong>Project</strong> changed from <i>Ceph</i> to <i>Linux kernel client</i></li><li><strong>Target version</strong> deleted (<del><i>v0.29</i></del>)</li></ul>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4007
2011-05-24T11:31:13Z
Sage Weil
sage@newdream.net
<ul><li><strong>Category</strong> set to <i>fs/ceph</i></li><li><strong>Target version</strong> set to <i>v3.0</i></li></ul><p>This is a kclient bug due to multiple threads entering flush_dirty_caps, which is not reentrant due to commit:e9964c102312967a4bc1fd501cb628c4a3b19034.</p>
<p>Fixed by commit:db3540522e955c1ebb391f4f5324dff4f20ecd09 in 'master' branch of ceph-client.git. Can you let us know if that fixes it for you?</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4028
2011-05-25T07:40:43Z
changping Wu
<p>I will build the kernel to verify it. Thanks.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4041
2011-05-25T23:50:08Z
changping Wu
<ul><li><strong>File</strong> <a href="/attachments/download/225/ceph.fsstress.log.tgz">ceph.fsstress.log.tgz</a> added</li><li><strong>File</strong> <a href="/attachments/download/226/ceph-client_fsstress_log2">ceph-client_fsstress_log2</a> added</li><li><strong>File</strong> <a href="/attachments/download/227/ceph-client_fsstress_log3">ceph-client_fsstress_log3</a> added</li><li><strong>File</strong> <a href="/attachments/download/228/ceph-client__fsstress_log1">ceph-client__fsstress_log1</a> added</li></ul><p>Hi, I checked out ceph-client master:</p>
<p>commit 35b0ed997b1a49ff73a6110cbd04681467dbe217<br />Author: Sage Weil <<a class="email" href="mailto:sage@newdream.net">sage@newdream.net</a>><br />Date: Wed May 25 14:56:12 2011 -0700</p>
<pre><code>ceph: unwind canceled flock state</code></pre>
<p>.............<br />I enabled the kernel hacking options and built it, but the fsstress test still can't pass.<br />Maybe a deadlock causes the bug.</p>
<p>I also filed a ceph-client task at:<br /><a class="external" href="http://tracker.newdream.net/issues/1112">http://tracker.newdream.net/issues/1112</a></p>
<p>It is a deadlock issue.</p>
<p>I attached the detailed logs; for example:</p>
<p>[ 2194.881763] libceph: client4100 fsid 359eb12b-f380-12b6-6b38-0408838f0e69<br />[ 2194.890356] libceph: mon0 172.16.10.176:6789 session established<br />[ 2206.619076] <br />[ 2206.619082] =======================================================<br />[ 2206.620766] [ INFO: possible circular locking dependency detected ]<br />[ 2206.633091] 2.6.39+ #1<br />[ 2206.633091] -------------------------------------------------------<br />[ 2206.633091] fsstress/3048 is trying to acquire lock:<br />[ 2206.633091] (&sb->s_type->i_lock_key#19){+.+...}, at: [<ffffffff811880b5>] igrab+0x25/0x60<br />[ 2206.633091] <br />[ 2206.633091] but task is already holding lock:<br />[ 2206.633091] (&(&mdsc->cap_dirty_lock)->rlock){+.+...}, at: [<ffffffffa030f03f>] ceph_flush_dirty_caps+0x2f/0xb0 [ceph]<br />[ 2206.633091] <br />[ 2206.633091] which lock already depends on the new lock.<br />[ 2206.633091] <br />[ 2206.633091] <br />[ 2206.633091] the existing dependency chain (in reverse order) is:<br />[ 2206.633091] <br />[ 2206.633091] -> #1 (&(&mdsc->cap_dirty_lock)->rlock){+.+...}:<br />[ 2206.633091] [<ffffffff8109f054>] lock_acquire+0xb4/0x140<br />[ 2206.633091] [<ffffffff815f6196>] _raw_spin_lock+0x36/0x70<br />[ 2206.633091] [<ffffffffa030d8c3>] __ceph_mark_dirty_caps+0xa3/0x190 [ceph]<br />[ 2206.633091] [<ffffffffa0302949>] ceph_setattr+0x299/0x6e0 [ceph]<br />[ 2206.633091] [<ffffffff8118a611>] notify_change+0x161/0x2c0<br />[ 2206.633091] [<ffffffff8116ceaa>] chown_common+0x9a/0xc0<br />[ 2206.633091] [<ffffffff8116d00e>] sys_lchown+0x7e/0x90<br />[ 2206.633091] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b<br />[ 2206.633091] <br />[ 2206.633091] -> #0 
(&sb->s_type->i_lock_key#19){+.+...}:<br />[ 2206.633091] [<ffffffff8109ef3a>] __lock_acquire+0x153a/0x15a0<br />[ 2206.633091] [<ffffffff8109f054>] lock_acquire+0xb4/0x140<br />[ 2206.633091] [<ffffffff815f6196>] _raw_spin_lock+0x36/0x70<br />[ 2206.633091] [<ffffffff811880b5>] igrab+0x25/0x60<br />[ 2206.633091] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]<br />[ 2206.633091] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]<br />[ 2206.633091] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90<br />[ 2206.633091] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30<br />[ 2206.633091] [<ffffffff81171317>] iterate_supers+0x77/0xe0<br />[ 2206.633091] [<ffffffff8119b54f>] sys_sync+0x2f/0x70<br />[ 2206.633091] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b<br />[ 2206.633091] <br />[ 2206.633091] other info that might help us debug this:<br />[ 2206.633091] <br />[ 2206.633091] 2 locks held by fsstress/3048:<br />[ 2206.633091] #0: (&type->s_umount_key#38){.+.+..}, at: [<ffffffff81171307>] iterate_supers+0x67/0xe0<br />[ 2206.633091] #1: (&(&mdsc->cap_dirty_lock)->rlock){+.+...}, at: [<ffffffffa030f03f>] ceph_flush_dirty_caps+0x2f/0xb0 [ceph]<br />[ 2206.633091] <br />[ 2206.633091] stack backtrace:<br />[ 2206.633091] Pid: 3048, comm: fsstress Not tainted 2.6.39+ #1<br />[ 2206.633091] Call Trace:<br />[ 2206.633091] [<ffffffff8109c701>] print_circular_bug+0xf1/0x100<br />[ 2206.633091] [<ffffffff8109ef3a>] __lock_acquire+0x153a/0x15a0<br />[ 2206.633091] [<ffffffff8105a4a0>] ? try_to_wake_up+0x430/0x430<br />[ 2206.633091] [<ffffffff8109f054>] lock_acquire+0xb4/0x140<br />[ 2206.633091] [<ffffffff811880b5>] ? 
igrab+0x25/0x60<br />[ 2206.633091] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90<br />[ 2206.633091] [<ffffffff815f6196>] _raw_spin_lock+0x36/0x70<br />[ 2206.633091] [<ffffffff811880b5>] ? igrab+0x25/0x60<br />[ 2206.633091] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90<br />[ 2206.633091] [<ffffffff811880b5>] igrab+0x25/0x60<br />[ 2206.633091] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]<br />[ 2206.633091] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90<br />[ 2206.633091] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]<br />[ 2206.633091] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90<br />[ 2206.633091] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30<br />[ 2206.633091] [<ffffffff81171317>] iterate_supers+0x77/0xe0<br />[ 2206.633091] [<ffffffff8119b54f>] sys_sync+0x2f/0x70<br />[ 2206.633091] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b<br />[ 2532.330029] BUG: soft lockup - CPU#0 stuck for 67s! [fsstress:3055]<br />[ 2532.336342] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c<br />[ 2532.340009] irq event stamp: 176<br />[ 2532.340009] hardirqs last enabled at (175): [<ffffffff815f4bcd>] __mutex_lock_common+0x22d/0x3e0<br />[ 2532.340009] hardirqs last disabled at (176): [<ffffffff815f628f>] _raw_spin_lock_irq+0x1f/0x80<br />[ 2532.340009] softirqs last enabled at (0): [<ffffffff810602d8>] copy_process+0x648/0x1450<br />[ 2532.340009] softirqs last disabled at (0): [< (null)>] (null)<br />[ 2532.340009] CPU 0 <br />[ 2532.340009] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c<br />[ 2532.340009] <br />[ 2532.340009] Pid: 3055, comm: fsstress Not tainted 2.6.39+ #1 Dell Inc. 
OptiPlex 740/0UT225<br />[ 2532.340009] RIP: 0010:[<ffffffff812e71a0>] [<ffffffff812e71a0>] use_tsc_delay+0x20/0x20<br />[ 2532.340009] RSP: 0018:ffff8801002c9cc0 EFLAGS: 00000202<br />[ 2532.340009] RAX: 0000000000000000 RBX: ffff88010eeddc60 RCX: 0000000084a6fe8d<br />[ 2532.340009] RDX: 000000000000a200 RSI: ffff88010eeddc60 RDI: 0000000000000001<br />[ 2532.340009] RBP: ffff8801002c9d18 R08: 0000000000000000 R09: 0000000000000000<br />[ 2532.340009] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815ff88e<br />[ 2532.340009] R13: 0000029c840e16d2 R14: 0000000000000000 R15: 0000000000000000<br />[ 2532.340009] FS: 00007f074f487700(0000) GS:ffff88011f200000(0000) knlGS:0000000000000000<br />[ 2532.500026] BUG: soft lockup - CPU#1 stuck for 67s! [fsstress:3051]<br />[ 2532.500033] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c<br />[ 2532.500100] irq event stamp: 1092<br />[ 2532.500103] hardirqs last enabled at (1091): [<ffffffff815f6bf0>] _raw_spin_unlock_irq+0x30/0x40<br />[ 2532.500121] hardirqs last disabled at (1092): [<ffffffff815f628f>] _raw_spin_lock_irq+0x1f/0x80<br />[ 2532.500130] softirqs last enabled at (0): [<ffffffff810602d8>] copy_process+0x648/0x1450<br />[ 2532.500142] softirqs last disabled at (0): [< (null)>] (null)<br />[ 2532.500150] CPU 1 <br />[ 2532.500152] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c<br />[ 2532.500193] <br />[ 2532.500200] Pid: 3051, comm: fsstress Not tainted 2.6.39+ #1 Dell Inc. 
OptiPlex 740/0UT225<br />[ 2532.500209] RIP: 0010:[<ffffffff812e7232>] [<ffffffff812e7232>] delay_tsc+0x42/0x80<br />[ 2532.500222] RSP: 0018:ffff880100261d98 EFLAGS: 00000202<br />[ 2532.500227] RAX: 0000029c8ed94457 RBX: ffffffff815f6f54 RCX: 000000008ed94457<br />[ 2532.500232] RDX: 000000000000029c RSI: ffff8800d4a2a180 RDI: 0000000000000001<br />[ 2532.500237] RBP: ffff880100261db8 R08: 0000000000000000 R09: 0000000000000000<br />[ 2532.500241] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815ff88e<br />[ 2532.500246] R13: ffff88011f400000 R14: ffff880100260000 R15: 0000000000000000<br />[ 2532.500253] FS: 00007f074f487700(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000<br />[ 2532.500258] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b<br />[ 2532.500263] CR2: 00007f12ffcd1000 CR3: 0000000100369000 CR4: 00000000000006e0<br />[ 2532.500268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000<br />[ 2532.500273] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400<br />[ 2532.500278] Process fsstress (pid: 3051, threadinfo ffff880100260000, task ffff88010fd9a300)<br />[ 2532.500282] Stack:<br />[ 2532.500285] 0000000021dd0240 ffff8800d4a2a168 00000000717c75b0 0000000000000001<br />[ 2532.500295] ffff880100261dc8 ffffffff812e71af ffff880100261e28 ffffffff812f9afc<br />[ 2532.500302] ffff88010fd9a300 ffff88010fd9a300 ffff88010fd9a9c0 ffff88010fd9a728<br />[ 2532.500310] Call Trace:<br />[ 2532.500320] [<ffffffff812e71af>] __delay+0xf/0x20<br />[ 2532.500329] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150<br />[ 2532.500336] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70<br />[ 2532.500346] [<ffffffff811880b5>] ? igrab+0x25/0x60<br />[ 2532.500352] [<ffffffff811880b5>] igrab+0x25/0x60<br />[ 2532.500373] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]<br />[ 2532.500381] [<ffffffff8119b4f0>] ? 
__sync_filesystem+0x90/0x90<br />[ 2532.500392] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]<br />[ 2532.500398] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90<br />[ 2532.500405] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30<br />[ 2532.500412] [<ffffffff81171317>] iterate_supers+0x77/0xe0<br />[ 2532.500419] [<ffffffff8119b54f>] sys_sync+0x2f/0x70<br />[ 2532.500427] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b<br />[ 2532.500431] Code: 00 00 49 89 fe 0f ae f0 66 66 90 e8 c9 b7 d2 ff 66 90 4c 63 e0 eb 11 66 90 f3 90 65 8b 1c 25 78 dc 00 00 44 39 eb 75 23 <br />[ 2532.500471] 66 90 e8 a6 b7 d2 ff 66 90 48 98 48 89 c2 4c 29 e2 49 39 d6 <br />[ 2532.500490] Call Trace:<br />[ 2532.500497] [<ffffffff812e71af>] __delay+0xf/0x20<br />[ 2532.500504] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150<br />[ 2532.500511] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70<br />[ 2532.500518] [<ffffffff811880b5>] ? igrab+0x25/0x60<br />[ 2532.500525] [<ffffffff811880b5>] igrab+0x25/0x60<br />[ 2532.500537] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]<br />[ 2532.500544] [<ffffffff8119b4f0>] ? 
__sync_filesystem+0x90/0x90<br />[ 2532.500555] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]<br />[ 2532.500561] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90<br />[ 2532.500568] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30<br />[ 2532.500574] [<ffffffff81171317>] iterate_supers+0x77/0xe0<br />[ 2532.500580] [<ffffffff8119b54f>] sys_sync+0x2f/0x70<br />[ 2532.500587] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b<br />[ 2532.340009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033<br />[ 2532.340009] CR2: 0000000002f0e000 CR3: 0000000112095000 CR4: 00000000000006f0<br />[ 2532.340009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000<br />[ 2532.340009] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400<br />[ 2532.340009] Process fsstress (pid: 3055, threadinfo ffff8801002c8000, task ffff880117542300)<br />[ 2532.340009] Stack:<br />[ 2532.340009] ffffffff812f9afc ffff880117542300 0000000000000246 ffff8801175429c0<br />[ 2532.340009] ffff880117542728 ffff8801002c9d08 ffff88010eeddc48 ffff88010eeddc60<br />[ 2532.340009] ffff8800d4a29e58 ffff88010eedd800 ffff8801176cd000 ffff8801002c9d48<br />[ 2532.340009] Call Trace:<br />[ 2532.340009] [<ffffffff812f9afc>] ? do_raw_spin_lock+0x11c/0x150<br />[ 2532.340009] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70<br />[ 2532.340009] [<ffffffffa030c8c1>] ? __mark_caps_flushing+0xb1/0x1a0 [ceph]<br />[ 2532.340009] [<ffffffffa030c66d>] ? ceph_cap_string+0xed/0x110 [ceph]<br />[ 2532.340009] [<ffffffffa030c8c1>] __mark_caps_flushing+0xb1/0x1a0 [ceph]<br />[ 2532.340009] [<ffffffffa030ee90>] ceph_check_caps+0x600/0x6f0 [ceph]<br />[ 2532.340009] [<ffffffffa030f082>] ceph_flush_dirty_caps+0x72/0xb0 [ceph]<br />[ 2532.340009] [<ffffffff8119b4f0>] ? 
__sync_filesystem+0x90/0x90<br />[ 2532.340009] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]<br />[ 2532.340009] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90<br />[ 2532.340009] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30<br />[ 2532.340009] [<ffffffff81171317>] iterate_supers+0x77/0xe0<br />[ 2532.340009] [<ffffffff8119b54f>] sys_sync+0x2f/0x70<br />[ 2532.340009] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b<br />[ 2532.340009] Code: 48 ff c8 c9 c3 eb 04 90 90 90 90 55 48 89 e5 66 66 66 66 90 48 c7 05 94 be 97 00 f0 71 2e 81 c9 c3 eb 08 90 90 90 90 90 <br />[ 2532.340009] 48 89 e5 66 66 66 66 90 ff 15 79 be 97 00 c9 c3 eb 0d 90 90</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4095
2011-05-30T20:29:18Z
changping Wu
<ul><li><strong>File</strong> <a href="/attachments/download/235/ceph.fsstress.log.tgz">ceph.fsstress.log.tgz</a> added</li><li><strong>File</strong> <a href="/attachments/download/236/ceph.conf">ceph.conf</a> added</li></ul><p>Hi, I checked out the ceph-client master branch:</p>
<p>commit 98cc99822dac96710a8b64bdc2be4eccffc78956<br />Author: Sage Weil <<a class="email" href="mailto:sage@newdream.net">sage@newdream.net</a>><br />Date: Fri May 27 09:24:26 2011 -0700</p>
<pre><code>ceph: use ihold when we already have an inode ref</code></pre>
<p>plus ceph 0.28.2, to verify this issue.</p>
<p>run "./fsstress -d /mnt/ceph/fstest -l 1 -n 10000 -p 1 -v"</p>
<p>The fsstress test still can't finish.</p>
<p>logs:</p>
<p>1/441: unlink d4/d14/d100/dcb/d1c8/c1f1 0<br />1/442: read d4/d15/d7d/df4/d1c/d27/d71/f1c5 [2619005,59869] 0<br />1/443: mkdir d4/d15/d7d/df4/d1bb/d271 0</p>
<p>^C // Ctrl+C to abort.<br />----------------------------------------------------<br />kernel log:<br />[64857.090811] libceph: client4102 fsid ab8e5be8-25da-b4ac-6843-c92881b4b1df<br />[64857.099172] libceph: mon0 172.16.10.176:6789 session established<br />[66905.930618] libceph: get_reply unknown tid 481838 from osd1</p>
<p>OS: Ubuntu 10.10 x86_64, hand-compiled ceph-client.</p>
<p>The ceph log and ceph.conf are attached.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4126
2011-05-30T21:45:52Z
changping Wu
<ul><li><strong>File</strong> <a href="/attachments/download/237/dynamic_debug_caps.logs.tgz">dynamic_debug_caps.logs.tgz</a> added</li></ul><p>echo 'file fs/ceph/caps.c +p' > /sys/kernel/debug/dynamic_debug/control</p>
<p>logs attached .</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4180
2011-06-01T00:19:08Z
changping Wu
<ul><li><strong>File</strong> <a href="/attachments/download/238/0601.tgz">0601.tgz</a> added</li></ul><p>I checked out ceph-client commit 98cc99822dac96710a8b64bdc2be4eccffc78956;<br />hand-compiled, btrfs + Ubuntu 10.10 + 2.6.39+.</p>
<p>echo 'file fs/ceph/caps.c +p' > /sys/kernel/debug/dynamic_debug/control<br />echo 'file fs/ceph/file.c +p' > /sys/kernel/debug/dynamic_debug/control<br />echo 'file fs/ceph/addr.c +p' > /sys/kernel/debug/dynamic_debug/control<br />echo 'file fs/ceph/mds_client.c +p' > /sys/kernel/debug/dynamic_debug/control<br />echo 'file fs/ceph/locks.c +p' > /sys/kernel/debug/dynamic_debug/control</p>
<p>--------------------<br />fsstress test logs:<br />...........................................................................................<br />0/874: write d1/d10/d25/d46/f63 [2672869,81812] 0<br />0/875: rename d1/d10/d25/d2d/d5e/d83 to d1/d10/d25/dfa/d11a 0<br />0/876: dread d1/d7f/d50/f65 [0,4194304] 0<br />0/877: rename d1/d10/d8b/dd4/le9 to d1/d10/d25/d46/d5b/d8d/da1/l11b 0<br />0/878: dread - d1/d10/d47/d86/dd8/ffc zero size<br />0/879: chown d1/d10/d25/d46/d5b/d8d/d28/ff9 13174 0<br />0/880: symlink d1/d10/d25/d84/db8/de2/d103/l11c 0<br />0/881: read d1/f38 [979656,102826] 0<br />0/882: chown d1/d10/d25/d84/l99 16763990 0<br />0/883: rename d1/d10/d47/d86 to d1/d10/d25/d46/d5b/d8d/d11d 0<br />0/884: write d1/d7f/d50/d64/d67/ff4 [1643273,104602] 0<br />0/885: symlink d1/d10/d25/dfa/d11a/l11e 0<br />0/886: mkdir d1/d10/d47/de3/df7/d107/d11f 0<br />0/887: mkdir d1/d10/d8b/daa/d120 0<br />0/888: mknod d1/d10/d47/de3/df7/d107/c121 0<br />0/889: mkdir d1/d32/d122 0<br />0/890: truncate d1/f38 5157986 0<br />0/891: chown d1/d10/d25/d46/d5b/d8d/d28/dae 7711732 0<br />^c (hang, ctrl+c ,abort)</p>
<p>------------------<br />The kernel logs are attached.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4184
2011-06-01T11:14:40Z
Sage Weil
sage@newdream.net
<p>The problem is a short O_DIRECT read that hits EOF. This seems to fix it for me:<br /><pre>
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 203252d..7eb2e04 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -354,8 +354,17 @@ more:
                 goto out;
         }
 
-        /* check i_size */
-        *checkeof = 1;
+        if (align_to_pages) {
+                if (left > 0) {
+                        dout("zeroing trailing bytes %d\n", left);
+                        ceph_zero_page_vector_range(page_off + read,
+                                                    left, pages);
+                        read += left;
+                }
+        } else {
+                /* check i_size */
+                *checkeof = 1;
+        }
 }
 
 out:
</pre><br />I'm not certain that is correct, though. I need to verify that the correct thing to do when you O_DIRECT read 4MB from a ~10k file is ~10k data + (4MB-10k) of zeros...</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4185
2011-06-01T11:35:38Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>4</i> to <i>In Progress</i></li></ul><p>Scratch that, something a bit more subtle is going on.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4187
2011-06-01T16:16:26Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>4</i></li></ul><p>This should be fixed by commit:85defe76f7e2a0b3d285a3be72fcffce96629b5c, pushed to the master branch. Can you test and let me know? It's passing tests on my end.</p>
<p>FWIW, I was reproducing the problem with<br /><pre>
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char buf[409600];
    int fd = open(argv[1], O_WRONLY|O_CREAT, 0644);  /* O_CREAT needs a mode */
    ssize_t r;

    write(fd, "foo", 3);
    ftruncate(fd, 10000);
    fsync(fd);
    close(fd);

    fd = open(argv[1], O_RDONLY|O_DIRECT);
    r = read(fd, buf, sizeof(buf));
    printf("got %zd\n", r);  /* %zd for ssize_t */
    close(fd);
    return 0;
}
</pre></p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4193
2011-06-02T02:51:20Z
changping Wu
<p>Hi, I applied the patch to verify this bug.</p>
<p>run "./fsstress -d /mnt/ceph/fstest -l 1 -n 10000 -p 1 -v": pass.<br />run "./fsstress -d /mnt/ceph/fstest -l 1 -n 1000 -p 10 -v": pass.</p>
<p>This bug should be fixed.</p>
Linux kernel client - Bug #1096: LTP fsstress test always hang ,ceph 0.27.1+linux-2.6.38.6
https://tracker.ceph.com/issues/1096?journal_id=4194
2011-06-02T09:54:33Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>4</i> to <i>Resolved</i></li></ul><p>Thanks Jeff!</p>