Bug #1096

LTP fsstress test always hangs, ceph 0.27.1 + linux-2.6.38.6

Added by changping Wu almost 13 years ago. Updated almost 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
fs/ceph
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hi,
I ran the fsstress test with ceph 0.27.1 + linux-2.6.38.6 + Ubuntu 10.10:

$modprobe libceph
$modprobe ceph
$mount -t ceph monaddr:6789:/ /mnt/ceph
$./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S

At some step during the run, the fsstress process always hangs; it looks like a deadlock, or something blocking it from creating further dirs/files. It makes no progress, and we have to press Ctrl+C (^C) to exit it.

Jeff
=================================
logs:
1. test case 1
.....................................
4/1685: mknod d4/d15/d7d/df4/d1c/d27/d34/d4b/d5f/dfb/d1fb/c26f 0
4/1686: mknod d4/d15/d7d/df4/d1c/d64/d6b/d1ff/c270 0
4/1687: unlink d4/d14/d100/dcb/d1c8/c1f1 0
4/1688: read d4/d15/d7d/df4/d1c/d27/d71/f1c5 [2619005,59869] 0
4/1689: mkdir d4/d15/d7d/df4/d1bb/d271 0

~~~stops at this step; Ctrl+C (^C) to abort.

2. test case 2:
.........................
1/6748: creat d0/d33/d3f/d17c/d287/d9b3/f9e3 x:0 0 0
1/6749: link d0/d7/d18/d4f/d97/d126/d1e7/d93f/d68f/f3b0 d0/d7/d1dc/d346/d450/d9b/d179/f9e4 0
1/6750: rename d0/d7/d151/dae/d1ce/d377/f69e to d0/d7/d18/dec/d1e0/d23a/d298/d7b4/f9e5 0

~~~stops at this step; Ctrl+C (^C) to abort.

3. test case 3:
................................
24/5902: dwrite d4/d11/d78/d8b/d8f/ded/d114/d194/d35/d1cb/d2fe/f426 [0,4194304] 0
24/5903: readlink d4/d11/d78/d8b/d8f/ded/d114/d153/d417/l760 0
24/5904: rmdir d4/d5c/d446 39

~~~stops at this step; Ctrl+C (^C) to abort.

mds.alpha.log-tail-20k (4.77 MB) changping Wu, 05/18/2011 08:39 PM

ceph.conf.onemds (4.33 KB) changping Wu, 05/18/2011 08:39 PM

ceph.conf.twomds (3.89 KB) changping Wu, 05/18/2011 08:39 PM

mds.alpha.log-tail-30k.tgz (594 KB) changping Wu, 05/20/2011 12:46 AM

ceph.fsstress.log.tgz (757 KB) changping Wu, 05/25/2011 11:50 PM

ceph-client_fsstress_log2 (48.4 KB) changping Wu, 05/25/2011 11:50 PM

ceph-client_fsstress_log3 (31.1 KB) changping Wu, 05/25/2011 11:50 PM

ceph-client__fsstress_log1 (94.1 KB) changping Wu, 05/25/2011 11:50 PM

ceph.fsstress.log.tgz (3.72 MB) changping Wu, 05/30/2011 08:29 PM

ceph.conf View (4.35 KB) changping Wu, 05/30/2011 08:29 PM

dynamic_debug_caps.logs.tgz (689 KB) changping Wu, 05/30/2011 09:45 PM

0601.tgz (1.8 MB) changping Wu, 06/01/2011 12:19 AM

History

#1 Updated by changping Wu almost 13 years ago

These steps make it easy to reproduce:

$./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S
......................................
run for several minutes, press Ctrl+C (^C) to abort,
then run again:

$./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S
.............................
run for several minutes, press Ctrl+C (^C) to abort,
then run again:

$./fsstress -d /mnt/ceph/mdstest -f write=freq -l 100 -n 10000 -p 30 -v -S
................................
run for several minutes, press Ctrl+C (^C) to abort.

#2 Updated by Sage Weil almost 13 years ago

  • Assignee set to Sage Weil
  • Target version set to v0.29

#3 Updated by Sage Weil almost 13 years ago

Was this with a single mds? fsstress is known to turn up clustered-mds bugs.

#4 Updated by changping Wu almost 13 years ago

Hi,
fsstress hangs with both a single mds and two mds.
ceph.conf and the single-mds test log mds.alpha.log are attached.

1) two mds:
; mds
; You need at least one. Define two to get a standby.
[mds]
; where the mds keeps it's secret encryption keys
; keyring = /etc/ceph/keyring.$name

; mds logging to debug issues.
;debug ms = 1
;debug mds = 20
;mds_session_timeout =120

[mds.alpha]
host = ubuntu-mon0

[mds.beta]
host = ubuntu-mon0

2) one mds:

----------------------------------------
; mds
; You need at least one. Define two to get a standby.
[mds]
; where the mds keeps it's secret encryption keys
keyring = /mnt/data/keyring.$name

; mds logging to debug issues.
;debug ms = 1
debug mds = 20

[mds.alpha]
host = ubuntu-client0

;[mds.beta]
; host = beta
------------------------------------------------

# ceph -s
2011-05-19 11:38:23.201229    pg v1433: 396 pgs: 396 active+clean; 1595 MB data, 5388 MB used, 72425 MB / 81934 MB avail
2011-05-19 11:38:23.203378   mds e4: 1/1/1 up {0=alpha=up:active}
2011-05-19 11:38:23.203721   osd e3: 2 osds: 2 up, 2 in
2011-05-19 11:38:23.205931   log 2011-05-19 10:09:30.145914 osd1 172.16.10.176:6804/2028 147 : [INF] 2.30 scrub ok
2011-05-19 11:38:23.206511   mon e1: 1 mons at {alpha=172.16.10.176:6789/0}

--------------
/var/log/ceph/mds.alpha.log , "tail -n 20k mds.alpha.log " attached

#5 Updated by Sage Weil almost 13 years ago

  • Status changed from New to 4

Do you have a larger piece of the mds log you can attach? (Perhaps the whole thing?)

#6 Updated by Sage Weil almost 13 years ago

  • Story points set to 3
  • Position set to 1
  • Position changed from 1 to 672

#7 Updated by changping Wu almost 13 years ago

Hi ,
The whole mds log is too large, about 5.4 GB, so I can't attach it here;
this site limits attachment size (maximum: 5 MB).
mds.alpha.log-tail-20k is 4.8 MB; mds.alpha.log-tail-30k is 7.2 MB.

#8 Updated by changping Wu almost 13 years ago

tail -n 30k mds.alpha.log

#9 Updated by Sage Weil almost 13 years ago

The 30k lines still doesn't have the last client_request arrival. I bumped the limit to 50mb. Can you grab a bigger piece, and verify that grep client_request turns up something? Alternatively, post the full log (bzip2ed) somewhere I can grab it? Thanks!

#10 Updated by Sage Weil almost 13 years ago

  • Project changed from Ceph to Linux kernel client
  • Target version deleted (v0.29)

#11 Updated by Sage Weil almost 13 years ago

  • Category set to fs/ceph
  • Target version set to v3.0

This is a kclient bug due to multiple threads entering flush_dirty_caps, which is not reentrant due to commit:e9964c102312967a4bc1fd501cb628c4a3b19034.

Fixed by commit:db3540522e955c1ebb391f4f5324dff4f20ecd09 in 'master' branch of ceph-client.git. Can you let us know if that fixes it for you?

#12 Updated by changping Wu almost 13 years ago

I will build the kernel to verify it. Thanks.

#13 Updated by changping Wu almost 13 years ago

Hi, I checked out ceph-client master:

commit 35b0ed997b1a49ff73a6110cbd04681467dbe217
Author: Sage Weil <>
Date: Wed May 25 14:56:12 2011 -0700

ceph: unwind canceled flock state

.............
I enabled the kernel hacking options and built it, but the issue remains: the fsstress test still can't pass.
Maybe a deadlock causes the bug.

I also filed a ceph-client task at:
http://tracker.newdream.net/issues/1112

It is a deadlock issue.

I attached the detailed logs, like this:

[ 2194.881763] libceph: client4100 fsid 359eb12b-f380-12b6-6b38-0408838f0e69
[ 2194.890356] libceph: mon0 172.16.10.176:6789 session established
[ 2206.619076]
[ 2206.619082] =======================================================
[ 2206.620766] [ INFO: possible circular locking dependency detected ]
[ 2206.633091] 2.6.39+ #1
[ 2206.633091] -------------------------------------------------------
[ 2206.633091] fsstress/3048 is trying to acquire lock:
[ 2206.633091] (&sb->s_type->i_lock_key#19){+.+...}, at: [<ffffffff811880b5>] igrab+0x25/0x60
[ 2206.633091]
[ 2206.633091] but task is already holding lock:
[ 2206.633091] (&(&mdsc->cap_dirty_lock)->rlock){+.+...}, at: [<ffffffffa030f03f>] ceph_flush_dirty_caps+0x2f/0xb0 [ceph]
[ 2206.633091]
[ 2206.633091] which lock already depends on the new lock.
[ 2206.633091]
[ 2206.633091]
[ 2206.633091] the existing dependency chain (in reverse order) is:
[ 2206.633091]
[ 2206.633091] -> #1 (&(&mdsc->cap_dirty_lock)->rlock){+.+...}:
[ 2206.633091] [<ffffffff8109f054>] lock_acquire+0xb4/0x140
[ 2206.633091] [<ffffffff815f6196>] _raw_spin_lock+0x36/0x70
[ 2206.633091] [<ffffffffa030d8c3>] __ceph_mark_dirty_caps+0xa3/0x190 [ceph]
[ 2206.633091] [<ffffffffa0302949>] ceph_setattr+0x299/0x6e0 [ceph]
[ 2206.633091] [<ffffffff8118a611>] notify_change+0x161/0x2c0
[ 2206.633091] [<ffffffff8116ceaa>] chown_common+0x9a/0xc0
[ 2206.633091] [<ffffffff8116d00e>] sys_lchown+0x7e/0x90
[ 2206.633091] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2206.633091]
[ 2206.633091] -> #0 (&sb->s_type->i_lock_key#19){+.+...}:
[ 2206.633091] [<ffffffff8109ef3a>] __lock_acquire+0x153a/0x15a0
[ 2206.633091] [<ffffffff8109f054>] lock_acquire+0xb4/0x140
[ 2206.633091] [<ffffffff815f6196>] _raw_spin_lock+0x36/0x70
[ 2206.633091] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2206.633091] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2206.633091] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2206.633091] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2206.633091] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2206.633091] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2206.633091] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2206.633091] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2206.633091]
[ 2206.633091] other info that might help us debug this:
[ 2206.633091]
[ 2206.633091] 2 locks held by fsstress/3048:
[ 2206.633091] #0: (&type->s_umount_key#38){.+.+..}, at: [<ffffffff81171307>] iterate_supers+0x67/0xe0
[ 2206.633091] #1: (&(&mdsc->cap_dirty_lock)->rlock){+.+...}, at: [<ffffffffa030f03f>] ceph_flush_dirty_caps+0x2f/0xb0 [ceph]
[ 2206.633091]
[ 2206.633091] stack backtrace:
[ 2206.633091] Pid: 3048, comm: fsstress Not tainted 2.6.39+ #1
[ 2206.633091] Call Trace:
[ 2206.633091] [<ffffffff8109c701>] print_circular_bug+0xf1/0x100
[ 2206.633091] [<ffffffff8109ef3a>] __lock_acquire+0x153a/0x15a0
[ 2206.633091] [<ffffffff8105a4a0>] ? try_to_wake_up+0x430/0x430
[ 2206.633091] [<ffffffff8109f054>] lock_acquire+0xb4/0x140
[ 2206.633091] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2206.633091] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2206.633091] [<ffffffff815f6196>] _raw_spin_lock+0x36/0x70
[ 2206.633091] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2206.633091] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2206.633091] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2206.633091] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2206.633091] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2206.633091] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2206.633091] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2206.633091] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2206.633091] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2206.633091] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2206.633091] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2532.330029] BUG: soft lockup - CPU#0 stuck for 67s! [fsstress:3055]
[ 2532.336342] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c
[ 2532.340009] irq event stamp: 176
[ 2532.340009] hardirqs last enabled at (175): [<ffffffff815f4bcd>] __mutex_lock_common+0x22d/0x3e0
[ 2532.340009] hardirqs last disabled at (176): [<ffffffff815f628f>] _raw_spin_lock_irq+0x1f/0x80
[ 2532.340009] softirqs last enabled at (0): [<ffffffff810602d8>] copy_process+0x648/0x1450
[ 2532.340009] softirqs last disabled at (0): [< (null)>] (null)
[ 2532.340009] CPU 0
[ 2532.340009] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c
[ 2532.340009]
[ 2532.340009] Pid: 3055, comm: fsstress Not tainted 2.6.39+ #1 Dell Inc. OptiPlex 740/0UT225
[ 2532.340009] RIP: 0010:[<ffffffff812e71a0>] [<ffffffff812e71a0>] use_tsc_delay+0x20/0x20
[ 2532.340009] RSP: 0018:ffff8801002c9cc0 EFLAGS: 00000202
[ 2532.340009] RAX: 0000000000000000 RBX: ffff88010eeddc60 RCX: 0000000084a6fe8d
[ 2532.340009] RDX: 000000000000a200 RSI: ffff88010eeddc60 RDI: 0000000000000001
[ 2532.340009] RBP: ffff8801002c9d18 R08: 0000000000000000 R09: 0000000000000000
[ 2532.340009] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815ff88e
[ 2532.340009] R13: 0000029c840e16d2 R14: 0000000000000000 R15: 0000000000000000
[ 2532.340009] FS: 00007f074f487700(0000) GS:ffff88011f200000(0000) knlGS:0000000000000000
[ 2532.500026] BUG: soft lockup - CPU#1 stuck for 67s! [fsstress:3051]
[ 2532.500033] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c
[ 2532.500100] irq event stamp: 1092
[ 2532.500103] hardirqs last enabled at (1091): [<ffffffff815f6bf0>] _raw_spin_unlock_irq+0x30/0x40
[ 2532.500121] hardirqs last disabled at (1092): [<ffffffff815f628f>] _raw_spin_lock_irq+0x1f/0x80
[ 2532.500130] softirqs last enabled at (0): [<ffffffff810602d8>] copy_process+0x648/0x1450
[ 2532.500142] softirqs last disabled at (0): [< (null)>] (null)
[ 2532.500150] CPU 1
[ 2532.500152] Modules linked in: ceph libceph nfsd lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep c
[ 2532.500193]
[ 2532.500200] Pid: 3051, comm: fsstress Not tainted 2.6.39+ #1 Dell Inc. OptiPlex 740/0UT225
[ 2532.500209] RIP: 0010:[<ffffffff812e7232>] [<ffffffff812e7232>] delay_tsc+0x42/0x80
[ 2532.500222] RSP: 0018:ffff880100261d98 EFLAGS: 00000202
[ 2532.500227] RAX: 0000029c8ed94457 RBX: ffffffff815f6f54 RCX: 000000008ed94457
[ 2532.500232] RDX: 000000000000029c RSI: ffff8800d4a2a180 RDI: 0000000000000001
[ 2532.500237] RBP: ffff880100261db8 R08: 0000000000000000 R09: 0000000000000000
[ 2532.500241] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff815ff88e
[ 2532.500246] R13: ffff88011f400000 R14: ffff880100260000 R15: 0000000000000000
[ 2532.500253] FS: 00007f074f487700(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000
[ 2532.500258] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2532.500263] CR2: 00007f12ffcd1000 CR3: 0000000100369000 CR4: 00000000000006e0
[ 2532.500268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2532.500273] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2532.500278] Process fsstress (pid: 3051, threadinfo ffff880100260000, task ffff88010fd9a300)
[ 2532.500282] Stack:
[ 2532.500285] 0000000021dd0240 ffff8800d4a2a168 00000000717c75b0 0000000000000001
[ 2532.500295] ffff880100261dc8 ffffffff812e71af ffff880100261e28 ffffffff812f9afc
[ 2532.500302] ffff88010fd9a300 ffff88010fd9a300 ffff88010fd9a9c0 ffff88010fd9a728
[ 2532.500310] Call Trace:
[ 2532.500320] [<ffffffff812e71af>] __delay+0xf/0x20
[ 2532.500329] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150
[ 2532.500336] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70
[ 2532.500346] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2532.500352] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2532.500373] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2532.500381] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2532.500392] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2532.500398] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2532.500405] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2532.500412] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2532.500419] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2532.500427] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2532.500431] Code: 00 00 49 89 fe 0f ae f0 66 66 90 e8 c9 b7 d2 ff 66 90 4c 63 e0 eb 11 66 90 f3 90 65 8b 1c 25 78 dc 00 00 44 39 eb 75 23
[ 2532.500471] 66 90 e8 a6 b7 d2 ff 66 90 48 98 48 89 c2 4c 29 e2 49 39 d6
[ 2532.500490] Call Trace:
[ 2532.500497] [<ffffffff812e71af>] __delay+0xf/0x20
[ 2532.500504] [<ffffffff812f9afc>] do_raw_spin_lock+0x11c/0x150
[ 2532.500511] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70
[ 2532.500518] [<ffffffff811880b5>] ? igrab+0x25/0x60
[ 2532.500525] [<ffffffff811880b5>] igrab+0x25/0x60
[ 2532.500537] [<ffffffffa030f063>] ceph_flush_dirty_caps+0x53/0xb0 [ceph]
[ 2532.500544] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2532.500555] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2532.500561] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2532.500568] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2532.500574] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2532.500580] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2532.500587] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2532.340009] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2532.340009] CR2: 0000000002f0e000 CR3: 0000000112095000 CR4: 00000000000006f0
[ 2532.340009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2532.340009] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2532.340009] Process fsstress (pid: 3055, threadinfo ffff8801002c8000, task ffff880117542300)
[ 2532.340009] Stack:
[ 2532.340009] ffffffff812f9afc ffff880117542300 0000000000000246 ffff8801175429c0
[ 2532.340009] ffff880117542728 ffff8801002c9d08 ffff88010eeddc48 ffff88010eeddc60
[ 2532.340009] ffff8800d4a29e58 ffff88010eedd800 ffff8801176cd000 ffff8801002c9d48
[ 2532.340009] Call Trace:
[ 2532.340009] [<ffffffff812f9afc>] ? do_raw_spin_lock+0x11c/0x150
[ 2532.340009] [<ffffffff815f61b6>] _raw_spin_lock+0x56/0x70
[ 2532.340009] [<ffffffffa030c8c1>] ? __mark_caps_flushing+0xb1/0x1a0 [ceph]
[ 2532.340009] [<ffffffffa030c66d>] ? ceph_cap_string+0xed/0x110 [ceph]
[ 2532.340009] [<ffffffffa030c8c1>] __mark_caps_flushing+0xb1/0x1a0 [ceph]
[ 2532.340009] [<ffffffffa030ee90>] ceph_check_caps+0x600/0x6f0 [ceph]
[ 2532.340009] [<ffffffffa030f082>] ceph_flush_dirty_caps+0x72/0xb0 [ceph]
[ 2532.340009] [<ffffffff8119b4f0>] ? __sync_filesystem+0x90/0x90
[ 2532.340009] [<ffffffffa03017c9>] ceph_sync_fs+0x49/0x60 [ceph]
[ 2532.340009] [<ffffffff8119b4be>] __sync_filesystem+0x5e/0x90
[ 2532.340009] [<ffffffff8119b50f>] sync_one_sb+0x1f/0x30
[ 2532.340009] [<ffffffff81171317>] iterate_supers+0x77/0xe0
[ 2532.340009] [<ffffffff8119b54f>] sys_sync+0x2f/0x70
[ 2532.340009] [<ffffffff815fee42>] system_call_fastpath+0x16/0x1b
[ 2532.340009] Code: 48 ff c8 c9 c3 eb 04 90 90 90 90 55 48 89 e5 66 66 66 66 90 48 c7 05 94 be 97 00 f0 71 2e 81 c9 c3 eb 08 90 90 90 90 90
[ 2532.340009] 48 89 e5 66 66 66 66 90 ff 15 79 be 97 00 c9 c3 eb 0d 90 90

#14 Updated by changping Wu almost 13 years ago

Hi, I checked out the ceph-client master branch:

commit 98cc99822dac96710a8b64bdc2be4eccffc78956
Author: Sage Weil <>
Date: Fri May 27 09:24:26 2011 -0700

ceph: use ihold when we already have an inode ref
..............................................................
plus ceph 0.28.2, to verify this issue.

run "./fsstress -d /mnt/ceph/fstest -l 1 -n 10000 -p 1 -v"

The fsstress test still can't finish.

logs:

1/441: unlink d4/d14/d100/dcb/d1c8/c1f1 0
1/442: read d4/d15/d7d/df4/d1c/d27/d71/f1c5 [2619005,59869] 0
1/443: mkdir d4/d15/d7d/df4/d1bb/d271 0

^C // Ctrl+C to abort.
----------------------------------------------------
kernel log:
[64857.090811] libceph: client4102 fsid ab8e5be8-25da-b4ac-6843-c92881b4b1df
[64857.099172] libceph: mon0 172.16.10.176:6789 session established
[66905.930618] libceph: get_reply unknown tid 481838 from osd1

OS: Ubuntu 10.10 x86_64, hand-compiled ceph-client.

ceph log and ceph.conf attached.

#15 Updated by changping Wu almost 13 years ago

echo 'file fs/ceph/caps.c +p' > /sys/kernel/debug/dynamic_debug/control

logs attached .

#16 Updated by changping Wu almost 13 years ago

Checked out ceph-client commit 98cc99822dac96710a8b64bdc2be4eccffc78956,
hand-compiled; btrfs + Ubuntu 10.10 + 2.6.39+.

echo 'file fs/ceph/caps.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/ceph/file.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/ceph/addr.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/ceph/mds_client.c +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/ceph/locks.c +p' > /sys/kernel/debug/dynamic_debug/control

--------------------
fsstress test logs:
...........................................................................................
0/874: write d1/d10/d25/d46/f63 [2672869,81812] 0
0/875: rename d1/d10/d25/d2d/d5e/d83 to d1/d10/d25/dfa/d11a 0
0/876: dread d1/d7f/d50/f65 [0,4194304] 0
0/877: rename d1/d10/d8b/dd4/le9 to d1/d10/d25/d46/d5b/d8d/da1/l11b 0
0/878: dread - d1/d10/d47/d86/dd8/ffc zero size
0/879: chown d1/d10/d25/d46/d5b/d8d/d28/ff9 13174 0
0/880: symlink d1/d10/d25/d84/db8/de2/d103/l11c 0
0/881: read d1/f38 [979656,102826] 0
0/882: chown d1/d10/d25/d84/l99 16763990 0
0/883: rename d1/d10/d47/d86 to d1/d10/d25/d46/d5b/d8d/d11d 0
0/884: write d1/d7f/d50/d64/d67/ff4 [1643273,104602] 0
0/885: symlink d1/d10/d25/dfa/d11a/l11e 0
0/886: mkdir d1/d10/d47/de3/df7/d107/d11f 0
0/887: mkdir d1/d10/d8b/daa/d120 0
0/888: mknod d1/d10/d47/de3/df7/d107/c121 0
0/889: mkdir d1/d32/d122 0
0/890: truncate d1/f38 5157986 0
0/891: chown d1/d10/d25/d46/d5b/d8d/d28/dae 7711732 0
^C (hang; Ctrl+C to abort)

------------------
the kernel logs attached.

#17 Updated by Sage Weil almost 13 years ago

The problem is a short O_DIRECT read that hits EOF. This seems to fix it for me:


diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 203252d..7eb2e04 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -354,8 +354,17 @@ more:
                        goto out;
                }

-               /* check i_size */
-               *checkeof = 1;
+               if (align_to_pages) {
+                       if (left > 0) {
+                               dout("zeroing trailing bytes %d\n", left);
+                               ceph_zero_page_vector_range(page_off + read,
+                                                           left, pages);
+                               read += left;
+                       }
+               } else {
+                       /* check i_size */
+                       *checkeof = 1;
+               }
        }

 out:

I'm not certain that is correct, though. I need to verify that the correct thing to do when you O_DIRECT read 4MB from a ~10k file is ~10k data + (4MB-10k) of zeros...
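
For comparison, ordinary buffered reads already have well-defined EOF semantics: read(2) returns only the bytes up to EOF, never padding beyond it. A quick local check, mirroring the reproducer below but on a plain temp file without O_DIRECT:

```python
import os
import tempfile

# Create a 10000-byte sparse file: write "foo", then extend with zeros,
# the same shape as the C reproducer in this ticket.
fd, path = tempfile.mkstemp()
os.write(fd, b"foo")
os.ftruncate(fd, 10000)
os.close(fd)

# Read far past EOF: on a regular file this returns exactly the
# 10000 bytes up to EOF in one call, not a zero-padded 409600.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 409600)
os.close(fd)
os.unlink(path)

print(len(data))  # 10000: short read at EOF
```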

#18 Updated by Sage Weil almost 13 years ago

  • Status changed from 4 to In Progress

Scratch that, something a bit more subtle is going on.

#19 Updated by Sage Weil almost 13 years ago

  • Status changed from In Progress to 4

This should be fixed by commit:85defe76f7e2a0b3d285a3be72fcffce96629b5c, pushed to the master branch. Can you test and let me know? It's passing tests on my end.

FWIW, I was reproducing the problem with

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        /* buffer larger than the file, so the O_DIRECT read crosses EOF */
        static char buf[409600];
        int fd = open(argv[1], O_WRONLY|O_CREAT, 0644);
        ssize_t r;

        write(fd, "foo", 3);
        ftruncate(fd, 10000);           /* sparse-extend to 10000 bytes */
        fsync(fd);
        close(fd);

        fd = open(argv[1], O_RDONLY|O_DIRECT);
        r = read(fd, buf, sizeof(buf));
        printf("got %zd\n", r);
        close(fd);

        return 0;
}

#20 Updated by changping Wu almost 13 years ago

Hi, I applied the patch to verify this bug:

run "./fsstress -d /mnt/ceph/fstest -l 1 -n 10000 -p 1 -v": pass.
run "./fsstress -d /mnt/ceph/fstest -l 1 -n 1000 -p 10 -v": pass.

This bug should be fixed.

#21 Updated by Sage Weil almost 13 years ago

  • Status changed from 4 to Resolved

Thanks Jeff!
