Bug #3370

Updated by David Zafman over 11 years ago


 The bonnie workunit hung while running over an NFS client, with a retransmitted NFS read: 

 ubuntu      2667    2572    0 Oct18 ?          00:00:00 bash -c mkdir -- /tmp/cephtest/mnt.1/client.1/tmp && cd -- /tmp/cephtest/mnt.1/cli 
 ubuntu      2669    2667    0 Oct18 ?          00:00:00 /bin/bash /tmp/cephtest/workunit.client.1/suites/bonnie.sh 
 ubuntu      2672    2669    0 Oct18 ?          00:01:09 /usr/sbin/bonnie++ -n 100 

 In the syslog the kernel noticed nfsd not making progress: 

 INFO: task nfsd:1181 blocked for more than 120 seconds. 

 All 8 nfsd processes show the same stack: 
 [<ffffffff8112a20e>] sleep_on_page+0xe/0x20 
 [<ffffffff8112a1f7>] __lock_page+0x67/0x70 
 [<ffffffff811aaa2f>] __generic_file_splice_read+0x59f/0x5d0 
 [<ffffffff811aaa9e>] generic_file_splice_read+0x3e/0x80 
 [<ffffffff811a921b>] do_splice_to+0x7b/0xa0 
 [<ffffffff811a94d7>] splice_direct_to_actor+0xa7/0x1c0 
 [<ffffffffa036b762>] nfsd_vfs_read.isra.13+0x112/0x160 [nfsd] 
 [<ffffffffa036dc98>] nfsd_read_file+0x88/0xb0 [nfsd] 
 [<ffffffffa037c7a2>] nfsd4_encode_read+0x132/0x1f0 [nfsd] 
 [<ffffffffa03815dd>] nfsd4_encode_operation+0x5d/0xa0 [nfsd] 
 [<ffffffffa037851a>] nfsd4_proc_compound+0x25a/0x630 [nfsd] 
 [<ffffffffa0367b4e>] nfsd_dispatch+0xbe/0x1c0 [nfsd] 
 [<ffffffffa025ab19>] svc_process+0x489/0x7a0 [sunrpc] 
 [<ffffffffa036718d>] nfsd+0xbd/0x1a0 [nfsd] 
 [<ffffffff810791fe>] kthread+0xae/0xc0 
 [<ffffffff8163f3c4>] kernel_thread_helper+0x4/0x10 
 [<ffffffffffffffff>] 0xffffffffffffffff 
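
 A minimal sketch of how stacks like the one above could be compared across tasks, to confirm that they all share one trace and are sleeping on a page lock. The helper names (parse_stack, group_by_stack, waits_on_page_lock) are hypothetical, and the sketch assumes the stack text was captured in the format shown above; this report does not say how the stacks were collected. 

```python
import re
from collections import Counter

# Matches one frame such as "[<ffffffff8112a1f7>] __lock_page+0x67/0x70".
# The trailing "[<ffffffffffffffff>] 0xffffffffffffffff" line has no
# "symbol+offset/size" part, so it is skipped.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_stack(text):
    """Return the symbol names in a kernel stack dump, innermost first."""
    return [m.group(1) for m in FRAME_RE.finditer(text)]


def group_by_stack(stacks):
    """Count how many tasks share each distinct sequence of symbols.

    `stacks` maps a task id (hypothetical, e.g. a pid) to its raw stack text.
    """
    return Counter(tuple(parse_stack(text)) for text in stacks.values())


def waits_on_page_lock(text):
    """True if one of the two innermost frames is a __lock_page* function."""
    frames = parse_stack(text)
    return any(f.startswith("__lock_page") for f in frames[:2])
```

 If every task is stuck in the same place, group_by_stack returns a single entry whose count equals the number of tasks. On a live system the input text could come from /proc/<pid>/stack, which typically requires root to read. 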

 A direct read attempt through the ceph client: 

 dd if=/tmp/cephtest/mnt.0/client.1/tmp/Bonnie.2672 of=/dev/null 

 It hung here: 
 [<ffffffff8112a22e>] sleep_on_page_killable+0xe/0x40 
 [<ffffffff8112a187>] __lock_page_killable+0x67/0x70 
 [<ffffffff8112c63e>] generic_file_aio_read+0x48e/0x730 
 [<ffffffffa03f1d54>] ceph_aio_read+0x654/0x880 [ceph] 
 [<ffffffff8117b703>] do_sync_read+0xa3/0xe0 
 [<ffffffff8117c060>] vfs_read+0xb0/0x180 
 [<ffffffff8117c17a>] sys_read+0x4a/0x90 
 [<ffffffff8163e1e9>] system_call_fastpath+0x16/0x1b 
 [<ffffffffffffffff>] 0xffffffffffffffff 

 I'm categorizing this as a ceph client issue, although it is likely an interaction with the kernel NFS server. 
