Bug #3370

All nfsd hung trying to lock page(s) on export of kclient ceph

Added by David Zafman about 11 years ago. Updated about 6 years ago.

Severity: 3 - minor


The bonnie workunit hung over an NFS client mount, with a retransmitted NFS read:

ubuntu 2667 2572 0 Oct18 ? 00:00:00 bash -c mkdir -- /tmp/cephtest/mnt.1/client.1/tmp && cd -- /tmp/cephtest/mnt.1/cli
ubuntu 2669 2667 0 Oct18 ? 00:00:00 /bin/bash /tmp/cephtest/workunit.client.1/suites/
ubuntu 2672 2669 0 Oct18 ? 00:01:09 /usr/sbin/bonnie++ -n 100

In the syslog, the kernel reported that nfsd was not making progress:

INFO: task nfsd:1181 blocked for more than 120 seconds.

All 8 nfsd processes have a stack like this:
[<ffffffff8112a20e>] sleep_on_page+0xe/0x20
[<ffffffff8112a1f7>] __lock_page+0x67/0x70
[<ffffffff811aaa2f>] __generic_file_splice_read+0x59f/0x5d0
[<ffffffff811aaa9e>] generic_file_splice_read+0x3e/0x80
[<ffffffff811a921b>] do_splice_to+0x7b/0xa0
[<ffffffff811a94d7>] splice_direct_to_actor+0xa7/0x1c0
[<ffffffffa036b762>] nfsd_vfs_read.isra.13+0x112/0x160 [nfsd]
[<ffffffffa036dc98>] nfsd_read_file+0x88/0xb0 [nfsd]
[<ffffffffa037c7a2>] nfsd4_encode_read+0x132/0x1f0 [nfsd]
[<ffffffffa03815dd>] nfsd4_encode_operation+0x5d/0xa0 [nfsd]
[<ffffffffa037851a>] nfsd4_proc_compound+0x25a/0x630 [nfsd]
[<ffffffffa0367b4e>] nfsd_dispatch+0xbe/0x1c0 [nfsd]
[<ffffffffa025ab19>] svc_process+0x489/0x7a0 [sunrpc]
[<ffffffffa036718d>] nfsd+0xbd/0x1a0 [nfsd]
[<ffffffff810791fe>] kthread+0xae/0xc0
[<ffffffff8163f3c4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

A direct read attempt through the ceph client:

dd if=/tmp/cephtest/mnt.0/client.1/tmp/Bonnie.2672 of=/dev/null

It hung here:
[<ffffffff8112a22e>] sleep_on_page_killable+0xe/0x40
[<ffffffff8112a187>] __lock_page_killable+0x67/0x70
[<ffffffff8112c63e>] generic_file_aio_read+0x48e/0x730
[<ffffffffa03f1d54>] ceph_aio_read+0x654/0x880 [ceph]
[<ffffffff8117b703>] do_sync_read+0xa3/0xe0
[<ffffffff8117c060>] vfs_read+0xb0/0x180
[<ffffffff8117c17a>] sys_read+0x4a/0x90
[<ffffffff8163e1e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

I'm categorizing this as a ceph client issue; it is likely an interaction with the kernel NFS server.


#1 Updated by David Zafman about 11 years ago

  • Description updated (diff)

I verified that PG_locked was set in the struct page flags field. I suspected that ceph_readpages() was leaving pages locked, so I ran my test case with that function disabled. That function is not called for a direct ceph kernel client read, but it is part of the readahead path that the kernel NFS server uses to read files.

My Bonnie run with that function disabled was able to get past the I/O portion of the test without hanging. During some earlier testing I didn't see the function finish_read() getting called at all. I presume that's where the unlock_page() for the completed I/O is supposed to occur.

#2 Updated by Sage Weil about 11 years ago

It might be that leaving the pages locked for the duration of the read is the wrong thing. My recollection is vague, but I think we've switched this behavior around a few different times. In 7c272194e66e91830b90f6202e61c69f8590f1eb we switched from a blocking implementation (which sucked for obvious reasons, but left the pages locked for the duration of the read) to an async one, which still left them locked. I suggest checking other file systems to see what their readpages behavior is...

#3 Updated by David Zafman about 11 years ago

  • Status changed from New to Fix Under Review

#4 Updated by David Zafman almost 11 years ago

  • Status changed from Fix Under Review to Resolved

commit: 2978257c56935878f8a756c6cb169b569e99bb91

#5 Updated by Greg Farnum over 6 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (26)

#6 Updated by zhou wei about 6 years ago

David Zafman wrote:

commit: 2978257c56935878f8a756c6cb169b569e99bb91

I can't find this commit. Can somebody give me a reference?
