Bug #3370

closed

All nfsd hung trying to lock page(s) on export of kclient ceph

Added by David Zafman over 11 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Workunit bonnie hung over NFS client with retransmitted NFS read:

ubuntu 2667 2572 0 Oct18 ? 00:00:00 bash -c mkdir -p -- /tmp/cephtest/mnt.1/client.1/tmp && cd -- /tmp/cephtest/mnt.1/cli
ubuntu 2669 2667 0 Oct18 ? 00:00:00 /bin/bash /tmp/cephtest/workunit.client.1/suites/bonnie.sh
ubuntu 2672 2669 0 Oct18 ? 00:01:09 /usr/sbin/bonnie++ -n 100

In the syslog the kernel noticed nfsd not making progress:

INFO: task nfsd:1181 blocked for more than 120 seconds.
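
For anyone triaging similar logs, here is a minimal sketch of pulling hung-task warnings like the one above out of syslog lines. The function name `find_hung_tasks` and the regex are illustrative assumptions based only on the message format quoted in this report, not on anything in the kernel or Ceph tooling.

```python
import re

# Matches the hung-task message format quoted in this report, e.g.
# "INFO: task nfsd:1181 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return a list of (comm, pid, seconds) for every hung-task warning found."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits
```

Passing the single line quoted above yields `[("nfsd", 1181, 120)]`. Counting distinct PIDs per `comm` is a quick way to check whether every nfsd thread is affected or only some.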

All 8 nfsd processes show the same stack:
[<ffffffff8112a20e>] sleep_on_page+0xe/0x20
[<ffffffff8112a1f7>] __lock_page+0x67/0x70
[<ffffffff811aaa2f>] __generic_file_splice_read+0x59f/0x5d0
[<ffffffff811aaa9e>] generic_file_splice_read+0x3e/0x80
[<ffffffff811a921b>] do_splice_to+0x7b/0xa0
[<ffffffff811a94d7>] splice_direct_to_actor+0xa7/0x1c0
[<ffffffffa036b762>] nfsd_vfs_read.isra.13+0x112/0x160 [nfsd]
[<ffffffffa036dc98>] nfsd_read_file+0x88/0xb0 [nfsd]
[<ffffffffa037c7a2>] nfsd4_encode_read+0x132/0x1f0 [nfsd]
[<ffffffffa03815dd>] nfsd4_encode_operation+0x5d/0xa0 [nfsd]
[<ffffffffa037851a>] nfsd4_proc_compound+0x25a/0x630 [nfsd]
[<ffffffffa0367b4e>] nfsd_dispatch+0xbe/0x1c0 [nfsd]
[<ffffffffa025ab19>] svc_process+0x489/0x7a0 [sunrpc]
[<ffffffffa036718d>] nfsd+0xbd/0x1a0 [nfsd]
[<ffffffff810791fe>] kthread+0xae/0xc0
[<ffffffff8163f3c4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
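
To confirm that several threads really share one stack, the traces can be parsed and grouped. The following is a rough sketch that assumes the frame format shown above (`[<addr>] symbol+0xoff/0xsize [module]`); `parse_stack` and `group_by_stack` are hypothetical helper names, and the stack text is expected to be collected separately, for example from each thread's `/proc/<pid>/stack` where the kernel exposes it.

```python
import re
from collections import Counter

# One frame looks like:
#   [<ffffffffa036b762>] nfsd_vfs_read.isra.13+0x112/0x160 [nfsd]
# The trailing "[module]" part is optional (absent for core kernel symbols).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_stack(text):
    """Return [(symbol, module_or_None), ...], top frame first.

    Lines without a symbol+offset (such as the 0xffffffffffffffff
    terminator) are skipped.
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group("sym"), m.group("mod")))
    return frames


def group_by_stack(stacks):
    """Given {pid: stack_text}, count how many threads share each symbol sequence."""
    return Counter(
        tuple(sym for sym, _ in parse_stack(text)) for text in stacks.values()
    )
```

If all 8 nfsd threads are blocked in the same place, `group_by_stack` returns a single key with a count of 8, and the first symbols of that key show where they sleep (`sleep_on_page` under `__lock_page` in the trace above).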

A direct read attempt through the ceph client:

dd if=/tmp/cephtest/mnt.0/client.1/tmp/Bonnie.2672 of=/dev/null

The dd hung here:
[<ffffffff8112a22e>] sleep_on_page_killable+0xe/0x40
[<ffffffff8112a187>] __lock_page_killable+0x67/0x70
[<ffffffff8112c63e>] generic_file_aio_read+0x48e/0x730
[<ffffffffa03f1d54>] ceph_aio_read+0x654/0x880 [ceph]
[<ffffffff8117b703>] do_sync_read+0xa3/0xe0
[<ffffffff8117c060>] vfs_read+0xb0/0x180
[<ffffffff8117c17a>] sys_read+0x4a/0x90
[<ffffffff8163e1e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
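
When repeating a direct read like the `dd` above, it can help to bound the attempt so the shell is not left stuck. This is only a sketch: `read_probe` and its default timeout are assumptions for illustration, and it relies on the reader being killable (as `__lock_page_killable` in the trace suggests), since `subprocess.run` kills the child with SIGKILL when the timeout expires and then waits for it.

```python
import subprocess


def read_probe(path, timeout_s=30):
    """Try to read `path` in full with dd; return False if it does not finish in time.

    Raises subprocess.CalledProcessError if dd itself fails (e.g. missing file).
    """
    try:
        subprocess.run(
            ["dd", f"if={path}", "of=/dev/null"],
            check=True,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        # The child was killed after timeout_s seconds without completing.
        return False
```

A `False` result for a file that reads fine on another mount would point at the page-lock wait seen here rather than at the file contents. A task stuck in a non-killable sleep would still block this probe.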

I'm categorizing this as a ceph client issue; it is likely an interaction with the kernel NFS server.
