umount stuck on NFS over RBD switch over by using Pacemaker

Added by Wen-Dwo Hwang Hwang over 8 years ago. Updated over 8 years ago.

3 - minor
I have recently been testing NFS over RBD. I am trying to build an NFS HA environment on Ubuntu 14.04 for testing, with the following package versions:
- Ubuntu 14.04 : 3.13.0-32-generic(Ubuntu 14.04.2 LTS)
- ceph : 0.80.9-0ubuntu0.14.04.2
- ceph-common : 0.80.9-0ubuntu0.14.04.2
- pacemaker (git20130802-1ubuntu2.3)
- corosync (2.3.3-1ubuntu1)
PS1: I also tried ceph/ceph-common (0.87.1-1trusty and 0.87.2-1trusty) on a 3.13.0-48-generic (Ubuntu 14.04.2) server and saw the same behavior.
PS2: I also upgraded the kernel of Ubuntu 14.04 AMD64 to 3.18 and saw the same behavior.

The environment has 5 nodes (running in VirtualBox) in the Ceph cluster (3 MONs and 5 OSDs) and two NFS gateways (nfs1 and nfs2) for high availability. I issue the command 'sudo service pacemaker stop' on 'nfs1' to force the resources to stop and move to 'nfs2', and vice versa.

When both nodes are up and I issue 'sudo service pacemaker stop' on one node, the other node takes over all resources, and everything looks fine. I then wait about 30 minutes without touching the NFS gateways and repeat the previous steps to test the failover procedure. This time the process state of 'umount' is 'D' (uninterruptible sleep); 'ps' shows the following:

root 21047 0.0 0.0 17412 952 ? D 16:39 0:00 umount /mnt/block1
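Processes in state 'D' can be picked out without reading the whole 'ps' listing. The following is a minimal sketch, assuming a procps-style 'ps' that accepts '-eo pid,stat,args'; the function name filter_d_state is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print "PID COMMAND" for every process whose state
# starts with "D" (uninterruptible sleep).
# Expects "ps -eo pid,stat,args" style lines on stdin; the header line is
# skipped because its first field is not numeric.
filter_d_state() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^D/ {
        pid = $1
        $1 = ""; $2 = ""          # drop the PID and STAT columns
        sub(/^ +/, "")            # trim the leading blanks left behind
        print pid, $0
    }'
}

# Usage on a live system:
#   ps -eo pid,stat,args | filter_d_state
```

An empty result means no process is currently blocked in uninterruptible sleep.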

Is there any way to solve or work around this? Because 'umount' is stuck, neither 'reboot' nor 'shutdown' works properly. Unless I wait about 20 minutes for 'umount' to time out, the only thing I can do is power off the server directly.
Any help would be much appreciated.

My configuration and logs are attached below.

Pacemaker configurations:

crm configure primitive p_rbd_map_1 \
params user="admin" pool="block_data" name="data01" \
cephconf="/etc/ceph/ceph.conf" \
op monitor interval="10s" timeout="20s"

crm configure primitive p_fs_rbd_1 ocf:heartbeat:Filesystem \
params directory="/mnt/block1" fstype="xfs" device="/dev/rbd1" \
fast_stop="no" options="noatime,nodiratime,nobarrier,inode64" \
op monitor interval="20s" timeout="40s" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s"

crm configure primitive p_export_rbd_1 ocf:heartbeat:exportfs \
params directory="/mnt/block1" clientspec="" \
options="rw,async,no_subtree_check,no_root_squash" fsid="1" \
op monitor interval="10s" timeout="20s" \
op start interval="0" timeout="40s"

crm configure primitive p_vip_1 ocf:heartbeat:IPaddr2 \
params ip="" cidr_netmask="24" \
op monitor interval="5"

crm configure primitive p_nfs_server lsb:nfs-kernel-server \
op monitor interval="10s" timeout="30s"

crm configure primitive p_rpcbind upstart:rpcbind \
op monitor interval="10s" timeout="30s"

crm configure group g_rbd_share_1 p_rbd_map_1 p_fs_rbd_1 p_export_rbd_1 p_vip_1 \
meta target-role="Started"

crm configure group g_nfs p_rpcbind p_nfs_server \
meta target-role="Started"

crm configure clone clo_nfs g_nfs \
meta globally-unique="false" target-role="Started"

'crm_mon' status results for normal condition:
Online: [ nfs1 nfs2 ]

Resource Group: g_rbd_share_1
p_rbd_map_1 ( Started nfs1
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started nfs1
p_export_rbd_1 (ocf::heartbeat:exportfs): Started nfs1
p_vip_1 (ocf::heartbeat:IPaddr2): Started nfs1
Clone Set: clo_nfs [g_nfs]
Started: [ nfs1 nfs2 ]

'crm_mon' status results for fail over condition:
Online: [ nfs1 nfs2 ]

Resource Group: g_rbd_share_1
p_rbd_map_1 ( Started nfs1
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started nfs1 (unmanaged) FAILED
p_export_rbd_1 (ocf::heartbeat:exportfs): Stopped
p_vip_1 (ocf::heartbeat:IPaddr2): Stopped
Clone Set: clo_nfs [g_nfs]
Started: [ nfs2 ]
Stopped: [ nfs1 ]

Failed actions:
p_fs_rbd_1_stop_0 (node=nfs1, call=114, rc=1, status=Timed Out,
last-rc-change=Wed May 13 16:39:10 2015, queued=60002ms, exec=1ms
): unknown error

'dmesg' messages:

[ 9470.284509] nfsd: last server has exited, flushing export cache
[ 9470.322893] init: rpcbind main process (4267) terminated with status 2
[ 9600.520281] INFO: task umount:2675 blocked for more than 120 seconds.
[ 9600.520445] Not tainted 3.13.0-32-generic #57-Ubuntu
[ 9600.520570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
[ 9600.520792] umount D ffff88003fc13480 0 2675 1 0x00000000
[ 9600.520800] ffff88003a4f9dc0 0000000000000082 ffff880039ece000
[ 9600.520805] 0000000000013480 0000000000013480 ffff880039ece000
[ 9600.520809] ffff88003fc141a0 0000000000000001 0000000000000000
[ 9600.520814] Call Trace:
[ 9600.520830] [<ffffffff817251a9>] schedule+0x29/0x70
[ 9600.520882] [<ffffffffa043b300>] _xfs_log_force+0x220/0x280 [xfs]
[ 9600.520891] [<ffffffff8109a9b0>] ? wake_up_state+0x20/0x20
[ 9600.520922] [<ffffffffa043b386>] xfs_log_force+0x26/0x80 [xfs]
[ 9600.520947] [<ffffffffa03f3b6d>] xfs_fs_sync_fs+0x2d/0x50 [xfs]
[ 9600.520954] [<ffffffff811edc22>] sync_filesystem+0x72/0xa0
[ 9600.520960] [<ffffffff811bfe30>] generic_shutdown_super+0x30/0xf0
[ 9600.520966] [<ffffffff811c0127>] kill_block_super+0x27/0x70
[ 9600.520971] [<ffffffff811c040d>] deactivate_locked_super+0x3d/0x60
[ 9600.520976] [<ffffffff811c09c6>] deactivate_super+0x46/0x60
[ 9600.520981] [<ffffffff811dd856>] mntput_no_expire+0xd6/0x170
[ 9600.520986] [<ffffffff811dedfe>] SyS_umount+0x8e/0x100
[ 9600.520991] [<ffffffff8173186d>] system_call_fastpath+0x1a/0x1f
[ 9720.520295] INFO: task xfsaild/rbd1:5577 blocked for more than 120 seconds.
[ 9720.520449] Not tainted 3.13.0-32-generic #57-Ubuntu
[ 9720.520574] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
[ 9720.520797] xfsaild/rbd1 D ffff88003fc13480 0 5577 2 0x00000000
[ 9720.520805] ffff88003b571d58 0000000000000046 ffff88003c404800
[ 9720.520811] 0000000000013480 0000000000013480 ffff88003c404800
[ 9720.520815] ffff88003fc141a0 0000000000000001 0000000000000000
[ 9720.520819] Call Trace:
[ 9720.520835] [<ffffffff817251a9>] schedule+0x29/0x70
[ 9720.520887] [<ffffffffa043b300>] _xfs_log_force+0x220/0x280 [xfs]
[ 9720.520896] [<ffffffff8109a9b0>] ? wake_up_state+0x20/0x20
[ 9720.520927] [<ffffffffa043b386>] xfs_log_force+0x26/0x80 [xfs]
[ 9720.520958] [<ffffffffa043f920>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 9720.520986] [<ffffffffa043fa61>] xfsaild+0x141/0x5c0 [xfs]
[ 9720.521013] [<ffffffffa043f920>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 9720.521019] [<ffffffff8108b572>] kthread+0xd2/0xf0
[ 9720.521024] [<ffffffff8108b4a0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 9720.521029] [<ffffffff817317bc>] ret_from_fork+0x7c/0xb0
[ 9720.521033] [<ffffffff8108b4a0>] ? kthread_create_on_node+0x1c0/0x1c0


#1 Updated by Josh Durgin over 8 years ago

  • Project changed from rbd to Linux kernel client
  • Category set to rbd

#2 Updated by Ilya Dryomov over 8 years ago

I'm not exactly sure what's going on here. Are there any stuck osd requests (you can check with cat /sys/kernel/debug/ceph/*/osdc)?
Can you try to distill this test case into a bunch of commands that don't involve pacemaker to rule out pacemaker configuration issues?
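The osdc check can be wrapped in a small helper that can be called while 'umount' is hanging. This is a rough sketch: it assumes debugfs is mounted at /sys/kernel/debug and that each non-empty line in an osdc file is one outstanding request, as on older kernels. Newer kernels add section header lines, so a non-zero count there only means the file is worth reading. The function name count_osdc_lines is hypothetical.

```shell
# Hypothetical helper: count non-empty lines across all osdc files.
# Argument: debugfs ceph directory (defaults to /sys/kernel/debug/ceph).
count_osdc_lines() {
    root="${1:-/sys/kernel/debug/ceph}"
    total=0
    for f in "$root"/*/osdc; do
        [ -r "$f" ] || continue            # no client dirs, or not readable
        n=$(grep -c . "$f" || true)        # grep exits 1 when the count is 0
        total=$((total + n))
    done
    echo "$total"
}

# Usage (as root, since debugfs is normally root-only):
#   count_osdc_lines
```

A count that stays above zero while 'umount' is blocked would suggest that requests to the OSDs are not completing.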

#3 Updated by Ilya Dryomov over 8 years ago

  • Status changed from New to Need More Info

#4 Updated by Wen-Dwo Hwang Hwang over 8 years ago

Thanks for your reply.
I followed your suggestion and repeated the test without pacemaker. I tried waiting 10, 20, and 60 minutes, and 'umount' seems to work well each time. As the next step I will focus on adjusting the pacemaker configuration.
My test commands are listed below. (NFS gateway node kernel version 3.18.0-031800-generic)
(1) Create pool
sudo ceph osd pool create block_data1 4096

(2) Prepare the testing environment
sudo mkdir -p /mnt/block1
Edit /etc/exports and add the following line:
/mnt/block1 *(rw,async,no_subtree_check,no_root_squash)

(3) Shell script for the start of the test:
sudo rbd map data01 -p block_data1
sudo mount -o noatime,nodiratime,nobarrier,inode64 /dev/rbd0 /mnt/block1
sudo exportfs -v -o rw,async,no_subtree_check,no_root_squash
sudo service rpcbind start
sudo service nfs-kernel-server start

(4) Shell script for the end of the test:
sudo service nfs-kernel-server stop
sudo service rpcbind stop
sudo exportfs -v -u
sudo umount /mnt/block1
sudo rbd unmap /dev/rbd0
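When this cycle is repeated in a loop, it helps if the loop does not block forever on a hung 'umount'. A process in state 'D' ignores signals until its I/O completes, so the sketch below starts the command in the background and polls it instead of waiting for it. It assumes the script runs as root, because 'kill -0' cannot probe a process owned by another user; the function name finished_within is hypothetical.

```shell
# Hypothetical helper: run a command in the background and report whether
# it finished within LIMIT seconds. A command that is still running is
# left alone, not killed.
# Usage: finished_within LIMIT command [args...]
finished_within() {
    limit="$1"; shift
    "$@" &
    pid=$!
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do   # true while the process exists
        if [ "$elapsed" -ge "$limit" ]; then
            echo "still running after ${limit}s (pid $pid)"
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "finished"
}

# Usage (as root):
#   finished_within 60 umount /mnt/block1 || echo "umount appears stuck"
```

If the helper reports that 'umount' is still running, that is a good moment to look at 'dmesg' and the osdc files before powering off the node.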

