Bug #15887
Client reboot gets stuck if the Ceph node is unreachable or shut down
Description
1) Mount CephFS on the client.
2) Shut down the OSD+MON node, or make it unreachable.
3) While the client is accessing the mount (e.g. a simple ls on a directory), reboot the client.
The client is then stuck forever: the reboot cannot complete until it can reach the Ceph nodes again, unless a hard reset is done.
Expected behavior:
Reboot should complete even when the Ceph nodes are not reachable.
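The three steps above can be expressed as a small reproduction sketch. Everything in it is illustrative: the monitor address, secret-file path, and mount point are hypothetical placeholders, and observing the actual hang of course requires a real cluster (and root).

```shell
#!/bin/sh
# Reproduction sketch for the steps above (hypothetical values;
# a real cluster is required to observe the hang).
MON=mon-host:6789          # placeholder monitor address
MNT=/tmp/cephfs-repro      # placeholder mount point

mkdir -p "$MNT"

# Step 1: mount CephFS with the kernel client. Guarded so the script
# is a no-op on a machine without a reachable cluster (or without root):
if mount -t ceph "$MON:/" "$MNT" -o name=admin,secretfile=/etc/ceph/admin.secret 2>/dev/null; then
    # Step 2 (manual): power off the mon+osd node, or firewall it off.
    # Step 3: keep using the mount, then reboot the client. The reboot's
    # sync(2) then blocks in ceph_sync_fs() -> ceph_mdsc_sync(), as in
    # the hung-task trace below.
    ls "$MNT"
fi
echo "repro steps issued"
```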
[ubuntu@mira101 ~]$ ceph -v
ceph version 10.2.0-910-gab42bc5 (ab42bc5925cc1aaaa837522c4cbcf60afb2ac764)
[ubuntu@mira101 ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[ubuntu@mira101 ~]$ uname -a
Linux mira101 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[vakulkar@vakulkar ~]$ ipmitool -H mira101.ipmi.sepia.ceph.com -U inktank -I lanplus sol activate
[SOL Session operational. Use ~? for help]
[ 9116.919063] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9116.925739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9116.933721] systemd D 000000000000001b 0 32250 0 0x00000080
[ 9116.941004] ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9116.948789] ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9116.956497] ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9116.964199] Call Trace:
[ 9116.966721] [<ffffffff8163a909>] schedule+0x29/0x70
[ 9116.971844] [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9116.978265] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9116.984258] [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9116.989560] [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9116.995636] [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9117.001219] [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9117.006872] [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9117.011941] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9218.991528] libceph: mds0 172.21.5.138:6812 socket closed (con state OPEN)
[ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
[ 9220.044413] libceph: mds0 172.21.5.138:6812 connect error
[ 9221.038566] libceph: connect 172.21.5.138:6812 error -101
[ 9221.043982] libceph: mds0 172.21.5.138:6812 connect error
[ 9223.037427] libceph: connect 172.21.5.138:6812 error -101
[ 9223.042839] libceph: mds0 172.21.5.138:6812 connect error
[ 9227.035265] libceph: connect 172.21.5.138:6812 error -101
[ 9227.040678] libceph: mds0 172.21.5.138:6812 connect error
[ 9231.784910] libceph: mon0 172.21.5.138:6789 socket closed (con state OPEN)
[ 9231.791802] libceph: mon0 172.21.5.138:6789 session lost, hunting for new mon
[ 9231.798965] libceph: connect 172.21.5.138:6789 error -101
[ 9231.804489] libceph: mon0 172.21.5.138:6789 connect error
[ 9235.046992] libceph: connect 172.21.5.138:6812 error -101
[ 9235.052416] libceph: mds0 172.21.5.138:6812 connect error
[ 9236.953992] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9236.960638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9236.968467] systemd D 000000000000001b 0 32250 0 0x00000080
[ 9236.975619] ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9236.983260] ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9236.990856] ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9236.998497] Call Trace:
[ 9237.000952] [<ffffffff8163a909>] schedule+0x29/0x70
[ 9237.005936] [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9237.012203] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9237.018042] [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9237.023191] [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9237.029112] [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9237.034613] [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9237.040190] [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9237.045161] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9242.083366] libceph: connect 172.21.5.138:6789 error -101
[ 9242.088780] libceph: mon0 172.21.5.138:6789 connect error
[ 9251.038495] libceph: connect 172.21.5.138:6812 error -101
[ 9251.043912] libceph: mds0 172.21.5.138:6812 connect error
[ 9252.093949] libceph: connect 172.21.5.138:6789 error -101
[ 9252.099365] libceph: mon0 172.21.5.138:6789 connect error
[ 9262.104746] libceph: connect 172.21.5.138:6789 error -101
[ 9262.110161] libceph: mon0 172.21.5.138:6789 connect error
Related issues
History
#1 Updated by Greg Farnum almost 8 years ago
- Status changed from New to Won't Fix
If the cluster is unavailable, we can't do a clean shutdown. We could try to distinguish between dirty requests and simple informational ones, but that would be more difficult than it sounds.
If you do a force unmount I believe it all goes away; that's part of the reason teuthology includes the -f flag.
#2 Updated by John Spray almost 8 years ago
See also: http://tracker.ceph.com/issues/9477
#3 Updated by Ilya Dryomov almost 8 years ago
Also http://www.spinics.net/lists/ceph-devel/msg27376.html, http://tracker.ceph.com/issues/13189.
In the cephfs case, if you know your cluster is gone, you can do umount -f. In the local-FS-on-rbd case, umount -f on a local FS won't help, so it's much worse. That's the reason I picked on your teuthology-nuke pull request, Vasu.
Blindly aborting outstanding requests is bad, but, at least in the rbd case, if the init system wasn't set up properly and shut the network down before unmounting, or if the cluster is just gone, we are past the point of no return and might as well abort. It's on my TODO list.
#4 Updated by Alex Gorbachev about 5 years ago
This is still an issue today with RHEL 7.2. Any recommendations on workarounds?
#5 Updated by Ilya Dryomov about 5 years ago
- Assignee set to Ilya Dryomov
For kcephfs, I believe umount -f is now more aggressive and aborts OSD requests in addition to MDS requests, but that code is definitely not in RHEL 7.2. Just as before, you need to do umount -f before you reboot, though. Otherwise the kernel client wouldn't know that you wanted your dirty state discarded and would still hang on to it.
There aren't really any workarounds beyond "if you know that your cluster is inaccessible or gone, umount -f before rebooting", but it would be good if you could describe your scenario in more detail.
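The recommended workaround ("if you know the cluster is gone, umount -f before rebooting") can be sketched as a small pre-reboot script. This is a hedged sketch, not official tooling: the mount point /mnt/cephfs is a hypothetical placeholder, and deciding that the cluster really is unreachable is left to you.

```shell
#!/bin/sh
# Pre-reboot workaround sketch: force-unmount CephFS before rebooting
# when the cluster is known to be unreachable. The mount point below
# is hypothetical; substitute your own.
CEPHFS_MNT=/mnt/cephfs

# /proc/mounts fields are "device mountpoint fstype ..."; the kernel
# CephFS client reports fstype "ceph".
if grep -qs " $CEPHFS_MNT ceph " /proc/mounts; then
    # umount -f aborts in-flight MDS requests (and, per comment #5, on
    # newer kernels OSD requests too) and discards dirty state, so the
    # reboot's sync(2) no longer blocks in ceph_sync_fs().
    umount -f "$CEPHFS_MNT"
fi
echo "no CephFS mount left on $CEPHFS_MNT; safe to run: reboot"
```

The ordering matters: running reboot first would invoke sync(2) on the still-mounted filesystem, which is exactly the ceph_mdsc_sync() hang shown in the trace above.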
#6 Updated by Ilya Dryomov about 5 years ago
- Related to Bug #13189: Reboot systerm can not set up. added
#7 Updated by Ilya Dryomov about 5 years ago
- Related to Bug #20927: libceph does not give up reconnecting on broken network added