Bug #15887

client reboot stuck if the ceph node is not reachable or shutdown

Added by Vasu Kulkarni about 4 years ago. Updated about 1 year ago.

Status: Won't Fix
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature:

Description

1) Mount CephFS on the client.
2) Shut down the OSD+MON node, or make it unreachable.
3) While the client is accessing the mount (a simple ls on a directory), reboot the client.
The reboot hangs indefinitely until the client can reach the Ceph nodes again or a hard reset is done.

Expected behavior:
reboot should work when ceph nodes are not reachable

[ubuntu@mira101 ~]$ ceph -v
ceph version 10.2.0-910-gab42bc5 (ab42bc5925cc1aaaa837522c4cbcf60afb2ac764)

[ubuntu@mira101 ~]$ cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 
[ubuntu@mira101 ~]$ uname -a
Linux mira101 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[vakulkar@vakulkar ~]$ ipmitool -H mira101.ipmi.sepia.ceph.com -U inktank -I lanplus sol activate
[SOL Session operational.  Use ~? for help]
[ 9116.919063] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9116.925739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9116.933721] systemd         D 000000000000001b     0 32250      0 0x00000080
[ 9116.941004]  ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9116.948789]  ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9116.956497]  ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9116.964199] Call Trace:
[ 9116.966721]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 9116.971844]  [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9116.978265]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9116.984258]  [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9116.989560]  [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9116.995636]  [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9117.001219]  [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9117.006872]  [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9117.011941]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9218.991528] libceph: mds0 172.21.5.138:6812 socket closed (con state OPEN)
[ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
[ 9220.044413] libceph: mds0 172.21.5.138:6812 connect error
[ 9221.038566] libceph: connect 172.21.5.138:6812 error -101
[ 9221.043982] libceph: mds0 172.21.5.138:6812 connect error
[ 9223.037427] libceph: connect 172.21.5.138:6812 error -101
[ 9223.042839] libceph: mds0 172.21.5.138:6812 connect error
[ 9227.035265] libceph: connect 172.21.5.138:6812 error -101
[ 9227.040678] libceph: mds0 172.21.5.138:6812 connect error
[ 9231.784910] libceph: mon0 172.21.5.138:6789 socket closed (con state OPEN)
[ 9231.791802] libceph: mon0 172.21.5.138:6789 session lost, hunting for new mon
[ 9231.798965] libceph: connect 172.21.5.138:6789 error -101
[ 9231.804489] libceph: mon0 172.21.5.138:6789 connect error
[ 9235.046992] libceph: connect 172.21.5.138:6812 error -101
[ 9235.052416] libceph: mds0 172.21.5.138:6812 connect error
[ 9236.953992] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9236.960638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9236.968467] systemd         D 000000000000001b     0 32250      0 0x00000080
[ 9236.975619]  ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9236.983260]  ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9236.990856]  ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9236.998497] Call Trace:
[ 9237.000952]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 9237.005936]  [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9237.012203]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9237.018042]  [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9237.023191]  [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9237.029112]  [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9237.034613]  [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9237.040190]  [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9237.045161]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9242.083366] libceph: connect 172.21.5.138:6789 error -101
[ 9242.088780] libceph: mon0 172.21.5.138:6789 connect error
[ 9251.038495] libceph: connect 172.21.5.138:6812 error -101
[ 9251.043912] libceph: mds0 172.21.5.138:6812 connect error
[ 9252.093949] libceph: connect 172.21.5.138:6789 error -101
[ 9252.099365] libceph: mon0 172.21.5.138:6789 connect error
[ 9262.104746] libceph: connect 172.21.5.138:6789 error -101
[ 9262.110161] libceph: mon0 172.21.5.138:6789 connect error
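
For anyone triaging a similar console capture: the repeated "error -101" is a negated errno, and on Linux errno 101 is ENETUNREACH ("Network is unreachable"). A rough sketch for tallying these lines per endpoint follows. The function name and regex are illustrative only and are not part of any Ceph tooling; the line format is assumed to match the libceph messages shown above.

```python
import errno
import re
import sys
from collections import Counter

# Matches lines such as:
#   [ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
CONNECT_ERROR = re.compile(
    r"libceph: connect (?P<addr>\d+\.\d+\.\d+\.\d+:\d+) error (?P<err>-?\d+)"
)


def count_connect_errors(log_text):
    """Count libceph connect failures per (address, error code) pair."""
    counts = Counter()
    for match in CONNECT_ERROR.finditer(log_text):
        counts[(match.group("addr"), int(match.group("err")))] += 1
    return counts


if __name__ == "__main__":
    # Usage: dmesg | python3 this_script.py
    for (addr, err), n in sorted(count_connect_errors(sys.stdin.read()).items()):
        # The kernel logs negated errno values; look up the symbolic name.
        name = errno.errorcode.get(-err, "unknown")
        print(f"{addr}: {n} failed connects, error {err} ({name})")
```

In the capture above, both the MDS port (6812) and the monitor port (6789) on 172.21.5.138 fail this way, which is consistent with the whole node being unreachable rather than a single daemon being down.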


Related issues

Related to Linux kernel client - Bug #13189: Reboot systerm can not set up. Closed 09/21/2015
Related to Linux kernel client - Bug #20927: libceph does not give up reconnecting on broken network New 08/07/2017

History

#1 Updated by Greg Farnum about 4 years ago

  • Status changed from New to Won't Fix

If the cluster is unavailable, we can't do a clean shutdown. I guess we could try and distinguish between dirty requests and simple information ones, but that would be more difficult than it sounds.

If you do a force unmount I believe it all goes away; that's part of the reason teuthology includes the -f flag.

#3 Updated by Ilya Dryomov about 4 years ago

Also http://www.spinics.net/lists/ceph-devel/msg27376.html, http://tracker.ceph.com/issues/13189.

In the cephfs case, if you know your cluster is gone, you can do umount -f. In the local-FS-on-rbd case, umount -f on a local FS won't help, so it's much worse. That's the reason I picked on your teuthology-nuke pull request, Vasu.

Blindly aborting outstanding requests is bad, but, at least in the rbd case, if the init system wasn't set up properly and shut the network down before umounting, or if the cluster is just gone, we are past the point of no return and might as well abort. It's on my TODO list.

#4 Updated by Alex Gorbachev over 1 year ago

This is still an issue today with RHEL 7.2. Any recommendations on workarounds?

#5 Updated by Ilya Dryomov about 1 year ago

  • Assignee set to Ilya Dryomov

For kcephfs, I believe umount -f is now more aggressive and aborts OSD requests in addition to MDS requests, but that code is definitely not in RHEL 7.2. Just as before, you need to do umount -f before you do reboot though. Otherwise the kernel client wouldn't know that you wanted your dirty state discarded and would still hang on to it.

There aren't really any workarounds beyond "if you know that your cluster is inaccessible or gone, umount -f before rebooting", but it would be good if you could describe your scenario in more detail.
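
As a rough illustration of that workaround, here is a minimal sketch of a pre-reboot helper. It is not an official tool, and everything in it is an assumption to adapt: the function names are made up, the monitor address and port are taken from the log in the description, kernel CephFS mounts are identified by the "ceph" filesystem type in /proc/mounts, and a failed TCP connect is treated as "the cluster is gone". It must run as root, and umount -f discards dirty state, so it should only be used when losing unflushed data is acceptable.

```python
import socket
import subprocess


def ceph_mount_points(mounts_text):
    """Return mount points of kernel CephFS mounts (fstype "ceph")."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "ceph":
            points.append(fields[1])
    return points


def monitor_reachable(host, port=6789, timeout=3.0):
    """Return True if a TCP connection to the monitor succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def force_unmount(mount_point):
    """Run `umount -f` on one mount point and return its exit status."""
    return subprocess.run(["umount", "-f", mount_point]).returncode


if __name__ == "__main__":
    MON_HOST = "172.21.5.138"  # assumption: monitor address from the log above
    with open("/proc/mounts") as f:
        mounts = ceph_mount_points(f.read())
    # Only force-unmount when the monitor cannot be reached at all.
    if mounts and not monitor_reachable(MON_HOST):
        for mount_point in mounts:
            force_unmount(mount_point)
```

A helper like this would have to run before the init system starts its own sync and unmount sequence; once the reboot is already blocked in sys_sync, as in the trace above, it is too late for it to help.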

#6 Updated by Ilya Dryomov about 1 year ago

  • Related to Bug #13189: Reboot systerm can not set up. added

#7 Updated by Ilya Dryomov about 1 year ago

  • Related to Bug #20927: libceph does not give up reconnecting on broken network added
