Bug #15887
client reboot stuck if the ceph node is not reachable or shutdown
Status: closed
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
1) Mount CephFS on the client.
2) Shut down the OSD+MON node or make it unreachable.
3) While the client is accessing the mount (a simple ls on a directory), reboot the client.
The reboot hangs indefinitely until the client can reach the Ceph nodes again or a hard reset is performed.
Expected behavior:
The reboot should complete even when the Ceph nodes are unreachable.
[ubuntu@mira101 ~]$ ceph -v
ceph version 10.2.0-910-gab42bc5 (ab42bc5925cc1aaaa837522c4cbcf60afb2ac764)
[ubuntu@mira101 ~]$ cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[ubuntu@mira101 ~]$ uname -a
Linux mira101 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[vakulkar@vakulkar ~]$ ipmitool -H mira101.ipmi.sepia.ceph.com -U inktank -I lanplus sol activate
[SOL Session operational. Use ~? for help]
[ 9116.919063] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9116.925739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9116.933721] systemd D 000000000000001b 0 32250 0 0x00000080
[ 9116.941004] ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9116.948789] ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9116.956497] ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9116.964199] Call Trace:
[ 9116.966721] [<ffffffff8163a909>] schedule+0x29/0x70
[ 9116.971844] [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9116.978265] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9116.984258] [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9116.989560] [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9116.995636] [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9117.001219] [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9117.006872] [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9117.011941] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9218.991528] libceph: mds0 172.21.5.138:6812 socket closed (con state OPEN)
[ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
[ 9220.044413] libceph: mds0 172.21.5.138:6812 connect error
[ 9221.038566] libceph: connect 172.21.5.138:6812 error -101
[ 9221.043982] libceph: mds0 172.21.5.138:6812 connect error
[ 9223.037427] libceph: connect 172.21.5.138:6812 error -101
[ 9223.042839] libceph: mds0 172.21.5.138:6812 connect error
[ 9227.035265] libceph: connect 172.21.5.138:6812 error -101
[ 9227.040678] libceph: mds0 172.21.5.138:6812 connect error
[ 9231.784910] libceph: mon0 172.21.5.138:6789 socket closed (con state OPEN)
[ 9231.791802] libceph: mon0 172.21.5.138:6789 session lost, hunting for new mon
[ 9231.798965] libceph: connect 172.21.5.138:6789 error -101
[ 9231.804489] libceph: mon0 172.21.5.138:6789 connect error
[ 9235.046992] libceph: connect 172.21.5.138:6812 error -101
[ 9235.052416] libceph: mds0 172.21.5.138:6812 connect error
[ 9236.953992] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9236.960638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9236.968467] systemd D 000000000000001b 0 32250 0 0x00000080
[ 9236.975619] ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9236.983260] ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9236.990856] ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9236.998497] Call Trace:
[ 9237.000952] [<ffffffff8163a909>] schedule+0x29/0x70
[ 9237.005936] [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9237.012203] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9237.018042] [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9237.023191] [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9237.029112] [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9237.034613] [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9237.040190] [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9237.045161] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9242.083366] libceph: connect 172.21.5.138:6789 error -101
[ 9242.088780] libceph: mon0 172.21.5.138:6789 connect error
[ 9251.038495] libceph: connect 172.21.5.138:6812 error -101
[ 9251.043912] libceph: mds0 172.21.5.138:6812 connect error
[ 9252.093949] libceph: connect 172.21.5.138:6789 error -101
[ 9252.099365] libceph: mon0 172.21.5.138:6789 connect error
[ 9262.104746] libceph: connect 172.21.5.138:6789 error -101
[ 9262.110161] libceph: mon0 172.21.5.138:6789 connect error
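
For triage, a console log like the one above can be summarized with a short script. The following is only a sketch, not part of the original report: the function name summarize, the regular expressions, and the SAMPLE excerpt are illustrative assumptions. It counts libceph connect failures per endpoint and hung-task warnings per process. On Linux, error -101 corresponds to ENETUNREACH (network unreachable); the mapping is hardcoded here because errno numbering differs between platforms.

```python
import re
from collections import Counter

# Linux errno number -> name. Hardcoded (assumption: the log comes from a
# Linux kernel), because errno values are platform-specific.
LINUX_ERRNO_NAMES = {101: "ENETUNREACH"}

# Hypothetical patterns written to match the message shapes in this report.
CONNECT_ERR = re.compile(r"libceph: connect (\S+):(\d+) error -(\d+)")
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize(log_text):
    """Return (connect_errors, hung_tasks) Counters for a kernel log excerpt."""
    connect_errors = Counter()
    hung_tasks = Counter()
    for line in log_text.splitlines():
        m = CONNECT_ERR.search(line)
        if m:
            addr, port, err = m.group(1), int(m.group(2)), int(m.group(3))
            name = LINUX_ERRNO_NAMES.get(err, str(err))
            connect_errors[(addr, port, name)] += 1
            continue
        m = HUNG_TASK.search(line)
        if m:
            hung_tasks[(m.group(1), int(m.group(2)))] += 1
    return connect_errors, hung_tasks


# A few lines copied from the log in this report.
SAMPLE = """\
[ 9116.919063] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
[ 9221.038566] libceph: connect 172.21.5.138:6812 error -101
[ 9231.798965] libceph: connect 172.21.5.138:6789 error -101
"""

if __name__ == "__main__":
    errors, hung = summarize(SAMPLE)
    for key, count in sorted(errors.items()):
        print(key, count)
    for key, count in sorted(hung.items()):
        print(key, count)
```

Applied to the full log, a summary of this kind makes it easy to see that both the MDS port (6812) and the MON port (6789) on 172.21.5.138 are unreachable while systemd stays blocked in sync.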