Bug #15887

closed

Client reboot hangs if the Ceph node is unreachable or shut down

Added by Vasu Kulkarni almost 8 years ago. Updated about 5 years ago.

Status: Won't Fix
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

1) Mount CephFS on the client.
2) Shut down the OSD+MON node, or make it unreachable.
3) While the client is accessing the mount (a simple ls on a directory), reboot the client.
The reboot hangs indefinitely until the client can reach the Ceph nodes again or a hard reset is performed.

Expected behavior:
The reboot should complete even when the Ceph nodes are unreachable.
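
When reproducing step 2, it can help to confirm from the client that the Ceph node really is unreachable before triggering the reboot. The following is a minimal sketch of such a check, not part of the original report: the helper name is_reachable and the 3-second default timeout are assumptions made for illustration. It only attempts a TCP connection and reports whether it succeeded.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only; it does not speak the Ceph
    protocol, it just checks that the TCP port accepts a connection.
    """
    try:
        # create_connection raises an OSError subclass on refusal, timeout,
        # or an unreachable network.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the setup in this report, a call such as is_reachable("172.21.5.138", 6789) would probe the monitor address that appears in the console log; once the node is shut down, it should return False.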

[ubuntu@mira101 ~]$ ceph -v
ceph version 10.2.0-910-gab42bc5 (ab42bc5925cc1aaaa837522c4cbcf60afb2ac764)

[ubuntu@mira101 ~]$ cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 
[ubuntu@mira101 ~]$ uname -a
Linux mira101 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

[vakulkar@vakulkar ~]$ ipmitool -H mira101.ipmi.sepia.ceph.com -U inktank -I lanplus sol activate
[SOL Session operational.  Use ~? for help]
[ 9116.919063] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9116.925739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9116.933721] systemd         D 000000000000001b     0 32250      0 0x00000080
[ 9116.941004]  ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9116.948789]  ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9116.956497]  ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9116.964199] Call Trace:
[ 9116.966721]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 9116.971844]  [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9116.978265]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9116.984258]  [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9116.989560]  [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9116.995636]  [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9117.001219]  [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9117.006872]  [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9117.011941]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9218.991528] libceph: mds0 172.21.5.138:6812 socket closed (con state OPEN)
[ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
[ 9220.044413] libceph: mds0 172.21.5.138:6812 connect error
[ 9221.038566] libceph: connect 172.21.5.138:6812 error -101
[ 9221.043982] libceph: mds0 172.21.5.138:6812 connect error
[ 9223.037427] libceph: connect 172.21.5.138:6812 error -101
[ 9223.042839] libceph: mds0 172.21.5.138:6812 connect error
[ 9227.035265] libceph: connect 172.21.5.138:6812 error -101
[ 9227.040678] libceph: mds0 172.21.5.138:6812 connect error
[ 9231.784910] libceph: mon0 172.21.5.138:6789 socket closed (con state OPEN)
[ 9231.791802] libceph: mon0 172.21.5.138:6789 session lost, hunting for new mon
[ 9231.798965] libceph: connect 172.21.5.138:6789 error -101
[ 9231.804489] libceph: mon0 172.21.5.138:6789 connect error
[ 9235.046992] libceph: connect 172.21.5.138:6812 error -101
[ 9235.052416] libceph: mds0 172.21.5.138:6812 connect error
[ 9236.953992] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9236.960638] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9236.968467] systemd         D 000000000000001b     0 32250      0 0x00000080
[ 9236.975619]  ffff88042784fe48 0000000000000082 ffff880426a35080 ffff88042784ffd8
[ 9236.983260]  ffff88042784ffd8 ffff88042784ffd8 ffff880426a35080 ffff8803ac8f5308
[ 9236.990856]  ffff8803ac8f5630 0000000000000000 ffff8804272a1000 000000000000001b
[ 9236.998497] Call Trace:
[ 9237.000952]  [<ffffffff8163a909>] schedule+0x29/0x70
[ 9237.005936]  [<ffffffffa04529c3>] ceph_mdsc_sync+0x3a3/0x600 [ceph]
[ 9237.012203]  [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
[ 9237.018042]  [<ffffffff8120f9b0>] ? do_fsync+0xa0/0xa0
[ 9237.023191]  [<ffffffffa0430652>] ceph_sync_fs+0x62/0xd0 [ceph]
[ 9237.029112]  [<ffffffff8120f9d0>] sync_fs_one_sb+0x20/0x30
[ 9237.034613]  [<ffffffff811e20d2>] iterate_supers+0xb2/0x110
[ 9237.040190]  [<ffffffff8120fae4>] sys_sync+0x64/0xb0
[ 9237.045161]  [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[ 9242.083366] libceph: connect 172.21.5.138:6789 error -101
[ 9242.088780] libceph: mon0 172.21.5.138:6789 connect error
[ 9251.038495] libceph: connect 172.21.5.138:6812 error -101
[ 9251.043912] libceph: mds0 172.21.5.138:6812 connect error
[ 9252.093949] libceph: connect 172.21.5.138:6789 error -101
[ 9252.099365] libceph: mon0 172.21.5.138:6789 connect error
[ 9262.104746] libceph: connect 172.21.5.138:6789 error -101
[ 9262.110161] libceph: mon0 172.21.5.138:6789 connect error
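
The "error -101" in the libceph lines above is a negated Linux errno; 101 is ENETUNREACH (network is unreachable). The sketch below shows one way to summarize a console log like this one: it counts connect errors per endpoint and lists the hung-task reports. The function name summarize, the regular expressions, and the small errno table are assumptions written for this log format, not part of any Ceph tooling.

```python
import re
from collections import Counter

# Linux errno numbers for common connect failures (assumption: this small
# table is enough for the log at hand; unknown codes are kept as numbers).
ERRNO_NAMES = {
    101: "ENETUNREACH",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}

# Patterns written against the log lines shown in this report.
CONNECT_RE = re.compile(r"libceph: connect (\S+) error -(\d+)")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize(log_text):
    """Return (connect error counts per (endpoint, errno name), hung-task list)."""
    errors = Counter()
    hung = []
    for line in log_text.splitlines():
        m = CONNECT_RE.search(line)
        if m:
            code = int(m.group(2))
            errors[(m.group(1), ERRNO_NAMES.get(code, str(code)))] += 1
            continue
        m = HUNG_RE.search(line)
        if m:
            # (task name, pid, seconds blocked)
            hung.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return errors, hung


# A few lines copied from the console output above.
sample = """\
[ 9116.919063] INFO: task systemd:32250 blocked for more than 120 seconds.
[ 9220.038994] libceph: connect 172.21.5.138:6812 error -101
[ 9221.038566] libceph: connect 172.21.5.138:6812 error -101
[ 9231.798965] libceph: connect 172.21.5.138:6789 error -101
"""
errors, hung = summarize(sample)
```

On the full log, this kind of summary makes the pattern explicit: systemd (pid 32250) is blocked in sync while the kernel client keeps retrying the MDS (port 6812) and the monitor (port 6789) on the unreachable node.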


Related issues 2 (1 open, 1 closed)

Related to Linux kernel client - Bug #13189: Reboot systerm can not set up. (Closed, Ilya Dryomov, 09/21/2015)
Related to Linux kernel client - Bug #20927: libceph does not give up reconnecting on broken network (New, 08/07/2017)