Support #23254
Client-side write freeze for 15 sec when one storage node is rebooted
Status: Closed
Description
I am running Ceph version 12.2.2 with BlueStore. My system has 3 storage nodes with 2 OSDs each; the monitors and mgr daemons run on 3 controller nodes.
A. Test details -
1. Launch a Fedora 24 VM and start writing with dd to an attached Cinder volume; in another window, monitor the resultant file size.
2. Restart one storage node @ Tue Mar 6 14:22:59 EET 2018
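The writer/monitor setup in step A.1 can be sketched roughly as below. This is a hedged reconstruction, not the reporter's exact commands: the file path and sizes are assumptions, and in the original test the dd ran against the attached Cinder volume inside the VM.

```shell
# Reproduction sketch of step A.1. TESTFILE stands in for a file on the
# attached Cinder volume inside the VM (the path here is an assumption).
TESTFILE="${TESTFILE:-/tmp/freeze-test}"

# Steady writer in the background (small fixed size here; the original
# test wrote continuously with dd).
dd if=/dev/zero of="$TESTFILE" bs=1M count=50 conv=fsync 2>/dev/null &
DD_PID=$!

# Monitor loop: sample the file size once per second. During an OSD
# outage the size stops growing for the duration of the freeze.
while kill -0 "$DD_PID" 2>/dev/null; do
    stat -c '%s' "$TESTFILE" 2>/dev/null || echo 0
    sleep 1
done
wait "$DD_PID"

FINAL=$(stat -c '%s' "$TESTFILE")
echo "final size: $FINAL bytes"
rm -f "$TESTFILE"
```

In the reported test the size printed by the monitor loop stayed flat for ~15 seconds after the storage node rebooted, then resumed growing.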
B. Observation
1. The file size stays constant for 15 sec; after that it grows again.
I also tried osd_heartbeat_grace = 2 and osd_mon_heartbeat_interval = 5, but the write-freeze time did not improve.
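For reference, the overrides tried above would look like this in ceph.conf (a sketch only; the section placement is an assumption, and, as becomes clear later in the thread, a grace of 2 is smaller than the default 6-second peer-ping interval, which makes the combination inconsistent):

```ini
# ceph.conf fragment as tried by the reporter (sketch)
[osd]
osd_heartbeat_grace = 2         # failure-report grace, in seconds
osd_mon_heartbeat_interval = 5  # OSD-to-monitor ping interval, in seconds
```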
System settings
[root@controller-1 ~]# sudo ceph -s
cluster:
id: 08762c5c-52fd-4d29-91ec-987d6ece068e
health: HEALTH_OK
services:
mon: 3 daemons, quorum controller-1,controller-2,controller-3
mgr: controller-3(active), standbys: controller-1, controller-2
osd: 6 osds: 6 up, 6 in
data:
pools: 4 pools, 736 pgs
objects: 34581 objects, 134 GB
usage: 275 GB used, 1959 GB / 2235 GB avail
pgs: 736 active+clean
[root@controller-1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.18271 root default
-5 0.72757 host storage-1
0 ssd 0.36378 osd.0 up 1.00000 1.00000
3 ssd 0.36378 osd.3 up 1.00000 1.00000
-3 0.72757 host storage-2
1 ssd 0.36378 osd.1 up 1.00000 1.00000
4 ssd 0.36378 osd.4 up 1.00000 1.00000
-7 0.72757 host storage-3
2 ssd 0.36378 osd.2 up 1.00000 1.00000
5 ssd 0.36378 osd.5 up 1.00000 1.00000
Updated by Greg Farnum about 6 years ago
- Tracker changed from Bug to Support
Did you watch "ceph -w" to see how long it took for the OSDs to get marked down, and then for the PGs to finish peering?
Updated by Suvendu Mitra about 6 years ago
- File ceph_w.txt added
Greg Farnum wrote:
Did you watch "ceph -w" to see how long it took for the OSDs to get marked down, and then for the PGs to finish peering?
Here is the data you requested
1. Issue reboot from Storage-1
[root@storage-1 ~]# date; reboot -f
Mon Mar 19 10:42:01 EET 2018
Rebooting.
2. Portion of "ceph -w" output in the other window
2018-03-19 10:39:14.438744 mon.controller-1 [INF] mon.1 192.168.1.21:6789/0
2018-03-19 10:39:14.438817 mon.controller-1 [INF] mon.2 192.168.1.22:6789/0
2018-03-19 10:42:17.509515 mon.controller-1 [INF] osd.0 failed (root=default,host=storage-1) (2 reporters from different host after 20.000223 >= grace 20.000000)
2018-03-19 10:42:17.509701 mon.controller-1 [INF] osd.3 failed (root=default,host=storage-1) (2 reporters from different host after 20.000354 >= grace 20.000000)
2018-03-19 10:42:18.197765 mon.controller-1 [WRN] Health check failed: 2 osds down (OSD_DOWN)
2018-03-19 10:42:18.197821 mon.controller-1 [WRN] Health check failed: 1 host (2 osds) down (OSD_HOST_DOWN)
2018-03-19 10:42:21.213696 mon.controller-1 [WRN] Health check failed: Degraded data redundancy: 3078/17364 objects degraded (17.726%), 88 pgs unclean, 253 pgs degraded (PG_DEGRADED)
So it takes about 17 seconds to detect the OSD failure.
On the client side I see a 17-sec write freeze. The full "ceph -w" output is attached for reference.
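The ~17-second figure follows directly from the timestamps above. A quick check (times copied from the log lines; the sub-second part of the reboot time is unknown, so this is approximate):

```python
from datetime import datetime

# Timestamps taken from the log output above.
reboot = datetime(2018, 3, 19, 10, 42, 1)                  # "date; reboot -f" on storage-1
marked_failed = datetime(2018, 3, 19, 10, 42, 17, 509515)  # mon marks osd.0 failed

elapsed = (marked_failed - reboot).total_seconds()
print(f"{elapsed:.1f} s until the OSDs are marked down")  # ~16.5 s
```

Note that this is less than the "grace 20" shown in the log: the grace timer starts from the last heartbeat the reporting OSDs received, which predates the reboot command itself.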
Updated by Greg Farnum about 6 years ago
- Status changed from New to Closed
This is expected behavior, especially if you're knocking out a monitor at the same time as your OSD.
If you're trying to reduce the detection time, you're missing some heartbeat settings on the OSD. They should be documented as a group.
Updated by Suvendu Mitra about 6 years ago
No, in this case the monitor is not shut down; only the OSD node is rebooted. You may have misunderstood the case. If you know of settings that would minimize the freeze, please let us know.
Updated by Greg Farnum about 6 years ago
The monitor needs to see the change to the osd_heartbeat_interval and osd_heartbeat_grace settings, not just the OSDs. (And the grace should be larger than the interval.)
The output line about "grace 20" tells us that the monitor is still seeing the defaults.
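Concretely, that would mean putting the overrides somewhere the monitors also read them, with the grace kept larger than the interval. A sketch (the values below are illustrative, not recommendations; defaults shown are those for Luminous):

```ini
# ceph.conf fragment — must be visible to the monitors, not just the OSDs
[global]
osd_heartbeat_interval = 3  # OSD peer-ping interval (default 6)
osd_heartbeat_grace = 10    # must exceed the interval (default 20)
```

With the defaults still in effect on the monitor side, the "grace 20.000000" in the failure message above is exactly what one would expect.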