Support #23254

Client side write freeze for 15 sec when one storage node rebooted

Added by Suvendu Mitra about 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Description

I am running Ceph version 12.2.2 with BlueStore. My system has 3 storage nodes with 2 OSDs each; the monitors and mgr run on 3 controllers.

A. Test details
1. Launch a Fedora 24 VM and start writing with dd to the attached Cinder volume; in another window, monitor the resulting file size (see the sketch after this list).
2. Restart one storage node @ Tue Mar 6 14:22:59 EET 2018
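
For reference, a minimal sketch of this write/monitor setup inside the guest; the device name /dev/vdb, the mount point, and the write size are assumptions for illustration, not details from the original test:

# inside the Fedora VM; /dev/vdb is assumed to be the attached Cinder volume
sudo mkfs.xfs /dev/vdb && sudo mkdir -p /mnt/vol && sudo mount /dev/vdb /mnt/vol
sudo dd if=/dev/zero of=/mnt/vol/testfile bs=1M count=20000 oflag=direct status=progress

# second window: watch the file size; a write freeze shows up as a size that stops growing
watch -n 1 'ls -l /mnt/vol/testfile'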

B. Observation
1. The file size is constant for 15 sec; after that it grows.

I also tried osd_heartbeat_grace = 2 and osd_mon_heartbeat_interval = 5, but the write-freeze time does not improve.
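
A quick way to confirm what the running daemons actually picked up is to query them over the admin socket (a sketch; osd.0 and the hosts are just examples, and each command has to run on the host where that daemon lives):

[root@storage-1 ~]# ceph daemon osd.0 config get osd_heartbeat_grace
[root@storage-1 ~]# ceph daemon osd.0 config get osd_mon_heartbeat_interval
[root@controller-1 ~]# ceph daemon mon.controller-1 config get osd_heartbeat_grace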

System settings

[root@controller-1 ~]# sudo ceph -s
cluster:
id: 08762c5c-52fd-4d29-91ec-987d6ece068e
health: HEALTH_OK

services:
mon: 3 daemons, quorum controller-1,controller-2,controller-3
mgr: controller-3(active), standbys: controller-1, controller-2
osd: 6 osds: 6 up, 6 in
data:
pools: 4 pools, 736 pgs
objects: 34581 objects, 134 GB
usage: 275 GB used, 1959 GB / 2235 GB avail
pgs: 736 active+clean

[root@controller-1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.18271 root default
-5 0.72757 host storage-1
0 ssd 0.36378 osd.0 up 1.00000 1.00000
3 ssd 0.36378 osd.3 up 1.00000 1.00000
-3 0.72757 host storage-2
1 ssd 0.36378 osd.1 up 1.00000 1.00000
4 ssd 0.36378 osd.4 up 1.00000 1.00000
-7 0.72757 host storage-3
2 ssd 0.36378 osd.2 up 1.00000 1.00000
5 ssd 0.36378 osd.5 up 1.00000 1.00000


Files

storage_3.tar.gz (257 KB) storage_3.tar.gz Ceph log from storage-3 Suvendu Mitra, 03/07/2018 08:35 AM
controller-1.tar.gz (951 KB) controller-1.tar.gz Ceph log from controller-1 Suvendu Mitra, 03/07/2018 08:37 AM
ceph_w.txt (23.7 KB) ceph_w.txt Suvendu Mitra, 03/19/2018 08:55 AM
Actions #1

Updated by Greg Farnum about 6 years ago

  • Tracker changed from Bug to Support

Did you watch "ceph -w" to see how long it took for the OSDs to get marked down, and then for the PGs to finish peering?
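
(For reference, a sketch of one way to capture that timing; the prompt just mirrors the hosts in this report:)

[root@controller-1 ~]# ceph -w              # stream the cluster log; note the timestamps of the "osd.X failed" lines
[root@controller-1 ~]# ceph pg stat         # in a second window: counts of peering/degraded PGs
[root@controller-1 ~]# ceph health detail   # which health checks (OSD_DOWN, PG_DEGRADED, ...) are active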

Actions #2

Updated by Suvendu Mitra about 6 years ago

Greg Farnum wrote:

Did you watch "ceph -w" to see how long it took for the OSDs to get marked down, and then for the PGs to finish peering?

Here is the data you requested
1. Issue reboot from Storage-1
[root@storage-1 ~]# date; reboot -f
Mon Mar 19 10:42:01 EET 2018
Rebooting.

2. Portion of the "ceph -w" output in the other window
2018-03-19 10:39:14.438744 mon.controller-1 [INF] mon.1 192.168.1.21:6789/0
2018-03-19 10:39:14.438817 mon.controller-1 [INF] mon.2 192.168.1.22:6789/0
2018-03-19 10:42:17.509515 mon.controller-1 [INF] osd.0 failed (root=default,host=storage-1) (2 reporters from different host after 20.000223 >= grace 20.000000)
2018-03-19 10:42:17.509701 mon.controller-1 [INF] osd.3 failed (root=default,host=storage-1) (2 reporters from different host after 20.000354 >= grace 20.000000)
2018-03-19 10:42:18.197765 mon.controller-1 [WRN] Health check failed: 2 osds down (OSD_DOWN)
2018-03-19 10:42:18.197821 mon.controller-1 [WRN] Health check failed: 1 host (2 osds) down (OSD_HOST_DOWN)
2018-03-19 10:42:21.213696 mon.controller-1 [WRN] Health check failed: Degraded data redundancy: 3078/17364 objects degraded (17.726%), 88 pgs unclean, 253 pgs degraded (PG_DEGRADED)

So it takes about 17 seconds to detect the OSD failure.

On the client side I see a 17 sec write freeze. The full "ceph -w" output is also attached for reference.

Actions #3

Updated by Greg Farnum about 6 years ago

  • Status changed from New to Closed

This is expected behavior, especially if you're knocking out a monitor at the same time as your OSD.

If you're trying to reduce the detection time, you're missing some heartbeat settings on the OSD. They should be documented as a group.
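
For reference, a sketch of that group of options as they might appear in ceph.conf; the values below are illustrative only, not recommendations, and the grace must stay larger than the interval:

[global]
osd heartbeat interval = 3             # how often OSDs ping their peers (default 6)
osd heartbeat grace = 10               # how long without a reply before a peer is reported down (default 20)
mon osd min down reporters = 2         # failure reports needed before the mon marks an OSD down
mon osd reporter subtree level = host  # reporters must come from distinct hosts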

Actions #4

Updated by Suvendu Mitra about 6 years ago

No, in this case the monitor is not shut down; only the OSD node is rebooted. You may have misunderstood this case. If you know of settings that minimize the freeze, please let us know.

Actions #5

Updated by Greg Farnum about 6 years ago

The monitor needs to see the change to the osd_heartbeat_interval and osd_heartbeat_grace settings, not just the OSDs. (And the grace should be larger than the interval.)

The output line about "grace 20" tells us that the monitor is still seeing the defaults.
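
A minimal sketch of pushing such a change so the monitors see it too (values illustrative; injected settings are not persistent and some may still need a daemon restart, so they would also go into ceph.conf under [global]):

[root@controller-1 ~]# ceph tell osd.* injectargs '--osd_heartbeat_grace 10 --osd_heartbeat_interval 3'
[root@controller-1 ~]# ceph tell mon.* injectargs '--osd_heartbeat_grace 10'
[root@controller-1 ~]# ceph daemon mon.controller-1 config get osd_heartbeat_grace   # verify on the mon host

If the monitors have picked up the new value, the "failed ... >= grace" lines in the cluster log should show that value instead of the default 20.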
