Support #23254
Client-side write freeze for 15 sec when one storage node is rebooted
Status: Closed
Description
I am running Ceph version 12.2.2 with BlueStore. My system has 3 storage nodes with 2 OSDs each; the monitors and mgr daemons run on 3 controller nodes.
A. Test details -
1. Launch a Fedora 24 VM and start writing with dd to an attached Cinder volume; in another window, monitor the resultant file size.
2. Restart one storage node @ Tue Mar 6 14:22:59 EET 2018
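The writer/monitor setup in step A.1 can be sketched roughly as below. This is a hedged reconstruction, not the reporter's exact commands: the file path and sizes are assumptions, and in the original test the dd ran against the attached Cinder volume inside the VM.

```shell
# Reproduction sketch of step A.1. TESTFILE stands in for a file on the
# attached Cinder volume inside the VM (the path here is an assumption).
TESTFILE="${TESTFILE:-/tmp/freeze-test}"

# Steady writer in the background (small fixed size here; the original
# test wrote continuously with dd).
dd if=/dev/zero of="$TESTFILE" bs=1M count=50 conv=fsync 2>/dev/null &
DD_PID=$!

# Monitor loop: sample the file size once per second. During an OSD
# outage the size stops growing for the duration of the freeze.
while kill -0 "$DD_PID" 2>/dev/null; do
    stat -c '%s' "$TESTFILE" 2>/dev/null || echo 0
    sleep 1
done
wait "$DD_PID"

FINAL=$(stat -c '%s' "$TESTFILE")
echo "final size: $FINAL bytes"
rm -f "$TESTFILE"
```

In the reported test the size printed by the monitor loop stayed flat for ~15 seconds after the storage node rebooted, then resumed growing.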
B. Observation
1. The file size stays constant for 15 sec; after that it grows again.
I also tried osd_heartbeat_grace = 2 and osd_mon_heartbeat_interval = 5, but the write-freeze time did not improve.
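For reference, the overrides tried above would look like this in ceph.conf (a sketch only; the section placement is an assumption, and, as becomes clear later in the thread, a grace of 2 is smaller than the default 6-second peer-ping interval, which makes the combination inconsistent):

```ini
# ceph.conf fragment as tried by the reporter (sketch)
[osd]
osd_heartbeat_grace = 2         # failure-report grace, in seconds
osd_mon_heartbeat_interval = 5  # OSD-to-monitor ping interval, in seconds
```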
System settings
[root@controller-1 ~]# sudo ceph -s
cluster:
id: 08762c5c-52fd-4d29-91ec-987d6ece068e
health: HEALTH_OK
services:
mon: 3 daemons, quorum controller-1,controller-2,controller-3
mgr: controller-3(active), standbys: controller-1, controller-2
osd: 6 osds: 6 up, 6 in
data:
pools: 4 pools, 736 pgs
objects: 34581 objects, 134 GB
usage: 275 GB used, 1959 GB / 2235 GB avail
pgs: 736 active+clean
[root@controller-1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2.18271 root default
-5 0.72757 host storage-1
0 ssd 0.36378 osd.0 up 1.00000 1.00000
3 ssd 0.36378 osd.3 up 1.00000 1.00000
-3 0.72757 host storage-2
1 ssd 0.36378 osd.1 up 1.00000 1.00000
4 ssd 0.36378 osd.4 up 1.00000 1.00000
-7 0.72757 host storage-3
2 ssd 0.36378 osd.2 up 1.00000 1.00000
5 ssd 0.36378 osd.5 up 1.00000 1.00000
Updated by Greg Farnum about 6 years ago
- Tracker changed from Bug to Support
Did you watch "ceph -w" to see how long it took for the OSDs to get marked down, and then for the PGs to finish peering?
Updated by Suvendu Mitra about 6 years ago
- File ceph_w.txt added
Greg Farnum wrote:
Did you watch "ceph -w" to see how long it took for the OSDs to get marked down, and then for the PGs to finish peering?
Here is the data you requested
1. Issue reboot from Storage-1
[root@storage-1 ~]# date; reboot -f
Mon Mar 19 10:42:01 EET 2018
Rebooting.
2. Portion of "ceph -w" output in the other window
2018-03-19 10:39:14.438744 mon.controller-1 [INF] mon.1 192.168.1.21:6789/0
2018-03-19 10:39:14.438817 mon.controller-1 [INF] mon.2 192.168.1.22:6789/0
2018-03-19 10:42:17.509515 mon.controller-1 [INF] osd.0 failed (root=default,host=storage-1) (2 reporters from different host after 20.000223 >= grace 20.000000)
2018-03-19 10:42:17.509701 mon.controller-1 [INF] osd.3 failed (root=default,host=storage-1) (2 reporters from different host after 20.000354 >= grace 20.000000)
2018-03-19 10:42:18.197765 mon.controller-1 [WRN] Health check failed: 2 osds down (OSD_DOWN)
2018-03-19 10:42:18.197821 mon.controller-1 [WRN] Health check failed: 1 host (2 osds) down (OSD_HOST_DOWN)
2018-03-19 10:42:21.213696 mon.controller-1 [WRN] Health check failed: Degraded data redundancy: 3078/17364 objects degraded (17.726%), 88 pgs unclean, 253 pgs degraded (PG_DEGRADED)
So it takes about 17 seconds to detect the OSD failure.
On the client side I see a 17-sec write freeze. The full "ceph -w" output is attached for reference.
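The ~17-second figure follows directly from the timestamps above. A quick check (times copied from the log lines; the sub-second part of the reboot time is unknown, so this is approximate):

```python
from datetime import datetime

# Timestamps taken from the log output above.
reboot = datetime(2018, 3, 19, 10, 42, 1)                  # "date; reboot -f" on storage-1
marked_failed = datetime(2018, 3, 19, 10, 42, 17, 509515)  # mon marks osd.0 failed

elapsed = (marked_failed - reboot).total_seconds()
print(f"{elapsed:.1f} s until the OSDs are marked down")  # ~16.5 s
```

Note that this is less than the "grace 20" shown in the log: the grace timer starts from the last heartbeat the reporting OSDs received, which predates the reboot command itself.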
Updated by Greg Farnum about 6 years ago
- Status changed from New to Closed
This is expected behavior, especially if you're knocking out a monitor at the same time as your OSD.
If you're trying to reduce the detection time, you're missing some heartbeat settings on the OSD. They should be documented as a group.
Updated by Suvendu Mitra about 6 years ago
No, in this case the monitor is not shut down; only the OSD node is rebooted. You may have misunderstood the case. If you know of settings that would minimize the freeze, please let us know.
Updated by Greg Farnum about 6 years ago
The monitor needs to see the change to the osd_heartbeat_interval and osd_heartbeat_grace settings, not just the OSDs. (And the grace should be larger than the interval.)
The output line about "grace 20" tells us that the monitor is still seeing the defaults.
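Concretely, that would mean putting the overrides somewhere the monitors also read them, with the grace kept larger than the interval. A sketch (the values below are illustrative, not recommendations; defaults shown are those for Luminous):

```ini
# ceph.conf fragment — must be visible to the monitors, not just the OSDs
[global]
osd_heartbeat_interval = 3  # OSD peer-ping interval (default 6)
osd_heartbeat_grace = 10    # must exceed the interval (default 20)
```

With the defaults still in effect on the monitor side, the "grace 20.000000" in the failure message above is exactly what one would expect.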