Bug #51463

blocked requests while stopping/starting OSDs

Added by Manuel Lausch over 2 years ago. Updated almost 2 years ago.

Target version:
% Done:


3 - minor
Affected Versions:
Pull request ID:
Crash signature (v1):
Crash signature (v2):



we run into a lot of slow requests (I/O blocked for several seconds) while stopping or starting one or more OSDs. With Nautilus this wasn't an issue at all.
We set the slow op warning to 5 seconds because our application (which uses librados natively) has a timeout of 6 seconds.

I could track it down to the newly introduced read leases:

For testing I set the mentioned option "osd_pool_default_read_lease_ratio" from the default 0.8 down to 0.2, which obviously resolves this issue. But I don't know whether lowering it has other implications.
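For context, a back-of-the-envelope sketch of why 0.2 helps, assuming (as the option name and docs suggest) that the read lease length is osd_heartbeat_grace scaled by the ratio; the exact formula here is my assumption, not taken from this ticket:

```python
# Sketch: estimated read lease length, assuming it is
# osd_heartbeat_grace * osd_pool_default_read_lease_ratio
# (default heartbeat grace is 20 s; the derivation is an assumption).
def read_lease_seconds(heartbeat_grace=20.0, read_lease_ratio=0.8):
    return heartbeat_grace * read_lease_ratio

default_lease = read_lease_seconds()                      # ratio 0.8 -> 16.0 s
lowered_lease = read_lease_seconds(read_lease_ratio=0.2)  # ratio 0.2 ->  4.0 s

# With the 5 s complaint threshold used here, the default lease can keep a
# PG unreadable long enough to trip the warning; the lowered one cannot.
print(default_lease, lowered_lease)
```

This would explain why 0.2 silences the 5-second complaints: the worst-case wait for a stale lease to expire drops below the complaint threshold.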

I also wonder whether these read leases could, or should, be invalidated on OSD up/down events.

I could reproduce this issue on a quite small test cluster with 3 nodes of 5 OSDs each.
I tested it with Ceph Octopus 15.2.13.


disabled_fastshutdown_stop_osd-ceph.log View (23.9 KB) Manuel Lausch, 11/03/2021 09:25 AM

default_stop_osd-ceph.log View (23.3 KB) Manuel Lausch, 11/03/2021 09:25 AM

default_start_osd-ceph.log View (12.7 KB) Manuel Lausch, 11/03/2021 09:25 AM


#1 Updated by Josh Durgin over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)

#2 Updated by Josh Durgin over 2 years ago

  • Priority changed from Normal to High

#3 Updated by Manuel Lausch over 2 years ago

This is still an issue, in the newest Pacific release (16.2.5) as well.

The developer documentation mentioned above talks about preventing reads from stray OSDs. I'm aware that this could be an issue while stopping OSDs, but I don't understand why I get laggy PGs when starting OSDs.

Please have a look at this issue. This is a showstopper which prevents us from upgrading our clusters from Luminous to Octopus or Pacific.

#4 Updated by Neha Ojha over 2 years ago

Is it possible for you to share your test reproducer with us? It would be great if we could run it against a vstart cluster.

#5 Updated by Manuel Lausch over 2 years ago


Simple cluster with 5 nodes, 125 OSDs in total
one pool with replicated size 3, min_size 1

at least this in the ceph.conf:
osd op complaint time = 5

run some benchmark, for example:
rados bench -p testbench 100 write --no-cleanup

And now stop / start some OSDs.
In my case I see slow op alerts in the ceph -s output, and in the ceph.log as well, after stopping and starting some OSDs. IO is blocked for up to about 10 - 15 seconds.
The client (rados bench in this case) notices these slow operations as well.

Please let me know if you need further information.
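For reference, here are the settings mentioned in this ticket collected into a ceph.conf sketch. Placing them under [global] is my assumption; the read-lease line is the workaround from the description, not a recommendation:

```ini
[global]
# complain about ops blocked longer than 5 s (the client timeout is 6 s)
osd op complaint time = 5

# workaround from the ticket description: shorten read leases
# (default ratio is 0.8); side effects of lowering this are unclear
osd pool default read lease ratio = 0.2
```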

#6 Updated by Sage Weil over 2 years ago

  • Status changed from New to Need More Info

I easily reproduced this with 'osd fast shutdown = false' (vstart default), but was unable to do so with 'osd fast shutdown = true' (the normal default).

#7 Updated by Manuel Lausch over 2 years ago

Hi Sage,

I tested it with fast shutdown enabled (the default) and disabled. In both cases I got slow ops (longer than 5 seconds) after stopping one OSD.

Attached are three ceph.log snippets.

With fast shutdown enabled (the default) it took about 2 seconds until the first "immediately failed" message appeared after stopping one OSD. On bigger clusters this takes even longer, which is why I disabled fast shutdown by default.

With fast shutdown disabled I immediately get the down message and the cluster begins peering. And then the slow op messages begin.

After starting the OSD again I got slow ops in both cases, too.

#8 Updated by Maximilian Stinsky about 2 years ago

I think we hit the same issue while upgrading our Nautilus cluster to Pacific.
While I did not hit this when testing the upgrade in our lab environment, we saw a lot of slow requests and laggy PGs after we upgraded the first couple of OSD hosts in our production cluster.
Since finishing the upgrade I can start/stop/restart OSDs without any issue, but while doing the upgrade we had slight impact because of the slow requests and laggy PGs.

#9 Updated by Radoslaw Zarzynski almost 2 years ago

We suspect this ticket is actually a duplicate of
If somebody could test the fix, it would be really helpful.

#11 Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from Need More Info to Resolved
