Bug #51463

closed

blocked requests while stopping/starting OSDs

Added by Manuel Lausch almost 3 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We run into a lot of slow requests (I/O blocked for several seconds) while stopping or starting one or more OSDs. With Nautilus this wasn't an issue at all.
We set the slow op warning threshold to 5 seconds because our application (which uses native librados) has a timeout of 6 seconds.

I could narrow it down to the newly introduced read leases: https://docs.ceph.com/en/latest/dev/osd_internals/stale_read/

For testing I lowered the option mentioned there, "osd_pool_default_read_lease_ratio", from its default of 0.8 to 0.2, which evidently resolves the issue. But I don't know whether lowering it has other implications.
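To make the effect of that ratio concrete, here is a minimal sketch of the arithmetic. It rests on two assumptions that are not stated in this report: that the read lease interval is osd_heartbeat_grace multiplied by osd_pool_default_read_lease_ratio (as the linked stale_read document appears to describe), and that osd_heartbeat_grace is at its default of 20 seconds. The helper names are hypothetical, not Ceph APIs, and treating the full lease interval as the worst-case stall is a simplification.

```python
# Illustrative arithmetic only; this is not Ceph code.
# Assumption: lease interval = osd_heartbeat_grace * osd_pool_default_read_lease_ratio
# Assumption: osd_heartbeat_grace is left at 20 seconds.

ASSUMED_HEARTBEAT_GRACE_S = 20.0
APP_TIMEOUT_S = 6.0  # librados client timeout mentioned in the report


def read_lease_interval(ratio, heartbeat_grace=ASSUMED_HEARTBEAT_GRACE_S):
    """Return the assumed read lease interval in seconds."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return heartbeat_grace * ratio


def can_exceed_timeout(ratio, timeout=APP_TIMEOUT_S,
                       heartbeat_grace=ASSUMED_HEARTBEAT_GRACE_S):
    """True if waiting out a full lease would take longer than the client timeout."""
    return read_lease_interval(ratio, heartbeat_grace) > timeout


if __name__ == "__main__":
    for ratio in (0.8, 0.2):
        lease = read_lease_interval(ratio)
        verdict = "can exceed" if can_exceed_timeout(ratio) else "stays below"
        print(f"ratio={ratio}: lease ~{lease:.0f}s, {verdict} the {APP_TIMEOUT_S:.0f}s timeout")
```

Under these assumptions the default ratio gives a lease of about 16 seconds, well above the 6 second application timeout, while 0.2 gives about 4 seconds, which would be consistent with the slow requests disappearing in the test.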

I also wonder whether these read leases could, or should, be invalidated on OSD up/down events.

I could reproduce this issue on a fairly small test cluster with 3 nodes of 5 OSDs each.
I tested it with Ceph Octopus 15.2.13.

Manuel


Files

disabled_fastshutdown_stop_osd-ceph.log (23.9 KB), Manuel Lausch, 11/03/2021 09:25 AM
default_stop_osd-ceph.log (23.3 KB), Manuel Lausch, 11/03/2021 09:25 AM
default_start_osd-ceph.log (12.7 KB), Manuel Lausch, 11/03/2021 09:25 AM