Bug #51463
closed
blocked requests while stopping/starting OSDs
Added by Manuel Lausch almost 3 years ago.
Updated about 2 years ago.
Description
Hi,
We run into a lot of slow requests (I/O blocked for several seconds) while stopping or starting one or more OSDs. With Nautilus this wasn't an issue at all.
We set the slow op warning to 5 seconds because our application (which uses librados natively) has a timeout of 6 seconds.
I could narrow it down to the newly introduced read leases: https://docs.ceph.com/en/latest/dev/osd_internals/stale_read/
For testing I set the mentioned option "osd_pool_default_read_lease_ratio" from its default of 0.8 to 0.2, which obviously resolves the issue. But I don't know whether lowering it has other implications.
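For reference, a sketch of how the ratio can be lowered for such a test (assuming the mon config database is used; with the default osd_heartbeat_grace of 20 seconds, a ratio of 0.8 corresponds to a lease of roughly 16 seconds and 0.2 to roughly 4 seconds, if I read the stale_read doc correctly):
ceph config set osd osd_pool_default_read_lease_ratio 0.2
# or persistently in ceph.conf under [osd]:
# osd_pool_default_read_lease_ratio = 0.2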
I also wonder whether these read leases could or should be invalidated on OSD up/down events.
I could reproduce this issue with a quite small test cluster of 3 nodes with 5 OSDs each.
I tested it with Ceph Octopus 15.2.13.
Manuel
- Project changed from Ceph to RADOS
- Category deleted (OSD)
- Priority changed from Normal to High
This is still an issue in the newest Pacific release (16.2.5) as well.
The developer documentation mentioned above talks about preventing reads from stray OSDs. I'm aware that this could be an issue while stopping OSDs, but I don't understand why I get laggy PGs when starting OSDs.
Please have a look at this issue. This is a showstopper which prevents us from upgrading our clusters from Luminous to Octopus or Pacific.
Is it possible for you to share your test reproducer with us? It would be great if we could run it against a vstart cluster.
Sure.
Simple cluster with 5 nodes, 125 OSDs in total.
One pool, replicated, size 3, min_size 1.
At least this in the ceph.conf:
osd op complaint time = 5
Run some benchmark, for example:
rados bench -p testbench 100 write --no-cleanup
And now stop/start some OSDs (a full command sketch is below).
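A rough end-to-end sketch of the reproducer (pool name "testbench" is from the bench command above; the PG count and OSD id are only placeholders; assumes systemd-managed OSDs):
ceph osd pool create testbench 128
ceph osd pool set testbench size 3
ceph osd pool set testbench min_size 1
rados bench -p testbench 100 write --no-cleanup &
systemctl stop ceph-osd@3       # watch ceph -s / ceph.log for slow op alerts
sleep 30
systemctl start ceph-osd@3      # slow ops show up again when the OSD rejoins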
In my case I see slow op alerts in the ceph -s output, and in the ceph.log as well, after stopping and starting some OSDs. Requests are blocked for up to about 10-15 seconds.
The client (rados bench in this case) notices these slow operations as well.
Please let me know if you need further information.
- Status changed from New to Need More Info
I easily reproduced this with 'osd fast shutdown = false' (vstart default), but was unable to do so with 'osd fast shutdown = true' (the normal default).
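For anyone repeating this comparison, a sketch of how the flag can be toggled (assuming the option name from the comment above; value changes take effect on the next OSD shutdown):
ceph config set osd osd_fast_shutdown false
# or in ceph.conf:
# [osd]
# osd fast shutdown = false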
Hi Sage,
I tested it with fast shutdown enabled (default) and disabled. In both cases I got slow ops (longer than 5 seconds) after stopping one OSD.
Attached are three ceph.log snippets.
With fast shutdown enabled (the default) it took about 2 seconds until the first "immediately failed" message appeared after stopping one OSD. On bigger clusters this takes some more time, which is why we disable fast shutdown by default.
With fast shutdown disabled I get the down message immediately and the cluster begins peering. Then the slow op messages start.
After starting the OSD again I got slow ops in both cases, too.
I think we hit the same issue while upgrading our Nautilus cluster to Pacific.
While I did not hit this when testing the upgrade in our lab environment, we saw a lot of slow requests and laggy PGs after we upgraded the first couple of OSD hosts in our production cluster.
After we finished the upgrade I can start/stop/restart OSDs without any issue, but while doing the upgrade we had a slight impact because of the slow requests and laggy PGs.
- Status changed from Need More Info to Resolved