Bug #41866

open

OSD cannot report slow operation warnings in time.

Added by Ilsoo Byun over 4 years ago. Updated over 4 years ago.

Status: Fix Under Review
Priority: Normal
Assignee:
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If an underlying device is blocked because of a hardware issue, the thread that checks for slow ops is blocked as well, so in the current implementation it cannot report slow op warnings in time.
For example:
1. If a data disk is blocked while a PG lock is held, the thread that executes the TrackedOp::visit_ops_in_flight method also blocks, waiting for the PG lock or MGRClient::lock.
2. If a WAL disk is blocked while flushing, the thread that executes the TrackedOp::visit_ops_in_flight method also blocks, waiting for BlueFS::lock.

As a result, the OSD cannot report slow op warnings in time.

I suggest running the slow op checking code in a thread separate from the 'tick_without_osd_lock' thread.
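
To illustrate the idea, here is a minimal standalone sketch in C++. It is not Ceph code: OpRegistry and SlowOpWatchdog are hypothetical names, and the sketch assumes that op start times can be kept in a small registry whose mutex is never held across device I/O or while taking the PG lock, MGRClient::lock, or BlueFS::lock. Under that assumption a dedicated thread can keep counting slow ops even while other threads are stuck behind a blocked disk.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical registry of in-flight op start times. Its mutex guards only
// short in-memory updates, so a reader is never stuck behind blocked I/O.
class OpRegistry {
public:
  void start_op(std::uint64_t id, Clock::time_point started) {
    std::lock_guard<std::mutex> l(m_);
    ops_[id] = started;
  }
  void finish_op(std::uint64_t id) {
    std::lock_guard<std::mutex> l(m_);
    ops_.erase(id);
  }
  // Number of ops that have been in flight for at least `threshold`.
  std::size_t count_slow(Clock::time_point now, Clock::duration threshold) const {
    std::lock_guard<std::mutex> l(m_);
    std::size_t n = 0;
    for (const auto& kv : ops_) {
      if (now - kv.second >= threshold) ++n;
    }
    return n;
  }
private:
  mutable std::mutex m_;
  std::map<std::uint64_t, Clock::time_point> ops_;
};

// Hypothetical watchdog: checks for slow ops on its own thread, independent
// of any tick thread that may be waiting on a PG lock or a blocked device.
class SlowOpWatchdog {
public:
  SlowOpWatchdog(const OpRegistry& reg, Clock::duration threshold,
                 Clock::duration interval,
                 std::function<void(std::size_t)> report)
      : reg_(reg), threshold_(threshold), interval_(interval),
        report_(std::move(report)) {
    assert(report_ && "a report callback is required");
    thread_ = std::thread([this] { run(); });
  }
  ~SlowOpWatchdog() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }
private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    while (!stop_) {
      l.unlock();  // do not hold the watchdog's own lock while checking
      const std::size_t n = reg_.count_slow(Clock::now(), threshold_);
      if (n > 0) report_(n);
      l.lock();
      // Sleep for one interval, waking early if the destructor sets stop_.
      cv_.wait_for(l, interval_, [this] { return stop_; });
    }
  }
  const OpRegistry& reg_;
  Clock::duration threshold_;
  Clock::duration interval_;
  std::function<void(std::size_t)> report_;
  std::mutex m_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread thread_;
};
```

The report callback stands in for whatever mechanism actually raises the warning. In the real OSD that path would itself have to avoid the locks listed above, which this sketch does not attempt to show.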
