Bug #41866
OSD cannot report slow operation warnings in time.
Status: Fix Under Review
Priority: Normal
Assignee:
Category: Administration/Usability
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If an underlying device blocks because of a hardware issue, the thread that checks for slow ops is blocked as well, so in the current implementation it cannot report slow op warnings in time.
For example:
1. If a DATA disk blocks while a PG lock is held, the thread that executes the TrackedOp::visit_ops_in_flight method also blocks, waiting for the PG lock or the MGRClient::lock.
2. If a WAL disk blocks during a flush, the thread that executes the TrackedOp::visit_ops_in_flight method also blocks, waiting for the BlueFS::lock.
As a result, the OSD cannot report slow op warnings in time.
I suggest running the slow op check in a thread separate from the 'tick_without_osd_lock' thread.