Bug #41866
OSD cannot report slow operation warnings in time.
Description
If an underlying device is blocked because of a hardware issue, the thread that checks for slow ops cannot report slow op warnings in time in the current implementation, because that thread is blocked as well.
For example:
1. If a DATA disk is blocked while a PG lock is held, the thread that executes the TrackedOp::visit_ops_in_flight method is also blocked, waiting for the PG lock or the MGRClient::lock.
2. If a WAL disk is blocked while flushing, the thread that executes the TrackedOp::visit_ops_in_flight method is also blocked, waiting for the BlueFS::lock.
As a result, the OSD can't report slow op warnings in time.
How about running the slow op checking code in a thread separate from the 'tick_without_osd_lock' thread?
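To illustrate the idea, here is a minimal standalone sketch. It is not Ceph code and not necessarily what the eventual fix does. All names in it (InFlightOp, SlowOpWatchdog, simulate_blocked_device) are hypothetical. It assumes the op start time can be published in an atomic, so that the checking thread never has to take the lock held by the stalled I/O path:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for an in-flight op. Only the start time is kept,
// in an atomic, so a checker can read it without any PG/BlueFS-style lock.
struct InFlightOp {
  std::atomic<Clock::rep> started_at{0};  // 0 means "no op in flight"
};

// Hypothetical watchdog that runs in its own thread. The only mutex it
// takes is its private stop_mutex_, which no I/O path ever holds.
class SlowOpWatchdog {
 public:
  SlowOpWatchdog(const InFlightOp& op, std::chrono::milliseconds threshold,
                 std::chrono::milliseconds interval)
      : op_(op), threshold_(threshold), interval_(interval) {
    worker_ = std::thread([this] { run(); });
  }
  ~SlowOpWatchdog() { stop(); }

  void stop() {
    {
      std::lock_guard<std::mutex> g(stop_mutex_);
      stopping_ = true;
    }
    stop_cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

  int warnings() const { return warnings_.load(); }

 private:
  void run() {
    std::unique_lock<std::mutex> l(stop_mutex_);
    // Wake up every interval_ until stop() is called.
    while (!stop_cv_.wait_for(l, interval_, [this] { return stopping_; })) {
      const Clock::rep start = op_.started_at.load();
      if (start == 0) continue;  // nothing in flight
      const Clock::duration age(Clock::now().time_since_epoch().count() - start);
      if (age >= threshold_) ++warnings_;  // a real OSD would log/report here
    }
  }

  const InFlightOp& op_;
  std::chrono::milliseconds threshold_;
  std::chrono::milliseconds interval_;
  std::mutex stop_mutex_;
  std::condition_variable stop_cv_;
  bool stopping_ = false;
  std::atomic<int> warnings_{0};
  std::thread worker_;
};

struct SimulationResult {
  int watchdog_warnings;
  bool lock_based_checker_ran_during_stall;
};

// Simulates a stalled device: pg_lock stays held for 200 ms while an op is
// in flight. A checker that needs pg_lock is stuck; the watchdog is not.
SimulationResult simulate_blocked_device() {
  using namespace std::chrono_literals;
  std::mutex pg_lock;  // stands in for a PG lock held by the stalled I/O path
  InFlightOp op;
  std::atomic<bool> lock_checker_ran{false};
  SlowOpWatchdog watchdog(op, 50ms, 10ms);

  std::unique_lock<std::mutex> held(pg_lock);
  op.started_at.store(Clock::now().time_since_epoch().count());

  // This checker mimics the current behaviour: it must take pg_lock first.
  std::thread lock_based_checker([&] {
    std::lock_guard<std::mutex> g(pg_lock);
    lock_checker_ran = true;
  });

  std::this_thread::sleep_for(200ms);  // the "device" is blocked
  SimulationResult r{watchdog.warnings(), lock_checker_ran.load()};

  held.unlock();  // the "device" recovers
  lock_based_checker.join();
  return r;
}
```

Calling simulate_blocked_device() from a main() should show that the watchdog raised warnings during the stall while the lock-based checker made no progress until the lock was released. In a real OSD the difficult part is making the op list readable without the contended locks, which this sketch sidesteps by tracking a single atomic timestamp.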
Updated by Greg Farnum over 4 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSD)
Updated by Ilsoo Byun over 4 years ago
The report_callback thread is also blocked on PG::lock, with MGRClient::lock held, while getting the PG stats. This in turn blocks the tick_without_osd_lock thread.
Updated by Kefu Chai over 4 years ago
- Category set to Administration/Usability
- Status changed from New to 17
- Assignee set to Ilsoo Byun
- Pull request ID set to 30550
Updated by Kefu Chai over 4 years ago
- Status changed from 17 to Fix Under Review