Bug #759
closed
osd: pgs spend a long time peering when marking osds out
0%
Description
On the playground (with lots of data), I see that some PGs spend a long time in the peering state after an OSD is marked out. This isn't supposed to happen...
Updated by Sage Weil about 13 years ago
This appears to be scrubbing-related:
- We get a new osdmap. handle_osd_map tries to pause the op threadpool.
- A long-running scrub op takes forever to complete, so the pause cannot finish until it does.
- handle_osd_map finally continues.
During that whole time the main dispatch thread is blocked, and peering gets backed up as a result.
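The sequence above can be modelled with a toy worker pool whose pause() only returns once no worker is in the middle of an item. This is a minimal sketch, not Ceph code: PausablePool, measure_pause and the sleep standing in for a scrub op are all hypothetical names, and the real op threadpool may differ in detail.

```python
import queue
import threading
import time


class PausablePool:
    """Toy worker pool: pause() returns only once no worker is mid-item."""

    def __init__(self, workers=2):
        self._q = queue.Queue()
        self._cond = threading.Condition()
        self._paused = False
        self._active = 0
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, fn):
        self._q.put(fn)

    def _run(self):
        while True:
            fn = self._q.get()
            with self._cond:
                # Do not start new work while the pool is paused.
                while self._paused:
                    self._cond.wait()
                self._active += 1
            try:
                fn()
            finally:
                with self._cond:
                    self._active -= 1
                    self._cond.notify_all()

    def pause(self):
        with self._cond:
            self._paused = True
            # Block the caller until every in-flight item has finished.
            while self._active:
                self._cond.wait()

    def unpause(self):
        with self._cond:
            self._paused = False
            self._cond.notify_all()


def measure_pause(scrub_seconds):
    """Return how long pause() blocks while one long item is running."""
    pool = PausablePool()
    started = threading.Event()

    def long_scrub():  # hypothetical stand-in for a long-running scrub op
        started.set()
        time.sleep(scrub_seconds)

    pool.submit(long_scrub)
    started.wait()
    t0 = time.monotonic()
    pool.pause()  # stands in for handle_osd_map pausing the op threadpool
    waited = time.monotonic() - t0
    pool.unpause()
    return waited


if __name__ == "__main__":
    print(f"pause() blocked for {measure_pause(0.5):.2f}s")
```

In this model the caller of pause() is stuck for roughly as long as the slowest in-flight item, which is the position the main dispatch thread is in here.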
Updated by Sage Weil about 13 years ago
- Assignee changed from Sage Weil to Samuel Just
The replica scrub needs to go in a different work queue (not op_wq): scrub_wq, or something else that's assigned to the disk threadpool, disk_tp.
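A rough illustration of why moving the scrub to another pool helps, using Python's standard ThreadPoolExecutor. This is a sketch under assumptions, not the OSD implementation: op_tp and disk_tp here are just two independent executors, drain_time is a hypothetical helper, and shutdown(wait=True) is used as a stand-in for pausing the op threadpool, since both wait for in-flight work to finish.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def drain_time(scrub_in_op_pool, scrub_seconds=0.3):
    """Return how long draining the op pool takes while a scrub is running."""
    op_tp = ThreadPoolExecutor(max_workers=1)    # models the op threadpool
    disk_tp = ThreadPoolExecutor(max_workers=1)  # models the disk threadpool

    # Queue the long-running scrub on one pool or the other.
    target = op_tp if scrub_in_op_pool else disk_tp
    scrub = target.submit(time.sleep, scrub_seconds)

    t0 = time.monotonic()
    op_tp.shutdown(wait=True)  # stand-in for pausing the op threadpool
    waited = time.monotonic() - t0

    scrub.result()             # let the scrub finish before cleaning up
    disk_tp.shutdown(wait=True)
    return waited


if __name__ == "__main__":
    print(f"scrub on op pool:   {drain_time(True):.2f}s to drain")
    print(f"scrub on disk pool: {drain_time(False):.2f}s to drain")
```

With the scrub on the op pool, draining takes about as long as the scrub itself; with the scrub on the separate pool, draining the op pool returns almost immediately.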
Updated by Samuel Just about 13 years ago
1a01e5ee1b88a217547873296e0371858be13f37 merged in a branch that moves replica scrubbing to rep_scrub_wq, with a new non-osdop message for initiating a replica scrub. Scrub still blocks in the disk_tp while waiting for replicas to scrub, though; working on that now.
Updated by Sage Weil about 13 years ago
- Status changed from In Progress to Resolved