osd: pgs spend a long time peering when marking osds out
On the playground (with lots of data), I see that some PGs spend a long time in the peering state after an OSD is marked out. This isn't supposed to happen...
#2 Updated by Sage Weil about 9 years ago
This appears to be scrubbing-related:
- We get a new osdmap, and handle_osd_map tries to pause the op threadpool.
- A long-running scrub op is still in progress and takes forever to complete, so the pause has to wait for it.
- handle_osd_map finally continues.
During that whole time the main dispatch thread is blocked, and peering gets backed up as a result.
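The sequence above can be sketched with a toy model. This is not Ceph code: `PausableWorker` and `measure_pause_delay` are hypothetical names, and the sketch only assumes that pausing a work queue has to wait for the item currently being processed. Under that assumption, the caller of `pause()` (standing in for handle_osd_map on the dispatch thread) is held up for roughly as long as the in-flight op takes.

```python
import threading
import time


class PausableWorker:
    """Toy single-thread work queue; pause() waits for the in-flight item."""

    def __init__(self):
        self._cond = threading.Condition()
        self._queue = []
        self._paused = False
        self._busy = False
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, fn):
        with self._cond:
            self._queue.append(fn)
            self._cond.notify_all()

    def _run(self):
        while True:
            with self._cond:
                # Sleep while paused or idle, unless asked to stop.
                while not self._stop and (self._paused or not self._queue):
                    self._cond.wait()
                if self._stop:
                    return
                fn = self._queue.pop(0)
                self._busy = True
            fn()  # the work item runs outside the lock
            with self._cond:
                self._busy = False
                self._cond.notify_all()

    def pause(self):
        # Blocks the caller until the item currently running has finished.
        with self._cond:
            self._paused = True
            while self._busy:
                self._cond.wait()

    def unpause(self):
        with self._cond:
            self._paused = False
            self._cond.notify_all()

    def stop(self):
        with self._cond:
            self._stop = True
            self._cond.notify_all()
        self._thread.join()


def measure_pause_delay(op_seconds):
    """Return how long pause() blocks while an op of op_seconds is running."""
    pool = PausableWorker()
    started = threading.Event()

    def long_scrub():
        started.set()
        time.sleep(op_seconds)  # stands in for a long-running scrub op

    pool.submit(long_scrub)
    started.wait()              # make sure the op is in flight
    t0 = time.monotonic()
    pool.pause()                # stands in for the pause in handle_osd_map
    delay = time.monotonic() - t0
    pool.unpause()
    pool.stop()
    return delay


if __name__ == "__main__":
    print("pause() blocked for %.2fs" % measure_pause_delay(0.5))
```

In this model the delay seen by the pausing thread grows with the duration of the slowest in-flight item, which matches the behavior described above: anything queued behind the pausing thread waits as well.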
#4 Updated by Samuel Just about 9 years ago
Commit 1a01e5ee1b88a217547873296e0371858be13f37 merged in a branch that moves replica scrubbing to rep_scrub_wq, with a new non-osdop message for initiating a replica scrub. Scrub still blocks in the disk_tp while waiting for replicas to scrub, though; working on that now.