Bug #759

osd: pgs spend a long time peering when marking osds out

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 02/02/2011
Due date:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

On the playground (with lots of data), I see that some PGs spend a long time in the peering state after an OSD is marked out. This isn't supposed to happen...


Related issues

Related to Ceph - Bug #793: osd: avoid blocking in scrub_wq (Resolved, 02/09/2011)

History

#1 Updated by Sage Weil over 8 years ago

  • Status changed from New to In Progress

#2 Updated by Sage Weil over 8 years ago

This appears to be scrubbing-related:

- We get a new osdmap. handle_osd_map tries to pause the op threadpool.
- The pause has to wait for a long-running scrub op, which takes forever to complete.
- handle_osd_map finally continues.

During that whole time the main dispatch thread is blocked, and peering gets backed up as a result.
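
The stall described above can be sketched with a toy model. This is not the OSD code: it is a minimal Python illustration that assumes pausing a pool means waiting for every in-flight item to finish. `PausableThreadPool`, `long_scrub`, and `dispatch_thread` are hypothetical names invented for the sketch.

```python
import queue
import threading


class PausableThreadPool:
    """Toy worker pool (hypothetical): pause() waits for in-flight work to drain."""

    def __init__(self, num_workers=1):
        self._queue = queue.Queue()
        self._cond = threading.Condition()
        self._paused = False
        self._in_flight = 0
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, fn):
        self._queue.put(fn)

    def _worker(self):
        while True:
            fn = self._queue.get()
            with self._cond:
                # Do not start new work while the pool is paused.
                while self._paused:
                    self._cond.wait()
                self._in_flight += 1
            try:
                fn()
            finally:
                with self._cond:
                    self._in_flight -= 1
                    self._cond.notify_all()

    def pause(self):
        with self._cond:
            self._paused = True
            # The caller is stuck here until every running item has finished.
            while self._in_flight > 0:
                self._cond.wait()

    def unpause(self):
        with self._cond:
            self._paused = False
            self._cond.notify_all()


def demo():
    events = []
    pool = PausableThreadPool()
    scrub_started = threading.Event()
    scrub_may_finish = threading.Event()

    def long_scrub():
        # Stands in for a long-running scrub op queued with regular ops.
        scrub_started.set()
        scrub_may_finish.wait()
        events.append("scrub done")

    def dispatch_thread():
        # Stands in for the map handler: pause, apply the map, resume.
        pool.pause()
        events.append("map handled")
        pool.unpause()

    pool.submit(long_scrub)
    scrub_started.wait()

    t = threading.Thread(target=dispatch_thread)
    t.start()
    t.join(timeout=0.2)
    blocked = t.is_alive()  # still waiting inside pause()

    scrub_may_finish.set()  # let the scrub complete
    t.join()
    return blocked, events


if __name__ == "__main__":
    print(demo())  # (True, ['scrub done', 'map handled'])
```

In this model the map handler cannot make progress until the scrub returns, which mirrors the ordering in the list above.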

#3 Updated by Sage Weil over 8 years ago

  • Assignee changed from Sage Weil to Samuel Just

The replica scrub needs to go in a different work queue (not op_wq): scrub_wq, or something else that's assigned to the disk threadpool (disk_tp).
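
A rough sketch of why a separate queue helps, again as a toy Python model and not the real OSD classes. It assumes each pool's pause only waits on that pool's own running items; `Pool`, `replica_scrub`, and `demo_separate_pools` are hypothetical, and `op_tp` is an assumed name for the op threadpool (only `disk_tp` appears in this ticket).

```python
import queue
import threading


class Pool:
    """Toy thread pool (hypothetical): pause() waits only for this pool's items."""

    def __init__(self):
        self.items = queue.Queue()
        self.cond = threading.Condition()
        self.paused = False
        self.running = 0
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            fn = self.items.get()
            with self.cond:
                self.cond.wait_for(lambda: not self.paused)
                self.running += 1
            try:
                fn()
            finally:
                with self.cond:
                    self.running -= 1
                    self.cond.notify_all()

    def pause(self):
        with self.cond:
            self.paused = True
            self.cond.wait_for(lambda: self.running == 0)

    def unpause(self):
        with self.cond:
            self.paused = False
            self.cond.notify_all()


def demo_separate_pools():
    op_tp, disk_tp = Pool(), Pool()
    started = threading.Event()
    release = threading.Event()
    finished = threading.Event()

    def replica_scrub():
        # Stands in for a slow replica scrub.
        started.set()
        release.wait()
        finished.set()

    # Route the scrub to the disk pool instead of the op pool.
    disk_tp.items.put(replica_scrub)
    started.wait()

    # Pausing the op pool no longer waits on the scrub.
    op_tp.pause()
    scrub_still_running = not finished.is_set()
    op_tp.unpause()

    release.set()
    finished.wait()
    return scrub_still_running


if __name__ == "__main__":
    print(demo_separate_pools())  # True
```

Here the op pool pauses while the scrub is still running in the other pool, so the thread that requested the pause is not held up by it.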

#4 Updated by Samuel Just over 8 years ago

1a01e5ee1b88a217547873296e0371858be13f37 merged in a branch that moves replica scrubbing to rep_scrub_wq, with a new non-osdop message for initiating a replica scrub. Scrub still blocks in the disk_tp while waiting for replicas to scrub, though; working on that now.

#5 Updated by Sage Weil over 8 years ago

  • Status changed from In Progress to Resolved
