Bug #11856
closed
osd - scrubbing slot leaked
Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
ceph version: v0.80.4
platform: RHEL6
On our production cluster, we found that the 'pg repair' command was not honored for an inconsistent PG. A deeper dive showed that the scrubbing slot on one of the OSDs in the PG was occupied, even though no PG in the cluster was scrubbing.
I am thinking of two possibilities:
- The unreserve scrubbing request from the primary to the replica was not processed properly (e.g. the message got lost?)
- In some cases, the scrubbing reservation was not cleared properly.
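Either possibility ends in the same state on the replica: a slot that is counted as reserved with nobody left to release it. A toy model of that state, not Ceph's actual code, might look like the sketch below. The class ScrubSlots, its methods, and the limit of one slot per OSD are all hypothetical and exist only for illustration.

```python
class ScrubSlots:
    """Toy model of a per-OSD scrub reservation counter (hypothetical, not Ceph code)."""

    def __init__(self, max_scrubs=1):
        self.max_scrubs = max_scrubs  # assumed cap on concurrent scrubs
        self.reserved = 0             # slots currently handed out

    def reserve(self):
        """Grant a slot if one is free; return whether it was granted."""
        if self.reserved >= self.max_scrubs:
            return False
        self.reserved += 1
        return True

    def unreserve(self):
        """Release a slot; a no-op if nothing is reserved."""
        if self.reserved > 0:
            self.reserved -= 1


replica = ScrubSlots(max_scrubs=1)

# A scrub of PG A reserves the replica's only slot.
granted_a = replica.reserve()

# The matching unreserve never happens (lost message or missed cleanup),
# so replica.unreserve() is never called here.

# A later repair of PG B is refused, although no scrub is running.
granted_b = replica.reserve()
print(granted_a, granted_b)  # True False
```

In this model the refusal persists until something calls unreserve() or the counter is reset, which would match a repair request that is silently not honored.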
Since I don't have many logs for the scrubbing reservation/un-reservation, it is hard to tell which of the two it is.
As a starting point, we might want to expose the scrubbing slot state via the admin socket to make troubleshooting easier.
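As a rough idea of what such a dump could contain, the sketch below renders reservation state as JSON. The function name, the field names, and the PG id are assumptions made up for this example; no existing admin socket command is implied.

```python
import json
import time


def dump_scrub_reservations(max_scrubs, holders, now=None):
    """Render hypothetical scrub reservation state as JSON.

    holders maps a PG id to the time (epoch seconds) its slot was granted.
    """
    now = time.time() if now is None else now
    state = {
        "max_scrubs": max_scrubs,
        "reserved": len(holders),
        "holders": [
            {"pgid": pgid, "held_for_sec": int(now - since)}
            for pgid, since in sorted(holders.items())
        ],
    }
    return json.dumps(state, indent=2)


# Example: one slot held for an hour by a made-up PG id.
print(dump_scrub_reservations(1, {"3.1f": 1000.0}, now=4600.0))
```

Recording which PG holds each slot and for how long would make an orphaned reservation stand out: a holder that is not scrubbing anywhere in the cluster points to a leak.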