Bug #11856

closed

osd - scrubbing slot leaked

Added by Guang Yang almost 9 years ago. Updated about 8 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph version: v0.80.4
platform: RHEL6

On our production cluster, we found that the 'pg repair' command was not honored for an inconsistent PG. A deeper dive showed that the scrubbing slot on one of the OSDs in the PG was occupied, even though no PG in the cluster was scrubbing.

I am thinking of two possibilities:
  1. The unreserve scrubbing request from the primary to the replica was not processed properly (e.g. the message got lost?)
  2. In some cases, the scrubbing reservation was not cleared properly.
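
To make the first possibility concrete, here is a toy model of how a lost unreserve could leave a replica's slot occupied and block a later repair. It is only a sketch and not Ceph's actual code: the class, the function names, the PG ids and the limit of one slot per OSD are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaScrubSlots:
    """Toy model of one OSD's scrub slot accounting (hypothetical, not Ceph code)."""
    max_scrubs: int = 1  # assumed per-OSD limit on concurrent scrubs
    reserved_by: set = field(default_factory=set)  # PG ids currently holding a slot

    def reserve(self, pgid: str) -> bool:
        # A repeated request from a PG that already holds a slot is a no-op.
        if pgid in self.reserved_by:
            return True
        # Refuse when every slot is taken.
        if len(self.reserved_by) >= self.max_scrubs:
            return False
        self.reserved_by.add(pgid)
        return True

    def unreserve(self, pgid: str) -> None:
        # Releasing a slot that is not held is harmless.
        self.reserved_by.discard(pgid)


def scrub_once(replica: ReplicaScrubSlots, pgid: str, unreserve_delivered: bool = True) -> bool:
    """Primary-side flow: reserve on the replica, scrub, then release."""
    if not replica.reserve(pgid):
        return False
    # ... the scrub or repair work would happen here ...
    if unreserve_delivered:
        replica.unreserve(pgid)
    return True


replica = ReplicaScrubSlots(max_scrubs=1)
ok_first = scrub_once(replica, "1.a", unreserve_delivered=False)  # release is lost
ok_repair = scrub_once(replica, "2.b")  # later repair on another PG
print(ok_first, ok_repair, sorted(replica.reserved_by))
```

In this model the second request is refused because PG "1.a" still holds the only slot, although nothing is scrubbing any more, which matches the symptom described above.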

Since I don't have many logs covering scrubbing reservation and un-reservation, it is hard to tell which of the two it is.

As a starting point, we might want to expose the scrubbing slot state through the admin socket to make troubleshooting easier.
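
As a rough illustration of what such a state dump could contain, the sketch below serializes a slot snapshot to JSON. The function name, the field names and the sample values are hypothetical; no existing admin socket command is implied.

```python
import json


def dump_scrub_slots(osd_id, max_scrubs, holders):
    """Return a JSON snapshot of which PGs hold this OSD's scrub slots.

    All names here are hypothetical; this is not an existing Ceph command.
    """
    return json.dumps(
        {
            "osd": osd_id,
            "max_scrubs": max_scrubs,
            "slots_in_use": len(holders),
            "holders": sorted(holders),  # PG ids that reserved a slot
        },
        sort_keys=True,
    )


# Sample values only: OSD 12 with a single slot still held by PG "1.a".
snapshot = dump_scrub_slots(osd_id=12, max_scrubs=1, holders={"1.a"})
print(snapshot)
```

A dump like this that reports a slot in use while no PG is in a scrubbing state would confirm the leak and show which PG last held the reservation.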
