Bug #4217
osd: recovery hangs indefinitely
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor
Description
This has been popping up in QA and has been mistakenly interpreted as recovery merely being slow, but recovery is in fact blocked indefinitely.
The job is:
interactive-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        debug osd: 20
        debug filestore: 20
        debug ms: 1
    fs: ext4
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh
The job is currently in the stuck state on plana18.
Associated revisions
osd: clear recovery state on pg removal
This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.
Fixes: #4217
Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
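The failure mode described in the commit message can be illustrated with a minimal sketch. This is not the actual Ceph code: RecoveryBudget, PG, and can_recover_after_removal are hypothetical names, and the cap of 2 concurrent ops is an arbitrary assumption. The sketch only shows why a PG that is removed while holding in-progress recovery counters can starve every other PG unless it releases them on removal.

```cpp
#include <cassert>

// Hypothetical stand-in for the OSD-wide cap on concurrent recovery ops.
class RecoveryBudget {
public:
  explicit RecoveryBudget(int max_active) : max_active_(max_active) {}

  // Reserve one slot; fails when the cap is reached.
  bool try_start() {
    if (active_ >= max_active_) return false;
    ++active_;
    return true;
  }

  // Release n previously reserved slots.
  void finish(int n = 1) {
    assert(n >= 0 && n <= active_);
    active_ -= n;
  }

  int active() const { return active_; }

private:
  int max_active_;
  int active_ = 0;
};

// Hypothetical placement group that tracks its own in-progress ops.
class PG {
public:
  explicit PG(RecoveryBudget& budget) : budget_(budget) {}

  bool start_recovery_op() {
    if (!budget_.try_start()) return false;
    ++in_progress_;
    return true;
  }

  // Called on removal: hand back every slot this PG still holds.
  void clear_recovery_state() {
    budget_.finish(in_progress_);
    in_progress_ = 0;
  }

private:
  RecoveryBudget& budget_;
  int in_progress_ = 0;
};

// Returns true if a surviving PG can still start recovery after another PG
// that held every slot is removed in the middle of recovery.
bool can_recover_after_removal(bool clear_on_removal) {
  RecoveryBudget budget(2);  // assumed cap, chosen only for illustration
  PG doomed(budget);
  PG survivor(budget);
  doomed.start_recovery_op();
  doomed.start_recovery_op();  // budget is now exhausted
  if (clear_on_removal) doomed.clear_recovery_state();
  // "doomed" is treated as removed here; its ops never complete on their own.
  return survivor.start_recovery_op();
}
```

In this model, can_recover_after_removal(false) corresponds to the hang (the leaked slots are never returned, so the surviving PG can never start), while can_recover_after_removal(true) corresponds to the behavior after the fix.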
History
#1 Updated by Sage Weil about 11 years ago
- Assignee set to Sage Weil
#2 Updated by Sage Weil about 11 years ago
- Status changed from 12 to Fix Under Review
wip-4217 now passes the test
#3 Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved