Bug #4217

osd: recovery hangs indefinitely

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This has been popping up in QA and was mistakenly interpreted as recovery merely being slow, but recovery is in fact blocked indefinitely.
The job is:

interactive-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        debug osd: 20
        debug filestore: 20
        debug ms: 1
    fs: ext4
    log-whitelist:
    - slow request
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- ceph-fuse: null
- workunit:
    clients:
      client.0:
      - rados/test.sh

The job is currently in the stuck state on plana18.

Associated revisions

Revision 6d8dfb18 (diff)
Added by Sage Weil about 11 years ago

osd: clear recovery state on pg removal

This ensures we release our in-progress recovery counters, which prevents
recovery from getting blocked indefinitely when a pool removal races with
recovery ops.

Fixes: #4217
Backport: bobtail
Signed-off-by: Sage Weil <>
Reviewed-by: Samuel Just <>
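
The race described in the commit message above can be illustrated with a toy model. This is only a sketch, not the actual OSD code: the class and method names (RecoveryThrottle, PG, clear_recovery_state, remove) are hypothetical, and it assumes the "in-progress recovery counters" behave like a shared cap on concurrent recovery ops. If a PG is removed while its ops still hold slots and nothing releases them, every other PG is refused a slot indefinitely.

```python
class RecoveryThrottle:
    """Hypothetical stand-in for an OSD-wide cap on concurrent recovery ops."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = 0

    def try_start(self):
        # Refuse new recovery work once the cap is reached.
        if self.active >= self.max_active:
            return False
        self.active += 1
        return True

    def finish(self, count=1):
        if count > self.active:
            raise ValueError("releasing more ops than are active")
        self.active -= count


class PG:
    """Hypothetical placement group that tracks its own in-flight recovery ops."""

    def __init__(self, name, throttle):
        self.name = name
        self.throttle = throttle
        self.in_flight = 0

    def start_recovery_op(self):
        if self.throttle.try_start():
            self.in_flight += 1
            return True
        return False

    def clear_recovery_state(self):
        # Give back every slot this PG still holds.
        self.throttle.finish(self.in_flight)
        self.in_flight = 0

    def remove(self, clear_state):
        # Models pool removal racing with recovery: the PG's in-flight ops
        # will never complete, so nobody else will release their slots.
        if clear_state:
            self.clear_recovery_state()


def simulate(clear_state_on_removal):
    """Return True if a surviving PG can still start recovery after the race."""
    throttle = RecoveryThrottle(max_active=2)
    doomed = PG("doomed", throttle)
    survivor = PG("survivor", throttle)

    # The doomed PG takes every available recovery slot...
    doomed.start_recovery_op()
    doomed.start_recovery_op()
    # ...and is then removed while those ops are still in flight.
    doomed.remove(clear_state=clear_state_on_removal)

    return survivor.start_recovery_op()


if __name__ == "__main__":
    print("without clearing state:", simulate(False))
    print("with clearing state:", simulate(True))
```

In this model, simulate(False) leaves the surviving PG unable to start recovery, matching the hang reported here, while simulate(True) lets it proceed, which is the behavior the commit message attributes to clearing recovery state on PG removal.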

History

#1 Updated by Sage Weil about 11 years ago

  • Assignee set to Sage Weil

#2 Updated by Sage Weil about 11 years ago

  • Status changed from 12 to Fix Under Review

wip-4217 now passes the test

#3 Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved
