Bug #18085

closed

osd thrashing deadlock: copy-from vs max-backfills

Added by Sage Weil over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One PG is backfilling. The backfill blocks on a copy-from operation, which in turn blocks on its read (copy-get) because the PG it reads from is

    "state": "undersized+degraded+remapped+backfill_wait+peered",
    "snap_trimq": "[dd~1,e4~2,e8~1,eb~1,ed~7]",
    "epoch": 1401,
    "up": [
        1,
        5
    ],
    "acting": [
        5
    ],
    "backfill_targets": [
        "1" 
    ],

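on the same OSD. The waiting PG won't start its backfill because osd_max_backfills=1 and the single backfill slot is held by the first PG, so neither makes progress.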

I think the simple fix is to just set osd_max_backfills=2 during thrashing with cache tiering?
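
The circular wait described above can be illustrated with a toy model. This is only a sketch, not Ceph code: the `PG` class, the `run` function, and the names `pg_a` and `pg_b` are hypothetical, and it assumes a PG becomes readable only after its own backfill completes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PG:
    """Hypothetical stand-in for a placement group on one OSD."""
    name: str
    needs_read_from: Optional[str] = None  # PG that a copy-from must read
    backfilled: bool = False
    holds_slot: bool = False


def run(max_backfills: int, max_rounds: int = 10) -> bool:
    """Return True if every PG finishes backfill, False on deadlock."""
    # pg_a already holds a backfill slot and is blocked on a copy-from
    # that reads from pg_b; pg_b is waiting for a slot (backfill_wait).
    pgs = {
        "pg_a": PG("pg_a", needs_read_from="pg_b", holds_slot=True),
        "pg_b": PG("pg_b"),
    }
    for _ in range(max_rounds):
        progressed = False
        slots_used = sum(pg.holds_slot for pg in pgs.values())

        # Hand any free slots to PGs that are still waiting.
        for pg in pgs.values():
            if not pg.backfilled and not pg.holds_slot and slots_used < max_backfills:
                pg.holds_slot = True
                slots_used += 1
                progressed = True

        # A PG holding a slot finishes once the PG it reads from is done
        # (assumption: a PG only serves reads after its backfill completes).
        for pg in pgs.values():
            if pg.holds_slot and not pg.backfilled:
                dep = pgs.get(pg.needs_read_from) if pg.needs_read_from else None
                if dep is None or dep.backfilled:
                    pg.backfilled = True
                    pg.holds_slot = False
                    progressed = True

        if all(pg.backfilled for pg in pgs.values()):
            return True
        if not progressed:
            return False  # nobody can move: deadlock
    return False


if __name__ == "__main__":
    print("max_backfills=1 completes:", run(1))
    print("max_backfills=2 completes:", run(2))
```

With one slot the model stalls immediately, because `pg_a` holds the slot while waiting on `pg_b`. With two slots `pg_b` can backfill first, which unblocks `pg_a`.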

Actions #1

Updated by Sage Weil over 7 years ago

/a/sage-2016-11-29_20:05:25-rados:thrash-master---basic-smithi/586464

osd logs are in the test dir

Actions #2

Updated by Sage Weil over 7 years ago

  • Status changed from New to Fix Under Review

Actions #3

Updated by Sage Weil over 7 years ago

  • Assignee set to Samuel Just

Actions #4

Updated by Sage Weil over 7 years ago

  • Status changed from Fix Under Review to Resolved