Fix #6278: osd: throttle snap trimming (closed)

Added by Mike Dawson over 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
% Done: 0%
Source: Community (user)

Description

QEMU guests on our cluster experience high I/O latency, stalls, or complete halts when spindle contention is created by Ceph performing maintenance work (scrub, deep-scrub, peering, recovery, or backfilling).

Under non-scrub conditions, a graph of %util from 'iostat -x' shows a consistent 10-15% load across all OSDs, and client I/O performance is good.

When a scrub or deep-scrub starts, %util on several or most OSDs approaches 100%, indicating spindle contention. Client I/O tends to suffer significant read latency, some guests become quite sluggish, and some applications experience multi-second pauses.
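
As a workaround, scrub impact can be reduced by capping concurrency and pacing scrub work per OSD. A minimal ceph.conf sketch, assuming a release that exposes these tunables (osd scrub sleep in particular is not available in older versions, and the values shown are illustrative):

    [osd]
    osd max scrubs = 1             # at most one concurrent scrub op per OSD
    osd scrub load threshold = 0.5 # skip scheduled scrubs when system load is high
    osd scrub sleep = 0.1          # seconds to sleep between scrub chunks, yielding the spindle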

In the case of peering, recovery, and/or backfilling, we see client I/O stall completely on some instances for the entire duration of the recovery, or start and stop seemingly at random while the recovery runs. On other instances, I/O proceeds at a fraction of the expected rate, or we see a combination of these conditions.
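
Recovery and backfill pressure can be throttled in a similar way; a sketch, with defaults and availability again varying by release:

    [osd]
    osd max backfills = 1        # limit concurrent backfills per OSD
    osd recovery max active = 1  # limit in-flight recovery ops per OSD
    osd recovery op priority = 1 # weight recovery ops well below client ops
    osd client op priority = 63  # keep client ops at the highest queue priority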

Ceph should prioritize client I/O over this maintenance work more effectively.
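
Per the ticket title, the missing throttle here is for snapshot trimming. A sketch of the kind of pacing knob being requested, using the osd snap trim sleep option that later releases expose (the value is illustrative):

    [osd]
    osd snap trim sleep = 0.1 # seconds to sleep between snap-trim work items

The same setting can be injected into a running cluster without a restart, e.g. 'ceph tell osd.* injectargs --osd_snap_trim_sleep 0.1' (exact syntax varies slightly across releases).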


Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #6826: Non-equal performance of 'freshly joined' OSDs (Duplicate, added 11/20/2013)
