Bug #6333
Recovery and/or Backfill Cause QEMU/RBD Reads to Hang

Added by Mike Dawson over 10 years ago. Updated about 10 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Some (but not all) of our QEMU instances booted from RBD copy-on-write clones experience I/O outages during recovery and/or backfill. Re-adding two OSDs (3 TB SATA) on separate nodes recently took ~15 hours. For most of that time, the two new disks hovered close to 100% spindle utilization. Other OSDs show spikes of elevated utilization, but are significantly calmer. The network does not appear to be a limiting factor.

We attempted to get client I/O flowing freely again on the stuck guests by lowering osd_recovery_op_priority to 1, but it did not help. Turning osd_recovery_max_active down to 1 did not restore client I/O either, though that may have been due to Issue #6291.
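For reference, the two recovery-throttling settings mentioned above can be changed at runtime with injectargs, or set persistently in ceph.conf. This is a sketch of the approach, not the exact commands used here; option syntax can vary slightly between Ceph releases:

```
# Lower the priority of recovery ops relative to client ops, cluster-wide:
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'

# Limit the number of concurrent recovery ops per OSD:
ceph tell osd.* injectargs '--osd-recovery-max-active 1'

# Equivalent persistent settings in ceph.conf, under [osd]:
#   osd recovery op priority = 1
#   osd recovery max active = 1
```

Runtime changes made via injectargs do not survive an OSD restart, which is why the ceph.conf form is shown as well.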

The instances that encounter outages run video surveillance software on Windows. During recovery, we see seemingly uninterrupted I/O from a different video surveillance package running on Linux. I believe reads may be the primary issue: the Windows application appears to block on reads, leaving it unable to write the video stream to RBD, while the Linux application seems to separate its reads and writes.


Files

ceph-issue-6333.jpg (251 KB) ceph-issue-6333.jpg Mike Dawson, 09/17/2013 11:57 AM
ceph-issue-6333-control.jpg (240 KB) ceph-issue-6333-control.jpg Mike Dawson, 09/17/2013 12:04 PM
ceph-issue-6333-recovery.jpg (88.8 KB) ceph-issue-6333-recovery.jpg Mike Dawson, 09/17/2013 01:47 PM
ceph-issue-6333-recovery-rbd-perf-dump.jpg (233 KB) ceph-issue-6333-recovery-rbd-perf-dump.jpg Mike Dawson, 09/17/2013 01:47 PM