Bug #12580: Enclosure power failure pausing client IO - Ceph

closed

Added by Mallikarjun Biradar over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: High
Assignee: Sage Weil
Source: other
Severity: 1 - critical
Affected Versions: v0.94.2

Description

I had active client IO running on the cluster (random-write profile with a 4M block size and a queue depth of 64).

One of the storage enclosures lost power, so all OSDs on the hosts connected to that enclosure went down, as expected.

However, client IO paused, even though the pools use size=2 and min_size=1. After some time, the enclosure and the hosts connected to it came back up, and all OSDs on those hosts came up as well.

Until then, the cluster served no IO. Client IO resumed only once all hosts and OSDs belonging to that enclosure were back up.

Setup Details:
Total Number of hosts: 8
Number of storage enclosures/chassis: 2 (each connected to 4 hosts)
Failure domain: Chassis
Replication size: 2
Min size: 1
All pools were created with chassis ruleset.

This issue was seen on the Giant release (0.87.2).
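
To make the expectation behind this report explicit, here is a minimal Python sketch of the availability arithmetic for the setup above. It is a simplified model, not Ceph's actual logic: the names `Pool`, `surviving_replicas`, and `pg_accepts_io` and the chassis labels are hypothetical, and the model assumes the chassis ruleset places exactly one replica of each placement group in each chassis. It ignores failure detection and peering, so it says nothing about how long IO may stall while the failed OSDs are being marked down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    size: int      # replicas kept per placement group (PG)
    min_size: int  # replicas that must be up for the PG to accept IO


def surviving_replicas(replica_chassis, failed_chassis):
    """Count the replicas of one PG whose chassis still has power."""
    return sum(1 for chassis in replica_chassis if chassis not in failed_chassis)


def pg_accepts_io(pool, replica_chassis, failed_chassis):
    """Simplified rule: a PG serves IO while at least min_size replicas are up."""
    return surviving_replicas(replica_chassis, failed_chassis) >= pool.min_size


# Values from the report: size=2, min_size=1, failure domain = chassis.
pool = Pool(size=2, min_size=1)

# Assumption: one replica per chassis (labels are made up for illustration).
pg_placement = ("chassis-1", "chassis-2")

one_down = pg_accepts_io(pool, pg_placement, {"chassis-1"})
both_down = pg_accepts_io(pool, pg_placement, {"chassis-1", "chassis-2"})

print("IO possible with one chassis down:", one_down)
print("IO possible with both chassis down:", both_down)
```

Under these assumptions, losing one chassis leaves one replica per PG, which meets min_size=1, so client IO would be expected to continue rather than stay paused until the enclosure returned.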

Updated by Varada Kari over 8 years ago

Updated by Sage Weil over 8 years ago