Bug #12580

closed

Enclosure power failure pausing client IO

Added by Mallikarjun Biradar over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I had active client IO running on the cluster (random write profile with 4M block size and a queue depth of 64).

One of the storage enclosures lost power, so all OSDs on the hosts connected to that enclosure went down, as expected.

However, client IO paused, even though the pools have size=2 and min_size=1. After some time, the enclosure and the hosts connected to it came back up, and all OSDs on those hosts came up as well.

Until then, the cluster was not serving IO. Client IO resumed only once all hosts and OSDs belonging to that enclosure were back up.

Setup Details:
Total Number of hosts: 8
Number of storage enclosures/chassis: 2 (each connected to 4 hosts)
Failure domain: Chassis
Replication size: 2
Min size: 1
All pools were created with chassis ruleset.

This issue was seen on the Giant release, 0.87.2.
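
To make the expectation behind this report explicit, here is a minimal sketch of the availability rule I am assuming: a placement group should keep serving IO as long as the number of surviving replicas is at least min_size. This is a simplified model, not Ceph code. The chassis names and the `pg_can_serve_io` helper are hypothetical, and the model ignores peering and the time it takes for failed OSDs to be marked down.

```python
# Simplified model of the expectation in this report. Not Ceph code:
# the chassis names and pg_can_serve_io() are hypothetical.

CHASSIS = ("chassis-a", "chassis-b")  # two enclosures, 4 hosts each
SIZE = 2       # replication size of the pools
MIN_SIZE = 1   # minimum replicas required to serve IO


def pg_can_serve_io(replica_chassis, failed_chassis, min_size=MIN_SIZE):
    """Return True if enough replicas survive the failure to meet min_size."""
    surviving = [c for c in replica_chassis if c not in failed_chassis]
    return len(surviving) >= min_size


# With a chassis failure domain and size=2, each PG is assumed to keep
# one replica in each chassis.
placement = CHASSIS
assert len(placement) == SIZE

# One enclosure loses power: one replica survives, which meets min_size=1.
print(pg_can_serve_io(placement, {"chassis-a"}))

# For contrast, the same failure with min_size=2 would block IO.
print(pg_can_serve_io(placement, {"chassis-a"}, min_size=2))
```

Under these assumptions the first check is true, which is why I expected client IO to continue while one enclosure was down.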
