Bug #9623: On cluster with 3 mons, stopping 2 mons made cluster in-accessible, with IO's hung/pause - Ceph - Ceph

Bug #9623

closed

On cluster with 3 mons, stopping 2 mons made cluster in-accessible, with IO's hung/pause

Added by Mallikarjun Biradar over 9 years ago. Updated over 9 years ago.

Status: Won't Fix
Priority: Normal
Source: other
Severity: 2 - major

Description

A cluster with "n" monitor nodes becomes inaccessible when "n-1" of the monitors are down.
This was observed on a cluster with 3 monitor nodes running Ceph version 0.84.

Steps to reproduce:
Test 1: Stopping the monitor service
1> Create a cluster with 3 monitor nodes
2> Start IO on the cluster
3> Stop the monitor service on two of the monitor nodes (sudo stop ceph-mon-all)
Observed:
a> The cluster became inaccessible
b> IO is paused
c> The third monitor stays in the "probing" state indefinitely

test@rack6-client-4:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.rack6-client-5.asok mon_status
{ "name": "rack6-client-5",
  "rank": 1,
  "state": "probing",
  "election_epoch": 60,
  "quorum": [],
  "outside_quorum": [
        "rack6-client-5"],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "fe2afe2d-1096-4c3e-a91c-73ccfad84851",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "rack6-client-4",
              "addr": "10.242.43.105:6789\/0"},
            { "rank": 1,
              "name": "rack6-client-5",
              "addr": "10.242.43.106:6789\/0"},
            { "rank": 2,
              "name": "rack6-client-6",
              "addr": "10.242.43.107:6789\/0"}]}}

test@rack6-client-4:~$

Test 2: Exiting from quorum
1> Create a cluster with 3 monitor nodes
2> Start IO on the cluster
3> Force two of the monitor nodes out of quorum (sudo ceph --admin-daemon /var/run/ceph/ceph-mon.rack6-client-5.asok quorum exit)
Observed:
a> The cluster became inaccessible
b> IO is paused
c> The third monitor stays in the "electing" state indefinitely

test@rack6-client-4:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-mon.rack6-client-5.asok mon_status
{ "name": "rack6-client-5",
  "rank": 1,
  "state": "electing",
  "election_epoch": 55,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 1,
      "fsid": "fe2afe2d-1096-4c3e-a91c-73ccfad84851",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "rack6-client-4",
              "addr": "10.242.43.105:6789\/0"},
            { "rank": 1,
              "name": "rack6-client-5",
              "addr": "10.242.43.106:6789\/0"},
            { "rank": 2,
              "name": "rack6-client-6",
              "addr": "10.242.43.107:6789\/0"}]}}

test@rack6-client-4:~$
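
In both mon_status outputs above the "quorum" list is empty while the monmap lists three monitors. A minimal sketch of reading that programmatically is given here, assuming the JSON has the fields shown in this report ("name", "state", "quorum", "monmap"/"mons"). The function name summarize_mon_status and the trimmed SAMPLE string are hypothetical and used only for illustration; they are not part of Ceph.

```python
import json

# Trimmed copy of the Test 1 output from this report (illustrative only).
SAMPLE = r'''
{ "name": "rack6-client-5",
  "rank": 1,
  "state": "probing",
  "quorum": [],
  "monmap": { "epoch": 1,
      "mons": [
            { "rank": 0, "name": "rack6-client-4", "addr": "10.242.43.105:6789\/0"},
            { "rank": 1, "name": "rack6-client-5", "addr": "10.242.43.106:6789\/0"},
            { "rank": 2, "name": "rack6-client-6", "addr": "10.242.43.107:6789\/0"}]}}
'''


def summarize_mon_status(raw: str) -> dict:
    """Summarize a mon_status JSON document (hypothetical helper).

    Assumes the field layout shown in this bug report.
    """
    status = json.loads(raw)
    total = len(status["monmap"]["mons"])   # monitors known to the monmap
    in_quorum = len(status["quorum"])       # ranks currently in quorum
    needed = total // 2 + 1                 # strict majority of the monmap
    return {
        "name": status["name"],
        "state": status["state"],
        "total_mons": total,
        "in_quorum": in_quorum,
        "needed": needed,
        "has_quorum": in_quorum >= needed,
    }


if __name__ == "__main__":
    print(summarize_mon_status(SAMPLE))
```

For the sample above, the summary reports 3 monitors in the monmap, 0 in quorum, and 2 needed, so has_quorum is False.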

Updated by Loïc Dachary over 9 years ago

Assignee deleted (Loïc Dachary)

Removing myself as I may not have time to deal with this right now.

Updated by Greg Farnum over 9 years ago

Status changed from New to Won't Fix

This is expected and intended behavior. The monitors are a Paxos system and require a quorum of more than half to be communicating in order to function. Doing otherwise would not provide the necessary consistency guarantees. Keep your monitors in good shape.
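
The "more than half" rule can be illustrated with a little arithmetic. This is only a sketch of the majority calculation, not Ceph code; the function names quorum_size, has_quorum, and tolerable_failures are hypothetical.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest number of monitors that is strictly more than half."""
    if total_mons < 1:
        raise ValueError("total_mons must be at least 1")
    return total_mons // 2 + 1


def has_quorum(total_mons: int, reachable_mons: int) -> bool:
    """True if the reachable monitors form a strict majority."""
    return reachable_mons >= quorum_size(total_mons)


def tolerable_failures(total_mons: int) -> int:
    """How many monitors can be down while a majority remains."""
    return total_mons - quorum_size(total_mons)


if __name__ == "__main__":
    # The scenario in this report: 3 monitors, 2 stopped, 1 left running.
    print(quorum_size(3))         # majority needed out of 3
    print(has_quorum(3, 1))       # one surviving monitor
    print(tolerable_failures(3))  # failures a 3-monitor cluster survives
```

With 3 monitors the majority is 2, so a single surviving monitor cannot form a quorum and only one monitor failure is tolerated. Tolerating two failures takes 5 monitors under the same rule.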
