Bug #42830: problem returning mon to cluster - RADOS - Ceph

Bug #42830

open

problem returning mon to cluster

Added by Nikola Ciprich over 4 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Severity: 2 - major
Affected Versions: Ceph - v13.2.6
Component(RADOS): Monitor

Description

As discussed on the list, here: https://www.spinics.net/lists/ceph-users/msg55977.html

After rebooting one of the nodes, starting its monitor makes the whole cluster
appear to hang, including I/O, ceph -s, etc. When this mon is stopped again,
everything resumes. Spawning a new monitor leads to the same problem
(even on a different node).

All cluster nodes are CentOS 7 machines. I have 3 monitors (so 2 are now running) and I'm
using Ceph 13.2.6. The monitor database is not very large, ~65MB. None of the cluster machines is overloaded.

Update: after some discussion on the list, I was able to work around the problem by setting the mon lease timeout to 50s, waiting for the monitor to join the cluster, and then setting it back to 5s. This mon connect took hours, by the way! Once it was OK, stopping and starting it works without a flaw.
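For anyone who wants to script the workaround above, here is a minimal sketch, not a tested procedure. It assumes that the "mon lease timeout" mentioned above is the `mon_lease` option, that `ceph config set` is usable on this release and takes effect on the running monitors, and that the `quorum_names` field of `ceph quorum_status` lists the monitors currently in quorum. The function names are hypothetical, and starting the monitor daemon itself is left to the operator.

```python
import json
import subprocess
import time


def mon_in_quorum(quorum_status_json: str, mon_name: str) -> bool:
    """Return True if mon_name is listed in the quorum_names field."""
    status = json.loads(quorum_status_json)
    return mon_name in status.get("quorum_names", [])


def set_mon_lease(seconds: int) -> None:
    # Assumption: "mon lease timeout" in the report means the mon_lease option.
    subprocess.run(
        ["ceph", "config", "set", "mon", "mon_lease", str(seconds)],
        check=True,
    )


def wait_for_mon(mon_name: str, poll_seconds: float = 30.0) -> None:
    """Poll quorum status until mon_name has joined (this took hours in the report)."""
    while True:
        result = subprocess.run(
            ["ceph", "quorum_status", "--format", "json"],
            check=True, capture_output=True, text=True,
        )
        if mon_in_quorum(result.stdout, mon_name):
            return
        time.sleep(poll_seconds)


def rejoin_with_long_lease(mon_name: str) -> None:
    """Raise the lease to 50s, wait for the mon to join, then restore 5s."""
    set_mon_lease(50)
    try:
        wait_for_mon(mon_name)
    finally:
        # Restore the 5s value used in the report, even if polling fails.
        set_mon_lease(5)
```

The intended use would be to call `rejoin_with_long_lease("<mon name>")` from a node with a working admin keyring, then start the problematic monitor while the script polls.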

I'm quite sure there is no network issue there, and since this first case we have been hit by it on another cluster.

The probably good news is that I was able to reproduce this problem by creating the same test environment in VMs, with the same hostnames, addresses and Ceph version, and with copied monitor data. So if anyone is interested, we can give SSH access or the exact steps and data to reproduce.

If I can provide more data, please let me know. I'm also attaching ceph-mon.log with debug_mon set to 10/10.

Files

ceph-mon.nodev1d.log (190 KB)

Nikola Ciprich, 11/15/2019 07:30 AM

Related issues 1 (0 open — 1 closed)

Updated by Jérôme Poulin over 4 years ago

Updated by Jérôme Poulin over 4 years ago

Updated by Greg Farnum over 4 years ago

Updated by Dan van der Ster about 4 years ago

Updated by Dan van der Ster about 4 years ago

Updated by Wido den Hollander about 4 years ago

Updated by Neha Ojha about 4 years ago

Updated by Dan van der Ster about 4 years ago

Updated by Wout van Heeswijk over 3 years ago