Bug #8801

closed

Ceph monitors do not start after server restart

Added by AltScale Inc almost 10 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
Normal
Category:
Monitor
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have two separate Ceph installations with five servers each.
Sometimes, when a server is restarted, the Ceph monitor on it does not
start automatically. The affected monitor differs each time, and the
problem does not occur on every restart.

We upgraded from Cuttlefish to Dumpling and later to Emperor, but the
issue does not appear to be fully resolved. We are currently running
Emperor.

We have tried several ways [1] [2] [3] to bring the monitor back up.
Usually only recreating the monitor [3] helped.

In Emperor, the monitor can usually be started or restarted manually with:

sudo initctl start ceph-mon cluster=ceph id=_ceph_server_hostname
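Before starting the monitor by hand after a reboot, it helps to confirm whether a ceph-mon process for that id is running at all. The following is a minimal sketch and rests on a few assumptions. It assumes Linux /proc. It also assumes the monitor is launched with `-i <id>` or `--id <id>` on its command line, which is presumably how the upstart job starts it, but that has not been verified here. The helper names are hypothetical.

```python
import os
from typing import Iterable, List


def mon_ids_from_cmdlines(cmdlines: Iterable[List[str]]) -> List[str]:
    """Return the monitor ids of all ceph-mon processes in the given argv lists."""
    ids = []
    for argv in cmdlines:
        # Only consider processes whose executable is ceph-mon.
        if not argv or os.path.basename(argv[0]) != "ceph-mon":
            continue
        for i, arg in enumerate(argv):
            if arg in ("-i", "--id") and i + 1 < len(argv):
                ids.append(argv[i + 1])
            elif arg.startswith("--id="):
                ids.append(arg[len("--id="):])
    return ids


def running_cmdlines(proc: str = "/proc") -> List[List[str]]:
    """Read the argv of every process from <proc>/<pid>/cmdline (Linux only)."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            # The process may have exited meanwhile; skip it.
            continue
        # Arguments are NUL-separated; kernel threads have an empty cmdline.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args:
            result.append(args)
    return result
```

For example, checking whether `"vn-cmp-04" in mon_ids_from_cmdlines(running_cmdlines())` right after boot would distinguish "the job never started" from "the daemon started and then died".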

OS: Ubuntu 12.04
Ceph version: cuttlefish, dumpling, emperor
Kernel version: 3.5.x

[1]
sudo initctl start ceph-mon cluster=cluster_name id=nowhere-cmp-05

[2]
sudo restart ceph-mon-all
sudo initctl restart ceph-mon cluster=cluster_name id=nowhere-cmp-05

[3]
sudo initctl stop ceph-mon cluster=cluster_name id=nowhere-cmp-04
sudo ceph mon remove nowhere-cmp-04
sudo mv /var/lib/ceph/mon/ceph-nowhere-cmp-04/ /var/lib/ceph/mon/ceph-nowhere-cmp-04.bak
sudo mkdir /var/lib/ceph/mon/ceph-nowhere-cmp-04
sudo ceph auth get mon. -o /tmp/auth
sudo ceph mon getmap -o /tmp/map
sudo ceph-mon -i nowhere-cmp-04 --mkfs --monmap /tmp/map --keyring /tmp/auth
sudo ceph mon add nowhere-cmp-04 10.16.0.107:6789
sudo ceph-mon -i nowhere-cmp-04 --public-addr 10.16.0.107:6789
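Because a different monitor fails each time, it may help to generate the commands of [3] for any monitor id and address instead of editing them by hand. The following is a minimal sketch, assuming the default cluster name `ceph` and the /var/lib/ceph/mon/ceph-<id> data directory layout used above. The function name is hypothetical, and it only builds the command strings without executing anything.

```python
import shlex
from typing import List


def recreate_mon_commands(mon_id: str, addr: str, tmp: str = "/tmp") -> List[str]:
    """Build the recovery steps of [3] for one monitor (default cluster 'ceph').

    Nothing is executed; the commands are returned for review.
    """
    data_dir = f"/var/lib/ceph/mon/ceph-{mon_id}"
    keyring, monmap = f"{tmp}/auth", f"{tmp}/map"
    steps = [
        # Stop the broken monitor and drop it from the monitor map.
        ["initctl", "stop", "ceph-mon", "cluster=ceph", f"id={mon_id}"],
        ["ceph", "mon", "remove", mon_id],
        # Keep the old store as a backup instead of deleting it.
        ["mv", data_dir, f"{data_dir}.bak"],
        ["mkdir", data_dir],
        # Fetch the mon. key and the current monmap from the remaining quorum.
        ["ceph", "auth", "get", "mon.", "-o", keyring],
        ["ceph", "mon", "getmap", "-o", monmap],
        # Build a fresh store, add the monitor back and start it.
        ["ceph-mon", "-i", mon_id, "--mkfs", "--monmap", monmap, "--keyring", keyring],
        ["ceph", "mon", "add", mon_id, addr],
        ["ceph-mon", "-i", mon_id, "--public-addr", addr],
    ]
    # Quote each argument so ids or paths with unusual characters stay intact.
    return ["sudo " + " ".join(shlex.quote(a) for a in argv) for argv in steps]
```

Calling `recreate_mon_commands("nowhere-cmp-04", "10.16.0.107:6789")` yields the nine commands listed in [3]. The only difference is that the `mv` source has no trailing slash.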


Files

ceph-mon-issues.log (651 KB), added by AltScale Inc, 07/17/2014 04:41 AM
#1

Updated by Joao Eduardo Luis almost 10 years ago

  • Assignee set to Joao Eduardo Luis

Can you provide logs for the monitor that doesn't start? Ideally with 'debug mon = 10'.

#2

Updated by AltScale Inc almost 10 years ago

We reproduced the monitor issue by restarting the physical server. Debug logging was enabled in the Ceph configuration as described in the documentation:

[mon]
mon debug = 10

The monitor log is attached. The machine was restarted at 14:24 and was operational again at 14:25/14:26. The monitor did not start on its own, and there are no log entries until it was started manually with:

sudo initctl start ceph-mon cluster=ceph id=vn-cmp-04

Restarting the monitor with the following command just shuts it down:

sudo restart ceph-mon-all

#3

Updated by Sage Weil over 9 years ago

  • Status changed from New to Can't reproduce
  • Source changed from other to Community (user)

From the logs, the ceph-mon process was never started. I would look in your /var/log/upstart logs.
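To follow that suggestion across several servers, a small helper can collect the tail of every ceph-mon related upstart log. The following is a minimal sketch. It assumes upstart writes job output to /var/log/upstart/<job>.log. It also assumes that the relevant files match `ceph-mon*.log`, which would cover ceph-mon-all as well. The exact naming of per-instance logs is not confirmed here, and the function name is hypothetical. Rotated or compressed logs are not read.

```python
import glob
import os
from collections import deque
from typing import Dict, List


def tail_upstart_logs(log_dir: str = "/var/log/upstart",
                      pattern: str = "ceph-mon*.log",
                      lines: int = 20) -> Dict[str, List[str]]:
    """Return the last `lines` lines of each log in log_dir matching pattern."""
    tails = {}
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        try:
            with open(path, errors="replace") as f:
                # A bounded deque keeps only the final `lines` lines in memory.
                tails[path] = [entry.rstrip("\n") for entry in deque(f, maxlen=lines)]
        except OSError:
            # These logs are usually readable by root only; skip the rest.
            continue
    return tails
```

Run as root right after a reboot, e.g. `for path, tail in tail_upstart_logs().items(): print(path, *tail, sep="\n")`, this should show whether upstart attempted to start the monitor at all.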
