Documentation #47436

Cluster monitor troubleshooting documentation outdated?

Added by G. Heinrich 5 days ago.

Status: New
Priority: Normal
Assignee: -
Category: documentation
Target version:
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

I tried to restore an unhealthy virtual cluster by using the following documentation:

https://docs.ceph.com/docs/octopus/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

When I tried to follow the steps, I already stumbled on the first one ("Stop all ceph-mon daemons on all monitor hosts") because there are no Ceph services to be found on the host (probably since Octopus).
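A possible explanation, offered only as an assumption: on cephadm-managed hosts the daemons run in containers, and the systemd units are usually named after the cluster fsid (`ceph-<fsid>@mon.<hostname>.service`) rather than `ceph-mon@...`. The sketch below only prints the commands one might try. The helper `mon_unit_name` and the all-zero fsid are hypothetical, and the hostname is guessed from the monitor ID in this report.

```shell
#!/bin/sh
# Assumption: cephadm-style unit naming, ceph-<fsid>@mon.<hostname>.service.
mon_unit_name() {
    # $1 = cluster fsid, $2 = monitor hostname
    echo "ceph-$1@mon.$2.service"
}

# Hypothetical fsid for illustration only; the real one is not given in this report.
FSID="00000000-0000-0000-0000-000000000000"

# Dry run: print the commands instead of executing them.
echo "systemctl stop $(mon_unit_name "$FSID" iz-ceph-v1-mon-01)"
# Listing every Ceph-related unit on the host does not require knowing the fsid:
echo "systemctl list-units 'ceph*'"
```

If the listing shows no such units either, the deployment probably differs from what is assumed here.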

I then tried to at least extract the monmap. However, ceph-mon needs to be installed beforehand (via cephadm?), which isn't mentioned. After I installed ceph-mon, the extraction still didn't work: it gave no warning or error, but the monmap wasn't extracted at all. Since I was only experimenting on a virtual cluster, I started all nodes, extracted the monmap with the command "ceph mon getmap -o /tmp/monmap", and shut the other two monitors down again.

Editing the monmap then also required installing monmaptool beforehand (via cephadm, which again isn't mentioned?). I did that and then removed the two missing monitor nodes from the monmap.

Afterwards I tried to inject the monmap into the surviving monitor, but this didn't work either. The first time I ran the command as described in the documentation, it produced this message (which never appeared again afterwards):

7f3e43545580 -1 monitor data directory at '/var/lib/ceph/mon/ceph-iz-ceph-v1-mon-01' does not exist: have you run 'mkfs'?
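
For reference, the sequence described in the linked documentation (extract the monmap, remove the missing monitors, inject the result) can be sketched as a dry run that only prints the commands. The surviving monitor ID is taken from the data directory path in the message above. The helper `print_recovery_steps` and the removed monitor IDs `mon-02` and `mon-03` are hypothetical placeholders, not the real names from this cluster.

```shell
#!/bin/sh
# Dry run: prints the documented commands instead of executing them.
MON_ID="iz-ceph-v1-mon-01"   # surviving monitor, from the error message
MONMAP="/tmp/monmap"         # same path as used with "ceph mon getmap"

print_recovery_steps() {
    # $1 = surviving monitor ID, $2 = monmap path, remaining args = monitor IDs to remove
    mon_id="$1"
    monmap="$2"
    shift 2
    echo "ceph-mon -i $mon_id --extract-monmap $monmap"
    for dead in "$@"; do
        echo "monmaptool $monmap --rm $dead"
    done
    echo "ceph-mon -i $mon_id --inject-monmap $monmap"
}

# mon-02 and mon-03 are hypothetical IDs for the two missing monitors.
print_recovery_steps "$MON_ID" "$MONMAP" mon-02 mon-03
```

On a containerized deployment these commands would presumably have to run where the monitor's data directory is actually visible, which may be why the plain host invocation reports a missing directory. That is a guess, not something confirmed here.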

Am I doing everything wrong, or is the documentation in need of an update? Could some of these problems also be the result of one or more bugs? I don't want to spam the bug tracker, so I am posting here first.

Thanks a lot!
