Bug #19350

mon: rebuild Monitor failure

Added by huanwen ren about 7 years ago. Updated about 7 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I used ceph-objectstore-tool and ceph-monstore-tool to rebuild the Monitor
in Ceph v10.2.5.

The procedure is similar to http://tracker.ceph.com/issues/17400 or
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/#recovery-using-healthy-monitor-s.

After completing all the rebuild steps,
I started the Monitors with systemctl start. The processes are all up,
but the cluster is not available.

In the mon log you can see the following error:

2014-11-04 01:04:36.484681 7fb6adc2c600  0 set uid:gid to 167:167 (ceph:ceph)
2014-11-04 01:04:36.484709 7fb6adc2c600  0 ceph version 10.2.5.1 (b6114251edc829a61de280acd1e41b87c91614fb), process ceph-mon, pid 10670
2014-11-04 01:04:36.484780 7fb6adc2c600  0 pidfile_write: ignore empty --pid-file
2014-11-04 01:04:36.758182 7fb6adc2c600 10 load: jerasure load: lrc load: isa
2014-11-04 01:04:36.772025 7fb6adc2c600  1 leveldb: Recovering log #48
2014-11-04 01:04:36.774560 7fb6adc2c600  1 leveldb: Level-0 table #50: started
2014-11-04 01:04:36.797257 7fb6adc2c600  1 leveldb: Level-0 table #50: 124 bytes OK
2014-11-04 01:04:36.819483 7fb6adc2c600  1 leveldb: Delete type=3 #46

2014-11-04 01:04:36.819548 7fb6adc2c600  1 leveldb: Delete type=0 #48

2014-11-04 01:04:36.844104 7fb6adc2c600 10 obtain_monmap
2014-11-04 01:04:36.844165 7fb6adc2c600 10 obtain_monmap found mkfs monmap
2014-11-04 01:04:36.844204 7fb6adc2c600  0 *mon.192.6.6.87* does not exist in monmap, will attempt to join an existing cluster
2014-11-04 01:04:36.844429 7fb6adc2c600  0 using public_addr 192.6.6.87:0/0 -> 192.6.6.87:6789/0
2014-11-04 01:04:36.846430 7fb6adc2c600  0 starting mon.192.6.6.87 rank -1 at 192.6.6.87:6789/0 mon_data /var/lib/ceph/mon/ceph-192.6.6.87 fsid 377ba100-3d43-d115-a5e0-bfc57248ffb2
2014-11-04 01:04:36.862851 7fb6adc2c600  1 mon.192.6.6.87@-1(probing) e0 preinit fsid 377ba100-3d43-d115-a5e0-bfc57248ffb2
2014-11-04 01:04:36.862928 7fb6adc2c600 10 mon.192.6.6.87@-1(probing) e0 check_fsid cluster_uuid contains '377ba100-3d43-d115-a5e0-bfc57248ffb2'
2014-11-04 01:04:36.862963 7fb6adc2c600 10 mon.192.6.6.87@-1(probing) e0 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
2014-11-04 01:04:36.862974 7fb6adc2c600 10 mon.192.6.6.87@-1(probing) e0 apply_compatset_features_to_quorum_requirements required_features 0
......
......
......
monmap is e0: 3 mons at {192.6.6.88=192.6.6.88:6789/0,noname-a=192.6.6.86:6789/0,*noname-b=192.6.6.87:6789/0*}

The monmap lists the local mon (ip=192.6.6.87) under the name noname-b
instead of mon.192.6.6.87, so the Monitor does not find itself in the monmap
and keeps probing with "not yet enough for a new quorum, waiting".

Proposed solution:
Call monmap.rename() when ceph-mon gets the mkfs monmap from the store.
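
The idea behind the proposed rename can be sketched as follows. This is only a simplified Python model, not the actual ceph-mon code or the change in the pull request: the monmap is represented as a plain dict of name to address, and the function name rename_local_mon is hypothetical. It assumes a placeholder entry is identified by the "noname-" prefix and matched by address.

```python
def rename_local_mon(monmap, local_name, local_addr):
    """Return a copy of monmap with a 'noname-*' placeholder renamed.

    monmap maps monitor name -> address string (a simplified stand-in
    for the real MonMap). The map is returned unchanged if local_name
    is already present or no placeholder entry has local_addr.
    """
    if local_name in monmap:
        return dict(monmap)
    renamed = {}
    for name, addr in monmap.items():
        # Only placeholder names are candidates for renaming.
        if addr == local_addr and name.startswith("noname-"):
            renamed[local_name] = addr
        else:
            renamed[name] = addr
    return renamed


# Monmap as reported in the log above.
monmap = {
    "192.6.6.88": "192.6.6.88:6789/0",
    "noname-a": "192.6.6.86:6789/0",
    "noname-b": "192.6.6.87:6789/0",
}
fixed = rename_local_mon(monmap, "192.6.6.87", "192.6.6.87:6789/0")
```

With the values from the log, the noname-b entry would become 192.6.6.87 while noname-a stays untouched, since it belongs to a different address.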

History

#1 Updated by huanwen ren about 7 years ago

The following pull request solves the problem:
https://github.com/ceph/ceph/pull/14081

#2 Updated by huanwen ren about 7 years ago

Adding mon_initial_members to ceph.conf solves this problem,
so I am closing it: https://github.com/ceph/ceph/pull/14081
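
For reference, a ceph.conf fragment along these lines might look like the sketch below. The fsid and the 192.6.6.87 and 192.6.6.88 names are taken from the log above; using 192.6.6.86 as the ID of the third Monitor is an assumption, since the log only shows it as noname-a.

```ini
[global]
fsid = 377ba100-3d43-d115-a5e0-bfc57248ffb2
# Monitor IDs; 192.6.6.86 as an ID is assumed, not confirmed by the log.
mon_initial_members = 192.6.6.86, 192.6.6.87, 192.6.6.88
mon_host = 192.6.6.86, 192.6.6.87, 192.6.6.88
```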

#3 Updated by Nathan Cutler about 7 years ago

  • Status changed from New to Closed

I guess this can be closed, then. Let me know if that was the wrong thing to do.
