Bug #5195

"ceph-deploy mon create" fails when adding additional monitors

Added by Robert Sander almost 6 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
Start date:
05/29/2013
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

When trying to add another monitor to an existing cluster with "ceph-deploy mon create <hostname>" the operation fails.

The logfile on the new machine contains:

mon.ceph05-test does not exist in monmap, will attempt to join an existing cluster
no public_addr or public_network specified, and mon.ceph05-test not present in monmap or ceph.conf

I had to add the new monitor to the local ceph.conf file, push that file to all cluster hosts with "ceph-deploy --overwrite-conf config push <host>", and issue "ceph mon add <host> <ip>" on one of the existing cluster monitors.

After that ceph-deploy was able to add a new monitor.

Please add these steps to ceph-deploy or the documentation.
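
The manual steps described above can be sketched as an ordered list of commands. This is only an illustration of the reporter's workaround, not something ceph-deploy does itself; the function name is hypothetical, the IP address and the extra hostname in the example are placeholders, and the "ceph mon add" step is meant to be run on an existing monitor rather than on the admin node.

```python
def workaround_commands(new_host, new_ip, cluster_hosts):
    """Return the manual workaround as argv lists, in the order reported.

    Hypothetical helper: it only builds the command lines, it does not run them.
    """
    commands = []
    # 1. Push the edited ceph.conf to every cluster host, one host per call.
    for host in cluster_hosts:
        commands.append(
            ["ceph-deploy", "--overwrite-conf", "config", "push", host]
        )
    # 2. Register the new monitor (to be run on an existing monitor).
    commands.append(["ceph", "mon", "add", new_host, new_ip])
    # 3. Retry the monitor creation that failed originally.
    commands.append(["ceph-deploy", "mon", "create", new_host])
    return commands


if __name__ == "__main__":
    # 192.0.2.5 and ceph01 are placeholders, not values from this report.
    for argv in workaround_commands(
        "ceph05-test", "192.0.2.5", ["ceph01", "ceph05-test"]
    ):
        print(" ".join(argv))
```

Building the commands as lists keeps each argument separate, so they could be passed to subprocess.run() without shell quoting concerns.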


Related issues

Related to Ceph - Bug #5205: mon: FAILED assert(ret == 0) on config's set_val_or_die() from pick_addresses() Resolved 05/30/2013

Associated revisions

Revision eb86eebe (diff)
Added by Sage Weil almost 6 years ago

common/pick_addresses: behave even after internal_safe_to_start_threads

ceph-mon recently started using Preforker to work around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.

Work around this by observing the change while we adjust the value. We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.

Fixes: #5195, #5205
Backport: cuttlefish
Signed-off-by: Sage Weil <>
Reviewed-by: Dan Mick <>

Revision 4d57c12f (diff)
Added by Sage Weil almost 6 years ago

common/pick_addresses: behave even after internal_safe_to_start_threads

ceph-mon recently started using Preforker to work around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.

Work around this by observing the change while we adjust the value. We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.

Fixes: #5195, #5205
Backport: cuttlefish
Signed-off-by: Sage Weil <>
Reviewed-by: Dan Mick <>
(cherry picked from commit eb86eebe1ba42f04b46f7c3e3419b83eb6fe7f9a)

Revision 5511daf3 (diff)
Added by Gary Lowell almost 6 years ago

doc: public network statement needed on new monitors.

When using ceph-deploy to create a new monitor on a host that is not
in the initial set of hosts defined by the ceph-deploy new command,
a "public network" statement needs to be added to the ceph.conf file.
Fixes #5195.

Signed-off-by: Gary Lowell <>

History

#1 Updated by Sage Weil almost 6 years ago

  • Priority changed from Normal to High

#2 Updated by Sage Weil almost 6 years ago

  • Priority changed from High to Urgent

#3 Updated by Sage Weil almost 6 years ago

  • Assignee set to Anonymous

#4 Updated by Anonymous almost 6 years ago

  • Status changed from New to In Progress

The problem occurs when a monitor is added on a host that was not in the initial list of cluster members.

Sequence of steps to reproduce:

./ceph-deploy new gary-centos-04
./ceph-deploy mon create gary-centos-04
./ceph-deploy gatherkeys

./ceph-deploy mon create gary-centos-03

Traceback (most recent call last):
File "./ceph-deploy", line 8, in <module>
load_entry_point('ceph-deploy==1.1', 'console_scripts', 'ceph-deploy')()
File "/home/ubuntu/ceph-deploy/ceph_deploy/cli.py", line 112, in main
return args.func(args)
File "/home/ubuntu/ceph-deploy/ceph_deploy/mon.py", line 236, in mon
mon_create(args)
File "/home/ubuntu/ceph-deploy/ceph_deploy/mon.py", line 140, in mon_create
init=init,
File "/home/ubuntu/ceph-deploy/virtualenv/lib/python2.6/site-packages/pushy-0.5.1-py2.6.egg/pushy/protocol/proxy.py", line 255, in <lambda>
(conn.operator(type_, self, args, kwargs))
File "/home/ubuntu/ceph-deploy/virtualenv/lib/python2.6/site-packages/pushy-0.5.1-py2.6.egg/pushy/protocol/connection.py", line 66, in operator
return self.send_request(type_, (object, args, kwargs))
File "/home/ubuntu/ceph-deploy/virtualenv/lib/python2.6/site-packages/pushy-0.5.1-py2.6.egg/pushy/protocol/baseconnection.py", line 323, in send_request
return self.__handle(m)
File "/home/ubuntu/ceph-deploy/virtualenv/lib/python2.6/site-packages/pushy-0.5.1-py2.6.egg/pushy/protocol/baseconnection.py", line 639, in __handle
raise e
ceph-mon: set fsid to 1e1ddb0d-bcf6-4cc0-861d-7728c625cdd7
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-gary-centos-03 for mon.gary-centos-03

Log file contains:

2013-06-22 00:59:57.914750 7ff4b743e780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-mon, pid 26437
2013-06-22 00:59:57.930214 7ff4b743e780 0 mon.gary-centos-03 does not exist in monmap, will attempt to join an existing cluster
2013-06-22 00:59:57.930590 7ff4b743e780 -1 no public_addr or public_network specified, and mon.gary-centos-03 not present in monmap or ceph.conf

/etc/ceph/ceph.conf contains:

[global]
filestore_xattr_use_omap = true
mon_host = 10.214.140.139
osd_journal_size = 1024
mon_initial_members = gary-centos-04
auth_supported = cephx
fsid = 1e1ddb0d-bcf6-4cc0-861d-7728c625cdd7

The error looks like it's coming from the ceph-mon daemon and not the init script.

Also, it looks like it is not sufficient to add the new host to the ceph.conf file on the target host; it needs to be added everywhere and any running monitors restarted. Doing further testing to verify.

#5 Updated by Sage Weil almost 6 years ago

oh, right. in this case i think the thing to do is add 'public network = 1.2.3.0/24' or whatever to the ceph.conf so the ceph-mon knows what ip to bind to. can you verify that fixes it? and then we need to update the documentation accordingly...
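
As a rough illustration of why the setting helps: once a public network is configured, the daemon can pick whichever local address falls inside that subnet. The sketch below is not Ceph's pick_addresses() implementation; the function name and the sample host addresses are hypothetical, and only the 1.2.3.0/24 subnet comes from the comment above.

```python
import ipaddress


def pick_public_address(local_addrs, public_network):
    """Return the first local address inside public_network, or None.

    Hypothetical helper that mirrors the idea of binding to the address
    that belongs to the configured public network.
    """
    network = ipaddress.ip_network(public_network)
    for addr in local_addrs:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


if __name__ == "__main__":
    # Sample addresses are made up for the example.
    print(pick_public_address(["127.0.0.1", "1.2.3.4"], "1.2.3.0/24"))
```

When no local address matches, the helper returns None, which corresponds to the situation where the monitor cannot tell which address to bind to.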

#6 Updated by Robert Sander almost 6 years ago

Sage Weil wrote:

oh, right. in this case i think the thing to do is add 'public network = 1.2.3.0/24' or whatever to the ceph.conf so the ceph-mon knows what ip to bind to. can you verify that fixes it?

My monitor hosts only have one interface.

#7 Updated by Sage Weil almost 6 years ago

Yeah, I think as things currently stand, though, the mon looks for that option to be defined. We can probably fix it in this case, but let's first confirm that setting public network resolves the problem.

#8 Updated by Anonymous almost 6 years ago

With the public_network statement added to ceph.conf it looks like it gets further but hits an assert. Stack trace appended below:

[root@gary-centos-03 ~]# ip route show
default via 10.214.128.1 dev eth0
10.214.128.0/20 dev eth0 proto kernel scope link src 10.214.140.138
169.254.0.0/16 dev eth0 scope link metric 1002

[root@gary-centos-03 ~]# cat /etc/ceph/ceph.conf
[global]
filestore_xattr_use_omap = true
mon_host = 10.214.140.139
osd_journal_size = 1024
public_network = 10.214.128.0/20
mon_initial_members = gary-centos-04
auth_supported = cephx
fsid = a7fbee26-0f0a-4580-a9bc-c98ea6a5e5fd

[root@gary-centos-03 ~]# /usr/bin/ceph-mon -d --debug_mon 10 -i gary-centos-03 --pid-file /var/run/ceph/mon.gary-centos-03.pid -c /etc/ceph/ceph.conf
2013-06-24 17:17:41.005558 7fc09694f780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-mon, pid 24161
2013-06-24 17:17:41.009942 7fc09694f780 10 needs_conversion
2013-06-24 17:17:41.016742 7fc09694f780 10 obtain_monmap
2013-06-24 17:17:41.017223 7fc09694f780 10 obtain_monmap found mkfs monmap
2013-06-24 17:17:41.017473 7fc09694f780 0 mon.gary-centos-03 does not exist in monmap, will attempt to join an existing cluster
common/config.cc: In function 'void md_config_t::set_val_or_die(const char*, const char*)' thread 7fc09694f780 time 2013-06-24 17:17:41.018007
common/config.cc: 621: FAILED assert(ret == 0)
ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x65f856]
2: /usr/bin/ceph-mon() [0x6690be]
3: (pick_addresses(CephContext*)+0x9b) [0x66949b]
4: (main()+0x389f) [0x476f7f]
5: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
6: /usr/bin/ceph-mon() [0x4725a9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-06-24 17:17:41.019620 7fc09694f780 -1 common/config.cc: In function 'void md_config_t::set_val_or_die(const char*, const char*)' thread 7fc09694f780 time 2013-06-24 17:17:41.018007
common/config.cc: 621: FAILED assert(ret == 0)

ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x65f856]
2: /usr/bin/ceph-mon() [0x6690be]
3: (pick_addresses(CephContext*)+0x9b) [0x66949b]
4: (main()+0x389f) [0x476f7f]
5: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
6: /usr/bin/ceph-mon() [0x4725a9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-23> 2013-06-24 17:17:41.003583 7fc09694f780 5 asok(0x20e0000) register_command perfcounters_dump hook 0x2090010
-22> 2013-06-24 17:17:41.003618 7fc09694f780 5 asok(0x20e0000) register_command 1 hook 0x2090010
-21> 2013-06-24 17:17:41.003623 7fc09694f780 5 asok(0x20e0000) register_command perf dump hook 0x2090010
-20> 2013-06-24 17:17:41.003634 7fc09694f780 5 asok(0x20e0000) register_command perfcounters_schema hook 0x2090010
-19> 2013-06-24 17:17:41.003639 7fc09694f780 5 asok(0x20e0000) register_command 2 hook 0x2090010
-18> 2013-06-24 17:17:41.003643 7fc09694f780 5 asok(0x20e0000) register_command perf schema hook 0x2090010
-17> 2013-06-24 17:17:41.003647 7fc09694f780 5 asok(0x20e0000) register_command config show hook 0x2090010
-16> 2013-06-24 17:17:41.003650 7fc09694f780 5 asok(0x20e0000) register_command config set hook 0x2090010
-15> 2013-06-24 17:17:41.003653 7fc09694f780 5 asok(0x20e0000) register_command log flush hook 0x2090010
-14> 2013-06-24 17:17:41.003656 7fc09694f780 5 asok(0x20e0000) register_command log dump hook 0x2090010
-13> 2013-06-24 17:17:41.003660 7fc09694f780 5 asok(0x20e0000) register_command log reopen hook 0x2090010
-12> 2013-06-24 17:17:41.005558 7fc09694f780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-mon, pid 24161
-11> 2013-06-24 17:17:41.009440 7fc09694f780 5 asok(0x20e0000) init /var/run/ceph/ceph-mon.gary-centos-03.asok
-10> 2013-06-24 17:17:41.009488 7fc09694f780 5 asok(0x20e0000) bind_and_listen /var/run/ceph/ceph-mon.gary-centos-03.asok
-9> 2013-06-24 17:17:41.009806 7fc09694f780 5 asok(0x20e0000) register_command 0 hook 0x20880b8
-8> 2013-06-24 17:17:41.009831 7fc09694f780 5 asok(0x20e0000) register_command version hook 0x20880b8
-7> 2013-06-24 17:17:41.009851 7fc09694f780 5 asok(0x20e0000) register_command git_version hook 0x20880b8
-6> 2013-06-24 17:17:41.009865 7fc09694f780 5 asok(0x20e0000) register_command help hook 0x20900d0
-5> 2013-06-24 17:17:41.009942 7fc09694f780 10 needs_conversion
-4> 2013-06-24 17:17:41.011576 7fc092a3d700 5 asok(0x20e0000) entry start
-3> 2013-06-24 17:17:41.016742 7fc09694f780 10 obtain_monmap
-2> 2013-06-24 17:17:41.017223 7fc09694f780 10 obtain_monmap found mkfs monmap
-1> 2013-06-24 17:17:41.017473 7fc09694f780 0 mon.gary-centos-03 does not exist in monmap, will attempt to join an existing cluster
0> 2013-06-24 17:17:41.019620 7fc09694f780 -1 common/config.cc: In function 'void md_config_t::set_val_or_die(const char*, const char*)' thread 7fc09694f780 time 2013-06-24 17:17:41.018007
common/config.cc: 621: FAILED assert(ret == 0)

ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x65f856]
2: /usr/bin/ceph-mon() [0x6690be]
3: (pick_addresses(CephContext*)+0x9b) [0x66949b]
4: (main()+0x389f) [0x476f7f]
5: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
6: /usr/bin/ceph-mon() [0x4725a9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
10/10 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 10000
max_new 1000
log_file
--- end dump of recent events ---
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7fc09694f780
ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x58ddd9]
2: (()+0xf500) [0x7fc095fb7500]
3: (gsignal()+0x35) [0x7fc094bda8a5]
4: (abort()+0x175) [0x7fc094bdc085]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fc095492a5d]
6: (()+0xbcbe6) [0x7fc095490be6]
7: (()+0xbcc13) [0x7fc095490c13]
8: (()+0xbcd0e) [0x7fc095490d0e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x622049]
10: /usr/bin/ceph-mon() [0x65f856]
11: /usr/bin/ceph-mon() [0x6690be]
12: (pick_addresses(CephContext*)+0x9b) [0x66949b]
13: (main()+0x389f) [0x476f7f]
14: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
15: /usr/bin/ceph-mon() [0x4725a9]
2013-06-24 17:17:41.029535 7fc09694f780 -1 *** Caught signal (Aborted) **
in thread 7fc09694f780
ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x58ddd9]
2: (()+0xf500) [0x7fc095fb7500]
3: (gsignal()+0x35) [0x7fc094bda8a5]
4: (abort()+0x175) [0x7fc094bdc085]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fc095492a5d]
6: (()+0xbcbe6) [0x7fc095490be6]
7: (()+0xbcc13) [0x7fc095490c13]
8: (()+0xbcd0e) [0x7fc095490d0e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x622049]
10: /usr/bin/ceph-mon() [0x65f856]
11: /usr/bin/ceph-mon() [0x6690be]
12: (pick_addresses(CephContext*)+0x9b) [0x66949b]
13: (main()+0x389f) [0x476f7f]
14: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
15: /usr/bin/ceph-mon() [0x4725a9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2013-06-24 17:17:41.029535 7fc09694f780 -1 *** Caught signal (Aborted) **
in thread 7fc09694f780

ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x58ddd9]
2: (()+0xf500) [0x7fc095fb7500]
3: (gsignal()+0x35) [0x7fc094bda8a5]
4: (abort()+0x175) [0x7fc094bdc085]
5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fc095492a5d]
6: (()+0xbcbe6) [0x7fc095490be6]
7: (()+0xbcc13) [0x7fc095490c13]
8: (()+0xbcd0e) [0x7fc095490d0e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x622049]
10: /usr/bin/ceph-mon() [0x65f856]
11: /usr/bin/ceph-mon() [0x6690be]
12: (pick_addresses(CephContext*)+0x9b) [0x66949b]
13: (main()+0x389f) [0x476f7f]
14: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
15: /usr/bin/ceph-mon() [0x4725a9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
10/10 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
2/-2 (syslog threshold)
99/99 (stderr threshold)
max_recent 10000
max_new 1000
log_file
--- end dump of recent events ---
Aborted

#9 Updated by Joao Eduardo Luis almost 6 years ago

Gary Lowell wrote:

With the public_network statement added to ceph.conf it looks like it gets further but hits an assert. Stack trace appended below:

[root@gary-centos-03 ~]# ip route show
default via 10.214.128.1 dev eth0
10.214.128.0/20 dev eth0 proto kernel scope link src 10.214.140.138
169.254.0.0/16 dev eth0 scope link metric 1002

[root@gary-centos-03 ~]# cat /etc/ceph/ceph.conf
[global]
filestore_xattr_use_omap = true
mon_host = 10.214.140.139
osd_journal_size = 1024
public_network = 10.214.128.0/20
mon_initial_members = gary-centos-04
auth_supported = cephx
fsid = a7fbee26-0f0a-4580-a9bc-c98ea6a5e5fd

[root@gary-centos-03 ~]# /usr/bin/ceph-mon -d --debug_mon 10 -i gary-centos-03 --pid-file /var/run/ceph/mon.gary-centos-03.pid -c /etc/ceph/ceph.conf
2013-06-24 17:17:41.005558 7fc09694f780 0 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-mon, pid 24161
2013-06-24 17:17:41.009942 7fc09694f780 10 needs_conversion
2013-06-24 17:17:41.016742 7fc09694f780 10 obtain_monmap
2013-06-24 17:17:41.017223 7fc09694f780 10 obtain_monmap found mkfs monmap
2013-06-24 17:17:41.017473 7fc09694f780 0 mon.gary-centos-03 does not exist in monmap, will attempt to join an existing cluster
common/config.cc: In function 'void md_config_t::set_val_or_die(const char*, const char*)' thread 7fc09694f780 time 2013-06-24 17:17:41.018007
common/config.cc: 621: FAILED assert(ret == 0)
ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
1: /usr/bin/ceph-mon() [0x65f856]
2: /usr/bin/ceph-mon() [0x6690be]
3: (pick_addresses(CephContext*)+0x9b) [0x66949b]
4: (main()+0x389f) [0x476f7f]
5: (__libc_start_main()+0xfd) [0x7fc094bc6cdd]
6: /usr/bin/ceph-mon() [0x4725a9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2013-06-24 17:17:41.019620 7fc09694f780 -1 common/config.cc: In function 'void md_config_t::set_val_or_die(const char*, const char*)' thread 7fc09694f780 time 2013-06-24 17:17:41.018007

#5205

#10 Updated by Sage Weil almost 6 years ago

  • Project changed from devops to Ceph
  • Category deleted (ceph-deploy)

#11 Updated by Sage Weil almost 6 years ago

  • Status changed from In Progress to Need Review

#12 Updated by Anonymous almost 6 years ago

I ran a test with the wip-5195 branch. That fixed the issue with the assert.

There now seems to be an authentication problem. When the monitor is created via ceph-deploy, the ceph-mon daemon complains in the log about failing to verify the authorize reply. The ceph-create-keys process hangs waiting to communicate with ceph-mon. I suspect there is some bit of configuration setup that I'm missing.

Running ceph-mon with debug gives the following:

[root@gary-centos-03 ceph-deploy]# /usr/bin/ceph-mon -d --debug_mon 10 -i gary-centos-03 --pid-file /var/run/ceph/mon.gary-centos-03.pid -c /etc/ceph/ceph.conf
2013-06-24 20:39:26.880442 7f4fa19d8780 0 ceph version 0.64-464-ga961ac9 (a961ac98168ee87f17793212a62d47f4d22905fc), process ceph-mon, pid 25286
2013-06-24 20:39:26.888820 7f4fa19d8780 10 needs_conversion
2013-06-24 20:39:26.899311 7f4fa19d8780 10 obtain_monmap
2013-06-24 20:39:26.899900 7f4fa19d8780 10 obtain_monmap found mkfs monmap
2013-06-24 20:39:26.900307 7f4fa19d8780 0 mon.gary-centos-03 does not exist in monmap, will attempt to join an existing cluster
starting mon.gary-centos-03 rank -1 at 10.214.140.138:6789/0 mon_data /var/lib/ceph/mon/ceph-gary-centos-03 fsid a7fbee26-0f0a-4580-a9bc-c98ea6a5e5fd
2013-06-24 20:39:26.901481 7f4fa19d8780 1 mon.gary-centos-03@-1(probing) e0 preinit fsid a7fbee26-0f0a-4580-a9bc-c98ea6a5e5fd
2013-06-24 20:39:26.901752 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 check_fsid cluster_uuid contains 'a7fbee26-0f0a-4580-a9bc-c98ea6a5e5fd'
2013-06-24 20:39:26.902030 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?)}
2013-06-24 20:39:26.902242 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 has_ever_joined = 0
2013-06-24 20:39:26.902428 7f4fa19d8780 1 mon.gary-centos-03@-1(probing) e0 initial_members gary-centos-04, filtering seed monmap
2013-06-24 20:39:26.902670 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 monmap is e0: 1 mons at {gary-centos-04=0.0.0.0:0/1}
2013-06-24 20:39:26.902855 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 extra probe peers 10.214.140.139:6789/0
2013-06-24 20:39:26.903113 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 init_paxos
2013-06-24 20:39:26.903351 7f4fa19d8780 10 mon.gary-centos-03@-1(probing).log v0 update_from_paxos
2013-06-24 20:39:26.903558 7f4fa19d8780 10 mon.gary-centos-03@-1(probing).log v0 update_from_paxos version 0 summary v 0
2013-06-24 20:39:26.903759 7f4fa19d8780 10 mon.gary-centos-03@-1(probing).auth v0 update_from_paxos
2013-06-24 20:39:26.903999 7f4fa19d8780 10 mon.gary-centos-03@-1(probing).health(0) init
2013-06-24 20:39:26.904198 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 loading initial keyring to bootstrap authentication for mkfs
2013-06-24 20:39:26.904986 7f4fa19d8780 2 mon.gary-centos-03@-1(probing) e0 init
2013-06-24 20:39:26.908220 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 bootstrap
2013-06-24 20:39:26.908477 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 unregister_cluster_logger - not registered
2013-06-24 20:39:26.908643 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 cancel_probe_timeout (none scheduled)
2013-06-24 20:39:26.908816 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 reset_sync
2013-06-24 20:39:26.909021 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 reset
2013-06-24 20:39:26.909211 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 timecheck_finish
2013-06-24 20:39:26.909405 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 cancel_probe_timeout (none scheduled)
2013-06-24 20:39:26.909582 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 reset_probe_timeout 0x3460180 after 2 seconds
2013-06-24 20:39:26.909805 7f4fa19d8780 10 mon.gary-centos-03@-1(probing) e0 probing other monitors
2013-06-24 20:39:26.911062 7f4fa19be700 0 -- 10.214.140.138:6789/0 >> 0.0.0.0:0/1 pipe(0x35b8280 sd=20 :0 s=1 pgs=0 cs=0 l=0).fault
2013-06-24 20:39:26.912913 7f4f9a8ac700 10 mon.gary-centos-03@-1(probing) e0 ms_get_authorizer for mon
2013-06-24 20:39:26.915341 7f4f9a8ac700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2013-06-24 20:39:26.915609 7f4f9a8ac700 0 -- 10.214.140.138:6789/0 >> 10.214.140.139:6789/0 pipe(0x35b8a00 sd=19 :49751 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-06-24 20:39:26.915989 7f4f9a8ac700 0 -- 10.214.140.138:6789/0 >> 10.214.140.139:6789/0 pipe(0x35b8a00 sd=19 :49751 s=1 pgs=0 cs=0 l=0).fault
2013-06-24 20:39:26.921804 7f4f9a8ac700 10 mon.gary-centos-03@-1(probing) e0 ms_get_authorizer for mon
2013-06-24 20:39:26.929790 7f4f9a8ac700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
2013-06-24 20:39:26.930345 7f4f9a8ac700 0 -- 10.214.140.138:6789/0 >> 10.214.140.139:6789/0 pipe(0x35b8a00 sd=19 :49754 s=1 pgs=0 cs=0 l=0).failed verifying authorize reply
2013-06-24 20:39:27.149758 7f4f9a8ac700 10 mon.gary-centos-03@-1(probing) e0 ms_get_authorizer for mon
2013-06-24 20:39:27.157459 7f4f9a8ac700 0 cephx: verify_reply couldn't decrypt with error: error decoding block for decryption

#13 Updated by Anonymous almost 6 years ago

After clean-up and re-installation of wip-5195 it worked.

2013-06-24 19:13:52.853138 7f3f92b9e700 0 log [INF] : monmap e2: 2 mons at {gary-centos-03=10.214.140.138:6789/0,gary-centos-04=10.214.140.139:6789/0}

Adding a monitor on a host that was not listed in the original "ceph-deploy new" command just requires a public network statement in the ceph.conf of the node hosting the new monitor.

Is this something that should just be added to the documentation?
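
A pre-flight check along the following lines could catch a missing statement before ceph-deploy is run. It is a minimal sketch, assuming the ceph.conf is simple enough for Python's configparser to read and that the setting lives in the [global] section; the helper name is hypothetical and not part of ceph-deploy. Spaces and underscores in option names are treated alike because both spellings ("public network" and "public_network") appear in this thread.

```python
import configparser

# Options that tell a new monitor which address to bind to.
ADDRESS_KEYS = {"public_addr", "public_network"}


def has_public_address_setting(conf_text, section="global"):
    """Return True if the section defines public_addr or public_network.

    Hypothetical helper; it only inspects one section of the given text.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return False
    # configparser lowercases keys; fold spaces into underscores as well.
    keys = {key.replace(" ", "_") for key in parser.options(section)}
    return bool(keys & ADDRESS_KEYS)


if __name__ == "__main__":
    sample = "[global]\nmon_host = 10.214.140.139\n"
    print(has_public_address_setting(sample))
```

A check like this says nothing about whether the subnet is the right one for the host; it only reports whether one of the two options is present.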

#14 Updated by Sage Weil almost 6 years ago

yeah, let's update the docs ("the new monitor needs to know what address to bind to") and close this bug. Yay!

#15 Updated by Anonymous almost 6 years ago

  • Status changed from Need Review to Resolved

Updated documentation to add a note about needing the public network statement.

#16 Updated by Bobby Yakov almost 5 years ago

Still having this issue with firefly. Is it possible it was re-introduced?
See SUPPORT #8861, just opened.

#17 Updated by Matthew Rees over 4 years ago

Same here. I have run through the latest quick start documentation and am using Ubuntu 14.04.1 and Ceph firefly with ceph-deploy 1.5.17. Here are the applicable logs (I have only included those for ceph-node-2 as those for ceph-node-3 are duplicates):

ceph-deploy mon create ceph-node-2 ceph-node-3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.17): /usr/bin/ceph-deploy mon create ceph-node-2 ceph-node-3
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-node-2 ceph-node-3
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph-node-2 ...
[ceph-node-2][DEBUG ] connection detected need for sudo
[ceph-node-2][DEBUG ] connected to host: ceph-node-2
[ceph-node-2][DEBUG ] detect platform information from remote host
[ceph-node-2][DEBUG ] detect machine type
[ceph_deploy.mon][INFO ] distro info: Ubuntu 14.04 trusty
[ceph-node-2][DEBUG ] determining if provided host has same hostname in remote
[ceph-node-2][DEBUG ] get remote short hostname
[ceph-node-2][DEBUG ] deploying mon to ceph-node-2
[ceph-node-2][DEBUG ] get remote short hostname
[ceph-node-2][DEBUG ] remote hostname: ceph-node-2
[ceph-node-2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-node-2][DEBUG ] create the mon path if it does not exist
[ceph-node-2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph-node-2/done
[ceph-node-2][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph-node-2/done
[ceph-node-2][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph-node-2.mon.keyring
[ceph-node-2][DEBUG ] create the monitor keyring file
[ceph-node-2][INFO ] Running command: sudo ceph-mon --cluster ceph --mkfs -i ceph-node-2 --keyring /var/lib/ceph/tmp/ceph-ceph-node-2.mon.keyring
[ceph-node-2][DEBUG ] ceph-mon: set fsid to b6c3d00f-513e-40e5-9389-71481dc323e9
[ceph-node-2][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-ceph-node-2 for mon.ceph-node-2
[ceph-node-2][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-ceph-node-2.mon.keyring
[ceph-node-2][DEBUG ] create a done file to avoid re-doing the mon deployment
[ceph-node-2][DEBUG ] create the init path if it does not exist
[ceph-node-2][DEBUG ] locating the `service` executable...
[ceph-node-2][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=ceph-node-2
[ceph-node-2][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node-2.asok mon_status
[ceph-node-2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph-node-2][WARNIN] monitor: mon.ceph-node-2, might not be running yet
[ceph-node-2][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node-2.asok mon_status
[ceph-node-2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph-node-2][WARNIN] ceph-node-2 is not defined in `mon initial members`
[ceph-node-2][WARNIN] monitor ceph-node-2 does not exist in monmap
[ceph-node-2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[ceph-node-2][WARNIN] monitors may not be able to form quorum

The issue seems to come down to needing a declaration of public_network in your ceph.conf when adding new monitors to your cluster, even if your nodes only have a single network interface.

The only applicable entry in the (quick start) documentation that I can find is at http://ceph.com/docs/master/start/quick-ceph-deploy/#create-a-cluster (item number 3).

It seems to imply that adding public_network is only needed if you have more than one network interface, and the wording should probably be changed to require the entry regardless of the number of network interfaces.

My solution was to add public_network to my ceph.conf on my admin ceph-deploy node and then issue the following command: ceph-deploy --overwrite-conf mon create ceph-node-2 ceph-node-3

I hope this helps. I will also post this in bug #8861
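
The first half of that fix, adding public_network to the admin node's ceph.conf, could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the file is assumed to be readable by Python's configparser, and rewriting the file this way drops comments and normalizes spacing, so it is only suitable for simple generated configs.

```python
import configparser
import io
import ipaddress


def ensure_public_network(conf_text, cidr):
    """Return conf_text with public_network set in [global] if it was missing.

    Hypothetical helper; an existing "public network" or "public_network"
    entry is left untouched.
    """
    ipaddress.ip_network(cidr)  # raises ValueError for a malformed subnet
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        parser.add_section("global")
    # Treat "public network" and "public_network" as the same option.
    keys = {key.replace(" ", "_") for key in parser.options("global")}
    if "public_network" not in keys:
        parser.set("global", "public_network", cidr)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # 192.0.2.0/24 is a placeholder subnet, not one from this thread.
    print(ensure_public_network("[global]\nauth_supported = cephx\n", "192.0.2.0/24"))
```

After the file is updated, the ceph-deploy command quoted above (with --overwrite-conf) would still need to be run so the new configuration reaches the monitor hosts.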

#18 Updated by Joe Quint about 4 years ago

I resolved the issue by adding the following to the ceph.conf file
public network = 10.0.2.0/24

I concur that the quick start guide should be updated. It implies that you don't have to do that step.

The error that I received was...
[ceph_deploy.mon][WARNING] waiting 10 seconds before retrying
[cephallinone][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.cephallinone.asok mon_status
[cephallinone][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph_deploy.mon][WARNING] mon.cephallinone monitor is not yet in quorum, tries left: 3
[ceph_deploy.mon][WARNING] waiting 10 seconds before retrying
[cephallinone][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.cephallinone.asok mon_status
[cephallinone][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph_deploy.mon][WARNING] mon.cephallinone monitor is not yet in quorum, tries left: 2
[ceph_deploy.mon][WARNING] waiting 15 seconds before retrying
[cephallinone][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.cephallinone.asok mon_status
[cephallinone][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph_deploy.mon][WARNING] mon.cephallinone monitor is not yet in quorum, tries left: 1
