Bug #12442

closed

how to use the command “ceph ping mon.ID”?

Added by huanwen ren almost 9 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No

Description

1. I am following this Ceph document:
Documentation location:
http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/

Chapter:
Initial Troubleshooting

2. Question
I ran the command “ceph ping mon.ID” in my cluster environment,
but it fails with "Error connecting to cluster: ObjectNotFound".
Note: a screenshot is attached.

Thank you for your attention!

Actions #1

Updated by huang jun almost 9 years ago

I added some debug output to the ping_monitor function. Oddly,
"ceph mon stat" shows:
e1: 3 mons at {a=192.168.0.253:6789/0,b=192.168.0.253:6790/0,c=192.168.0.253:6791/0}, election epoch 18, quorum 0,1,2 a,b,c

but when we print monmap.mon_addr inside ping_monitor in MonClient.cc, it shows:
ping_monitor no such monitor 'mon.a' {noname-a=192.168.1.205:6789/0}
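
The mismatch between the two outputs above can be illustrated with a small sketch. It is not Ceph code: placeholder_monmap is a hypothetical helper that only mimics the "noname-a" style of placeholder naming seen in the debug line, and the quorum names are taken from the "ceph mon stat" output.

```python
import string


def placeholder_monmap(addresses):
    """Hypothetical helper: name each address noname-a, noname-b, ...

    This only mimics the placeholder naming visible in the debug output;
    it is not the real map-building code.
    """
    names = (f"noname-{letter}" for letter in string.ascii_lowercase)
    return dict(zip(names, addresses))


# The single address printed by the debug line in ping_monitor.
initial = placeholder_monmap(["192.168.1.205:6789/0"])

# Monitor names reported by "ceph mon stat".
quorum_names = ["a", "b", "c"]

# None of the real names is a key in the placeholder map, so a lookup
# by name cannot succeed.
missing = [name for name in quorum_names if name not in initial]

print(initial)
print(missing)
```

Under these assumptions every real monitor name is missing from the placeholder map, which matches the "no such monitor 'mon.a'" message.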

Actions #2

Updated by huang jun almost 9 years ago

This PR should resolve your problem:
https://github.com/ceph/ceph/pull/5343
but I don't know whether it is the best way to resolve it.

Actions #3

Updated by huanwen ren almost 9 years ago

@huang jun
thank you

Actions #4

Updated by Kefu Chai over 8 years ago

huanwen ren,

could you paste the plain text instead of a picture?

huang jun,

i am not sure that huanwen's command ever reached ping_monitor(); it errored out when the rados client was trying to connect to the cluster. see src/ceph.in:684

sorry, i was wrong. it was indeed handled by the ping_monitor().

Actions #5

Updated by huanwen ren over 8 years ago

Kefu Chai wrote:

huanwen ren,

could you paste the plain text instead of a picture?

huang jun,

i am not sure that huanwen's command ever reached ping_monitor(); it errored out when the rados client was trying to connect to the cluster. see src/ceph.in:684

sorry, i was wrong. it was indeed handled by the ping_monitor().

@Kefu Chai
Here is the text:

[root@node173 ~]# ceph -s
    cluster bda1accf-1104-4ccd-9aba-d96ff180e304
     health HEALTH_WARN clock skew detected on mon.node181; 320 pgs down; 320 pgs peering; 320 pgs stuck inactive; 320 pgs stuck unclean; mon.node173 low disk space -- 15% avail
     monmap e1: 2 mons at {node173=10.118.202.173:6789/0,node181=10.118.202.181:6789/0}, election epoch 42, quorum 0,1 node173,node181
     osdmap e170: 6 osds: 6 up, 6 in
      pgmap v6420: 320 pgs, 3 pools, 0 bytes data, 0 objects
            293 GB used, 78835 MB / 370 GB avail
                 320 down+peering
[root@node173 ~]# ceph ping mon.node181
Error connecting to cluster: ObjectNotFound
[root@node173 ~]#
Actions #6

Updated by Kefu Chai over 8 years ago

huanwen ren,

i assume that your ceph.conf either

- has "mon host" defined, or
- does not have a section for "mon.node181" at all

thus, the monclient failed to identify the address of "mon.node181" after it was up. please note that "ceph ping mon" tries to connect to the monitor directly, using the known addresses collected from ceph.conf.

could you post your ceph.conf as well?

Actions #7

Updated by Kefu Chai over 8 years ago

  • File deleted (errors2.jpg)
Actions #8

Updated by huanwen ren over 8 years ago

The following is my ceph.conf file:

[root@node173 ceph]# vi /etc/ceph/ceph.conf 

[global]
fsid = bda1accf-1104-4ccd-9aba-d96ff180e304
mon_initial_members = node181, node173
mon_host = 10.118.202.181,10.118.202.173
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
debug_mon = 10/10
debug_ms = 10/10
debug_monc = 10/10

Running "ceph ping node181" also fails:

[root@node173 ceph]# ceph ping node181
"ping" expects a monitor to ping; try "ping mon.<id>" 
[root@node173 ceph]# 
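
The assumption from comment #6 can be checked against the ceph.conf posted above with a short script. This is a minimal sketch, not part of Ceph: monitor_addr_from_conf is a hypothetical helper, it uses Python's configparser rather than Ceph's own config parser, and it only looks for a mon_addr option inside a [mon.ID] section.

```python
import configparser

# The configuration posted above, reproduced as a string.
POSTED_CONF = """\
[global]
fsid = bda1accf-1104-4ccd-9aba-d96ff180e304
mon_initial_members = node181, node173
mon_host = 10.118.202.181,10.118.202.173
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
debug_mon = 10/10
debug_ms = 10/10
debug_monc = 10/10
"""


def monitor_addr_from_conf(conf_text, mon_id):
    """Return mon_addr from the [mon.<mon_id>] section, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    section = f"mon.{mon_id}"
    if not parser.has_section(section):
        return None
    for key, value in parser.items(section):
        # Accept both "mon addr" and "mon_addr" spellings.
        if key.replace(" ", "_") == "mon_addr":
            return value
    return None


# No [mon.node181] section exists, so no name-to-address mapping is found.
print(monitor_addr_from_conf(POSTED_CONF, "node181"))

# With a per-monitor section added (an illustrative address and port),
# the name can be resolved from the file alone.
with_section = POSTED_CONF + "\n[mon.node181]\nmon_addr = 10.118.202.181:6789\n"
print(monitor_addr_from_conf(with_section, "node181"))
```

The first lookup finds nothing because the file names the monitors only through mon_initial_members and lists their addresses separately in mon_host.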

Actions #9

Updated by Kefu Chai over 8 years ago

  • Tracker changed from Support to Bug
  • Status changed from New to 12
  • Regression set to No
Actions #10

Updated by Kefu Chai over 8 years ago

  • Status changed from 12 to Fix Under Review
Actions #11

Updated by Kefu Chai over 8 years ago

huanwen ren wrote:

Running "ceph ping node181" also fails:
[...]

this is expected.

Actions #12

Updated by huanwen ren over 8 years ago

ok

Actions #13

Updated by Joao Eduardo Luis over 8 years ago

my comment on the pull request, for future reference.

I think the patch fixes the appropriate issue.

The 'noname-a' spit out by MonMap::build_initial() is a placeholder for the monitor name when we don't really have said names. The ceph_mon does populate the map somewhat appropriately using 'mon initial members' and by knowing its own id. We don't really get to do that on the clients, because then we would have to force the ordering of the addresses and the 'members' to match 1:1, and I don't think it actually solves the issue.

The real issue here is that 'ping' (via MonClient::ping_monitor()) short-circuits the client's typical discovery mechanisms. The client will usually authenticate with a monitor from the initially built monmap without regard for the monitor name (it only cares about the IP address, given that any monitor will do), and it will then get a full monmap from the monitors.

MonClient::ping_monitor() is, by design, skipping auth. This is because auth only happens when there's a quorum, and the ping is meant to let the client know whether a given monitor is alive or not, even in the absence of quorum. Again, this means skipping authentication. No authentication, no brand new monmap. And the bug here lies with assuming a fully populated monmap exists.

This would easily be fixed by monitor-specific sections in the conf, but that's really just a workaround; not a fix.

I agree with @hjwsm1989's fix. I think the patch could be a little bit better, but I will comment on that in the patch instead. I believe the solution is sound, and it doesn't touch any other portion of the monclient code. MonClient::ping_monitor() is an exception, and fixing it like this, exceptionally, is okay.
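
The difference between the two paths described above can be sketched as a toy model. It is not MonClient code: build_initial_monmap, regular_connect and ping are hypothetical stand-ins, the placeholder naming only imitates the 'noname-a' pattern, and the addresses come from the "ceph -s" output and ceph.conf posted earlier in this thread.

```python
import string


class MonitorNotFound(Exception):
    """Raised when a monitor name is absent from the map being searched."""


def build_initial_monmap(mon_host):
    """Hypothetical: build a map from addresses only, with placeholder names."""
    addrs = [a.strip() for a in mon_host.split(",") if a.strip()]
    names = (f"noname-{c}" for c in string.ascii_lowercase)
    return dict(zip(names, addrs))


def regular_connect(initial_monmap, cluster_monmap):
    """Model of the usual path: any address will do, and after
    authenticating the client receives the full, properly named monmap."""
    if not initial_monmap:
        raise MonitorNotFound("no monitor addresses configured")
    return dict(cluster_monmap)


def ping(monmap, mon_id):
    """Model of the ping path: look the monitor up by name in whatever
    map is at hand, with no authentication and no map refresh."""
    if mon_id not in monmap:
        raise MonitorNotFound(f"no such monitor 'mon.{mon_id}'")
    return monmap[mon_id]


# Names and addresses as reported by "ceph -s" earlier in this thread.
CLUSTER_MONMAP = {
    "node173": "10.118.202.173:6789",
    "node181": "10.118.202.181:6789",
}
initial = build_initial_monmap("10.118.202.181,10.118.202.173")

# Ping path: only placeholder names are known, so the lookup fails.
try:
    ping(initial, "node181")
    ping_result = "found"
except MonitorNotFound as exc:
    ping_result = str(exc)

# Regular path: the full map makes the same name resolvable.
full = regular_connect(initial, CLUSTER_MONMAP)

print(ping_result)
print(ping(full, "node181"))
```

In this model the lookup by name fails only on the ping path, because that path never replaces the placeholder map with the full one.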

Actions #14

Updated by Phat Le Ton about 8 years ago

I had the same issue and used this workaround to fix it:
1. In the [global] section,
remove the mon_host line.

2. Create a [mon] section and
add the mon_host and mon_addr lists.

3. Create a [mon.id] section for each monitor.

This is my example:
[mon]
mon_host = Storage002, Storage003, Storage004
mon_addr = xxx.xxx.xxx.2, xxx.xxx.xxx.3, xxx.xxx.xxx.4
; You may have more options in this section
...

[mon.Storage002]
host = Storage002
mon_addr = xxx.xxx.xxx.2

[mon.Storage003]
host = Storage003
mon_addr = xxx.xxx.xxx.3

[mon.Storage004]
host = Storage004
mon_addr = xxx.xxx.xxx.4

Copy the file to all Ceph nodes and restart all the mon services.
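
For clusters with many monitors, the per-monitor sections in step 3 could be generated rather than typed by hand. This is only a sketch: render_mon_sections is a hypothetical helper, it assumes the host name equals the monitor id as in the example above, and the addresses are the same redacted placeholders.

```python
def render_mon_sections(monitors):
    """Render one [mon.<id>] section per monitor.

    `monitors` maps a monitor id to its address. The host name is assumed
    to be the same as the monitor id, as in the example above.
    """
    blocks = []
    for mon_id, addr in monitors.items():
        blocks.append(f"[mon.{mon_id}]\nhost = {mon_id}\nmon_addr = {addr}\n")
    return "\n".join(blocks)


# Placeholder addresses, kept redacted as in the original example.
sections = render_mon_sections({
    "Storage002": "xxx.xxx.xxx.2",
    "Storage003": "xxx.xxx.xxx.3",
    "Storage004": "xxx.xxx.xxx.4",
})
print(sections)
```

The output can be pasted below the [mon] section of ceph.conf after replacing the placeholders with real addresses.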

Actions #15

Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to Resolved