Bug #3647 (closed): forgot the auth options for Cephx and added them later: Get msg: 7ff9faaad700 monclient: hunting for new mon

Added by Anonymous over 11 years ago. Updated almost 8 years ago.

Status: Can't reproduce
Priority: Normal
Category: Monitor
Target version: -
% Done: 0%
Source: Development

Description

Seeing errors when setting up Ceph from scratch with the options in the ceph.conf file. I forgot the auth options for Cephx, added them later, and then did a restart.
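For reference, the cephx options in question normally live under a [global] header in ceph.conf. A minimal sketch (the values mirror the auth lines pasted in the chat below; the actual layout of this cluster's file may differ):

[global]
    # use "cephx" instead of "none" to enable authentication
    auth cluster required = none
    auth service required = none
    auth client required = none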

(04:47:39 PM) pat.beadles@newdream.net/18754661421353962283138380: here are the errors: 2012-12-19 00:37:18.366578 7ff9faaad700 monclient: hunting for new mon
2012-12-19 00:37:18.377881 7ff9faaad700 monclient: hunting for new mon
2012-12-19 00:37:18.405659 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(3 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.405864 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(4 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.405979 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(5 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.406078 7ff9faaad700 monclient: hunting for new mon

(04:47:52 PM) pat.beadles@newdream.net/18754661421353962283138380: on ceph1
(04:48:27 PM) pat.beadles@newdream.net/18754661421353962283138380:   the errors on the other 2 nodes are: 2012-12-19 00:37:18.508030 7fdee4e67700  0 can't decode unknown message type 54 MSG_AUTH=17
(04:48:36 PM) pat.beadles@newdream.net/18754661421353962283138380: have you seen this?
(04:48:41 PM) tamil.muthamizhan@newdream.net: please paste ceph.conf file
(04:48:54 PM) pat.beadles@newdream.net/18754661421353962283138380: this happens when I do "ceph -s" 
(04:49:18 PM) tamil.muthamizhan@newdream.net: interesting
(04:49:23 PM) tamil.muthamizhan@newdream.net: i have not seen this error
(04:49:42 PM) pat.beadles@newdream.net/18754661421353962283138380: auth cluster required = none
auth service required = none
auth client required = none

[osd]
    osd journal size = 1000
    filestore xattr use omap = true

    # Execute $ hostname to retrieve the name of your host,
    # and replace {hostname} with the name of your host.
    # For the monitor, replace {ip-address} with the IP
    # address of your host.

[mon.a]

    host = ceph1
    mon addr = 10.1.10.21:6789

[mon.b]

    host = ceph2
    mon addr = 10.1.10.26:6789

[mon.c]

    host = ceph3
    mon addr = 10.1.10.22:6789

[osd.0]
    host = ceph1
    osd data = /var/lib/ceph/osd/ceph-0
    devs = /dev/sdc1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr" 

[osd.1]
    host = ceph1
    osd data = /var/lib/ceph/osd/ceph-1
    devs = /dev/sdd1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr"

[osd.2]
    host = ceph2
    osd data = /var/lib/ceph/osd/ceph-2
    devs = /dev/sdc1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr"

[osd.3]
    host = ceph2
    osd data = /var/lib/ceph/osd/ceph-3
    devs = /dev/sdd1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr"

[osd.4]
    host = ceph3
    osd data = /var/lib/ceph/osd/ceph-4
    #devs = /dev/sdc1
    #osd mkfs type = ext4
    #osd mount options ext4 = "rw,noatime,user_xattr"

[osd.5]
    host = ceph3
    osd data = /var/lib/ceph/osd/ceph-5
    devs = /dev/sdd1
    osd mkfs type = ext4
    #osd mount options ext4 = "rw,noatime,user_xattr"

[mds.a]
    host = ceph1

(04:50:16 PM) tamil.muthamizhan@newdream.net: are you putting the cephx options under [global]?
(04:50:20 PM) tamil.muthamizhan@newdream.net: i dont see global here
(04:50:46 PM) pat.beadles@newdream.net/18754661421353962283138380: ubuntu@ceph1:/var/log/ceph$ sudo service ceph -a status
=== mon.a === 
mon.a: running {"version":"0.55.1-294-g0dd1302"}
=== mon.b === 
mon.b: running {"version":"0.55.1-294-g0dd1302"}
=== mon.c === 
mon.c: running {"version":"0.55.1-294-g0dd1302"}
=== mds.a === 
mds.a: running {"version":"0.55.1-294-g0dd1302"}
=== osd.0 === 
osd.0: running {"version":"0.55.1-294-g0dd1302"}
=== osd.1 === 
osd.1: running {"version":"0.55.1-294-g0dd1302"}
=== osd.2 === 
osd.2: running {"version":"0.55.1-294-g0dd1302"}
=== osd.3 === 
osd.3: running {"version":"0.55.1-294-g0dd1302"}
=== osd.4 === 
osd.4: running {"version":"0.55.1-294-g0dd1302"}
=== osd.5 === 
osd.5: running {"version":"0.55.1-294-g0dd1302"}

(04:51:12 PM) tamil.muthamizhan@newdream.net: sudo ceph -s
(04:51:15 PM) tamil.muthamizhan@newdream.net: ?
(04:52:06 PM) tamil.muthamizhan@newdream.net: ok, so all daemons are running file?
(04:52:08 PM) tamil.muthamizhan@newdream.net: fine*?
(04:54:30 PM) pat.beadles@newdream.net/18754661421353962283138380: yes the auth is at the top of the file
(04:54:54 PM) tamil.muthamizhan@newdream.net: ok
(04:55:07 PM) tamil.muthamizhan@newdream.net: shall I take a look into your cluster?
(04:55:41 PM) pat.beadles@newdream.net/18754661421353962283138380: If I do ceph health, it gives me an ok. 
(04:55:46 PM) pat.beadles@newdream.net/18754661421353962283138380: ubuntu@ceph1:/var/log/ceph$ ceph health
2012-12-19 00:55:01.177928 mon <- [health]
2012-12-19 00:55:01.178756 mon.0 -> 'HEALTH_OK' (0)
(04:56:18 PM) tamil.muthamizhan@newdream.net: so when you issue "sudo ceph -s" on the other nodes, you see that starnge auth message?
(04:56:26 PM) tamil.muthamizhan@newdream.net: strange*
(04:57:07 PM) tamil.muthamizhan@newdream.net: is there any logs at /var/log/ceph/osd/ceph-0
(04:57:09 PM) tamil.muthamizhan@newdream.net: ?
(04:57:18 PM) pat.beadles@newdream.net/18754661421353962283138380: the other 2 nodes respond correctly
(04:57:32 PM) tamil.muthamizhan@newdream.net:  is there any logs at /var/log/ceph/osd/ceph-<osd_no>
(04:57:36 PM) pat.beadles@newdream.net/18754661421353962283138380: ubuntu@ceph3:/var/log/ceph$ ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=10.1.10.21:6789/0,b=10.1.10.26:6789/0,c=10.1.10.22:6789/0}, election epoch 24, quorum 0,1,2 a,b,c
   osdmap e78: 6 osds: 6 up, 6 in
    pgmap v868: 1344 pgs: 1344 active+clean; 8730 bytes data, 7712 MB used, 50457 MB / 61241 MB avail
   mdsmap e21: 1/1/1 up {0=a=up:active}

(04:58:34 PM) pat.beadles@newdream.net/18754661421353962283138380: yes.  would you be able to get to my VMs if I give you the IPs?
(04:58:42 PM) tamil.muthamizhan@newdream.net: you mentioned the errors on the other 2 nodes are: 2012-12-19 00:37:18.508030 7fdee4e67700  0 can't decode unknown message type 54 MSG_AUTH=17

Files

Patlogs.tgz (2 MB) Patlogs.tgz Anonymous, 12/18/2012 06:04 PM
ceph1_dmesg (31.7 KB) ceph1_dmesg Anonymous, 12/18/2012 06:08 PM
Actions #1

Updated by Anonymous over 11 years ago

added output for dmesg on ceph1

Actions #2

Updated by Anonymous over 11 years ago

  • Description updated (diff)

Below works, but "ceph -s" does not

ubuntu@ceph1:~$ ceph health
2012-12-19 18:27:51.090414 mon <- [health]
2012-12-19 18:27:51.092927 mon.0 -> 'HEALTH_OK' (0)

Actions #3

Updated by Joao Eduardo Luis over 11 years ago

  • Description updated (diff)
Actions #4

Updated by Joao Eduardo Luis over 11 years ago

Pat, do you still have the VMs in this state? If so, can I take a look?

Actions #5

Updated by Ian Colle over 11 years ago

  • Assignee set to Joao Eduardo Luis
Actions #6

Updated by Anonymous over 11 years ago

I am in Sunnyvale and the VMs reside on my desktop. I have snapshotted and created a tar file of my 3-node cluster. I will not be able to upload it to this bug. What burnupi could I send it to?

Actions #7

Updated by Anonymous over 11 years ago

I just did an "scp" to burnupi40.front.sepia.ceph.com:/home/ubuntu/3647.vm.tgz

Actions #8

Updated by Joao Eduardo Luis over 11 years ago

Pat, just a little triage before I dive into this head on: could you please try the following for each monitor?

ceph -m ip:port -s

If this happens to fail for all three monitors, it means that we probably have a real issue; if it works, as I expect it to, for at least mon.a, then it may very well be a misconfiguration issue.

Just a bit of background: it is possible for 'ceph health' to work but not 'ceph -s'. The ceph tool's monclient will communicate with a random monitor, so if you have at least one misconfigured monitor, it is possible that you will contact that one and then things may go sideways.
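For illustration, a sketch of that triage using the monitor addresses from the ceph.conf pasted in the description (mon.a, mon.b, and mon.c):

ceph -m 10.1.10.21:6789 -s
ceph -m 10.1.10.26:6789 -s
ceph -m 10.1.10.22:6789 -s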

Actions #9

Updated by Anonymous over 11 years ago

ubuntu@ceph3:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (servname not supported for ai_socktype)
unable to parse addrs in 'ip:port'
2012-12-19 02:39:11.535510 7f6310954780 -1 ceph_tool_common_init failed.

ubuntu@ceph2:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (servname not supported for ai_socktype)
unable to parse addrs in 'ip:port'
2012-12-19 02:39:08.889162 7f811cc06780 -1 ceph_tool_common_init failed.

ubuntu@ceph1:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (Success)
2012-12-19 02:44:20.759057 7ffc47a50780 -1 ceph_tool_common_init failed.

Actions #10

Updated by Joao Eduardo Luis over 11 years ago

err... that should have been each monitor's ip and port.

as in

ceph -m 10.0.0.1:6789 -s

or whatever ip and ports the monitors are on.

Actions #11

Updated by Sage Weil over 11 years ago

  • Status changed from New to Can't reproduce
Actions #12

Updated by stephane beuret almost 8 years ago

I see this when I run ceph -s:
2016-07-22 19:08:42.844997 6c700470 0 -- :/1260326585 >> 192.168.100.151:6789/0 pipe(0x6c405b30 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x6c400ce8).fault
