Bug #3647 (closed): forgot the auth options for Cephx and added them later: Get msg: 7ff9faaad700 monclient: hunting for new mon

Added by Anonymous over 11 years ago. Updated almost 8 years ago.

Status: Can't reproduce
Priority: Normal
Category: Monitor
Target version: -
% Done: 0%
Source: Development

Description

Seeing errors when setting up Ceph from scratch with the options in the ceph.conf file. I forgot the auth options for Cephx, added them later, and then did a restart.
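For reference, the cephx options in question normally live under a [global] header in ceph.conf. A minimal sketch (the values mirror the auth lines pasted in the chat below; the actual layout of this cluster's file may differ):

[global]
    # use "cephx" instead of "none" to enable authentication
    auth cluster required = none
    auth service required = none
    auth client required = none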

(04:47:39 PM) pat.beadles@newdream.net/18754661421353962283138380: here are the errors: 2012-12-19 00:37:18.366578 7ff9faaad700 monclient: hunting for new mon
2012-12-19 00:37:18.377881 7ff9faaad700 monclient: hunting for new mon
2012-12-19 00:37:18.405659 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(3 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.405864 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(4 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.405979 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(5 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.406078 7ff9faaad700 monclient: hunting for new mon

(04:47:52 PM) pat.beadles@newdream.net/18754661421353962283138380: on ceph1
(04:48:27 PM) pat.beadles@newdream.net/18754661421353962283138380:   the errors on the other 2 nodes are: 2012-12-19 00:37:18.508030 7fdee4e67700  0 can't decode unknown message type 54 MSG_AUTH=17
(04:48:36 PM) pat.beadles@newdream.net/18754661421353962283138380: have you seen this?
(04:48:41 PM) tamil.muthamizhan@newdream.net: please paste ceph.conf file
(04:48:54 PM) pat.beadles@newdream.net/18754661421353962283138380: this happens when I do "ceph -s" 
(04:49:18 PM) tamil.muthamizhan@newdream.net: interesting
(04:49:23 PM) tamil.muthamizhan@newdream.net: i have not seen this error
(04:49:42 PM) pat.beadles@newdream.net/18754661421353962283138380: auth cluster required = none
auth service required = none
auth client required = none

[osd]
    osd journal size = 1000
    filestore xattr use omap = true

    # Execute $ hostname to retrieve the name of your host,
    # and replace {hostname} with the name of your host.
    # For the monitor, replace {ip-address} with the IP
    # address of your host.

[mon.a]

    host = ceph1
    mon addr = 10.1.10.21:6789

[mon.b]

    host = ceph2
    mon addr = 10.1.10.26:6789

[mon.c]

    host = ceph3
    mon addr = 10.1.10.22:6789

[osd.0]
    host = ceph1
    osd data = /var/lib/ceph/osd/ceph-0
    devs = /dev/sdc1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr" 

[osd.1]
    host = ceph1
    osd data = /var/lib/ceph/osd/ceph-1
    devs = /dev/sdd1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr"

[osd.2]
    host = ceph2
    osd data = /var/lib/ceph/osd/ceph-2
    devs = /dev/sdc1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr"

[osd.3]
    host = ceph2
    osd data = /var/lib/ceph/osd/ceph-3
    devs = /dev/sdd1
    osd mkfs type = ext4
    osd mount options ext4 = "rw,noatime,user_xattr"

[osd.4]
    host = ceph3
    osd data = /var/lib/ceph/osd/ceph-4
    #devs = /dev/sdc1
    #osd mkfs type = ext4
    #osd mount options ext4 = "rw,noatime,user_xattr"

[osd.5]
    host = ceph3
    osd data = /var/lib/ceph/osd/ceph-5
    devs = /dev/sdd1
    osd mkfs type = ext4
    #osd mount options ext4 = "rw,noatime,user_xattr"

[mds.a]
    host = ceph1

(04:50:16 PM) tamil.muthamizhan@newdream.net: are you putting the cephx options under [global]?
(04:50:20 PM) tamil.muthamizhan@newdream.net: i dont see global here
(04:50:46 PM) pat.beadles@newdream.net/18754661421353962283138380: ubuntu@ceph1:/var/log/ceph$ sudo service ceph -a status
=== mon.a === 
mon.a: running {"version":"0.55.1-294-g0dd1302"}
=== mon.b === 
mon.b: running {"version":"0.55.1-294-g0dd1302"}
=== mon.c === 
mon.c: running {"version":"0.55.1-294-g0dd1302"}
=== mds.a === 
mds.a: running {"version":"0.55.1-294-g0dd1302"}
=== osd.0 === 
osd.0: running {"version":"0.55.1-294-g0dd1302"}
=== osd.1 === 
osd.1: running {"version":"0.55.1-294-g0dd1302"}
=== osd.2 === 
osd.2: running {"version":"0.55.1-294-g0dd1302"}
=== osd.3 === 
osd.3: running {"version":"0.55.1-294-g0dd1302"}
=== osd.4 === 
osd.4: running {"version":"0.55.1-294-g0dd1302"}
=== osd.5 === 
osd.5: running {"version":"0.55.1-294-g0dd1302"}

(04:51:12 PM) tamil.muthamizhan@newdream.net: sudo ceph -s
(04:51:15 PM) tamil.muthamizhan@newdream.net: ?
(04:52:06 PM) tamil.muthamizhan@newdream.net: ok, so all daemons are running file?
(04:52:08 PM) tamil.muthamizhan@newdream.net: fine*?
(04:54:30 PM) pat.beadles@newdream.net/18754661421353962283138380: yes the auth is at the top of the file
(04:54:54 PM) tamil.muthamizhan@newdream.net: ok
(04:55:07 PM) tamil.muthamizhan@newdream.net: shall I take a look into your cluster?
(04:55:41 PM) pat.beadles@newdream.net/18754661421353962283138380: If I do ceph health, it gives me an ok. 
(04:55:46 PM) pat.beadles@newdream.net/18754661421353962283138380: ubuntu@ceph1:/var/log/ceph$ ceph health
2012-12-19 00:55:01.177928 mon <- [health]
2012-12-19 00:55:01.178756 mon.0 -> 'HEALTH_OK' (0)
(04:56:18 PM) tamil.muthamizhan@newdream.net: so when you issue "sudo ceph -s" on the other nodes, you see that starnge auth message?
(04:56:26 PM) tamil.muthamizhan@newdream.net: strange*
(04:57:07 PM) tamil.muthamizhan@newdream.net: is there any logs at /var/log/ceph/osd/ceph-0
(04:57:09 PM) tamil.muthamizhan@newdream.net: ?
(04:57:18 PM) pat.beadles@newdream.net/18754661421353962283138380: the other 2 nodes respond correctly
(04:57:32 PM) tamil.muthamizhan@newdream.net:  is there any logs at /var/log/ceph/osd/ceph-<osd_no>
(04:57:36 PM) pat.beadles@newdream.net/18754661421353962283138380: ubuntu@ceph3:/var/log/ceph$ ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {a=10.1.10.21:6789/0,b=10.1.10.26:6789/0,c=10.1.10.22:6789/0}, election epoch 24, quorum 0,1,2 a,b,c
   osdmap e78: 6 osds: 6 up, 6 in
    pgmap v868: 1344 pgs: 1344 active+clean; 8730 bytes data, 7712 MB used, 50457 MB / 61241 MB avail
   mdsmap e21: 1/1/1 up {0=a=up:active}

(04:58:34 PM) pat.beadles@newdream.net/18754661421353962283138380: yes.  would you be able to get to my VMs if I give you the IPs?
(04:58:42 PM) tamil.muthamizhan@newdream.net: you mentioned the errors on the other 2 nodes are: 2012-12-19 00:37:18.508030 7fdee4e67700  0 can't decode unknown message type 54 MSG_AUTH=17

Files

Patlogs.tgz (2 MB) Patlogs.tgz Anonymous, 12/18/2012 06:04 PM
ceph1_dmesg (31.7 KB) ceph1_dmesg Anonymous, 12/18/2012 06:08 PM
Actions #1

Updated by Anonymous over 11 years ago

added output for dmesg on ceph1

Actions #2

Updated by Anonymous over 11 years ago

  • Description updated (diff)

Below works, but "ceph -s" does not

ubuntu@ceph1:~$ ceph health
2012-12-19 18:27:51.090414 mon <- [health]
2012-12-19 18:27:51.092927 mon.0 -> 'HEALTH_OK' (0)

Actions #3

Updated by Joao Eduardo Luis over 11 years ago

  • Description updated (diff)
Actions #4

Updated by Joao Eduardo Luis over 11 years ago

Pat, do you still have the VMs in this state? If so, can I take a look?

Actions #5

Updated by Ian Colle over 11 years ago

  • Assignee set to Joao Eduardo Luis
Actions #6

Updated by Anonymous over 11 years ago

I am in Sunnyvale and the VMs reside on my desktop. I have snapshotted and created a tar file of my 3-node cluster. I will not be able to upload it to this bug. What burnupi could I send it to?

Actions #7

Updated by Anonymous over 11 years ago

I just did an "scp" to burnupi40.front.sepia.ceph.com:/home/ubuntu/3647.vm.tgz

Actions #8

Updated by Joao Eduardo Luis over 11 years ago

Pat, just a little triage before I dive into this head on: could you please try the following for each monitor?

ceph -m ip:port -s

If this happens to fail for all three monitors, it means that we probably have a real issue; if it works, as I expect it to, for at least mon.a, then it may very well be a misconfiguration issue.

Just a bit of background: it is possible for 'ceph health' to work but not 'ceph -s'. The ceph tool's monclient will communicate with a random monitor, so if you have at least one misconfigured monitor, it is possible that you will contact that one and then things may go sideways.
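For illustration, a sketch of that triage using the monitor addresses from the ceph.conf pasted in the description (mon.a, mon.b, and mon.c):

ceph -m 10.1.10.21:6789 -s
ceph -m 10.1.10.26:6789 -s
ceph -m 10.1.10.22:6789 -s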

Actions #9

Updated by Anonymous over 11 years ago

ubuntu@ceph3:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (servname not supported for ai_socktype)
unable to parse addrs in 'ip:port'
2012-12-19 02:39:11.535510 7f6310954780 -1 ceph_tool_common_init failed.

ubuntu@ceph2:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (servname not supported for ai_socktype)
unable to parse addrs in 'ip:port'
2012-12-19 02:39:08.889162 7f811cc06780 -1 ceph_tool_common_init failed.

ubuntu@ceph1:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (Success)
2012-12-19 02:44:20.759057 7ffc47a50780 -1 ceph_tool_common_init failed.

Actions #10

Updated by Joao Eduardo Luis over 11 years ago

err... that should have been each monitor's ip and port.

as in

ceph -m 10.0.0.1:6789 -s

or whatever ip and ports the monitors are on.

Actions #11

Updated by Sage Weil over 11 years ago

  • Status changed from New to Can't reproduce
Actions #12

Updated by stephane beuret almost 8 years ago

I see this when I run ceph -s:
2016-07-22 19:08:42.844997 6c700470 0 -- :/1260326585 >> 192.168.100.151:6789/0 pipe(0x6c405b30 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x6c400ce8).fault
