Bug #3647
forgot the auth options for Cephx and added them later: Get msg: 7ff9faaad700 monclient: hunting for new mon
Status: Closed
Description
Seeing errors when setting up ceph from scratch with the options in the ceph.conf file. I forgot the auth options for Cephx and added them later, then did a restart.
(04:47:39 PM) pat.beadles@newdream.net: here is the errors:
2012-12-19 00:37:18.366578 7ff9faaad700 monclient: hunting for new mon
2012-12-19 00:37:18.377881 7ff9faaad700 monclient: hunting for new mon
2012-12-19 00:37:18.405659 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(3 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.405864 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(4 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.405979 7ff9faaad700 -- 10.1.10.21:0/12730 send_message dropped message observe(5 v0) v1 because of no pipe on con 0x7ff9ec0084a0
2012-12-19 00:37:18.406078 7ff9faaad700 monclient: hunting for new mon
(04:47:52 PM) pat.beadles@newdream.net: on ceph1
(04:48:27 PM) pat.beadles@newdream.net: the errors on the other 2 nodes are:
2012-12-19 00:37:18.508030 7fdee4e67700 0 can't decode unknown message type 54 MSG_AUTH=17
(04:48:36 PM) pat.beadles@newdream.net: have you seen this?
(04:48:41 PM) tamil.muthamizhan@newdream.net: please paste ceph.conf file
(04:48:54 PM) pat.beadles@newdream.net: this happens when I do "ceph -s"
(04:49:18 PM) tamil.muthamizhan@newdream.net: interesting
(04:49:23 PM) tamil.muthamizhan@newdream.net: i have not seen this error
(04:49:42 PM) pat.beadles@newdream.net:
auth cluster required = none
auth service required = none
auth client required = none

[osd]
osd journal size = 1000
filestore xattr use omap = true

# Execute $ hostname to retrieve the name of your host,
# and replace {hostname} with the name of your host.
# For the monitor, replace {ip-address} with the IP
# address of your host.

[mon.a]
host = ceph1
mon addr = 10.1.10.21:6789

[mon.b]
host = ceph2
mon addr = 10.1.10.26:6789

[mon.c]
host = ceph3
mon addr = 10.1.10.22:6789

[osd.0]
host = ceph1
osd data = /var/lib/ceph/osd/ceph-0
devs = /dev/sdc1
osd mkfs type = ext4
osd mount options ext4 = "rw,noatime,user_xattr"

[osd.1]
host = ceph1
osd data = /var/lib/ceph/osd/ceph-1
devs = /dev/sdd1
osd mkfs type = ext4
osd mount options ext4 = "rw,noatime,user_xattr"

[osd.2]
host = ceph2
osd data = /var/lib/ceph/osd/ceph-2
devs = /dev/sdc1
osd mkfs type = ext4
osd mount options ext4 = "rw,noatime,user_xattr"

[osd.3]
host = ceph2
osd data = /var/lib/ceph/osd/ceph-3
devs = /dev/sdd1
osd mkfs type = ext4
osd mount options ext4 = "rw,noatime,user_xattr"

[osd.4]
host = ceph3
osd data = /var/lib/ceph/osd/ceph-4
#devs = /dev/sdc1
#osd mkfs type = ext4
#osd mount options ext4 = "rw,noatime,user_xattr"

[osd.5]
host = ceph3
osd data = /var/lib/ceph/osd/ceph-5
devs = /dev/sdd1
osd mkfs type = ext4
#osd mount options ext4 = "rw,noatime,user_xattr"

[mds.a]
host = ceph1
(04:50:16 PM) tamil.muthamizhan@newdream.net: are you putting the cephx options under [global]?
(04:50:20 PM) tamil.muthamizhan@newdream.net: i dont see global here
(04:50:46 PM) pat.beadles@newdream.net:
ubuntu@ceph1:/var/log/ceph$ sudo service ceph -a status
=== mon.a ===
mon.a: running {"version":"0.55.1-294-g0dd1302"}
=== mon.b ===
mon.b: running {"version":"0.55.1-294-g0dd1302"}
=== mon.c ===
mon.c: running {"version":"0.55.1-294-g0dd1302"}
=== mds.a ===
mds.a: running {"version":"0.55.1-294-g0dd1302"}
=== osd.0 ===
osd.0: running {"version":"0.55.1-294-g0dd1302"}
=== osd.1 ===
osd.1: running {"version":"0.55.1-294-g0dd1302"}
=== osd.2 ===
osd.2: running {"version":"0.55.1-294-g0dd1302"}
=== osd.3 ===
osd.3: running {"version":"0.55.1-294-g0dd1302"}
=== osd.4 ===
osd.4: running {"version":"0.55.1-294-g0dd1302"}
=== osd.5 ===
osd.5: running {"version":"0.55.1-294-g0dd1302"}
(04:51:12 PM) tamil.muthamizhan@newdream.net: sudo ceph -s
(04:51:15 PM) tamil.muthamizhan@newdream.net: ?
(04:52:06 PM) tamil.muthamizhan@newdream.net: ok, so all daemons are running file?
(04:52:08 PM) tamil.muthamizhan@newdream.net: fine*?
(04:54:30 PM) pat.beadles@newdream.net: yes the auth is at the top of the file
(04:54:54 PM) tamil.muthamizhan@newdream.net: ok
(04:55:07 PM) tamil.muthamizhan@newdream.net: shall I take a look into your cluster?
(04:55:41 PM) pat.beadles@newdream.net: If I do ceph health, it gives me an ok.
(04:55:46 PM) pat.beadles@newdream.net:
ubuntu@ceph1:/var/log/ceph$ ceph health
2012-12-19 00:55:01.177928 mon <- [health]
2012-12-19 00:55:01.178756 mon.0 -> 'HEALTH_OK' (0)
(04:56:18 PM) tamil.muthamizhan@newdream.net: so when you issue "sudo ceph -s" on the other nodes, you see that starnge auth message?
(04:56:26 PM) tamil.muthamizhan@newdream.net: strange*
(04:57:07 PM) tamil.muthamizhan@newdream.net: is there any logs at /var/log/ceph/osd/ceph-0
(04:57:09 PM) tamil.muthamizhan@newdream.net: ?
(04:57:18 PM) pat.beadles@newdream.net: the other 2 nodes respond correctly
(04:57:32 PM) tamil.muthamizhan@newdream.net: is there any logs at /var/log/ceph/osd/ceph-<osd_no>
(04:57:36 PM) pat.beadles@newdream.net:
ubuntu@ceph3:/var/log/ceph$ ceph -s
health HEALTH_OK
monmap e1: 3 mons at {a=10.1.10.21:6789/0,b=10.1.10.26:6789/0,c=10.1.10.22:6789/0}, election epoch 24, quorum 0,1,2 a,b,c
osdmap e78: 6 osds: 6 up, 6 in
pgmap v868: 1344 pgs: 1344 active+clean; 8730 bytes data, 7712 MB used, 50457 MB / 61241 MB avail
mdsmap e21: 1/1/1 up {0=a=up:active}
(04:58:34 PM) pat.beadles@newdream.net: yes. would you be able to get to my VMs if I give you the IPs?
(04:58:42 PM) tamil.muthamizhan@newdream.net: you mentioned the errors on the other 2 nodes are:
2012-12-19 00:37:18.508030 7fdee4e67700 0 can't decode unknown message type 54 MSG_AUTH=17
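For reference, the cephx options quoted above normally live under a [global] header at the top of ceph.conf; the pasted file has the values but apparently no section header, which is what the [global] question in the chat is getting at. A minimal sketch, using the values from the paste:

[global]
# cephx disabled on all three paths, per the reporter's settings
auth cluster required = none
auth service required = none
auth client required = none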
Updated by Anonymous over 11 years ago
- File ceph1_dmesg added
- Description updated (diff)
Added the dmesg output from ceph1.
Updated by Anonymous over 11 years ago
- Description updated (diff)
The below works, but "ceph -s" does not:
ubuntu@ceph1:~$ ceph health
2012-12-19 18:27:51.090414 mon <- [health]
2012-12-19 18:27:51.092927 mon.0 -> 'HEALTH_OK' (0)
Updated by Joao Eduardo Luis over 11 years ago
Pat, do you still have the VMs in this state? If so, can I take a look?
Updated by Anonymous over 11 years ago
I am in Sunnyvale and the VMs reside on my desktop. I have snapshotted and created a tar file of my 3-node cluster. I will not be able to upload it to this bug. What burnupi could I send it to?
Updated by Anonymous over 11 years ago
I just did an "scp" to burnupi40.front.sepia.ceph.com:/home/ubuntu/3647.vm.tgz
Updated by Joao Eduardo Luis over 11 years ago
Pat, just a little triage before I dive into this head on: could you please try the following for each monitor?
ceph -m ip:port -s
If this happens to fail for all three monitors, it means that we probably have a real issue; if it works, as I expect it to, for at least mon.a, then it may very well be a misconfiguration issue.
Just a bit of background: it is possible for 'ceph health' to work but not 'ceph -s'. The ceph tool's monclient will communicate with a random monitor, so if you have at least one misconfigured monitor, it is possible that you will contact that one and then things may go sideways.
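Joao's per-monitor check can be scripted in one shell loop; a minimal sketch, assuming the three monitor addresses from the ceph.conf pasted in the description:

# Query each monitor directly instead of letting the client pick one at random.
for m in 10.1.10.21:6789 10.1.10.26:6789 10.1.10.22:6789; do
    echo "=== mon at $m ==="
    ceph -m "$m" -s    # -m pins the command to this monitor address
done

If exactly one address fails, that is the misconfigured monitor the background note describes.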
Updated by Anonymous over 11 years ago
ubuntu@ceph3:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (servname not supported for ai_socktype)
unable to parse addrs in 'ip:port'
2012-12-19 02:39:11.535510 7f6310954780 -1 ceph_tool_common_init failed.
ubuntu@ceph2:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (servname not supported for ai_socktype)
unable to parse addrs in 'ip:port'
2012-12-19 02:39:08.889162 7f811cc06780 -1 ceph_tool_common_init failed.
ubuntu@ceph1:/etc/ceph$ ceph -m ip:port -s
server name not found: ip (Success)
2012-12-19 02:44:20.759057 7ffc47a50780 -1 ceph_tool_common_init failed.
Updated by Joao Eduardo Luis over 11 years ago
err... that should have been each monitor's ip and port.
as in
ceph -m 10.0.0.1:6789 -s
or whatever IPs and ports the monitors are on.
Updated by Sage Weil over 11 years ago
- Status changed from New to Can't reproduce
Updated by stephane beuret almost 8 years ago
I see this when I perform "ceph -s":
2016-07-22 19:08:42.844997 6c700470 0 -- :/1260326585 >> 192.168.100.151:6789/0 pipe(0x6c405b30 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x6c400ce8).fault