Bug #3590

mds segfault at MonClient::wait_auth_rotating during upgrade

Added by Tamilarasi muthamizhan over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Category: cephx
Source: Q/A

Description

Upgrade from Ceph argonaut v0.48.2 to v0.55.

Test steps:

1. Running a local cluster (burnupi06 [mds], burnupi07 [osds], burnupi08 [mon]) on argonaut 0.48.2. ceph.conf had no auth settings, so cephx was disabled by default.

2. Upgrading the mds node to v0.55 and restarting the mds daemon segfaults at MonClient::wait_auth_rotating:

ceph version 0.55-159-gefc6614 (efc66148837d614dc9489e3675aaa05bb2d9b9c2)
1: /usr/bin/ceph-mds() [0x815a9a]
2: (()+0xfcb0) [0x7fac57cf1cb0]
3: (MonClient::wait_auth_rotating(double)+0xb6) [0x73b066]
4: (MDS::init(int)+0x1433) [0x4ca033]
5: (main()+0xfed) [0x4a9c4d]
6: (__libc_start_main()+0xed) [0x7fac563cd76d]
7: /usr/bin/ceph-mds() [0x4abda9]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ceph.conf:

[osd]
osd journal size = 1000
filestore_xattr_use_omap = true

[osd.0]
host = burnupi07

[osd.1]
host = burnupi07

[mon.a]
host = burnupi08
mon addr = 10.214.134.36:6789

[mds.a]
host = burnupi06

3. Tried enabling auth by adding the line "auth client required = cephx,none" to ceph.conf on the mds node and restarting the mds, but it segfaults again.

However, continuing the upgrade of the osd and mon nodes to v0.55 with "auth client required = cephx,none" set in ceph.conf works fine.
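For reference, a minimal sketch of where that line would sit in ceph.conf (the [global] placement is an assumption; the report does not show it):

[global]
auth client required = cephx,none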

Actions #1

Updated by Tamilarasi muthamizhan over 11 years ago

I restarted the running cluster after removing the "auth client required = cephx,none" entry from ceph.conf, and it is still working fine.

So the issue is not the auth setting in ceph.conf; it is the order in which the nodes are upgraded.

Actions #2

Updated by Greg Farnum over 11 years ago

Can you reproduce this with debug ms = 10, debug auth = 10, debug mds = 10 on?
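Those settings would go in ceph.conf on the mds node; a minimal sketch, assuming an [mds] section:

[mds]
debug ms = 10
debug auth = 10
debug mds = 10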

Actions #3

Updated by Sage Weil over 11 years ago

Anything new here?

Actions #4

Updated by Anonymous over 11 years ago

Not from my end. I'm not working at Inktank today. I'll be in tomorrow, and I hope to make progress on it then.

Actions #5

Updated by Tamilarasi muthamizhan over 11 years ago

I am planning to reproduce it with debugging on and show it to Greg.

Actions #6

Updated by Yehuda Sadeh over 11 years ago

  • Assignee changed from Anonymous to Yehuda Sadeh

Actions #7

Updated by Tamilarasi muthamizhan over 11 years ago

I was able to reproduce this issue on the burnupi06 [mds], burnupi07 [osds], burnupi08 [mon] cluster with debugging on.

Leaving the cluster nodes as they are for anyone who wants to take a look.

Pasting the mds log below:

ubuntu@burnupi06:/etc/ceph$ tail -f /var/log/ceph/ceph-mds.a.log 
2012-12-10 19:42:37.986712 7f070b5ab700 10 -- 10.214.133.8:6800/54651 reaper done
2012-12-10 19:42:37.986722 7f070b5ab700 10 -- 10.214.133.8:6800/54651 reaper_entry done
2012-12-10 19:42:37.986745 7f070f52b780 10 -- 10.214.133.8:6800/54651 wait: closing pipes
2012-12-10 19:42:37.986752 7f070f52b780 10 -- 10.214.133.8:6800/54651 reaper
2012-12-10 19:42:37.986756 7f070f52b780 10 -- 10.214.133.8:6800/54651 reaper done
2012-12-10 19:42:37.986757 7f070f52b780 10 -- 10.214.133.8:6800/54651 wait: waiting for pipes  to close
2012-12-10 19:42:37.986760 7f070f52b780 10 -- 10.214.133.8:6800/54651 >> :/0 pipe(0x31a5000 sd=-1 pgs=0 cs=0 l=0).discard_queue
2012-12-10 19:42:37.986766 7f070f52b780 10 -- 10.214.133.8:6800/54651 wait: done.
2012-12-10 19:42:37.986768 7f070f52b780  1 -- 10.214.133.8:6800/54651 shutdown complete.
2012-12-10 19:42:37.986809 7f070f52b780  0 stopped.

2012-12-10 19:45:48.224683 7f66ce07f780  1 -- 0.0.0.0:6800/55867 messenger.start
2012-12-10 19:45:48.224722 7f66ca900700 10 -- 0.0.0.0:6800/55867 reaper_entry start
2012-12-10 19:45:48.224741 7f66ca900700 10 -- 0.0.0.0:6800/55867 reaper
2012-12-10 19:45:48.224745 7f66ca900700 10 -- 0.0.0.0:6800/55867 reaper done
2012-12-10 19:45:48.224804 7f66ce07f780  5 adding auth protocol: cephx
2012-12-10 19:45:48.224810 7f66ce07f780  5 adding auth protocol: cephx
2012-12-10 19:45:48.224870 7f66ce07f780 10 mds.-1.0 168    MDSCacheObject
2012-12-10 19:45:48.224885 7f66ce07f780 10 mds.-1.0 2144    CInode
2012-12-10 19:45:48.224886 7f66ce07f780 10 mds.-1.0 16     elist<>::item   *7=112
2012-12-10 19:45:48.224887 7f66ce07f780 10 mds.-1.0 360     inode_t 
2012-12-10 19:45:48.224889 7f66ce07f780 10 mds.-1.0 56      nest_info_t 
2012-12-10 19:45:48.224890 7f66ce07f780 10 mds.-1.0 32      frag_info_t 
2012-12-10 19:45:48.224895 7f66ce07f780 10 mds.-1.0 40     SimpleLock   *5=200
2012-12-10 19:45:48.224896 7f66ce07f780 10 mds.-1.0 48     ScatterLock  *3=144
2012-12-10 19:45:48.224897 7f66ce07f780 10 mds.-1.0 472    CDentry
2012-12-10 19:45:48.224898 7f66ce07f780 10 mds.-1.0 16     elist<>::item
2012-12-10 19:45:48.224899 7f66ce07f780 10 mds.-1.0 40     SimpleLock
2012-12-10 19:45:48.224900 7f66ce07f780 10 mds.-1.0 1016    CDir 
2012-12-10 19:45:48.224901 7f66ce07f780 10 mds.-1.0 16     elist<>::item   *2=32
2012-12-10 19:45:48.224902 7f66ce07f780 10 mds.-1.0 192     fnode_t 
2012-12-10 19:45:48.224903 7f66ce07f780 10 mds.-1.0 56      nest_info_t *2
2012-12-10 19:45:48.224904 7f66ce07f780 10 mds.-1.0 32      frag_info_t *2
2012-12-10 19:45:48.224906 7f66ce07f780 10 mds.-1.0 168    Capability 
2012-12-10 19:45:48.224906 7f66ce07f780 10 mds.-1.0 32     xlist<>::item   *2=64
2012-12-10 19:45:48.224909 7f66ce07f780 10 -- 0.0.0.0:6800/55867 ready 0.0.0.0:6800/55867
2012-12-10 19:45:48.224936 7f66ce07f780  1 accepter.accepter.start
2012-12-10 19:45:48.224962 7f66c90fd700 10 accepter.accepter starting
2012-12-10 19:45:48.225133 7f66ce07f780  2 auth: KeyRing::load: loaded key file /var/lib/ceph/mds/ceph-a/keyring
2012-12-10 19:45:48.225188 7f66ce07f780  5 adding auth protocol: cephx
2012-12-10 19:45:48.225207 7f66ce07f780 10 -- 0.0.0.0:6800/55867 connect_rank to 10.214.134.36:6789/0, creating pipe and registering
2012-12-10 19:45:48.225239 7f66ce07f780 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=-1 :0 pgs=0 cs=0 l=1).register_pipe
2012-12-10 19:45:48.225256 7f66ce07f780 10 -- 0.0.0.0:6800/55867 get_connection mon.0 10.214.134.36:6789/0 new 0x2e806c0
2012-12-10 19:45:48.225245 7f66ce07b700 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=-1 :0 pgs=0 cs=0 l=1).writer: state = connecting policy.server=0
2012-12-10 19:45:48.225266 7f66ce07b700 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=-1 :0 pgs=0 cs=0 l=1).connect 0
2012-12-10 19:45:48.225282 7f66ce07b700 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :0 pgs=0 cs=0 l=1).connecting to 10.214.134.36:6789/0
2012-12-10 19:45:48.225307 7f66ce07f780  1 -- 0.0.0.0:6800/55867 --> 10.214.134.36:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x2e87400 con 0x2e95580
2012-12-10 19:45:48.225699 7f66ce07b700  1 -- 10.214.133.8:6800/55867 learned my addr 10.214.133.8:6800/55867
2012-12-10 19:45:48.225720 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=0 cs=0 l=1).connect sent my addr 10.214.133.8:6800/55867
2012-12-10 19:45:48.225731 7f66ce07b700 10 mds.-1.0 MDS::ms_get_authorizer type=mon
2012-12-10 19:45:48.225735 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=0 cs=0 l=1).connect sending gseq=1 cseq=0 proto=15
2012-12-10 19:45:48.225919 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).connect success 1, lossy = 1, features 327679
2012-12-10 19:45:48.225978 7f66c98fe700  0 mds.-1.0 ms_handle_connect on 10.214.134.36:6789/0
2012-12-10 19:45:48.225998 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
2012-12-10 19:45:48.226032 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
2012-12-10 19:45:48.226308 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).reader wants 24 from dispatch throttler 0/104857600
2012-12-10 19:45:48.226329 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).aborted = 0
2012-12-10 19:45:48.226345 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).No session security set
2012-12-10 19:45:48.226353 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).reader got message 1 0x2e87600 auth_reply(proto 0 -95 Operation not supported) v1
2012-12-10 19:45:48.226381 7f66c98fe700  1 -- 10.214.133.8:6800/55867 <== mon.0 10.214.134.36:6789/0 1 ==== auth_reply(proto 0 -95 Operation not supported) v1 ==== 24+0+0 (3632875112 0 0) 0x2e87600 con 0x2e95580
2012-12-10 19:45:48.226385 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
2012-12-10 19:45:48.226402 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).write_ack 1
2012-12-10 19:45:48.226407 7f66c98fe700 10 -- 10.214.133.8:6800/55867 dispatch_throttle_release 24 to dispatch throttler 24/104857600
2012-12-10 19:45:48.226415 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
2012-12-10 19:45:48.226970 7f66ce07f780 -1 *** Caught signal (Segmentation fault) **
 in thread 7f66ce07f780

 ceph version 0.55-192-ge4d0aea (e4d0aeace187afee6313d803511ba595e91dd7ed)
 1: /usr/bin/ceph-mds() [0x816dea]
 2: (()+0xfcb0) [0x7f66cdc62cb0]
 3: (MonClient::wait_auth_rotating(double)+0xb6) [0x73c216]
 4: (MDS::init(int)+0x1433) [0x4c9fa3]
 5: (main()+0xfed) [0x4a9b9d]
 6: (__libc_start_main()+0xed) [0x7f66cc33e76d]
 7: /usr/bin/ceph-mds() [0x4abcf9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -87> 2012-12-10 19:45:48.222426 7f66ce07f780  5 asok(0x2e81000) register_command perfcounters_dump hook 0x2e76010
   -86> 2012-12-10 19:45:48.222449 7f66ce07f780  5 asok(0x2e81000) register_command 1 hook 0x2e76010
   -85> 2012-12-10 19:45:48.222453 7f66ce07f780  5 asok(0x2e81000) register_command perf dump hook 0x2e76010
   -84> 2012-12-10 19:45:48.222458 7f66ce07f780  5 asok(0x2e81000) register_command perfcounters_schema hook 0x2e76010
   -83> 2012-12-10 19:45:48.222462 7f66ce07f780  5 asok(0x2e81000) register_command 2 hook 0x2e76010
   -82> 2012-12-10 19:45:48.222464 7f66ce07f780  5 asok(0x2e81000) register_command perf schema hook 0x2e76010
   -81> 2012-12-10 19:45:48.222468 7f66ce07f780  5 asok(0x2e81000) register_command config show hook 0x2e76010
   -80> 2012-12-10 19:45:48.222472 7f66ce07f780  5 asok(0x2e81000) register_command config set hook 0x2e76010
   -79> 2012-12-10 19:45:48.222475 7f66ce07f780  5 asok(0x2e81000) register_command log flush hook 0x2e76010
   -78> 2012-12-10 19:45:48.222479 7f66ce07f780  5 asok(0x2e81000) register_command log dump hook 0x2e76010
   -77> 2012-12-10 19:45:48.222483 7f66ce07f780  5 asok(0x2e81000) register_command log reopen hook 0x2e76010
   -76> 2012-12-10 19:45:48.223534 7f66ce07f780  0 ceph version 0.55-192-ge4d0aea (e4d0aeace187afee6313d803511ba595e91dd7ed), process ceph-mds, pid 55867
   -75> 2012-12-10 19:45:48.223914 7f66ce07f780 10 -- :/0 rank.bind :/0
   -74> 2012-12-10 19:45:48.223921 7f66ce07f780 10 accepter.accepter.bind
   -73> 2012-12-10 19:45:48.223935 7f66ce07f780 10 accepter.accepter.bind bound on random port 0.0.0.0:6800/0
   -72> 2012-12-10 19:45:48.223945 7f66ce07f780 10 accepter.accepter.bind bound to 0.0.0.0:6800/0
   -71> 2012-12-10 19:45:48.223955 7f66ce07f780  1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6800/55867 need_addr=1
   -70> 2012-12-10 19:45:48.224368 7f66ce07f780  1 finished global_init_daemonize
   -69> 2012-12-10 19:45:48.224524 7f66ce07f780  5 asok(0x2e81000) init /var/run/ceph/ceph-mds.a.asok
   -68> 2012-12-10 19:45:48.224540 7f66ce07f780  5 asok(0x2e81000) bind_and_listen /var/run/ceph/ceph-mds.a.asok
   -67> 2012-12-10 19:45:48.224572 7f66ce07f780  5 asok(0x2e81000) register_command 0 hook 0x2e75038
   -66> 2012-12-10 19:45:48.224579 7f66ce07f780  5 asok(0x2e81000) register_command version hook 0x2e75038
   -65> 2012-12-10 19:45:48.224585 7f66ce07f780  5 asok(0x2e81000) register_command git_version hook 0x2e75038
   -64> 2012-12-10 19:45:48.224590 7f66ce07f780  5 asok(0x2e81000) register_command help hook 0x2e76050
   -63> 2012-12-10 19:45:48.224631 7f66ce07f780 10 monclient(hunting): build_initial_monmap
   -62> 2012-12-10 19:45:48.224620 7f66cb101700  5 asok(0x2e81000) entry start
   -61> 2012-12-10 19:45:48.224683 7f66ce07f780  1 -- 0.0.0.0:6800/55867 messenger.start
   -60> 2012-12-10 19:45:48.224722 7f66ca900700 10 -- 0.0.0.0:6800/55867 reaper_entry start
   -59> 2012-12-10 19:45:48.224741 7f66ca900700 10 -- 0.0.0.0:6800/55867 reaper
   -58> 2012-12-10 19:45:48.224745 7f66ca900700 10 -- 0.0.0.0:6800/55867 reaper done
   -57> 2012-12-10 19:45:48.224804 7f66ce07f780  5 adding auth protocol: cephx
   -56> 2012-12-10 19:45:48.224810 7f66ce07f780  5 adding auth protocol: cephx
   -55> 2012-12-10 19:45:48.224870 7f66ce07f780 10 mds.-1.0 168    MDSCacheObject
   -54> 2012-12-10 19:45:48.224885 7f66ce07f780 10 mds.-1.0 2144    CInode
   -53> 2012-12-10 19:45:48.224886 7f66ce07f780 10 mds.-1.0 16     elist<>::item   *7=112
   -52> 2012-12-10 19:45:48.224887 7f66ce07f780 10 mds.-1.0 360     inode_t 
   -51> 2012-12-10 19:45:48.224889 7f66ce07f780 10 mds.-1.0 56      nest_info_t 
   -50> 2012-12-10 19:45:48.224890 7f66ce07f780 10 mds.-1.0 32      frag_info_t 
   -49> 2012-12-10 19:45:48.224895 7f66ce07f780 10 mds.-1.0 40     SimpleLock   *5=200
   -48> 2012-12-10 19:45:48.224896 7f66ce07f780 10 mds.-1.0 48     ScatterLock  *3=144
   -47> 2012-12-10 19:45:48.224897 7f66ce07f780 10 mds.-1.0 472    CDentry
   -46> 2012-12-10 19:45:48.224898 7f66ce07f780 10 mds.-1.0 16     elist<>::item
   -45> 2012-12-10 19:45:48.224899 7f66ce07f780 10 mds.-1.0 40     SimpleLock
   -44> 2012-12-10 19:45:48.224900 7f66ce07f780 10 mds.-1.0 1016    CDir 
   -43> 2012-12-10 19:45:48.224901 7f66ce07f780 10 mds.-1.0 16     elist<>::item   *2=32
   -42> 2012-12-10 19:45:48.224902 7f66ce07f780 10 mds.-1.0 192     fnode_t 
   -41> 2012-12-10 19:45:48.224903 7f66ce07f780 10 mds.-1.0 56      nest_info_t *2
   -40> 2012-12-10 19:45:48.224904 7f66ce07f780 10 mds.-1.0 32      frag_info_t *2
   -39> 2012-12-10 19:45:48.224906 7f66ce07f780 10 mds.-1.0 168    Capability 
   -38> 2012-12-10 19:45:48.224906 7f66ce07f780 10 mds.-1.0 32     xlist<>::item   *2=64
   -37> 2012-12-10 19:45:48.224909 7f66ce07f780 10 -- 0.0.0.0:6800/55867 ready 0.0.0.0:6800/55867
   -36> 2012-12-10 19:45:48.224936 7f66ce07f780  1 accepter.accepter.start
   -35> 2012-12-10 19:45:48.224959 7f66ce07f780 10 monclient(hunting): init
   -34> 2012-12-10 19:45:48.224962 7f66c90fd700 10 accepter.accepter starting
   -33> 2012-12-10 19:45:48.225133 7f66ce07f780  2 auth: KeyRing::load: loaded key file /var/lib/ceph/mds/ceph-a/keyring
   -32> 2012-12-10 19:45:48.225188 7f66ce07f780  5 adding auth protocol: cephx
   -31> 2012-12-10 19:45:48.225192 7f66ce07f780 10 monclient(hunting): auth_supported 2 method cephx
   -30> 2012-12-10 19:45:48.225201 7f66ce07f780 10 monclient(hunting): _reopen_session
   -29> 2012-12-10 19:45:48.225207 7f66ce07f780 10 -- 0.0.0.0:6800/55867 connect_rank to 10.214.134.36:6789/0, creating pipe and registering
   -28> 2012-12-10 19:45:48.225239 7f66ce07f780 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=-1 :0 pgs=0 cs=0 l=1).register_pipe
   -27> 2012-12-10 19:45:48.225256 7f66ce07f780 10 -- 0.0.0.0:6800/55867 get_connection mon.0 10.214.134.36:6789/0 new 0x2e806c0
   -26> 2012-12-10 19:45:48.225245 7f66ce07b700 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=-1 :0 pgs=0 cs=0 l=1).writer: state = connecting policy.server=0
   -25> 2012-12-10 19:45:48.225266 7f66ce07b700 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=-1 :0 pgs=0 cs=0 l=1).connect 0
   -24> 2012-12-10 19:45:48.225282 7f66ce07b700 10 -- 0.0.0.0:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :0 pgs=0 cs=0 l=1).connecting to 10.214.134.36:6789/0
   -23> 2012-12-10 19:45:48.225281 7f66ce07f780 10 monclient(hunting): _pick_new_mon picked mon.a con 0x2e95580 addr 10.214.134.36:6789/0
   -22> 2012-12-10 19:45:48.225302 7f66ce07f780 10 monclient(hunting): _send_mon_message to mon.a at 10.214.134.36:6789/0
   -21> 2012-12-10 19:45:48.225307 7f66ce07f780  1 -- 0.0.0.0:6800/55867 --> 10.214.134.36:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- ?+0 0x2e87400 con 0x2e95580
   -20> 2012-12-10 19:45:48.225320 7f66ce07f780 10 monclient(hunting): renew_subs
   -19> 2012-12-10 19:45:48.225699 7f66ce07b700  1 -- 10.214.133.8:6800/55867 learned my addr 10.214.133.8:6800/55867
   -18> 2012-12-10 19:45:48.225720 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=0 cs=0 l=1).connect sent my addr 10.214.133.8:6800/55867
   -17> 2012-12-10 19:45:48.225731 7f66ce07b700 10 mds.-1.0 MDS::ms_get_authorizer type=mon
   -16> 2012-12-10 19:45:48.225735 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=0 cs=0 l=1).connect sending gseq=1 cseq=0 proto=15
   -15> 2012-12-10 19:45:48.225919 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).connect success 1, lossy = 1, features 327679
   -14> 2012-12-10 19:45:48.225978 7f66c98fe700  0 mds.-1.0 ms_handle_connect on 10.214.134.36:6789/0
   -13> 2012-12-10 19:45:48.225998 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
   -12> 2012-12-10 19:45:48.226032 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
   -11> 2012-12-10 19:45:48.226308 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).reader wants 24 from dispatch throttler 0/104857600
   -10> 2012-12-10 19:45:48.226329 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).aborted = 0
    -9> 2012-12-10 19:45:48.226345 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).No session security set
    -8> 2012-12-10 19:45:48.226353 7f66c78fa700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).reader got message 1 0x2e87600 auth_reply(proto 0 -95 Operation not supported) v1
    -7> 2012-12-10 19:45:48.226381 7f66c98fe700  1 -- 10.214.133.8:6800/55867 <== mon.0 10.214.134.36:6789/0 1 ==== auth_reply(proto 0 -95 Operation not supported) v1 ==== 24+0+0 (3632875112 0 0) 0x2e87600 con 0x2e95580
    -6> 2012-12-10 19:45:48.226385 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
    -5> 2012-12-10 19:45:48.226401 7f66c98fe700 10 monclient(hunting): no handler for protocol 0
    -4> 2012-12-10 19:45:48.226403 7f66c98fe700 10 monclient(hunting): none of our auth protocols are supported by the server
    -3> 2012-12-10 19:45:48.226402 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).write_ack 1
    -2> 2012-12-10 19:45:48.226407 7f66c98fe700 10 -- 10.214.133.8:6800/55867 dispatch_throttle_release 24 to dispatch throttler 24/104857600
    -1> 2012-12-10 19:45:48.226415 7f66ce07b700 10 -- 10.214.133.8:6800/55867 >> 10.214.134.36:6789/0 pipe(0x2e806c0 sd=16 :57301 pgs=7 cs=1 l=1).writer: state = open policy.server=0
     0> 2012-12-10 19:45:48.226970 7f66ce07f780 -1 *** Caught signal (Segmentation fault) **
 in thread 7f66ce07f780

 ceph version 0.55-192-ge4d0aea (e4d0aeace187afee6313d803511ba595e91dd7ed)
 1: /usr/bin/ceph-mds() [0x816dea]
 2: (()+0xfcb0) [0x7f66cdc62cb0]
 3: (MonClient::wait_auth_rotating(double)+0xb6) [0x73c216]
 4: (MDS::init(int)+0x1433) [0x4c9fa3]
 5: (main()+0xfed) [0x4a9b9d]
 6: (__libc_start_main()+0xed) [0x7f66cc33e76d]
 7: /usr/bin/ceph-mds() [0x4abcf9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
  10/10 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
  10/10 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
  10/10 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent    100000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.a.log
--- end dump of recent events ---

ubuntu@burnupi06:~$ sudo service ceph start
=== mds.a === 
Starting Ceph mds.a on burnupi06...
starting mds.a at :/0
ubuntu@burnupi06:~$ sudo service ceph status
=== mds.a === 
mds.a: not running.

Actions #8

Updated by Yehuda Sadeh over 11 years ago

The mon is running argonaut, and the mds is running 0.55+. The problem is that the mds does not gracefully handle the case where the mon rejects the connection due to an auth protocol mismatch (the log above shows "auth_reply(proto 0 -95 Operation not supported)" and "none of our auth protocols are supported by the server"). This happens because of changes in the default auth protocol selection.
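A minimal, self-contained C++ sketch of that failure mode (hypothetical stand-in types, not the actual Ceph source): authenticate() fails with -ENOTSUP because no offered protocol is accepted, the error is ignored, and wait_auth_rotating() then dereferences the auth handler that was never created, matching the backtrace above.

// Hypothetical stand-ins illustrating the crash path; not Ceph source.
struct AuthClientHandler {
    int get_protocol() const { return 2; }    // e.g. cephx
};

struct MonClient {
    AuthClientHandler *auth = nullptr;        // only set on successful auth

    int authenticate() {
        // The mon replied "auth_reply(proto 0 -95 Operation not supported)":
        // no mutually supported protocol, so no handler is ever created.
        return -95;                           // -ENOTSUP
    }

    int wait_auth_rotating(double /*timeout*/) {
        // Pre-fix behavior: unconditional dereference of 'auth', which is
        // still null after the failed handshake -> SIGSEGV.
        return auth->get_protocol() == 0 ? 0 : 1;
    }
};

int main() {
    MonClient monc;
    monc.authenticate();                      // error code silently dropped
    monc.wait_auth_rotating(30.0);            // crashes here, as in the trace
}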

Actions #9

Updated by Yehuda Sadeh over 11 years ago

  • Status changed from New to Resolved

Fixed in 29307d3b32e523da7f25a6c3bc904749790b1e18. The fix just makes the mds exit gracefully instead of crashing.
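A hedged sketch of what the graceful exit looks like (illustrative only, using a hypothetical stub rather than reproducing the actual commit): check the authentication result during init and shut down cleanly instead of proceeding to wait_auth_rotating().

#include <cstdio>
#include <cstdlib>

// Hypothetical stub: the mon still rejects every offered auth protocol.
static int authenticate_stub() { return -95; }   // -ENOTSUP

int main() {
    int r = authenticate_stub();
    if (r < 0) {
        // Post-fix behavior: report the failure and exit cleanly rather
        // than dereferencing a null auth handler.
        std::fprintf(stderr, "mds: monitor authentication failed (%d)\n", r);
        return EXIT_FAILURE;
    }
    // ... only proceed to wait_auth_rotating() once auth has succeeded ...
    return EXIT_SUCCESS;
}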
