Bug #540
closedCephxClientHandler::handle_response
0%
Description
Saw this crash today after upgrading to the latest unstable:
Core was generated by `/usr/bin/cosd -i 5 -c /etc/ceph/ceph.conf'. Program terminated with signal 6, Aborted. #0 0x00007f9bd0120a75 in raise () from /lib/libc.so.6 (gdb) bt #0 0x00007f9bd0120a75 in raise () from /lib/libc.so.6 #1 0x00007f9bd0124545 in abort () from /lib/libc.so.6 #2 0x00007f9bd09d68e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6 #3 0x00007f9bd09d4d16 in ?? () from /usr/lib/libstdc++.so.6 #4 0x00007f9bd09d4d43 in std::terminate() () from /usr/lib/libstdc++.so.6 #5 0x00007f9bd09d4e3e in __cxa_throw () from /usr/lib/libstdc++.so.6 #6 0x00000000005c8a98 in ceph::__ceph_assert_fail (assertion=0x60b537 "0", file=<value optimized out>, line=162, func=<value optimized out>) at common/assert.cc:30 #7 0x00000000005e432c in CephxClientHandler::handle_response (this=<value optimized out>, ret=-792682944, indata=...) at auth/cephx/CephxClientHandler.cc:162 #8 0x00000000005ce27b in MonClient::handle_auth (this=0x7ffff01d6420, m=0x2255400) at mon/MonClient.cc:375 #9 0x00000000005d0733 in MonClient::ms_dispatch (this=0x7ffff01d6420, m=0x2255400) at mon/MonClient.cc:227 #10 0x0000000000463b39 in Messenger::ms_deliver_dispatch (this=0x11b7000) at msg/Messenger.h:97 #11 SimpleMessenger::dispatch_entry (this=0x11b7000) at msg/SimpleMessenger.cc:342 #12 0x000000000045abec in SimpleMessenger::DispatchThread::entry (this=0x11b7488) at msg/SimpleMessenger.h:570 #13 0x000000000046f1ba in Thread::_entry_func (arg=0x6ec2) at ./common/Thread.h:39 #14 0x00007f9bd121b9ca in start_thread () from /lib/libpthread.so.0 #15 0x00007f9bd01d370d in clone () from /lib/libc.so.6 #16 0x0000000000000000 in ?? ()
Last loglines:
2010-11-03 10:09:22.780031 7f9bc348f710 -- [2001:16f8:10:2::c3c3:3b6a]:6801/28353 >> [2001:16f8:10:2::c3c3:ab76]:6801/14744 pipe(0x26d7780 sd=12 pgs=167 cs=2 l=0).connect claims to be [2001:16f8:10:2::c3c3:ab76]:6801/17673 not [2001:16f8:10:2::c3c3:ab76]:6801/14744 - wrong node! 2010-11-03 10:09:23.553928 7f9bc3f9a710 -- [2001:16f8:10:2::c3c3:3b6a]:6801/28353 >> [2001:16f8:10:2::c3c3:2e56]:6801/21758 pipe(0x26a4a00 sd=12 pgs=201 cs=2 l=0).connect claims to be [2001:16f8:10:2::c3c3:2e56]:6801/3071 not [2001:16f8:10:2::c3c3:2e56]:6801/21758 - wrong node! 2010-11-03 10:09:26.016180 7f9bc94b0710 cephx client: handle_response ret = 0 2010-11-03 10:09:26.016245 7f9bc94b0710 cephx client: got initial server challenge 15029606718942791501 2010-11-03 10:09:26.016260 7f9bc94b0710 cephx client: build_request 2010-11-03 10:09:26.016270 7f9bc94b0710 cephx client: validate_tickets: want=37 need=32 have=5 2010-11-03 10:09:26.016287 7f9bc94b0710 cephx: validate_tickets want 37 have 5 need 32 2010-11-03 10:09:26.016298 7f9bc94b0710 cephx client: want=37 need=32 have=5 2010-11-03 10:09:26.016373 7f9bc94b0710 cephx client: old ticket len=96 2010-11-03 10:09:26.016386 7f9bc94b0710 cephx client: get auth session key: client_challenge 5687951232298511269 2010-11-03 10:09:26.998599 7f9bc7cad710 -- [2001:16f8:10:2::c3c3:3b6a]:6800/28353 >> [2001:16f8:10:2::c3c3:3f9b]:6789/0 pipe(0x369e000 sd=-1 pgs=0 cs=0 l=0).fault first fault 2010-11-03 10:09:35.966178 7f9bc94b0710 cephx client: handle_response ret = 0 2010-11-03 10:09:35.966245 7f9bc94b0710 cephx client: got initial server challenge 282574488338433 2010-11-03 10:09:35.966259 7f9bc94b0710 cephx client: build_request 2010-11-03 10:09:35.966268 7f9bc94b0710 cephx client: validate_tickets: want=37 need=32 have=5 2010-11-03 10:09:35.966286 7f9bc94b0710 cephx: validate_tickets want 37 have 5 need 32 2010-11-03 10:09:35.966297 7f9bc94b0710 cephx client: want=37 need=32 have=5 2010-11-03 10:09:35.966370 7f9bc94b0710 cephx client: old ticket len=96 2010-11-03 10:09:35.966382 7f9bc94b0710 cephx client: get auth session key: client_challenge 10140645574015473838 2010-11-03 10:09:35.966405 7f9bc94b0710 cephx client: handle_response ret = 0 2010-11-03 10:09:35.966416 7f9bc94b0710 cephx client: unknown request_type 27649 auth/cephx/CephxClientHandler.cc: In function 'virtual int CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)': auth/cephx/CephxClientHandler.cc:162: FAILED assert(0) ceph version 0.23~rc (commit:e304a2451a80f11117cb01031734e72077c88ce0) 1: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x5ce27b] 2: (MonClient::ms_dispatch(Message*)+0x143) [0x5d0733] 3: (SimpleMessenger::dispatch_entry()+0x749) [0x463b39] 4: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45abec] 5: (Thread::_entry_func(void*)+0xa) [0x46f1ba] 6: (()+0x69ca) [0x7f9bd121b9ca] 7: (clone()+0x6d) [0x7f9bd01d370d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. *** Caught signal (ABRT) *** ceph version 0.23~rc (commit:e304a2451a80f11117cb01031734e72077c88ce0) 1: (sigabrt_handler(int)+0x7d) [0x5db9fd] 2: (()+0x33af0) [0x7f9bd0120af0] 3: (gsignal()+0x35) [0x7f9bd0120a75] 4: (abort()+0x180) [0x7f9bd01245c0] 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f9bd09d68e5] 6: (()+0xcad16) [0x7f9bd09d4d16] 7: (()+0xcad43) [0x7f9bd09d4d43] 8: (()+0xcae3e) [0x7f9bd09d4e3e] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x448) [0x5c8a98] 10: (CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)+0x13c) [0x5e432c] 11: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x5ce27b] 12: (MonClient::ms_dispatch(Message*)+0x143) [0x5d0733] 13: (SimpleMessenger::dispatch_entry()+0x749) [0x463b39] 14: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45abec] 15: (Thread::_entry_func(void*)+0xa) [0x46f1ba] 16: (()+0x69ca) [0x7f9bd121b9ca] 17: (clone()+0x6d) [0x7f9bd01d370d]
I've used cdebugpack to gather all the relevant information, it has been uploaded to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_cephx_client_handler
Restarted the OSD, it didn't crash again afterwards.
Updated by Sage Weil over 13 years ago
- Category changed from OSD to cephx
- Target version set to v0.23.1
Wido just saw this:
2010-11-11 14:28:32.693427 7fb2adcfd710 cephx client: unknown request_type 11521 auth/cephx/CephxClientHandler.cc: In function 'virtual int CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)': auth/cephx/CephxClientHandler.cc:162: FAILED assert(0) ceph version 0.24~rc (commit:3d6e915583ac95263732982462b2e69bb2411db7) 1: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x4e629b] 2: (MonClient::ms_dispatch(Message*)+0x143) [0x4e8e03] 3: (SimpleMessenger::dispatch_entry()+0x749) [0x46c599] 4: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45779c] 5: (Thread::_entry_func(void*)+0xa) [0x477bfa] 6: (()+0x69ca) [0x7fb2b6ea79ca] 7: (clone()+0x6d) [0x7fb2b21b870d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. auth/cephx/CephxClientHandler.cc: In function 'virtual int CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)': auth/cephx/CephxClientHandler.cc:162: FAILED assert(0) ceph version 0.24~rc (commit:3d6e915583ac95263732982462b2e69bb2411db7) 1: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x4e629b] 2: (MonClient::ms_dispatch(Message*)+0x143) [0x4e8e03] 3: (SimpleMessenger::dispatch_entry()+0x749) [0x46c599] 4: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45779c] 5: (Thread::_entry_func(void*)+0xa) [0x477bfa] 6: (()+0x69ca) [0x7fb2b6ea79ca] 7: (clone()+0x6d) [0x7fb2b21b870d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. terminate called after throwing an instance of 'ceph::FailedAssertion' *** Caught signal (ABRT) *** ceph version 0.24~rc (commit:3d6e915583ac95263732982462b2e69bb2411db7) 1: (sigabrt_handler(int)+0x7d) [0x4f95ed] 2: (()+0x33af0) [0x7fb2b2105af0] 3: (gsignal()+0x35) [0x7fb2b2105a75] 4: (abort()+0x180) [0x7fb2b21095c0] 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fb2b29bb8e5] 6: (()+0xcad16) [0x7fb2b29b9d16] 7: (()+0xcad43) [0x7fb2b29b9d43] 8: (()+0xcae3e) [0x7fb2b29b9e3e] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x448) [0x4e0d28] 10: (CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)+0x13c) [0x502d3c] 11: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x4e629b] 12: (MonClient::ms_dispatch(Message*)+0x143) [0x4e8e03] 13: (SimpleMessenger::dispatch_entry()+0x749) [0x46c599] 14: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45779c] 15: (Thread::_entry_func(void*)+0xa) [0x477bfa] 16: (()+0x69ca) [0x7fb2b6ea79ca] 17: (clone()+0x6d) [0x7fb2b21b870d] Aborted
Updated by Wido den Hollander over 13 years ago
I saw that on a test machine of mine. The 'ceph -w' command was hanging for about 10 seconds and then exited with this message.
The cluster had:- 1 Monitor
- 1 MDS
- 3 OSD
All on the same machine.
During this message, I was importing a 4GB RBD image via "rbd import"
Updated by Wido den Hollander over 13 years ago
Just saw it again on the same cluster, this time osd2 crashed when upgrading to this morning's unstable:
2010-11-12 12:18:22.715982 --- 17142 opened log /var/log/ceph/osd.2.log --- ceph version 0.24~rc (commit:ce6d63947fc9dc479376184e50683bdd9dd8cf81) 2010-11-12 12:18:23.815544 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount FIEMAP ioctl is NOT supported 2010-11-12 12:18:23.815763 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount detected btrfs 2010-11-12 12:18:23.815781 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs CLONE_RANGE ioctl is supported 2010-11-12 12:18:24.397478 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs SNAP_CREATE is supported 2010-11-12 12:18:24.497010 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs SNAP_DESTROY is supported 2010-11-12 12:18:24.497488 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs START_SYNC got 0 Success 2010-11-12 12:18:24.497505 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs START_SYNC is supported (transid 6438) 2010-11-12 12:18:26.408953 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs WAIT_SYNC is supported 2010-11-12 12:18:26.441939 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs SNAP_CREATE_ASYNC got 0 Success 2010-11-12 12:18:26.442057 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount btrfs SNAP_CREATE_ASYNC is supported 2010-11-12 12:18:26.497175 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount found snaps <7384,56732,56737> 2010-11-12 12:18:28.320230 7f68288d8720 journal read_entry 77537280 : seq 56733 5222 bytes 2010-11-12 12:18:28.388666 7f68288d8720 journal read_entry 77549568 : seq 56734 5222 bytes 2010-11-12 12:18:28.389460 7f68288d8720 journal read_entry 77561856 : seq 56735 21606 bytes 2010-11-12 12:18:28.389626 7f68288d8720 journal read_entry 77590528 : seq 56736 5222 bytes 2010-11-12 12:18:28.389645 7f68288d8720 journal read_entry 77602816 : seq 56737 5222 bytes 2010-11-12 12:18:28.624392 7f68288d8720 filestore(/var/lib/ceph/osd.2) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_ASYNC detected and 'filestore btrfs snap' mode is enabled 2010-11-12 12:18:57.550419 7f681fa06710 osd2 62 handle_osd_map fsid 12d34577-3824-fa22-1cd5-887be3a7dbd4 != 00000000-0000-0000-0000-000000000000 2010-11-12 12:18:57.550890 7f681fa06710 cephx client: unknown request_type 33537 auth/cephx/CephxClientHandler.cc: In function 'virtual int CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)': auth/cephx/CephxClientHandler.cc:162: FAILED assert(0) ceph version 0.24~rc (commit:ce6d63947fc9dc479376184e50683bdd9dd8cf81) 1: (CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)+0x13c) [0x5e604c] 2: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x5d064b] 3: (MonClient::ms_dispatch(Message*)+0x143) [0x5d31b3] 4: (SimpleMessenger::dispatch_entry()+0x749) [0x462a19] 5: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45a68c] 6: (Thread::_entry_func(void*)+0xa) [0x46e09a] 7: (()+0x69ca) [0x7f68282c59ca] 8: (clone()+0x6d) [0x7f682702b70d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. *** Caught signal (ABRT) *** ceph version 0.24~rc (commit:ce6d63947fc9dc479376184e50683bdd9dd8cf81) 1: (sigabrt_handler(int)+0x7d) [0x5dd71d] 2: (()+0x33af0) [0x7f6826f78af0] 3: (gsignal()+0x35) [0x7f6826f78a75] 4: (abort()+0x180) [0x7f6826f7c5c0] 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f682782e8e5] 6: (()+0xcad16) [0x7f682782cd16] 7: (()+0xcad43) [0x7f682782cd43] 8: (()+0xcae3e) [0x7f682782ce3e] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x448) [0x5ca5d8] 10: (CephxClientHandler::handle_response(int, ceph::buffer::list::iterator&)+0x13c) [0x5e604c] 11: (MonClient::handle_auth(MAuthReply*)+0x7b) [0x5d064b] 12: (MonClient::ms_dispatch(Message*)+0x143) [0x5d31b3] 13: (SimpleMessenger::dispatch_entry()+0x749) [0x462a19] 14: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x45a68c] 15: (Thread::_entry_func(void*)+0xa) [0x46e09a] 16: (()+0x69ca) [0x7f68282c59ca] 17: (clone()+0x6d) [0x7f682702b70d]
Updated by Sage Weil over 13 years ago
- Status changed from New to Resolved
couldn't reproduce this, but fixed two smallish things that may have been responsible for this:
61dd4f03e6e15d59951ff69d95807a3026f5b9c4 and c327c6a2064f3a14f0221b53dc0c87d744d0e977
Please let us know if it pops up again!