Bug #1633

osd crash in CryptoKey::decrypt

Added by Josh Durgin almost 10 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From teuthology:~/log/osd.7.log.gz:

2011-10-19 01:26:42.326634 7f979db56720 journal _open /tmp/cephtest/data/osd.7.journal fd 14: 104857600 bytes, block size 4096 bytes, directio = 1
2011-10-19 01:26:45.093995 7f978aa8c700 -- 10.3.14.199:6808/9536 >> 10.3.14.198:6801/4306 pipe(0x2d18c80 sd=25 pgs=0 cs=0 l=0).fault first fault
2011-10-19 01:26:45.095727 7f978a284700 -- 10.3.14.199:6808/9536 >> 10.3.14.202:6803/2894 pipe(0x2d05280 sd=31 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state 1
2011-10-19 01:26:45.624194 7f978a284700 data -> (7.0)
2011-10-19 01:26:45.624222 7f978a284700 rbd -> (7.0)
2011-10-19 01:26:48.218924 7f978ab8d700 -- 10.3.14.199:6808/9536 >> 10.3.14.201:6802/9417 pipe(0x2d18780 sd=24 pgs=0 cs=0 l=0).fault first fault
2011-10-19 01:26:48.218991 7f978a688700 -- 10.3.14.199:0/9536 >> 10.3.14.201:6805/9417 pipe(0x2fd4000 sd=33 pgs=0 cs=0 l=0).fault first fault
2011-10-19 01:26:49.619996 7f978ab8d700 -- 10.3.14.199:6808/9536 >> 10.3.14.201:6802/9417 pipe(0x2d18780 sd=24 pgs=0 cs=0 l=0).connect claims to be 10.3.14.201:6802/9511 not 10.3.14.201:6802/9417 - wrong node!
*** Caught signal (Aborted) **
 in thread 0x7f978a688700
 ceph version 0.36-327-g3e92aac (commit:3e92aace21ecc766f14ac5a2c6377570988f1a3b)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x67c7a4]
 2: (()+0xfb40) [0x7f979d736b40]
 3: (gsignal()+0x35) [0x7f979bf0bba5]
 4: (abort()+0x180) [0x7f979bf0f6b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f979c7af6bd]
 6: (()+0xb9906) [0x7f979c7ad906]
 7: (()+0xb9933) [0x7f979c7ad933]
 8: (()+0xba28f) [0x7f979c7ae28f]
 9: (CryptoKey::decrypt(ceph::buffer::list const&, ceph::buffer::list&, std::string&) const+0x5b) [0x67625b]
 10: (void decode_decrypt_enc_bl<CephXServiceTicketInfo>(CephXServiceTicketInfo&, CryptoKey, ceph::buffer::list&, std::string&)+0x4e) [0x77f54e]
 11: (cephx_verify_authorizer(CephContext*, KeyStore*, ceph::buffer::list::iterator&, CephXServiceTicketInfo&, ceph::buffer::list&)+0x3a3) [0x77a773]
 12: (CephxAuthorizeHandler::verify_authorizer(CephContext*, KeyStore*, ceph::buffer::list&, ceph::buffer::list&, EntityName&, unsigned long&, AuthCapsInfo&, unsigned long*)+0x3bf) [0x7836af]
 13: (OSD::ms_verify_authorizer(Connection*, int, int, ceph::buffer::list&, ceph::buffer::list&, bool&)+0xdf) [0x5508df]
 14: (SimpleMessenger::verify_authorizer(Connection*, int, int, ceph::buffer::list&, ceph::buffer::list&, bool&)+0x71) [0x617e01]
 15: (SimpleMessenger::Pipe::accept()+0x1f2b) [0x636abb]
 16: (SimpleMessenger::Pipe::reader()+0x17c1) [0x63a821]
 17: (SimpleMessenger::Pipe::Reader::entry()+0x15) [0x4a3f55]
 18: (Thread::_entry_func(void*)+0x12) [0x615372]
 19: (()+0x7971) [0x7f979d72e971]
 20: (clone()+0x6d) [0x7f979bfbe92d]
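
Frames 6 to 8 of this trace are unsymbolized libstdc++ addresses. Their offsets (0xb9906, 0xb9933, 0xba28f) match frames #7 to #9 of the backtrace in the associated revision below, where gdb resolves them as ??, std::terminate() and __cxa_pure_virtual(). That pattern means a virtual call landed in a pure virtual slot. This is consistent with calling through a crypto handler that had already been destroyed: while a base-class destructor runs, the object's vtable is the base class's (and in practice the stale vtable pointer left behind afterwards is too), and under GCC's ABI a pure virtual slot points at __cxa_pure_virtual, which ends in std::terminate() and abort(). The sketch below shows the underlying C++ rule in a well-defined form. Base, Derived and dynamic_type_during_destruction() are hypothetical names, and Base::name() is deliberately not pure, since calling a pure virtual function during destruction is undefined behavior.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an abstract handler base class.
// name() is intentionally NOT pure: calling a pure virtual function during
// destruction is undefined behavior (with GCC it lands in __cxa_pure_virtual).
struct Base {
  explicit Base(std::string *seen) : seen_during_dtor(seen) {}
  virtual ~Base() {
    // By the time ~Base runs, ~Derived has finished and the object's dynamic
    // type is Base, so this virtual call dispatches to Base::name().
    *seen_during_dtor = name();
  }
  virtual std::string name() const { return "Base"; }

  std::string *seen_during_dtor;
};

struct Derived : Base {
  explicit Derived(std::string *seen) : Base(seen) {}
  std::string name() const override { return "Derived"; }
};

// Returns {name() seen through a Base& while alive, name() seen inside ~Base}.
std::pair<std::string, std::string> dynamic_type_during_destruction() {
  std::string seen_in_dtor;
  std::string seen_alive;
  {
    Derived d(&seen_in_dtor);
    const Base &b = d;
    seen_alive = b.name();  // normal virtual dispatch to Derived::name()
  }  // d is destroyed here: ~Derived runs first, then ~Base
  return {seen_alive, seen_in_dtor};
}
```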

Related issues

Duplicated by Ceph - Bug #1684: mon: crash in CryptoKey::encrypt Duplicate 11/04/2011

Associated revisions

Revision 383dfa33 (diff)
Added by Sage Weil over 9 years ago

crypto: make crypto handlers non-static

These were static in auth/Crypto.cc, which was mostly fine, except when
we got a signal shutting everything down for the gcov stuff, like so:

Thread 21 (Thread 2164):
#0 0x00007f31a800b3cd in open64 () from /lib/libpthread.so.0
#1 0x000000000081dee0 in __gcov_open ()
#2 0x000000000081e3fd in gcov_exit ()
#3 0x00007f31a67e64f2 in exit () from /lib/libc.so.6
#4 0x000000000054e1ca in handle_signal (signal=<value optimized out>) at osd/OSD.cc:600
#5 <signal handler called>
#6 0x00007f31a8007a9a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#7 0x0000000000636d7b in Wait (this=0x2241000) at ./common/Cond.h:48
#8 SimpleMessenger::wait (this=0x2241000) at msg/SimpleMessenger.cc:2637
#9 0x00000000004a4e35 in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:343

and a racing thread would, say, accept a connection and then crash, like
so:

#0 0x00007f31a800ba0b in raise () from /lib/libpthread.so.0
#1 0x0000000000696eeb in reraise_fatal (signum=2164) at global/signal_handler.cc:59
#2 0x00000000006976cc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
#3 <signal handler called>
#4 0x00007f31a67e0ba5 in raise () from /lib/libc.so.6
#5 0x00007f31a67e46b0 in abort () from /lib/libc.so.6
#6 0x00007f31a70846bd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7 0x00007f31a7082906 in ?? () from /usr/lib/libstdc++.so.6
#8 0x00007f31a7082933 in std::terminate() () from /usr/lib/libstdc++.so.6
#9 0x00007f31a708328f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#10 0x0000000000690e5b in CryptoKey::decrypt (this=0x7f3195a67510, in=..., out=..., error=...) at auth/Crypto.cc:404
#11 0x000000000079ccee in void decode_decrypt_enc_bl<CephXServiceTicketInfo>(CephXServiceTicketInfo&, CryptoKey, ceph::buffer::list&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#12 0x0000000000795ca3 in cephx_verify_authorizer (cct=0x2232000, keys=<value optimized out>, indata=...,
    ticket_info=<value optimized out>, reply_bl=<value optimized out>) at auth/cephx/CephxProtocol.cc:438
#13 0x00000000007a17cf in CephxAuthorizeHandler::verify_authorizer (this=<value optimized out>, cct=0x2232000, keys=0x2256000,
    authorizer_data=<value optimized out>, authorizer_reply=..., entity_name=..., global_id=@0x7f3195a67848, caps_info=...,
    auid=0x7f3195a67840) at auth/cephx/CephxAuthorizeHandler.cc:21
#14 0x00000000005577ff in OSD::ms_verify_authorizer (this=0x2267000, con=0x230da00, peer_type=<value optimized out>,
    protocol=<value optimized out>, authorizer_data=<value optimized out>, authorizer_reply=<value optimized out>,
    isvalid=@0x7f3195a67c0f) at osd/OSD.cc:2723
#15 0x0000000000611ce1 in ms_deliver_verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2,
    authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f) at msg/Messenger.h:145
#16 SimpleMessenger::verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2,
    authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f)
    at msg/SimpleMessenger.cc:2419
#17 0x00000000006309ab in SimpleMessenger::Pipe::accept (this=0x22ce280) at msg/SimpleMessenger.cc:756
#18 0x0000000000634711 in SimpleMessenger::Pipe::reader (this=0x22ce280) at msg/SimpleMessenger.cc:1546
#19 0x00000000004a7085 in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:208
#20 0x000000000060f252 in Thread::_entry_func (arg=0x874) at common/Thread.cc:42
#21 0x00007f31a8003971 in start_thread () from /lib/libpthread.so.0
#22 0x00007f31a689392d in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()

Instead, put these on the heap. Set them up in the ceph::crypto::init()
method, and tear them down in ceph::crypto::shutdown().

Fixes: #1633
Signed-off-by: Sage Weil <>
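
The first backtrace in the commit message shows handle_signal() (osd/OSD.cc:600) calling exit(), which is what runs gcov_exit() to write coverage data. exit() also runs the destructors of objects with static storage duration, which is how static handler objects in auth/Crypto.cc could be destroyed while a messenger thread was still using them. The POSIX-only sketch below isolates that rule: a forked child calls std::exit(), and the destructor of a static object reports through a pipe that it ran. StaticHandler, report_fd and exit_runs_static_destructors() are illustrative names, not Ceph code.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write end of a pipe, set only in the child process; -1 means "do not report".
static int report_fd = -1;

// Hypothetical stand-in for a handler object with static storage duration.
struct StaticHandler {
  ~StaticHandler() {
    // std::exit() runs this destructor; report that it happened.
    if (report_fd >= 0) {
      const char msg = 'D';
      ssize_t ignored = write(report_fd, &msg, 1);
      (void)ignored;
    }
  }
};

static StaticHandler static_handler;

// Forks a child that calls std::exit(0), as handle_signal() did, and returns
// true if the child's static_handler destructor ran during that exit().
bool exit_runs_static_destructors() {
  int fds[2];
  if (pipe(fds) != 0) return false;
  std::fflush(nullptr);  // avoid duplicating buffered stdio output in the child
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return false;
  }
  if (pid == 0) {
    close(fds[0]);
    report_fd = fds[1];
    std::exit(0);  // runs atexit handlers and static destructors
  }
  close(fds[1]);
  char buf = 0;
  ssize_t n = read(fds[0], &buf, 1);  // 0 if the child exited without writing
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return n == 1 && buf == 'D';
}
```

By contrast, _exit() skips static destructors and atexit handlers; that would avoid this teardown but also skip the coverage flush, which is presumably why the signal handler used exit().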

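The fix described in the commit message replaces the static handler objects with heap allocations that are created in ceph::crypto::init() and deleted in ceph::crypto::shutdown(). A minimal sketch of that pattern follows. It assumes a handler interface with a single virtual name() method for illustration; the crypto_sketch namespace, CryptoHandler, CryptoNone, CryptoAES, the CryptoType codes, init(), shutdown() and get_crypto_handler() are hypothetical stand-ins, not the actual contents of auth/Crypto.cc.

```cpp
#include <string>

namespace crypto_sketch {

// Hypothetical handler interface standing in for the real crypto handler base.
struct CryptoHandler {
  virtual ~CryptoHandler() = default;
  virtual std::string name() const = 0;
};

struct CryptoNone : CryptoHandler {
  std::string name() const override { return "none"; }
};

struct CryptoAES : CryptoHandler {
  std::string name() const override { return "aes"; }
};

// Hypothetical type codes.
enum CryptoType { CRYPTO_NONE = 0, CRYPTO_AES = 1 };

// Raw pointers have trivial destructors, so exit() does not touch the objects.
CryptoNone *crypto_none = nullptr;
CryptoAES *crypto_aes = nullptr;

// Called once at startup, before any thread can look up a handler.
void init() {
  crypto_none = new CryptoNone;
  crypto_aes = new CryptoAES;
}

// Called only after all users (e.g. messenger threads) have stopped.
void shutdown() {
  delete crypto_aes;
  delete crypto_none;
  crypto_aes = nullptr;
  crypto_none = nullptr;
}

// Returns the handler for a type code, or nullptr if unknown or not initialized.
CryptoHandler *get_crypto_handler(int type) {
  switch (type) {
    case CRYPTO_NONE:
      return crypto_none;
    case CRYPTO_AES:
      return crypto_aes;
    default:
      return nullptr;
  }
}

}  // namespace crypto_sketch
```

The handlers are held through plain namespace-scope pointers on purpose: a raw pointer has a trivial destructor, so an exit() from a signal handler leaves the heap objects alive for any thread still inside decrypt. A static std::unique_ptr would reintroduce the bug, because its destructor runs at exit. shutdown() itself is only safe once no other thread can still call get_crypto_handler().
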
History

#1 Updated by Josh Durgin almost 10 years ago

  • Target version set to v0.38

Happened again while thrashing in teuthology:~/log/osd.3.log.gz

#2 Updated by Sage Weil almost 10 years ago

  • Target version changed from v0.38 to v0.39

#3 Updated by Sage Weil almost 10 years ago

  • Position set to 928

#5 Updated by Sage Weil over 9 years ago

Have a core but no matching binary :(. Need to reproduce again and save the build tarball.

#6 Updated by Josh Durgin over 9 years ago

Happened again today. I put the core and tarball on the gcov gitbuilder in ~ubuntu/bug_1633.

#7 Updated by Sage Weil over 9 years ago

  • Status changed from New to Resolved
  • Assignee set to Sage Weil
