Bug #48821
osd crash in OSD::heartbeat when dereferencing null session
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On an unhealthy (unstable) cluster with flip-flopping OSDs, we observed crashes like this:
ceph version 15.2.5-667-g1a579d5bf2 (1a579d5bf275b4ab4e62bd1094ba0e11bc672d01) octopus (stable)
 1: (()+0x132d0) [0x7fd0a6c282d0]
 2: (OSD::heartbeat()+0x514) [0x56448b7c44f4]
 3: (OSD::heartbeat_entry()+0x83) [0x56448b7c51d3]
 4: (OSD::T_Heartbeat::entry()+0xd) [0x56448b83fcad]
 5: (()+0x84f9) [0x7fd0a6c1d4f9]
 6: (clone()+0x3f) [0x7fd0a59c9fbf]
Some details from the debugger:
#bt
#0  0x00007f571a1c5170 in raise () from ./lib64/libpthread.so.0
#1  0x00005641e03cf450 in reraise_fatal (signum=11) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  boost::intrusive_ptr<HeartbeatStamps>::operator-> (this=0x1c0) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:200
#5  OSD::heartbeat (this=this@entry=0x5641eb90c000) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.cc:5695
#6  0x00005641dfdd61d3 in OSD::heartbeat_entry (this=0x5641eb90c000) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.cc:5568
#7  0x00005641dfe50cad in OSD::T_Heartbeat::entry (this=<optimized out>) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.h:1483
#8  0x00007f571a1ba4f9 in start_thread () from ./lib64/libpthread.so.0
#9  0x00007f5718f66fbf in clone () from ./lib64/libc.so.6

#fr 5
#5  OSD::heartbeat (this=this@entry=0x5641eb90c000) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.cc:5695
5695        s->stamps->sent_ping(&delta_ub);

#l
5690        if (i->second.hb_interval_start == utime_t())
5691          i->second.hb_interval_start = now;
5692
5693        Session *s = static_cast<Session*>(i->second.con_back->get_priv().get());
5694        std::optional<ceph::signedspan> delta_ub;
5695        s->stamps->sent_ping(&delta_ub);
5696
5697        i->second.con_back->send_message(
5698          new MOSDPing(monc->get_fsid(),
5699                       service.get_osdmap_epoch(),

#fr 4
#4  boost::intrusive_ptr<HeartbeatStamps>::operator-> (this=0x1c0) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:200
200         return px;

#p this
$8 = (const boost::intrusive_ptr<HeartbeatStamps> * const) 0x1c0
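So it crashes trying to dereference a session pointer which is null (probably reset by ms_handle_reset?). The this=0x1c0 value in frame 4 is consistent with that: it looks like the offset of the stamps member applied to a null Session pointer s, rather than a valid address.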
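
To illustrate the failure mode, here is a minimal, self-contained sketch of a heartbeat loop that skips a peer whose session is missing instead of dereferencing it. It is not the actual Ceph fix. Session, HeartbeatStamps, Connection and HeartbeatInfo below are simplified stand-ins for the types named in the trace, std::shared_ptr replaces Ceph's reference-counted pointers, and ping_peers, PingResult and run_example are hypothetical names introduced only for this example. The sketch also keeps the owning reference returned by get_priv() for the duration of the iteration, whereas the quoted line 5693 takes a raw pointer from a temporary; whether that matters in the real code depends on how the session lifetime is managed there.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Simplified stand-ins for the Ceph types seen in the trace (assumptions,
// not the real class definitions).
struct HeartbeatStamps {
  int pings_sent = 0;
  void sent_ping() { ++pings_sent; }
};

struct Session {
  std::shared_ptr<HeartbeatStamps> stamps = std::make_shared<HeartbeatStamps>();
};

struct Connection {
  // Left null here to model a connection whose private session was cleared.
  std::shared_ptr<Session> priv;
  std::shared_ptr<Session> get_priv() const { return priv; }
};

struct HeartbeatInfo {
  std::shared_ptr<Connection> con_back;
};

// Hypothetical result type: how many peers were pinged vs. skipped.
struct PingResult {
  std::size_t sent = 0;
  std::size_t skipped = 0;
};

// Walk the heartbeat peers and skip any peer without a usable session.
PingResult ping_peers(std::map<int, HeartbeatInfo>& peers) {
  PingResult r;
  for (auto& entry : peers) {
    HeartbeatInfo& info = entry.second;

    // Hold an owning reference so the session cannot go away mid-iteration.
    std::shared_ptr<Session> s;
    if (info.con_back) {
      s = info.con_back->get_priv();
    }

    // Guard: the session (or its stamps) may be null after a reset.
    if (!s || !s->stamps) {
      ++r.skipped;
      continue;
    }

    s->stamps->sent_ping();
    ++r.sent;
  }
  return r;
}

// Builds one healthy peer and one peer with a cleared session.
PingResult run_example() {
  std::map<int, HeartbeatInfo> peers;

  auto healthy = std::make_shared<Connection>();
  healthy->priv = std::make_shared<Session>();
  auto reset = std::make_shared<Connection>();  // priv stays null

  peers[1] = HeartbeatInfo{healthy};
  peers[2] = HeartbeatInfo{reset};

  PingResult r = ping_peers(peers);
  assert(r.sent + r.skipped == peers.size());
  return r;
}
```

Calling run_example() pings the healthy peer and skips the one with the cleared session; without the guard, the second peer would hit the same kind of null dereference shown in frame 4.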