Bug #48821
osd crash in OSD::heartbeat when dereferencing null session
Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
pacific,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
On an unhealthy (unstable) cluster with flapping OSDs, we observed crashes like this:
ceph version 15.2.5-667-g1a579d5bf2 (1a579d5bf275b4ab4e62bd1094ba0e11bc672d01) octopus (stable)
 1: (()+0x132d0) [0x7fd0a6c282d0]
 2: (OSD::heartbeat()+0x514) [0x56448b7c44f4]
 3: (OSD::heartbeat_entry()+0x83) [0x56448b7c51d3]
 4: (OSD::T_Heartbeat::entry()+0xd) [0x56448b83fcad]
 5: (()+0x84f9) [0x7fd0a6c1d4f9]
 6: (clone()+0x3f) [0x7fd0a59c9fbf]
Some details from the debugger:
#bt
#0  0x00007f571a1c5170 in raise () from ./lib64/libpthread.so.0
#1  0x00005641e03cf450 in reraise_fatal (signum=11) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=11) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  boost::intrusive_ptr<HeartbeatStamps>::operator-> (this=0x1c0) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:200
#5  OSD::heartbeat (this=this@entry=0x5641eb90c000) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.cc:5695
#6  0x00005641dfdd61d3 in OSD::heartbeat_entry (this=0x5641eb90c000) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.cc:5568
#7  0x00005641dfe50cad in OSD::T_Heartbeat::entry (this=<optimized out>) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.h:1483
#8  0x00007f571a1ba4f9 in start_thread () from ./lib64/libpthread.so.0
#9  0x00007f5718f66fbf in clone () from ./lib64/libc.so.6
#fr 5
#5  OSD::heartbeat (this=this@entry=0x5641eb90c000) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/src/osd/OSD.cc:5695
5695        s->stamps->sent_ping(&delta_ub);
#l
5690        if (i->second.hb_interval_start == utime_t())
5691          i->second.hb_interval_start = now;
5692
5693        Session *s = static_cast<Session*>(i->second.con_back->get_priv().get());
5694        std::optional<ceph::signedspan> delta_ub;
5695        s->stamps->sent_ping(&delta_ub);
5696
5697        i->second.con_back->send_message(
5698          new MOSDPing(monc->get_fsid(),
5699                       service.get_osdmap_epoch(),
#fr 4
#4  boost::intrusive_ptr<HeartbeatStamps>::operator-> (this=0x1c0) at /usr/src/debug/ceph-15.2.5.667+g1a579d5bf2-3.3.1.x86_64/build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:200
200         return px;
#p this
$8 = (const boost::intrusive_ptr<HeartbeatStamps> * const) 0x1c0
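So it crashes while dereferencing a null session pointer (probably cleared by ms_handle_reset?). The small address this=0x1c0 in frame #4 is consistent with the address of the stamps member computed from a null Session*.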
Updated by Mykola Golub over 3 years ago
The fix seems to be simply checking that the session pointer is not null before using it, unless the problem is deeper...
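A minimal C++ sketch of that null check follows. It is not the code from the pull request. To keep it self-contained, HeartbeatStamps, Session, Connection and send_pings are simplified, hypothetical stand-ins for the Ceph types, with std::shared_ptr replacing Ceph's intrusive reference counting and double replacing ceph::signedspan. The sketch also holds the reference returned by get_priv() for the whole loop body instead of calling .get() on a temporary. This assumes get_priv() returns a reference-counted pointer by value.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <memory>
#include <optional>

// Hypothetical, simplified stand-ins for the Ceph types in the trace.
struct HeartbeatStamps {
  int pings_sent = 0;
  // double stands in for ceph::signedspan in this sketch.
  void sent_ping(std::optional<double>* delta_ub) {
    ++pings_sent;
    *delta_ub = std::nullopt;
  }
};

struct Session {
  std::shared_ptr<HeartbeatStamps> stamps = std::make_shared<HeartbeatStamps>();
};

struct Connection {
  std::shared_ptr<Session> priv;  // null once the connection has been reset
  std::shared_ptr<Session> get_priv() const { return priv; }
};

// Pings every peer whose connection still has a session and skips the rest.
// Returns the number of pings sent.
int send_pings(const std::map<int, std::shared_ptr<Connection>>& peers) {
  int sent = 0;
  for (const auto& [osd, con] : peers) {
    // Keep our own reference so the session stays alive while we use it.
    std::shared_ptr<Session> s = con->get_priv();
    if (!s) {
      // Session already torn down (e.g. after a reset): skip this peer.
      std::cerr << "osd." << osd << ": no session, skipping ping\n";
      continue;
    }
    std::optional<double> delta_ub;
    s->stamps->sent_ping(&delta_ub);
    ++sent;
  }
  return sent;
}
```

With one connected peer and one peer whose session was cleared, send_pings pings only the first and logs the second. Skipping the peer rather than asserting seems appropriate here, since with flapping OSDs a connection reset between heartbeat ticks is an expected event, not a logic error.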
Updated by Neha Ojha over 3 years ago
Sounds right; would you like to create a quick PR for this?
Updated by Mykola Golub over 3 years ago
- Status changed from New to In Progress
- Backport set to octopus
Updated by Mykola Golub over 3 years ago
- Backport changed from octopus to pacific,octopus
Updated by Mykola Golub over 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 38931
Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 3 years ago
- Copied to Backport #49008: pacific: osd crash in OSD::heartbeat when dereferencing null session added
Updated by Backport Bot over 3 years ago
- Copied to Backport #49009: octopus: osd crash in OSD::heartbeat when dereferencing null session added
Updated by Loïc Dachary about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".