Bug #64373: osd: Segmentation fault on OSD shutdown - RADOS - Ceph

Bug #64373

open

osd: Segmentation fault on OSD shutdown

Added by Igor Fedotov 3 months ago. Updated 5 days ago.

Status:

Fix Under Review

Priority:

Normal

Assignee:

Category:

Target version:

% Done:

Source:

Tags:

low-hanging-fruit

Backport:

reef, quincy, pacific

Regression:

Severity:

3 - minor

Reviewed:

Affected Versions:

ceph-qa-suite:

Component(RADOS):

Pull request ID:

56804

Crash signature (v1):

Crash signature (v2):

Description

It looks like a watch timer can trigger a PG transaction against an already closed ObjectStore.
This can happen when fast shutdown is enabled, BlueStore uses non-rotational drives and is being shut down, and some Watch has not been deregistered.
If the Watch timeout fires during the shutdown process, the PG attempts to issue a transaction against the store and segfaults.

2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Got signal Terminated ***
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Immediate shutdown (osd_fast_shutdown=true) ***
2024-02-08T22:14:53.892-0600 7f045fcb3700  0 osd.200 2472348 Fast Shutdown: - cct->_conf->osd_fast_shutdown = 1, null-fm = 1
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Immediate shutdown (osd_fast_shutdown=true) ***
2024-02-08T22:14:53.892-0600 7f045fcb3700  0 osd.200 2472348 prepare_to_stop telling mon we are shutting down and dead
Stopping Ceph object storage daemon osd.200...
2024-02-08T22:14:54.132-0600 7f0453bb0700  0 osd.200 2472348 got_stop_ack starting shutdown
2024-02-08T22:14:54.132-0600 7f045fcb3700  0 osd.200 2472348 prepare_to_stop starting shutdown
2024-02-08T22:15:01.956-0600 7f045fcb3700  1 bdev(0x55dc906de400 /var/lib/ceph/osd/ceph-200/block) close
*** Caught signal (Segmentation fault) **
in thread 7f0447e1f700 thread_name:safe_timer
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: /lib64/libpthread.so.0(+0x168c0) [0x7f0465a0c8c0]
 2: (BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x406) [0x55dc8dc10816]
 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x21f) [0x55dc8dc9065f]
 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x55dc8d87255f]
 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7c4) [0x55dc8daa60d4]
 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x50d) [0x55dc8d7f043d]
 7: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x56) [0x55dc8d7f2206]
 8: (PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0x988) [0x55dc8d7f48b8]
 9: (HandleWatchTimeout::complete(int)+0x11a) [0x55dc8d75f3ca]
 10: (CommonSafeTimer<std::mutex>::timer_thread()+0x128) [0x55dc8dddc798]
 11: (CommonSafeTimerThread<std::mutex>::entry()+0xd) [0x55dc8dddd7dd]
 12: /lib64/libpthread.so.0(+0xa6ea) [0x7f0465a006ea]
 13: clone()
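
The backtrace shows the safe_timer thread reaching BlueStore::queue_transactions() after the block device was closed. As a minimal sketch of one way such a race could be guarded, and not necessarily what the pull request under review does, the toy model below sets a stopping flag before the store is closed, under the same lock the timeout callback takes. ToyOSD and all of its members are hypothetical stand-ins for illustration, not Ceph classes or APIs.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical toy model of an OSD with a watch timer; not Ceph code.
struct ToyOSD {
  std::mutex lock;            // serializes the timer callback and shutdown
  bool stopping = false;      // set once shutdown has begun
  bool store_mounted = true;  // models the ObjectStore being open
  int transactions = 0;       // number of transactions issued to the store

  // Models the watch timeout callback run by the timer thread.
  // Returns true if a transaction was issued, false if it was skipped.
  bool handle_watch_timeout() {
    std::lock_guard<std::mutex> l(lock);
    if (stopping) {
      return false;           // guard: never touch the store during shutdown
    }
    assert(store_mounted);    // holds because stopping is set before umount
    ++transactions;
    return true;
  }

  // Models fast shutdown: flag first, then close the store, under one lock.
  void fast_shutdown() {
    std::lock_guard<std::mutex> l(lock);
    stopping = true;
    store_mounted = false;
  }
};
```

In this model a timeout that fires before fast_shutdown() issues its transaction as usual, while one that fires afterwards is dropped instead of reaching the closed store. The real OSD uses different locks and timers, so this only illustrates the ordering.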

Updated by Igor Fedotov 3 months ago

Backport set to reef, quincy, pacific

Updated by Igor Fedotov 3 months ago

Originally the issue was found in a production cluster running v17.2.5.

The issue is also reproducible on a vstart cluster using the following steps:

- Put osds on SSD drives
- set osd fast shutdown = true
- insert sleep(60) after the store->umount() call in OSD::shutdown() using the following patch:

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 9e6b3fd9d92..4f1f238d979 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -4593,6 +4593,9 @@ int OSD::shutdown()
     std::lock_guard lock(osd_lock);
     // TBD: assert in allocator that nothing is being add
     store->umount();
+dout(0) << " sleeping " << dendl;
+sleep(60);
+dout(0) << " awake " << dendl;

     utime_t end_time = ceph_clock_now();
     dout(0) <<"Fast Shutdown duration total     :" << end_time              - start_time_func       << " seconds" << dendl;

- start watching an object using the "rados watch" command and terminate it with Ctrl-C
- immediately afterwards, stop (e.g. via kill <pid>) the OSD that hosts the primary PG for the watched object
- if the Watch hasn't been deregistered before the kill, the OSD crashes. The tricky part may be getting the timeout to fire after the OSD terminate request...
