Bug #64373
osd: Segmentation fault on OSD shutdown
Description
It looks like the watch timer could trigger a PG transaction on an ObjectStore that has already been closed.
This could happen when fast shutdown is enabled, BlueStore uses non-rotational drives and is being shut down, and some Watch has not been deregistered.
If a Watch timeout occurs during the shutdown process, the PG attempts to issue a transaction against the store and the OSD segfaults.
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Got signal Terminated ***
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Immediate shutdown (osd_fast_shutdown=true) ***
2024-02-08T22:14:53.892-0600 7f045fcb3700 0 osd.200 2472348 Fast Shutdown: - cct->_conf->osd_fast_shutdown = 1, null-fm = 1
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Immediate shutdown (osd_fast_shutdown=true) ***
2024-02-08T22:14:53.892-0600 7f045fcb3700 0 osd.200 2472348 prepare_to_stop telling mon we are shutting down and dead
Stopping Ceph object storage daemon osd.200...
2024-02-08T22:14:54.132-0600 7f0453bb0700 0 osd.200 2472348 got_stop_ack starting shutdown
2024-02-08T22:14:54.132-0600 7f045fcb3700 0 osd.200 2472348 prepare_to_stop starting shutdown
2024-02-08T22:15:01.956-0600 7f045fcb3700 1 bdev(0x55dc906de400 /var/lib/ceph/osd/ceph-200/block) close
*** Caught signal (Segmentation fault) **
in thread 7f0447e1f700 thread_name:safe_timer
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: /lib64/libpthread.so.0(+0x168c0) [0x7f0465a0c8c0]
2: (BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x406) [0x55dc8dc10816]
3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x21f) [0x55dc8dc9065f]
4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x55dc8d87255f]
5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7c4) [0x55dc8daa60d4]
6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x50d) [0x55dc8d7f043d]
7: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x56) [0x55dc8d7f2206]
8: (PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0x988) [0x55dc8d7f48b8]
9: (HandleWatchTimeout::complete(int)+0x11a) [0x55dc8d75f3ca]
10: (CommonSafeTimer<std::mutex>::timer_thread()+0x128) [0x55dc8dddc798]
11: (CommonSafeTimerThread<std::mutex>::entry()+0xd) [0x55dc8dddd7dd]
12: /lib64/libpthread.so.0(+0xa6ea) [0x7f0465a006ea]
13: clone()
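To make the failure sequence concrete, here is a deterministic toy model in C++. It assumes only what the description states: the watch timer is still live after the store has been unmounted, and its timeout handler submits a transaction. ModelStore, ModelTimer and simulate() are hypothetical names, not Ceph code, and simulated seconds replace real threads so the outcome does not depend on scheduling.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical model store: only records whether it is mounted and how many
// transactions arrive after umount().
struct ModelStore {
  bool mounted = true;
  int txns_after_umount = 0;
  void umount() { mounted = false; }
  void queue_transaction() {
    // In the real crash, BlueStore::_txc_create() touches state that umount()
    // has already torn down; the model only counts such calls.
    if (!mounted) ++txns_after_umount;
  }
};

// Hypothetical single-threaded timer: callbacks keyed by fire time (seconds),
// fired in order when simulated time advances.
struct ModelTimer {
  std::multimap<double, std::function<void()>> events;
  bool stopped = false;
  void add_event_at(double t, std::function<void()> cb) {
    assert(!stopped);  // the model never schedules on a stopped timer
    events.emplace(t, std::move(cb));
  }
  void advance_to(double now) {
    while (!stopped && !events.empty() && events.begin()->first <= now) {
      auto cb = std::move(events.begin()->second);
      events.erase(events.begin());
      cb();
    }
  }
  void shutdown() {
    stopped = true;
    events.clear();
  }
};

// Store unmounted at t=1 s, watch timeout due at t=5 s, process still alive
// at t=10 s. Returns how many transactions reached the closed store.
int simulate(bool stop_timer_first) {
  ModelStore store;
  ModelTimer watch_timer;
  // Stands in for HandleWatchTimeout -> PrimaryLogPG::handle_watch_timeout(),
  // which submits a PG transaction in the backtrace above.
  watch_timer.add_event_at(5.0, [&store] { store.queue_transaction(); });
  if (stop_timer_first) watch_timer.shutdown();
  watch_timer.advance_to(1.0);
  store.umount();
  watch_timer.advance_to(10.0);
  return store.txns_after_umount;
}
```

simulate(false) returns 1, matching the ordering seen in the crash, while simulate(true), which shuts the timer down before umount(), returns 0.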
Updated by Igor Fedotov 3 months ago
The issue was originally found in a production cluster running v17.2.5.
It is also reproducible on a vstart cluster using the following steps:
- Put the OSDs on SSD drives
- Set osd_fast_shutdown = true
- Insert sleep(60) after the store->umount() call, using the following patch:
diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 9e6b3fd9d92..4f1f238d979 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -4593,6 +4593,9 @@ int OSD::shutdown()
 std::lock_guard lock(osd_lock);
 // TBD: assert in allocator that nothing is being add
 store->umount();
+dout(0) << " sleeping " << dendl;
+sleep(60);
+dout(0) << " awake " << dendl;
 utime_t end_time = ceph_clock_now();
 dout(0) <<"Fast Shutdown duration total :" << end_time - start_time_func << " seconds" << dendl;
- Start watching an object using the "rados watch" command, then terminate it with Ctrl-C
- Immediately afterwards, stop (e.g. via kill <pid>) the OSD that holds the primary PG for the watched object
- If the Watch hasn't been deregistered before the kill, the OSD crashes. The tricky part may be getting the watch timeout to fire after the OSD termination request...
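The timing requirement in the last step can be reasoned about with a small model. This is a sketch under stated assumptions, not Ceph code: it assumes the OSD times the watch out a fixed interval after the watching client disappears, and ReproTiming and timeout_hits_closed_store() are hypothetical names. The crash needs the timeout to fire after store->umount() but before the process exits, and the sleep(60) in the patch widens exactly that window.

```cpp
#include <cassert>

// Hypothetical timing model of the reproduction; all times in seconds,
// measured from the moment the "rados watch" client is killed (t = 0).
struct ReproTiming {
  double store_umounted;   // store->umount() has completed during fast shutdown
  double process_exit;     // the OSD process is actually gone
  double watch_timeout_s;  // assumed delay before the OSD times the watch out
};

// The crash needs the watch timeout to fire after the store is closed
// but while the process (and its timer thread) is still alive.
bool timeout_hits_closed_store(const ReproTiming& t) {
  assert(t.store_umounted <= t.process_exit);
  const double fire_at = t.watch_timeout_s;  // client killed at t = 0
  return fire_at >= t.store_umounted && fire_at < t.process_exit;
}
```

For example, with umount completing 8 s in (roughly the gap between the SIGTERM and the bdev close in the log above), an assumed 30 s timeout and a hypothetical exit 0.5 s after umount, the function returns false; keeping the process alive 60 s past umount, as the patch does, makes it return true.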
Updated by Igor Fedotov 3 months ago
The actual fix should either stop OSDService::watch_timer before the store is unmounted, or make the store gracefully refuse requests once it has been stopped.
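Both options can be illustrated with a small, self-contained sketch. It is not Ceph code: GuardedStore, ToyTimer and shutdown_osd() are hypothetical stand-ins for ObjectStore, OSDService::watch_timer and the OSD shutdown path, and the -ECANCELED return value is an arbitrary choice for the "refuse gracefully" option. The timer's shutdown() clears pending events and joins its thread, so once it returns no callback can still be running; the store additionally rejects transactions after umount().

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for ObjectStore (second option): after umount(),
// new transactions are rejected instead of touching torn-down state.
class GuardedStore {
 public:
  void umount() { std::lock_guard<std::mutex> l(lock_); mounted_ = false; }
  int queue_transaction() {
    std::lock_guard<std::mutex> l(lock_);
    if (!mounted_) return -ECANCELED;  // reject gracefully, do not crash
    ++applied_;
    return 0;
  }
  int applied() { std::lock_guard<std::mutex> l(lock_); return applied_; }

 private:
  std::mutex lock_;
  bool mounted_ = true;
  int applied_ = 0;
};

// Hypothetical minimal timer thread (not Ceph's SafeTimer): shutdown()
// cancels pending events and joins the thread.
class ToyTimer {
 public:
  using clock = std::chrono::steady_clock;
  ToyTimer() : thread_([this] { run(); }) {}
  ~ToyTimer() { shutdown(); }

  void add_event_after(std::chrono::milliseconds delay, std::function<void()> cb) {
    std::lock_guard<std::mutex> l(lock_);
    if (stopping_) return;  // events scheduled after shutdown are dropped
    events_.emplace(clock::now() + delay, std::move(cb));
    cond_.notify_all();
  }

  void shutdown() {
    // Calling shutdown() from a timer callback would make the thread join itself.
    assert(std::this_thread::get_id() != thread_.get_id());
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
      events_.clear();  // cancel pending callbacks (e.g. watch timeouts)
    }
    cond_.notify_all();
    if (thread_.joinable()) thread_.join();  // wait for a running callback
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    while (!stopping_) {
      if (events_.empty()) { cond_.wait(l); continue; }
      const auto when = events_.begin()->first;  // copy: the map may change
      if (clock::now() < when) { cond_.wait_until(l, when); continue; }
      auto cb = std::move(events_.begin()->second);
      events_.erase(events_.begin());
      l.unlock();
      cb();  // run without holding the timer lock
      l.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::multimap<clock::time_point, std::function<void()>> events_;
  bool stopping_ = false;
  std::thread thread_;  // declared last so the members above exist first
};

// First option: stop the watch timer before unmounting the store.
void shutdown_osd(ToyTimer& watch_timer, GuardedStore& store) {
  watch_timer.shutdown();  // no watch timeout can fire after this returns
  store.umount();
}
```

Stopping the timer first closes the race on its own; the store-side guard is a defensive second layer for any caller that might still submit work after umount().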
Updated by Radoslaw Zarzynski 3 months ago
- Tags set to low-hanging-fruit
low-hanging-fruit, as the RCA is provided above.
Updated by Igor Fedotov 5 days ago
- Status changed from New to Fix Under Review
- Pull request ID set to 56804