Bug #64373

osd: Segmentation fault on OSD shutdown

Added by Igor Fedotov 3 months ago. Updated 5 days ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
low-hanging-fruit
Backport:
reef, quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It looks like the watch timer can trigger a PG transaction on an already closed ObjectStore.
This can happen when fast shutdown is enabled, BlueStore uses non-rotational drives, and a Watch has not been deregistered by the time the OSD is shut down.
If the Watch timeout fires during the shutdown process, the PG attempts to issue a transaction against the closed store and segfaults.

2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Got signal Terminated ***
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Immediate shutdown (osd_fast_shutdown=true) ***
2024-02-08T22:14:53.892-0600 7f045fcb3700  0 osd.200 2472348 Fast Shutdown: - cct->_conf->osd_fast_shutdown = 1, null-fm = 1
2024-02-08T22:14:53.892-0600 7f045fcb3700 -1 osd.200 2472348 *** Immediate shutdown (osd_fast_shutdown=true) ***
2024-02-08T22:14:53.892-0600 7f045fcb3700  0 osd.200 2472348 prepare_to_stop telling mon we are shutting down and dead
Stopping Ceph object storage daemon osd.200...
2024-02-08T22:14:54.132-0600 7f0453bb0700  0 osd.200 2472348 got_stop_ack starting shutdown
2024-02-08T22:14:54.132-0600 7f045fcb3700  0 osd.200 2472348 prepare_to_stop starting shutdown
2024-02-08T22:15:01.956-0600 7f045fcb3700  1 bdev(0x55dc906de400 /var/lib/ceph/osd/ceph-200/block) close
*** Caught signal (Segmentation fault) **
in thread 7f0447e1f700 thread_name:safe_timer
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
 1: /lib64/libpthread.so.0(+0x168c0) [0x7f0465a0c8c0]
 2: (BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >*, boost::intrusive_ptr<TrackedOp>)+0x406) [0x55dc8dc10816]
 3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x21f) [0x55dc8dc9065f]
 4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x55dc8d87255f]
 5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7c4) [0x55dc8daa60d4]
 6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x50d) [0x55dc8d7f043d]
 7: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x56) [0x55dc8d7f2206]
 8: (PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0x988) [0x55dc8d7f48b8]
 9: (HandleWatchTimeout::complete(int)+0x11a) [0x55dc8d75f3ca]
 10: (CommonSafeTimer<std::mutex>::timer_thread()+0x128) [0x55dc8dddc798]
 11: (CommonSafeTimerThread<std::mutex>::entry()+0xd) [0x55dc8dddd7dd]
 12: /lib64/libpthread.so.0(+0xa6ea) [0x7f0465a006ea]
 13: clone()
Actions #1

Updated by Igor Fedotov 3 months ago

  • Backport set to reef, quincy, pacific
Actions #2

Updated by Igor Fedotov 3 months ago

Originally the issue was found in a production cluster running v17.2.5.

It is also reproducible on a vstart cluster using the following steps:

- Put OSDs on SSD drives.
- Set osd_fast_shutdown = true.
- Insert sleep(60) after the store->umount() call, as in the following diff:

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 9e6b3fd9d92..4f1f238d979 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -4593,6 +4593,9 @@ int OSD::shutdown()
     std::lock_guard lock(osd_lock);
     // TBD: assert in allocator that nothing is being add
     store->umount();
+dout(0) << " sleeping " << dendl;
+sleep(60);
+dout(0) << " awake " << dendl;

     utime_t end_time = ceph_clock_now();
     dout(0) <<"Fast Shutdown duration total     :" << end_time              - start_time_func       << " seconds" << dendl;

- Start watching an object using the "rados watch" command and terminate it with Ctrl-C.
- Immediately afterwards, stop (e.g. via kill <pid>) the OSD that hosts the primary PG for the watched object.
- If the Watch hasn't been deregistered before the kill, the OSD crashes. The tricky part can be getting the timeout to fire after the OSD terminate request...
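On a vstart cluster the steps above might look roughly like the following. This is an illustrative sketch only: pool and object names are made up, flag spellings may differ by release, and the sleep(60) patch from the diff above must already be applied and the OSDs placed on SSDs.

```sh
# build and start a fresh vstart cluster, with fast shutdown enabled
cd build
MON=1 OSD=1 MDS=0 ../src/vstart.sh -n -x
./bin/ceph config set osd osd_fast_shutdown true

# create a pool and an object, then register a watch and interrupt it
./bin/ceph osd pool create testpool 8
./bin/rados -p testpool create obj1
./bin/rados -p testpool watch obj1        # Ctrl-C after a few seconds

# immediately stop the OSD holding the primary PG for obj1;
# if the Watch wasn't deregistered, the timeout fires during shutdown
kill <pid-of-that-ceph-osd>
```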
Actions #3

Updated by Igor Fedotov 3 months ago

The actual fix should either stop OSDService::watch_timer before the store's umount, or make the store reject or ignore incoming requests gracefully once it has been stopped.

Actions #4

Updated by Radoslaw Zarzynski 3 months ago

  • Description updated (diff)
Actions #5

Updated by Radoslaw Zarzynski 3 months ago

  • Tags set to low-hanging-fruit

low-hanging-fruit as the RCA is provided above.

Actions #6

Updated by Igor Fedotov 5 days ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 56804