Bug #55577
OSD crashes on devicehealth scraping
Description
- Linux kernel version: `5.17.5-arch1-1`
- Ceph version: `17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)`
- Rook version: `v1.9.2`
The OSD crashes as soon as its device health metrics are scraped, e.g., by executing `ceph device scrape-daemon-health-metrics osd.2`.
Kubernetes then restarts the OSD container because the liveness probe command (`ceph --admin-daemon /run/ceph/ceph-osd.2.asok status`) fails three times in a row.
Within the pod, I'm able to execute the `smartctl` command without any errors.
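The restart behavior described above (three consecutive liveness probe failures) can be sketched as follows. This is only an illustrative model, not Kubernetes or Rook code: the function name `restart_index` and its `failure_threshold` parameter are hypothetical, and the threshold of 3 is taken from the observation in this report.

```python
from typing import Iterable, Optional


def restart_index(probe_results: Iterable[bool],
                  failure_threshold: int = 3) -> Optional[int]:
    """Return the 0-based index of the probe that would trigger a restart.

    probe_results: True for a successful probe, False for a failed one.
    Returns None if the threshold of consecutive failures is never reached.
    (Hypothetical helper; models only the "N failures in a row" rule.)
    """
    consecutive_failures = 0
    for i, ok in enumerate(probe_results):
        if ok:
            # Any success resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return i
    return None


if __name__ == "__main__":
    # The OSD answers once, then the admin socket stops responding.
    print(restart_index([True, False, False, False]))
```

A single successful probe between failures resets the count, so an OSD that segfaults and stays down is restarted, while one that only occasionally misses a probe is not.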
Last few lines of the OSD log:
```
debug -7> 2022-05-09T10:54:38.559+0000 7f975380e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-05-09T10:54:08.561261+0000)
debug -6> 2022-05-09T10:54:39.559+0000 7f975380e700 10 monclient: tick
debug -5> 2022-05-09T10:54:39.559+0000 7f975380e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-05-09T10:54:09.561363+0000)
debug -4> 2022-05-09T10:54:40.559+0000 7f975380e700 10 monclient: tick
debug -3> 2022-05-09T10:54:40.559+0000 7f975380e700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2022-05-09T10:54:10.561470+0000)
debug -2> 2022-05-09T10:54:41.439+0000 7f9768245700 10 monclient: get_auth_request con 0x55c8063da000 auth_method 0
debug -1> 2022-05-09T10:54:41.439+0000 7f9768a46700 10 monclient: get_auth_request con 0x55c7e90f1c00 auth_method 0
debug 0> 2022-05-09T10:54:41.443+0000 7f974b7fe700 -1 *** Caught signal (Segmentation fault) **
in thread 7f974b7fe700 thread_name:safe_timer
ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable)
1: /lib64/libpthread.so.0(+0x12ce0) [0x7f976ca3fce0]
2: (BlueStore::_txc_create(BlueStore::Collection*, BlueStore::OpSequencer*, std::__cxx11::list<Context*, std::allocator<Context*> >, boost::intrusive_ptr<TrackedOp>)+0x3ae) [0x55c7e502855e]
3: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle)+0x260) [0x55c7e508d3e0]
4: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x55) [0x55c7e4c8d2e5]
5: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0xca8) [0x55c7e4ea7f98]
6: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xc90) [0x55c7e4bf3da0]
7: (PrimaryLogPG::simple_opc_submit(std::unique_ptr<PrimaryLogPG::OpContext, std::default_delete<PrimaryLogPG::OpContext> >)+0x120) [0x55c7e4bf6000]
8: (PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xb99) [0x55c7e4bf83f9]
9: (HandleWatchTimeout::complete(int)+0x11b) [0x55c7e4b7a4ab]
10: (CommonSafeTimer<std::mutex>::timer_thread()+0x11a) [0x55c7e51f8afa]
11: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x55c7e51fa121]
12: /lib64/libpthread.so.0(+0x81cf) [0x7f976ca351cf]
13: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 rbd_pwl
0/ 5 journaler
0/ 5 objectcacher
0/ 5 immutable_obj_cache
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/ 5 rgw_datacache
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 fuse
2/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
1/ 5 prioritycache
0/ 5 test
0/ 5 cephfs_mirror
0/ 5 cephsqlite
0/ 5 seastore
0/ 5 seastore_onode
0/ 5 seastore_odata
0/ 5 seastore_omap
0/ 5 seastore_tm
0/ 5 seastore_cleaner
0/ 5 seastore_lba
0/ 5 seastore_cache
0/ 5 seastore_journal
0/ 5 seastore_device
0/ 5 alienstore
1/ 5 mclock
-2/-2 (syslog threshold)
99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
7f9746ff5700 /
```
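To make the call path in the backtrace easier to read, the symbolized frames can be reduced to their function names. The following is a minimal sketch, assuming the frame layout shown in the log above (`N: (symbol(args)+0xoffset) [0xaddress]`); `FRAME_RE` and `frame_symbols` are hypothetical names, and frames without a symbol (such as the `libpthread` and `clone()` entries) are skipped.

```python
import re

# Matches symbolized frames such as:
#   9: (HandleWatchTimeout::complete(int)+0x11b) [0x55c7e4b7a4ab]
FRAME_RE = re.compile(r"^\s*(\d+): \((.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def frame_symbols(log_text):
    """Return (frame_number, function_name) for each symbolized frame."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            signature = match.group(2)
            # Drop the argument list: keep everything before the first "(".
            name = signature.split("(", 1)[0]
            frames.append((int(match.group(1)), name))
    return frames


# Three frames copied verbatim from the log above.
SAMPLE = """\
1: /lib64/libpthread.so.0(+0x12ce0) [0x7f976ca3fce0]
8: (PrimaryLogPG::handle_watch_timeout(std::shared_ptr<Watch>)+0xb99) [0x55c7e4bf83f9]
9: (HandleWatchTimeout::complete(int)+0x11b) [0x55c7e4b7a4ab]
"""

if __name__ == "__main__":
    for number, name in frame_symbols(SAMPLE):
        print(number, name)
```

Applied to the full backtrace, this lists the path from `CommonSafeTimer<std::mutex>::timer_thread` through `PrimaryLogPG::handle_watch_timeout` down to `BlueStore::_txc_create`, where the segmentation fault is reported.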