Project

General

Profile

Actions

Bug #51652

open

heartbeat timeouts on filestore OSDs while deleting objects in upgrade:pacific-p2p-pacific

Added by Yuri Weinstein almost 3 years ago. Updated 6 months ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):

Description

Run: https://pulpito.ceph.com/teuthology-2021-07-10_14:15:22-upgrade:pacific-p2p-pacific-distro-basic-smithi/
Job: 6262272
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2021-07-10_14:15:22-upgrade:pacific-p2p-pacific-distro-basic-smithi/6262272/teuthology.log

2021-07-10T15:19:21.538+0000 7f157aff9700 20 osd.0 pg_epoch: 1331 pg[8.e( v 1281'2407 (1203'2377,1281'2407] lb MIN local-lis/les=1241/1242 n=126 ec=472/472 lis/c=1241/1241 les/c/f=1242/1242/0 sis=1318) [] r=-1 lpr=1318 DELETING pi=[124
1,1318)/1 crt=1281'2407 lcod 1262'2406 mlcod 0'0 unknown NOTIFY mbc={}] do_delete_work deleting 30 objects
2021-07-10T15:19:21.538+0000 7f157aff9700 10 log is not dirty
2021-07-10T15:19:21.538+0000 7f157aff9700 20 osd.0 1331 dispatch_context not up in osdmap
2021-07-10T15:19:21.542+0000 7f157aff9700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f157aff9700' had timed out after 15.000000954s
2021-07-10T15:19:21.542+0000 7f157aff9700  1 heartbeat_map clear_timeout 'OSD::osd_op_tp thread 0x7f157aff9700' had suicide timed out after 150.000000000s
2021-07-10T15:19:21.546+0000 7f157aff9700 -1 *** Caught signal (Aborted) **
 in thread 7f157aff9700 thread_name:tp_osd_tp

 ceph version 16.2.5-14-g6872adc9 (6872adc98a8f647027b52e7382347dcce81ecdf6) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f159ef103c0]
 2: pthread_kill()
 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x48a) [0x5563501729aa]
 4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x63) [0x556350172bc3]
 5: (FileStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x8f0) [0x55634ff230f0]
 6: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ceph::os::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x87) [0x55634fabbbe7]
 7: (OSD::dispatch_context(PeeringCtx&, PG*, std::shared_ptr<OSDMap const>, ThreadPool::TPHandle*)+0x1f3) [0x55634fa3f3a3]
 8: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x208) [0x55634fa85e98]
 9: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0xc9) [0x55634fa86109]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x872) [0x55634fa70862]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x556350192a63]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x556350195884]
 13: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f159ef04609]
 14: clone()
Actions #2

Updated by Neha Ojha almost 3 years ago

  • Project changed from Ceph to RADOS
Actions #3

Updated by Neha Ojha almost 3 years ago

  • Subject changed from heartbeat timeouts while deleting objects in upgrade:pacific-p2p-pacific to heartbeat timeouts on filestore OSDs while deleting objects in upgrade:pacific-p2p-pacific
Actions #4

Updated by Neha Ojha almost 3 years ago

1-07-13T15:55:24.036+0000 7f0faa54b700 20 snap_mapper.get_snaps 5:14f0b3be:::benchmark_data_smithi129_13703_object44700:head got.empty()
2021-07-13T15:55:24.036+0000 7f0faa54b700 20 snap_mapper.remove_oid 5:14f12acc:::benchmark_data_smithi129_13703_object36459:head
2021-07-13T15:55:24.036+0000 7f0faa54b700 20 snap_mapper._remove_oid 5:14f12acc:::benchmark_data_smithi129_13703_object36459:head
2021-07-13T15:55:24.264+0000 7f0fa8547700 20 snap_mapper.get_snaps 5:38710be4:::benchmark_data_smithi129_13703_object26548:head got.empty()
2021-07-13T15:55:24.264+0000 7f0fa8547700 20 snap_mapper.remove_oid 5:3872ac91:::benchmark_data_smithi129_13703_object33436:head
2021-07-13T15:55:24.264+0000 7f0fa8547700 20 snap_mapper._remove_oid 5:3872ac91:::benchmark_data_smithi129_13703_object33436:head
2021-07-13T15:55:24.496+0000 7f0fa3d3e700 20 snap_mapper.get_snaps 5:b844627d:::benchmark_data_smithi129_13703_object23023:head got.empty()
2021-07-13T15:55:24.496+0000 7f0fa3d3e700 20 snap_mapper.remove_oid 5:b8449bb0:::benchmark_data_smithi129_13703_object19423:head
2021-07-13T15:55:24.496+0000 7f0fa3d3e700 20 snap_mapper._remove_oid 5:b8449bb0:::benchmark_data_smithi129_13703_object19423:head
2021-07-13T15:55:24.660+0000 7f0fb455f700  1 -- [v2:172.21.15.63:6808/43537,v1:172.21.15.63:6809/43537] --> [v2:172.21.15.63:3302/0,v1:172.21.15.63:6791/0] -- auth(proto 2 132 bytes epoch 0) v1 -- 0x55cd5425f500 con 0x55cd5319ac00
2021-07-13T15:55:24.660+0000 7f0fb455f700  1 -- [v2:172.21.15.63:6808/43537,v1:172.21.15.63:6809/43537] --> [v2:172.21.15.63:3302/0,v1:172.21.15.63:6791/0] -- auth(proto 2 2 bytes epoch 0) v1 -- 0x55cd6127bc80 con 0x55cd5319ac00
2021-07-13T15:55:24.740+0000 7f0fa6d44700 20 snap_mapper.get_snaps 5:f45fc302:::benchmark_data_smithi129_13703_object62570:head got.empty()
2021-07-13T15:55:24.740+0000 7f0fa6d44700 20 snap_mapper.remove_oid 5:f460d718:::benchmark_data_smithi129_13703_object53163:head
2021-07-13T15:55:24.740+0000 7f0fa6d44700 20 snap_mapper._remove_oid 5:f460d718:::benchmark_data_smithi129_13703_object53163:head
2021-07-13T15:55:24.924+0000 7f0fc52ef700 10 osd.1 1136 tick
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa2d3c700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa353d700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa3d3e700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa453f700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa5541700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa5d42700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa6543700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa4d40700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa7545700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa8547700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa8d48700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0fa9549700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0faa54b700' had timed out after 15.000000954s
2021-07-13T15:55:24.924+0000 7f0fc52ef700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0faa54b700' had suicide timed out after 150.000000000s
2021-07-13T15:55:24.940+0000 7f0faa54b700 -1 *** Caught signal (Aborted) **
 in thread 7f0faa54b700 thread_name:tp_osd_tp

 ceph version 16.2.5-22-g7dfa96d6 (7dfa96d67e29dca4dd54f4478ee9f06d66121611) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f0fcb45c3c0]
 2: pthread_cond_wait()
 3: (std::condition_variable::wait(std::unique_lock<std::mutex>&)+0x10) [0x7f0fcb2c9e50]
 4: (DBObjectMap::MapHeaderLock::MapHeaderLock(DBObjectMap*, ghobject_t const&)+0x13b) [0x55cd466fa3bb]
 5: (DBObjectMap::get_values(ghobject_t const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x4d) [0x55cd466e73dd]
 6: (FileStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x16d) [0x55cd4654953d]
 7: (MapCacher::MapCacher<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list>::get_keys(std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x4d8) [0x55cd462d4f78]
 8: (SnapMapper::get_snaps(hobject_t const&, SnapMapper::object_snaps*)+0xfa) [0x55cd462ca48a]
 9: (SnapMapper::_remove_oid(hobject_t const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list>*)+0x159) [0x55cd462cba39]
 10: (SnapMapper::remove_oid(hobject_t const&, MapCacher::Transaction<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list>*)+0x58) [0x55cd462cbf38]
 11: (PG::do_delete_work(ceph::os::Transaction&, ghobject_t)+0x5cc) [0x55cd46162e0c]
 12: (PeeringState::Deleting::react(PeeringState::DeleteSome const&)+0x18b) [0x55cd4634e18b]
 13: (boost::statechart::simple_state<PeeringState::Deleting, PeeringState::ToDelete, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x119) [0x55cd463bcb29]
 14: (boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x74) [0x55cd46169894]
 15: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PeeringCtx&)+0x1aa) [0x55cd4615564a]
 16: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1cd) [0x55cd460c9c6d]
 17: (OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&)+0xc9) [0x55cd460c9f19]
 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x872) [0x55cd460b4672]
 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x55cd467d6873]
 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55cd467d9694]
 21: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f0fcb450609]
 22: clone()
Actions #5

Updated by Neha Ojha over 2 years ago

/a/yuriw-2021-11-28_15:43:54-upgrade:pacific-p2p-pacific-16.2.7_RC1-distro-default-smithi/6531998

Actions #6

Updated by Yaarit Hatuka almost 2 years ago

  • Crash signature (v2) updated (diff)
  • Affected Versions v15.2.13 added
Actions #7

Updated by Laura Flores over 1 year ago

  • Translation missing: en.field_tag_list set to test-failure

/a/yuriw-2022-12-14_15:40:37-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/7116495

Actions #8

Updated by Radoslaw Zarzynski over 1 year ago

  • Priority changed from Normal to Low

Lowered the priority as @FileStore` is not only deprecated but also being removed right now.

Actions #9

Updated by Laura Flores 6 months ago

Also quincy p2p: /a/lflores-2023-10-19_19:56:28-upgrade:quincy-p2p-quincy-release-distro-default-smithi/7432647

Actions

Also available in: Atom PDF