Bug #65632

Updated by Xuehan Xu 25 days ago

There is a time gap between the destruction of an OSDMapBlocker and the point at which it is unreferenced from the corresponding BlockingEvents. If an operation dump is triggered during this interval, a SIGSEGV is met.

 <pre> 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): starting start_pg_operation 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): start_pg_operation in await_active stage 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): start_pg_operation active, entering await_map 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): start_pg_operation await_map stage 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - evt epoch is 238, i have 237, will wait 
 </pre> 

 <pre> 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - OSD::committed_osd_maps: osd.2: committed_osd_maps: broadcasting osdmaps up to 238 epoch to pgs 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=1.0 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=292, detail=PGAdvanceMap(pg=1.0 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.0 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=293, detail=PGAdvanceMap(pg=2.0 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.1 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=294, detail=PGAdvanceMap(pg=2.1 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.2 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=295, detail=PGAdvanceMap(pg=2.2 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.3 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=296, detail=PGAdvanceMap(pg=2.3 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.4 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=297, detail=PGAdvanceMap(pg=2.4 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.5 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=298, detail=PGAdvanceMap(pg=2.5 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.6 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=299, detail=PGAdvanceMap(pg=2.6 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.7 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=300, detail=PGAdvanceMap(pg=2.7 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.0 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=301, detail=PGAdvanceMap(pg=3.0 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.3 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=302, detail=PGAdvanceMap(pg=3.3 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.4 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=303, detail=PGAdvanceMap(pg=3.4 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.5 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=304, detail=PGAdvanceMap(pg=3.5 from=4294967295 to=238)) 
 DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.6 from=4294967295 to=238)): created 
 INFO    2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=305, detail=PGAdvanceMap(pg=3.6 from=4294967295 to=238)) 
 </pre> 
 <pre> 
 DEBUG 2024-04-23 02:54:56,912 [shard 0:main] osd - PGShardManager::broadcast_map_to_pgs broadcasted up to 238 
 INFO    2024-04-23 02:54:56,912 [shard 0:main] osd - OSD::committed_osd_maps: osd.2: now active 
 DEBUG 2024-04-23 02:54:56,912 [shard 0:main] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "dump_ops_in_flight"} 
 Segmentation fault on shard 0. 
 Backtrace: 
  0# crimson::Blocker::dump(ceph::Formatter*) const in ceph-osd 
  1# crimson::detail::dump_blocking_event(char const*, utime_t const&, crimson::Blocker const*, ceph::Formatter*) in ceph-osd 
  2# crimson::osd::PeeringEvent<crimson::osd::RemotePeeringEvent>::dump_detail(ceph::Formatter*) const in ceph-osd 
  3# crimson::Operation::dump(ceph::Formatter*) const in ceph-osd 
  4# crimson::osd::PerShardState::dump_ops_in_flight(ceph::Formatter*) const in ceph-osd 
  5# _ZN7seastar3smp9submit_toIZZN7crimson15reactor_map_seqIZNS2_15sharded_map_seqINS_7shardedINS2_3osd13ShardServicesEEEZNKS6_14PGShardManager24invoke_on_each_shard_seqIZNKS2_5admin19DumpInFlightOpsHook4callERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN5boost7variantISJ_JbldSt6vectorISJ_SaISJ_EESM_IlSaIlEESM_IdSaIdEEEEESt4lessIvESaISt4pairIKSJ_ST_EEESt17basic_string_viewIcSH_EON4ceph6buffer7v15_2_04listEEUlRKT_E_JEEENS_6futureIvEEOS1A_EUlRKS7_E_EEDaRS1A_OT0_EUlvE_EEDaS1G_ENUlS1A_E_clIjEEDaS1A_EUlvE_EENS_8futurizeINSt13invoke_resultIS1A_JEE4typeEE4typeEjNS_21smp_submit_to_optionsES1G_ in ceph-osd 
  6# crimson::admin::DumpInFlightOpsHook::call(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, std::basic_string_view<char, std::char_traits<char> >, ceph::buffer::v15_2_0::list&&) const in ceph-osd 
  7# crimson::admin::AdminSocket::execute_command(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, ceph::buffer::v15_2_0::list&&) in ceph-osd 
  8# crimson::admin::AdminSocket::execute_line(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, seastar::output_stream<char>&) in ceph-osd 
  9# 0x00000000019BE53C in ceph-osd 
 10# 0x0000000002EA2FE8 in ceph-osd 
 11# 0x0000000002EA3404 in ceph-osd 
 12# 0x0000000002EE4A36 in ceph-osd 
 13# 0x0000000002EE577D in ceph-osd 
 14# 0x0000000002E338ED in ceph-osd 
 15# 0x0000000002E34175 in ceph-osd 
 16# main in ceph-osd 
 17# __libc_start_main in /lib64/libc.so.6 
 18# _start in ceph-osd 
 </pre>
