Project

General

Profile

Actions

Bug #65632

open

crimson osd crashes due to daggling pointers of operation blockers

Added by Xuehan Xu 11 days ago. Updated 10 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There are time gaps between the destruction of OSDMapBlockers and OSDMapBlockers unreferencing from BlockingEvents. If a operation dump is called during this interval, a SIGSEGV would be triggerred.

DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): starting start_pg_operation
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): start_pg_operation in await_active stage
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): start_pg_operation active, entering await_map
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - peering_event(id=291, detail=PeeringEvent(from=1 pgid=2.2 sent=238 requested=238 evt=epoch_sent: 238 epoch_requested: 238 MNotifyRec 2.2 from 1 notify: (query:238 sent:238 2.2( empty local-lis/les=216/217 n=0 ec=13/13 lis/c=216/216 les/c/f=217/217/0 sis=237) ([216,236] all_participants={0, 1, 2} intervals=[([216,234] acting={0, 1, 2})])) features: 0x3f03cffffffdffff +create_info)): start_pg_operation await_map stage
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - evt epoch is 238, i have 237, will wait
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - OSD::committed_osd_maps: osd.2: committed_osd_maps: broadcasting osdmaps up to 238 epoch to pgs
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=1.0 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=292, detail=PGAdvanceMap(pg=1.0 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.0 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=293, detail=PGAdvanceMap(pg=2.0 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.1 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=294, detail=PGAdvanceMap(pg=2.1 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.2 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=295, detail=PGAdvanceMap(pg=2.2 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.3 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=296, detail=PGAdvanceMap(pg=2.3 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.4 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=297, detail=PGAdvanceMap(pg=2.4 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.5 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=298, detail=PGAdvanceMap(pg=2.5 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.6 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=299, detail=PGAdvanceMap(pg=2.6 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=2.7 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=300, detail=PGAdvanceMap(pg=2.7 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.0 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=301, detail=PGAdvanceMap(pg=3.0 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.3 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=302, detail=PGAdvanceMap(pg=3.3 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.4 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=303, detail=PGAdvanceMap(pg=3.4 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.5 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=304, detail=PGAdvanceMap(pg=3.5 from=4294967295 to=238))
DEBUG 2024-04-23 02:54:56,906 [shard 0:main] osd - pg_advance_map(id=0, detail=PGAdvanceMap(pg=3.6 from=4294967295 to=238)): created
INFO  2024-04-23 02:54:56,906 [shard 0:main] osd - PerShardState::start_operation, pg_advance_map(id=305, detail=PGAdvanceMap(pg=3.6 from=4294967295 to=238))

DEBUG 2024-04-23 02:54:56,912 [shard 0:main] osd - PGShardManager::broadcast_map_to_pgs broadcasted up to 238
INFO  2024-04-23 02:54:56,912 [shard 0:main] osd - OSD::committed_osd_maps: osd.2: now active
DEBUG 2024-04-23 02:54:56,912 [shard 0:main] osd - AdminSocket::handle_client: incoming asok string: {"prefix": "dump_ops_in_flight"}
Segmentation fault on shard 0.
Backtrace:
 0# crimson::Blocker::dump(ceph::Formatter*) const in ceph-osd
 1# crimson::detail::dump_blocking_event(char const*, utime_t const&, crimson::Blocker const*, ceph::Formatter*) in ceph-osd
 2# crimson::osd::PeeringEvent<crimson::osd::RemotePeeringEvent>::dump_detail(ceph::Formatter*) const in ceph-osd
 3# crimson::Operation::dump(ceph::Formatter*) const in ceph-osd
 4# crimson::osd::PerShardState::dump_ops_in_flight(ceph::Formatter*) const in ceph-osd
 5# _ZN7seastar3smp9submit_toIZZN7crimson15reactor_map_seqIZNS2_15sharded_map_seqINS_7shardedINS2_3osd13ShardServicesEEEZNKS6_14PGShardManager24invoke_on_each_shard_seqIZNKS2_5admin19DumpInFlightOpsHook4callERKSt3mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN5boost7variantISJ_JbldSt6vectorISJ_SaISJ_EESM_IlSaIlEESM_IdSaIdEEEEESt4lessIvESaISt4pairIKSJ_ST_EEESt17basic_string_viewIcSH_EON4ceph6buffer7v15_2_04listEEUlRKT_E_JEEENS_6futureIvEEOS1A_EUlRKS7_E_EEDaRS1A_OT0_EUlvE_EEDaS1G_ENUlS1A_E_clIjEEDaS1A_EUlvE_EENS_8futurizeINSt13invoke_resultIS1A_JEE4typeEE4typeEjNS_21smp_submit_to_optionsES1G_ in ceph-osd
 6# crimson::admin::DumpInFlightOpsHook::call(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, std::basic_string_view<char, std::char_traits<char> >, ceph::buffer::v15_2_0::list&&) const in ceph-osd
 7# crimson::admin::AdminSocket::execute_command(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, ceph::buffer::v15_2_0::list&&) in ceph-osd
 8# crimson::admin::AdminSocket::execute_line(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, seastar::output_stream<char>&) in ceph-osd
 9# 0x00000000019BE53C in ceph-osd
10# 0x0000000002EA2FE8 in ceph-osd
11# 0x0000000002EA3404 in ceph-osd
12# 0x0000000002EE4A36 in ceph-osd
13# 0x0000000002EE577D in ceph-osd
14# 0x0000000002E338ED in ceph-osd
15# 0x0000000002E34175 in ceph-osd
16# main in ceph-osd
17# __libc_start_main in /lib64/libc.so.6
18# _start in ceph-osd
Actions #1

Updated by Xuehan Xu 11 days ago

  • Description updated (diff)
Actions #2

Updated by Xuehan Xu 10 days ago

  • Pull request ID set to 57069
Actions

Also available in: Atom PDF