Bug #62857

closed

crimson osd fails to reboot

Added by Xuehan Xu 7 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

#7  __pthread_kill_implementation (no_tid=0, signo=6, threadid=139684504424128) at ./nptl/pthread_kill.c:44
#8  __pthread_kill_internal (signo=6, threadid=139684504424128) at ./nptl/pthread_kill.c:78
#9  __GI___pthread_kill (threadid=139684504424128, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#10 0x00007f0ad5d4a476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#11 0x00007f0ad5d307f3 in __GI_abort () at ./stdlib/abort.c:79
#12 0x0000556b8f90539b in ceph::__ceph_assert_fail (assertion=0x556b91622cfe "i->version > other.tail", file=0x556b91620d31 "/home/xuxuehan/src/ceph/src/osd/osd_types.cc", line=5256, 
    func=0x556b91624a1d "void _handle_dups(crimson::common::CephContext *, pg_log_t &, const pg_log_t &, unsigned int)") at /home/xuxuehan/src/ceph/src/crimson/common/assert.cc:27
#13 0x0000556b8f9052ca in ceph::__ceph_assert_fail (ctx=...) at /home/xuxuehan/src/ceph/src/crimson/common/assert.cc:14
#14 0x0000556b8ffda1b6 in _handle_dups (cct=<optimized out>, target=..., other=..., maxdups=<optimized out>) at /home/xuxuehan/src/ceph/src/osd/osd_types.cc:5256
#15 0x0000556b8ffd91f1 in pg_log_t::copy_after (this=0x44a17, cct=0x7f0ad5098030, other=..., v=...) at /home/xuxuehan/src/ceph/src/osd/osd_types.cc:5287
#16 0x0000556b8fd01b02 in PeeringState::fulfill_log (this=0x7f0acc161d48, from=..., query=..., query_epoch=<optimized out>) at /home/xuxuehan/src/ceph/src/osd/PeeringState.cc:3159
#17 0x0000556b8fd2dae9 in PeeringState::Stray::react (this=this@entry=0x7f0ac707aa50, query=...) at /home/xuxuehan/src/ceph/src/osd/PeeringState.cc:6611
#18 0x0000556b8fd7f6ec in boost::statechart::custom_reaction<MQuery>::react<PeeringState::Stray, boost::statechart::event_base, void const*> (stt=..., evt=..., eventType=<optimized out>) at boost/include/boost/statechart/custom_reaction.hpp:42
#19 boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<MQuery>, boost::statechart::custom_reaction<MLogRec>, boost::statechart::custom_reaction<MInfoRec>, boost::statechart::custom_reaction<PeeringState::ActMap>, boost::statechart::custom_reaction<RecoveryDone>, boost::statechart::transition<PeeringState::DeleteStart, PeeringState::ToDelete, boost::statechart::detail::no_context<PeeringState::DeleteStart>, &boost::statechart::detail::no_context<PeeringState::DeleteStart>::no_function>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> > (stt=..., evt=..., eventType=0x556b92258110 <boost::statechart::detail::id_holder<MQuery>::idProvider_>) at boost/include/boost/statechart/simple_state.hpp:814
#20 0x0000556b8fd4c7a3 in boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react<boost::mpl::list<boost::statechart::custom_reaction<MQuery>, boost::statechart::custom_reaction<MLogRec>, boost::statechart::custom_reaction<MInfoRec>, boost::statechart::custom_reaction<PeeringState::ActMap>, boost::statechart::custom_reaction<RecoveryDone>, boost::statechart::transition<PeeringState::DeleteStart, PeeringState::ToDelete, boost::statechart::detail::no_context<PeeringState::DeleteStart>, &boost::statechart::detail::no_context<PeeringState::DeleteStart>::no_function>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> > (this=0x44a17, evt=..., eventType=0x6) at boost/include/boost/statechart/simple_state.hpp:850
#21 boost::statechart::simple_state<PeeringState::Stray, PeeringState::Started, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl (this=0x44a17, evt=..., eventType=0x6)
    at boost/include/boost/statechart/simple_state.hpp:489
#22 0x0000556b8fa85ade in boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<boost::statechart::none>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>::operator() (this=<optimized out>)
    at boost/include/boost/statechart/state_machine.hpp:87
#23 boost::statechart::null_exception_translator::operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<boost::statechart::none>, boost::statechart::detail::rtti_policy>, boost::statechart::event_base, void const*>, boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::exception_event_handler> (this=0x7f0acc1634b0, action=...)
    at boost/include/boost/statechart/null_exception_translator.hpp:33
#24 boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::send_event (this=this@entry=0x7f0acc163440, evt=...)
    at boost/include/boost/statechart/state_machine.hpp:885
#25 0x0000556b8fa859c2 in boost::statechart::state_machine<PeeringState::PeeringMachine, PeeringState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event (this=0x44a17, this@entry=0x7f0acc163440, evt=...)
    at boost/include/boost/statechart/state_machine.hpp:275
#26 0x0000556b8fa5ae9d in PeeringState::handle_event (this=0x7f0acc161d48, evt=..., rctx=<optimized out>) at /home/xuxuehan/src/ceph/src/osd/PeeringState.h:1753
#27 crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10::operator()() const (this=0x7f0acbd7f070) at /home/xuxuehan/src/ceph/src/crimson/osd/pg.cc:677
#28 std::__invoke_impl<void, crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10>(std::__invoke_other, crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&) (__f=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#29 std::__invoke<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10>(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&) (__fn=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#30 std::invoke<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10>(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&) (__fn=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:97
#31 crimson::interruptible::non_futurized_call_with_interruption<crimson::osd::IOInterruptCondition, crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, , void>(seastar::lw_shared_ptr<crimson::osd::IOInterruptCondition>, crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&) (interrupt_condition=..., func=...) at /home/xuxuehan/src/ceph/src/crimson/common/interruptible_future.h:297
#32 crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}::operator()() (this=0x7f0acbd7f070) at /home/xuxuehan/src/ceph/src/crimson/common/interruptible_future.h:1429
#33 std::__invoke_impl<void, crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}>(std::__invoke_other, crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}&&) (__f=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#34 std::__invoke<crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}>(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&) (__fn=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#35 std::__apply_impl<crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}, std::tuple<>>(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&, std::tuple<>&&, std::integer_sequence<unsigned long>) (__f=..., __t=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:1854
#36 std::apply<crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}, std::tuple<> >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&, std::tuple<>&&) (__f=..., __t=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/tuple:1865
#37 seastar::futurize<void>::apply<crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>::async<crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10, crimson::interruptible::interruptible_future_detail<crimson::osd::IOInterruptCondition, seastar::future<void> > >(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&)::{lambda()#1}>(crimson::osd::PG::do_peering_event(PGPeeringEvent&, PeeringCtx&)::$_10&&, std::tuple<>&&) (func=..., args=...)
    at /home/xuxuehan/src/ceph/src/seastar/include/seastar/core/future.hh:2111

Here is the analysis of the issue:

1. PG.X is shared by OSD1 and OSD2, and PeeringState::pg_trim_to is, say, 15'50.
2. OSD1 goes down.
3. OSD2 advances PeeringState::pg_trim_to to a new value, say 15'100, and updates pg_info_t::log_tail and PGLog::IndexedLog::tail accordingly.
4. OSD1 comes back up, with OSD2 as primary; OSD2 resets PeeringState::pg_trim_to during the peering process and finds that there is no need to trim the pglog further, so PeeringState::pg_trim_to remains zero.
5. A new modification is issued to an object in PG.X, and OSD2 issues a repop to OSD1 that encloses PG.X's pg info (in which log_tail is the new 15'100).
6. Since the pg_trim_to in the repop is zero while pg_info_t::log_tail is 15'100, OSD1 does not trim its pglog but still updates its pg info's log_tail to 15'100, which leads to mismatched log bounds.
7. Before the pglog of PG.X is trimmed on OSD2 again, OSD1 goes down.
8. OSD1 comes back up with a "log bound mismatch" pglog.
9. During the peering process of PG.X, OSD2 asks OSD1 for its pglog, and the assertion "ceph_assert(i->version > other.tail);" in "_handle_dups()" is triggered, since OSD1 never trimmed its pglog and its real tail remains 15'50 (see the sketch below).
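
To make the failing invariant concrete, here is a minimal, self-contained C++ sketch. The eversion_t/pg_log_t structs and handle_dups() below are simplified stand-ins, not the real Ceph types or the real _handle_dups(); they only demonstrate why the assertion fires once the persisted tail has run ahead of the entries actually kept:

```cpp
// Simplified stand-ins for eversion_t / pg_log_t -- NOT the real Ceph types.
#include <cassert>
#include <cstdint>
#include <vector>

struct eversion_t {
  uint32_t epoch = 0;
  uint64_t version = 0;
  bool operator>(const eversion_t& o) const {
    return epoch != o.epoch ? epoch > o.epoch : version > o.version;
  }
};

struct pg_log_t {
  eversion_t tail;                  // the oldest entry the log claims to hold
  std::vector<eversion_t> entries;  // stand-in for the log/dup entries
};

// Analogue of the check in _handle_dups(): every entry taken from `other`
// must be strictly newer than other.tail, or the log bounds are corrupt.
void handle_dups(pg_log_t& target, const pg_log_t& other) {
  for (const auto& v : other.entries) {
    assert(v > other.tail);  // i.e. ceph_assert(i->version > other.tail)
    target.entries.push_back(v);
  }
}

int main() {
  pg_log_t stray;
  stray.tail = {15, 100};                // on-disk log_tail moved to 15'100...
  stray.entries = {{15, 51}, {15, 52}};  // ...but entries from 15'50 were never trimmed
  pg_log_t target;
  handle_dups(target, stray);            // aborts: 15'51 is not > 15'100
}
```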

The following are the relevant logs that support the above analysis:

1. OSD0 goes down at 2023-09-15 15:16:16 and comes back up at 2023-09-15 15:19:21:

DEBUG 2023-09-15 15:16:16,915 [shard 0] osd -  pg_epoch 213 pg[1.79( v 213'1539 (0'0,213'1539] local-lis/les=141/142 n=0 ec=11/11 lis/c=141/141 les/c/f=142/142/0 sis=141) [0,2,1] r=0 lpr=141 luod=213'1540 lua=96'1284 crt=213'1540 mlcod 213'1539 active+clean  ObjectContextLoader::with_head_obc: released object 1:9e5d0fe8:::rbd_data.1041a505820.0000000000000084:head
DEBUG 2023-09-15 15:16:16,915 [shard 0] osd - 0x7ffa33afd4f0 ClientRequest::with_pg_int: client_request(id=85166, detail=m=[osd_op(client.4298.0:18368 1.79 1:9e5d0fe8:::rbd_data.1041a505820.0000000000000084:head {write 2322432~4096 in=4096b} snapc 0={} ondisk+write+known_if_redirected+supports_pool_eio e213) v8]).0: after process*
DEBUG 2023-09-15 15:16:16,915 [shard 0] osd - 0x7ffa33afd4f0 ClientRequest::~ClientRequest: client_request(id=85142, detail=m=[osd_op(client.4298.0:18338 1.3c 1:3c9a5e39:::rbd_data.1041a505820.0000000000000019:head {write 1122304~4096 in=4096b} snapc 0={} ondisk+write+known_if_redirected+supports_pool_eio e213) v8]): destroying
DEBUG 2023-09-15 15:19:21,103 [shard 0] osd - open_all_classes
DEBUG 2023-09-15 15:19:21,103 [shard 0] osd - open_all_classes found timeindex
DEBUG 2023-09-15 15:19:21,103 [shard 0] osd - _get_class adding new class name timeindex 0x7f09698d1dc0
DEBUG 2023-09-15 15:19:21,103 [shard 0] osd - _load_class timeindex from /home/xuxuehan/src/ceph/build/lib/libcls_timeindex.so
INFO  2023-09-15 15:19:21,104 [shard 0] objclass - <cls> /home/xuxuehan/src/ceph/src/cls/timeindex/cls_timeindex.cc:247: Loaded timeindex class!
DEBUG 2023-09-15 15:19:21,104 [shard 0] osd - register_class timeindex status 3

2. PG 1.5 moves pg_trim_to forward from 15'700 to 15'1200 at 2023-09-15 15:17:35:

DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - trim proposed trim_to = 15'1200
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 215'3545, trimmed: , trimmed_dups: , clear_divergent_priors: 0
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - _write_log_and_missing clearing up to 0'0 dirty_to_dups=0'0 dirty_from_dups=4294967295'18446744073709551615 write_from_dups=4294967295'18446744073709551615 trimmed_dups.size()=0
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - _write_log_and_missing going to encode log.dups.size()=1200
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - _write_log_and_missing 1st round encoded log.dups.size()=1200
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - _write_log_and_missing 2st round encoded log.dups.size()=1200
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - end of _write_log_and_missing
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd - final snapset 9=[]:{} in 1:a1bf3231:::rbd_data.1041a505820.00000000000000e2:head
DEBUG 2023-09-15 15:17:35,472 [shard 0] osd -  pg_epoch 215 pg[1.5( v 215'3545 (15'1200,215'3545] local-lis/les=214/215 n=0 ec=11/11 lis/c=214/123 les/c/f=215/124/0 sis=214) [2,1] r=0 lpr=214 pi=[123,214)/1 luod=215'3544 lua=213'3048 crt=215'3546 lcod 215'3543 mlcod 215'3543 active+undersized  ReplicatedBackend::_submit_transaction: object 1:a1bf3231:::rbd_data.1041a505820.00000000000000e2:head, osdop_param(req_id=(client.4298.0:39146), at_version=(215'3546), pg_trim_to=(15'1200), min_last_complete_ondisk=(215'3543), last_complete=(215'3544), user_at_version=(3545), user_modify=(0), clean_regions=(clean_offsets: [], clean_omap: 1, new_object: 0), )

3. After peering, OSD2 decides that there is no need to move pg_trim_to of PG 1.5 further forward, leaving pg_trim_to at 0'0:

DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 216'3597 (15'1200,216'3597] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 crt=216'3598 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::calc_trim_to_aggressive approx pg log length =  2397
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 216'3597 (15'1200,216'3597] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 crt=216'3598 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::calc_trim_to_aggressive num_to_trim =  54
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 216'3597 (15'1200,216'3597] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 crt=216'3598 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::append_log append_log log((15'1200,216'3597], crt=216'3598) {221'3598 (216'3598) modify   1:a01515da:::rbd_data.1041a505820.00000000000000c5:head by client.4304.0:601 2023-09-15T15:19:51.738268+0800 0 ObjectCleanRegions clean_offsets: [0~3710976,3715072~18446744073705836543], clean_omap: 1, new_object: 0}
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 221'3598 (15'1200,221'3598] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 luod=216'3597 lua=216'3597 crt=216'3598 lcod 216'3597 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::add_log_entry add_log_entry 221'3598 (216'3598) modify   1:a01515da:::rbd_data.1041a505820.00000000000000c5:head by client.4304.0:601 2023-09-15T15:19:51.738268+0800 0 ObjectCleanRegions clean_offsets: [0~3710976,3715072~18446744073705836543], clean_omap: 1, new_object: 0
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 221'3598 (15'1200,221'3598] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 luod=216'3597 lua=216'3597 crt=221'3599 lcod 216'3597 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::append_log approx pg log length =  2398
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 221'3598 (15'1200,221'3598] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 luod=216'3597 lua=216'3597 crt=221'3599 lcod 216'3597 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::append_log dups pg log length =  1200
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 221'3598 (15'1200,221'3598] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 luod=216'3597 lua=216'3597 crt=221'3599 lcod 216'3597 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped PeeringState::append_log transaction_applied = 1
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - trim proposed trim_to = 0'0
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 221'3598, trimmed: , trimmed_dups: , clear_divergent_priors: 0
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - _write_log_and_missing clearing up to 0'0 dirty_to_dups=0'0 dirty_from_dups=4294967295'18446744073709551615 write_from_dups=4294967295'18446744073709551615 trimmed_dups.size()=0
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - _write_log_and_missing going to encode log.dups.size()=1200
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - _write_log_and_missing 1st round encoded log.dups.size()=1200
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - _write_log_and_missing 2st round encoded log.dups.size()=1200
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - end of _write_log_and_missing
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd - final snapset 10=[]:{10: [10]} in 1:a01515da:::rbd_data.1041a505820.00000000000000c5:head
DEBUG 2023-09-15 15:19:52,130 [shard 0] osd -  pg_epoch 221 pg[1.5( v 221'3598 (15'1200,221'3598] local-lis/les=218/219 n=3 ec=11/11 lis/c=218/123 les/c/f=219/124/0 sis=218) [2,0,1]/[2,1] async=[0] r=0 lpr=218 pi=[123,218)/1 luod=216'3597 lua=216'3597 crt=221'3599 lcod 216'3597 mlcod 0'0 active+recovery_wait+undersized+degraded+remapped  ReplicatedBackend::_submit_transaction: object 1:a01515da:::rbd_data.1041a505820.00000000000000c5:head, osdop_param(req_id=(client.4304.0:601), at_version=(221'3599), pg_trim_to=(0'0), min_last_complete_ondisk=(0'0), last_complete=(216'3597), user_at_version=(3598), user_modify=(0), clean_regions=(clean_offsets: [], clean_omap: 1, new_object: 0), )

4. OSD0 goes down at 15:23:39 and comes back up at 15:31:31:

DEBUG 2023-09-15 15:23:39,147 [shard 0] osd - end of _write_log_and_missing
DEBUG 2023-09-15 15:23:39,147 [shard 0] osd - PG::handle_rep_op: do_transaction...
DEBUG 2023-09-15 15:23:39,149 [shard 0] osd -  pg_epoch 294 pg[1.27( v 294'2852 (15'500,294'2852] local-lis/les=269/270 n=8 ec=11/11 lis/c=269/269 les/c/f=270/270/0 sis=269) [1,2,0] r=2 lpr=269 luod=0'0 lua=240'2634 crt=294'2853 lcod 294'2851 mlcod 294'2853 active PeeringState::update_last_complete_ondisk updating last_complete_ondisk to: 294'2852
DEBUG 2023-09-15 15:23:39,155 [shard 0] osd -  pg_epoch 294 pg[1.4( v 294'2369 (0'0,294'2369] local-lis/les=282/283 n=30 ec=11/11 lis/c=282/282 les/c/f=283/283/0 sis=282) [1,0,2] r=1 lpr=282 luod=0'0 lua=228'2355 crt=294'2370 lcod 294'2367 mlcod 294'2370 active PeeringState::update_last_complete_ondisk updating last_complete_ondisk to: 294'2369
DEBUG 2023-09-15 15:31:31,706 [shard 0] osd - open_all_classes
DEBUG 2023-09-15 15:31:31,706 [shard 0] osd - open_all_classes found timeindex
DEBUG 2023-09-15 15:31:31,706 [shard 0] osd - _get_class adding new class name timeindex 0x7f0ad50d1dc0
DEBUG 2023-09-15 15:31:31,706 [shard 0] osd - _load_class timeindex from /home/xuxuehan/src/ceph/build/lib/libcls_timeindex.so

5. OSD0 finds PG 1.5's log bound mismatch:

INFO  2023-09-15 15:31:42,277 [shard 0] osd - PerShardState::start_operation, peering_event(id=255, detail=PeeringEvent(from=0 pgid=1.5 sent=294 requested=294 evt=epoch_sent: 294 epoch_requested: 294 Initialize))
DEBUG 2023-09-15 15:31:42,277 [shard 0] osd - 0x0 LocalPeeringEvent::start: peering_event(id=255, detail=PeeringEvent(from=0 pgid=1.5 sent=294 requested=294 evt=epoch_sent: 294 epoch_requested: 294 Initialize)): start
DEBUG 2023-09-15 15:31:42,277 [shard 0] osd - 0x0 PeeringEvent<T>::with_pg: start
DEBUG 2023-09-15 15:31:42,277 [shard 0] osd - 0x7f0ac6f05a90 PeeringEvent<T>::with_pg: 0x7f0ac6f05a90 peering_event(id=255, detail=PeeringEvent(from=0 pgid=1.5 sent=294 requested=294 evt=epoch_sent: 294 epoch_requested: 294 Initialize)): pg present
INFO  2023-09-15 15:31:42,277 [shard 0] osd - load_pgs: loaded  pg_epoch 294 pg[1.5( v 294'3631 (15'1200,294'3631] local-lis/les=255/256 n=3 ec=11/11 lis/c=255/255 les/c/f=256/256/0 sis=255) [2,0,1] r=1 lpr=0 (log bound mismatch, actual=[15'701,294'3631]) crt=294'3631 lcod 0'0 mlcod 0'0 unknown
ERROR 2023-09-15 15:31:42,277 [shard 0] none - Falling back to public interface

6. OSD0 fails due to the above error:

ERROR 2023-09-15 15:31:48,084 [shard 0] none - /home/xuxuehan/src/ceph/src/osd/osd_types.cc:5256 : In function 'void _handle_dups(crimson::common::CephContext *, pg_log_t &, const pg_log_t &, unsigned int)', ceph_assert(%s)
i->version > other.tail

Actions #1

Updated by Xuehan Xu 7 months ago

  • Description updated (diff)
Actions #2

Updated by Xuehan Xu 7 months ago

  • Pull request ID set to 53520
Actions #3

Updated by Samuel Just 7 months ago

I'm not sure step 5 in your chain above is correct. MOSDRepOp (messages/MOSDRepOp.h) doesn't actually contain an info, and receiving a repop doesn't cause the replica to unconditionally update its info. The call sequence on the replica side in classic would be:

- ReplicatedBackend::do_repop
- PrimaryLogPG::log_operation
- PeeringState::append_log

I don't see a way for PeeringState::append_log to update pg_info_t::log_tail except via the call to PGLog::trim.
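
As a rough illustration of that last point (simplified types and a hypothetical trim() helper, not the real PGLog::trim signature): the tail only ever moves forward through a trim, so a pg_trim_to of 0'0 leaves it untouched.

```cpp
#include <cstdint>
#include <cstdio>

// Simplified stand-ins -- NOT the real Ceph types or signatures.
struct eversion_t {
  uint32_t epoch = 0;
  uint64_t version = 0;
  bool operator>(const eversion_t& o) const {
    return epoch != o.epoch ? epoch > o.epoch : version > o.version;
  }
};

struct pg_info_t { eversion_t log_tail; };

// Analogue of a trim: log_tail advances only when trim_to is ahead of it.
void trim(pg_info_t& info, eversion_t trim_to) {
  if (trim_to > info.log_tail)
    info.log_tail = trim_to;  // dropping the entries up to trim_to is elided
}

int main() {
  pg_info_t info{{15, 50}};
  trim(info, {0, 0});  // a repop carrying pg_trim_to = 0'0: tail stays put
  std::printf("log_tail = %u'%llu\n", info.log_tail.epoch,
              (unsigned long long)info.log_tail.version);  // prints 15'50
}
```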

Actions #4

Updated by Xuehan Xu 7 months ago

Samuel Just wrote:

I'm not sure step 5 in your chain above is correct. MOSDRepOp (messages/MOSDRepOp.h) doesn't actually contain an info, and receiving a repop doesn't cause the replica to unconditionally update its info. The call sequence on the replica side in classic would be:

- ReplicatedBackend::do_repop
- PrimaryLogPG::log_operation
- PeeringState::append_log

I don't see a way for PeeringState::append_log to update pg_info_t::log_tail except via the call to PGLog::trim.

The repop doesn't update replicas' in-memory pg info, but the primary OSD encapsulates the dirty pg info in the transaction (PG::prepare_write()), which is then put into the repop and updates the on-disk pg info of replicas. So when the replica OSD reboots, it will read an inconsistent pg info from disk. Am I right?
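
The suspected flow would look roughly like this (hypothetical helper names and a toy Transaction type; the real code paths are PG::prepare_write() and the repop path, which this sketch does not reproduce):

```cpp
#include <string>
#include <vector>

// Toy transaction: just an ordered list of op descriptions.
struct Transaction { std::vector<std::string> ops; };

// Stand-in for the primary bundling its dirty metadata into the transaction.
void prepare_write(Transaction& t) {
  t.ops.push_back("persist pg_info with log_tail=15'100");
}

// Stand-in for shipping the whole transaction to the replica in a repop.
void send_repop(const Transaction& t) {
  (void)t;  // the replica would apply every op verbatim (elided)
}

int main() {
  Transaction txn{{"write to rbd_data object"}};
  prepare_write(txn);  // metadata bundled in before the repop is built...
  send_repop(txn);     // ...so the replica persists log_tail=15'100 untrimmed
}
```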

Actions #5

Updated by Samuel Just 7 months ago

I suspect the problem is with the message sent by osd.2 to activate osd.0 (likely a MOSDPGLog, might be a MOSDPGInfo) in the interval starting with epoch 219 rather than with a subsequent MOSDRepOp. Do you have the peering logs from those two osds during that interval?

Actions #6

Updated by Samuel Just 7 months ago

No. The portion of the transaction that gets sent to replicas shouldn't include the updates to pg info or to the log. See ReplicatedBackend::submit_transaction -- issue_op gets called prior to log_operation. This is quite important for several reasons:
- replicas can have logs longer than the primary
- replicas might be backfilling and therefore have several pg_info_t members different from the primary
- we don't update the in-memory copy from that transaction, so it would be really bad if we were updating the ondisk version inconsistently

If crimson is actually sending the metadata updates as part of the transaction, that's going to be the problem.
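
In other words, the ordering keeps the replicated payload free of primary-local metadata. A toy sketch of that ordering (illustrative names only, not the actual ReplicatedBackend code):

```cpp
#include <string>
#include <vector>

// Toy transaction: an ordered list of op descriptions.
struct Transaction { std::vector<std::string> ops; };

// Stand-in for issue_op(): the replica payload is captured here, first.
Transaction issue_op(const Transaction& t) {
  return t;  // what replicas receive: only the object mutation
}

// Stand-in for log_operation(): primary-local metadata appended afterwards.
void log_operation(Transaction& t) {
  t.ops.push_back("update local pg_info / append local pg_log entries");
}

int main() {
  Transaction txn{{"write to rbd_data object"}};
  Transaction sent = issue_op(txn);  // replica payload is frozen here
  log_operation(txn);                // metadata joins only the local transaction
  (void)sent;  // `sent` still contains just the object write
}
```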

Actions #7

Updated by Xuehan Xu 7 months ago

Samuel Just wrote:

No. The portion of the transaction that gets sent to replicas shouldn't include the updates to pg info or to the log. See ReplicatedBackend::submit_transaction -- issue_op gets called prior to log_operation. This is quite important for several reasons:
- replicas can have logs longer than the primary
- replicas might be backfilling and therefore have several pg_info_t members different from the primary
- we don't update the in-memory copy from that transaction, so it would be really bad if we were updating the ondisk version inconsistently

If crimson is actually sending the metadata updates as part of the transaction, that's going to be the problem.

I see. It looks to me that crimson does send the metadata updates as part of the transaction; I will correct it, thanks :-)

Actions #8

Updated by Xuehan Xu 7 months ago

  • Pull request ID changed from 53520 to 53521
Actions #9

Updated by Matan Breizman 2 months ago

  • Status changed from New to Resolved