Bug #47133
thrash test failed in PeeringState::on_peer_recover(pg_shard_t, hobject_t const&, eversion_t const&)
Status:
Closed
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Description
DEBUG 2020-08-25 13:21:55,885 [shard 0] filestore - do_transaction DEBUG 2020-08-25 13:21:55,886 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] GOT AckFrame: seq=692 DEBUG 2020-08-25 13:21:55,886 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #682 === osd_repop(client.4229.0:2380 3.2 e22/21) v3 (112) DEBUG 2020-08-25 13:21:55,886 [shard 0] osd - replicated_request(id=1134, detail=RepRequest(from=0 req=osd_repop(client.4229.0:2380 3.2 e22/21 3:49de99f6:::smithi14420596-532:head v 22'99, mlcod=22'99) v3 )) start DEBUG 2020-08-25 13:21:55,886 [shard 0] osd - pg_epoch 22 pg[3.2( v 22'98 (0'0,22'98] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/ 1 luod=0'0 lua=19'97 crt=22'98 mlcod 22'98 active pruub 45.379686226s@ append_log log((0'0,22'98], crt=22'98) {22'99 (22'98) modify 3:49de99f6:::smithi14420596-532:head by client.4229.0:2380 2020-08-25T 13:21:54.095732+0000 0 ObjectCleanRegions clean_offsets: [0~1956409,2512586~18446744073707039029], clean_omap: 1, new_object: 0} DEBUG 2020-08-25 13:21:55,886 [shard 0] osd - pg_epoch 22 pg[3.2( v 22'99 (0'0,22'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/ 1 luod=0'0 lua=19'97 crt=22'98 lcod 22'98 mlcod 22'98 active pruub 45.379686226s@ add_log_entry 22'99 (22'98) modify 3:49de99f6:::smithi14420596-532:head by client.4229.0:2380 2020-08-25T13:21:54.095732 +0000 0 ObjectCleanRegions clean_offsets: [0~1956409,2512586~18446744073707039029], clean_omap: 1, new_object: 0 DEBUG 2020-08-25 13:21:55,886 [shard 0] osd - pg_epoch 22 pg[3.2( v 22'99 (0'0,22'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/ 1 luod=0'0 lua=19'97 crt=22'99 lcod 22'98 mlcod 22'98 active pruub 45.379686226s@ append_log approx pg log 
length = 99 DEBUG 2020-08-25 13:21:55,886 [shard 0] osd - pg_epoch 22 pg[3.2( v 22'99 (0'0,22'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/ 1 luod=0'0 lua=19'97 crt=22'99 lcod 22'98 mlcod 22'98 active pruub 45.379686226s@ append_log transaction_applied = 1 DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - trim proposed trim_to = 0'0 DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 22'99, trimmed: , trimmed_dups: , clear_divergent_priors: 0 DEBUG 2020-08-25 13:21:55,887 [shard 0] filestore - do_transaction DEBUG 2020-08-25 13:21:55,887 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #683 === MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3 (107) DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - background_recovery_sub(id=0, detail=MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3): start DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - handle_push_reply: MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3 DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - _handle_push_reply, soid 3:eb464a4a:::smithi14420596-514:head, from 0 DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - on_peer_recover: 3:eb464a4a:::smithi14420596-514:head, 19'98 on 0 ERROR 2020-08-25 13:21:55,887 [shard 0] none - /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-4670-ge307b9aefb4/rpm/el8/BUILD/ceph-16.0.0-4670-ge307b9aefb4/src/osd/osd_types.h:4816 : In function 'void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [with bool TrackChanges = false]', ceph_assert(%s) p->second.need <= v || p->second.is_delete() Aborting on shard 0. 
Backtrace: Backtrace: [Backtrace #0] void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/util/backtrace.hh:59 non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:730 (inlined by) print_with_backtrace at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:751 print_with_backtrace at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:758 non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:3517 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:3499 (inlined by) _FUN at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:3495 .annobin_funlockfile.c_end at funlockfile.c:? __GI___libc_sigaction at :? __isoc99_fscanf.cold.0 at isoc99_fscanf.c:? 
ceph::__ceph_assert_fail(char const*, char const*, int, char const*) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/common/assert.cc:27 ceph::__ceph_assert_fail(ceph::assert_data const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/common/assert.cc:14 PeeringState::on_peer_recover(pg_shard_t, hobject_t const&, eversion_t const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/osd/osd_types.h:4816 (inlined by) PeeringState::on_peer_recover(pg_shard_t, hobject_t const&, eversion_t const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/osd/PeeringState.cc:3948 PGRecovery::on_peer_recover(pg_shard_t, hobject_t const&, ObjectRecoveryInfo const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/pg_recovery.cc:389 operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:845 (inlined by) ReplicatedRecoveryBackend::_handle_push_reply(pg_shard_t, PushReplyOp const&, PushOp*) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:848 operator()<PushOp> at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:877 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685 (inlined by) ?? 
at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696 (inlined by) do_with_impl<PushOp, ReplicatedRecoveryBackend::handle_push_reply(Ref<MOSDPGPushReply>)::<lambda(auto:125&)> > at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:106 ReplicatedRecoveryBackend::handle_push_reply(boost::intrusive_ptr<MOSDPGPushReply>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:141 (inlined by) ReplicatedRecoveryBackend::handle_push_reply(boost::intrusive_ptr<MOSDPGPushReply>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:878 ReplicatedRecoveryBackend::handle_recovery_op(boost::intrusive_ptr<MOSDFastDispatchOp>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:1091 operator()<boost::intrusive_ptr<crimson::osd::PG>, boost::intrusive_ptr<crimson::osd::RecoverySubRequest> > at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd_operations/recovery_subrequest.cc:24 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685 (inlined by) ?? 
at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696 (inlined by) do_with_impl<boost::intrusive_ptr<crimson::osd::PG>, const boost::intrusive_ptr<crimson::osd::RecoverySubRequest>, crimson::osd::RecoverySubRequest::start()::<lambda(Ref<crimson::osd::PG>)>::<lambda(auto:84&, auto:85&)> > at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:106 crimson::osd::RecoverySubRequest::start() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:141 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd_operations/recovery_subrequest.cc:25 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2028 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214 (inlined by) ?? 
at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490 (inlined by) crimson::osd::RecoverySubRequest::start() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd_operations/recovery_subrequest.cc:26 crimson::osd::OSD::handle_recovery_subreq(crimson::net::Connection*, boost::intrusive_ptr<MOSDFastDispatchOp>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/shard_services.h:100 (inlined by) crimson::osd::OSD::handle_recovery_subreq(crimson::net::Connection*, boost::intrusive_ptr<MOSDFastDispatchOp>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd.cc:1152 operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd.cc:628 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/gate.hh:126 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/gate.hh:144 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/common/gated.h:37 (inlined by) crimson::osd::OSD::ms_dispatch(crimson::net::Connection*, boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd.cc:655 operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/chained_dispatchers.cc:9 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129 (inlined by) ?? 
at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:611 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:668 (inlined by) ChainedDispatchers::ms_dispatch(crimson::net::Connection*, boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/chained_dispatchers.cc:10 operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1929 crimson::net::ProtocolV2::read_message(utime_t) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:95 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1684 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1694 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1995 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490 (inlined by) crimson::net::ProtocolV2::read_message(utime_t) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1930 operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1978 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696 (inlined by) ?? 
at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2028 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1979 crimson::net::ProtocolV2::read_message(utime_t) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2028 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:2011 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:561 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129 (inlined by) ?? 
at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:262 seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2192 non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2608 non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2591 (inlined by) seastar::reactor::run() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2763 seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/app-template.cc:207 main at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/main.cc:147 check_one_fd at check_fds.c:? _start at ??:?
/a/kchai-2020-08-25_13:01:58-crimson-rados:thrash-wip-47031-kefu-distro-basic-smithi
Updated by Kefu Chai over 3 years ago
void got(const hobject_t& oid, eversion_t v) {
  std::map<hobject_t, item>::iterator p = missing.find(oid);
  ceph_assert(p != missing.end());
  ceph_assert(p->second.need <= v || p->second.is_delete()); // <<<<
  got(p);
}
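The condition that fired can be reproduced outside the OSD with a minimal stand-in for the missing set. In the sketch below, eversion, missing_item, and can_mark_got are illustrative names, not the real types from src/osd/osd_types.h:

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for the pg_missing_set invariant above; these are
// illustrative types, not the real ones from src/osd/osd_types.h.
struct eversion {
    unsigned epoch = 0;
    unsigned version = 0;
    bool operator<=(const eversion& o) const {
        return epoch < o.epoch || (epoch == o.epoch && version <= o.version);
    }
};

struct missing_item {
    eversion need;        // version the peer is recorded as needing
    bool deleted = false;
    bool is_delete() const { return deleted; }
};

// Same condition as the ceph_assert in got(): the recovered version must
// be at least the recorded need, unless the entry is a delete.
bool can_mark_got(const std::map<std::string, missing_item>& missing,
                  const std::string& oid, eversion v) {
    auto p = missing.find(oid);
    assert(p != missing.end());
    return p->second.need <= v || p->second.is_delete();
}
```

With need=20'99 and a recovered version of 19'98 (the values from the log), the condition is false, which is exactly the abort seen above.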
void PeeringState::on_peer_recover(
  pg_shard_t peer,
  const hobject_t &soid,
  const eversion_t &version)
{
  pl->publish_stats_to_osd();
  // done!
  peer_missing[peer].got(soid, version); /// <<<
  missing_loc.add_location(soid, peer);
}
Updated by Kefu Chai over 3 years ago
In ReplicatedRecoveryBackend::prep_push(), we have
TRACE 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: peer osd.0 missing {3:eb464a4a:::smithi14420596-514:head=20'99(19'98) flags = none clean_offsets: [0~1936439,2575948~18446744073706975667], clean_omap: 1, new_object: 0}
20'99 (19'98)
so osd.3 didn't have 20'99; what it had was 19'98.
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - prep_push: 3:eb464a4a:::smithi14420596-514:head, 20'99
In ReplicatedRecoveryBackend::build_push_op(), we have
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - build_push_op 3:eb464a4a:::smithi14420596-514:head @19'98
So we actually pushed a stale version to the peer, while expecting a response carrying the newer version.
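One way to see the inconsistency is as a pre-send sanity check. This is illustrative only, not actual crimson code: compare the version prep_push() was asked to recover with the version build_push_op() actually read from disk.

```cpp
// Illustrative pre-send check, not actual crimson code: `need` is the
// version prep_push was asked to recover (20'99 in the log) and `local`
// is what build_push_op read from disk (19'98). A stale local copy can
// never satisfy the peer's missing entry, so catching the mismatch here
// would fail fast instead of tripping the ceph_assert in
// pg_missing_set::got() when the push reply arrives.
struct ver {
    unsigned epoch = 0;
    unsigned v = 0;
};

inline bool older_than(ver a, ver b) {
    return a.epoch < b.epoch || (a.epoch == b.epoch && a.v < b.v);
}

inline bool push_would_be_stale(ver need, ver local) {
    return older_than(local, need);
}
```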
Updated by Kefu Chai over 3 years ago
- File ceph-osd.3.log.bz2 added
Updated by Kefu Chai over 3 years ago
DEBUG 2020-08-25 13:21:53,834 [shard 0] ms - [osd.3(client) v2:172.21.15.168:6804/18752 >> client.4229 172.21.15.144:0/641626027@36694] <== #555 === osd_op(client.4229.0:2301 3.7 3.525262d7 (undecoded) on disk+write+known_if_redirected e16) v8 (42) DEBUG 2020-08-25 13:21:53,834 [shard 0] osd - client_request(id=556, detail=osd_op(client.4229.0:2301 3.7 3.525262d7 (undecoded) ondisk+write+known_if_redirected e16) v8): start DEBUG 2020-08-25 13:21:53,834 [shard 0] osd - get_or_load_head_obc: cache miss on 3:eb464a4a:::smithi14420596-514:head DEBUG 2020-08-25 13:21:53,834 [shard 0] filestore - get_attrs //// start serving client request, reading OI DEBUG 2020-08-25 13:21:53,835 [shard 0] filestore - do_transaction DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - load_metadata: object 3:eb464a4a:::smithi14420596-514:head doesn't exist, returning empty metadata DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - get_or_load_head_obc: loaded obs 3:eb464a4a:::smithi14420596-514:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) for 3:eb464a4a:::smithi14420596-514:head DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - get_or_load_head_obc: returning obc 3:eb464a4a:::smithi14420596-514:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) for 3:eb464a4a:::smithi14420596-514:head //// seems the object didn't exist by then DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head - handling op write DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - handling op write on object 3:eb464a4a:::smithi14420596-514:head DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head - handling op stat DEBUG 
2020-08-25 13:21:53,838 [shard 0] osd - handling op stat on object 3:eb464a4a:::smithi14420596-514:head DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - stat os.oi.size=1420379, os.oi.mtime=0.000000 DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat out=16b} snapc 0={} ondisk+write+known_if _redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head all operations successful DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat out=16b} snapc 0={} ondisk+write+known_if _redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head submitting txn ///// submitting transaction to object store DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - pg_epoch 19 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [3,1,0] r=0 lpr=14 luod=19'97 lua=0'0 crt=19'98 lcod 19'97 mlcod 19'97 active+clean append_log transaction_applied = 1 ///// local transaction applied DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] connect to existing DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] --> #645 === osd_repop(client.4229.0:2301 3.7 e19/14 3:eb464a4a:::smith i14420596-514:head v 19'98, mlcod=19'98) v3 (112) DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752 >> osd.1 v2:172.21.15.144:6803/18905@59528] connect to existing DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752 >> osd.1 v2:172.21.15.144:6803/18905@59528] --> #779 === osd_repop(client.4229.0:2301 3.7 e19/14 3:eb464a4a:::smith i14420596-514:head v 19'98, mlcod=19'98) v3 (112) ///// sending repop to replica osds: osd.0 and osd.1 
DEBUG 2020-08-25 13:21:53,850 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #640 === osd_repop_reply(client.4229.0:2301 3.7 e19/14) v2 (113) ///// osd.0 replied DEBUG 2020-08-25 13:21:53,857 [shard 0] osd - pg_advance_map(id=111, detail=PGAdvanceMap(pg=3.7 from=19 to=20)): start ... INFO 2020-08-25 13:21:53,864 [shard 0] osd - Entering state: Started/Stray ... DEBUG 2020-08-25 13:21:53,864 [shard 0] osd - pg_epoch 20 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.038597247s) [0] r=-1 lpr=20 pi=[14,20)/1 crt=19'98 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ activate_map ... DEBUG 2020-08-25 13:21:53,864 [shard 0] osd - pg_epoch 20 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.032501349s) [0] r=-1 lpr=20 pi=[14,20)/1 crt=19'98 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ enter Started/Stray ... DEBUG 2020-08-25 13:21:53,864 [shard 0] osd - pg_epoch 20 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.032501349s) [0] r=-1 lpr=20 pi=[14,20)/1 crt=19'98 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ activate_map: Not dirtying info: last_persisted is 19 while current is 20 /////// handling a new osdmap /////// pg 3.7 epoch advanced to 20, and osd.3 changed from the primary to a stray osd of that PG. 
DEBUG 2020-08-25 13:21:53,865 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752 >> osd.1 v2:172.21.15.144:6803/18905@59528] <== #785 === osd_repop_reply(client.4229.0:2301 3.7 e19/14) v2 (113) ////// osd.1 replied DEBUG 2020-08-25 13:21:53,867 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2302 3.7 3:eb464a4a:::smithi14420596-514:head {write 1936439~639509 in=639509b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head - handling op write DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - handling op write on object 3:eb464a4a:::smithi14420596-514:head DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2302 3.7 3:eb464a4a:::smithi14420596-514:head {write 1936439~639509 in=639509b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head all operations successful DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2302 3.7 3:eb464a4a:::smithi14420596-514:head {write 1936439~639509 in=639509b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head submitting txn /////// another write op in the request DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - pg_epoch 20 pg[3.7( v 20'99 (0'0,20'99] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.032501349s) [0] r=-1 lpr=20 pi=[14,20)/1 crt=20'99 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ append_log transaction_applied = 1 ... DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 20'99, trimmed: , trimmed_dups: , clear_divergent_priors: 0 ///// transaction submitted. 
TRACE 2020-08-25 13:21:53,868 [shard 0] osd - process_op: dropping misdirected op DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - operation abort, up primary changed //// started to drop ops DEBUG 2020-08-25 13:21:54,886 [shard 0] osd - merge_log point (usually last shared) is 19'98 (0'0) modify 3:eb464a4a:::smithi14420596-514:head by client.4229.0:2301 2020-08-25T13:21:53.832878+0000 0 ObjectCleanRegions clean_offsets: [0~621991,1420379~18446744073708131236], clean_omap: 1, new_object: 0 DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - handle_peering_op on 3.7 from 0 ///// osd.0 was the new primary. INFO 2020-08-25 13:21:55,872 [shard 0] osd - Entering state: Started/Primary/Active/Recovering ... DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - do_peering_event handling epoch_sent: 22 epoch_requested: 22 RemoteRecoveryReserved for pg: 3.7 DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - pg_epoch 22 pg[3.7( v 20'99 (0'0,20'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.026434876s) [3,0] r=0 lpr=21 pi=[14,21)/ 1 crt=20'99 lcod 19'97 mlcod 0'0 active+recovery_wait+undersized+degraded+wait pruub 45.350179339s@1 exit Started/Primary/Active/WaitRemoteRecoveryReserved 0.001189 2 0.000088 //// start the recovery DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - background_recovery(id=0, detail=BackgroundRecovery(3.7)): start DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: peer osd.0 missing 1 objects TRACE 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: peer osd.0 missing {3:eb464a4a:::smithi14420596-514:head=20'99(19'98) flags = none clean_offsets: [0~1936439,2575948~18446744073706975667], clean_omap: 1, new_object: 0} /// i should push the missing object to osd.0 DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: recover_object_replicas(3:eb464a4a:::smithi14420596-514:head) DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - recover_object: 3:eb464a4a:::smithi14420596-514:head, 20'99 DEBUG 
2020-08-25 13:21:55,872 [shard 0] osd - get_or_load_head_obc: found 3:eb464a4a:::smithi14420596-514:head in cache DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - recover_object: loaded obc: 3:eb464a4a:::smithi14420596-514:head DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - prep_push: 3:eb464a4a:::smithi14420596-514:head, 20'99 DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - calc_head_subsets 3:eb464a4a:::smithi14420596-514:head data_subset [1936439~639509] DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - prep_push: 3:eb464a4a:::smithi14420596-514:head to 0 DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - build_push_op 3:eb464a4a:::smithi14420596-514:head @19'98 ... DEBUG 2020-08-25 13:21:55,872 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] --> #687 === MOSDPGPush(3.7 22/21 {PushOp(3:eb464a4a:::smithi14420596-5 14:head, version: 19'98, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 1, recovery_info: ObjectRecoveryInfo(3:eb464a4a:::smithi14420596-514:head@19'98, size: 25 75948, copy_subset: [1936439~639509], clone_subset: {}, snapset: 0={}:{}, object_exist: 1), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:2575948, data_complete:true, omap_recovered_to: , omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true, error:false))}) v4 (105) ///// and i pushed 19'98, even the latest version was 20'99 DEBUG 2020-08-25 13:21:55,887 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #683 === MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3 (107) /// and osd.0 replied. 
hmmm, DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - background_recovery_sub(id=0, detail=MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3): start DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - handle_push_reply: MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3 DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - _handle_push_reply, soid 3:eb464a4a:::smithi14420596-514:head, from 0 DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - on_peer_recover: 3:eb464a4a:::smithi14420596-514:head, 19'98 on 0 //// the version is older than 20'99!!!
- process a request targeting PG 3.7
- handle the first write op in it, and submit the transaction for that op
- advance the osdmap; osd.3 transitioned from primary to a stray OSD of PG 3.7
- handle the second write op in it, and submit the transaction for that op. The OI's version must have been 20'99 by then.
- osd.0 is now the new primary, and it starts the recovery
- it is missing smithi14420596-514@20'99, so let's push the missing object to it
- i only have smithi14420596-514@19'98, but anyway....
- osd.0 replied with the version of 19'98, which is less than 20'99!!!
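The whole sequence can be replayed with a toy model. MissingSet below is an illustrative stand-in for pg_missing_set; its got() returns false where the real code calls ceph_assert:

```cpp
#include <map>
#include <string>

// Toy replay of the timeline above. MissingSet is an illustrative stand-in
// for pg_missing_set; got() returns false where the real code ceph_asserts.
struct ev {
    unsigned e = 0, v = 0;
    bool operator<=(const ev& o) const {
        return e < o.e || (e == o.e && v <= o.v);
    }
};

class MissingSet {
    std::map<std::string, ev> need_;
public:
    void add(const std::string& oid, ev need) { need_[oid] = need; }
    // Mirrors the condition checked by pg_missing_set::got().
    bool got(const std::string& oid, ev recovered) {
        auto p = need_.find(oid);
        if (p == need_.end() || !(p->second <= recovered))
            return false;  // recovered an older version than the recorded need
        need_.erase(p);
        return true;
    }
};

// The race: the new primary records need=20'99 for osd.0 (from merge_log),
// but build_push_op only finds 19'98 on disk, so the push reply carries 19'98.
inline bool replay_race() {
    MissingSet peer_missing;
    peer_missing.add("smithi14420596-514", ev{20, 99});
    ev pushed{19, 98};  // the stale version actually pushed
    return peer_missing.got("smithi14420596-514", pushed);
}
```

Under this model, recovery succeeds only if the pushed version is at least the recorded need; with the logged values it fails, matching the abort in on_peer_recover.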
Updated by Kefu Chai over 3 years ago
- Status changed from New to Fix Under Review
- Assignee set to Xuehan Xu
- Pull request ID set to 36896
Updated by Kefu Chai over 3 years ago
- Status changed from Fix Under Review to Resolved