Project

General

Profile

Actions

Bug #47133

closed

thrash test failed in PeeringState::on_peer_recover(pg_shard_t, hobject_t const&, eversion_t const&)

Added by Kefu Chai over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

DEBUG 2020-08-25 13:21:55,885 [shard 0] filestore - do_transaction
DEBUG 2020-08-25 13:21:55,886 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] GOT AckFrame: seq=692
DEBUG 2020-08-25 13:21:55,886 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #682 === osd_repop(client.4229.0:2380 3.2 e22/21) v3 (112)
DEBUG 2020-08-25 13:21:55,886 [shard 0] osd - replicated_request(id=1134, detail=RepRequest(from=0 req=osd_repop(client.4229.0:2380 3.2 e22/21 3:49de99f6:::smithi14420596-532:head v 22'99, mlcod=22'99) v3
)) start
DEBUG 2020-08-25 13:21:55,886 [shard 0] osd -  pg_epoch 22 pg[3.2( v 22'98 (0'0,22'98] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/
1 luod=0'0 lua=19'97 crt=22'98 mlcod 22'98 active pruub 45.379686226s@ append_log log((0'0,22'98], crt=22'98) {22'99 (22'98) modify   3:49de99f6:::smithi14420596-532:head by client.4229.0:2380 2020-08-25T
13:21:54.095732+0000 0 ObjectCleanRegions clean_offsets: [0~1956409,2512586~18446744073707039029], clean_omap: 1, new_object: 0}
DEBUG 2020-08-25 13:21:55,886 [shard 0] osd -  pg_epoch 22 pg[3.2( v 22'99 (0'0,22'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/
1 luod=0'0 lua=19'97 crt=22'98 lcod 22'98 mlcod 22'98 active pruub 45.379686226s@ add_log_entry 22'99 (22'98) modify   3:49de99f6:::smithi14420596-532:head by client.4229.0:2380 2020-08-25T13:21:54.095732
+0000 0 ObjectCleanRegions clean_offsets: [0~1956409,2512586~18446744073707039029], clean_omap: 1, new_object: 0
DEBUG 2020-08-25 13:21:55,886 [shard 0] osd -  pg_epoch 22 pg[3.2( v 22'99 (0'0,22'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/
1 luod=0'0 lua=19'97 crt=22'99 lcod 22'98 mlcod 22'98 active pruub 45.379686226s@ append_log approx pg log length =  99
DEBUG 2020-08-25 13:21:55,886 [shard 0] osd -  pg_epoch 22 pg[3.2( v 22'99 (0'0,22'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.052338892s) [0,3] r=1 lpr=21 pi=[14,21)/
1 luod=0'0 lua=19'97 crt=22'99 lcod 22'98 mlcod 22'98 active pruub 45.379686226s@ append_log transaction_applied = 1
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - trim proposed trim_to = 0'0
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 22'99, trimmed: , trimmed_dups: , clear_divergent_priors: 0
DEBUG 2020-08-25 13:21:55,887 [shard 0] filestore - do_transaction
DEBUG 2020-08-25 13:21:55,887 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #683 === MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3 (107)
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - background_recovery_sub(id=0, detail=MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3): start
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - handle_push_reply: MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - _handle_push_reply, soid 3:eb464a4a:::smithi14420596-514:head, from 0
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - on_peer_recover: 3:eb464a4a:::smithi14420596-514:head, 19'98 on 0
ERROR 2020-08-25 13:21:55,887 [shard 0] none - /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-4670-ge307b9aefb4/rpm/el8/BUILD/ceph-16.0.0-4670-ge307b9aefb4/src/osd/osd_types.h:4816 : In function 'void pg_missing_set<TrackChanges>::got(const hobject_t&, eversion_t) [with bool TrackChanges = false]', ceph_assert(%s)
p->second.need <= v || p->second.is_delete()
Aborting on shard 0.
Backtrace:

Backtrace:
[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/util/backtrace.hh:59
non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:730
 (inlined by) print_with_backtrace at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:751
print_with_backtrace at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:758
non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:3517
 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:3499
 (inlined by) _FUN at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:3495
.annobin_funlockfile.c_end at funlockfile.c:?
__GI___libc_sigaction at :?
__isoc99_fscanf.cold.0 at isoc99_fscanf.c:?
ceph::__ceph_assert_fail(char const*, char const*, int, char const*) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/common/assert.cc:27
ceph::__ceph_assert_fail(ceph::assert_data const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/common/assert.cc:14
PeeringState::on_peer_recover(pg_shard_t, hobject_t const&, eversion_t const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/osd/osd_types.h:4816
 (inlined by) PeeringState::on_peer_recover(pg_shard_t, hobject_t const&, eversion_t const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/osd/PeeringState.cc:3948
PGRecovery::on_peer_recover(pg_shard_t, hobject_t const&, ObjectRecoveryInfo const&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/pg_recovery.cc:389
operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:845
 (inlined by) ReplicatedRecoveryBackend::_handle_push_reply(pg_shard_t, PushReplyOp const&, PushOp*) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:848
operator()<PushOp> at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:877
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696
 (inlined by) do_with_impl<PushOp, ReplicatedRecoveryBackend::handle_push_reply(Ref<MOSDPGPushReply>)::<lambda(auto:125&)> > at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:106
ReplicatedRecoveryBackend::handle_push_reply(boost::intrusive_ptr<MOSDPGPushReply>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:141
 (inlined by) ReplicatedRecoveryBackend::handle_push_reply(boost::intrusive_ptr<MOSDPGPushReply>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:878
ReplicatedRecoveryBackend::handle_recovery_op(boost::intrusive_ptr<MOSDFastDispatchOp>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/replicated_recovery_backend.cc:1091
operator()<boost::intrusive_ptr<crimson::osd::PG>, boost::intrusive_ptr<crimson::osd::RecoverySubRequest> > at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd_operations/recovery_subrequest.cc:24
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696
 (inlined by) do_with_impl<boost::intrusive_ptr<crimson::osd::PG>, const boost::intrusive_ptr<crimson::osd::RecoverySubRequest>, crimson::osd::RecoverySubRequest::start()::<lambda(Ref<crimson::osd::PG>)>::<lambda(auto:84&, auto:85&)> > at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:106
crimson::osd::RecoverySubRequest::start() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/do_with.hh:141
 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd_operations/recovery_subrequest.cc:25
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2028
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490
 (inlined by) crimson::osd::RecoverySubRequest::start() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd_operations/recovery_subrequest.cc:26
crimson::osd::OSD::handle_recovery_subreq(crimson::net::Connection*, boost::intrusive_ptr<MOSDFastDispatchOp>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/shard_services.h:100
 (inlined by) crimson::osd::OSD::handle_recovery_subreq(crimson::net::Connection*, boost::intrusive_ptr<MOSDFastDispatchOp>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd.cc:1152
operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd.cc:628
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/gate.hh:126
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/gate.hh:144
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/common/gated.h:37
 (inlined by) crimson::osd::OSD::ms_dispatch(crimson::net::Connection*, boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/osd.cc:655
operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/chained_dispatchers.cc:9
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:611
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:668
 (inlined by) ChainedDispatchers::ms_dispatch(crimson::net::Connection*, boost::intrusive_ptr<Message>) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/chained_dispatchers.cc:10
operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1929
crimson::net::ProtocolV2::read_message(utime_t) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:95
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1684
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1694
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1995
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490
 (inlined by) crimson::net::ProtocolV2::read_message(utime_t) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1930
operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1978
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2028
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490
 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:1979
crimson::net::ProtocolV2::read_message(utime_t) at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:60
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/bits/invoke.h:96
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1685
 (inlined by) ?? at /opt/rh/gcc-toolset-9/root/usr/include/c++/9/tuple:1696
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2028
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1568
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1214
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:1490
 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/net/ProtocolV2.cc:2011
 (inlined by) operator() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:561
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2045
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future.hh:2129
 (inlined by) ?? at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/include/seastar/core/future-util.hh:262
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2192
non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2608
non-virtual thunk to std::_Nested_exception<std::runtime_error>::~_Nested_exception() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2591
 (inlined by) seastar::reactor::run() at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/reactor.cc:2763
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/seastar/src/core/app-template.cc:207
main at /usr/src/debug/ceph-16.0.0-4670.ge307b9aefb4.el8.x86_64/src/crimson/osd/main.cc:147
check_one_fd at check_fds.c:?
_start at ??:?

/a/kchai-2020-08-25_13:01:58-crimson-rados:thrash-wip-47031-kefu-distro-basic-smithi


Files

ceph-osd.3.log.bz2 (319 KB) ceph-osd.3.log.bz2 Kefu Chai, 08/26/2020 04:09 PM
Actions #1

Updated by Kefu Chai over 3 years ago

  void got(const hobject_t& oid, eversion_t v) {
    std::map<hobject_t, item>::iterator p = missing.find(oid);
    ceph_assert(p != missing.end());
    ceph_assert(p->second.need <= v || p->second.is_delete()); // <<<<
    got(p);
  }
void PeeringState::on_peer_recover(
  pg_shard_t peer,
  const hobject_t &soid,
  const eversion_t &version)
{
  pl->publish_stats_to_osd();
  // done!
  peer_missing[peer].got(soid, version); /// <<<
  missing_loc.add_location(soid, peer);
}
Actions #2

Updated by Kefu Chai over 3 years ago

in ReplicatedRecoveryBackend::prep_push(), we have

TRACE 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: peer osd.0 missing {3:eb464a4a:::smithi14420596-514:head=20'99(19'98) flags = none clean_offsets: [0~1936439,2575948~18446744073706975667], clean_omap: 1, new_object: 0}

20'99 (19'98)

so osd.3 didn't have 20'99, what it had was 19'98.

DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - prep_push: 3:eb464a4a:::smithi14420596-514:head, 20'99

in ReplicatedRecoveryBackend::build_push_op(), we have

DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - build_push_op 3:eb464a4a:::smithi14420596-514:head @19'98

so we actually pushed a stale version to peer while expect a response of newer version.

Actions #4

Updated by Kefu Chai over 3 years ago

DEBUG 2020-08-25 13:21:53,834 [shard 0] ms - [osd.3(client) v2:172.21.15.168:6804/18752 >> client.4229 172.21.15.144:0/641626027@36694] <== #555 === osd_op(client.4229.0:2301 3.7 3.525262d7 (undecoded) on
disk+write+known_if_redirected e16) v8 (42)
DEBUG 2020-08-25 13:21:53,834 [shard 0] osd - client_request(id=556, detail=osd_op(client.4229.0:2301 3.7 3.525262d7 (undecoded) ondisk+write+known_if_redirected e16) v8): start
DEBUG 2020-08-25 13:21:53,834 [shard 0] osd - get_or_load_head_obc: cache miss on 3:eb464a4a:::smithi14420596-514:head
DEBUG 2020-08-25 13:21:53,834 [shard 0] filestore - get_attrs
//// start serving client request, reading OI

DEBUG 2020-08-25 13:21:53,835 [shard 0] filestore - do_transaction
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - load_metadata: object 3:eb464a4a:::smithi14420596-514:head doesn't exist, returning empty metadata

DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - get_or_load_head_obc: loaded obs 3:eb464a4a:::smithi14420596-514:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) for 3:eb464a4a:::smithi14420596-514:head
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - get_or_load_head_obc: returning obc 3:eb464a4a:::smithi14420596-514:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) for 3:eb464a4a:::smithi14420596-514:head

//// seems the object didn't exist by then

DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head - handling op write
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - handling op write on object 3:eb464a4a:::smithi14420596-514:head
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head - handling op stat
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - handling op stat on object 3:eb464a4a:::smithi14420596-514:head
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - stat os.oi.size=1420379, os.oi.mtime=0.000000
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat out=16b} snapc 0={} ondisk+write+known_if
_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head all operations successful
DEBUG 2020-08-25 13:21:53,838 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2301 3.7 3:eb464a4a:::smithi14420596-514:head {write 621991~798388 in=798388b, stat out=16b} snapc 0={} ondisk+write+known_if
_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head submitting txn

///// submitting transaction to object store

DEBUG 2020-08-25 13:21:53,838 [shard 0] osd -  pg_epoch 19 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [3,1,0] r=0 lpr=14 luod=19'97 lua=0'0 crt=19'98
lcod 19'97 mlcod 19'97 active+clean append_log transaction_applied = 1

///// local transaction applied

DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] connect to existing
DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] --> #645 === osd_repop(client.4229.0:2301 3.7 e19/14 3:eb464a4a:::smith
i14420596-514:head v 19'98, mlcod=19'98) v3 (112)
DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752 >> osd.1 v2:172.21.15.144:6803/18905@59528] connect to existing
DEBUG 2020-08-25 13:21:53,839 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752 >> osd.1 v2:172.21.15.144:6803/18905@59528] --> #779 === osd_repop(client.4229.0:2301 3.7 e19/14 3:eb464a4a:::smith
i14420596-514:head v 19'98, mlcod=19'98) v3 (112)

///// sending repop to replica osds: osd.0 and osd.1

DEBUG 2020-08-25 13:21:53,850 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #640 === osd_repop_reply(client.4229.0:2301 3.7 e19/14) v2 (113)

///// osd.0 replied
DEBUG 2020-08-25 13:21:53,857 [shard 0] osd - pg_advance_map(id=111, detail=PGAdvanceMap(pg=3.7 from=19 to=20)): start
...
INFO  2020-08-25 13:21:53,864 [shard 0] osd - Entering state: Started/Stray
...
DEBUG 2020-08-25 13:21:53,864 [shard 0] osd -  pg_epoch 20 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.038597247s) [0] r=-1 lpr=20 pi=[14,20)/1
 crt=19'98 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ activate_map
...
DEBUG 2020-08-25 13:21:53,864 [shard 0] osd -  pg_epoch 20 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.032501349s) [0] r=-1 lpr=20 pi=[14,20)/1
 crt=19'98 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ enter Started/Stray
...
DEBUG 2020-08-25 13:21:53,864 [shard 0] osd -  pg_epoch 20 pg[3.7( v 19'98 (0'0,19'98] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.032501349s) [0] r=-1 lpr=20 pi=[14,20)/1
 crt=19'98 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ activate_map: Not dirtying info: last_persisted is 19 while current is 20

/////// handling a new osdmap
/////// pg 3.7 epoch advanced to 20, and osd.3 changed from the primary to a stray osd of that PG.

DEBUG 2020-08-25 13:21:53,865 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752 >> osd.1 v2:172.21.15.144:6803/18905@59528] <== #785 === osd_repop_reply(client.4229.0:2301 3.7 e19/14) v2 (113)

////// osd.1 replied

DEBUG 2020-08-25 13:21:53,867 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2302 3.7 3:eb464a4a:::smithi14420596-514:head {write 1936439~639509 in=639509b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head - handling op write
DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - handling op write on object 3:eb464a4a:::smithi14420596-514:head
DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2302 3.7 3:eb464a4a:::smithi14420596-514:head {write 1936439~639509 in=639509b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head all operations successful
DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - do_osd_ops: osd_op(client.4229.0:2302 3.7 3:eb464a4a:::smithi14420596-514:head {write 1936439~639509 in=639509b} snapc 0={} ondisk+write+known_if_redirected e16) v8 - object 3:eb464a4a:::smithi14420596-514:head submitting txn
/////// another write op in the request

DEBUG 2020-08-25 13:21:53,868 [shard 0] osd -  pg_epoch 20 pg[3.7( v 20'99 (0'0,20'99] local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=20 pruub=13.032501349s) [0] r=-1 lpr=20 pi=[14,20)/1 crt=20'99 lcod 19'97 mlcod 0'0 unknown NOTIFY pruub 45.350179339s@ append_log transaction_applied = 1
...
DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - write_log_and_missing with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 20'99, trimmed: , trimmed_dups: , clear_divergent_priors: 0
///// transaction submitted.

TRACE 2020-08-25 13:21:53,868 [shard 0] osd - process_op: dropping misdirected op
DEBUG 2020-08-25 13:21:53,868 [shard 0] osd - operation abort, up primary changed

//// started to drop ops

DEBUG 2020-08-25 13:21:54,886 [shard 0] osd - merge_log point (usually last shared) is 19'98 (0'0) modify   3:eb464a4a:::smithi14420596-514:head by client.4229.0:2301 2020-08-25T13:21:53.832878+0000 0 ObjectCleanRegions clean_offsets: [0~621991,1420379~18446744073708131236], clean_omap: 1, new_object: 0

DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - handle_peering_op on 3.7 from 0
///// osd.0 was the new primary.

INFO  2020-08-25 13:21:55,872 [shard 0] osd - Entering state: Started/Primary/Active/Recovering
...
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - do_peering_event handling epoch_sent: 22 epoch_requested: 22 RemoteRecoveryReserved for pg: 3.7
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd -  pg_epoch 22 pg[3.7( v 20'99 (0'0,20'99] local-lis/les=21/22 n=0 ec=14/14 lis/c=21/14 les/c/f=22/15/0 sis=21 pruub=11.026434876s) [3,0] r=0 lpr=21 pi=[14,21)/
1 crt=20'99 lcod 19'97 mlcod 0'0 active+recovery_wait+undersized+degraded+wait pruub 45.350179339s@1 exit Started/Primary/Active/WaitRemoteRecoveryReserved 0.001189 2 0.000088
////  start the recovery

DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - background_recovery(id=0, detail=BackgroundRecovery(3.7)): start
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: peer osd.0 missing 1 objects
TRACE 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: peer osd.0 missing {3:eb464a4a:::smithi14420596-514:head=20'99(19'98) flags = none clean_offsets: [0~1936439,2575948~18446744073706975667], clean_omap: 1, new_object: 0}

/// i should push the missing object to osd.0

DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - start_replica_recovery_ops: recover_object_replicas(3:eb464a4a:::smithi14420596-514:head)
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - recover_object: 3:eb464a4a:::smithi14420596-514:head, 20'99
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - get_or_load_head_obc: found 3:eb464a4a:::smithi14420596-514:head in cache
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - recover_object: loaded obc: 3:eb464a4a:::smithi14420596-514:head
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - prep_push: 3:eb464a4a:::smithi14420596-514:head, 20'99
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - calc_head_subsets 3:eb464a4a:::smithi14420596-514:head data_subset [1936439~639509]
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - prep_push: 3:eb464a4a:::smithi14420596-514:head to 0
DEBUG 2020-08-25 13:21:55,872 [shard 0] osd - build_push_op 3:eb464a4a:::smithi14420596-514:head @19'98
...
DEBUG 2020-08-25 13:21:55,872 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] --> #687 === MOSDPGPush(3.7 22/21 {PushOp(3:eb464a4a:::smithi14420596-5
14:head, version: 19'98, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 1, recovery_info: ObjectRecoveryInfo(3:eb464a4a:::smithi14420596-514:head@19'98, size: 25
75948, copy_subset: [1936439~639509], clone_subset: {}, snapset: 0={}:{}, object_exist: 1), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:2575948, data_complete:true, omap_recovered_to:
, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true, error:false))}) v4 (105)
///// and i pushed 19'98, even the latest version was 20'99

DEBUG 2020-08-25 13:21:55,887 [shard 0] ms - [osd.3(cluster) v2:172.21.15.168:6805/18752@62952 >> osd.0 v2:172.21.15.144:6801/18864] <== #683 === MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3 (107)
/// and osd.0 replied. hmmm, 

DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - background_recovery_sub(id=0, detail=MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3): start
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - handle_push_reply: MOSDPGPushReply(3.7 22/21 {PushReplyOp(3:eb464a4a:::smithi14420596-514:head)}) v3
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - _handle_push_reply, soid 3:eb464a4a:::smithi14420596-514:head, from 0
DEBUG 2020-08-25 13:21:55,887 [shard 0] osd - on_peer_recover: 3:eb464a4a:::smithi14420596-514:head, 19'98 on 0
//// the version is older than 20'99!!!
  1. process a request targeting PG 3.7
  2. handle the first write op in it, and submitted the transaction for the op
  3. advance osdmap, and osd.3 transited from primary to a stray OSD of PG 3.7. eversion was 20'99
  4. handle the second write op in it, and submitted the transaction for the op. OI 's version must have been 20'99.
  5. osd.0 is now the new primary, and it starts the recovery
  6. it misses smithi14420596-514@20'99, so let's push the missing object to it
  7. i only have smithi14420596-514@19'98, but anyway....
  8. osd.0 replied with the version of 19'98, and it's less than 20'99!!!
Actions #5

Updated by Kefu Chai over 3 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Xuehan Xu
  • Pull request ID set to 36896
Actions #6

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF