Bug #20551
LOST_REVERT assert during rados bench+thrash in ReplicatedBackend::prepare_pull()
Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
Backfill/Recovery
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
From osd.0 in:
=[15,103)/2 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] handle_message: 0x7fe09dab59c0
-9> 2017-07-06 22:26:22.198839 7fe060845700 20 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] do_op: op osd_op(client.4177.0:308 2.7 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head [read 0~1048576] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e104) v8
-8> 2017-07-06 22:26:22.199584 7fe060845700 20 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] op_has_sufficient_caps pool=2 (unique_pool_0) owner=0 need_read_cap=1 need_write_cap=0 classes=[] -> yes
-7> 2017-07-06 22:26:22.200415 7fe060845700 10 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] do_op osd_op(client.4177.0:308 2.7 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head [read 0~1048576] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e104) v8 may_read -> read-ordered flags ondisk+retry+read+known_if_redirected
-6> 2017-07-06 22:26:22.201200 7fe060845700 7 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] object 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head v 23'831, recovering.
-5> 2017-07-06 22:26:22.202009 7fe060845700 10 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] start_recovery_op 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head
-4> 2017-07-06 22:26:22.202777 7fe060845700 10 osd.0 105 start_recovery_op pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 rops=1 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head (1/3 rops)
-3> 2017-07-06 22:26:22.203545 7fe060845700 10 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 rops=1 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] recover_object: 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head
-2> 2017-07-06 22:26:22.204314 7fe060845700 7 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 rops=1 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] pull 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head v 23'831 on osds 1 from osd.1
-1> 2017-07-06 22:26:22.205080 7fe060845700 10 osd.0 pg_epoch: 104 pg[2.7( v 38'1778 lc 16'4 (0'0,38'1778] local-lis/les=103/104 n=1778 ec=15/15 lis/c 103/0 les/c/f 104/0/0 102/103/103) [0,4] r=0 lpr=103 pi=[15,103)/2 rops=1 crt=38'1778 mlcod 0'0 active+recovery_wait+degraded m=48] pulling soid 2:e042da5c:::benchmark_data_smithi145_3328_object13360:head from osd 1 at version 0'0 rather than at version 23'831
0> 2017-07-06 22:26:22.208929 7fe060845700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-2604-g5511846/rpm/el7/BUILD/ceph-12.0.3-2604-g5511846/src/osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::prepare_pull(eversion_t, const hobject_t&, ObjectContextRef, ReplicatedBackend::RPGHandle*)' thread 7fe060845700 time 2017-07-06 22:26:22.205888
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.3-2604-g5511846/rpm/el7/BUILD/ceph-12.0.3-2604-g5511846/src/osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().objects.find(soid)->second->reverting_to == v))
ceph version 12.0.3-2604-g5511846 (55118463108180eb1807bbfeebf439fd61e430bb) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe082d0fc20]
 2: (ReplicatedBackend::prepare_pull(eversion_t, hobject_t const&, std::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x779) [0x7fe082a61459]
 3: (ReplicatedBackend::recover_object(hobject_t const&, eversion_t, std::shared_ptr<ObjectContext>, std::shared_ptr<ObjectContext>, PGBackend::RecoveryHandle*)+0x25e) [0x7fe082a63f9e]
 4: (PrimaryLogPG::recover_missing(hobject_t const&, eversion_t, int, PGBackend::RecoveryHandle*)+0x5be) [0x7fe082910a0e]
 5: (PrimaryLogPG::maybe_kick_recovery(hobject_t const&)+0x30d) [0x7fe082911aed]
 6: (PrimaryLogPG::wait_for_unreadable_object(hobject_t const&, boost::intrusive_ptr<OpRequest>)+0x44) [0x7fe082911cb4]
 7: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x1d7f) [0x7fe082947cbf]
 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xeba) [0x7fe08290610a]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7fe0827a7509]
 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7fe082a00167]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7fe0827d1a0e]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8e9) [0x7fe082d15769]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe082d178f0]
 14: (()+0x7dc5) [0x7fe07f908dc5]
 15: (clone()+0x6d) [0x7fe07e9fc73d]
No patches in this test run touched anything near OSD recovery logic, so I think this is a bug in master.
Updated by Kefu Chai over 6 years ago
- Is duplicate of Bug #20694: osd/ReplicatedBackend.cc: 1417: FAILED assert(get_parent()->get_log().get_log().objects.count(soid) && (get_parent()->get_log().get_log().objects.find(soid)->second->op == pg_log_entry_t::LOST_REVERT) && (get_parent()->get_log().get_log().objects.find( s added