Bug #12824
osd/ReplicatedPG.cc: 10604: FAILED assert(obc) in hit_set_remove_all
Status: closed
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-890> 2015-08-27 16:43:05.198905 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] hit_set_clear
-887> 2015-08-27 16:43:05.198925 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:29:27.250560_2015-08-27 16:30:27.253061/head
-882> 2015-08-27 16:43:05.198946 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:31:02.655479_2015-08-27 16:36:21.549587/head
-878> 2015-08-27 16:43:05.198968 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:36:21.550125_2015-08-27 16:37:21.948796/head
-875> 2015-08-27 16:43:05.198986 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:37:21.949499_2015-08-27 16:38:22.215821/head
-869> 2015-08-27 16:43:05.199005 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:38:22.216324_2015-08-27 16:39:22.647193/head
-867> 2015-08-27 16:43:05.199027 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head
-865> 2015-08-27 16:43:05.199045 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head
-860> 2015-08-27 16:43:05.199060 7f2d74faf700 10 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_object_context: obc NOT found in cache: 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head
-858> 2015-08-27 16:43:05.199099 7f2d74faf700 15 filestore(/var/lib/ceph/osd/ceph-4) getattr 2.2_head/2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head '_'
-846> 2015-08-27 16:43:05.199168 7f2d74faf700 10 filestore(/var/lib/ceph/osd/ceph-4) error opening file /var/lib/ceph/osd/ceph-4/current/2.2_head/DIR_2/hit\uset\u2.2\uarchive\u2015-08-27 16:39:48.295838\u2015-08-27 16:40:48.603494__head_00000002_.ceph-internal_2 with flags=2: (2) No such file or directory
-845> 2015-08-27 16:43:05.199181 7f2d74faf700 10 filestore(/var/lib/ceph/osd/ceph-4) getattr 2.2_head/2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head '_' = -2
-841> 2015-08-27 16:43:05.199186 7f2d74faf700 10 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_object_context: no obc for soid 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head and !can_create
-314> 2015-08-27 16:43:05.205060 7f2d74faf700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_remove_all()' thread 7f2d74faf700 time 2015-08-27 16:43:05.199201
osd/ReplicatedPG.cc: 10604: FAILED assert(obc)
ceph version 9.0.3-912-gdc3c0ed (dc3c0ed75be348b12969ba703707f3c9501304a2)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f2d907b7e75]
2: (ReplicatedPG::hit_set_remove_all()+0x519) [0x7f2d90450dd9]
3: (ReplicatedPG::on_activate()+0x7ad) [0x7f2d9045169d]
4: (PG::RecoveryState::Active::react(PG::AllReplicasActivated const&)+0xa5) [0x7f2d90396a75]
5: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, PG::RecoveryState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x179) [0x7f2d903eb3a9]
6: (boost::statechart::simple_state<PG::RecoveryState::Activating, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xcd) [0x7f2d903ee76d]
7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x6b) [0x7f2d903d5a0b]
8: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ed) [0x7f2d9038187d]
9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x277) [0x7f2d9027ed77]
10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x28) [0x7f2d902c6928]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0x7f2d907a9436]
12: (ThreadPool::WorkThread::entry()+0x10) [0x7f2d907aa300]
13: (()+0x7df5) [0x7f2d8e8c6df5]
14: (clone()+0x6d) [0x7f2d8d16f1ad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/a/sage-2015-08-26_09:07:57-rados-wip-sage-testing---basic-multi/1033678
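For readers following the trace, here is a minimal toy sketch of the lookup pattern the log lines suggest: a cache miss, a failed on-disk attribute read (the `= -2` getattr result), and a null object context because creation was not allowed. Everything here (`ToyStore`, `ObjectContext`, `newest_archive_present`) is a hypothetical stand-in and not Ceph code; it only illustrates why an unconditional `assert(obc)` fires when the archive object is missing.

```cpp
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Ceph types.
struct ObjectContext {
  std::string oid;
};
using ObjectContextRef = std::shared_ptr<ObjectContext>;

struct ToyStore {
  std::unordered_map<std::string, ObjectContextRef> cache;  // in-memory contexts
  std::set<std::string> on_disk;                            // objects present on disk

  // Assumed shape of the lookup seen in the log: check the cache first,
  // then the disk; if the object is absent and can_create is false,
  // return a null reference instead of creating a context.
  ObjectContextRef get_object_context(const std::string& oid, bool can_create) {
    auto it = cache.find(oid);
    if (it != cache.end()) {
      return it->second;
    }
    if (on_disk.count(oid) == 0 && !can_create) {
      return nullptr;  // corresponds to "no obc for soid ... and !can_create"
    }
    auto obc = std::make_shared<ObjectContext>(ObjectContext{oid});
    cache[oid] = obc;
    return obc;
  }
};

// Returns true when the newest archive in the history can be looked up
// without creating it. A caller that asserts on the lookup result would
// abort in exactly the case where this returns false.
bool newest_archive_present(ToyStore& store,
                            const std::vector<std::string>& history) {
  if (history.empty()) {
    return true;
  }
  return store.get_object_context(history.back(), /*can_create=*/false) != nullptr;
}
```

In this toy model, a history whose last entry names an object that is neither cached nor on disk makes `newest_archive_present` return false, which is the condition the assertion in the backtrace trips on. The sketch says nothing about why the archive object was missing on osd.4.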
Updated by Kefu Chai over 8 years ago
To reproduce this issue:
teuthology-suite --suite rados --filter="rados/thrash/{hobj-sort.yaml 0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml clusters/fixed-2.yaml fs/xfs.yaml msgr/simple.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/cache-agent-big.yaml}" --suite-branch master --distro ubuntu --email tchaikov@gmail.com --ceph master --machine-type plana,mira,burnupi
Note: msgr/* was added to the suite after this issue was reported, so I used msgr/simple.yaml in the filter above.
Updated by Kefu Chai over 8 years ago
Re-tested at http://pulpito.ceph.com/kchai-2015-09-09_04:26:33-rados-master---basic-multi/ and was not able to reproduce it.
Updated by Kefu Chai over 8 years ago
- Status changed from New to Can't reproduce