Bug #12824

closed

osd/ReplicatedPG.cc: 10604: FAILED assert(obc) in hit_set_remove_all

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  -890> 2015-08-27 16:43:05.198905 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] hit_set_clear
  -887> 2015-08-27 16:43:05.198925 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:29:27.250560_2015-08-27 16:30:27.253061/head
  -882> 2015-08-27 16:43:05.198946 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:31:02.655479_2015-08-27 16:36:21.549587/head
  -878> 2015-08-27 16:43:05.198968 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:36:21.550125_2015-08-27 16:37:21.948796/head
  -875> 2015-08-27 16:43:05.198986 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:37:21.949499_2015-08-27 16:38:22.215821/head
  -869> 2015-08-27 16:43:05.199005 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:38:22.216324_2015-08-27 16:39:22.647193/head
  -867> 2015-08-27 16:43:05.199027 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head
  -865> 2015-08-27 16:43:05.199045 7f2d74faf700 20 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_hit_set_archive_object 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head
  -860> 2015-08-27 16:43:05.199060 7f2d74faf700 10 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_object_context: obc NOT found in cache: 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head
  -858> 2015-08-27 16:43:05.199099 7f2d74faf700 15 filestore(/var/lib/ceph/osd/ceph-4) getattr 2.2_head/2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head '_'
  -846> 2015-08-27 16:43:05.199168 7f2d74faf700 10 filestore(/var/lib/ceph/osd/ceph-4) error opening file /var/lib/ceph/osd/ceph-4/current/2.2_head/DIR_2/hit\uset\u2.2\uarchive\u2015-08-27 16:39:48.295838\u2015-08-27 16:40:48.603494__head_00000002_.ceph-internal_2 with flags=2: (2) No such file or directory
  -845> 2015-08-27 16:43:05.199181 7f2d74faf700 10 filestore(/var/lib/ceph/osd/ceph-4) getattr 2.2_head/2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head '_' = -2
  -841> 2015-08-27 16:43:05.199186 7f2d74faf700 10 osd.4 pg_epoch: 86 pg[2.2( v 84'7899 (84'7635,84'7899] local-les=84 n=1179 ec=8 les/c 84/84 85/85/85) []/[4] r=0 lpr=85 pi=8-84/13 crt=84'7894 lcod 84'7898 mlcod 0'0 undersized+degraded+remapped+peered] get_object_context: no obc for soid 2/00000002:.ceph-internal/hit_set_2.2_archive_2015-08-27 16:39:48.295838_2015-08-27 16:40:48.603494/head and !can_create
  -314> 2015-08-27 16:43:05.205060 7f2d74faf700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_remove_all()' thread 7f2d74faf700 time 2015-08-27 16:43:05.199201
osd/ReplicatedPG.cc: 10604: FAILED assert(obc)

 ceph version 9.0.3-912-gdc3c0ed (dc3c0ed75be348b12969ba703707f3c9501304a2)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f2d907b7e75]
 2: (ReplicatedPG::hit_set_remove_all()+0x519) [0x7f2d90450dd9]
 3: (ReplicatedPG::on_activate()+0x7ad) [0x7f2d9045169d]
 4: (PG::RecoveryState::Active::react(PG::AllReplicasActivated const&)+0xa5) [0x7f2d90396a75]
 5: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, PG::RecoveryState::Activating, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x179) [0x7f2d903eb3a9]
 6: (boost::statechart::simple_state<PG::RecoveryState::Activating, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xcd) [0x7f2d903ee76d]
 7: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x6b) [0x7f2d903d5a0b]
 8: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x1ed) [0x7f2d9038187d]
 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x277) [0x7f2d9027ed77]
 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x28) [0x7f2d902c6928]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa76) [0x7f2d907a9436]
 12: (ThreadPool::WorkThread::entry()+0x10) [0x7f2d907aa300]
 13: (()+0x7df5) [0x7f2d8e8c6df5]
 14: (clone()+0x6d) [0x7f2d8d16f1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/a/sage-2015-08-26_09:07:57-rados-wip-sage-testing---basic-multi/1033678
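
In the excerpt above, the archive object ending in 16:40:48.603494 is looked up twice in a row (entries -867 and -865), and it is the same object for which getattr then returns -2. Whether that repetition is the cause of the failed assert is not established in this report. A minimal triage sketch for spotting the same pattern in other OSD logs follows; the regular expressions and function names are assumptions derived only from the lines quoted here, not from any Ceph tooling.

```python
import re
from collections import Counter

# Patterns inferred from the log lines in this report (assumption, not a Ceph-defined format).
LOOKUP_RE = re.compile(r"get_hit_set_archive_object \S+/(hit_set_.+)/head")
ENOENT_RE = re.compile(r"getattr \S+/(hit_set_.+)/head '_' = -2")


def archive_lookups(lines):
    """Return archive object names in the order get_hit_set_archive_object logged them."""
    return [m.group(1) for m in map(LOOKUP_RE.search, lines) if m]


def enoent_archives(lines):
    """Return archive object names whose getattr returned -2 (ENOENT)."""
    return [m.group(1) for m in map(ENOENT_RE.search, lines) if m]


def suspect_archives(lines):
    """Names that were looked up more than once and also failed with ENOENT."""
    lines = list(lines)
    repeated = {name for name, n in Counter(archive_lookups(lines)).items() if n > 1}
    return sorted(repeated & set(enoent_archives(lines)))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python triage.py < ceph-osd.4.log
    for name in suspect_archives(sys.stdin):
        print(name)
```

The helpers work on one log line at a time, so they apply to both wrapped and unwrapped log output as long as the function name and object name stay on the same line.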
Actions #1

Updated by Kefu Chai over 8 years ago

  • Assignee set to Kefu Chai
Actions #2

Updated by Kefu Chai over 8 years ago

To reproduce this issue:

teuthology-suite --suite rados --filter="rados/thrash/{hobj-sort.yaml 0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml clusters/fixed-2.yaml fs/xfs.yaml msgr/simple.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/cache-agent-big.yaml}" --suite-branch master --distro ubuntu --email tchaikov@gmail.com --ceph master --machine-type plana,mira,burnupi

Note: msgr/* was added to the suite after this issue was reported, so I put msgr/simple.yaml here.
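
Because the filter is a single long string, it is easy to break when editing one facet, such as the msgr/ entry mentioned above. A minimal sketch of assembling it from a list is shown below. The helper names are hypothetical, the facet list and flags are copied from the command above, and --email is left out on purpose so each user can add their own.

```python
import shlex

# Facets copied verbatim from the command in this comment.
FACETS = [
    "hobj-sort.yaml",
    "0-size-min-size-overrides/3-size-2-min-size.yaml",
    "1-pg-log-overrides/short_pg_log.yaml",
    "clusters/fixed-2.yaml",
    "fs/xfs.yaml",
    "msgr/simple.yaml",
    "msgr-failures/osd-delay.yaml",
    "thrashers/default.yaml",
    "workloads/cache-agent-big.yaml",
]


def build_filter(suite_path, facets):
    """Join facets into the 'path/{a b c}' form used by --filter above."""
    return suite_path + "/{" + " ".join(facets) + "}"


def replace_facet(facets, group, new_facet):
    """Return a copy of facets with the entry under 'group/' swapped for new_facet."""
    prefix = group + "/"
    return [new_facet if f.startswith(prefix) else f for f in facets]


def build_argv(facets):
    """Build the argument list using only the flags that appear in the command above."""
    return [
        "teuthology-suite",
        "--suite", "rados",
        "--filter=" + build_filter("rados/thrash", facets),
        "--suite-branch", "master",
        "--distro", "ubuntu",
        "--ceph", "master",
        "--machine-type", "plana,mira,burnupi",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for copy and paste.
    print(shlex.join(build_argv(FACETS)))
```

Matching on the "msgr/" prefix rather than "msgr" keeps msgr-failures/osd-delay.yaml untouched when the messenger facet is swapped.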

Actions #4

Updated by Kefu Chai over 8 years ago

  • Status changed from New to Can't reproduce