Bug #5473

osd/ReplicatedPG.cc: 1379: FAILED assert(0) in trim_object() on master, cuttlefish

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -1> 2013-06-26 22:57:40.544295 7f2ae7bfb700 -1 osd.0 pg_epoch: 1060 pg[3.38( v 1052'444 (8'25,1052'444] local-les=1060 n=11 ec=7 les/c 1060/1060 1059/1059/1059) [0,2] r=0 lpr=1059 lcod 0'0 mlcod 0'0 active+clean snaptrimq=[25f~1,2e3~1]] trim_objectcould not find coid 290f4338/plana5819529-23/265//3
     0> 2013-06-26 22:57:40.546881 7f2ae7bfb700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f2ae7bfb700 time 2013-06-26 22:57:40.544339
osd/ReplicatedPG.cc: 1379: FAILED assert(0)

 ceph version 0.61.4-31-gb2fb487 (b2fb48762f32279e73feb83b220339fea31275e9)
 1: (ReplicatedPG::trim_object(hobject_t const&)+0x157) [0x5aa7e7]
 2: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x423) [0x5cdeb3]
 3: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbc) [0x5ffc5c]
 4: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x5ea6db]
 5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x5ea86e]
 6: (ReplicatedPG::snap_trimmer()+0x526) [0x58f046]
 7: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x652e84]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8397e6]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x83b610]
 10: (()+0x7e9a) [0x7f2af9d95e9a]
 11: (clone()+0x6d) [0x7f2af7f28ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
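
For anyone reading the pg state line above: the `snaptrimq=[25f~1,2e3~1]` field appears to be an interval set printed as `start~length` pairs, with snap ids in hex. A minimal sketch for decoding it follows, assuming that reading of the format is correct. The helper names `parse_snaptrimq` and `expand_snaps` are made up for illustration and are not part of Ceph.

```python
def parse_snaptrimq(text):
    """Parse a string like '[25f~1,2e3~1]' into (start, length) integer pairs.

    Assumption: each entry is 'start~length' and both numbers are hexadecimal.
    """
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a bracketed interval list: %r" % text)
    intervals = []
    # Skip empty pieces so that an empty queue '[]' yields an empty list.
    for part in filter(None, body[1:-1].split(",")):
        start_hex, length_hex = part.split("~")
        intervals.append((int(start_hex, 16), int(length_hex, 16)))
    return intervals


def expand_snaps(intervals):
    """Expand (start, length) pairs into the individual snap ids they cover."""
    return [start + i for start, length in intervals for i in range(length)]


queue = parse_snaptrimq("[25f~1,2e3~1]")
snaps = expand_snaps(queue)
print(queue)
print([hex(s) for s in snaps])
```

Under that assumption, the queue in the log above holds two single-snap intervals, 0x25f and 0x2e3.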

job was

ubuntu@teuthology:/a/teuthology-2013-06-26_20:00:05-rados-cuttlefish-testing-basic/47353$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 404026df622ab80f5393a2a70590bf0b56c726dc
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: b2fb48762f32279e73feb83b220339fea31275e9
  install:
    ceph:
      sha1: b2fb48762f32279e73feb83b220339fea31275e9
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: b2fb48762f32279e73feb83b220339fea31275e9
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000

Copied the log and the contents of the OSDs to the teuthology dir.


Related issues

Related to Ceph - Bug #5269: osd: EEXIST on mkcoll (Resolved, 06/06/2013)

History

#1 Updated by Sage Weil over 10 years ago

  • Subject changed from osd/ReplicatedPG.cc: 1379: FAILED assert(0) in trim_object() on cuttlefish to osd/ReplicatedPG.cc: 1379: FAILED assert(0) in trim_object() on master, cuttlefish

0> 2013-07-14 04:56:57.644145 7f12094ad700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f12094ad700 time 2013-07-14 04:56:57.641479
osd/ReplicatedPG.cc: 1506: FAILED assert(0)

 ceph version 0.66-587-gdf45b16 (df45b167cfe262c46367e812c79e65698804ef5d)
 1: (ReplicatedPG::trim_object(hobject_t const&)+0x150) [0x5fd310]
 2: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x423) [0x62c2f3]
 3: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbc) [0x65ad0c]
 4: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x6414eb]
 5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x64167e]
 6: (ReplicatedPG::snap_trimmer()+0x516) [0x5df746]
 7: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x6ac3a4]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b8576]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x8ba3a0]
 10: (()+0x7e9a) [0x7f121ce4ae9a]
 11: (clone()+0x6d) [0x7f121afddccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

job was

ubuntu@teuthology:/a/teuthology-2013-07-14_01:00:14-rados-next-testing-basic/66707$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: df45b167cfe262c46367e812c79e65698804ef5d
  install:
    ceph:
      sha1: df45b167cfe262c46367e812c79e65698804ef5d
  s3tests:
    branch: next
  workunit:
    sha1: df45b167cfe262c46367e812c79e65698804ef5d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000

#2 Updated by Samuel Just over 10 years ago

  • Status changed from 12 to 7

This could be explained by a failure to resurrect a parent PG: recovery on an hobject would then perform writes on an object left behind in the deleting parent PG's collection.

#3 Updated by Samuel Just over 10 years ago

  • Status changed from 7 to Resolved
