Bug #5473
osd/ReplicatedPG.cc: 1379: FAILED assert(0) in trim_object() on master, cuttlefish
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-1> 2013-06-26 22:57:40.544295 7f2ae7bfb700 -1 osd.0 pg_epoch: 1060 pg[3.38( v 1052'444 (8'25,1052'444] local-les=1060 n=11 ec=7 les/c 1060/1060 1059/1059/1059) [0,2] r=0 lpr=1059 lcod 0'0 mlcod 0'0 active+clean snaptrimq=[25f~1,2e3~1]] trim_objectcould not find coid 290f4338/plana5819529-23/265//3
0> 2013-06-26 22:57:40.546881 7f2ae7bfb700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f2ae7bfb700 time 2013-06-26 22:57:40.544339
osd/ReplicatedPG.cc: 1379: FAILED assert(0)
ceph version 0.61.4-31-gb2fb487 (b2fb48762f32279e73feb83b220339fea31275e9)
1: (ReplicatedPG::trim_object(hobject_t const&)+0x157) [0x5aa7e7]
2: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x423) [0x5cdeb3]
3: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbc) [0x5ffc5c]
4: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x5ea6db]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x5ea86e]
6: (ReplicatedPG::snap_trimmer()+0x526) [0x58f046]
7: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x652e84]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8397e6]
9: (ThreadPool::WorkThread::entry()+0x10) [0x83b610]
10: (()+0x7e9a) [0x7f2af9d95e9a]
11: (clone()+0x6d) [0x7f2af7f28ccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
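The PG status in the first log line lists the pending snap trim queue as snaptrimq=[25f~1,2e3~1]. As a reading aid, here is a minimal sketch for decoding that field. It assumes the usual Ceph interval-set rendering of start~length with both values in hexadecimal; the helper name parse_snaptrimq is hypothetical and not part of Ceph.

```python
import re


def parse_snaptrimq(line):
    """Return the snap ids (as ints) queued for trimming in a PG status line.

    Assumption: each entry is 'start~length', both in hexadecimal.
    """
    match = re.search(r"snaptrimq=\[([^\]]*)\]", line)
    if not match or not match.group(1):
        return []
    snaps = []
    for entry in match.group(1).split(","):
        start, length = (int(part, 16) for part in entry.split("~"))
        # Expand the interval into the individual snap ids it covers.
        snaps.extend(range(start, start + length))
    return snaps


status = "active+clean snaptrimq=[25f~1,2e3~1]]"
print([hex(s) for s in parse_snaptrimq(status)])
```

Under that assumption, the queue in this crash holds two single-snap intervals, and the trimmer was working through them when it failed to find the clone object named in the error.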
job was
ubuntu@teuthology:/a/teuthology-2013-06-26_20:00:05-rados-cuttlefish-testing-basic/47353$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 404026df622ab80f5393a2a70590bf0b56c726dc
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: cuttlefish
  ceph:
    conf:
      global:
        ms inject delay max: 1
        ms inject delay probability: 0.005
        ms inject delay type: osd
        ms inject socket failures: 2500
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: b2fb48762f32279e73feb83b220339fea31275e9
  install:
    ceph:
      sha1: b2fb48762f32279e73feb83b220339fea31275e9
  s3tests:
    branch: cuttlefish
  workunit:
    sha1: b2fb48762f32279e73feb83b220339fea31275e9
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
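To get a feel for how much snapshot churn this job generates, the sketch below estimates the expected number of each operation. It assumes the op_weights values are relative selection frequencies across the 4000 ops; the helper expected_mix is hypothetical and only illustrates the arithmetic.

```python
# Values copied from the rados task in the job config above.
op_weights = {
    "delete": 50,
    "read": 100,
    "rollback": 50,
    "snap_create": 50,
    "snap_remove": 50,
    "write": 100,
}
total_ops = 4000


def expected_mix(weights, ops):
    """Expected count per op, assuming weights are relative frequencies."""
    total = sum(weights.values())
    return {name: ops * weight / total for name, weight in weights.items()}


mix = expected_mix(op_weights, total_ops)
for name, count in sorted(mix.items()):
    print(f"{name:12s} {count:7.1f}")
```

Under that assumption roughly one op in eight is a snap_remove, so snapshot trimming runs frequently while thrashosds is also changing pg_num.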
copied log and contents of osds to teuthology dir
History
#1 Updated by Sage Weil over 10 years ago
- Subject changed from osd/ReplicatedPG.cc: 1379: FAILED assert(0) in trim_object() on cuttlefish to osd/ReplicatedPG.cc: 1379: FAILED assert(0) in trim_object() on master, cuttlefish
0> 2013-07-14 04:56:57.644145 7f12094ad700 -1 osd/ReplicatedPG.cc: In function 'ReplicatedPG::RepGather* ReplicatedPG::trim_object(const hobject_t&)' thread 7f12094ad700 time 2013-07-14 04:56:57.641479
osd/ReplicatedPG.cc: 1506: FAILED assert(0)
ceph version 0.66-587-gdf45b16 (df45b167cfe262c46367e812c79e65698804ef5d)
1: (ReplicatedPG::trim_object(hobject_t const&)+0x150) [0x5fd310]
2: (ReplicatedPG::TrimmingObjects::react(ReplicatedPG::SnapTrim const&)+0x423) [0x62c2f3]
3: (boost::statechart::simple_state<ReplicatedPG::TrimmingObjects, ReplicatedPG::SnapTrimmer, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbc) [0x65ad0c]
4: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xfb) [0x6414eb]
5: (boost::statechart::state_machine<ReplicatedPG::SnapTrimmer, ReplicatedPG::NotTrimming, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x1e) [0x64167e]
6: (ReplicatedPG::snap_trimmer()+0x516) [0x5df746]
7: (OSD::SnapTrimWQ::_process(PG*)+0x14) [0x6ac3a4]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b8576]
9: (ThreadPool::WorkThread::entry()+0x10) [0x8ba3a0]
10: (()+0x7e9a) [0x7f121ce4ae9a]
11: (clone()+0x6d) [0x7f121afddccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
job was
ubuntu@teuthology:/a/teuthology-2013-07-14_01:00:14-rados-next-testing-basic/66707$ cat orig.config.yaml
kernel:
  kdb: true
  sha1: 365b57b1317524bb0cdd15859a224ba1ab58d1d7
machine_type: plana
nuke-on-error: true
overrides:
  admin_socket:
    branch: next
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    fs: xfs
    log-whitelist:
    - slow request
    sha1: df45b167cfe262c46367e812c79e65698804ef5d
  install:
    ceph:
      sha1: df45b167cfe262c46367e812c79e65698804ef5d
  s3tests:
    branch: next
  workunit:
    sha1: df45b167cfe262c46367e812c79e65698804ef5d
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock.check: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000
#2 Updated by Samuel Just over 10 years ago
- Status changed from 12 to 7
This could be explained by a failure to resurrect a parent PG: recovery on an hobject would then perform writes on an object left behind in the deleting parent PG's collection.
#3 Updated by Samuel Just over 10 years ago
- Status changed from 7 to Resolved