Bug #35955

ceph-objectstore-tool past_intervals broken

Added by Sage Weil over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-09-12 13:57:55.604 7fba7575c700 -1 osd.1 pg_epoch: 638 pg[2.4( v 18'8 (0'0,18'8] local-lis/les=482/483 n=5 ec=15/15 lis/c 482/482 les/c/f 483/483/0 628/628/628) [2,0] r=-1 lpr=637 pi=[584,628)/1 crt=18'8 lcod 0'0 unknown mbc={}] 2.4 past_intervals [584,628) start interval does not contain the required bound [483,628) start
2018-09-12 13:57:55.607 7fba7575c700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-3159-g47aa991/rpm/el7/BUILD/ceph-14.0.0-3159-g47aa991/src/osd/PG.cc: In function 'void PG::check_past_interval_bounds() const' thread 7fba7575c700 time 2018-09-12 13:57:55.606349
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-3159-g47aa991/rpm/el7/BUILD/ceph-14.0.0-3159-g47aa991/src/osd/PG.cc: 932: abort()

 ceph version 14.0.0-3159-g47aa991 (47aa99112f9268c11a435dca151002cf33e5e98f) nautilus (dev)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x82) [0x5630160a042e]
 2: (PG::check_past_interval_bounds() const+0xa57) [0x563016253917]
 3: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x1b2) [0x56301627fa02]
 4: (boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x75) [0x5630162c2835]
 5: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x23d) [0x56301626bffd]
 6: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*)+0x2d1) [0x5630161df011]
 7: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x9b) [0x5630161e05db]
 8: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x56301642abb0]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x4fc) [0x5630161d349c]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3d6) [0x563016759996]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56301675a590]
 12: (()+0x7e25) [0x7fbaa0148e25]

The pg was imported; it was originally exported from osd.5, where it was last seen in this state:
2018-09-12 13:56:18.136960 7fb75d8b7700 10 osd.5 pg_epoch: 582 pg[2.4( v 18'8 (0'0,18'8] local-lis/les=482/483 n=5 ec=15/15 lis/c 482/482 les/c/f 483/483/0 482/482/407) [5,0] r=0 lpr=482 crt=18'8 mlcod 18'8 active+clean] handle_peering_event: epoch_sent: 582 epoch_requested: 582 NullEvt

/a/sage-2018-09-11_22:11:25-rados-wip-sage-testing-2018-09-11-1316-distro-basic-smithi/3006960

The problem appears to be that the past intervals added by ceph-objectstore-tool on import don't match the bounds the OSD expects, which are derived from last_epoch_clean (483).
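
To illustrate the failing invariant, here is a minimal sketch (in Python, with invented names; the real check lives in C++ in PG::check_past_interval_bounds) of the consistency rule the abort enforces: the recorded past_intervals range must start at or before the required bound derived from last_epoch_clean.

```python
# Hypothetical model of the bound check; simplified from the actual Ceph code.
# Intervals are half-open epoch ranges [start, end).
def check_past_interval_bounds(pi_start, pi_end, required_start, required_end):
    """Abort if past_intervals [pi_start, pi_end) does not contain the
    required bound [required_start, required_end): it must begin at or
    before required_start."""
    if pi_start > required_start:
        raise AssertionError(
            f"past_intervals [{pi_start},{pi_end}) start interval does not "
            f"contain the required bound [{required_start},{required_end}) start")

# The imported PG carried past_intervals [584,628), but last_epoch_clean=483
# requires the bound [483,628); since 584 > 483 the check fails, matching
# the log above.
try:
    check_past_interval_bounds(584, 628, 483, 628)
except AssertionError as e:
    print(e)
```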


Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects (Closed, 10/12/2018)

Actions #1

Updated by Sage Weil over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Sage Weil over 5 years ago

  • Status changed from 12 to Resolved
Actions #3

Updated by Sage Weil over 5 years ago

This is fixed for nautilus since the behavior totally changed with https://github.com/ceph/ceph/pull/23985. The problem may still exist in mimic, luminous, etc., but until we reproduce it there I'm not sure if it's the same bug or a regression that happened post-mimic.

Actions #4

Updated by David Zafman over 5 years ago

  • Related to Bug #36412: ceph-objectstore-tool import after pg splits which will lost objects added