Bug #24597
FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename()
Status: Closed
Description
2018-06-20T18:58:36.950 INFO:tasks.ceph.osd.6.smithi143.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-663-g111c515/rpm/el7/BUILD/ceph-14.0.0-663-g111c515/src/os/filestore/FileStore.cc: In function 'int FileStore::_collection_move_rename(const coll_t&, const ghobject_t&, coll_t, const ghobject_t&, const SequencerPosition&, bool)' thread 7f6034592700 time 2018-06-20 18:58:36.961023
2018-06-20T18:58:36.950 INFO:tasks.ceph.osd.6.smithi143.stderr:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.0-663-g111c515/rpm/el7/BUILD/ceph-14.0.0-663-g111c515/src/os/filestore/FileStore.cc: 5524: FAILED assert(0 == "ERROR: source must exist")
2018-06-20T18:58:36.952 INFO:tasks.ceph.osd.6.smithi143.stderr: ceph version 14.0.0-663-g111c515 (111c515ab0294ffe409fcd8555bb98d3e7290a61) nautilus (dev)
2018-06-20T18:58:36.952 INFO:tasks.ceph.osd.6.smithi143.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f604ddb4cdf]
2018-06-20T18:58:36.952 INFO:tasks.ceph.osd.6.smithi143.stderr: 2: (()+0x28aec7) [0x7f604ddb4ec7]
2018-06-20T18:58:36.952 INFO:tasks.ceph.osd.6.smithi143.stderr: 3: (FileStore::_collection_move_rename(coll_t const&, ghobject_t const&, coll_t, ghobject_t const&, SequencerPosition const&, bool)+0xa7c) [0x55e2f5a1b26c]
2018-06-20T18:58:36.952 INFO:tasks.ceph.osd.6.smithi143.stderr: 4: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*, char const*)+0xe7b) [0x55e2f5a1d40b]
2018-06-20T18:58:36.953 INFO:tasks.ceph.osd.6.smithi143.stderr: 5: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*, char const*)+0x48) [0x55e2f5a23368]
2018-06-20T18:58:36.953 INFO:tasks.ceph.osd.6.smithi143.stderr: 6: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x13f) [0x55e2f5a234df]
2018-06-20T18:58:36.953 INFO:tasks.ceph.osd.6.smithi143.stderr: 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0x7c7) [0x7f604ddba047]
2018-06-20T18:58:36.953 INFO:tasks.ceph.osd.6.smithi143.stderr: 8: (ThreadPool::WorkThread::entry()+0x10) [0x7f604ddbb6a0]
2018-06-20T18:58:36.953 INFO:tasks.ceph.osd.6.smithi143.stderr: 9: (()+0x7e25) [0x7f604a908e25]
2018-06-20T18:58:36.953 INFO:tasks.ceph.osd.6.smithi143.stderr: 10: (clone()+0x6d) [0x7f60499f8bad]
2018-06-20T18:58:36.954 INFO:tasks.ceph.osd.6.smithi143.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
http://pulpito.ceph.com/nojha-2018-06-20_18:20:55-rados:thrash-master-distro-basic-smithi/2684831/
Updated by Josh Durgin almost 6 years ago
- Priority changed from Normal to Urgent
Updated by Josh Durgin almost 6 years ago
- Category set to Correctness/Safety
- Component(RADOS) FileStore added
Updated by Sage Weil almost 6 years ago
I believe this is caused by b50186bfe6c8981700e33c8a62850e21779d67d5, which does
    if (roll_forward_to) {
      pg_log.roll_forward(&rollbacker);
    }
i.e., it rolls forward to log.head instead of to *roll_forward_to.
In 12.2.5 this commit is a backported fix for http://tracker.ceph.com/issues/22050, which is a much less severe issue :)
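To make the failure mode concrete, here is a minimal self-contained toy model (not Ceph source; ToyPGLog and the integer "versions" are invented for illustration). Once the log has been rolled forward past a version, the rollback information for entries up to that point is discarded, so rolling forward to log.head instead of *roll_forward_to destroys the ability to roll back newer divergent entries later:

    // Toy model of the roll-forward semantics at issue (hypothetical names,
    // not Ceph code). Entries at versions <= rollforward_done can no longer
    // be rolled back during a later peering round.
    #include <iostream>
    #include <optional>

    struct ToyPGLog {
      int head = 10;             // version of the newest pg log entry
      int rollforward_done = 0;  // entries <= this are irreversible

      void roll_forward() { rollforward_done = head; }       // what the buggy call does
      void roll_forward_to(int v) { rollforward_done = v; }  // what was intended

      bool can_rollback_to(int v) const { return v >= rollforward_done; }
    };

    int main() {
      std::optional<int> roll_forward_to = 7;  // peering asked to roll forward to 7

      ToyPGLog buggy, fixed;
      if (roll_forward_to) buggy.roll_forward();                     // advances to 10
      if (roll_forward_to) fixed.roll_forward_to(*roll_forward_to);  // stops at 7

      // Later, peering decides divergent entries past version 8 must be rolled back:
      std::cout << "buggy can rollback to 8? " << buggy.can_rollback_to(8) << "\n";  // prints 0
      std::cout << "fixed can rollback to 8? " << fixed.can_rollback_to(8) << "\n";  // prints 1
    }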
Updated by Sage Weil almost 6 years ago
- Status changed from New to 12
- Priority changed from Urgent to Immediate
Updated by Josh Durgin almost 6 years ago
Aha, in that case wip-24192 should fix it. Running it through testing again...
Updated by Sage Weil almost 6 years ago
- Status changed from 12 to In Progress
- Backport set to mimic,luminous
Updated by Sage Weil almost 6 years ago
- Related to Bug #23145: OSD crashes during recovery of EC pg added
Updated by Josh Durgin almost 6 years ago
- Has duplicate Bug #24192: cluster [ERR] Corruption detected: object 2:f59d1934:::smithi14913526-5822:head is missing hash_info added
Updated by Sage Weil almost 6 years ago
Factors leading to this:
- ec pool (e.g., rgw workload)
- rados ops that result in pg log 'error' entries (e.g., deleting a non-existent object, due to rgw gc; see the example after this list)
- peering (due to osd restarts etc)
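For illustration, one hand-driven way to produce such an 'error' entry (the pool name here is hypothetical):

    rados -p ecpool rm no-such-object   # fails with ENOENT; the failed op is recorded as an 'error' entry in the pg log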
A workaround that should work:
- quiesce IO to the EC pool (ceph osd pause/unpause, or pause radosgw processes) prior to restarting/upgrading osds
That ensures the last_update for all shards of each PG matches, so no rollback will be needed (if the pg has incorrectly rolled forward too far, rollback won't be possible).
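As a sketch, that sequence with the standard ceph CLI (the radosgw stop/start commands depend on how radosgw is deployed and are illustrative only):

    ceph osd pause                        # sets the pauserd,pausewr flags; client reads and writes block
    systemctl stop ceph-radosgw.target    # or however radosgw runs in your environment
    # ... restart/upgrade the OSDs ...
    systemctl start ceph-radosgw.target
    ceph osd unpause                      # clears the flags; IO resumes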
Updated by Sage Weil almost 6 years ago
mimic backport: https://github.com/ceph/ceph/pull/22997
Updated by Sage Weil almost 6 years ago
- Status changed from In Progress to Pending Backport
Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #24890: luminous: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename() added
Updated by Nathan Cutler almost 6 years ago
- Copied to Backport #24891: mimic: FAILED assert(0 == "ERROR: source must exist") in FileStore::_collection_move_rename() added
Updated by Dan van der Ster almost 6 years ago
Could cephfs trigger this issue? There have been two reports of cephfs_metadata pool crc errors on the users mailing list this week.
Updated by Nathan Cutler almost 6 years ago
- Status changed from Pending Backport to Resolved