Bug #8011
osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
Description
osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
ceph version 0.78-600-g19f50b9 (19f50b9d7bbbb2cce3b599f3ed8a9fa32c3d4e53)
1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x1c86) [0x7f8976]
2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x72f) [0x7fa5ff]
3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x2da) [0x7fb0ea]
4: (C_Flush::finish(int)+0xa7) [0x856e77]
5: (Context::complete(int)+0x9) [0x66ed59]
6: (Finisher::finisher_thread_entry()+0x1c0) [0x9a7050]
7: (()+0x7e9a) [0x7f14f9c60e9a]
8: (clone()+0x6d) [0x7f14f82213fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-08_14:01:14-rados:thrash-wip-7891-testing-basic-plana/178972
Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-13_09:43:35-rados:thrash-testing-testing-basic-plana/189166
Updated by Sage Weil about 10 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-13_09:43:35-rados:thrash-testing-testing-basic-plana/189348
Updated by Samuel Just about 10 years ago
- Status changed from New to In Progress
- Assignee set to Samuel Just
Updated by Samuel Just about 10 years ago
ReplicatedPG::do_op already does the right thing as far as blocking ops which may flush. What remains is to avoid flushing objects with blocked obcs.
Updated by Samuel Just about 10 years ago
and to check that agent_work also does the right thing
Updated by Sage Weil about 10 years ago
- Source changed from other to Q/A
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-18_21:29:10-rados:thrash-testing-testing-basic-plana/202157
Updated by Sage Weil almost 10 years ago
- Status changed from Resolved to 12
this triggered again on c6ada53a146f3196e11f545cfc968fc21657aec6
0> 2014-05-02 11:26:12.053553 7fd1ee1b2700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fd1ee1b2700 time 2014-05-02 11:26:12.039772
osd/ReplicatedPG.cc: 5282: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-05-02_02:30:10-rados-master-testing-basic-plana/229437
Updated by Samuel Just almost 10 years ago
- Status changed from 12 to Fix Under Review
Updated by Samuel Just almost 10 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Samuel Just almost 10 years ago
- Status changed from Pending Backport to Resolved
Updated by Sage Weil almost 10 years ago
- Status changed from Resolved to 12
see #8747 for a log of this happening on 0.80.3
Updated by Sage Weil over 9 years ago
- Status changed from 12 to Can't reproduce
Pinged Dmitry to see if he is still seeing this or has a log.
Updated by Dmitry Smirnov over 9 years ago
- Status changed from Can't reproduce to Resolved
I'm unable to reproduce it any more, assuming fixed.
Updated by Sage Weil over 9 years ago
- Status changed from Resolved to 12
this popped up again: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-17_02:32:01-rados-giant-distro-basic-multi/604740
Updated by Samuel Just over 9 years ago
Urgh, non-blocking flushes do not cause scrub to pause. I think the simplest solution is to fail a non-blocking flush in try_flush_mark_clean if the object is being scrubbed.
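A minimal sketch of that idea, not the actual patch: the types `ScrubRange` and `FlushRequest`, the function `try_finish_flush`, and the choice of `-EAGAIN` as the error code are all hypothetical stand-ins for illustration. It only shows the shape of the check: a non-blocking flush is refused while its object lies inside the half-open scrub range `[start, end)`, whereas a blocking flush is let through because, per the earlier comment, do_op already blocks those ops.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical, simplified stand-in for the scrubber state (not the Ceph type).
struct ScrubRange {
  bool active = false;     // true while a chunk is being scrubbed
  std::string start, end;  // half-open range [start, end) of object ids

  bool contains(const std::string& oid) const {
    return active && oid >= start && oid < end;
  }
};

// Hypothetical stand-in for a pending flush operation.
struct FlushRequest {
  std::string oid;
  bool blocking;  // blocking flushes are assumed to be serialized elsewhere
};

// Returns 0 if the flush may be completed, or -EAGAIN (an assumed error code)
// if a non-blocking flush targets an object that is currently being scrubbed.
int try_finish_flush(const FlushRequest& f, const ScrubRange& scrub) {
  if (!f.blocking && scrub.contains(f.oid)) {
    return -EAGAIN;  // caller is expected to retry or cancel the flush
  }
  return 0;
}
```

With this shape, the flush path never reaches the point where the assert is evaluated for an object inside the scrub chunk; whether the real code retries or cancels the flush is not stated in this thread.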
Updated by Samuel Just over 9 years ago
- Status changed from 7 to Pending Backport
- Backport set to giant,firefly
Updated by Sage Weil about 9 years ago
- Status changed from Pending Backport to 12
Happened again, I believe with the latest fix applied.
ubuntu@teuthology:/a/teuthology-2015-01-25_23:10:02-knfs-next-testing-basic-multi/722944
Updated by Yuri Weinstein about 9 years ago
Also seen in run: http://pulpito.ceph.com/teuthology-2015-01-27_17:13:01-upgrade:firefly-x-next-distro-basic-multi/
Job: ['726068']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-27_17:13:01-upgrade:firefly-x-next-distro-basic-multi/726068/teuthology.log
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr:osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool, bool)' thread 7fd534fd4700 time 2015-01-28 07:11:26.502343
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr:osd/ReplicatedPG.cc: 5943: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xadb3df]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 2: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool, bool)+0x1bd3) [0x826f93]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x16d) [0x845d6d]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xa13) [0x846963]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x2987) [0x8519a7]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x63f) [0x7ea05f]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17f) [0x6647ef]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x65f) [0x66524f]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x65c) [0xacab4c]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xacd720]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 11: (()+0x7e9a) [0x7fd54fe2de9a]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 12: (clone()+0x6d) [0x7fd54e5d8ccd]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Irek Fasikhov about 9 years ago
Yes, I have the same error.
[root@ceph04 ceph]# ceph -v
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
Will the fix be in version 0.80.9? Thanks.
2015-01-28 09:50:11.082466 7fbc1437b700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fbc1437b700 time 2015-01-28 09:50:10.852829
osd/ReplicatedPG.cc: 5318: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x32f5) [0x889275]
2: (ReplicatedPG::finish_promote(int, std::tr1::shared_ptr<OpRequest>, ReplicatedPG::CopyResults*, std::tr1::shared_ptr<ObjectContext>)+0x110f) [0x8903ef]
3: (PromoteCallback::finish(boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>)+0x78) [0x8e29b8]
4: (GenContext<boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> >::complete(boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>)+0x15) [0x8b34f5]
5: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0x747) [0x885407]
6: (C_Copyfrom::finish(int)+0xb7) [0x8e2777]
7: (Context::complete(int)+0x9) [0x667209]
8: (Finisher::finisher_thread_entry()+0x1d8) [0x9ed148]
9: (()+0x79d1) [0x7fbc36e5f9d1]
10: (clone()+0x6d) [0x7fbc35dd88fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
Updated by Samuel Just about 9 years ago
Irek, you are seeing the previous incarnation of this bug. The relevant fix has not yet been backported to firefly.
Updated by Samuel Just about 9 years ago
This most recent incarnation is due to oi digests: we block_writes until the end of COMPARE_MAPS. The assert should not fire if !block_writes. Previously, we would always change scrubber.start to be the same as scrubber.end as we changed block_writes.
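As a rough illustration of the condition described above, and not the actual Ceph change, the range check can be made conditional on the write-blocking flag. `ScrubState` and `write_is_safe` are hypothetical names; only `block_writes`, `start`, `end`, and the range expression come from this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of the scrubber fields named in this thread.
struct ScrubState {
  bool block_writes = false;  // true while scrub requires writes to stay out
  std::string start, end;     // half-open scrub chunk [start, end)
};

// Predicate a guarded assert could test: when writes are not blocked the
// range is irrelevant; otherwise the object must lie outside [start, end).
bool write_is_safe(const ScrubState& s, const std::string& soid) {
  if (!s.block_writes) {
    return true;
  }
  return soid < s.start || soid >= s.end;
}
```

In this model, a write to an object inside a chunk whose `start` and `end` still differ is accepted once `block_writes` is false, which is the case the unconditional assert rejected.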
Updated by Samuel Just about 9 years ago
Also, this is an entirely distinct bug, so I'm making this Pending Backport again and opening a new one.
Updated by Samuel Just about 9 years ago
- Status changed from 12 to Pending Backport
Updated by Irek Fasikhov about 9 years ago
Samuel, yes, this is a different bug: http://tracker.ceph.com/issues/10433#note-3
Updated by Loïc Dachary about 9 years ago
- Severity changed from 3 - minor to 2 - major
Updated by Loïc Dachary about 9 years ago
- firefly backport https://github.com/ceph/ceph/pull/3943
Updated by Loïc Dachary about 9 years ago
- giant backport https://github.com/ceph/ceph/pull/4053
Updated by Loïc Dachary about 9 years ago
- Status changed from Pending Backport to Resolved