Bug #8011
osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
Description
osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
ceph version 0.78-600-g19f50b9 (19f50b9d7bbbb2cce3b599f3ed8a9fa32c3d4e53)
1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x1c86) [0x7f8976]
2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x72f) [0x7fa5ff]
3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x2da) [0x7fb0ea]
4: (C_Flush::finish(int)+0xa7) [0x856e77]
5: (Context::complete(int)+0x9) [0x66ed59]
6: (Finisher::finisher_thread_entry()+0x1c0) [0x9a7050]
7: (()+0x7e9a) [0x7f14f9c60e9a]
8: (clone()+0x6d) [0x7f14f82213fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
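For readers unfamiliar with the check: the assertion requires that the object being updated (soid) lies outside the half-open range [scrubber.start, scrubber.end) that scrub is currently working on. A minimal sketch of that range test follows, assuming plain strings in place of hobject_t; ScrubRange and in_scrub_range are hypothetical names used only for illustration, not Ceph code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the chunk scrub is working on: [start, end).
struct ScrubRange {
    std::string start;
    std::string end;
};

// True when soid falls inside the chunk being scrubbed.
bool in_scrub_range(const ScrubRange& r, const std::string& soid) {
    return !(soid < r.start || soid >= r.end);
}

// Mirrors the shape of the failed assertion: a write may only complete
// for an object that is outside the range being scrubbed.
void check_write_outside_scrub(const ScrubRange& r, const std::string& soid) {
    assert(soid < r.start || soid >= r.end);
}
```

Note that when start equals end the range is empty, so no object can trip the check.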
Associated revisions
ReplicatedPG: block scrub on blocked object contexts
Fixes: #8011
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
ReplicatedPG: block scrub on blocked object contexts
Fixes: #8011
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e66f2e36c06ca00c1147f922d3513f56b122a5c0)
ReplicatedPG: block scrub on blocked object contexts
Fixes: #8011
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
ReplicatedPG: block scrub on blocked object contexts
Fixes: #8011
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 7411477153219d66625a74c5886530029c516036)
ReplicatedPG: fail a non-blocking flush if the object is being scrubbed
Fixes: #8011
Backport: firefly, giant
Signed-off-by: Samuel Just <sjust@redhat.com>
ReplicatedPG: fail a non-blocking flush if the object is being scrubbed
Fixes: #8011
Backport: firefly, giant
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 9b26de3f3653d38dcdfc5b97874089f19d2a59d7)
History
#1 Updated by Sage Weil almost 9 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-08_14:01:14-rados:thrash-wip-7891-testing-basic-plana/178972
#2 Updated by Sage Weil almost 9 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-13_09:43:35-rados:thrash-testing-testing-basic-plana/189166
#3 Updated by Sage Weil almost 9 years ago
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-13_09:43:35-rados:thrash-testing-testing-basic-plana/189348
#4 Updated by Samuel Just almost 9 years ago
- Status changed from New to In Progress
- Assignee set to Samuel Just
#5 Updated by Samuel Just almost 9 years ago
ReplicatedPG::do_op already does the right thing as far as blocking ops which may flush. What remains is to avoid flushing objects with blocked obcs.
#6 Updated by Samuel Just almost 9 years ago
and to check that agent_work also does the right thing
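The remaining work described in the two comments above (do not flush objects whose object context is blocked) might look roughly like the sketch below. It is an illustration under stated assumptions only: ObjCtx, start_flush and flush_pass are hypothetical names, and the real agent_work logic is more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified object context with only the fields needed here.
struct ObjCtx {
    std::string oid;
    bool blocked;  // true while another operation has blocked the object
    bool dirty;    // true when the object has changes that need flushing
};

// Hypothetical flush entry point: callers must never pass a blocked object.
void start_flush(const ObjCtx& o, std::vector<std::string>& flushed) {
    assert(!o.blocked);
    flushed.push_back(o.oid);
}

// Agent-style pass: skip blocked (and clean) objects instead of flushing them.
std::vector<std::string> flush_pass(const std::vector<ObjCtx>& objs) {
    std::vector<std::string> flushed;
    for (const ObjCtx& o : objs) {
        if (o.blocked || !o.dirty) {
            continue;
        }
        start_flush(o, flushed);
    }
    return flushed;
}
```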
#7 Updated by Samuel Just almost 9 years ago
- Status changed from In Progress to 7
#8 Updated by Sage Weil almost 9 years ago
- Source changed from other to Q/A
ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-04-18_21:29:10-rados:thrash-testing-testing-basic-plana/202157
#9 Updated by Samuel Just over 8 years ago
- Status changed from 7 to Resolved
#10 Updated by Sage Weil over 8 years ago
- Status changed from Resolved to 12
this triggered again on c6ada53a146f3196e11f545cfc968fc21657aec6
0> 2014-05-02 11:26:12.053553 7fd1ee1b2700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fd1ee1b2700 time 2014-05-02 11:26:12.039772
osd/ReplicatedPG.cc: 5282: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-05-02_02:30:10-rados-master-testing-basic-plana/229437
#11 Updated by Samuel Just over 8 years ago
- Status changed from 12 to Fix Under Review
#12 Updated by Samuel Just over 8 years ago
- Status changed from Fix Under Review to Pending Backport
#13 Updated by Samuel Just over 8 years ago
- Status changed from Pending Backport to Resolved
#14 Updated by Sage Weil over 8 years ago
- Status changed from Resolved to 12
see #8747 for a log of this happening on 0.80.3
#15 Updated by Sage Weil over 8 years ago
- Assignee deleted (Samuel Just)
#16 Updated by Sage Weil over 8 years ago
- Status changed from 12 to Can't reproduce
Pinged Dmitry to see if he is still seeing this or has a log.
#17 Updated by Dmitry Smirnov over 8 years ago
- Status changed from Can't reproduce to Resolved
I'm unable to reproduce it any more, assuming fixed.
#18 Updated by Sage Weil about 8 years ago
- Status changed from Resolved to 12
this popped up again: ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-11-17_02:32:01-rados-giant-distro-basic-multi/604740
#19 Updated by Samuel Just about 8 years ago
- Assignee set to Samuel Just
#20 Updated by Samuel Just about 8 years ago
Urgh, non-blocking flushes do not cause scrub to pause. I think the simplest solution is to fail a non-blocking flush in try_flush_mark_clean if the object is being scrubbed.
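A rough sketch of that idea, for illustration only: ScrubRange, FlushOutcome and try_mark_clean are hypothetical names, strings stand in for hobject_t, and the sketch assumes a blocking flush has already made scrub wait, so only the non-blocking path needs the check.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the chunk scrub is working on: [start, end).
struct ScrubRange {
    std::string start;
    std::string end;
};

bool being_scrubbed(const ScrubRange& s, const std::string& soid) {
    return soid >= s.start && soid < s.end;
}

enum class FlushOutcome { MarkedClean, FailedBeingScrubbed };

// Fail a non-blocking flush up front instead of reaching the assertion.
FlushOutcome try_mark_clean(const ScrubRange& s, const std::string& soid,
                            bool blocking) {
    if (!blocking && being_scrubbed(s, soid)) {
        return FlushOutcome::FailedBeingScrubbed;
    }
    // Assumption: a blocking flush never gets here while scrub holds the object.
    assert(blocking || !being_scrubbed(s, soid));
    return FlushOutcome::MarkedClean;
}
```

The caller of a failed non-blocking flush is expected to retry later, once scrub has moved past the object.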
#21 Updated by Samuel Just about 8 years ago
- Status changed from 12 to 7
#22 Updated by Samuel Just about 8 years ago
- Status changed from 7 to Pending Backport
- Backport set to giant,firefly
#23 Updated by Sage Weil about 8 years ago
- Status changed from Pending Backport to 12
Happened again, I believe with the latest fix applied.
ubuntu@teuthology:/a/teuthology-2015-01-25_23:10:02-knfs-next-testing-basic-multi/722944
#24 Updated by Yuri Weinstein about 8 years ago
Also see in run: http://pulpito.ceph.com/teuthology-2015-01-27_17:13:01-upgrade:firefly-x-next-distro-basic-multi/
Job: ['726068']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-27_17:13:01-upgrade:firefly-x-next-distro-basic-multi/726068/teuthology.log
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr:osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool, bool)' thread 7fd534fd4700 time 2015-01-28 07:11:26.502343
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr:osd/ReplicatedPG.cc: 5943: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr: ceph version 0.91-388-g5064787 (50647876971a2fe65a02e4de3c0bc62fec4887c4)
2015-01-28T07:11:26.339 INFO:tasks.ceph.osd.0.burnupi16.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xadb3df]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 2: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool, bool)+0x1bd3) [0x826f93]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x16d) [0x845d6d]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xa13) [0x846963]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x2987) [0x8519a7]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x63f) [0x7ea05f]
2015-01-28T07:11:26.340 INFO:tasks.ceph.osd.0.burnupi16.stderr: 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x17f) [0x6647ef]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 8: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x65f) [0x66524f]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x65c) [0xacab4c]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xacd720]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 11: (()+0x7e9a) [0x7fd54fe2de9a]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: 12: (clone()+0x6d) [0x7fd54e5d8ccd]
2015-01-28T07:11:26.341 INFO:tasks.ceph.osd.0.burnupi16.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
#25 Updated by Irek Fasikhov almost 8 years ago
Yes, I have the same error.
[root@ceph04 ceph]# ceph -v
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
Will the fix be in version 0.80.9? Thanks.
2015-01-28 09:50:11.082466 7fbc1437b700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fbc1437b700 time 2015-01-28 09:50:10.852829
osd/ReplicatedPG.cc: 5318: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x32f5) [0x889275]
2: (ReplicatedPG::finish_promote(int, std::tr1::shared_ptr<OpRequest>, ReplicatedPG::CopyResults*, std::tr1::shared_ptr<ObjectContext>)+0x110f) [0x8903ef]
3: (PromoteCallback::finish(boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>)+0x78) [0x8e29b8]
4: (GenContext<boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type> >::complete(boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>)+0x15) [0x8b34f5]
5: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0x747) [0x885407]
6: (C_Copyfrom::finish(int)+0xb7) [0x8e2777]
7: (Context::complete(int)+0x9) [0x667209]
8: (Finisher::finisher_thread_entry()+0x1d8) [0x9ed148]
9: (()+0x79d1) [0x7fbc36e5f9d1]
10: (clone()+0x6d) [0x7fbc35dd88fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
#26 Updated by Samuel Just almost 8 years ago
Irek, you are seeing the previous incarnation of this bug. The relevant fix has not yet been backported to firefly.
#27 Updated by Samuel Just almost 8 years ago
This most recent incarnation is due to oi digests: we block_writes until the end of COMPARE_MAPS. The assert should not fire if !block_writes. Previously, we would always change scrubber.start to be the same as scrubber.end as we changed block_writes.
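The condition described above (the range check only matters while writes are blocked) can be sketched as follows. The field names start, end and block_writes come from this thread; the Scrubber struct, scrub_blocks_write and assert_write_allowed are hypothetical helpers using strings in place of hobject_t, not the actual patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced view of the scrubber state discussed in this thread.
struct Scrubber {
    std::string start;
    std::string end;
    bool block_writes;  // true only while scrub needs writes held off
};

// A write conflicts with scrub only if writes are currently blocked
// and the object falls inside [start, end).
bool scrub_blocks_write(const Scrubber& s, const std::string& soid) {
    if (!s.block_writes) {
        return false;
    }
    return soid >= s.start && soid < s.end;
}

// Relaxed form of the assertion: it cannot fire when !block_writes.
void assert_write_allowed(const Scrubber& s, const std::string& soid) {
    assert(!scrub_blocks_write(s, soid));
}
```

With this shape, start and end no longer have to be reset to the same value at the moment block_writes changes.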
#28 Updated by Samuel Just almost 8 years ago
Making patch.
#29 Updated by Samuel Just almost 8 years ago
Also, this is an entirely distinct bug, so I'm making this Pending Backport again and opening a new one.
#30 Updated by Samuel Just almost 8 years ago
- Status changed from 12 to Pending Backport
#31 Updated by Irek Fasikhov almost 8 years ago
Samuel.
yes, this is another mistake. http://tracker.ceph.com/issues/10433#note-3
#32 Updated by Loïc Dachary almost 8 years ago
- Severity changed from 3 - minor to 2 - major
#33 Updated by Loïc Dachary almost 8 years ago
- firefly backport https://github.com/ceph/ceph/pull/3943
#34 Updated by Loïc Dachary almost 8 years ago
- giant backport https://github.com/ceph/ceph/pull/4053
#35 Updated by Loïc Dachary almost 8 years ago
- Status changed from Pending Backport to Resolved