Bug #8747
Closed
OSD crash on scrub: osd/ReplicatedPG.cc: 5297: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
Description
On 0.80.1 one OSD crashed several times as follows (full log attached):
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fb0ee9d0700 time 2014-07-05 06:21:00.105868
osd/ReplicatedPG.cc: 5297: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0xad8) [0x7fb10e0c35a8]
 2: (ReplicatedPG::finish_promote(int, std::tr1::shared_ptr<OpRequest>, ReplicatedPG::CopyResults*, std::tr1::shared_ptr<ObjectContext>)+0x48e) [0x7fb10e0c868e]
 3: (PromoteCallback::finish(boost::tuples::tuple<int, ReplicatedPG::CopyResults*, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type, boost::tuples::null_type>)+0x64) [0x7fb10e131364]
 4: (ReplicatedPG::process_copy_chunk(hobject_t, unsigned long, int)+0x4df) [0x7fb10e0c74cf]
 5: (C_Copyfrom::finish(int)+0x12a) [0x7fb10e13125a]
 6: (Context::complete(int)+0x9) [0x7fb10df2d559]
 7: (Finisher::finisher_thread_entry()+0x1b8) [0x7fb10e2accd8]
 8: (()+0x80ca) [0x7fb10d40b0ca]
 9: (clone()+0x6d) [0x7fb10b91fffd]
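For readers unfamiliar with the failed assertion: read literally, it requires that the object being written (`soid`) lies outside the half-open range `[scrubber.start, scrubber.end)` currently being scrubbed. The following is only a minimal Python sketch of that condition, not Ceph code. The names `ScrubRange`, `blocks` and `finish_write` are hypothetical, and plain strings stand in for `hobject_t`, whose real ordering is more involved.

```python
from dataclasses import dataclass


@dataclass
class ScrubRange:
    """Hypothetical stand-in for the scrubber's [start, end) chunk."""
    start: str
    end: str

    def blocks(self, soid: str) -> bool:
        # An object is inside the chunk being scrubbed when start <= soid < end.
        return self.start <= soid < self.end


def finish_write(soid: str, scrub: ScrubRange) -> bool:
    # Same condition as the assertion in the trace above:
    # soid < scrubber.start || soid >= scrubber.end
    assert soid < scrub.start or soid >= scrub.end, (
        f"write to {soid!r} landed inside scrub range "
        f"[{scrub.start!r}, {scrub.end!r})"
    )
    return True


if __name__ == "__main__":
    chunk = ScrubRange(start="b", end="d")
    finish_write("a", chunk)   # below the range: allowed
    finish_write("d", chunk)   # end is exclusive: allowed
    try:
        finish_write("c", chunk)   # inside the range: assertion fires
    except AssertionError as err:
        print("assertion failed:", err)
```

In the trace above the write arrives through `finish_promote`, so in this picture the promoted object fell inside the chunk that was being scrubbed at that moment.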
Files
Updated by Dmitry Smirnov almost 10 years ago
- File ceph-osd.0.log.xz added
Updated by Dmitry Smirnov almost 10 years ago
I am using a local build of 0.80.1 with commit 29ee6faecb9e16c63acae8318a7c8f6b14367af7 (from the "firefly" branch) applied, yet this problem still occurred...
Updated by Dmitry Smirnov almost 10 years ago
- Status changed from New to Closed
I found that two of the 12 OSDs were running 0.80.1 without the backported patch from #8011.
Interestingly, the affected OSD was patched.
I rebuilt Ceph from the head of the "firefly" branch and upgraded the whole cluster.
Since then I could not reproduce the problem.
This bug appears to be fixed so I'm closing it for now.
Updated by Dmitry Smirnov almost 10 years ago
- Status changed from Closed to New
Re-opening as I just reproduced the issue. Sorry.
It happened again, probably while attempting to repair an inconsistent PG.
Please advise.
Updated by Dmitry Smirnov almost 10 years ago
No improvement with 0.80.3 -- I'm still getting those crashes frequently on "deep-scrub" and "repair".
Sometimes two OSDs crash simultaneously.
Updated by Dmitry Smirnov almost 10 years ago
Although it takes up to an hour to reproduce, I seem to have a reliable way to do so.
I shall be happy to capture detailed logs (e.g. `debug osd = 20, debug filestore = 20, debug ms = 1`) if necessary.
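As a sketch of how those debug levels might be set persistently, assuming they are placed in the `[osd]` section of `ceph.conf` on the affected hosts (the section placement is an assumption; the three settings are the ones named above):

```ini
[osd]
    ; verbose OSD, FileStore and messenger logging for reproducing the crash
    debug osd = 20
    debug filestore = 20
    debug ms = 1
```

Logging at these levels is very verbose, so it is probably best enabled only for the duration of a reproduction attempt. The same levels can usually also be applied to a running daemon without a restart, for example with `ceph tell osd.0 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'`.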
Updated by Samuel Just almost 10 years ago
Yeah, 8011 seems to be less dead than we thought, reopening.
Updated by Dmitry Smirnov over 9 years ago
I can no longer reproduce this on 0.80.5 + Firefly HEAD as of 2014-09-16...