Bug #10433

OSD osd/ReplicatedPG.cc: 5540: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

Added by Laurent GUERBY over 9 years ago. Updated about 9 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We just got this on one of our OSD after a few days in production of ceph version 0.87-73-g70a5569 (70a5569e34786d4124e37561473f1aa02c80f779):

2014-12-26 08:54:32.320625 7f116dd9b700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7f116dd9b700 time 2014-12-26 08:54:32.310448
osd/ReplicatedPG.cc: 5540: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

ceph version 0.87-73-g70a5569 (70a5569e34786d4124e37561473f1aa02c80f779)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xbc8056]
2: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x2087) [0x851057]
3: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x730) [0x8519c0]
4: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x24c) [0x85242c]
5: (C_Flush::finish(int)+0x99) [0x8c22c9]
6: (Context::complete(int)+0x9) [0x69b8d9]
7: (Finisher::finisher_thread_entry()+0x158) [0xafc4c8]
8: (()+0x80a4) [0x7f11966b90a4]
9: (clone()+0x6d) [0x7f1194c13ccd]

...
-10> 2014-12-26 08:54:32.253186 7f115d40a700 5 - op tracker -- seq: 25018348, time: 2014-12-26 08:54:32.253151, event: header_read, op: osd_sub_op_reply(client.11180247.0:11830200 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-9> 2014-12-26 08:54:32.253188 7f115d40a700 5 - op tracker -- seq: 25018348, time: 2014-12-26 08:54:32.253151, event: throttled, op: osd_sub_op_reply(client.11180247.0:11830200 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-8> 2014-12-26 08:54:32.253200 7f115d40a700 5 - op tracker -- seq: 25018348, time: 2014-12-26 08:54:32.253178, event: all_read, op: osd_sub_op_reply(client.11180247.0:11830200 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-7> 2014-12-26 08:54:32.253202 7f115d40a700 5 - op tracker -- seq: 25018348, time: 0.000000, event: dispatched, op: osd_sub_op_reply(client.11180247.0:11830200 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-6> 2014-12-26 08:54:32.253456 7f115d40a700 1 - 192.168.99.251:6804/8026 <== osd.8 192.168.99.247:6811/16157 1319599 ==== osd_sub_op_reply(client.11180247.0:11830201 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0) v2 ==== 171+0+0 (1299343257 0 0) 0x488d9180 con 0x2645c940
-5> 2014-12-26 08:54:32.253474 7f115d40a700 5 - op tracker -- seq: 25018349, time: 2014-12-26 08:54:32.253432, event: header_read, op: osd_sub_op_reply(client.11180247.0:11830201 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-4> 2014-12-26 08:54:32.253477 7f115d40a700 5 - op tracker -- seq: 25018349, time: 2014-12-26 08:54:32.253433, event: throttled, op: osd_sub_op_reply(client.11180247.0:11830201 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-3> 2014-12-26 08:54:32.253491 7f115d40a700 5 - op tracker -- seq: 25018349, time: 2014-12-26 08:54:32.253454, event: all_read, op: osd_sub_op_reply(client.11180247.0:11830201 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-2> 2014-12-26 08:54:32.253493 7f115d40a700 5 - op tracker -- seq: 25018349, time: 0.000000, event: dispatched, op: osd_sub_op_reply(client.11180247.0:11830201 58.cb 96d664cb/rb.0.10061c.2ae8944a.0000000007b5/head//58 [] ondisk, result = 0)
-1> 2014-12-26 08:54:32.304854 7f118af6d700 5 - op tracker -- seq: 25018288, time: 2014-12-26 08:54:32.304854, event: journaled_completion_queued, op: osd_op(client.11180247.0:11830225 rb.0.10061c.2ae8944a.0000000013f5 [set-alloc-hint object_size 4194304 write_size 4194304,write 557056~20480] 58.f0ed751c ack+ondisk+write+known_if_redirected e30192)
0> 2014-12-26 08:54:32.320625 7f116dd9b700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7f116dd9b700 time 2014-12-26 08:54:32.310448
osd/ReplicatedPG.cc: 5540: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

ceph version 0.87-73-g70a5569 (70a5569e34786d4124e37561473f1aa02c80f779)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xbc8056]
2: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x2087) [0x851057]
3: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x730) [0x8519c0]
4: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x24c) [0x85242c]
5: (C_Flush::finish(int)+0x99) [0x8c22c9]
6: (Context::complete(int)+0x9) [0x69b8d9]
7: (Finisher::finisher_thread_entry()+0x158) [0xafc4c8]
8: (()+0x80a4) [0x7f11966b90a4]
9: (clone()+0x6d) [0x7f1194c13ccd]

This happened just after the creation of a new OpenStack Cinder volume in an erasure-coded pool (4+1); that may or may not be related.
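For context, the failed assertion checks that the object being finalized lies outside the half-open range [scrubber.start, scrubber.end) currently being scrubbed, i.e. that a write is never completed for an object inside the in-flight scrub chunk. Below is a minimal sketch of that invariant (not Ceph source; the helper name and the object names are hypothetical, and lexical string ordering stands in for hobject_t ordering):

```python
def outside_scrub_range(soid, scrub_start, scrub_end):
    """Mirror of `soid < scrubber.start || soid >= scrubber.end`:
    True iff soid is outside the half-open scrub chunk [start, end)."""
    return soid < scrub_start or soid >= scrub_end

# Scrub chunk currently covers ["obj_b", "obj_f").
assert outside_scrub_range("obj_a", "obj_b", "obj_f")       # before the chunk: allowed
assert outside_scrub_range("obj_f", "obj_b", "obj_f")       # end is exclusive: allowed
assert not outside_scrub_range("obj_c", "obj_b", "obj_f")   # inside: would trip the assert
```

The crash therefore suggests a flush completion (finish_flush -> try_flush_mark_clean -> finish_ctx) landing on an object that the scrubber still considers part of its active chunk.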

The ceph report (ceph-report-20141226.txt.gz) and the ceph-osd log are attached.


Files

ceph-report-20141226.txt.gz (2.28 MB) ceph-report-20141226.txt.gz Laurent GUERBY, 12/26/2014 12:14 AM
ceph-osd.0.log.gz (239 KB) ceph-osd.0.log.gz Laurent GUERBY, 12/26/2014 12:14 AM
ceph-osd.24.log (976 KB) ceph-osd.24.log Irek Fasikhov, 02/26/2015 01:34 PM

Related issues: 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #8011: osd/ReplicatedPG.cc: 5244: FAILED assert(soid < scrubber.start || soid >= scrubber.end) (Resolved, Samuel Just, 04/07/2014)
