Project

General

Profile

Actions

Bug #917

closed

OSD sending out-of-order acks

Added by Brian Chrisman about 13 years ago. Updated about 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This was detected by a cfuse ObjectCacher assert.
The relevant messages from the OSD are:

2011-03-24 20:30:19.738078 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4114 ==== osd_op_reply(24066 10000005dcb.00000009 [write 20480~20480] ack = 0) v1 ==== 106+0+0 (366877589 0 0) 0x7f5fd99a41a0 con 0x1014d70

2011-03-24 20:30:19.738282 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4115 ==== osd_op_reply(24068 10000005dcb.00000009 [write 315392~16384] ack = 0) v1 ==== 106+0+0 (2786375306 0 0) 0x7f5fd9486410 con 0x1014d70

2011-03-24 20:30:19.738433 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4116 ==== osd_op_reply(24069 10000005dcb.00000009 [write 339968~8192] ack = 0) v1 ==== 106+0+0 (223975158 0 0) 0x7f5fd920c8c0 con 0x1014d70

2011-03-24 20:30:19.743467 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4117 ==== osd_op_reply(24066 10000005dcb.00000009 [write 20480~20480] ondisk = 0) v1 ==== 106+0+0 (4070484477 0 0) 0x7f5fd99a41a0 con 0x1014d70

2011-03-24 20:30:19.743765 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4118 ==== osd_op_reply(24067 10000005dcb.00000009 [write 303104~8192] ondisk = 0) v1 ==== 106+0+0 (1211671729 0 0) 0x7f5fd9486410 con 0x1014d70

Each of these is a write operation.

Here's the full debug log for the core dump we're seeing from cfuse.
This is on v0.25.1.
I can reproduce this if required.

[write 1204224~57344] ack = 0) v1 ==== 106+0+0 (462844183 0 0)
0x7fc19c0aece0 con 0x238ed80
osdc/ObjectCacher.cc: In function 'void
ObjectCacher::bh_write_ack(int, sobject_t, loff_t, uint64_t, tid_t)',
in thread '0x7fc1ced85710'
osdc/ObjectCacher.cc: 673: FAILED assert(ob->last_ack_tid < tid)
 ceph version 0.25.1 (commit:7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long,
unsigned long)+0x6b0) [0x4ef5a0]
 2: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe30a]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d6c]
 4: (Client::ms_dispatch(Message*)+0x143) [0x4a4e43]
 5: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 7: (()+0x77e1) [0x7fc1d21b77e1]
 8: (clone()+0x6d) [0x7fc1d10a351d]
 ceph version 0.25.1 (commit:7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long,
unsigned long)+0x6b0) [0x4ef5a0]
 2: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe30a]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d6c]
 4: (Client::ms_dispatch(Message*)+0x143) [0x4a4e43]
 5: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 7: (()+0x77e1) [0x7fc1d21b77e1]
 8: (clone()+0x6d) [0x7fc1d10a351d]
*** Caught signal (Aborted) **
 in thread 0x7fc1ced85710
 ceph version 0.25.1 (commit:7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: cfuse() [0x553dcc]
 2: (()+0xf4c0) [0x7fc1d21bf4c0]
 3: (gsignal()+0x35) [0x7fc1d0ff49b5]
 4: (abort()+0x175) [0x7fc1d0ff6195]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fc1d1899aad]
 6: (()+0xbcc36) [0x7fc1d1897c36]
 7: (()+0xbcc63) [0x7fc1d1897c63]
 8: (()+0xbcd5e) [0x7fc1d1897d5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x373) [0x5400c3]
 10: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long,
unsigned long)+0x6b0) [0x4ef5a0]
 11: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe30a]
 12: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d6c]
 13: (Client::ms_dispatch(Message*)+0x143) [0x4a4e43]
 14: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 15: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 16: (()+0x77e1) [0x7fc1d21b77e1]
 17: (clone()+0x6d) [0x7fc1d10a351d]

Files

op_submit_24101_and_on.log.bz2 (70.6 KB) op_submit_24101_and_on.log.bz2 Brian Chrisman, 03/25/2011 11:50 AM
op_submit_24067_and_on.log.bz2 (72.4 KB) op_submit_24067_and_on.log.bz2 Brian Chrisman, 03/25/2011 01:49 PM
cfuse_log_8637_and_on.log.bz2 (40.2 KB) cfuse_log_8637_and_on.log.bz2 Brian Chrisman, 03/25/2011 06:21 PM
osd3.last.100klines.log.bz2 (745 KB) osd3.last.100klines.log.bz2 Brian Chrisman, 03/25/2011 06:21 PM
osd_log_00_00_until_last_osd_op.log.bz2 (1.03 MB) osd_log_00_00_until_last_osd_op.log.bz2 Brian Chrisman, 03/28/2011 05:23 PM
osd4_log_00_00_util_last_osd_op.log.bz2 (1.62 MB) osd4_log_00_00_util_last_osd_op.log.bz2 Brian Chrisman, 03/30/2011 12:44 PM
Actions

Also available in: Atom PDF