Bug #917


OSD sending out-of-order acks

Added by Brian Chrisman about 13 years ago. Updated about 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Greg Farnum
Category:
-
Target version:
v0.27
% Done:

0%


Description

This was detected by a cfuse ObjectCacher assert.
The relevant messages from the OSD are:

2011-03-24 20:30:19.738078 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4114 ==== osd_op_reply(24066 10000005dcb.00000009 [write 20480~20480] ack = 0) v1 ==== 106+0+0 (366877589 0 0) 0x7f5fd99a41a0 con 0x1014d70

2011-03-24 20:30:19.738282 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4115 ==== osd_op_reply(24068 10000005dcb.00000009 [write 315392~16384] ack = 0) v1 ==== 106+0+0 (2786375306 0 0) 0x7f5fd9486410 con 0x1014d70

2011-03-24 20:30:19.738433 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4116 ==== osd_op_reply(24069 10000005dcb.00000009 [write 339968~8192] ack = 0) v1 ==== 106+0+0 (223975158 0 0) 0x7f5fd920c8c0 con 0x1014d70

2011-03-24 20:30:19.743467 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4117 ==== osd_op_reply(24066 10000005dcb.00000009 [write 20480~20480] ondisk = 0) v1 ==== 106+0+0 (4070484477 0 0) 0x7f5fd99a41a0 con 0x1014d70

2011-03-24 20:30:19.743765 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4118 ==== osd_op_reply(24067 10000005dcb.00000009 [write 303104~8192] ondisk = 0) v1 ==== 106+0+0 (1211671729 0 0) 0x7f5fd9486410 con 0x1014d70

Each of these is a write operation.

Here's the full debug log for the core dump we're seeing from cfuse.
This is on v0.25.1.
I can reproduce this if required.

[write 1204224~57344] ack = 0) v1 ==== 106+0+0 (462844183 0 0) 0x7fc19c0aece0 con 0x238ed80
osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_ack(int, sobject_t, loff_t, uint64_t, tid_t)', in thread '0x7fc1ced85710'
osdc/ObjectCacher.cc: 673: FAILED assert(ob->last_ack_tid < tid)
 ceph version 0.25.1 (commit:7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long, unsigned long)+0x6b0) [0x4ef5a0]
 2: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe30a]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d6c]
 4: (Client::ms_dispatch(Message*)+0x143) [0x4a4e43]
 5: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 7: (()+0x77e1) [0x7fc1d21b77e1]
 8: (clone()+0x6d) [0x7fc1d10a351d]
 ceph version 0.25.1 (commit:7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long, unsigned long)+0x6b0) [0x4ef5a0]
 2: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe30a]
 3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d6c]
 4: (Client::ms_dispatch(Message*)+0x143) [0x4a4e43]
 5: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 7: (()+0x77e1) [0x7fc1d21b77e1]
 8: (clone()+0x6d) [0x7fc1d10a351d]
*** Caught signal (Aborted) **
 in thread 0x7fc1ced85710
 ceph version 0.25.1 (commit:7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: cfuse() [0x553dcc]
 2: (()+0xf4c0) [0x7fc1d21bf4c0]
 3: (gsignal()+0x35) [0x7fc1d0ff49b5]
 4: (abort()+0x175) [0x7fc1d0ff6195]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7fc1d1899aad]
 6: (()+0xbcc36) [0x7fc1d1897c36]
 7: (()+0xbcc63) [0x7fc1d1897c63]
 8: (()+0xbcd5e) [0x7fc1d1897d5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x5400c3]
 10: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long, unsigned long)+0x6b0) [0x4ef5a0]
 11: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe30a]
 12: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d6c]
 13: (Client::ms_dispatch(Message*)+0x143) [0x4a4e43]
 14: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 15: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 16: (()+0x77e1) [0x7fc1d21b77e1]
 17: (clone()+0x6d) [0x7fc1d10a351d]
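
For context on the assert: the ObjectCacher requires each write ack to carry a strictly larger tid than the last ack it processed on that object. Below is a minimal, self-contained C++ sketch of that invariant (illustrative only; the struct and function are pared down, not the real osdc/ObjectCacher.cc code). Feeding it the tids from the log above reproduces the abort.

#include <cassert>
#include <cstdint>

typedef uint64_t tid_t;

struct Object {
  tid_t last_ack_tid = 0;  // highest write tid acked so far on this object
};

// The cacher assumes the OSD acks writes in tid order, so each ack
// must carry a strictly larger tid than the previous one.
void bh_write_ack(Object *ob, tid_t tid) {
  assert(ob->last_ack_tid < tid);  // the check that fails at ObjectCacher.cc:673
  ob->last_ack_tid = tid;
}

int main() {
  Object ob;
  bh_write_ack(&ob, 24066);  // acks arrive for 24066, 24068, 24069...
  bh_write_ack(&ob, 24068);
  bh_write_ack(&ob, 24069);
  // ...then the ack for 24067 (implied by its ondisk reply) arrives late
  // and trips the assert, just as in the trace above.
  bh_write_ack(&ob, 24067);
}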

Files

op_submit_24101_and_on.log.bz2 (70.6 KB) op_submit_24101_and_on.log.bz2 Brian Chrisman, 03/25/2011 11:50 AM
op_submit_24067_and_on.log.bz2 (72.4 KB) op_submit_24067_and_on.log.bz2 Brian Chrisman, 03/25/2011 01:49 PM
cfuse_log_8637_and_on.log.bz2 (40.2 KB) cfuse_log_8637_and_on.log.bz2 Brian Chrisman, 03/25/2011 06:21 PM
osd3.last.100klines.log.bz2 (745 KB) osd3.last.100klines.log.bz2 Brian Chrisman, 03/25/2011 06:21 PM
osd_log_00_00_until_last_osd_op.log.bz2 (1.03 MB) osd_log_00_00_until_last_osd_op.log.bz2 Brian Chrisman, 03/28/2011 05:23 PM
osd4_log_00_00_util_last_osd_op.log.bz2 (1.62 MB) osd4_log_00_00_util_last_osd_op.log.bz2 Brian Chrisman, 03/30/2011 12:44 PM
Actions #1

Updated by Greg Farnum about 13 years ago

What workload is triggering this? That assert looks familiar to me but a quick search doesn't turn up any references in the tracker.

Actions #2

Updated by Brian Chrisman about 13 years ago

The use case is a little odd, as we're trying to test the limits of the fs and the apps stacked atop it:
a Linux Samba remote client mounting a Samba share that exports a cfuse-mounted Ceph fs.

It's a fairly big stack, but it works fine when we use the kernel client rather than cfuse (our load doesn't hit the kernel-client/osd deadlock).

Since we're hitting an assert, I'm guessing there's some a priori knowledge as to what might cause this.

Actions #3

Updated by Greg Farnum about 13 years ago

  • Assignee set to Greg Farnum
Actions #4

Updated by Brian Chrisman about 13 years ago

Here's the last 100 lines of the log with objecter debug. I'll grab a chunk of the debug file to keep around in case more is helpful.

2011-03-24 20:30:19.739526 7f5fed0ac720 client4156 fill_stat on 100000015d1 snap/dev=head mode 040755 mtime 2011-03-21 17:39:44.000000 ctime 2011-03-24 18:38:23.690451
2011-03-24 20:30:19.739538 7f5fed0ac720 client4156 ll_getattr 100000015d1.head = 0
2011-03-24 20:30:19.739568 7f5fed0ac720 client4156 ll_getattr 1.head
2011-03-24 20:30:19.739588 7f5fed0ac720 client4156 _getattr mask pAsLsXsFs issued=1
2011-03-24 20:30:19.739603 7f5fed0ac720 client4156 fill_stat on 1 snap/dev=head mode 040755 mtime 2011-03-23 22:30:44.274451 ctime 2011-03-23 22:30:44.274451
2011-03-24 20:30:19.739614 7f5fed0ac720 client4156 ll_getattr 1.head = 0
2011-03-24 20:30:19.739642 7f5fed0ac720 client4156 ll_lookup 1.head fs0
2011-03-24 20:30:19.739659 7f5fed0ac720 client4156 _lookup have dn fs0 mds-1 ttl 0.000000 seq 0
2011-03-24 20:30:19.739685 7f5fed0ac720 client4156 _lookup 1.head( cap_refs={} open={} ref=3 caps=pAsLsXsFs mode=40755 mtime=2011-03-23 22:30:44.274451) fs0 = 10000000000.head( cap_refs={} open={} ref=3 caps=pAsLsXsFsx mode=40755 mtime=2011-03-24 18:03:22.796402 parent=0x7f5fe0005d40)
2011-03-24 20:30:19.739709 7f5fed0ac720 client4156 fill_stat on 10000000000 snap/dev=head mode 040755 mtime 2011-03-24 18:03:22.796402 ctime 2011-03-24 18:03:22.796402
2011-03-24 20:30:19.739722 7f5fed0ac720 client4156 _ll_get 0x7f5fe00057e0 10000000000 -> 3488831
2011-03-24 20:30:19.739734 7f5fed0ac720 client4156 ll_lookup 1.head fs0 -> 0 (10000000000)
2011-03-24 20:30:19.739765 7f5fed0ac720 client4156 ll_getattr 10000000000.head
2011-03-24 20:30:19.739784 7f5fed0ac720 client4156 _getattr mask pAsLsXsFs issued=1
2011-03-24 20:30:19.739800 7f5fed0ac720 client4156 fill_stat on 10000000000 snap/dev=head mode 040755 mtime 2011-03-24 18:03:22.796402 ctime 2011-03-24 18:03:22.796402
2011-03-24 20:30:19.739812 7f5fed0ac720 client4156 ll_getattr 10000000000.head = 0
2011-03-24 20:30:19.739839 7f5fed0ac720 client4156 ll_lookup 10000000000.head buildshare
2011-03-24 20:30:19.739856 7f5fed0ac720 client4156 _lookup have dn buildshare mds-1 ttl 0.000000 seq 0
2011-03-24 20:30:19.739882 7f5fed0ac720 client4156 _lookup 10000000000.head( cap_refs={} open={} ref=3 caps=pAsLsXsFsx mode=40755 mtime=2011-03-24 18:03:22.796402 parent=0x7f5fe0005d40) buildshare = 100000015d0.head( cap_refs={} open={} ref=3 caps=pAsLsXsFs mode=40777 mtime=2011-03-24 18:04:46.260149 parent=0x1000350)
2011-03-24 20:30:19.739898 7f5fed0ac720 client4156 fill_stat on 100000015d0 snap/dev=head mode 040777 mtime 2011-03-24 18:04:46.260149 ctime 2011-03-24 18:04:46.260149
2011-03-24 20:30:19.739910 7f5fed0ac720 client4156 _ll_get 0x7f5fe0006480 100000015d0 -> 3488806
2011-03-24 20:30:19.739922 7f5fed0ac720 client4156 ll_lookup 10000000000.head buildshare -> 0 (100000015d0)
2011-03-24 20:30:19.739949 7f5fed0ac720 client4156 ll_getattr 100000015d0.head
2011-03-24 20:30:19.739968 7f5fed0ac720 client4156 _getattr mask pAsLsXsFs issued=1
2011-03-24 20:30:19.739983 7f5fed0ac720 client4156 fill_stat on 100000015d0 snap/dev=head mode 040777 mtime 2011-03-24 18:04:46.260149 ctime 2011-03-24 18:04:46.260149
2011-03-24 20:30:19.739995 7f5fed0ac720 client4156 ll_getattr 100000015d0.head = 0
2011-03-24 20:30:19.740020 7f5fed0ac720 client4156 ll_lookup 100000015d0.head scale-product
2011-03-24 20:30:19.740038 7f5fed0ac720 client4156 _lookup have dn scale-product mds-1 ttl 0.000000 seq 0
2011-03-24 20:30:19.743322 7f5fed0ac720 client4156 _lookup 100000015d0.head( cap_refs={} open={} ref=3 caps=pAsLsXsFs mode=40777 mtime=2011-03-24 18:04:46.260149 parent=0x1000350) scale-product = 100000015d1.head( cap_refs={} open={} ref=3 caps=pAsLsXsFs mode=40755 mtime=2011-03-21 17:39:44.000000 parent=0xff02c0)
2011-03-24 20:30:19.743395 7f5fed0ac720 client4156 fill_stat on 100000015d1 snap/dev=head mode 040755 mtime 2011-03-21 17:39:44.000000 ctime 2011-03-24 18:38:23.690451
2011-03-24 20:30:19.743414 7f5fed0ac720 client4156 _ll_get 0x7f5fe0007e80 100000015d1 -> 4821973
2011-03-24 20:30:19.743430 7f5fed0ac720 client4156 ll_lookup 100000015d0.head scale-product -> 0 (100000015d1)
2011-03-24 20:30:19.743467 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4117 ==== osd_op_reply(24066 10000005dcb.00000009 [write 20480~20480] ondisk = 0) v1 ==== 106+0+0 (4070484477 0 0) 0x7f5fd99a41a0 con 0x1014d70
2011-03-24 20:30:19.743512 7f5fed0ac720 client4156 ll_getattr 100000015d1.head
2011-03-24 20:30:19.743535 7f5fed0ac720 client4156 _getattr mask pAsLsXsFs issued=1
2011-03-24 20:30:19.743551 7f5fed0ac720 client4156 fill_stat on 100000015d1 snap/dev=head mode 040755 mtime 2011-03-21 17:39:44.000000 ctime 2011-03-24 18:38:23.690451
2011-03-24 20:30:19.743563 7f5fed0ac720 client4156 ll_getattr 100000015d1.head = 0
2011-03-24 20:30:19.743582 7f5fe9c4c710 client4156.objecter in handle_osd_op_reply
2011-03-24 20:30:19.743600 7f5fe9c4c710 client4156.objecter handle_osd_op_reply 24066 ondisk v 0'0 in 0.df90
2011-03-24 20:30:19.743631 7f5fe9c4c710 client4156.objecter handle_osd_op_reply safe
2011-03-24 20:30:19.743646 7f5fe9c4c710 client4156.objecter handle_osd_op_reply completed tid 24066
2011-03-24 20:30:19.743664 7f5fe9c4c710 client4156.objecter 39 unacked, 41 uncommitted
2011-03-24 20:30:19.743689 7f5fe9c4c710 client4156.objectcacher bh_write_commit 10000005dcb.00000009/head tid 24066 20480~20480
2011-03-24 20:30:19.743730 7f5fed0ac720 client4156 ll_lookup 100000015d1.head daemons
2011-03-24 20:30:19.743765 7f5fe9c4c710 -- 192.168.98.110:0/14840 <== osd1 192.168.98.109:6804/31545 4118 ==== osd_op_reply(24067 10000005dcb.00000009 [write 303104~8192] ondisk = 0) v1 ==== 106+0+0 (1211671729 0 0) 0x7f5fd9486410 con 0x1014d70
2011-03-24 20:30:19.743787 7f5fed0ac720 client4156 _lookup have dn daemons mds-1 ttl 0.000000 seq 0
2011-03-24 20:30:19.743816 7f5fed0ac720 client4156 _lookup 100000015d1.head( cap_refs={} open={} ref=3 caps=pAsLsXsFs mode=40755 mtime=2011-03-21 17:39:44.000000 parent=0xff02c0) daemons = 10000001880.head( cap_refs={} open={} ref=3 caps=pAsLsXsFs mode=40755 mtime=2011-03-21 17:39:43.000000 parent=0x7fb7bd0)
2011-03-24 20:30:19.743833 7f5fed0ac720 client4156 fill_stat on 10000001880 snap/dev=head mode 040755 mtime 2011-03-21 17:39:43.000000 ctime 2011-03-24 18:12:38.997881
2011-03-24 20:30:19.743845 7f5fed0ac720 client4156 _ll_get 0x7f5fe00f8e80 10000001880 -> 99723
2011-03-24 20:30:19.743857 7f5fed0ac720 client4156 ll_lookup 100000015d1.head daemons -> 0 (10000001880)
2011-03-24 20:30:19.743884 7f5fe9c4c710 client4156.objecter in handle_osd_op_reply
2011-03-24 20:30:19.743902 7f5fe9c4c710 client4156.objecter handle_osd_op_reply 24067 ondisk v 10'629 in 0.df90
2011-03-24 20:30:19.743913 7f5fe9c4c710 client4156.objecter handle_osd_op_reply ack
2011-03-24 20:30:19.743924 7f5fe9c4c710 client4156.objecter handle_osd_op_reply safe
2011-03-24 20:30:19.743948 7f5fe9c4c710 client4156.objecter handle_osd_op_reply completed tid 24067
2011-03-24 20:30:19.743964 7f5fe9c4c710 client4156.objecter 38 unacked, 40 uncommitted
2011-03-24 20:30:19.744065 7f5fe9c4c710 client4156.objecter Calling onack->finish with rc 0
2011-03-24 20:30:19.744085 7f5fe9c4c710 client4156.objectcacher bh_write_ack 10000005dcb.00000009/head tid 24067 303104~8192
2011-03-24 20:30:19.744121 7f5fe9c4c710 client4156.objectcacher bh_write_ack clean bh[303104~8192 (8192) v 24067 clean firstbyte=69]
2011-03-24 20:30:19.744201 7f5fe9c4c710 client4156.objectcacher bh_write_ack skipping non-tx bh[311296~4096 (4096) v 23306 clean firstbyte=111]
osdc/ObjectCacher.cc: In function 'void ObjectCacher::bh_write_ack(int, sobject_t, loff_t, uint64_t, tid_t)', in thread '0x7f5fe9c4c710'
osdc/ObjectCacher.cc: 673: FAILED assert(ob->last_ack_tid < tid)
ceph version 0.25.1 (7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
1: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long, unsigned long)+0x6b0) [0x4ef5b0]
2: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe31a]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d7c]
4: (Client::ms_dispatch(Message*)+0x143) [0x4a4e53]
5: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
7: (()+0x77e1) [0x7f5feca517e1]
8: (clone()+0x6d) [0x7f5feb93d51d]
ceph version 0.25.1 (7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
1: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long, unsigned long)+0x6b0) [0x4ef5b0]
2: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe31a]
3: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d7c]
4: (Client::ms_dispatch(Message*)+0x143) [0x4a4e53]
5: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
7: (()+0x77e1) [0x7f5feca517e1]
8: (clone()+0x6d) [0x7f5feb93d51d]
*** Caught signal (Aborted) **
 in thread 0x7f5fe9c4c710
 ceph version 0.25.1 (7f4a161e7fd1bb0d0eab3958b427a3c4c585bc05)
 1: cfuse() [0x553ddc]
 2: (()+0xf4c0) [0x7f5feca594c0]
 3: (gsignal()+0x35) [0x7f5feb88e9b5]
 4: (abort()+0x175) [0x7f5feb890195]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x7f5fec133aad]
 6: (()+0xbcc36) [0x7f5fec131c36]
 7: (()+0xbcc63) [0x7f5fec131c63]
 8: (()+0xbcd5e) [0x7f5fec131d5e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x5400d3]
 10: (ObjectCacher::bh_write_ack(int, sobject_t, long, unsigned long, unsigned long)+0x6b0) [0x4ef5b0]
 11: (ObjectCacher::C_WriteAck::finish(int)+0x5a) [0x4fe31a]
 12: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x93c) [0x4d0d7c]
 13: (Client::ms_dispatch(Message*)+0x143) [0x4a4e53]
 14: (SimpleMessenger::dispatch_entry()+0x882) [0x454b32]
 15: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x44d0ac]
 16: (()+0x77e1) [0x7f5feca517e1]
 17: (clone()+0x6d) [0x7f5feb93d51d]
Actions #5

Updated by Brian Chrisman about 13 years ago

Here's the log from tid (osd tid) 24101 onward.
Thanks for your assistance.

Actions #6

Updated by Brian Chrisman about 13 years ago

Apologies... didn't track all the way back to the correct transaction.

Actions #7

Updated by Greg Farnum about 13 years ago

  • Subject changed from cfuse core dumping to OSD sending out-of-order acks

Updating bug name/description to match actual problem.

Actions #8

Updated by Brian Chrisman about 13 years ago

I'm attaching a cfuse log and an osd log.
Is the transaction id supposed to be showing up in the osd log?
From inspecting the cfuse log, it looked like the transaction it fails on was going to osd3, so I pulled the last 100k lines of that log file.
I'll try to keep this data around for a while.

Actions #9

Updated by Sage Weil about 13 years ago

  • Target version set to v0.27
  • Position set to 544
Actions #10

Updated by Sage Weil about 13 years ago

  • Story points set to 3
  • Position deleted (560)
  • Position set to 343
Actions #11

Updated by Sage Weil about 13 years ago

  • Position deleted (350)
  • Position set to 338
Actions #12

Updated by Brian Chrisman about 13 years ago

I think this catches it... it has osd_ops in it, and they should stop at the crash.
The log excerpt starts 20 minutes before the crash and continues a little past the last osd_op.

Actions #13

Updated by Greg Farnum about 13 years ago

  • Status changed from New to In Progress

Well, in this log the problem is that osd4 is acking client4168.0:8637 and then committing client4168.0:8636 (without a prior ack of 8636). We could perhaps add some asserts in the ack handling to check for these issues, but I'm not sure if that's appropriate just now.

So we need the logs from osd4 (or, in a future run, whoever is the replica for this primary).
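
To make the "add some asserts" idea concrete, here is one rough shape such a check could take, under the assumption that each write receives exactly one ack and one commit per OSD session (Session, note_ack, and note_commit are invented names for illustration, not an existing Objecter API):

#include <cassert>
#include <cstdint>

typedef uint64_t tid_t;

struct Session {
  tid_t last_ack_tid = 0;     // highest tid this OSD has acked
  tid_t last_commit_tid = 0;  // highest tid this OSD has committed

  void note_ack(tid_t tid) {
    // Would fire for the pattern in this log: osd4 acks 8637, then the
    // commit of 8636 implies 8636's ack, which now arrives out of order.
    assert(tid > last_ack_tid && "out-of-order ack from OSD");
    last_ack_tid = tid;
  }

  void note_commit(tid_t tid) {
    assert(tid > last_commit_tid && "out-of-order commit from OSD");
    last_commit_tid = tid;
  }
};

A check like this would catch the reordering at the Objecter boundary instead of deep inside the ObjectCacher.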

Actions #14

Updated by Brian Chrisman about 13 years ago

Attaching osd4 logs from the same start time until the last osd_op...

Actions #15

Updated by Greg Farnum about 13 years ago

  • Status changed from In Progress to Resolved

Fixed in 493e2d952ad24d8c8cab372e942ea3e18169ab4e.

It turns out that repop acks were being sent with CEPH_MSG_PRIO_HIGH, but the repop commits were neglected. This broke their dispatch order in the primary OSD, and thus the order in which the responses were sent back to the client. Apparently the ObjectCacher was the only module to actually check ordering (at least in scenarios where this was likely to matter), and I'm relieved to report that this wasn't the result of an actual on-disk ordering problem!
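
As a toy model of why the priority mismatch reorders replies: a prioritized dispatch queue drains higher priorities first and is FIFO only within a single priority level, so an ack sent at CEPH_MSG_PRIO_HIGH can overtake an earlier commit left at the default priority. A sketch with simplified queue mechanics (Msg and the map-of-queues are stand-ins for the SimpleMessenger internals; the priority values match Ceph's msgr constants):

#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <string>

const int CEPH_MSG_PRIO_DEFAULT = 127;
const int CEPH_MSG_PRIO_HIGH    = 196;

struct Msg { std::string what; };

int main() {
  // priority -> FIFO queue of messages; highest priority drained first
  std::map<int, std::queue<Msg>, std::greater<int>> dispatch;

  // The commit for tid 8636 is queued first at default priority; the
  // ack for tid 8637 is queued afterward at high priority (the bug).
  dispatch[CEPH_MSG_PRIO_DEFAULT].push({"repop commit, tid 8636"});
  dispatch[CEPH_MSG_PRIO_HIGH].push({"repop ack, tid 8637"});

  for (auto &entry : dispatch) {
    std::queue<Msg> &q = entry.second;
    while (!q.empty()) {
      std::cout << q.front().what << "\n";  // 8637's ack comes out first
      q.pop();
    }
  }
  // Giving acks and commits the same priority (the fix) puts them in one
  // FIFO, so they drain in the order the OSD queued them.
}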
