Bug #13017
osd/ReplicatedPG.cc: 2752: FAILED assert(0 == "out of order op")
Status: closed
Description
0> 2015-09-08 19:39:39.089746 7f1930e04700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7f1930e04700 time 2015-09-08 19:39:39.085163
osd/ReplicatedPG.cc: 2752: FAILED assert(0 == "out of order op")
 ceph version 9.0.3-1504-gbb8c273 (bb8c273e573fb26b41dff093cb4efbda730645d1)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5608d5e1ad1b]
 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1a9e) [0x5608d5a807ae]
 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1f1d) [0x5608d5a829ad]
 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6dd) [0x5608d5a1b7ad]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x5608d587dbdd]
 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x5608d587ddfd]
 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8c4) [0x5608d58a29f4]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x5608d5e0b73f]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5608d5e0d640]
 10: (()+0x8182) [0x7f19460c6182]
 11: (clone()+0x6d) [0x7f194440d47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/var/lib/teuthworker/archive/sage-2015-09-08_17:29:00-rados-wip-sage-testing---basic-multi/1047010
async msgr?
Updated by Haomai Wang over 8 years ago
Eh, I already found this problem; here is the record:
On Sat, Aug 15, 2015 at 11:12 PM, Haomai Wang <haomaiwang@gmail.com> wrote:
Hi,
I'm not sure whether this is a test-related bug:
this job (http://pulpito.ceph.com/haomai-2015-08-14_23:38:08-rados-master-distro-basic-multi/1015236/)
enables "osd_debug_op_order", then client.4119 sends 4 write ops (tids
are 1538, 1539, 1540, 1541, 1542) to osd.4. Then osd.4 processes the 4
messages and sets last_tid=1542 in "debug_op_order".

Before client.4119 received the op replies, a socket error was injected.
The client then calls _kick_requests and resends the ops, but the tids of
these ops won't change. So these four ops are sent again and hit the
"out of order op" assert on the OSD side.

Can this happen in reality?
To clarify: I checked the pglog trim status, and it had trimmed the
tid=1538 request, so the OSD won't recognize this as a dup op using
"PGLog::IndexedLog::get_request".
Ah, yeah, you can ignore that failure, then. In reality the pg logs are
long enough to cover this.
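
For readers following the thread, the interaction described above can be sketched as a toy model. This is not the Ceph code: `ToyPG`, `OutOfOrderOp`, `replay`, and the `max_log_entries` parameter are hypothetical names made up for illustration, and the exact comparison used by the real "debug_op_order" check is an assumption here. The sketch only shows why a resent op with an unchanged tid is harmless while its log entry still exists, and why it trips the order check once that entry has been trimmed.

```python
from collections import OrderedDict


class OutOfOrderOp(Exception):
    """Stands in for the OSD's assert(0 == "out of order op")."""


class ToyPG:
    """Toy model of one PG: a bounded request log plus a debug order check."""

    def __init__(self, max_log_entries):
        self.max_log_entries = max_log_entries  # hypothetical log bound
        self.log = OrderedDict()   # (client, tid) -> True, oldest first
        self.last_tid = {}         # client -> highest tid executed so far

    def handle_op(self, client, tid):
        # Dup detection: a resent op that is still in the log is recognized
        # as a dup and never reaches the order check.
        if (client, tid) in self.log:
            return "dup"
        # Debug order check (assumed form): a non-dup op must carry a tid
        # above the last one executed for this client.
        if tid <= self.last_tid.get(client, 0):
            raise OutOfOrderOp(f"{client} tid {tid} <= {self.last_tid[client]}")
        self.last_tid[client] = tid
        self.log[(client, tid)] = True
        # Trim the oldest entries once the log exceeds its bound.
        while len(self.log) > self.max_log_entries:
            self.log.popitem(last=False)
        return "applied"


def replay(max_log_entries, tids=(1538, 1539, 1540, 1541, 1542)):
    """Send the ops once, then resend them with unchanged tids."""
    pg = ToyPG(max_log_entries)
    first = [pg.handle_op("client.4119", t) for t in tids]
    try:
        resent = [pg.handle_op("client.4119", t) for t in tids]
    except OutOfOrderOp:
        resent = "out of order op"
    return first, resent
```

With a log long enough to hold every op, `replay(10)` treats each resend as a dup. With a log short enough that tid 1538 has been trimmed, `replay(4)` reaches the order check with tid 1538 after last_tid=1542 and fails, which matches the sequence reported in the job above.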
Updated by Yuri Weinstein over 8 years ago
- Related to Bug #13666: "osd/ReplicatedPG.cc: 2818: FAILED assert(0 == "out of order op")" with async msgr added