Bug #13017

closed

osd/ReplicatedPG.cc: 2752: FAILED assert(0 == "out of order op")

Added by Sage Weil over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2015-09-08 19:39:39.089746 7f1930e04700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7f1930e04700 time 2015-09-08 19:39:39.085163
osd/ReplicatedPG.cc: 2752: FAILED assert(0 == "out of order op")

 ceph version 9.0.3-1504-gbb8c273 (bb8c273e573fb26b41dff093cb4efbda730645d1)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5608d5e1ad1b]
 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1a9e) [0x5608d5a807ae]
 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x1f1d) [0x5608d5a829ad]
 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x6dd) [0x5608d5a1b7ad]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x5608d587dbdd]
 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x5608d587ddfd]
 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8c4) [0x5608d58a29f4]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x5608d5e0b73f]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5608d5e0d640]
 10: (()+0x8182) [0x7f19460c6182]
 11: (clone()+0x6d) [0x7f194440d47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/var/lib/teuthworker/archive/sage-2015-09-08_17:29:00-rados-wip-sage-testing---basic-multi/1047010

async msgr?


Related issues 1 (0 open, 1 closed)

Related to Ceph - Bug #13666: "osd/ReplicatedPG.cc: 2818: FAILED assert(0 == "out of order op")" with async msgr (Resolved, Haomai Wang, 11/01/2015)

Actions #1

Updated by Haomai Wang over 8 years ago

I already ran into this problem; this is the record of that discussion:

On Sat, Aug 15, 2015 at 11:12 PM, Haomai Wang <> wrote:

Hi,

I'm not sure whether this is a test-related bug:

This job (http://pulpito.ceph.com/haomai-2015-08-14_23:38:08-rados-master-distro-basic-multi/1015236/)
enables "osd_debug_op_order". client.4119 then sends 4 write ops (tids
1538, 1539, 1540, 1541, 1542) to osd.4. osd.4 processes the 4
messages and sets last_tid=1542 in "debug_op_order".

Before client.4119 receives the op replies, it injects a socket error. It
then calls _kick_requests and resends the ops, but the ops' tids won't
change. So these four ops are sent again, and they hit the "out of
order op" assert on the OSD side.

Can this happen in reality?

To clarify: I checked the pglog trim status. It had trimmed the tid=1538
request, so the OSD can't recognize the resent op as a dup using
"PGLog::IndexedLog::get_request".

Ah, yeah, you can ignore that failure, then. In reality the pg logs are
long enough to cover this.
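
The interaction described above (resent ops keep their tids, and a trimmed log can no longer recognize the oldest one as a dup) can be sketched with a toy model. This is only an illustration under stated assumptions, not Ceph's actual code: `ToyOSD`, `handle_op`, `replay`, and `log_capacity` are hypothetical names, the log is a simple bounded map, and the out-of-order condition is returned as a string rather than asserted.

```python
from collections import OrderedDict


class ToyOSD:
    """Hypothetical model: a bounded dup log plus a per-client last_tid check."""

    def __init__(self, log_capacity):
        self.log_capacity = log_capacity
        self.log = OrderedDict()  # (client, tid) entries, oldest first
        self.last_tid = {}        # client -> highest tid processed so far

    def handle_op(self, client, tid):
        # Dup detection only works while the entry is still in the log.
        if (client, tid) in self.log:
            return "dup"
        # Stand-in for the debug op-order check: a tid at or below the last
        # one seen, and not known as a dup, looks like an out-of-order op.
        if tid <= self.last_tid.get(client, 0):
            return "out of order"
        self.last_tid[client] = tid
        self.log[(client, tid)] = True
        # Trim the oldest entries once the log exceeds its capacity.
        while len(self.log) > self.log_capacity:
            self.log.popitem(last=False)
        return "applied"


def replay(log_capacity):
    """Send the tids from the report, then resend them with unchanged tids."""
    osd = ToyOSD(log_capacity)
    tids = [1538, 1539, 1540, 1541, 1542]
    for tid in tids:
        osd.handle_op("client.4119", tid)
    # Socket error before any reply: the client resends the same tids.
    return [osd.handle_op("client.4119", tid) for tid in tids]


if __name__ == "__main__":
    print(replay(4))  # log too short: tid 1538 was trimmed before the resend
    print(replay(8))  # log long enough: every resent op is found as a dup
```

With a log that holds fewer entries than the ops in flight, the resent tid 1538 is reported as out of order; with a longer log, every resend is treated as a dup, which matches the reply that real pg logs are long enough to cover this.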

Actions #2

Updated by Haomai Wang over 8 years ago

  • Status changed from New to Closed
Actions #3

Updated by Yuri Weinstein over 8 years ago

  • Related to Bug #13666: "osd/ReplicatedPG.cc: 2818: FAILED assert(0 == "out of order op")" with async msgr added