Project

General

Profile

Actions

Bug #15407

closed

"ReplicatedPG.cc: 2986: FAILED assert(0 == "out of order op")" with short_pg_log

Added by Yuri Weinstein about 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rados, rbd
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run: http://pulpito.ceph.com/teuthology-2016-04-05_22:00:02-rados-jewel-distro-basic-smithi/
Jobs: ['110534', '110572']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-05_22:00:02-rados-jewel-distro-basic-smithi/110534/teuthology.log

2016-04-06T12:25:13.398 INFO:tasks.ceph.osd.5.smithi027.stderr:2016-04-06 19:25:13.396798 7f7905f9e700 -1 osd.5 pg_epoch: 54 pg[6.4( v 54'6743 (54'6600,54'6743] local-les=14 n=22 ec=13 les/c/f 14/14/0 13/13/13) [5,1,3] r=0 lpr=13 crt=54'6739 lcod 54'6742 mlcod 54'6742 active+clean] bad op order, already applied 23910 > this 23404
2016-04-06T12:25:13.400 INFO:tasks.ceph.osd.5.smithi027.stderr:osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7f7905f9e700 time 2016-04-06 19:25:13.396804
2016-04-06T12:25:13.400 INFO:tasks.ceph.osd.5.smithi027.stderr:osd/ReplicatedPG.cc: 2986: FAILED assert(0 == "out of order op")
2016-04-06T12:25:13.400 INFO:tasks.ceph.osd.5.smithi027.stderr: ceph version 10.1.0-445-g370e4f7 (370e4f773a5347a2d0c0493ccf3dc55909b75bce)
2016-04-06T12:25:13.401 INFO:tasks.ceph.osd.5.smithi027.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7927408955]
2016-04-06T12:25:13.401 INFO:tasks.ceph.osd.5.smithi027.stderr: 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1a74) [0x7f7926f0b174]
2016-04-06T12:25:13.401 INFO:tasks.ceph.osd.5.smithi027.stderr: 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x31be) [0x7f7926f0e8ce]
2016-04-06T12:25:13.401 INFO:tasks.ceph.osd.5.smithi027.stderr: 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x747) [0x7f7926ecb3b7]
2016-04-06T12:25:13.401 INFO:tasks.ceph.osd.5.smithi027.stderr: 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7f7926d817bd]
2016-04-06T12:25:13.401 INFO:tasks.ceph.osd.5.smithi027.stderr: 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x6d) [0x7f7926d81a0d]
2016-04-06T12:25:13.402 INFO:tasks.ceph.osd.5.smithi027.stderr: 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x7f7926d86539]
2016-04-06T12:25:13.402 INFO:tasks.ceph.osd.5.smithi027.stderr: 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x887) [0x7f79273f88f7]
2016-04-06T12:25:13.402 INFO:tasks.ceph.osd.5.smithi027.stderr: 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f79273fa860]
2016-04-06T12:25:13.402 INFO:tasks.ceph.osd.5.smithi027.stderr: 10: (()+0x7dc5) [0x7f792512cdc5]
2016-04-06T12:25:13.402 INFO:tasks.ceph.osd.5.smithi027.stderr: 11: (clone()+0x6d) [0x7f79237b828d]
2016-04-06T12:25:13.402 INFO:tasks.ceph.osd.5.smithi027.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Related issues 1 (0 open1 closed)

Related to Ceph - Bug #16975: "ReplicatedPG.cc: 3030: FAILED assert(0 == "out of order op")" in rados-jewel-distro-basic-smithiResolvedJosh Durgin08/09/2016

Actions
Actions #1

Updated by Samuel Just about 8 years ago

sjust@teuthology:/a/teuthology-2016-04-05_22:00:02-rados-jewel-distro-basic-smithi/110572/remote is due to the short_pg_log yaml, annoying.

as is sjust@teuthology:/a/teuthology-2016-04-05_22:00:02-rados-jewel-distro-basic-smithi/110534/remote

So, not a real bug, but the short_pg_log.yaml is making logs short enough that dup detection isn't working. I guess we'll have to make it less extreme...

Actions #2

Updated by Samuel Just about 8 years ago

  • Assignee set to Samuel Just
Actions #3

Updated by Samuel Just about 8 years ago

  • Status changed from New to 7
Actions #4

Updated by Yuri Weinstein about 8 years ago

  • ceph-qa-suite rbd added

In rbd run too
http://pulpito.ceph.com/teuthology-2016-04-15_12:01:09-rbd-jewel-distro-basic-smithi/
Job: ['132128']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2016-04-15_12:01:09-rbd-jewel-distro-basic-smithi/132128/teuthology.log

2016-04-18T15:42:41.165 INFO:tasks.ceph.osd.3.smithi065.stderr:osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7fdc7bfa6700 time 2016-04-18 22:42:41.160785
2016-04-18T15:42:41.165 INFO:tasks.ceph.osd.3.smithi065.stderr:osd/ReplicatedPG.cc: 3024: FAILED assert(0 == "out of order op")
2016-04-18T15:42:41.165 INFO:tasks.ceph.osd.3.smithi065.stderr: ceph version 10.1.2-62-gd0df02d (d0df02dbd694203d98fed1b80593927a7fe3c042)
2016-04-18T15:42:41.166 INFO:tasks.ceph.osd.3.smithi065.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fdc9c1dc47b]
2016-04-18T15:42:41.166 INFO:tasks.ceph.osd.3.smithi065.stderr: 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1a15) [0x7fdc9bd02d65]
2016-04-18T15:42:41.166 INFO:tasks.ceph.osd.3.smithi065.stderr: 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x36ee) [0x7fdc9bd0699e]
2016-04-18T15:42:41.166 INFO:tasks.ceph.osd.3.smithi065.stderr: 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x70c) [0x7fdc9bcc2d4c]
2016-04-18T15:42:41.167 INFO:tasks.ceph.osd.3.smithi065.stderr: 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x7fdc9bb831b5]
2016-04-18T15:42:41.167 INFO:tasks.ceph.osd.3.smithi065.stderr: 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7fdc9bb833dd]
2016-04-18T15:42:41.167 INFO:tasks.ceph.osd.3.smithi065.stderr: 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x869) [0x7fdc9bb87de9]
2016-04-18T15:42:41.167 INFO:tasks.ceph.osd.3.smithi065.stderr: 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x877) [0x7fdc9c1cc977]
2016-04-18T15:42:41.167 INFO:tasks.ceph.osd.3.smithi065.stderr: 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fdc9c1ce8a0]
2016-04-18T15:42:41.168 INFO:tasks.ceph.osd.3.smithi065.stderr: 10: (()+0x8182) [0x7fdc9a49d182]
2016-04-18T15:42:41.168 INFO:tasks.ceph.osd.3.smithi065.stderr: 11: (clone()+0x6d) [0x7fdc985cb47d]
2016-04-18T15:42:41.168 INFO:tasks.ceph.osd.3.smithi065.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-04-18T15:42:41.168 INFO:tasks.ceph.osd.3.smithi065.stderr:2016-04-18 22:42:41.163358 7fdc7bfa6700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 
Actions #5

Updated by Samuel Just almost 8 years ago

  • Subject changed from "ReplicatedPG.cc: 2986: FAILED assert(0 == "out of order op")" in rados-jewel-distro-basic-smithi to "ReplicatedPG.cc: 2986: FAILED assert(0 == "out of order op")" with short_pg_log
Actions #6

Updated by Loïc Dachary almost 8 years ago

http://pulpito.ceph.com/teuthology-2016-07-17_04:20:03-upgrade:jewel-x-master-distro-basic-vps/319485/

2016-07-17T11:14:51.004 INFO:tasks.ceph.osd.0.vpm107.stderr:/srv/autobuild-ceph/gitbuilder.git/build/rpmbuild/BUILD/ceph-11.0.0/src/osd/ReplicatedPG.cc: 3044: FAILED assert(0 == "out of order op")
2016-07-17T11:14:51.008 INFO:tasks.ceph.osd.0.vpm107.stderr: ceph version v11.0.0-656-g3e57754 (3e5775475baf315885295ae47a0f4a86820bfe0f)
2016-07-17T11:14:51.012 INFO:tasks.ceph.osd.0.vpm107.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xd784f5]
2016-07-17T11:14:51.016 INFO:tasks.ceph.osd.0.vpm107.stderr: 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x204c) [0x8edb2c]
2016-07-17T11:14:51.023 INFO:tasks.ceph.osd.0.vpm107.stderr: 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x2810) [0x8f0680]
2016-07-17T11:14:51.026 INFO:tasks.ceph.osd.0.vpm107.stderr: 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x777) [0x8abc97]
2016-07-17T11:14:51.073 INFO:tasks.ceph.osd.0.vpm107.stderr: 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x41d) [0x7620cd]
2016-07-17T11:14:51.088 INFO:tasks.ceph.osd.0.vpm107.stderr: 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x6d) [0x76231d]
2016-07-17T11:14:51.100 INFO:tasks.ceph.osd.0.vpm107.stderr: 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x86c) [0x78386c]
2016-07-17T11:14:51.113 INFO:tasks.ceph.osd.0.vpm107.stderr: 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0xd7e167]
2016-07-17T11:14:51.114 INFO:tasks.ceph.osd.0.vpm107.stderr: 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xd802c0]
2016-07-17T11:14:51.147 INFO:tasks.ceph.osd.0.vpm107.stderr: 10: (()+0x7dc5) [0x7fee7bb96dc5]
2016-07-17T11:14:51.198 INFO:tasks.ceph.osd.0.vpm107.stderr: 11: (clone()+0x6d) [0x7fee79a90ced]
2016-07-17T11:14:51.200 INFO:tasks.ceph.osd.0.vpm107.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-07-17T11:14:51.200 INFO:tasks.ceph.osd.0.vpm107.stderr:2016-07-17 18:14:51.958998 7fee5a0f5700 -1 /srv/autobuild-ceph/gitbuilder.git/build/rpmbuild/BUILD/ceph-11.0.0/src/osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7fee5a0f5700 time 2016-07-17 18:14:51.910215

I don't see that short_pg_log is set in this run.
Actions #7

Updated by Samuel Just over 7 years ago

  • Status changed from 7 to Resolved
Actions #8

Updated by Yuri Weinstein over 7 years ago

  • Related to Bug #16975: "ReplicatedPG.cc: 3030: FAILED assert(0 == "out of order op")" in rados-jewel-distro-basic-smithi added
Actions

Also available in: Atom PDF