Bug #11211

closed

osd/ReplicatedPG.cc: 2141: FAILED assert(op == prdop->op)

Added by Sage Weil about 9 years ago. Updated almost 9 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: hammer
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2015-03-21 07:04:19.095762 7f4c41ed4700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_proxy_read(hobject_t, ceph_tid_t, int)' thread 7f4c41ed4700 time 2015-03-21 07:04:19.094195
osd/ReplicatedPG.cc: 2141: FAILED assert(op == prdop->op)

 ceph version 0.93-164-g120300f (120300febebdc34c23fc24633edfdc8faf65e93c)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc6f6b]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0x109d) [0x847f0d]
 3: (C_ProxyRead::finish(int)+0xbd) [0x8d930d]
 4: (Context::complete(int)+0x9) [0x6ca639]
 5: (Finisher::finisher_thread_entry()+0x158) [0xaf38d8]
 6: (()+0x8182) [0x7f4c6477f182]
 7: (clone()+0x6d) [0x7f4c62ceb38d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/sage-2015-03-20_17:42:17-rgw-hammer-distro-basic-multi/814306

description: rgw/multifs/{clusters/fixed-2.yaml frontend/apache.yaml fs/ext4.yaml
overrides.yaml rgw_pool_type/ec-cache.yaml tasks/rgw_readwrite.yaml}


Related issues: 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #11375: Proxy_Read BUG in Cachetier when using ecpool as cold-pool (Duplicate, 04/13/2015)

Actions #1

Updated by Zhiqiang Wang about 9 years ago

This is the problem we talked about before. Reads are not necessarily ordered, so when a proxy read comes back from the base tier, it may not be the first element in the list. I'm going to write a patch to fix this.
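
To illustrate the failure mode, here is a minimal, self-contained C++ sketch. All names in it (`ProxyRead`, `completes_in_order`, `finish_read`) are hypothetical, and it is not the code from the actual fix. It only assumes that in-flight reads are kept in a list in issue order and that completions can arrive in any order. A check against the front of the list fails when a later read completes first, while a lookup by tid does not depend on ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical stand-in for an in-flight proxy read; not a Ceph type.
struct ProxyRead {
  std::uint64_t tid;  // transaction id assigned when the read was issued
};

// Front-only check: true only if the completed read is the oldest in flight.
// This mirrors the assumption that breaks when reads complete out of order.
bool completes_in_order(const std::list<ProxyRead>& in_flight,
                        std::uint64_t tid) {
  return !in_flight.empty() && in_flight.front().tid == tid;
}

// Order-independent completion: search the whole list for the tid and
// remove the matching entry. Returns false if no such read is in flight.
bool finish_read(std::list<ProxyRead>& in_flight, std::uint64_t tid) {
  auto it = std::find_if(in_flight.begin(), in_flight.end(),
                         [tid](const ProxyRead& r) { return r.tid == tid; });
  if (it == in_flight.end())
    return false;
  in_flight.erase(it);
  return true;
}

// Two reads issued as tid 1 then tid 2; tid 2 completes first.
inline void example() {
  std::list<ProxyRead> in_flight = {{1}, {2}};
  assert(!completes_in_order(in_flight, 2));  // front-only check would trip
  assert(finish_read(in_flight, 2));          // lookup by tid succeeds
  assert(finish_read(in_flight, 1));
  assert(in_flight.empty());
}
```

Calling `example()` walks through the out-of-order case: the second read finishes before the first, so only the tid-based lookup handles it.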

Actions #2

Updated by Zhiqiang Wang about 9 years ago

A fix for this is in https://github.com/ceph/ceph/pull/4155. Please help review, thanks!

Actions #3

Updated by Yuri Weinstein about 9 years ago

Run: http://pulpito.ceph.com/teuthology-2015-03-22_23:02:01-rgw-hammer-distro-basic-multi/
Jobs: ['817403', '817431', '817459', '817515', '817543']

Assertion: osd/ReplicatedPG.cc: 2141: FAILED assert(op == prdop->op)
ceph version 0.93-167-gc60ba6a (c60ba6af8cf45ce8fccf2df04ac01e76efdf4a96)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc93d5]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0x109a) [0x83704a]
 3: (C_ProxyRead::finish(int)+0xed) [0x8c7dad]
 4: (Context::complete(int)+0x9) [0x6be449]
 5: (Finisher::finisher_thread_entry()+0x168) [0xaee748]
 6: (()+0x7df3) [0x7fe43c93cdf3]
 7: (clone()+0x6d) [0x7fe43b41f3dd]

Actions #4

Updated by Sage Weil about 9 years ago

  • Backport set to hammer

this should go to hammer, too; the rgw test suite is triggering it:

ubuntu@teuthology:/a/sage-2015-03-24_11:15:21-rgw-hammer-distro-basic-multi/819683
ubuntu@teuthology:/a/sage-2015-03-24_11:15:21-rgw-hammer-distro-basic-multi/819604

Actions #5

Updated by Kefu Chai about 9 years ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Yuri Weinstein about 9 years ago

Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-29_14:58:14-rgw-hammer-distro-basic-multi/
Jobs: ['826416', '826417', '826418', '826419']

Assertion: osd/ReplicatedPG.cc: 2063: FAILED assert(op == prdop->op)
ceph version 0.93-203-g59143d1 (59143d141ccb4d36e8ede878c0e317ff8a5f6932)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaea18f]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0xf35) [0x806dd5]
 3: (C_ProxyRead::finish(int)+0xc7) [0x8781b7]
 4: (Context::complete(int)+0x9) [0x68a849]
 5: (Finisher::finisher_thread_entry()+0x160) [0xa16420]
 6: (()+0x7e9a) [0x7f9115c5ce9a]
 7: (clone()+0x6d) [0x7f911440631d]

Actions #7

Updated by Kefu Chai about 9 years ago

the fix has not landed on hammer yet.

Actions #8

Updated by Loïc Dachary about 9 years ago

  • Severity changed from 3 - minor to 2 - major
Actions #9

Updated by Sage Weil about 9 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #12

Updated by Loïc Dachary almost 9 years ago

  • Status changed from Pending Backport to Resolved
  • Regression set to No
Actions #13

Updated by Yuri Weinstein almost 9 years ago

Seen again on the next branch:
Run: http://pulpito.ceph.com/teuthology-2015-05-06_23:02:03-rgw-next-distro-basic-multi/
Job: ['878207']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-05-06_23:02:03-rgw-next-distro-basic-multi/878212/

Assertion: osd/ReplicatedPG.cc: 2072: FAILED assert(op == prdop->op)
ceph version 0.94-917-g2ffb030 (2ffb030099ccccae9b87404b8af2578b1da07506)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xc313cb]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0x10c1) [0x86f151]
 3: (C_ProxyRead::finish(int)+0xbd) [0x8ea82d]
 4: (Context::complete(int)+0x9) [0x6f25b9]
 5: (Finisher::finisher_thread_entry()+0x158) [0xb5d938]
 6: (()+0x8182) [0x7f24013e8182]
 7: (clone()+0x6d) [0x7f23ff952efd]
Actions #14

Updated by Yuri Weinstein almost 9 years ago

  • Status changed from Resolved to New
Actions #15

Updated by Loïc Dachary almost 9 years ago

  • Status changed from New to 12

http://pulpito.ceph.com/loic-2015-05-18_00:59:08-rgw-wip-11622-hammer---basic-multi/896515/

2015-05-17 16:09:19.260207 7f53284f9700 20 osd.0 28 share_map_peer 0x39fd2c0 already has epoch 28
2015-05-17 16:09:19.260209 7f53284f9700  1 -- 10.214.131.22:6801/15766 --> 10.214.132.32:6809/7739 -- osd_repop(client.4116.0:36177 11.6 f703f94e/2015-05-17-16-default.4116.1-rwtest/head//11 v 28'10937) v1 -- ?+992 0x3e9e200 con 0x39fd2c0
2015-05-17 16:09:19.260228 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10936 (28'7900,28'10936] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] append_log log((28'7900,28'10936], crt=28'10930) [28'10937 (28'10936) modify   f703f94e/2015-05-17-16-default.4116.1-rwtest/head//11 by client.4116.0:36177 2015-05-17 16:09:19.259534]
2015-05-17 16:09:19.260247 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10937 (28'7900,28'10937] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] add_log_entry 28'10937 (28'10936) modify   f703f94e/2015-05-17-16-default.4116.1-rwtest/head//11 by client.4116.0:36177 2015-05-17 16:09:19.259534
2015-05-17 16:09:19.260263 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10937 (28'7900,28'10937] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] append_log  adding 1 keys
2015-05-17 16:09:19.260274 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10937 (28'7900,28'10937] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] append_log: trimming to 28'10930 entries 
2015-05-17 16:09:19.260294 7f53284f9700 10 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: 0, writeout_from: 28'10937, trimmed: 
2015-05-17 16:09:19.261404 7f5323cf0700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_proxy_read(hobject_t, ceph_tid_t, int)' thread 7f5323cf0700 time 2015-05-17 16:09:19.259907
osd/ReplicatedPG.cc: 2064: FAILED assert(op == prdop->op)

 ceph version 0.94.1-7-g293affe (293affe992118ed6e04f685030b2d83a794ca624)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaebf4f]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0xf35) [0x808d15]
 3: (C_ProxyRead::finish(int)+0xc7) [0x879997]
 4: (Context::complete(int)+0x9) [0x68b5a9]
 5: (Finisher::finisher_thread_entry()+0x160) [0xa18110]
 6: (()+0x7e9a) [0x7f5344a3ce9a]
 7: (clone()+0x6d) [0x7f53431e53fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Actions #16

Updated by Sage Weil almost 9 years ago

  • Status changed from 12 to Resolved

the backport is 0ee022b1ae832c70a80e9d2cdf32403039f3f125 which was not present in that failed run.
