Bug #11211

osd/ReplicatedPG.cc: 2141: FAILED assert(op == prdop->op)

Added by Sage Weil about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Backport: hammer
Regression: No
Severity: 2 - major

Description

     0> 2015-03-21 07:04:19.095762 7f4c41ed4700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_proxy_read(hobject_t, ceph_tid_t, int)' thread 7f4c41ed4700 time 2015-03-21 07:04:19.094195
osd/ReplicatedPG.cc: 2141: FAILED assert(op == prdop->op)

 ceph version 0.93-164-g120300f (120300febebdc34c23fc24633edfdc8faf65e93c)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xbc6f6b]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0x109d) [0x847f0d]
 3: (C_ProxyRead::finish(int)+0xbd) [0x8d930d]
 4: (Context::complete(int)+0x9) [0x6ca639]
 5: (Finisher::finisher_thread_entry()+0x158) [0xaf38d8]
 6: (()+0x8182) [0x7f4c6477f182]
 7: (clone()+0x6d) [0x7f4c62ceb38d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

ubuntu@teuthology:/a/sage-2015-03-20_17:42:17-rgw-hammer-distro-basic-multi/814306

description: rgw/multifs/{clusters/fixed-2.yaml frontend/apache.yaml fs/ext4.yaml
overrides.yaml rgw_pool_type/ec-cache.yaml tasks/rgw_readwrite.yaml}


Related issues

Duplicated by Ceph - Bug #11375: Proxy_Read BUG in Cachetier when using ecpool as cold-pool (Duplicate, 04/13/2015)

Associated revisions

Revision 560a5839
Added by Zhiqiang Wang about 7 years ago

osd/ReplicatedPG: don't check order in finish_proxy_read

Read doesn't need to be ordered. So when proxy read comes back from base
tier, it's not necessarily at the front of the in progress list.

Fixes: #11211

Signed-off-by: Zhiqiang Wang <>

Revision 0ee022b1
Added by Zhiqiang Wang about 7 years ago

osd/ReplicatedPG: don't check order in finish_proxy_read

Read doesn't need to be ordered. So when proxy read comes back from base
tier, it's not necessarily at the front of the in progress list.

Fixes: #11211

Signed-off-by: Zhiqiang Wang <>
(cherry picked from commit 560a5839c0d1852b5816937b845b60390777636c)

History

#1 Updated by Zhiqiang Wang about 7 years ago

This is the problem we talked about before. Reads are not necessarily ordered, so when a proxy read comes back from the base tier, it may not be the first element in the list. I'm going to write a patch to fix this.

#2 Updated by Zhiqiang Wang about 7 years ago

A fix for this is in https://github.com/ceph/ceph/pull/4155. Pls help review, thx!

#3 Updated by Yuri Weinstein about 7 years ago

Run: http://pulpito.ceph.com/teuthology-2015-03-22_23:02:01-rgw-hammer-distro-basic-multi/
Jobs: ['817403', '817431', '817459', '817515', '817543']

Assertion: osd/ReplicatedPG.cc: 2141: FAILED assert(op == prdop->op)
ceph version 0.93-167-gc60ba6a (c60ba6af8cf45ce8fccf2df04ac01e76efdf4a96)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc93d5]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0x109a) [0x83704a]
 3: (C_ProxyRead::finish(int)+0xed) [0x8c7dad]
 4: (Context::complete(int)+0x9) [0x6be449]
 5: (Finisher::finisher_thread_entry()+0x168) [0xaee748]
 6: (()+0x7df3) [0x7fe43c93cdf3]
 7: (clone()+0x6d) [0x7fe43b41f3dd]

#4 Updated by Sage Weil about 7 years ago

  • Backport set to hammer

this should go to hammer, too; the rgw test suite is triggering it:

ubuntu@teuthology:/a/sage-2015-03-24_11:15:21-rgw-hammer-distro-basic-multi/819683
ubuntu@teuthology:/a/sage-2015-03-24_11:15:21-rgw-hammer-distro-basic-multi/819604

#5 Updated by Kefu Chai about 7 years ago

  • Status changed from New to Fix Under Review

#6 Updated by Yuri Weinstein about 7 years ago

Run: http://pulpito.front.sepia.ceph.com/teuthology-2015-03-29_14:58:14-rgw-hammer-distro-basic-multi/
Jobs: ['826416', '826417', '826418', '826419']

Assertion: osd/ReplicatedPG.cc: 2063: FAILED assert(op == prdop->op)
ceph version 0.93-203-g59143d1 (59143d141ccb4d36e8ede878c0e317ff8a5f6932)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaea18f]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0xf35) [0x806dd5]
 3: (C_ProxyRead::finish(int)+0xc7) [0x8781b7]
 4: (Context::complete(int)+0x9) [0x68a849]
 5: (Finisher::finisher_thread_entry()+0x160) [0xa16420]
 6: (()+0x7e9a) [0x7f9115c5ce9a]
 7: (clone()+0x6d) [0x7f911440631d]

#7 Updated by Kefu Chai about 7 years ago

the fix has not landed on hammer yet.

#8 Updated by Loïc Dachary about 7 years ago

  • Severity changed from 3 - minor to 2 - major

#9 Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to Pending Backport

#12 Updated by Loïc Dachary about 7 years ago

  • Status changed from Pending Backport to Resolved
  • Regression set to No

#13 Updated by Yuri Weinstein about 7 years ago

Seen again in next:
Run: http://pulpito.ceph.com/teuthology-2015-05-06_23:02:03-rgw-next-distro-basic-multi/
Job: ['878207']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2015-05-06_23:02:03-rgw-next-distro-basic-multi/878212/

Assertion: osd/ReplicatedPG.cc: 2072: FAILED assert(op == prdop->op)
ceph version 0.94-917-g2ffb030 (2ffb030099ccccae9b87404b8af2578b1da07506)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xc313cb]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0x10c1) [0x86f151]
 3: (C_ProxyRead::finish(int)+0xbd) [0x8ea82d]
 4: (Context::complete(int)+0x9) [0x6f25b9]
 5: (Finisher::finisher_thread_entry()+0x158) [0xb5d938]
 6: (()+0x8182) [0x7f24013e8182]
 7: (clone()+0x6d) [0x7f23ff952efd]

#14 Updated by Yuri Weinstein about 7 years ago

  • Status changed from Resolved to New

#15 Updated by Loïc Dachary about 7 years ago

  • Status changed from New to 12

http://pulpito.ceph.com/loic-2015-05-18_00:59:08-rgw-wip-11622-hammer---basic-multi/896515/

2015-05-17 16:09:19.260207 7f53284f9700 20 osd.0 28 share_map_peer 0x39fd2c0 already has epoch 28
2015-05-17 16:09:19.260209 7f53284f9700  1 -- 10.214.131.22:6801/15766 --> 10.214.132.32:6809/7739 -- osd_repop(client.4116.0:36177 11.6 f703f94e/2015-05-17-16-default.4116.1-rwtest/head//11 v 28'10937) v1 -- ?+992 0x3e9e200 con 0x39fd2c0
2015-05-17 16:09:19.260228 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10936 (28'7900,28'10936] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] append_log log((28'7900,28'10936], crt=28'10930) [28'10937 (28'10936) modify   f703f94e/2015-05-17-16-default.4116.1-rwtest/head//11 by client.4116.0:36177 2015-05-17 16:09:19.259534]
2015-05-17 16:09:19.260247 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10937 (28'7900,28'10937] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] add_log_entry 28'10937 (28'10936) modify   f703f94e/2015-05-17-16-default.4116.1-rwtest/head//11 by client.4116.0:36177 2015-05-17 16:09:19.259534
2015-05-17 16:09:19.260263 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10937 (28'7900,28'10937] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] append_log  adding 1 keys
2015-05-17 16:09:19.260274 7f53284f9700 10 osd.0 pg_epoch: 28 pg[11.6( v 28'10937 (28'7900,28'10937] local-les=26 n=1 ec=25 les/c 26/26 25/25/25) [0,5] r=0 lpr=25 luod=28'10931 lua=28'10931 crt=28'10930 lcod 28'10930 mlcod 28'10930 active+clean] append_log: trimming to 28'10930 entries 
2015-05-17 16:09:19.260294 7f53284f9700 10 write_log with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, dirty_divergent_priors: 0, writeout_from: 28'10937, trimmed: 
2015-05-17 16:09:19.261404 7f5323cf0700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_proxy_read(hobject_t, ceph_tid_t, int)' thread 7f5323cf0700 time 2015-05-17 16:09:19.259907
osd/ReplicatedPG.cc: 2064: FAILED assert(op == prdop->op)

 ceph version 0.94.1-7-g293affe (293affe992118ed6e04f685030b2d83a794ca624)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7f) [0xaebf4f]
 2: (ReplicatedPG::finish_proxy_read(hobject_t, unsigned long, int)+0xf35) [0x808d15]
 3: (C_ProxyRead::finish(int)+0xc7) [0x879997]
 4: (Context::complete(int)+0x9) [0x68b5a9]
 5: (Finisher::finisher_thread_entry()+0x160) [0xa18110]
 6: (()+0x7e9a) [0x7f5344a3ce9a]
 7: (clone()+0x6d) [0x7f53431e53fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#16 Updated by Sage Weil almost 7 years ago

  • Status changed from 12 to Resolved

the backport is 0ee022b1ae832c70a80e9d2cdf32403039f3f125 which was not present in that failed run.
