Project

General

Profile

Actions

Bug #50510

open

OSD will return -EAGAIN on balance_reads although it can return the data

Added by Or Friedmann almost 3 years ago. Updated almost 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

PrimaryLogPG.cc:

if (!is_primary()) {
if (!recovery_state.can_serve_replica_read(oid)) {
dout(20) << func
<< ": unstable write on replica, bouncing to primary "
<< *m << dendl;
osd->reply_op_error(op, -EAGAIN);
return;
}
dout(20) << func << ": serving replica read on oid " << oid
<< dendl;
}

changing this (!recovery_state.can_serve_replica_read(oid)) to (false) will allow the osd to return the object's data

osd.2 is the primary

The object is replicated and didn't change, it always returns EAGAIN, changing to false returns valid response.

ms log from version with false in this condition:

2021-04-25T17:40:44.664+0300 7f5b117da640 1 -- 192.168.31.119:0/2540045748 --> [v2:192.168.31.119:6808/103538,v1:192.168.31.119:6810/103538] -- osd_op(unknown.0.0:848 7.11 7:890d6271:::a45ef6be-ae65-4c8b-a097-0ae0fa8dbac9.24121.1_hosts:head [getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+balance_reads+known_if_redirected e80) v8 -- 0x7f5bc8019950 con 0x7f5c2c03b190
2021-04-25T17:40:44.664+0300 7f5c3b7fe640 1 -- 192.168.31.119:0/2540045748 <== osd.0 v2:192.168.31.119:6808/103538 354 ==== osd_op_reply(848 a45ef6be-ae65-4c8b-a097-0ae0fa8dbac9.24121.1_hosts [getxattrs out=1250b,stat out=16b,read 0~215 out=215b] v0'0 uv3 ondisk = 0) v8 ==== 278+0+1481 (crc 0 0 0) 0x7f5c34045040 con 0x7f5c2c03b190

ms log from upstream:

2021-04-25T20:57:02.419+0300 7f79a07e8640 1 -- 192.168.31.119:0/1658518928 --> [v2:192.168.31.119:6809/116462,v1:192.168.31.119:6811/116462] -- osd_op(unknown.0.0:307 7.11 7:890d6271:::a45ef6be-ae65-4c8b-a097-0ae0fa8dbac9.24121.1_hosts:head [getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+balance_reads+known_if_redirected e85) v8 -- 0x7f7a00041390 con 0x7f7aac03b3e0
2021-04-25T20:57:02.419+0300 7f7abd7fa640 1 -- 192.168.31.119:0/1658518928 <== osd.0 v2:192.168.31.119:6809/116462 114 ==== osd_op_reply(307 a45ef6be-ae65-4c8b-a097-0ae0fa8dbac9.24121.1_hosts [getxattrs,stat,read 0~4194304] v0'0 uv0 ondisk = 11 ((11) Resource temporarily unavailable)) v8 ==== 278+0+0 (crc 0 0 0) 0x7f7ab8045040 con 0x7f7aac03b3e0
2021-04-25T20:57:02.419+0300 7f7abd7fa640 1 -
192.168.31.119:0/1658518928 --> [v2:192.168.31.119:6808/116460,v1:192.168.31.119:6810/116460] -- osd_op(unknown.0.0:308 7.11 7:890d6271:::a45ef6be-ae65-4c8b-a097-0ae0fa8dbac9.24121.1_hosts:head [getxattrs,stat,read 0~4194304] snapc 0=[] RETRY=1 ondisk+retry+read+known_if_redirected e85) v8 -- 0x7f79e0004000 con 0x140e2d0

Code that is sending balance read flag: https://github.com/pleiadesian/ceph/pull/1/files


Files

osd.2.log (4.16 KB) osd.2.log Replica osd log Zulai Wang, 05/02/2021 01:43 PM
osd.1.log (9.36 KB) osd.1.log Primary osd log Zulai Wang, 05/02/2021 01:43 PM
osd.2.log.nocheck (11.7 KB) osd.2.log.nocheck Replica osd log with false in the condition Zulai Wang, 05/02/2021 01:43 PM
Actions #1

Updated by Neha Ojha almost 3 years ago

  • Status changed from New to Need More Info

Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? That will help us understand whether can_serve_replica_read() is doing the right thing or not.

Updated by Zulai Wang almost 3 years ago

Neha Ojha wrote:

Can you please provide osd logs from the primary and replica with debug_osd=20 and debug_ms=1? That will help us understand whether can_serve_replica_read() is doing the right thing or not.

Osd logs relating to balance_reads are in the attachment. (osd.1 is the primary, osd.2 is the replica)

  • Replica osd log is in osd.2.log. It shows that osd.2 returns EAGAIN.
  • Primary osd log is in osd.1.log. It shows that osd.1 receives RETRY and returns the object's data.
  • Replica osd log when changing (!recovery_state.can_serve_replica_read(oid)) to (false) is in osd.2.log.nocheck. It shows that replica can also return the object's data.
Actions #3

Updated by Neha Ojha almost 3 years ago

The following indicates that that it is not safe to do a balanced read from the secondary at this time. Making the "can_serve_replica_read" check a no-op is dangerous, we not only want to read from the replica, we want to read the correct version of the object.

2021-05-02T21:01:03.330+0800 7fe07fdbe700 20 osd.2 pg_epoch: 71 pg[4.f( v 25'2 (0'0,25'2] local-lis/les=24/25 n=2 ec=24/24 lis/c=24/24 les/c/f=25/25/0 sis=24) [1,2,0] r=1 lpr=24 luod=0'0 lua=0'0 crt=25'2 lcod 25'1 mlcod 0'0 active mbc={}] do_op: unstable write on replica, bouncing to primary osd_op(client.4397.0:4 4.f 4:f4c53578:::zonegroups_names.default:head [read 0~0] snapc 0=[] ondisk+read+balance_reads+known_if_redirected e71) v8
2021-05-02T21:01:03.330+0800 7fe07fdbe700  1 -- [v2:127.0.0.1:6818/19144,v1:127.0.0.1:6819/19144] --> 127.0.0.1:0/233630719 -- osd_op_reply(4 zonegroups_names.default [read 0~0] v0'0 uv0 ondisk = -11 ((11) Resource temporarily unavailable)) v8 -- 0x56551641dd40 con 0x56551665fc00
Actions #4

Updated by Samuel Just almost 3 years ago

If the replica has a log entry for write on the object more recent than last_complete_ondisk (iirc), it will bounce the read back to the primary since its local state is unstable and might be rolled back in a future peering round.

Actions

Also available in: Atom PDF