Bug #1563

OSD isn't prioritizing data with waiting ops during transfer

Added by Greg Farnum over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: High
Assignee: -
Category: OSD
% Done: 0%
Regression: No
Severity: 3 - minor

Description

ajm reported on IRC that his MDS was stuck in replay. It turned out to be waiting for a read response from an OSD. That OSD had died and been replaced with a completely empty store, but while transferring the PG over it still should have fetched the requested data ahead of everything else.
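The behavior the description expects can be modelled as a small tweak to recovery order: when a client op arrives for an object the PG is still missing, that object jumps to the front of the recovery queue. The Python sketch below is hypothetical. `RecoveryQueue`, `add_waiting_op` and `recover_next` are illustrative names, not Ceph's code (which is C++), and the model ignores replication, versions and the unfound case.

```python
from collections import deque


class RecoveryQueue:
    """Hypothetical model of the order in which a PG pulls missing objects.

    Not Ceph's implementation: it only illustrates that objects with
    client ops waiting on them should be recovered before the rest.
    """

    def __init__(self, missing):
        # Objects the replacement OSD still has to pull, in default order.
        self.pending = deque(missing)
        # Object name -> list of client ops blocked until it is recovered.
        self.waiting = {}

    def add_waiting_op(self, obj, op):
        """Record a blocked client op and bump its object to the front."""
        self.waiting.setdefault(obj, []).append(op)
        if obj in self.pending:
            self.pending.remove(obj)
            self.pending.appendleft(obj)

    def recover_next(self):
        """Pull one object; return it plus the ops that can now be requeued.

        Raises IndexError when nothing is left to recover.
        """
        obj = self.pending.popleft()
        return obj, self.waiting.pop(obj, [])
```

For example, with `q = RecoveryQueue(["a", "b", "200.0009335c", "c"])`, calling `q.add_waiting_op("200.0009335c", "mds0.55:40 read")` makes the next `q.recover_next()` return that object together with the blocked read, instead of `"a"`.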

History

#1 Updated by Sage Weil over 9 years ago

  • Position set to 1

#2 Updated by Adam Jacob Muller over 9 years ago

Log file from the affected OSD: http://adam.gs/osd.5.log.1317056213.bz2

md5(osd.5.log.1317056213.bz2) = c162d9359f861f43fc63f38a8444052a
bytes(osd.5.log.1317056213.bz2) = 209269740

#3 Updated by Greg Farnum over 9 years ago

Copied locally to kai:~gregf/logs/ajm_osd_not_prioritizing_reads.log

#4 Updated by Sage Weil over 9 years ago

  • Status changed from New to Closed

2011-09-26 12:36:16.150430 7f66b08b4700 -- 64.188.54.43:6800/4514 <== mds0 64.188.54.36:6800/10043 6 ==== osd_op(mds0.55:40 200.0009335c [read 0~4194304] 1.2363) v2 ==== 127+0+0 (4098147964 0 0) 0x7f6673ca7bb0 con 0x7606c1e0

is the only blocked read, because

2011-09-26 12:36:16.405345 7f66b08b4700 osd5 55528 pg[1.363( v 48421'44428 lc 0'0 (48419'44424,48421'44428]+backlog n=2090 ec=2 les/c 55528/51135 55516/55516/55516) [5,6] r=0 mlcod 0'0 !hml active m=2090 u=1] missing 200.0009335c/head v 48421'44428, is unfound.

and that object is never found. The problem isn't prioritization; it's in the handling of unfound/lost objects.
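The diagnosis above boils down to cross-referencing two kinds of log line: the incoming `osd_op` read names the object, and the PG status line reports that same object as missing and unfound. A hypothetical helper for doing that over a whole log might look like the following. `blocked_unfound_reads` is an illustrative name, and both regular expressions are assumptions written against the two 2011-era lines quoted here; other Ceph versions may log differently.

```python
import re

# Matches e.g. "osd_op(mds0.55:40 200.0009335c [read 0~4194304] ...".
# Group 1 is the client op id, group 2 the object name (assumed format).
READ_OP = re.compile(r"osd_op\((\S+) (\S+) \[read [^\]]*\]")

# Matches e.g. "missing 200.0009335c/head v 48421'44428, is unfound."
# Group 1 is the object name without the "/head" snapshot suffix.
UNFOUND = re.compile(r"missing (\S+)/head .*is unfound")


def blocked_unfound_reads(lines):
    """Return (client op id, object) for each read whose object is unfound."""
    reads = []
    unfound = set()
    for line in lines:
        m = READ_OP.search(line)
        if m:
            reads.append((m.group(1), m.group(2)))
        m = UNFOUND.search(line)
        if m:
            unfound.add(m.group(1))
    # A read on an unfound object cannot complete until the object turns
    # up or is declared lost, so these are the reads that stay blocked.
    return [(op, obj) for op, obj in reads if obj in unfound]
```

Run over the two lines quoted in this comment, it would report the single MDS read `mds0.55:40` on `200.0009335c`.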
