Bug #1563
closed
OSD isn't prioritizing data with waiting ops during transfer
Added by Greg Farnum over 12 years ago.
Updated over 12 years ago.
Description
ajm reported on IRC that his MDS was stuck in replay; it turned out to be waiting for a read response from an OSD. The OSD in question had died and been replaced (with a completely empty store), but it should still have been fetching the requested data early while the PG was being transferred over.
Copied locally to kai:~gregf/logs/ajm_osd_not_prioritizing_reads.log
- Status changed from New to Closed
2011-09-26 12:36:16.150430 7f66b08b4700 -- 64.188.54.43:6800/4514 <== mds0 64.188.54.36:6800/10043 6 ==== osd_op(mds0.55:40 200.0009335c [read 0~4194304] 1.2363) v2 ==== 127+0+0 (4098147964 0 0) 0x7f6673ca7bb0 con 0x7606c1e0
is the only blocked read, because
2011-09-26 12:36:16.405345 7f66b08b4700 osd5 55528 pg[1.363( v 48421'44428 lc 0'0 (48419'44424,48421'44428]+backlog n=2090 ec=2 les/c 55528/51135 55516/55516/55516) [5,6] r=0 mlcod 0'0 !hml active m=2090 u=1] missing 200.0009335c/head v 48421'44428, is unfound.
and that object is never found. The problem isn't prioritization; it's in the handling of unfound/lost objects.
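For anyone checking their own logs for the same pattern, here is a rough sketch of the correlation done by hand above: collect the objects named in incoming `osd_op` reads, collect the objects a PG reports as "missing ... is unfound", and report the overlap. The regexes are assumptions based only on the two log lines quoted in this ticket, and `unfound_reads` is a hypothetical helper, not part of Ceph; other versions may format these lines differently.

```python
import re

# Assumed patterns, derived only from the two log lines quoted in this ticket.
# e.g. "osd_op(mds0.55:40 200.0009335c [read 0~4194304] 1.2363)"
OP_RE = re.compile(r"osd_op\((\S+) (\S+) \[read ")
# e.g. "missing 200.0009335c/head v 48421'44428, is unfound."
UNFOUND_RE = re.compile(r"missing (\S+?)/\S+ v \S+ is unfound")


def unfound_reads(lines):
    """Return {object name: request id} for reads that target unfound objects."""
    reads = {}       # object name -> request id of the most recent read seen
    unfound = set()  # object names some PG reported as unfound
    for line in lines:
        m = OP_RE.search(line)
        if m:
            reads[m.group(2)] = m.group(1)
        m = UNFOUND_RE.search(line)
        if m:
            unfound.add(m.group(1))
    return {obj: req for obj, req in reads.items() if obj in unfound}
```

Calling `unfound_reads(open(path))` on an OSD log (where `path` is wherever the log lives) gives the reads that can never complete until the object is found or marked lost.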