OSD isn't prioritizing data with waiting ops during transfer
ajm reported on IRC that his MDS was stuck in replay. It turned out the MDS was waiting for a read response from an OSD. The OSD in question had died and been replaced (with a completely empty store), but it should still have fetched the requested data early while the PG was being transferred over.
#4 Updated by Sage Weil over 9 years ago
- Status changed from New to Closed
2011-09-26 12:36:16.150430 7f66b08b4700 -- 18.104.22.168:6800/4514 <== mds0 22.214.171.124:6800/10043 6 ==== osd_op(mds0.55:40 200.0009335c [read 0~4194304] 1.2363) v2 ==== 127+0+0 (4098147964 0 0) 0x7f6673ca7bb0 con 0x7606c1e0
is the only blocked read, because
2011-09-26 12:36:16.405345 7f66b08b4700 osd5 55528 pg[1.363( v 48421'44428 lc 0'0 (48419'44424,48421'44428]+backlog n=2090 ec=2 les/c 55528/51135 55516/55516/55516) [5,6] r=0 mlcod 0'0 !hml active m=2090 u=1] missing 200.0009335c/head v 48421'44428, is unfound.
and that object is never found. The problem isn't prioritization; it's in the handling of unfound/lost objects.
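For anyone triaging a similar log, the cross-check above (match each incoming osd_op against the objects the PG reports as unfound) can be sketched in a few lines of Python. This is only a rough sketch: the two regular expressions are assumptions inferred from the two log lines quoted in this ticket, not a documented Ceph log format, and the function name `blocked_on_unfound` is made up for illustration.

```python
import re

# Assumed patterns, derived only from the two log lines quoted in this ticket.
# Matches e.g. "osd_op(mds0.55:40 200.0009335c [read 0~4194304] 1.2363)".
OSD_OP_RE = re.compile(r"osd_op\((?P<client>\S+) (?P<oid>\S+) \[(?P<op>\w+)")
# Matches e.g. "missing 200.0009335c/head v 48421'44428, is unfound".
UNFOUND_RE = re.compile(r"missing (?P<oid>[^/\s]+)/\S+ v \S+, is unfound")


def blocked_on_unfound(log_lines):
    """Return {object name: requesting client} for ops on unfound objects."""
    requested = {}  # object name -> client/tid that asked for it
    unfound = set()  # object names the PG reported as missing and unfound
    for line in log_lines:
        op = OSD_OP_RE.search(line)
        if op:
            requested[op.group("oid")] = op.group("client")
        miss = UNFOUND_RE.search(line)
        if miss:
            unfound.add(miss.group("oid"))
    # Only ops whose target object was also reported unfound are of interest.
    return {oid: client for oid, client in requested.items() if oid in unfound}
```

Feeding it the lines of an OSD log (for example `blocked_on_unfound(open(path))`, with `path` pointing at the log file) lists the requests that can never complete until the unfound object is recovered or marked lost.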