Bug #11334
Updated by Sage Weil about 9 years ago
The log shows:
<pre>
2015-04-04 20:21:19.807227 7f628f9e9700 1 -- 10.214.136.136:6800/24788 <== mds.0 10.214.134.10:6800/9790 125281 ==== osd_op(mds.0.122:43197967 1000a878d0c.00000000 [create 0~0,setxattr parent (342)] 0.47732881 ondisk+write+known_if_redirected e406836) v5 ==== 217+0+348 (1980847806 0 3494425867) 0x201c9500 con 0x1e92ac60
2015-04-04 20:21:19.807306 7f628f9e9700 1 -- 10.214.136.136:6800/24788 --> 10.214.133.104:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0,osdmap=406836}) v2 -- ?+0 0x38d7400 con 0x1f163c60
</pre>
The OSD then starts complaining about a blocked op. The culprit seems to be this code:
<pre>
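bool OSD::dispatch_op_fast(OpRequestRef& op, OSDMapRef& osdmap)
{
  if (is_stopping()) {
    // we're shutting down, so drop the op
    return true;
  }
  epoch_t msg_epoch(op_required_epoch(op));
  if (msg_epoch > osdmap->get_epoch()) {
    // the op needs a newer OSDMap than the one we currently have
    Session *s = static_cast<Session*>(op->get_req()->
      get_connection()->get_priv());
    if (s) {
      s->received_map_lock.Lock();
      epoch_t received_epoch = s->received_map_epoch;
      s->received_map_lock.Unlock();
      // subscribe unless this session has already received a map that new
      if (received_epoch < msg_epoch) {
        osdmap_subscribe(msg_epoch, false);
      }
      s->put();  // release the reference returned by get_priv()
    }
    // nothing in this branch queues the op
    return false;
  }
  // ... (rest of the function not shown)
}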
</pre>
As far as I can tell, returning false there will just drop the message on the floor?