Bug #11334

Updated by Sage Weil about 9 years ago

The log shows:
 <pre> 
 2015-04-04 20:21:19.807227 7f628f9e9700    1 -- 10.214.136.136:6800/24788 <== mds.0 10.214.134.10:6800/9790 125281 ==== osd_op(mds.0.122:43197967 1000a878d0c.00000000 [create 0~0,setxattr parent (342)] 0.47732881 ondisk+write+known_if_redirected e406836) v5 ==== 217+0+348 (1980847806 0 3494425867) 0x201c9500 con 0x1e92ac60 
 2015-04-04 20:21:19.807306 7f628f9e9700    1 -- 10.214.136.136:6800/24788 --> 10.214.133.104:6789/0 -- mon_subscribe({monmap=6+,osd_pg_creates=0,osdmap=406836}) v2 -- ?+0 0x38d7400 con 0x1f163c60 
 </pre> 
 and then it starts complaining about a blocked op. The culprit seems to be this code:

 <pre> 
 bool OSD::dispatch_op_fast(OpRequestRef& op, OSDMapRef& osdmap) 
 { 
   if (is_stopping()) { 
     // we're shutting down, so drop the op 
     return true; 
   } 

   epoch_t msg_epoch(op_required_epoch(op)); 
   if (msg_epoch > osdmap->get_epoch()) { 
     Session *s = static_cast<Session*>(
       op->get_req()->get_connection()->get_priv());
     if (s) {
       s->received_map_lock.Lock();
       epoch_t received_epoch = s->received_map_epoch;
       s->received_map_lock.Unlock();
       if (received_epoch < msg_epoch) {
         osdmap_subscribe(msg_epoch, false);
       }
       } 
       s->put(); 
     } 
     return false; 
   } 
 </pre> 
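 which, as far as I can tell, will just drop the message on the floor? It returns false whenever the op's epoch is newer than the OSD's map, and it only subscribes for a newer map if the session has not already received that epoch.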
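
To make the branch in question easier to reason about, here is a minimal stand-alone sketch of just that control flow. It is not the real OSD code: `MockOSD`, `MockSession`, `subscribe_calls` and `demo_stale_map` are hypothetical stand-ins, locking and refcounting are left out, and the final `return true` is an assumption because the quoted excerpt stops before the rest of the function. The OSD epoch 406835 is also an assumed value; only the op epoch 406836 comes from the log above.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;  // stand-in for the real epoch_t typedef

// Hypothetical stand-in for Session: only the field the excerpt reads.
struct MockSession {
  epoch_t received_map_epoch;
};

// Hypothetical stand-in for OSD: just enough state to exercise the branch.
struct MockOSD {
  epoch_t current_epoch = 0;  // plays the role of osdmap->get_epoch()
  bool stopping = false;      // plays the role of is_stopping()
  int subscribe_calls = 0;    // counts osdmap_subscribe() invocations

  void osdmap_subscribe(epoch_t /*epoch*/, bool /*force*/) {
    ++subscribe_calls;
  }

  // Mirrors the quoted logic; true means "done with the op",
  // false means "not handled here".
  bool dispatch_op_fast(epoch_t msg_epoch, MockSession* s) {
    if (stopping) {
      return true;  // shutting down, so the op is dropped
    }
    if (msg_epoch > current_epoch) {
      if (s && s->received_map_epoch < msg_epoch) {
        osdmap_subscribe(msg_epoch, false);
      }
      return false;  // the op is not processed on this path
    }
    return true;  // assumption: the elided remainder handles the op
  }
};

// Walks through the case from the log: the op needs a newer map.
void demo_stale_map() {
  MockOSD osd;
  osd.current_epoch = 406835;  // assumed: one epoch behind the op

  MockSession behind{406835};  // session has not seen epoch 406836 yet
  assert(!osd.dispatch_op_fast(406836, &behind));
  assert(osd.subscribe_calls == 1);  // a subscribe is issued

  MockSession seen{406836};  // session already received epoch 406836
  assert(!osd.dispatch_op_fast(406836, &seen));
  assert(osd.subscribe_calls == 1);  // no new subscribe, still false
}
```

In both cases the function returns false, so whether the op is lost depends entirely on what the caller does with an op that was not dispatched, which the excerpt does not show.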
