Project

General

Profile

Bug #12338

Updated by Loïc Dachary over 8 years ago

https://github.com/ceph/ceph/pull/5760 Faced an issue with (0.87.2) cluster which went into a warn state with slow requests 

 $ sudo ceph health detail 
 HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests 
 1 ops are blocked > 16777.2 sec 
 1 ops are blocked > 16777.2 sec on osd.1 
 1 osds have slow requests 

 Looking at the current ops_in_flight for the relevant osd; 

 $ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight 
 { "ops": [ 
         { "description": "osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17)", 
           "initiated_at": "2015-07-15 15:29:48.082861", 
           "age": "11147.873450", 
           "duration": "0.000000", 
           "type_data": [ 
                 "no flag points reached", 
                 { "client": "client.4209", 
                   "tid": 1}, 
                 [ 
                     { "time": "2015-07-15 15:29:48.082861", 
                       "event": "initiated"}]]}], 
   "num_ops": 1} 

 Below is the osd log at startup  

 2015-07-15 15:29:47.745050 7f71f070c700    0 osd.1 0 ignoring osdmap until we have initialized 
 2015-07-15 15:29:47.745110 7f71f070c700    0 osd.1 0 ignoring osdmap until we have initialized 
 2015-07-15 15:29:47.745260 7f7201be3900    0 osd.1 0 done with init, starting boot process 
 2015-07-15 15:29:48.109484 7f71f070c700    0 osd.1 17 crush map has features 1107558400, adjusting msgr requires for clients 
 2015-07-15 15:29:48.109517 7f71f070c700    0 osd.1 17 crush map has features 1107558400 was 33825281, adjusting msgr requires for mons 
 2015-07-15 15:29:48.109542 7f71f070c700    0 osd.1 17 crush map has features 1107558400, adjusting msgr requires for osds 
 2015-07-15 15:30:18.741842 7f71f74e4700    0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.658855 secs 
 2015-07-15 15:30:18.744257 7f71f74e4700    0 log_channel(default) log [WRN] : slow request 30.658855 seconds old, received at 2015-07-15 15:29:48.082861: osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) currently no flag points reached 
 2015-07-15 15:30:48.751010 7f71f74e4700    0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 60.668101 secs 
 2015-07-15 15:30:48.751066 7f71f74e4700    0 log_channel(default) log [WRN] : slow request 60.668101 seconds old, received at 2015-07-15 15:29:48.082861: osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) currently no flag points reached 
 2015-07-15 15:31:48.765077 7f71f74e4700    0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 120.682157 secs 

Back