Bug #13611

closed

slow request > 30000 seconds on unloaded disk?

Added by Laurent GUERBY over 8 years ago. Updated about 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  # ceph health detail
    ...
    1 ops are blocked > 67108.9 sec on osd.37
  # grep slow ceph-osd.37.log
    2015-10-27 06:50:25.404157 7f3858154700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30720.747651 secs
    2015-10-27 06:50:25.404169 7f3858154700 0 log_channel(cluster) log [WRN] : slow request 30720.747651 seconds old, received at 2015-10-26 22:18:24.656474: osd_op(client.31827417.0:1337 rbd_data.b48170470de472.000000000000528a [read 2117632~16384] 81.2cce92ba ack+read+known_if_redirected e93978) currently reached_pg
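As a sanity check on the log excerpt above, the reported age can be recomputed from the two timestamps in the warning. This is only a rough sketch: the constant and function names are made up for illustration, and it assumes both timestamps are in the same time zone.

```python
from datetime import datetime

# Timestamps copied from the ceph-osd.37.log excerpt above.
LOGGED_AT = "2015-10-27 06:50:25.404157"    # when the warning was written
RECEIVED_AT = "2015-10-26 22:18:24.656474"  # "received at" field of the op
REPORTED_AGE = 30720.747651                 # seconds, as printed in the warning

FMT = "%Y-%m-%d %H:%M:%S.%f"


def age_seconds(logged_at: str, received_at: str) -> float:
    """Return the time elapsed between op receipt and the log line, in seconds."""
    delta = datetime.strptime(logged_at, FMT) - datetime.strptime(received_at, FMT)
    return delta.total_seconds()


computed = age_seconds(LOGGED_AT, RECEIVED_AT)
print(f"computed age: {computed:.6f} s")
print(f"difference from reported age: {abs(computed - REPORTED_AGE):.6f} s")
```

The computed age is about 30720.7477 s (roughly 8.5 hours), which matches the figure in the warning to within a fraction of a millisecond, so the reported age is consistent with the op's receive time rather than a formatting glitch in the log.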

The %util of the disk for osd.37 is low (15.64):

10/27/2015 09:53:42 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          49.63    0.00    6.51    4.50    0.00   39.35

Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sda        0.00    7.60   5.00  26.30   0.02   0.26     17.99      0.35  11.74     3.92    13.23   5.00  15.64
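To make it easier to see which number belongs to which column in the iostat sample above, the header and device row can be paired up programmatically. A minimal sketch, assuming whitespace-separated columns as in the sample; `parse_iostat_row` is a hypothetical helper, not part of any tool.

```python
# Header and device row copied from the iostat sample above.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
ROW = "sda 0.00 7.60 5.00 26.30 0.02 0.26 17.99 0.35 11.74 3.92 13.23 5.00 15.64"


def parse_iostat_row(header: str, row: str) -> dict:
    """Pair each iostat column name with the value from one device row."""
    names = header.split()[1:]      # drop the leading "Device:" label
    device, *values = row.split()   # first token is the device name
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    stats = {name: float(value) for name, value in zip(names, values)}
    stats["device"] = device
    return stats


stats = parse_iostat_row(HEADER, ROW)
print(stats["device"], "%util:", stats["%util"], "await (ms):", stats["await"])
```

With the sample above this pairs `%util` with 15.64 and `await` with 11.74 for `sda`, which is the basis for saying the disk is not busy.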

So this is likely a bug. Is there a way to clear these spurious blocked ops?

Actions #1

Updated by Samuel Just over 8 years ago

Restarting the OSD that reports it will clear it.

Actions #2

Updated by Sage Weil about 7 years ago

  • Status changed from New to Can't reproduce