Project

General

Profile

Actions

Bug #12338

closed

long standing slow requests: connection->session->waiting_for_map->connection ref cycle

Added by Abhishek Lekshmanan almost 9 years ago. Updated over 8 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
hammer
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Faced an issue with (0.87.2) cluster which went into a warn state with slow requests

$ sudo ceph health detail
HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
1 ops are blocked > 16777.2 sec
1 ops are blocked > 16777.2 sec on osd.1
1 osds have slow requests

Looking at the current ops_in_flight for the relevant osd;

$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight { "ops": [ { "description": "osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17)",
"initiated_at": "2015-07-15 15:29:48.082861",
"age": "11147.873450",
"duration": "0.000000",
"type_data": [
"no flag points reached", { "client": "client.4209",
"tid": 1},
[ { "time": "2015-07-15 15:29:48.082861",
"event": "initiated"}]]}],
"num_ops": 1}

Below is the osd log at startup

2015-07-15 15:29:47.745050 7f71f070c700 0 osd.1 0 ignoring osdmap until we have initialized
2015-07-15 15:29:47.745110 7f71f070c700 0 osd.1 0 ignoring osdmap until we have initialized
2015-07-15 15:29:47.745260 7f7201be3900 0 osd.1 0 done with init, starting boot process
2015-07-15 15:29:48.109484 7f71f070c700 0 osd.1 17 crush map has features 1107558400, adjusting msgr requires for clients
2015-07-15 15:29:48.109517 7f71f070c700 0 osd.1 17 crush map has features 1107558400 was 33825281, adjusting msgr requires for mons
2015-07-15 15:29:48.109542 7f71f070c700 0 osd.1 17 crush map has features 1107558400, adjusting msgr requires for osds
2015-07-15 15:30:18.741842 7f71f74e4700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.658855 secs
2015-07-15 15:30:18.744257 7f71f74e4700 0 log_channel(default) log [WRN] : slow request 30.658855 seconds old, received at 2015-07-15 15:29:48.082861: osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) currently no flag points reached
2015-07-15 15:30:48.751010 7f71f74e4700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 60.668101 secs
2015-07-15 15:30:48.751066 7f71f74e4700 0 log_channel(default) log [WRN] : slow request 60.668101 seconds old, received at 2015-07-15 15:29:48.082861: osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) currently no flag points reached
2015-07-15 15:31:48.765077 7f71f74e4700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 120.682157 secs


Files

osd.conf (26.9 KB) osd.conf Abhishek Lekshmanan, 07/15/2015 07:37 PM
osd-debug.tar.xz (306 KB) osd-debug.tar.xz Abhishek Lekshmanan, 07/16/2015 12:08 PM

Related issues 1 (0 open1 closed)

Copied to Ceph - Backport #12843: long standing slow requests: connection->session->waiting_for_map->connection ref cycleResolvedLoïc Dachary07/15/2015Actions
Actions

Also available in: Atom PDF