Bug #12338

long standing slow requests: connection->session->waiting_for_map->connection ref cycle

Added by Abhishek Lekshmanan almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Faced an issue with a (0.87.2) cluster which went into a warn state with slow requests:

$ sudo ceph health detail
HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
1 ops are blocked > 16777.2 sec
1 ops are blocked > 16777.2 sec on osd.1
1 osds have slow requests

Looking at the current ops_in_flight for the relevant osd:

$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
{ "ops": [
        { "description": "osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17)",
          "initiated_at": "2015-07-15 15:29:48.082861",
          "age": "11147.873450",
          "duration": "0.000000",
          "type_data": [
                "no flag points reached",
                { "client": "client.4209",
                  "tid": 1},
                [
                    { "time": "2015-07-15 15:29:48.082861",
                      "event": "initiated"}]]}],
  "num_ops": 1}

Below is the osd log at startup:

2015-07-15 15:29:47.745050 7f71f070c700 0 osd.1 0 ignoring osdmap until we have initialized
2015-07-15 15:29:47.745110 7f71f070c700 0 osd.1 0 ignoring osdmap until we have initialized
2015-07-15 15:29:47.745260 7f7201be3900 0 osd.1 0 done with init, starting boot process
2015-07-15 15:29:48.109484 7f71f070c700 0 osd.1 17 crush map has features 1107558400, adjusting msgr requires for clients
2015-07-15 15:29:48.109517 7f71f070c700 0 osd.1 17 crush map has features 1107558400 was 33825281, adjusting msgr requires for mons
2015-07-15 15:29:48.109542 7f71f070c700 0 osd.1 17 crush map has features 1107558400, adjusting msgr requires for osds
2015-07-15 15:30:18.741842 7f71f74e4700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.658855 secs
2015-07-15 15:30:18.744257 7f71f74e4700 0 log_channel(default) log [WRN] : slow request 30.658855 seconds old, received at 2015-07-15 15:29:48.082861: osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) currently no flag points reached
2015-07-15 15:30:48.751010 7f71f74e4700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 60.668101 secs
2015-07-15 15:30:48.751066 7f71f74e4700 0 log_channel(default) log [WRN] : slow request 60.668101 seconds old, received at 2015-07-15 15:29:48.082861: osd_op(client.4209.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) currently no flag points reached
2015-07-15 15:31:48.765077 7f71f74e4700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 120.682157 secs


Files

osd.conf (26.9 KB) osd.conf Abhishek Lekshmanan, 07/15/2015 07:37 PM
osd-debug.tar.xz (306 KB) osd-debug.tar.xz Abhishek Lekshmanan, 07/16/2015 12:08 PM

Related issues (1): 0 open, 1 closed

Copied to Ceph - Backport #12843: long standing slow requests: connection->session->waiting_for_map->connection ref cycle (Resolved, Loïc Dachary, 07/15/2015)
Actions #1

Updated by Samuel Just almost 9 years ago

  • Subject changed from long standing slow requests to firefly: long standing slow requests
  • Priority changed from Normal to Urgent
Actions #2

Updated by Samuel Just almost 9 years ago

  • Subject changed from firefly: long standing slow requests to giant (and probably master): long standing slow requests
Actions #3

Updated by Abhishek Lekshmanan almost 9 years ago

Adding the osd conf dump

Actions #4

Updated by Samuel Just almost 9 years ago

We don't add an event while the message is in waiting_on_map, so the op could conceivably be stuck there. If we then leaked or misplaced the session, we might wind up with a hung event like this. ms_handle_reset is a bit suspicious in that it kills wstate and resets the con without holding the lock. We also don't poison the session, so I wonder whether ms_fast_dispatch might wind up locking a session which had just been reset and had its queues cleared.
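
To make the ref cycle in the subject line concrete, here is a minimal C++ sketch under simplified, hypothetical types (Connection, Session, Message held by shared_ptr; the real Ceph classes use intrusive refcounting and different members): the connection pins the session, the session's waiting-for-map queue pins the queued messages, and each queued message pins the connection, so once external references are gone nothing in the loop can ever be freed or re-dispatched.

// Hypothetical, simplified illustration of the suspected cycle (not the
// actual Ceph classes): Connection -> Session -> waiting_for_map ->
// Message -> Connection.
#include <list>
#include <memory>

struct Connection;

struct Message {
  std::shared_ptr<Connection> con;                      // Message pins its Connection
};

struct Session {
  std::list<std::shared_ptr<Message>> waiting_for_map;  // Session pins queued Messages
};

struct Connection {
  std::shared_ptr<Session> session;                     // Connection pins its Session
};

int main() {
  auto con = std::make_shared<Connection>();
  con->session = std::make_shared<Session>();

  auto m = std::make_shared<Message>();
  m->con = con;                                // Message -> Connection
  con->session->waiting_for_map.push_back(m);  // Session -> Message
  m.reset();

  con.reset();  // Last external ref dropped, yet every use count stays > 0:
                // the cycle keeps the queued op alive, and unless something
                // explicitly clears waiting_for_map it is never re-dispatched.
  return 0;
}

Under these assumptions the op simply sits in the queue with no further events, which matches the "no flag points reached" ops above that never progress.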

Actions #5

Updated by Guang Yang almost 9 years ago

We had a similar issue on our cluster. We haven't had a chance to deep dive yet, but it looks like things were stuck on rw locks:

-bash-4.1$ grep "6DoSVqrpTCKeAHs4" ceph-osd.247.log 
2015-07-09 19:31:49.011833 7f095925a700  0 log_channel(default) log [WRN] : slow request 1920.459265 seconds old, received at 2015-07-09 18:59:48.552476: osd_op(client.2142301.0:8557201 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [create 0~0,setxattr user.rgw.idtag (24),writefull 0~896319,setxattr user.rgw.manifest (479),setxattr user.rgw.acl (181),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33)] 3.51b4e8c7 ondisk+write e46485) currently started
2015-07-09 19:32:49.026461 7f095925a700  0 log_channel(default) log [WRN] : slow request 1920.467781 seconds old, received at 2015-07-09 19:00:48.558589: osd_op(client.2032794.0:7893357 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [getxattrs,stat] 3.51b4e8c7 ack+read e46485) currently waiting for rw locks
2015-07-09 19:34:28.049169 7f095925a700  0 log_channel(default) log [WRN] : slow request 1920.715449 seconds old, received at 2015-07-09 19:02:27.333634: osd_op(client.813161.0:122910876 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [getxattrs,stat] 3.51b4e8c7 ack+read e46485) currently waiting for rw locks
2015-07-09 20:03:49.060487 7f095925a700  0 log_channel(default) log [WRN] : slow request 3840.507902 seconds old, received at 2015-07-09 18:59:48.552476: osd_op(client.2142301.0:8557201 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [create 0~0,setxattr user.rgw.idtag (24),writefull 0~896319,setxattr user.rgw.manifest (479),setxattr user.rgw.acl (181),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33)] 3.51b4e8c7 ondisk+write e46485) currently started
2015-07-09 20:04:49.078760 7f095925a700  0 log_channel(default) log [WRN] : slow request 3840.520086 seconds old, received at 2015-07-09 19:00:48.558589: osd_op(client.2032794.0:7893357 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [getxattrs,stat] 3.51b4e8c7 ack+read e46485) currently waiting for rw locks
2015-07-09 20:06:28.105693 7f095925a700  0 log_channel(default) log [WRN] : slow request 3840.771987 seconds old, received at 2015-07-09 19:02:27.333634: osd_op(client.813161.0:122910876 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [getxattrs,stat] 3.51b4e8c7 ack+read e46485) currently waiting for rw locks
2015-07-09 21:07:49.118604 7f095925a700  0 log_channel(default) log [WRN] : slow request 7680.566018 seconds old, received at 2015-07-09 18:59:48.552476: osd_op(client.2142301.0:8557201 default.20903.501_WGFw208pbVg3fbDNLT15x_WQHShABwc_6DoSVqrpTCKeAHs4 [create 0~0,setxattr user.rgw.idtag (24),writefull 0~896319,setxattr user.rgw.manifest (479),setxattr user.rgw.acl (181),setxattr user.rgw.content_type (25),setxattr user.rgw.etag (33)] 3.51b4e8c7 ondisk+write e46485) currently started
Actions #6

Updated by Abhishek Lekshmanan almost 9 years ago

Reproduced this in another VM environment. Here 2 osds had slow requests; unfortunately only one had the debug level set at the time of the slow request. Hopefully this will be helpful.

Here is the dump_ops_in_flight

sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
{ "ops": [
        { "description": "osd_op(client.4440.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17)",
          "initiated_at": "2015-07-16 11:22:46.304211",
          "age": "1412.058120",
          "duration": "0.000000",
          "type_data": [
                "no flag points reached",
                { "client": "client.4440",
                  "tid": 1},
                [
                    { "time": "2015-07-16 11:22:46.304211",
                      "event": "initiated"}]]},
        { "description": "osd_op(client.4516.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17)",
          "initiated_at": "2015-07-16 11:22:46.304480",
          "age": "1412.057851",
          "duration": "0.000000",
          "type_data": [
                "no flag points reached",
                { "client": "client.4516",
                  "tid": 1},
                [
                    { "time": "2015-07-16 11:22:46.304480",
                      "event": "initiated"}]]},
        { "description": "osd_op(client.4767.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17)",
          "initiated_at": "2015-07-16 11:23:00.126294",
          "age": "1398.236037",
          "duration": "0.000000",
          "type_data": [
                "no flag points reached",
                { "client": "client.4767",
                  "tid": 1},
                [
                    { "time": "2015-07-16 11:23:00.126294",
                      "event": "initiated"}]]},
        { "description": "osd_op(client.4839.0:1 default.region [getxattrs,stat] 10.9a566808 ack+read+known_if_redirected e17)",
          "initiated_at": "2015-07-16 11:23:16.198332",
          "age": "1382.163999",
          "duration": "0.000000",
          "type_data": [
                "no flag points reached",
                { "client": "client.4839",
                  "tid": 1},
                [
                    { "time": "2015-07-16 11:23:16.198332",
                      "event": "initiated"}]]},
        { "description": "osd_op(client.4667.0:1 default.region [getxattrs,stat] 10.9a566808 ack+read+known_if_redirected e18)",
          "initiated_at": "2015-07-16 11:23:50.339441",
          "age": "1348.022890",
          "duration": "0.000000",
          "type_data": [
                "no flag points reached",
                { "client": "client.4667",
                  "tid": 1},
                [
                    { "time": "2015-07-16 11:23:50.339441",
                      "event": "initiated"}]]}],
  "num_ops": 5}

This was the log at that point

2015-07-16 11:22:44.873351 7f68bf7da700  1 -- 10.0.0.54:6800/5734 <== mon.1 10.0.0.57:6789/0 10 ==== mon_check_map_ack(handle=1 version=16) v2 ==== 24+0+0 (3847591702 0 0) 0x4d1fdc0 con 0x4cf0dc0
2015-07-16 11:22:44.873392 7f68b67c8700 10 osd.1 0 _maybe_boot mon has osdmaps 1..16
2015-07-16 11:22:44.873406 7f68b67c8700 10 osd.1 0 _send_boot
2015-07-16 11:22:44.873411 7f68b67c8700 10 osd.1 0  new session (outgoing) 0x4d60780 con=0x4cf0160 addr=10.0.0.54:6801/5734
2015-07-16 11:22:44.873436 7f68b67c8700 10 osd.1 0  client_addr 10.0.0.54:6800/5734, cluster_addr 10.0.0.54:6801/5734, hb_back_addr 10.0.0.54:6802/5734, hb_front_addr 10.0.0.54:6803/5734
2015-07-16 11:22:44.970378 7f68b67c8700 10 osd.1 0 _collect_metadata {arch=x86_64,back_addr=10.0.0.54:6801/5734,ceph_version=ceph version 0.87.2 (87a7cec9ab11c677de2ab23a7668a77d2f5b955e),cpu=Intel Xeon E312xx (Sandy Bridge),distro=Ubuntu,distro_codename=trusty,distro_description=Ubuntu 14.04.2 LTS,distro_version=14.04,filestore_backend=xfs,filestore_f_type=0x58465342,front_addr=10.0.0.54:6800/5734,hb_back_addr=10.0.0.54:6802/5734,hb_front_addr=10.0.0.54:6803/5734,hostname=stmonleader1-testjenkins-puppet-rjil-gate-1802,kernel_description=#74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015,kernel_version=3.13.0-45-generic,mem_swap_kb=0,mem_total_kb=3790348,os=Linux,osd_data=/var/lib/ceph/osd/ceph-1,osd_journal=/var/lib/ceph/osd/ceph-1/journal,osd_objectstore=filestore}
2015-07-16 11:22:44.970442 7f68b67c8700  1 -- 10.0.0.54:6800/5734 --> 10.0.0.57:6789/0 -- osd_boot(osd.1 booted 0 v0) v5 -- ?+0 0x4e4a000 con 0x4cf0dc0
2015-07-16 11:22:46.303248 7f68aaeb0700  1 -- 10.0.0.54:6800/5734 >> :/0 pipe(0x4d3b8c0 sd=34 :6800 s=0 pgs=0 cs=0 l=0 c=0x4cf1b80).accept sd=34 10.0.0.57:35811/0
2015-07-16 11:22:46.303358 7f68aafb1700  1 -- 10.0.0.54:6800/5734 >> :/0 pipe(0x4d3b340 sd=33 :6800 s=0 pgs=0 cs=0 l=0 c=0x4cf18c0).accept sd=33 10.0.0.59:33603/0
2015-07-16 11:22:46.303764 7f68aaeb0700 10 osd.1 0  new session 0x4d60c80 con=0x4cf1b80 addr=10.0.0.57:0/1001932
2015-07-16 11:22:46.303833 7f68aaeb0700 10 osd.1 0  session 0x4d60c80 client.radosgw.gateway has caps osdcap[grant(rwx)] 'allow rwx'
2015-07-16 11:22:46.304007 7f68aafb1700 10 osd.1 0  new session 0x4d60f00 con=0x4cf18c0 addr=10.0.0.59:0/1028611
2015-07-16 11:22:46.304048 7f68aafb1700 10 osd.1 0  session 0x4d60f00 client.radosgw.gateway has caps osdcap[grant(rwx)] 'allow rwx'
2015-07-16 11:22:46.304275 7f68aaeb0700  1 -- 10.0.0.54:6800/5734 <== client.4440 10.0.0.57:0/1001932 1 ==== osd_op(client.4440.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) v4 ==== 203+0+0 (928176555 0 0) 0x4d61400 con 0x4cf1b80
2015-07-16 11:22:46.304389 7f68aaeb0700  1 -- 10.0.0.54:6800/5734 --> 10.0.0.57:6789/0 -- mon_subscribe({monmap=4+,osd_pg_creates=0,osdmap=17}) v2 -- ?+0 0x4d643c0 con 0x4cf0dc0
2015-07-16 11:22:46.304513 7f68aafb1700  1 -- 10.0.0.54:6800/5734 <== client.4516 10.0.0.59:0/1028611 1 ==== osd_op(client.4516.0:1 default.region [getxattrs,stat] 10.9a566808 RETRY=1 ack+retry+read+known_if_redirected e17) v4 ==== 203+0+0 (928176555 0 0) 0x4d61900 con 0x4cf18c0
2015-07-16 11:22:46.383313 7f68bf7da700  1 -- 10.0.0.54:6800/5734 <== mon.1 10.0.0.57:6789/0 11 ==== osd_map(16..17 src has 1..17) v3 ==== 1515+0+0 (507967949 0 0) 0x4e46240 con 0x4cf0dc0
2015-07-16 11:22:46.383357 7f68bf7da700 10 osd.1 0 do_waiters -- start
2015-07-16 11:22:46.383362 7f68bf7da700 10 osd.1 0 do_waiters -- finish
2015-07-16 11:22:46.383363 7f68bf7da700 20 osd.1 0 _dispatch 0x4e46240 osd_map(16..17 src has 1..17) v3
2015-07-16 11:22:46.383401 7f68bf7da700  1 -- 10.0.0.54:6800/5734 --> 10.0.0.57:6789/0 -- mon_subscribe({monmap=4+,osd_pg_creates=0,osdmap=0}) v2 -- ?+0 0x4d64780 con 0x4cf0dc0
2015-07-16 11:22:46.383417 7f68bf7da700  3 osd.1 0 handle_osd_map epochs [16,17], i have 0, src has [1,17]
2015-07-16 11:22:46.383421 7f68bf7da700 10 osd.1 0 handle_osd_map message skips epochs 1..15
2015-07-16 11:22:46.383438 7f68bf7da700  1 -- 10.0.0.54:6800/5734 --> 10.0.0.57:6789/0 -- mon_subscribe({monmap=4+,osd_pg_creates=0,osdmap=1}) v2 -- ?+0 0x4d645a0 con 0x4cf0dc0
2015-07-16 11:22:46.383443 7f68bf7da700 10 osd.1 0 do_waiters -- start
2015-07-16 11:22:46.383445 7f68bf7da700 10 osd.1 0 do_waiters -- finish

Attaching the log file in the next comment, as the tracker seems to error out while attaching.

Actions #8

Updated by Samuel Just over 8 years ago

Actually, that log isn't long enough. pg 10.8 was still being created when the log cuts off, so the request was supposed to be waiting. Can I get the same log, but for another 10 minutes?

Actions #9

Updated by Abhishek Lekshmanan over 8 years ago

Updating some info on the sequence of operations before the error:
While bringing up a 3-replica ceph cluster, we brought up a mon (bootstrap role, maybe a couple) and 2 osds (in 2 different vms); meanwhile rgw tries to start in parallel, so the requests for the object appear before the pgs have peered. Once the third osd comes up, all slow requests disappear except these ones, which seemingly occurred before the pgs had peered.

Actions #10

Updated by Abhishek Lekshmanan over 8 years ago

Seems to affect hammer as well; I have a slow request that is ~1.5 hours old now.

Actions #11

Updated by Samuel Just over 8 years ago

  • Subject changed from giant (and probably master): long standing slow requests to long standing slow requests: connection->session->waiting_for_map->connection ref cycle
  • Status changed from New to 7
  • Assignee set to Samuel Just
Actions #12

Updated by Samuel Just over 8 years ago

  • Backport set to hammer
Actions #13

Updated by Samuel Just over 8 years ago

wip-12338-hammer should fix it (based on v0.94.2) if you'd be willing to try it.
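
As a rough illustration of what breaking such a cycle generally involves (a sketch only, using the same simplified hypothetical types as the example under comment #4, not the actual wip-12338-hammer code), the key step is to detach and drop the session's queued ops when the connection is reset, which releases the Message -> Connection refs:

// Hypothetical sketch, not the real Ceph fix: clear the session's queued
// ops on connection reset so the ref cycle is cut.
#include <list>
#include <memory>
#include <mutex>

struct Connection;
struct Message { std::shared_ptr<Connection> con; };

struct Session {
  std::mutex lock;
  std::list<std::shared_ptr<Message>> waiting_for_map;
};

struct Connection { std::shared_ptr<Session> session; };

// Called from the messenger's reset callback for the failed connection.
void session_handle_reset(const std::shared_ptr<Session>& s) {
  std::list<std::shared_ptr<Message>> doomed;
  {
    std::lock_guard<std::mutex> l(s->lock);
    doomed.swap(s->waiting_for_map);  // detach the queue under the session lock
  }
  // 'doomed' is destroyed outside the lock; the Message -> Connection refs
  // are dropped here, breaking Connection -> Session -> Message -> Connection.
}

Requeuing or discarding those ops when the osdmap finally arrives is the other half of the approach; the sketch only shows the reset path.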

Actions #15

Updated by Abhishek Lekshmanan over 8 years ago

Tried the patch; it seemed to solve the issue. Can we have a master PR and a hammer backport?

Actions #16

Updated by Guang Yang over 8 years ago

Hi Abhishek,
When you saw this issue occur, did you see the upper layer (e.g. radosgw) hang at the same time?

Actions #17

Updated by Abhishek Lekshmanan over 8 years ago

Guang: no, this request was lingering because the client (radosgw) that started the request had restarted, so the radosgw that is running came up after the slow request was observed. Essentially the slow request is not owned by any client, which is why it never cleared. Sam might be able to explain better.

Actions #18

Updated by Sage Weil over 8 years ago

  • Status changed from 7 to Pending Backport
Actions #20

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
  • Status changed from Pending Backport to In Progress
  • Assignee changed from Samuel Just to Loïc Dachary
Actions #21

Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)
  • Status changed from In Progress to Pending Backport
  • Assignee changed from Loïc Dachary to Samuel Just
Actions #22

Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Resolved